Add helpers that dump information only about PGs that haven't reached
the desired state when we fail. Previously we dumped the output of
"ceph pg dump" before failing, which prints a lot of unnecessary information
about PGs that are not responsible for the failure, making debugging harder.
Also, try to make the failure messages distinct.
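A minimal sketch of what such a helper could look like (the name
dump_stuck_pgs and the manager.get_pg_stats() call are illustrative
assumptions, not the actual patch):
```python
import logging

log = logging.getLogger(__name__)

def dump_stuck_pgs(manager, desired_state='active+clean'):
    # log only the PGs that haven't reached the desired state,
    # instead of the full "ceph pg dump" output
    stuck = [pg for pg in manager.get_pg_stats()
             if desired_state not in pg['state']]
    for pg in stuck:
        log.info('PG %s is stuck in state %s', pg['pgid'], pg['state'])
    return stuck
```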
Signed-off-by: Neha Ojha <nojha@redhat.com>
qa/tasks/radosbench: use long form of option for compatibility
Reviewed-by: Yuri Weinstein <yweinste@redhat.com>
Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
Code in qa/ uses both StringIO and BytesIO. Let's use StringIO
exclusively (unless necessary) for uniformity. The reason for using
StringIO over BytesIO is that tests mostly need stdout as a string
rather than as bytes, and StringIO is used more frequently in qa/ code
at this point.
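For example, the usual pattern of capturing a command's stdout as a
string (a minimal sketch; the remote object comes from the teuthology
test context):
```python
from io import StringIO

# capture stdout as a string rather than as bytes
proc = remote.run(args=['ceph', '--version'], stdout=StringIO())
version = proc.stdout.getvalue().strip()
```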
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Add support for RBD Image Format v1:
- This format lacks an ID field, which the dashboard requires. Instead,
the RBD image `block_name_prefix` is used as a unique ID (together with
the pool id and namespace); see the sketch after this list.
- Additionally, `image_format` is now exposed.
- On the front-end side:
- Copy action on a v1 image will cause the image to be copied to v2
format.
- List doesn't allow Move to Trash on v1 images.
- Details section now shows `image_format` for images.
- Edit Form disables flags not supported for v1 (`deep-flatten`,
`layering`, `exclusive-lock`).
- Protect does not work on v1 images or v2 images created from v1
ones.
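A sketch of how such a fallback ID could be composed (a hypothetical
helper; the actual dashboard code may differ):
```python
def rbd_image_unique_id(pool_id, namespace, image):
    # v2 images have a real ID; v1 images fall back to
    # block_name_prefix, which is unique per image within a pool
    image_id = image.get('id') or image['block_name_prefix']
    return '{0}/{1}/{2}'.format(pool_id, namespace or '', image_id)
```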
Fixes: https://tracker.ceph.com/issues/36354
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
qa/tasks/vstart_runner: do not teardown test_path if "create-cluster-…
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Kiefer Chang <kiefer.chang@suse.com>
Otherwise we could be removing a "None" directory when tearing down the
cluster, and get the following failure:
Exception ignored in: <bound method LocalContext.__del__ of <__main__.LocalContext object at 0x7f99fd4a6cc0>>
Traceback (most recent call last):
File "../qa/tasks/vstart_runner.py", line 1189, in __del__
shutil.rmtree(self.teuthology_config['test_path'])
File "/tmp/tmp.mmM2ugspuR/venv/lib/python3.6/shutil.py", line 477, in rmtree
onerror(os.lstat, path, sys.exc_info())
File "/tmp/tmp.mmM2ugspuR/venv/lib/python3.6/shutil.py", line 475, in rmtree
orig_st = os.lstat(path)
TypeError: lstat: path should be string, bytes or os.PathLike, not NoneType
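A minimal sketch of the kind of guard that avoids this (attribute names
follow the traceback above; the actual fix may differ):
```python
import shutil

class LocalContext(object):
    # ... (rest of the class as in vstart_runner.py)
    def __del__(self):
        test_path = self.teuthology_config.get('test_path')
        # skip teardown when no test_path was set up for this run
        if test_path is not None:
            shutil.rmtree(test_path, ignore_errors=True)
```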
Signed-off-by: Kefu Chai <kchai@redhat.com>
* refs/pull/34288/head:
mds: flag backtrace scrub failures for new files as okay
Reviewed-by: Zheng Yan <zyan@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/34839/head:
qa/cephfs: add FUSE module before running mount -t fusectl
Reviewed-by: Zheng Yan <zyan@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
* refs/pull/34838/head:
vstart_runner: don't use namespaces by default
qa/cephfs: run nsenter commands with superuser privileges
qa/cephfs: look for mountpoint in cmdline file
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/34782/head:
qa/tasks/cephfs/mount.py: remove netns name parsing in mountpoint setter
qa/tasks/vstart_runner.py: add kwargs parameter to ignore the ones it does not understand
Reviewed-by: Rishabh Dave <ridave@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
And add a method that sets self.fuse_daemon.subproc.pid to the PID of
the process that doesn't have sudo in its arguments. For example, when
"sudo ceph-fuse /mnt/cephfs" is run on the shell, it launches a process
with the arguments "ceph-fuse /mnt/cephfs". The added method gets the
PID of the latter/child process and sets that as the fuse daemon's PID.
Not doing so kills the former/parent process but not the child process.
Also, since we are at it, clean up this method a bit.
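A sketch of how the child PID could be located (a hypothetical helper
assuming psutil; the actual method may gather the information
differently):
```python
import psutil

def get_nonsudo_pid(parent_pid):
    """Return the PID of the first descendant of parent_pid whose
    command line does not contain 'sudo', i.e. the actual ceph-fuse
    process rather than the sudo wrapper."""
    for child in psutil.Process(parent_pid).children(recursive=True):
        if 'sudo' not in child.cmdline():
            return child.pid
    return None
```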
Fixes: https://tracker.ceph.com/issues/45339
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Currently the max size is determined by the number of OSDs, which is
compared with the maximum of the current crush rule. The problem is
that this is wrong for every crush rule that doesn't have OSDs as its
failure domain and doesn't have the root of the cluster set as the root
of the crush rule.
Now the crush map will be used to determine how many failure domains
are really available in the cluster and how many can really be used in
the end. This number now defines the maximum size you can enter.
The crush detail view will now show the new attribute usable_size and
hide the redundant information: steps, ruleset, type and rule_name.
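A minimal sketch of the underlying idea (assuming a flat node list like
the one from "ceph osd tree -f json"; the actual dashboard logic may
differ):
```python
def count_failure_domains(nodes, root_id, failure_domain):
    # count the buckets of the rule's failure-domain type
    # (e.g. 'host', 'rack') reachable from the rule's root;
    # this number bounds the usable replica size
    by_id = {n['id']: n for n in nodes}
    stack, count = [root_id], 0
    while stack:
        node = by_id[stack.pop()]
        if node['type'] == failure_domain:
            count += 1
        else:
            stack.extend(node.get('children', []))
    return count
```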
Fixes: https://tracker.ceph.com/issues/44620
Signed-off-by: Stephan Müller <smueller@suse.com>
New, unwritten files fail backtracing during scrub.
This is not necessarily bad, so flag such failures as okay and continue
with other entries.
Fixes: https://tracker.ceph.com/issues/43543
Signed-off-by: Milind Changire <mchangir@redhat.com>
* qa/tasks/keystone.py:
instead of prefilling keystone manually, use "keystone-manage
bootstrap". it helps to set up the admin user, a "Default" domain with
"default" id, and wire them up with the expected role and an "admin"
project, etc. as the id of the admin domain is known to be "default",
we can just use it in our tests without querying openstack for the id
of the "Default" domain. this is very handy. see the sketch after this
list.
* qa/suites/rgw/tempest/tasks/rgw_tempest.yaml:
use "Default" for the domain name, as "Default" is the name of the
domain created by bootstrap, while "default" is its id.
* qa/suites/rgw/crypt/2-kms/barbican.yaml:
remove settings to bootstrap keystone
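a minimal sketch of the bootstrap invocation (flag values are
illustrative; the run_in_keystone_venv helper is an assumption about
qa/tasks/keystone.py):
```python
# illustrative flag values; host/port depend on the test environment
run_in_keystone_venv(ctx, cclient, [
    'keystone-manage', 'bootstrap',
    '--bootstrap-password', 'ADMIN',
    '--bootstrap-username', 'admin',
    '--bootstrap-project-name', 'admin',
    '--bootstrap-role-name', 'admin',
    '--bootstrap-service-name', 'keystone',
    '--bootstrap-region-id', 'RegionOne',
    '--bootstrap-admin-url', 'http://127.0.0.1:35357/v3',
    '--bootstrap-public-url', 'http://127.0.0.1:5000/v3',
])
```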
Signed-off-by: Kefu Chai <kchai@redhat.com>
* also generate a sample conf file following the document at
https://github.com/openstack/keystone/tree/17.0.0.0rc2/etc
* use "projects" instead of "tenants" to match the terminology used by
openstack identify API 3.0.
* test API 3.0 instead of API 2.0, by changing
`rgw_keystone_api_version` from "2" to "3"
* explicitly specify a domain "default" for the project to be created,
otherwise a POST request will fail with:
```
{"error":{"code":400,"message":"You have tried to create a resource using the admin token. As this token is not within a domain you must explicitly include a domain for this resource to belong
to.","title":"Bad Request"}}
```
* create "default" domain, and use it, othewise a GET request fails
like:
```
2020-05-28T11:17:28.751 INFO:teuthology.orchestra.run.smithi092.stderr:http://smithi092.front.sepia.ceph.com:35357 "GET /v3/domains/default HTTP/1.1" 404 87
2020-05-28T11:17:28.752 INFO:teuthology.orchestra.run.smithi092.stderr:RESP: [404] Content-Length: 87 Content-Type: application/json Date: Thu, 28 May 2020 11:17:28 GMT Server: WSGIServer/0.2
CPython/3.6.9 Vary: X-Auth-Token x-openstack-request-id: req-bc33796f-2bc3-411c-a7fb-1208918e0dbd
2020-05-28T11:17:28.752 INFO:teuthology.orchestra.run.smithi092.stderr:RESP BODY: {"error":{"code":404,"message":"Could not find domain: default.","title":"Not Found"}}
```
* add user to "default" domain when creating it.
* use "type" as the positional argument, per
https://docs.openstack.org/keystone/pike/admin/cli-keystone-manage-services.html
otherwise we will have failures like:
```
2020-05-28T13:38:24.867 INFO:teuthology.orchestra.run.smithi198.stderr:openstack service create: error: unrecognized arguments: --type keystone
```
* update `create_endpoint()` to use the V3 API, see
https://docs.openstack.org/python-openstackclient/pike/cli/command-objects/endpoint.html
(both the service-create and endpoint-create calls are sketched below)
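a minimal sketch of the v3-style calls (arguments illustrative;
run_in_keystone_venv assumed as above):
```python
# the service type ("identity", "object-store", ...) is now a
# positional argument rather than --type
run_in_keystone_venv(ctx, cclient, [
    'openstack', 'service', 'create',
    '--name', 'swift', 'object-store',
])
# v3-style endpoint creation: one interface per call
run_in_keystone_venv(ctx, cclient, [
    'openstack', 'endpoint', 'create',
    '--region', 'RegionOne',
    'object-store', 'public', 'http://127.0.0.1:8000/v1',
])
```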
Fixes: https://tracker.ceph.com/issues/45692
Signed-off-by: Kefu Chai <kchai@redhat.com>
* refs/pull/34672/head:
qa/tasks/cephfs: Enable multiple exports tests
mgr/nfs: Instead of 'auth del' use 'auth rm'
qa/tasks/cephfs: Don't enable cephadm in TestNFS
qa/tasks/cephfs: Add tests for nfs exports
mgr/volumes/nfs: Fix idempotency of cluster and export commands
mgr/volumes/nfs: Fix incorrect read only access_type value
mgr/fs/nfs: Use check_mon_command() instead of mon_command()
qa/cephfs: Add tests for nfs
mgr/volumes/nfs: Remove type option from export create interface
vstart: Instead of CACHEINODE use MDCACHE
mgr/volumes: Rearrange nfs export interface
mgr/volumes/nfs: Delete common config object on cluster deletion
mgr/volumes/nfs: Delete all exports on cluster deletion
mgr/volumes: Make nfs create export interface idempotent
vstart: Add watch url for conf-nfs object
mgr/volumes/nfs: Delete user on removing export
mgr/volumes: Create user for given path and fs
vstart: Ensure cephadm and NFS does not conflict
vstart: Update details about ganesha packages
mgr/volumes/nfs: Add delete cephfs export command
mgr/volumes/nfs: Add RADOS notify for common config object
mgr/volumes/nfs: Pass cluster_id directly to NFSCluster {create, update, delete} methods
mgr/volumes: Add nfs cluster delete interface
mgr/volumes: Add nfs cluster update interface
vstart: Enable test_orchestrator in start_ganesha()
mgr/volumes: Add placement option to create nfs cluster interface
mgr/volumes/nfs: Change common ganesha config object name to 'conf-nfs.ganesha-<cluster_id>'
mgr/volumes/nfs: Call orch nfs apply
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Ramana Raja <rraja@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Look for self.mountpoint in the contents of the /proc/<pid>/cmdline
file when finding the asok file for the client, so that
vstart_runner.py won't end up picking the asok file of a client not
created in the current run.
So far this has usually never happened, because the PID of newly
created processes is higher than that of previously created processes,
and the list of asok files returned by "glob.glob(asok_path)" in
find_socket() is in descending order of PIDs.
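A sketch of the matching logic (a minimal illustration; the socket path
pattern is an assumption, not the actual vstart_runner code):
```python
import glob
import re

def find_socket(client_name, mountpoint, asok_dir='/var/run/ceph'):
    # keep only the asok whose owning process has our mountpoint
    # on its command line
    pattern = '{0}/client.{1}.*.asok'.format(asok_dir, client_name)
    for asok in glob.glob(pattern):
        m = re.search(r'\.(\d+)\.asok$', asok)
        if not m:
            continue
        try:
            with open('/proc/{0}/cmdline'.format(m.group(1)), 'rb') as f:
                args = f.read().decode().split('\0')  # NUL-separated
        except IOError:
            continue  # stale socket, the process is already gone
        if mountpoint in args:
            return asok
    return None
```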
Signed-off-by: Rishabh Dave <ridave@redhat.com>
If the ceph-fuse client needs to flush the caps and waits on that
synchronously, the umount() will just return successfully, then the
netns container will be destroyed and the network will not be
reachable, but the ceph-fuse daemon is still stuck waiting for the
flush caps ack.
This will cause the ceph-fuse daemon to get stuck forever, and if the
mds daemons get restarted, they will try to reconnect the clients, but
the stuck ceph-fuse daemon won't reply, because it is not reachable
any more.
Fixes: https://tracker.ceph.com/issues/45665
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Add a check for already bootstrapped clusters where the image is
already set, to avoid overriding it.
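A minimal sketch of the check (the ctx.ceph[cluster_name] attribute
names are assumptions about the cephadm task context, not the actual
patch):
```python
# keep the image of an already-bootstrapped cluster instead of
# overriding it with the image from the task config
if ctx.ceph[cluster_name].bootstrapped and ctx.ceph[cluster_name].image:
    config['image'] = ctx.ceph[cluster_name].image
```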
Signed-off-by: Georgios Kyratsas <gkyratsas@suse.com>