* refs/pull/54031/head:
qa: add test to mangle lost+found directory object and ensure safety
qa: run scrub before mounting client and validations
Reviewed-by: Kotresh Hiremath Ravishankar <khiremat@redhat.com>
qa: set mds config with `config set` for a particular test
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
Reviewed-by: Jos Collin <jcollin@redhat.com>
Reviewed-by: Rishabh Dave <ridave@redhat.com>
* refs/pull/53999/head:
PendingReleaseNotes: support for subvolumes and subvolume groups in snap_schedule
snap_schedule/tests: fix db upgrade issue
qa: add yaml for on demand subvol version testing
qa: add test cases for testing --subvol and --group arguments
mgr/volumes: conditionalize subvolume upgrade
mgr/volumes: ensure correct init of v1 subvol
mgr/snap_schedule: add subvol and subvol group arguments to cli
mds/snap_schedule: add subvolume group column management
mgr/volumes: add remote helper methods to fetch subvolume info
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Failure without fix looks like:
2023-12-21T16:05:55.737+0000 7fbe585b0700 0 [devicehealth DEBUG root] loading object ABC_DEADB33F_FA
2023-12-21T16:05:55.737+0000 7fbe585b0700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'devicehealth' while running on mgr.x: [errno 2] RADOS object not found (Failed to operate read op for oid ABC_DEADB33F_FA)
2023-12-21T16:05:55.737+0000 7fbe585b0700 -1 devicehealth.serve:
2023-12-21T16:05:55.737+0000 7fbe585b0700 -1 Traceback (most recent call last):
  File "/home/pdonnell/ceph/src/pybind/mgr/devicehealth/module.py", line 394, in serve
    self._do_serve()
  File "/home/pdonnell/ceph/src/pybind/mgr/mgr_module.py", line 524, in check
    return func(self, *args, **kwargs)
  File "/home/pdonnell/ceph/src/pybind/mgr/devicehealth/module.py", line 354, in _do_serve
    finished_loading_legacy = self.check_legacy_pool()
  File "/home/pdonnell/ceph/src/pybind/mgr/devicehealth/module.py", line 326, in check_legacy_pool
    if self._load_legacy_object(ioctx, obj.key):
  File "/home/pdonnell/ceph/src/pybind/mgr/devicehealth/module.py", line 300, in _load_legacy_object
    ioctx.operate_read_op(op, oid)
  File "rados.pyx", line 3723, in rados.Ioctx.operate_read_op
rados.ObjectNotFound: [errno 2] RADOS object not found (Failed to operate read op for oid ABC_DEADB33F_FA)
Credit to Greg Farnum for postulating the cause.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
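The traceback above shows a read failing for an object that had just been
listed from the legacy pool. The following is a minimal sketch of tolerating
that case; it is not the actual fix from this change. `ObjectNotFound`,
`FakeIoctx`, and `load_legacy_objects` are hypothetical stand-ins so the
example runs without a cluster, on the assumption that real code would catch
`rados.ObjectNotFound` instead.

```python
class ObjectNotFound(Exception):
    """Stand-in for rados.ObjectNotFound (assumption, illustration only)."""


class FakeIoctx:
    """Hypothetical in-memory pool; this is not the real rados.Ioctx API."""

    def __init__(self, listed, stored):
        self.listed = listed  # keys returned when listing the pool
        self.stored = stored  # keys that can actually be read

    def list_keys(self):
        return list(self.listed)

    def read(self, oid):
        if oid not in self.stored:
            raise ObjectNotFound(f"RADOS object not found: {oid}")
        return self.stored[oid]


def load_legacy_objects(ioctx):
    """Read every listed object, skipping any that cannot be read."""
    loaded, skipped = {}, []
    for oid in ioctx.list_keys():
        try:
            loaded[oid] = ioctx.read(oid)
        except ObjectNotFound:
            # Listed but unreadable: record it and move on instead of
            # letting the exception escape the serve loop.
            skipped.append(oid)
    return loaded, skipped


ioctx = FakeIoctx(listed=["dev0", "ABC_DEADB33F_FA"], stored={"dev0": b"{}"})
loaded, skipped = load_legacy_objects(ioctx)
```

Here `dev0` is loaded and `ABC_DEADB33F_FA` ends up in `skipped` rather than
raising an unhandled exception.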
This is to allow us to pull the latest build of
cephadm off of a stable branch (currently the only
valid option for that is reef, although this hopefully
will work with squid, T release, etc. in the future).
This should allow us to bootstrap clusters based on
those stable branches for use in upgrade testing.
Signed-off-by: Adam King <adking@redhat.com>
Earlier Ceph versions did not allow the lost+found directory, or the
entries inside it, to be removed. Users are advised to fail the file system
and remove the directory object using rados CLI commands. Therefore, include
this step as part of our testing.
Signed-off-by: Venky Shankar <vshankar@redhat.com>
* refs/pull/52196/head:
qa: configure balancer for multi-mds workloads
qa: create qa subvolumes in named subvolumegroup
qa: do not rely on default max_mds value
qa: add automate_balance to dashboard qa schema
doc/cephfs: add docs for balance_automate
doc/cephfs: use bash prompt for shell code
mds: add balance_automate fs setting
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
* refs/pull/48895/head:
qa: test cases for checking the health status after scrub repair
mds: scrub repair does not clear earlier damage health status
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
This is the more modern variant. Crimson doesn't currently
support the pg <pgid> deep_scrub variant, so let's just use
this one generally.
Signed-off-by: Samuel Just <sjust@redhat.com>
A basic test for ceph-nvmeof[1] in which
an nvmeof initiator is created.
It requires use of a new task "nvmeof_gateway_cfg"
under cephadm which shares config information
between two remote hosts.
[1] https://github.com/ceph/ceph-nvmeof/
Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
* refs/pull/53431/head:
qa: add test cases to verify error reporting works as expected
mgr: fix some doc strings in object_format.py
mgr/tests: test returning error status works as expected
mgr: make object_format's Responder class capable of responding err status
mgr/nfs: report proper errno with err status
Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Generate a name that is shorter and easier to remember.
Also, write a simpler and faster helper method for generating
unique names. This method also has a shorter, more concise name,
so it is easier to type and easier to read.
Fixes: https://tracker.ceph.com/issues/63680
Signed-off-by: Rishabh Dave <ridave@redhat.com>
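As an illustration of the kind of helper described above, here is a minimal
sketch that produces short, unique names from a per-prefix counter. The class
`NameGenerator` and its method `gen` are hypothetical and are not the helper
added by this change; the real helper may use a different scheme (for example
a random suffix).

```python
import itertools


class NameGenerator:
    """Hypothetical helper: yields short names such as 'subvol1', 'subvol2'."""

    def __init__(self):
        # One independent counter per prefix, created on first use.
        self._counters = {}

    def gen(self, prefix):
        counter = self._counters.setdefault(prefix, itertools.count(1))
        return f"{prefix}{next(counter)}"


names = NameGenerator()
first = names.gen("subvol")
second = names.gen("subvol")
other = names.gen("group")
```

Names are unique only within one `NameGenerator` instance, which is usually
enough for a single test run.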
Kernel 5.4 (Ubuntu 20.04) is missing the following commits:
- 5a9e2f5d5590 ceph: add ceph.{cluster_fsid/client_id} vxattrs
- 247b1f19dbeb ceph: add status debugfs file
The fs suite relies on these debugfs entries to gather mount information
(client-id, addr/inst) that some tests require. In the fs suite, the
distro kernel is overridden by the testing kernel, so the testing kernel
is installed even if Ubuntu 20.04 is chosen as the distro. In the smoke
suite, however, the distro kernel is used, and the missing patches cause
essential information gathering (client-id, etc.) to fail early on, so
the test does not even start executing. PR #54515 fixes a bug in the
client-id fetching path but isn't complete due to the missing patches;
details here:
https://tracker.ceph.com/issues/63488#note-8
But it's essential to have the smoke tests running, since those tests
have lately uncovered bugs in the MDS (w/ distro kernels). In order
to benefit from those tests, this change ignores failures when
gathering mount information (which isn't used by the fs-relevant
smoke tests). The tests (in the fs suite) that rely on this
information would fail when run with the 20.04 distro kernel (but the
fs suite overrides it with the testing kernel).
Signed-off-by: Venky Shankar <vshankar@redhat.com>
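The behavior described above, ignoring failures while gathering mount
information, can be sketched roughly as follows. The function
`gather_mount_info` and the two reader callables are hypothetical names used
only for illustration; they are not the qa code touched by this change.

```python
import logging

log = logging.getLogger(__name__)


def gather_mount_info(read_debugfs):
    """Return mount details, leaving fields as None if gathering fails."""
    info = {"client_id": None, "addr": None}
    try:
        info.update(read_debugfs())
    except (OSError, KeyError) as e:
        # Older distro kernels lack the debugfs entries; log and carry on
        # so tests that do not need this information can still run.
        log.warning("could not gather mount info, continuing: %s", e)
    return info


def missing_debugfs():
    # Simulates a kernel without the status debugfs file.
    raise FileNotFoundError("status debugfs file not present")


def working_debugfs():
    # Simulates a kernel that exposes the entries (values are made up).
    return {"client_id": 4215, "addr": "192.168.0.1:0"}


old_kernel = gather_mount_info(missing_debugfs)
new_kernel = gather_mount_info(working_debugfs)
```

Tests that need `client_id` would still have to check for `None` and fail or
skip explicitly.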
The duration is imprecise and will sometimes raise a false alarm
when the shell command itself is issued late.
https://tracker.ceph.com/issues/63587
Signed-off-by: Xiubo Li <xiubli@redhat.com>