Mapping RBD images to NBD devices using the ioctl interface is not
robust. It was discovered that the device size or the md5 checksum
of the NBD device was incorrect immediately after mapping using
the ioctl method. When using the NBD netlink interface to map RBD
images, the issue was not encountered. Switch to using the NBD
netlink interface for mapping.
Fixes: https://tracker.ceph.com/issues/64063
Signed-off-by: Ramana Raja <rraja@redhat.com>
fix: resolve inconsistent judgment of osd_pg_stat_report_interval_max
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Matan Breizman <Matan.Brz@gmail.com>
Include device identifier or cookie in the message sent to the kernel
to resize images mapped to NBD devices using netlink. Otherwise,
netlink_resize() fails and the size of the device isn't updated.
Fixes: https://tracker.ceph.com/issues/64139
Signed-off-by: Ramana Raja <rraja@redhat.com>
common/options/crimson: increase crimson_osd_obc_lru_size to 512
Reviewed-by: Matan Breizman <mbreizma@redhat.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
SKIP_IF_CRIMSON won't work here since we try to create EC pools
prior to the test being run.
Skip the entire test instead by separating the EC tests.
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
qa/suites/rados/thrash: modify selection of max-scrubs configuration values
Reviewed-by: Matan Breizman <mbreizma@redhat.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
qa: set mds config with `config set` for a particular test
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
Reviewed-by: Jos Collin <jcollin@redhat.com>
Reviewed-by: Rishabh Dave <ridave@redhat.com>
As the osd-max-scrubs default was increased from 1 to (currently) 3, the
original set of optional values under rados/thrash/3-scrub-overrides are
no longer useful. This commit changes the set of optional values to
reflect the current default.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
* refs/pull/53999/head:
PendingReleaseNotes: support for subvolumes and subvolume groups in snap_schedule
snap_schedule/tests: fix db upgrade issue
qa: add yaml for on demand subvol version testing
qa: add test cases for testing --subvol and --group arguments
mgr/volumes: conditionalize subvolume upgrade
mgr/volumes: ensure correct init of v1 subvol
mgr/snap_schedule: add subvol and subvol group arguments to cli
mds/snap_schedule: add subvolume group column management
mgr/volumes: add remote helper methods to fetch subvolume info
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Failure without fix looks like:
2023-12-21T16:05:55.737+0000 7fbe585b0700 0 [devicehealth DEBUG root] loading object ABC_DEADB33F_FA
2023-12-21T16:05:55.737+0000 7fbe585b0700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'devicehealth' while running on mgr.x: [errno 2] RADOS object not found (Failed to operate read op for oid ABC_DEADB33F_FA)
2023-12-21T16:05:55.737+0000 7fbe585b0700 -1 devicehealth.serve:
2023-12-21T16:05:55.737+0000 7fbe585b0700 -1 Traceback (most recent call last):
  File "/home/pdonnell/ceph/src/pybind/mgr/devicehealth/module.py", line 394, in serve
    self._do_serve()
  File "/home/pdonnell/ceph/src/pybind/mgr/mgr_module.py", line 524, in check
    return func(self, *args, **kwargs)
  File "/home/pdonnell/ceph/src/pybind/mgr/devicehealth/module.py", line 354, in _do_serve
    finished_loading_legacy = self.check_legacy_pool()
  File "/home/pdonnell/ceph/src/pybind/mgr/devicehealth/module.py", line 326, in check_legacy_pool
    if self._load_legacy_object(ioctx, obj.key):
  File "/home/pdonnell/ceph/src/pybind/mgr/devicehealth/module.py", line 300, in _load_legacy_object
    ioctx.operate_read_op(op, oid)
  File "rados.pyx", line 3723, in rados.Ioctx.operate_read_op
rados.ObjectNotFound: [errno 2] RADOS object not found (Failed to operate read op for oid ABC_DEADB33F_FA)
Credit to Greg Farnum for postulating the cause.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
I believe this check was originally added because
the 2->3 migration migrated some nfs related bits. Since
then we've had to update the migration number this check
looks for every time we bump the max migration. This change
is intended to instead just have it check for a
migration > 2 so we don't have to keep updating it.
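
A minimal sketch of the difference between the two styles of check,
for illustration only. The names LAST_MIGRATION, passed_exact() and
passed_threshold() are hypothetical and the value 6 is an assumed
example; none of them are taken from the cephadm source.

```python
# Hypothetical illustration; not the actual cephadm code.
LAST_MIGRATION = 6  # assumed example value, bumped whenever a migration is added


def passed_exact(migration_current: int) -> bool:
    """Old style: compares against the newest migration number,
    so the check has to be edited on every bump."""
    return migration_current == LAST_MIGRATION


def passed_threshold(migration_current: int) -> bool:
    """New style: anything past the 2->3 migration qualifies,
    so later bumps need no change here."""
    return migration_current > 2
```

With the threshold form, a cluster at migration 3 and a cluster at
any later migration are treated the same way.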
Signed-off-by: Adam King <adking@redhat.com>
The compiled zipapp cephadm that began in reef needs
to be pulled differently than the old single python script
cephadm from earlier releases. This commit updates the reef-x
upgrade suite to pull cephadm in this new way.
Signed-off-by: Adam King <adking@redhat.com>
This is to allow us to pull the latest build of
cephadm off of a stable branch (currently the only
valid option for that is reef, although this hopefully
will work with squid, T release, etc. in the future).
This should allow us to bootstrap clusters based on
those stable branches for use in upgrade testing.
Signed-off-by: Adam King <adking@redhat.com>
Adds a test that will set the default cephadm command
timeout and then force a timeout to occur by holding
the cephadm lock and triggering a device refresh.
This works because cephadm ceph-volume commands
require the cephadm lock to run, so the command will
time out waiting for the lock to become available.
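
The mechanism the test relies on can be sketched with a plain
threading.Lock. This is only an analogy under stated assumptions:
run_with_lock() and demo_forced_timeout() are hypothetical helpers,
and the real cephadm lock and timeout handling are not shown here.

```python
import threading


def run_with_lock(lock: threading.Lock, timeout: float) -> str:
    """Hypothetical stand-in for a command that needs the lock to run."""
    # Give up if the lock does not become available within `timeout` seconds.
    if not lock.acquire(timeout=timeout):
        raise TimeoutError("timed out waiting for the lock")
    try:
        return "ran"
    finally:
        lock.release()


def demo_forced_timeout() -> bool:
    """Hold the lock, then trigger the command so that it must time out."""
    lock = threading.Lock()
    lock.acquire()  # the test holds the lock
    try:
        run_with_lock(lock, timeout=0.1)
    except TimeoutError:
        return True
    finally:
        lock.release()
    return False
```

Holding the lock first makes the timeout deterministic: the command
never gets a chance to start, so it can only fail by timing out.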
Signed-off-by: Adam King <adking@redhat.com>