scrub/osd: add clearer reminders that a scrub is blocked
Reviewed-by: Laura Flores <lflores@redhat.com>
Reviewed-by: Matan Breizman <mbreizma@redhat.com>
As some Teuthology tests seem to block objects for many minutes,
we must not issue the "scrub is blocked for too long" warning
(that warning causes the tests to fail).
A new configuration parameter now controls the grace period before
the warning is issued. Some tests were modified to set this
configuration parameter to a large value.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Establishing a watch on rbd_mirroring object and skipping rescanning
image mirror snapshots on periodic refresh unless rbd_mirroring object
gets notified in the interim is flawed. rbd_mirroring object is
notified when mirroring is enabled or disabled on some image (including
when the image is removed), but it is not notified when images are
promoted or demoted. However, load_pool_images() discards images that
are not primary at the time of the scan. If the image is promoted
later, no snapshots are created even if the schedule is in place. This
happens regardless of whether the schedule is added before or after the
promotion.
This effectively reverts commit 69259c8d37 ("mgr/rbd_support: make
mirror_snapshot_schedule rescan only updated pools"). An alternative
fix could be to stop discarding non-primary images (i.e. drop

    if not info['primary']:
        continue

check added in commit d39eb283c5 ("mgr/rbd_support: mirror snapshot
schedule should skip non-primary images")), but that would clutter the
queue and therefore "rbd mirror snapshot schedule status" output with
bogus entries. Performing a rescan roughly every 60 seconds should be
manageable: currently it amounts to a single mirror_image_status_list
request, followed by mirror_image_get, get_snapcontext and snapshot_get
requests for each snapshot-based mirroring enabled image and concluded
by a single dir_list request. Among these, per-image get_snapcontext
and snapshot_get requests are necessary for determining primaryness.
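
A minimal sketch of why a full rescan is needed, assuming each image is described by a dict with a 'primary' flag as in the check quoted above. scan_primary_images() and the image dicts are hypothetical stand-ins for illustration, not the actual load_pool_images() code.

```python
def scan_primary_images(images):
    """Return the IDs of images that are primary at the time of the scan."""
    queue = []
    for image_id, info in images.items():
        if not info['primary']:
            continue  # non-primary images are discarded, as in the quoted check
        queue.append(image_id)
    return sorted(queue)


# Hypothetical pool state: image "b" is non-primary at first.
images = {"a": {"primary": True}, "b": {"primary": False}}
first_scan = scan_primary_images(images)

# Promotion does not notify rbd_mirroring, so only a fresh scan can see it.
images["b"]["primary"] = True
second_scan = scan_primary_images(images)
```

If the second scan were skipped because no notification arrived, image "b" would never enter the queue despite having been promoted.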
Fixes: https://tracker.ceph.com/issues/53914
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
mon: verify data pool is not already in use by any file system
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
Reviewed-by: Jos Collin <jcollin@redhat.com>
Reviewed-by: Neeraj Pratap Singh <neesingh@redhat.com>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
TestMDSMetrics.test_delayed_metrics is failing due to
the absence of the omit_sudo parameter in the remote.run() call
in set_inter_mds_block() in qa/tasks/cephfs/filesystem.py.
Fixes: https://tracker.ceph.com/issues/56065
Signed-off-by: Neeraj Pratap Singh <neesingh@redhat.com>
If --group_name=_nogroup is provided in the command, fail with a
"permission denied" error, as _nogroup is an internal group of CephFS.
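
A rough sketch of such a check, under stated assumptions: GroupNameError, check_group_name() and the choice of EPERM as the error code are hypothetical and only illustrate the idea of rejecting the internal group name up front.

```python
import errno

NO_GROUP_NAME = "_nogroup"  # internal group name taken from the message above


class GroupNameError(Exception):
    """Hypothetical error type carrying an errno-style code."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def check_group_name(group_name):
    """Reject the internal group name before running a group command."""
    if group_name == NO_GROUP_NAME:
        # Assumption: a negative EPERM stands in for "permission denied".
        raise GroupNameError(
            -errno.EPERM,
            f"operation not permitted for internal group '{group_name}'",
        )
    return group_name
```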
Fixes: https://tracker.ceph.com/issues/55759
Signed-off-by: Nikhilkumar Shelke <nshelke@redhat.com>
When setting the EC pool in the layout, the filesystem may not be
ready yet, so mounting a fuse client will fail. To fix this we
need to wait for at least rank 0 to be in the up:active state.
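
The wait could look roughly like the polling loop below. This is only a sketch: wait_for_rank0_active() and the get_rank_states callable (returning a {rank: state} dict) are hypothetical names, not existing qa helpers.

```python
import time


def wait_for_rank0_active(get_rank_states, timeout=60.0, interval=1.0):
    """Poll until rank 0 reports up:active or the timeout expires.

    get_rank_states is a hypothetical callable returning {rank: state}.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_rank_states().get(0) == "up:active":
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError("rank 0 did not reach up:active")
        time.sleep(interval)
```

Only after this returns would the fuse client be mounted.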
Fixes: https://tracker.ceph.com/issues/55824
Signed-off-by: Xiubo Li <xiubli@redhat.com>
The 'size' shown in the output of the snapshot info command relies on
rstats, which do not give the correct snapshot size. They track the
size of the subvolume from which the snapshot was taken instead of the
size of the snapshot itself. Hence having the 'size' field in the
output of 'snapshot info' doesn't make sense until rstats are fixed.
Fixes: https://tracker.ceph.com/issues/55822
Signed-off-by: Nikhilkumar Shelke <nshelke@redhat.com>
https://github.com/ceph/teuthology/pull/999 never got overridden in ceph.git. We've been using a years-old checkout of teuthology for the `teuthology` user.
With the master->main change, that checkout needed to go. Then when trying to schedule new nightlies, teuthology-suite was defaulting to ceph-ci.git which either has very old versions of the release branches (octopus, pacific, etc.) or they don't exist at all.
Signed-off-by: David Galloway <dgallowa@redhat.com>
Show "No RBD pools available" error page when accessing block/rbd if there are no rbd pools.
Add "button_name" and "button_route" properties to the `ModuleStatusGuardService` config to customize the button on the error page.
Modify `ModuleStatusGuardService` to execute API calls to `/ui-api/<uiApiPath>/status` which uses the `UIRouter`.
Fixes: https://tracker.ceph.com/issues/42109
Signed-off-by: Melissa Li <melissali@redhat.com>
mds: fix crash when exporting unlinked dir
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Modify the iSCSI tcmu-runner container to run daemonized in the same
systemd slice as all other ceph processes.
Signed-off-by: Teoman ONAY <tonay@redhat.com>
Enable snapshot mirroring from Pools -> Image.
Also show the mirror-snapshot in images where snapshot mode is enabled.
When parsing images, commands that don't work with snapshot mode were
being run on images that have that mode enabled. The solution is to
skip those commands for now and to append the mode in the get call.
Fixes: https://tracker.ceph.com/issues/55648
Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
Signed-off-by: Nizamudeen A <nia@redhat.com>
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Validate that subvolume removal succeeds when the
corresponding group's quota is set.
Fixes: https://tracker.ceph.com/issues/53509
Signed-off-by: Kotresh HR <khiremat@redhat.com>
qa/cephfs: set omit_sudo False when sudo is set to True
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
Reviewed-by: Nikhilkumar Shelke <nshelke@redhat.com>
qa/cephfs: fix minor bug in caps_helper.py's run_mon_cap_tests()
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Ramana Raja <rraja@redhat.com>
The RWL mode needs DAX and is dog slow otherwise -- qemu_xfstests.yaml
job always hits the 6 hour max_job_time limit.
As our tmpfs instance is limited and qemu_xfstests.yaml opens three
images at the same time, reduce the "big cache" size to 5G. This facet
was added to iron out 32-bit head/tail pointer issues and 5G still does
the job there.
Going through the loop device is needed because tmpfs doesn't support
O_DIRECT.
Fixes: https://tracker.ceph.com/issues/55400
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
CephFSMount.get_key_from_keyfile() should raise an exception instead of
returning None if the key is not found in the keyring file.
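
A minimal sketch of the intended behaviour, assuming the keyring is INI-like text with a "key = ..." line. get_key_from_keyring_text() is a hypothetical helper working on a string and is not the actual method body.

```python
def get_key_from_keyring_text(keyring_text):
    """Return the value of the first 'key = ...' line, or raise if absent."""
    for line in keyring_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == "key":
            return value.strip()
    # Raising here, rather than falling through to an implicit None,
    # makes a missing key fail at the point of lookup.
    raise RuntimeError("key not found in keyring")
```

Callers then no longer need to check for None before using the key.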
Fixes: https://tracker.ceph.com/issues/50010
Signed-off-by: Rishabh Dave <ridave@redhat.com>