Create the initial mClock QoS params at the CONF_DEFAULT level using
set_val_default(). This allows switching to a custom profile on a
running OSD and making the necessary changes to the desired QoS params.
Note that after switching to the ‘custom’ profile, any QoS params
subsequently changed using “config set osd.n …” are set at a higher
level, i.e. CONF_MON.
However, when switching back to a built-in profile, the new values won’t
take effect since CONF_DEFAULT < CONF_MON. For the values to take effect,
the config keys created as part of the ‘custom’ profile must be removed
from the ConfigMonitor store after switching back to a built-in profile.
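The precedence problem can be illustrated with a small model of layered config values. This is a minimal sketch, not Ceph's config code: only the level names CONF_DEFAULT and CONF_MON come from the text, while the `LayeredConfig` class, its methods and the key name are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    # Higher value wins; only the two levels named in the text are modeled.
    CONF_DEFAULT = 0
    CONF_MON = 1


class LayeredConfig:
    """Hypothetical model: each key holds at most one value per level."""

    def __init__(self):
        self._values = {}  # key -> {Level: value}

    def set_val(self, key, value, level):
        self._values.setdefault(key, {})[level] = value

    def rm_val(self, key, level):
        self._values.get(key, {}).pop(level, None)

    def get_val(self, key):
        levels = self._values.get(key, {})
        # The value set at the highest level present is the one in effect.
        return levels[max(levels)] if levels else None


cfg = LayeredConfig()
key = "client_res"                        # hypothetical QoS key name
cfg.set_val(key, 10, Level.CONF_DEFAULT)  # built-in profile via set_val_default()
cfg.set_val(key, 50, Level.CONF_MON)      # custom profile via "config set"
cfg.set_val(key, 20, Level.CONF_DEFAULT)  # switch back to a built-in profile
print(cfg.get_val(key))                   # 50: CONF_MON still wins
cfg.rm_val(key, Level.CONF_MON)           # remove the key from the mon store
print(cfg.get_val(key))                   # 20: the built-in value takes effect
```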
- Added a couple of standalone tests to exercise the scenario.
- Fixed a couple of typos relating to the best-effort weights in the
mClock configuration document and the mClock internal documentation.
- Added new sections to the mClock configuration document outlining the
steps to switch from a built-in profile to the custom profile and
vice versa.
Fixes: https://tracker.ceph.com/issues/55153
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
If any clones are in the pending or in-progress state, show them
in the output of the 'fs subvolume snapshot info' command.
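A minimal sketch of the idea, assuming clone records are dicts with "name" and "state" keys. The function name and the "has_pending_clones"/"pending_clones" output fields are illustrative assumptions, not necessarily the command's exact output format.

```python
PENDING_STATES = ("pending", "in-progress")


def add_pending_clones(info, clones):
    """Return a copy of a snapshot-info dict that lists unfinished clones.

    `clones` is a list of dicts with hypothetical "name" and "state" keys.
    """
    result = dict(info)
    pending = [c["name"] for c in clones if c["state"] in PENDING_STATES]
    result["has_pending_clones"] = "yes" if pending else "no"
    if pending:
        result["pending_clones"] = [{"name": name} for name in pending]
    return result


info = {"created_at": "2022-06-01 10:00:00"}
clones = [
    {"name": "clone1", "state": "in-progress"},
    {"name": "clone2", "state": "complete"},
    {"name": "clone3", "state": "pending"},
]
print(add_pending_clones(info, clones))
```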
Fixes: https://tracker.ceph.com/issues/55041
Signed-off-by: Nikhilkumar Shelke <nshelke@redhat.com>
mds,qa: some balancer debug messages (<=5) not printed when debug_mds is >=5
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Ramana Raja <rraja@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
1. If a data or metadata pool is already in use by a file system,
it cannot be reused for another file system.
2. The test fails because the check in (1) runs before the
erasure-code pool check, so the expected error string is not
found in the output.
3. The proposed fix checks for the newly added error string instead
of 'erasure-code'.
4. Also add new tests that verify the 'erasure-code' string by
passing the --force option, so that the pool-reuse check in (1)
is skipped and the erasure-code check is hit.
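The check ordering described above can be sketched as follows. This is a hypothetical model, not the monitor's code. The function name and messages are illustrative, and the erasure-code condition is simplified to a test on the metadata pool.

```python
def check_new_fs_pools(metadata_pool, data_pool, pools_in_use, ec_pools,
                       force=False):
    """Return an error message, or None if `fs new` may proceed.

    Hypothetical sketch: the pool-reuse check (1) runs first and is skipped
    when --force is given; only then can the erasure-code check be reached.
    """
    if not force:
        for pool in (metadata_pool, data_pool):
            if pool in pools_in_use:
                return f"pool '{pool}' is already in use by a file system"
    if metadata_pool in ec_pools:
        return f"pool '{metadata_pool}' is an erasure-code pool"
    return None


in_use = {"ec_meta"}  # pools already used by another file system
ec = {"ec_meta"}      # erasure-coded pools
print(check_new_fs_pools("ec_meta", "data", in_use, ec))              # reuse error
print(check_new_fs_pools("ec_meta", "data", in_use, ec, force=True))  # EC error
```

Without --force the reuse error masks the erasure-code error, which is why the original test no longer found 'erasure-code' in the output.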
Fixes: https://tracker.ceph.com/issues/56384
Signed-off-by: Nikhilkumar Shelke <nshelke@redhat.com>
We currently run "iogen -n 5 -s 2g" for about 10 minutes. This workload
does not always generate the subtree export/import that iogen.yaml
checks for. The iogen workload is suited to generating heavily
fragmented I/O on a file system, not to growing directory trees.
Fixes: https://tracker.ceph.com/issues/54108
Signed-off-by: Ramana Raja <rraja@redhat.com>
Currently, run_shell() in mount.py accepts both "sudo" and "omit_sudo"
as parameters. It is better to accept only one of them, because a call
to run_shell() in which the two are set to conflicting values is buggy.
Therefore, methods that call run_shell() must add "sudo" to the command
arguments before the call and set omit_sudo to False in the call.
As a result of this change, methods like stat() and run_python() in
mount.py now add "sudo" to the command arguments and set omit_sudo to
False within their own definitions.
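A minimal sketch of the caller-side convention, under the assumption that run_shell() simply returns the argument list it would execute and drops "sudo" when omit_sudo is True. These are simplified, hypothetical stand-ins for the mount.py methods, not their real signatures.

```python
def run_shell(args, omit_sudo=True):
    """Hypothetical simplified run_shell(): there is no separate `sudo` flag.

    Returns the argument list that would be executed; "sudo" is dropped
    when omit_sudo is True.
    """
    if omit_sudo:
        args = [a for a in args if a != "sudo"]
    return list(args)


def stat(path, sudo=False):
    # The caller adds "sudo" itself and tells run_shell() to keep it.
    args = ["sudo", "stat", path] if sudo else ["stat", path]
    return run_shell(args, omit_sudo=False)


def run_python(pyscript, sudo=False):
    args = ["python3", "-c", pyscript]
    if sudo:
        args = ["sudo"] + args
    return run_shell(args, omit_sudo=False)


print(stat("/mnt/cephfs/file", sudo=True))
```

Having one parameter removes the ambiguous combination (sudo set but omit_sudo left True) that the original signature allowed.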
Signed-off-by: Rishabh Dave <ridave@redhat.com>
The methods run_as_user() and run_python() do not set omit_sudo to
False even when the command arguments contain "sudo". This causes
vstart_runner.py to delete "sudo" from the command arguments, which can
lead to a bug.
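The failure mode can be sketched with a hypothetical model of a local runner that strips "sudo" unless told otherwise. The function names and the "sudo -u USER" command shape are assumptions for illustration, not vstart_runner.py's actual code.

```python
def local_runner_args(args, omit_sudo=True):
    """Hypothetical local runner: strips "sudo" unless omit_sudo is False."""
    return [a for a in args if a != "sudo"] if omit_sudo else list(args)


def run_as_user_buggy(args, user):
    # Builds a sudo command but leaves omit_sudo at its default (True).
    return local_runner_args(["sudo", "-u", user] + args)


def run_as_user_fixed(args, user):
    # Same command, but explicitly asks the runner to keep "sudo".
    return local_runner_args(["sudo", "-u", user] + args, omit_sudo=False)


print(run_as_user_buggy(["whoami"], "alice"))  # ['-u', 'alice', 'whoami']
print(run_as_user_fixed(["whoami"], "alice"))  # ['sudo', '-u', 'alice', 'whoami']
```

In the buggy variant the leftover "-u alice" is no longer a valid command line.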
Signed-off-by: Rishabh Dave <ridave@redhat.com>
scrub/osd: add clearer reminders that a scrub is blocked
Reviewed-by: Laura Flores <lflores@redhat.com>
Reviewed-by: Matan Breizman <mbreizma@redhat.com>
Because some Teuthology tests appear to block objects for many minutes,
we must not issue the "scrub is blocked for too long" warning in those
cases (that warning causes the tests to fail).
A new configuration parameter now controls the grace period before
the warning is issued. Some tests were modified to set this
configuration parameter to a large value.
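The grace-period check reduces to a time comparison. This is a minimal sketch: the commit does not name the new parameter or its default, so `grace` and the 60-second value below are purely illustrative.

```python
from datetime import datetime, timedelta


def should_warn_blocked_scrub(blocked_since, now, grace):
    """Warn only once a scrub has been blocked for longer than `grace`."""
    return now - blocked_since > grace


start = datetime(2022, 6, 1, 12, 0, 0)
grace = timedelta(seconds=60)  # illustrative value only
print(should_warn_blocked_scrub(start, start + timedelta(seconds=30), grace))  # False
print(should_warn_blocked_scrub(start, start + timedelta(minutes=5), grace))   # True
```

Setting the grace period to a large value in the affected tests keeps long but expected blocking from raising the warning.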
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Establishing a watch on the rbd_mirroring object, and skipping the rescan
of image mirror snapshots on periodic refresh unless the rbd_mirroring
object is notified in the interim, is flawed. The rbd_mirroring object
is notified when mirroring is enabled or disabled on an image (including
when the image is removed), but it is not notified when images are
promoted or demoted. However, load_pool_images() discards images that
are not primary at the time of the scan. If such an image is promoted
later, no snapshots are created even if a schedule is in place. This
happens regardless of whether the schedule is added before or after the
promotion.
This effectively reverts commit 69259c8d37 ("mgr/rbd_support: make
mirror_snapshot_schedule rescan only updated pools"). An alternative
fix could be to stop discarding non-primary images, i.e. to drop the

    if not info['primary']:
        continue

check added in commit d39eb283c5 ("mgr/rbd_support: mirror snapshot
schedule should skip non-primary images"), but that would clutter the
queue, and therefore the "rbd mirror snapshot schedule status" output,
with bogus entries. Performing a rescan roughly every 60 seconds should
be manageable: currently it amounts to a single mirror_image_status_list
request, followed by mirror_image_get, get_snapcontext and snapshot_get
requests for each image with snapshot-based mirroring enabled, and
concluded by a single dir_list request. Of these, the per-image
get_snapcontext and snapshot_get requests are necessary for determining
whether an image is primary.
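The flaw can be demonstrated with a small simulation. This is a hypothetical model, not mgr/rbd_support code. Both refreshers apply the same "skip non-primary images" filter, but only the periodic one notices a promotion that never notifies the rbd_mirroring object.

```python
from dataclasses import dataclass


@dataclass
class Image:
    image_id: str
    primary: bool


def scan_primary_images(images):
    """Return ids of images that are primary at scan time."""
    result = set()
    for info in images:
        if not info.primary:  # mirrors the "skip non-primary images" check
            continue
        result.add(info.image_id)
    return result


class WatchBasedRefresher:
    """Rescans only if the rbd_mirroring object was notified (flawed)."""

    def __init__(self, images):
        self.images = images
        self.notified = True  # force the initial scan
        self.scheduled = set()

    def notify(self):
        self.notified = True

    def refresh(self):
        if self.notified:
            self.scheduled = scan_primary_images(self.images)
            self.notified = False
        return self.scheduled


class PeriodicRefresher:
    """Rescans on every refresh, as after this change."""

    def __init__(self, images):
        self.images = images

    def refresh(self):
        return scan_primary_images(self.images)


images = [Image("a", True), Image("b", False)]
watch, periodic = WatchBasedRefresher(images), PeriodicRefresher(images)
watch.refresh()
periodic.refresh()
images[1].primary = True           # promotion: rbd_mirroring is NOT notified
print(sorted(watch.refresh()))     # ['a']
print(sorted(periodic.refresh()))  # ['a', 'b']
```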
Fixes: https://tracker.ceph.com/issues/53914
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
mon: verify data pool is already not in use by any file system
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
Reviewed-by: Jos Collin <jcollin@redhat.com>
Reviewed-by: Neeraj Pratap Singh <neesingh@redhat.com>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Commit 4fbf4c4f58 increases the number of tags used in the
snaptest-git-ceph.sh tests. This makes the tests run longer than the
default 3h timeout, so they time out.
Signed-off-by: Venky Shankar <vshankar@redhat.com>
TestMDSMetrics.test_delayed_metrics is failing because the omit_sudo
parameter is missing from the remote.run() call in set_inter_mds_block()
in qa/tasks/cephfs/filesystem.py.
Fixes: https://tracker.ceph.com/issues/56065
Signed-off-by: Neeraj Pratap Singh <neesingh@redhat.com>
If --group_name=_nogroup is provided in the command, fail with a
permission denied error, since _nogroup is an internal group of CephFS.
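A minimal sketch of the validation. Only the group name `_nogroup` and the permission-denied outcome come from the commit. The `VolumeError` class, function name and message are hypothetical.

```python
import errno

NO_GROUP_NAME = "_nogroup"  # internal group name, per the commit message


class VolumeError(Exception):
    """Hypothetical error carrying a negative errno value."""

    def __init__(self, err, msg):
        super().__init__(msg)
        self.errno = err


def validate_group_name(group_name):
    """Reject the internal group name with EPERM; accept anything else."""
    if group_name == NO_GROUP_NAME:
        raise VolumeError(-errno.EPERM,
                          f"Operation not permitted for group '{group_name}' "
                          "as it is an internal group.")


validate_group_name("group1")  # accepted
```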
Fixes: https://tracker.ceph.com/issues/55759
Signed-off-by: Nikhilkumar Shelke <nshelke@redhat.com>
When the EC pool is set in the layout, the file system may not be
ready yet, so mounting a FUSE client fails. To fix this, we need to
wait for at least rank 0 to be in the up:active state.
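A minimal polling sketch, assuming a hypothetical `get_rank_state(rank)` callable that returns an MDS state string. The real qa helpers are not shown in the commit, so this only illustrates the wait-with-timeout pattern.

```python
import time


def wait_for_rank0_active(get_rank_state, timeout=120.0, interval=1.0,
                          clock=time.monotonic, sleep=time.sleep):
    """Poll until rank 0 reports up:active, or raise TimeoutError.

    `get_rank_state(rank)` is a hypothetical callable returning the MDS
    state string for a rank; clock/sleep are injectable for testing.
    """
    deadline = clock() + timeout
    while get_rank_state(0) != "up:active":
        if clock() >= deadline:
            raise TimeoutError("rank 0 did not reach up:active in time")
        sleep(interval)


states = iter(["up:creating", "up:creating", "up:active"])
wait_for_rank0_active(lambda rank: next(states), sleep=lambda s: None)
print("rank 0 is up:active; the FUSE client can be mounted")
```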
Fixes: https://tracker.ceph.com/issues/55824
Signed-off-by: Xiubo Li <xiubli@redhat.com>
The 'size' shown in the output of the 'snapshot info' command relies on
rstats, which do not give the correct snapshot size: they track the
size of the subvolume from which the snapshot was taken, not the size
of the snapshot itself. Hence, having the 'size' field in the output of
'snapshot info' does not make sense until rstats are fixed.
Fixes: https://tracker.ceph.com/issues/55822
Signed-off-by: Nikhilkumar Shelke <nshelke@redhat.com>