With the new mClock default profile, tests were failing with "Exiting scrub checking -- not all pgs scrubbed" due to slower scrubs.
Changing the default profile to high_recovery_ops for testing purposes will fix this issue.
Fixes: https://tracker.ceph.com/issues/61228
Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
crimson/os/seastore/btree: link fixedkvbtree's nodes and logical extents with forward and backward pointers, and drop the pin_set
Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
After introducing intra-fixed-kv-btree parent-child pointers, we need to
maintain the invariant that the cache may only be queried directly,
bypassing the transaction, when the extents involved are not in the
transaction's read_set.
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
The qa tests are not client I/O centric and mostly focus on triggering
recovery/backfills and monitoring them for completion within a finite amount
of time. The same holds true for scrub operations.
An mClock profile that optimizes background operations is therefore a
better fit for qa-related tests. To that end, the osd_mclock_profile is
globally overridden to the 'high_recovery_ops' profile for the Rados suite.
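For reference, the runtime equivalent of that global override (the suite
itself applies it through its configuration overrides) would look something
like:
  # Hedged illustration only; the qa suite sets this via config overrides.
  ceph config set global osd_mclock_profile high_recovery_ops
  ceph config get osd osd_mclock_profile    # expect: high_recovery_ops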
Also, many standalone tests expect recovery and scrub operations to
complete within a finite time. To ensure this, the osd_mclock_profile
option is set to 'high_recovery_ops' as part of the run_osd() function
in ceph-helpers.sh.
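A minimal sketch of how a test can confirm the profile applied by run_osd()
(this assumes the usual ceph-helpers.sh admin-socket helper; the exact
plumbing inside run_osd() may differ):
  # Query the running OSD's effective profile through its admin socket.
  ceph daemon $(get_asok_path osd.0) config get osd_mclock_profile
  # expect: { "osd_mclock_profile": "high_recovery_ops" }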
A subset of standalone tests explicitly used the 'high_recovery_ops' profile.
Since the profile is now set as part of run_osd(), the earlier overrides
are redundant and have been removed from the tests.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Modify the relevant documentation to reflect:
- change in the default mClock profile to 'balanced'
- new allocations for ops across mClock profiles
- change in the osd_max_backfills limit
- miscellaneous changes related to warnings.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
The osd_mclock_max_sequential_bandwidth_ssd option is changed to 1200 MiB/s
as a reasonable middle ground considering the broad range of SSD capabilities.
This allows mClock's cost model to extract the SSD's capability
depending on the cost of the IO being performed.
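Clusters whose SSDs differ significantly from this middle ground can still
tune the value; a hedged example (the 1500Mi figure is purely illustrative
and should come from measuring the devices):
  # Adjust the per-OSD sequential bandwidth hint for faster SSDs.
  ceph config set osd osd_mclock_max_sequential_bandwidth_ssd 1500Mi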
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
The earlier osd_max_backfills limit of 3 was still aggressive enough to have
an impact on client and other competing operations. Retain the current
default for mClock instead. This can be modified if necessary after setting
the osd_mclock_override_recovery_settings option.
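For example, raising the backfill limit again on a running cluster should
look something like:
  # Opt in to overriding the mClock-managed recovery/backfill limits,
  # then adjust osd_max_backfills as needed.
  ceph config set osd osd_mclock_override_recovery_settings true
  ceph config set osd osd_max_backfills 3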
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Let's use the middle ('balanced') profile as the default.
Modify the standalone tests accordingly.
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Now that recovery operations are split between background_recovery and
background_best_effort, rebalance qos params to avoid penalizing
background_recovery while idle.
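To inspect the per-class allocations a profile ends up applying on a given
OSD, something like the following can be used (assuming the
osd_mclock_scheduler_* QoS options and a daemon that reports its running
config to the mgr):
  # Show the reservation/weight/limit the active profile assigned to the
  # client, background_recovery and background_best_effort classes.
  ceph config show osd.0 | grep osd_mclock_scheduler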
Signed-off-by: Samuel Just <sjust@redhat.com>
Otherwise, these end up as PGOpItem and therefore as immediate:
class PGOpItem : public PGOpQueueable {
  ...
  op_scheduler_class get_scheduler_class() const final {
    auto type = op->get_req()->get_type();
    if (type == CEPH_MSG_OSD_OP ||
        type == CEPH_MSG_OSD_BACKOFF) {
      return op_scheduler_class::client;
    } else {
      return op_scheduler_class::immediate;
    }
  }
  ...
};
This was probably causing a bunch of extra interference with client
ops.
Signed-off-by: Samuel Just <sjust@redhat.com>
Recovery operations on pgs/objects that have fewer than the configured
number of copies should be treated more urgently than operations on
pgs/objects that simply need to be moved to a new location.
Signed-off-by: Samuel Just <sjust@redhat.com>
is_qos_item() was only used in operator<< for OpSchedulerItem. However,
it's actually useful to see the priority for mclock items, since priority
affects whether an item goes into the immediate queues and, for some types,
its scheduler class. Unconditionally display both class_id and priority.
Signed-off-by: Samuel Just <sjust@redhat.com>