fix: resolve inconsistent judgment of osd_pg_stat_report_interval_max
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Matan Breizman <Matan.Brz@gmail.com>
as the scrub reservation changes had made it obsolete.
Note: this is not a matter of fixing the test, but rather
that the functionality it tested no longer exists.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
osd_pg_stat_report_interval_max was previously used as either a maximum
time in seconds or a maximum number of epochs. Instead, separate it into
two config options and adjust
PeeringState::prepare_stats_for_publish to check both.
Additionally, this commit removes a superfluous check in
PeeringState::Active::react(const AdvMap&) and calls publish_stats_to_osd
unconditionally, as the other callers in PeeringState do.
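A minimal sketch of the dual check, using illustrative names and plain
std::chrono types rather than the actual PeeringState code:
```
#include <chrono>
#include <cstdint>

// Illustrative only: publish PG stats when either the elapsed time or the
// number of epochs since the last publish exceeds its own configured
// maximum (the two separate config options described above).
bool should_publish_stats(std::chrono::seconds since_last_publish,
                          uint64_t epochs_since_last_publish,
                          std::chrono::seconds max_seconds,
                          uint64_t max_epochs)
{
  return since_last_publish >= max_seconds ||
         epochs_since_last_publish >= max_epochs;
}
```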
Fixes: https://tracker.ceph.com/issues/63520
Signed-off-by: zhangjianwei2 <zhangjianwei2@cmss.chinamobile.com>
... instead of a simple counter.
This is in preparation for the next commit, which will decouple
the "being reserved" state from the handling of scrub requests.
The planned changes to the scrub state machine will make
it harder to know when to clear the "being reserved" state.
The changes here allow us to err on the side of caution,
i.e. to try to "un-count" a remote reservation even if it was never
actually reserved or was already deleted.
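For illustration, a minimal sketch of why a set (rather than a plain
counter) makes the "un-count" operation safe to call unconditionally;
the names here are hypothetical, not the actual scrub code:
```
#include <cstddef>
#include <cstdint>
#include <set>

struct RemoteReservations {
  std::set<uint64_t> reserved;   // hypothetical key type for the PGs

  void add(uint64_t pgid) { reserved.insert(pgid); }

  // Safe to call even if the PG was never reserved or was already removed:
  // erasing a missing element is a no-op, whereas blindly decrementing a
  // counter could drive it negative.
  void remove(uint64_t pgid) { reserved.erase(pgid); }

  std::size_t count() const { return reserved.size(); }
};
```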
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Use
  ceph tell $pgid [deep-]scrub
to initiate an 'operator initiated' scrub, and
  ceph tell $pgid schedule[-deep]-scrub
to schedule a 'periodic' scrub.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
from ScrubQueue::select_pg_and_scrub().
This clears the path to moving some ScrubQueue methods into
OsdScrub, starting here with the CPU load tracker.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Ceph status fails to report the pool application warning if
the pool is empty. Report the pool application warning
even if the pool has 0 objects stored in it.
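A minimal sketch of the intended condition (illustrative only, not the
actual health-check code):
```
#include <cstdint>

// Illustrative only: warn for any pool without an application enabled,
// regardless of how many objects the pool currently stores.
bool pool_needs_app_warning(bool has_application_enabled,
                            uint64_t num_objects)
{
  (void)num_objects;  // previously the warning was skipped for empty pools
  return !has_application_enabled;
}
```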
Add POOL_APP_NOT_ENABLED cluster warnings to the log-ignorelist
to fix the rados suite.
Fixes: https://tracker.ceph.com/issues/57097
Signed-off-by: Prashant D <pdhange@redhat.com>
qa/standalone/osd/divergent-prior.sh: Divergent test 3 with pg_autoscale_mode on, pick divergent osd
Reviewed-by: Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>
When creating a new pool, the current code picks the divergent osd
from the first pg listed in "pg dump pgs"; that pg can still be in
"unknown" state, which means up_primary = -1, and that will fail the test.
We need to wait until the first pg is active+clean.
Fixes: https://tracker.ceph.com/issues/56034
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
1. Setting frequent scrub status updates, to compensate for the removal
of some 'send updates' in PR#50283.
2. Switching back to using the wpq scheduler, as otherwise the number of
concurrent recovery operations is below what the test expects.
Fixes: https://tracker.ceph.com/issues/61386
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Add an initial standalone test for stretched clusters,
testing uneven-weight warnings and warnings about
having != 2 buckets.
Added a `wait_for_health_gone()` function to ceph-helpers.sh;
it allows us to wait for a health condition to
disappear when running standalone tests.
Signed-off-by: Kamoltat <ksirivad@redhat.com>
This is a follow-up to PR: https://github.com/ceph/ceph/pull/48703.
This commit also considers changes made ephemerally, using either the
'daemon' or the 'tell' interface, to override the built-in mClock
QoS parameters. In such a scenario, the ephemeral changes are removed
using the rm_val() method exposed by the config subsystem, and the
removal is logged.
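A rough sketch of the idea, using a stand-in config type; rm_val() on the
real config subsystem is what the commit refers to, but everything else
here is illustrative:
```
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Stand-in for the config subsystem, not Ceph's ConfigProxy. Ephemeral
// overrides (set via the 'daemon' or 'tell' interface) are modeled as map
// entries; removing an entry lets the built-in mClock default show through.
struct FakeConfig {
  std::map<std::string, std::string> ephemeral_overrides;

  // Mirrors the idea of rm_val(): returns true if an override existed.
  bool rm_val(const std::string& key) {
    return ephemeral_overrides.erase(key) > 0;
  }
};

void drop_ephemeral_qos_overrides(FakeConfig& conf,
                                  const std::vector<std::string>& qos_keys)
{
  for (const auto& key : qos_keys) {
    if (conf.rm_val(key)) {
      std::cout << "reverted ephemeral override of " << key
                << " to the built-in mClock profile value\n";
    }
  }
}
```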
Other changes:
1. Add a standalone test to exercise the fix.
2. Add documentation note on the outcome of the attempt to modify
built-in profile defaults.
Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
The qa tests are not client I/O centric and mostly focus on triggering
recovery/backfills and monitoring them for completion within a finite amount
of time. The same holds true for scrub operations.
Therefore, an mClock profile that optimizes background operations is a
better fit for qa related tests. The osd_mclock_profile is therefore
globally overridden to the 'high_recovery_ops' profile for the Rados suite,
as it fits the requirement.
Also, many standalone tests expect recovery and scrub operations to
complete within a finite time. To ensure this, the osd_mclock_profile
option is set to 'high_recovery_ops' as part of the run_osd() function
in ceph-helpers.sh.
A subset of standalone tests explicitly used 'high_recovery_ops' profile.
Since the profile is now set as part of run_osd(), the earlier overrides
are redundant and therefore removed from the tests.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Let's use the middle profile as the default.
Modify the standalone tests accordingly.
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Fix an issue where an overridden mClock recovery setting (set prior to
an osd restart) could be lost after the restart.
For example, consider that prior to an osd restart, the option
'osd_max_backfills' was successfully set to a value different from the
mClock default. If the osd was restarted for some reason, the
boot-up sequence was incorrectly resetting the backfill value to the
mClock default within the async local/remote reservers. This fix
ensures that no change is made if the current overridden value is
different from the mClock default.
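A minimal sketch of the restart behavior being fixed (hypothetical helper,
not the actual reserver update code):
```
#include <cstdint>

// Illustrative only: push the mClock default into the local/remote async
// reservers only when the current value has not been overridden.
void maybe_apply_mclock_backfill_default(uint64_t current_max_backfills,
                                         uint64_t mclock_default,
                                         uint64_t& local_reserver_limit,
                                         uint64_t& remote_reserver_limit)
{
  if (current_max_backfills != mclock_default) {
    // An operator override (e.g. a value set before the osd restart):
    // keep it instead of clobbering it with the profile default.
    return;
  }
  local_reserver_limit = mclock_default;
  remote_reserver_limit = mclock_default;
}
```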
Modify an existing standalone test to verify that the local and remote
async reservers are updated to the desired number of backfills under
normal conditions and also across osd restarts.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
The mClock scheduler's cost model for HDDs/SSDs is modified and now
represents the cost of an IO in terms of bytes.
The cost parameters, namely osd_mclock_cost_per_io_usec_[hdd|ssd]
and osd_mclock_cost_per_byte_usec_[hdd|ssd], which represent the cost
of an IO in microseconds, are inaccurate and therefore removed.
The new model considers the following aspects of an osd to calculate
the cost of an IO:
- osd_mclock_max_capacity_iops_[hdd|ssd] (existing option)
  The measured random write IOPS at 4 KiB block size. This is
  measured during OSD boot-up using the OSD bench tool.
- osd_mclock_max_sequential_bandwidth_[hdd|ssd] (new config option)
  The maximum sequential bandwidth of the underlying device.
  For HDDs, 150 MiB/s is used in the cost calculation, and for
  SSDs, 750 MiB/s.
The following important changes are made to arrive at the overall
cost of an IO,
1. Represent the QoS reservation and limit config parameters as proportions:
The reservation and limit parameters are now set in terms of a
proportion of the OSD's max IOPS capacity. The earlier representation
was in terms of IOPS per OSD shard which required the user to perform
calculations before setting the parameter. Representing the
reservation and limit in terms of proportions is much more intuitive
and simpler for a user.
2. Cost per IO Calculation:
Using the above config options, osd_bandwidth_cost_per_io for the osd is
calculated and set. It is the ratio of the max sequential bandwidth to
the max random write IOPS of the osd. It is a constant and represents the
base cost of an IO in terms of bytes. This is added to the actual size of
the IO (in bytes) to give the overall cost of the IO operation. See
mClockScheduler::calc_scaled_cost() and the worked sketch after this list.
3. Cost calculation in Bytes:
The settings for reservation and limit, expressed as a fraction of the OSD's
maximum IOPS capacity, are converted to bytes/sec before updating the
mClock server's ClientInfo structure. This is done for each OSD op shard
using osd_bandwidth_capacity_per_shard, as shown below:
  (res|lim) [bytes/sec] = (IOPS proportion) [unitless] *
                          osd_bandwidth_capacity_per_shard [bytes/sec]
The above result is updated within the mClock server's ClientInfo
structure for different op_scheduler_class operations. See
mClockScheduler::ClientRegistry::update_from_config().
The overall cost of an IO operation (in secs) is finally determined
during the tag calculations performed in the mClock server. See
crimson::dmclock::RequestTag::tag_calc() for more details.
4. Profile Allocations:
Optimize mClock profile allocations due to the change in the cost model
and lower recovery cost.
5. Modify standalone tests to reflect the change in the QoS config
parameter representation of reservation and limit options.
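A worked sketch of the cost arithmetic with made-up example numbers; the
shard count, the per-shard capacity formula and the helper names are
assumptions here, not the exact implementation (see
mClockScheduler::calc_scaled_cost() and
mClockScheduler::ClientRegistry::update_from_config() for the real code):
```
#include <iostream>

int main()
{
  // Example OSD characteristics (made-up numbers):
  const double max_seq_bw_bytes = 750.0 * 1024 * 1024;  // SSD: 750 MiB/s
  const double max_rand_write_iops = 25000.0;           // from OSD bench
  const unsigned num_op_shards = 5;                     // assumed shard count

  // Base cost of any IO, in bytes: sequential bandwidth divided by the
  // random-write IOPS capacity (osd_bandwidth_cost_per_io).
  const double cost_per_io = max_seq_bw_bytes / max_rand_write_iops;

  // Overall cost of a 4 KiB write: base cost plus the actual IO size.
  const double io_size = 4096.0;
  const double scaled_cost = cost_per_io + io_size;

  // Reservation/limit are given as a proportion of the IOPS capacity and
  // converted to bytes/sec per op shard before being handed to mClock.
  const double bw_capacity_per_shard = max_seq_bw_bytes / num_op_shards;
  const double reservation_proportion = 0.3;  // example profile value
  const double res_bytes_per_sec =
      reservation_proportion * bw_capacity_per_shard;

  std::cout << "cost_per_io (bytes): " << cost_per_io << "\n"
            << "scaled cost of a 4 KiB IO (bytes): " << scaled_cost << "\n"
            << "reservation (bytes/sec per shard): " << res_bytes_per_sec
            << "\n";
  return 0;
}
```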
Fixes: https://tracker.ceph.com/issues/58529
Fixes: https://tracker.ceph.com/issues/59080
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Separate `mon-stretch` from `mon`.
Renamed `mon-stretched-cluster.sh` to
`mon-stretch-fail-recovery.sh`.
Isolating the stretch-cluster tests will enable
developers to get results faster for stretch-cluster
related work.
Signed-off-by: Kamoltat <ksirivad@redhat.com>
1e44d86b2 swapped this to a pg tell command which doesn't actually
need the primary specified. Drop the now unnecessary lookup.
Signed-off-by: Samuel Just <sjust@redhat.com>
osd/PeeringState: Add logs around can_serve_replica_read() / last_complete_ondisk()
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Ronen Friedman <rfriedma@redhat.com>
The test performs shallow scrubs, intentionally using small chunk
sizes to allow the dump commands time to check specific details.
Following commit ffda64119f
(PR#44749), shallow scrub chunks are controlled by a separate
configuration parameter. This PR fixes the test to use the
correct parameter.
An additional minor change is an adjustment to the test loop sleep time:
it is now reduced, to guarantee that a dump followed by a counter
increase is performed at roughly the scrub frequency.
Fixes: https://tracker.ceph.com/issues/58797
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Generally, it's more portable not to rely on specific system files being
readable. Specifically, container environments may not have an fstab.
Instead, just generate another random file.
Signed-off-by: Samuel Just <sjust@redhat.com>
Using the existing common default chunk size for scrubs that are
not deep scrubs is wasteful: it yields a high ratio of inter-OSD messages
per chunk, while the actual OSD work per chunk is minimal.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
The QoS parameters (res, wgt, lim) of mClock profiles are not allowed to
be modified by users using commands like "config set" or via the admin
socket: handle_conf_change() does not allow changes to any built-in mClock
profile at the mClock scheduler. But the config subsystem still showed the
change as expected for the built-in mClock profile QoS parameters. This
misled the user into thinking that the change was made at the mClock
server when it was not the case.
The above issue is the result of the config "levels" used by the config
subsystem. The initial built-in QoS params are set at the CONF_DEFAULT
level. This allows the user to modify the built-in QoS params using the
"config set" command, which sets values at the CONF_MON level, which has
a higher priority than CONF_DEFAULT. The new value is persisted in the
mon store, and therefore the config subsystem shows the change when the
"config show" command is issued.
To prevent the above, this commit adds changes to restore the defaults set
for the built-in profiles by removing the new config changes from the MON
store. This causes the original defaults to come back into effect and
maintains a consistent view of the built-in profile across all levels.
To accomplish this, the mClock scheduler is provided with additional
information such as the OSD id, the shard id and a pointer to the
MonClient, which is used to execute the Mon store command that removes
the option.
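A hypothetical sketch of the command that would be sent to the mon to drop
such an override; the JSON shape mirrors "ceph config rm", but the helper
itself is illustrative, not the actual scheduler code:
```
#include <iostream>
#include <sstream>
#include <string>

// Illustrative only: build a "config rm" mon command that removes an
// operator override of a built-in profile QoS parameter, so the built-in
// default becomes visible again at all config levels.
std::string build_config_rm_command(int osd_id, const std::string& option)
{
  std::ostringstream cmd;
  cmd << "{\"prefix\": \"config rm\", \"who\": \"osd." << osd_id
      << "\", \"name\": \"" << option << "\"}";
  return cmd.str();
}

int main()
{
  // Example: revert an attempted change to a QoS reservation parameter.
  std::cout << build_config_rm_command(0, "osd_mclock_scheduler_client_res")
            << std::endl;
  return 0;
}
```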
A standalone test is added to verify that built-in params cannot be
modified and the original profile params are retained.
Fixes: https://tracker.ceph.com/issues/57533
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Set osd_mclock_override_recovery_settings option to true for tests that
modify recovery/backfill configuration options. This prevents logging of
the cluster warning when modifying recovery/backfill limits.
Fixes: https://tracker.ceph.com/issues/57529
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
- Consolidate all mclock standalone tests under
qa/standalone/misc/mclock-config.sh.
- Revert existing tests in ceph-helpers.sh that verified the earlier hard
override of recovery/backfill limits.
- Add new tests to verify the procedure to change the recovery/backfill
limits with mclock scheduler.
Fixes: https://tracker.ceph.com/issues/57529
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Problem:
--mon-initial-members does nothing but cause the monmap
to populate ``removed_ranks``, because of the way we start
monitors in standalone tests: ``run_mon $dir $id ..``
on each mon. Regardless of --mon-initial-members=a,b,c, if
we set --mon-host=$MONA,$MONB,$MONC (which we do in every single test),
every time we run a monitor (e.g., run mon.b) it will pre-build
our monmap with
```
noname-a=mon.noname-a addrs v2:127.0.0.1:7127/0,
b=mon.b addrs v2:127.0.0.1:7128/0,
noname-c=mon.noname-c addrs v2:127.0.0.1:7129/0,
```
Now, with --mon-initial-members=a,b,c we are telling the
monmap that the initial members should be named
a,b,c, of which only `b` is a match. What
``MonMap::set_initial_members`` does is remove
noname-a and noname-c, which
populates ``removed_ranks``.
Solution:
Remove all instances of --mon-initial-members
in the standalone tests, as it has no impact on
the nature of the tests themselves.
Fixes: https://tracker.ceph.com/issues/58132
Signed-off-by: Kamoltat <ksirivad@redhat.com>
This is based on two commits:
* 7bbc92eda3 and
* 6b22d47863, which seems to be
a fixup of the former.
In contrast to them, in `OSDMonitor::create_initial()` I also updated
`newmap.require_osd_release` to pacific when both
`mon_debug_no_require_reef` and `mon_debug_no_require_quincy` are set.
Please take an extra look at that during the review.
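A minimal sketch of the gating logic described above (illustrative only,
not the actual OSDMonitor code):
```
// Illustrative only: when both debug flags are set, fall back past
// quincy to pacific for the initial require_osd_release.
enum class Release { pacific, quincy, reef };

Release pick_initial_require_osd_release(bool mon_debug_no_require_reef,
                                         bool mon_debug_no_require_quincy)
{
  if (!mon_debug_no_require_reef) {
    return Release::reef;
  }
  if (!mon_debug_no_require_quincy) {
    return Release::quincy;
  }
  return Release::pacific;
}
```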
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
The test (in the standalone/scrub suite) verifies that the scrubber
detects (and issues a cluster-log error) whenever a mapping entry
("SNA_") is missing in the SnapMapper DB.
Specifically, here the entry is corrupted (shortened), as per
https://tracker.ceph.com/issues/56147.
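For illustration, a minimal sketch of the kind of check the scrubber
performs (hypothetical names, not the actual SnapMapper/scrubber code):
```
#include <iostream>
#include <set>
#include <string>

// Illustrative only: report an error for every expected "SNA_"-prefixed
// mapping key that is missing from the SnapMapper DB.
void check_snap_mappings(const std::set<std::string>& expected_keys,
                         const std::set<std::string>& present_keys)
{
  for (const auto& key : expected_keys) {
    if (present_keys.count(key) == 0) {
      std::cerr << "scrub error: missing SnapMapper entry " << key << "\n";
    }
  }
}
```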
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>