Ceph status fails to report the pool application warning if
the pool is empty. Report the pool application warning
even if the pool has 0 objects stored in it.
Add POOL_APP_NOT_ENABLED cluster warnings to the log-ignorelist
to fix the rados suite.
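A minimal reproduction sketch, assuming a running test cluster (the pool name is arbitrary):

```
# Create a pool and deliberately skip 'ceph osd pool application enable'.
ceph osd pool create empty-pool 8
# With this change the warning is raised even though the pool holds no objects.
ceph health detail | grep POOL_APP_NOT_ENABLED
```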
Fixes: https://tracker.ceph.com/issues/57097
Signed-off-by: Prashant D <pdhange@redhat.com>
qa/standalone/osd/divergent-prior.sh: Divergent test 3 with pg_autoscale_mode on pick divergent osd
Reviewed-by: Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>
When creating a new pool, the current code picks the divergent osd from
the first pg listed by 'pg dump pgs'. That pg can be in the "unknown"
state, which means up_primary = -1, and that will fail the test.
We need to wait until the first pg is active+clean.
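A hedged sketch of the added wait (wait_for_clean comes from ceph-helpers.sh; the jq query is illustrative):

```
# Wait for all pgs (including the new pool's first pg) to become active+clean
# before reading up_primary, so the chosen divergent osd is never -1.
wait_for_clean || return 1
divergent=$(ceph pg dump pgs --format=json | jq -r '.pg_stats[0].up_primary')
```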
Fixes: https://tracker.ceph.com/issues/56034
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
1. Setting frequent scrub status updates, to compensate for the removal
of some 'send updates' in PR#50283.
2. Switching back to using the wpq scheduler, as otherwise the number of
concurrent recovery operations is below what the test expects.
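In a standalone test, the scheduler switch amounts to something like this (run_osd is the ceph-helpers.sh helper; the exact arguments used in the test may differ):

```
# Start the OSD with the wpq scheduler so the expected number of concurrent
# recovery operations is reached.
run_osd $dir 0 --osd_op_queue=wpq || return 1
```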
Fixes: https://tracker.ceph.com/issues/61386
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Initialize a standalone test for stretched clusters,
testing uneven weight warnings and != 2 buckets
warnings.
Added a `wait_for_health_gone()` function in ceph-helpers.sh;
this function allows us to wait for a health condition to
disappear when doing standalone tests.
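A rough sketch of such a helper, modeled on the existing wait_for_health() in ceph-helpers.sh (the actual implementation may differ):

```
function wait_for_health_gone() {
    local grepstr=$1
    local -a delays=($(get_timeout_delays $TIMEOUT .1))
    local -i loop=0

    # Poll the health detail output until the given warning disappears.
    while ceph health detail | grep -q "$grepstr" ; do
        if (( $loop >= ${#delays[*]} )) ; then
            ceph health detail
            return 1
        fi
        sleep ${delays[$loop]}
        loop+=1
    done
    return 0
}
```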
Signed-off-by: Kamoltat <ksirivad@redhat.com>
This is a follow-up to PR: https://github.com/ceph/ceph/pull/48703.
This commit also considers changes made ephemerally using either the
'daemon' or the 'tell' interfaces to override the built-in mClock
QoS parameters. In such a scenario, the ephemeral changes are removed
using the rm_val() method exposed by the config subsystem, and the
removal is logged.
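For illustration, such an ephemeral override could be attempted like this (osd_mclock_scheduler_client_res is one of the built-in QoS parameters; with this change the override is removed and the removal is logged):

```
# Try to change a built-in profile's QoS parameter via the tell interface.
ceph tell osd.0 config set osd_mclock_scheduler_client_res 0.5
# The ephemeral change is discarded; the built-in profile value still applies.
ceph tell osd.0 config get osd_mclock_scheduler_client_res
```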
Other changes:
1. Add a standalone test to exercise the fix.
2. Add documentation note on the outcome of the attempt to modify
built-in profile defaults.
Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
The qa tests are not client I/O centric and mostly focus on triggering
recovery/backfills and monitoring them for completion within a finite
amount of time. The same holds true for scrub operations.
Therefore, an mClock profile that optimizes background operations is a
better fit for qa related tests. The osd_mclock_profile is therefore
globally overridden to the 'high_recovery_ops' profile for the Rados
suite as it fits the requirement.
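The equivalent CLI-level override looks like this (a sketch; the suite applies it through its config overrides):

```
# Favor background operations (recovery/backfill/scrub) for the qa runs.
ceph config set global osd_mclock_profile high_recovery_ops
```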
Also, many standalone tests expect recovery and scrub operations to
complete within a finite time. To ensure this, the osd_mclock_profile
option is set to 'high_recovery_ops' as part of the run_osd() function
in ceph-helpers.sh.
A subset of standalone tests explicitly used 'high_recovery_ops' profile.
Since the profile is now set as part of run_osd(), the earlier overrides
are redundant and therefore removed from the tests.
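Inside run_osd() this boils down to appending the profile to the OSD start-up arguments, roughly as below (a sketch; the variable name is illustrative and the helper's exact wording differs):

```
# ceph-helpers.sh, run_osd(): make every OSD started by the standalone tests
# use the profile that favors background operations.
ceph_args+=" --osd_mclock_profile=high_recovery_ops"
```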
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Let's use the middle profile as the default.
Modify the standalone tests accordingly.
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Fix an issue where an overridden mClock recovery setting (set prior to
an osd restart) could be lost after an osd restart.
For example, consider that prior to an osd restart, the option
'osd_max_backfills' was successfully set to a value different from the
mClock default. If the osd was restarted for some reason, the
boot-up sequence was incorrectly resetting the backfill value to the
mClock default within the async local/remote reservers. This fix
ensures that no change is made if the current overridden value is
different from the mClock default.
Modify an existing standalone test to verify that the local and remote
async reservers are updated to the desired number of backfills under
normal conditions and also across osd restarts.
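A hedged sketch of the check the modified test performs (option names and commands are real; helper usage and values are illustrative):

```
# Override the backfill limit while mclock is active.
ceph config set osd osd_mclock_override_recovery_settings true
ceph config set osd.0 osd_max_backfills 5
# Restart the OSD; the override should survive the boot-up sequence and be
# reflected in the local/remote async reservers.
kill_daemons $dir TERM osd.0 || return 1
activate_osd $dir 0 || return 1
ceph config show osd.0 osd_max_backfills    # expected: 5
```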
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
The mClock scheduler's cost model for HDDs/SSDs is modified and now
represents the cost of an IO in terms of bytes.
The cost parameters, namely osd_mclock_cost_per_io_usec_[hdd|ssd]
and osd_mclock_cost_per_byte_usec_[hdd|ssd], which represent the cost
of an IO in seconds, are inaccurate and therefore removed.
The new model considers the following aspects of an osd to calculate
the cost of an IO:
- osd_mclock_max_capacity_iops_[hdd|ssd] (existing option)
The measured random write IOPS at 4 KiB block size. This is
measured during OSD boot-up using the OSD bench tool.
- osd_mclock_max_sequential_bandwidth_[hdd|ssd] (new config option)
The maximum sequential bandwidth of the underlying device.
For HDDs, 150 MiB/s is considered, and for SSDs, 750 MiB/s is
considered in the cost calculation.
The following important changes are made to arrive at the overall
cost of an IO:
1. Represent QoS reservation and limit config parameters as proportions:
The reservation and limit parameters are now set in terms of a
proportion of the OSD's max IOPS capacity. The earlier representation
was in terms of IOPS per OSD shard which required the user to perform
calculations before setting the parameter. Representing the
reservation and limit in terms of proportions is much more intuitive
and simpler for a user.
2. Cost per IO Calculation:
Using the above config options, osd_bandwidth_cost_per_io for the osd is
calculated and set. It is the ratio of the max sequential bandwidth and
the max random write iops of the osd. It is a constant and represents the
base cost of an IO in terms of bytes. This is added to the actual size of
the IO (in bytes) to represent the overall cost of the IO operation. See
mClockScheduler::calc_scaled_cost() and the sketch after this list.
3. Cost calculation in Bytes:
The settings for reservation and limit in terms of a fraction of the OSD's
maximum IOPS capacity are converted to Bytes/sec before updating the
mClock server's ClientInfo structure. This is done for each OSD op shard
using osd_bandwidth_capacity_per_shard as shown below:

  (res|lim)  =  (IOPS proportion) * osd_bandwidth_capacity_per_shard
  (Bytes/sec)      (unitless)              (bytes/sec)
The above result is updated within the mClock server's ClientInfo
structure for different op_scheduler_class operations. See
mClockScheduler::ClientRegistry::update_from_config().
The overall cost of an IO operation (in secs) is finally determined
during the tag calculations performed in the mClock server. See
crimson::dmclock::RequestTag::tag_calc() for more details.
4. Profile Allocations:
Optimize mClock profile allocations due to the change in the cost model
and lower recovery cost.
5. Modify standalone tests to reflect the change in the QoS config
parameter representation of reservation and limit options.
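A sketch tying points 1 through 3 together (the option names are real; the numeric values are illustrative only):

```
# Point 1: under the custom profile, reservation/limit are now fractions of
# the OSD's max IOPS capacity rather than absolute IOPS per shard.
ceph config set osd osd_mclock_profile custom
ceph config set osd osd_mclock_scheduler_background_recovery_res 0.3
ceph config set osd osd_mclock_scheduler_background_recovery_lim 0.7

# Points 2 and 3: base cost of an IO in bytes for an HDD OSD.
max_seq_bw=$((150 * 1024 * 1024))  # osd_mclock_max_sequential_bandwidth_hdd
max_iops=315                       # osd_mclock_max_capacity_iops_hdd (measured)
cost_per_io=$((max_seq_bw / max_iops))
io_size=4096                       # a 4 KiB client write
echo "scaled cost (bytes): $((cost_per_io + io_size))"
```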
Fixes: https://tracker.ceph.com/issues/58529
Fixes: https://tracker.ceph.com/issues/59080
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Separate `mon-stretch` from `mon`.
Renamed `mon-stretched-cluster.sh` to
`mon-stretch-fail-recovery.sh`.
This isolation of the stretch cluster tests will enable
developers to get results faster for stretch-cluster
related work.
Signed-off-by: Kamoltat <ksirivad@redhat.com>
1e44d86b2 swapped this to a pg tell command which doesn't actually
need the primary specified. Drop the now unnecessary lookup.
Signed-off-by: Samuel Just <sjust@redhat.com>
osd/PeeringState: Add logs around can_serve_replica_read() / last_complete_ondisk()
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Ronen Friedman <rfriedma@redhat.com>
The test performs shallow scrubs, intentionally using small chunk
sizes to allow the dump commands time to check specific details.
Following commit ffda64119f
(PR#44749), shallow scrub chunks are controlled by a separate
configuration parameter. This PR fixes the test to use the
correct parameter.
An additional minor change is an adjustment to the test loop sleep time:
it is now reduced to guarantee that a dump followed by a counter
increase happens at more-or-less the scrub frequency.
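In the test setup the parameter switch looks roughly like this (a sketch; osd_shallow_scrub_chunk_max is assumed here to be the separate knob for shallow scrubs introduced by that PR):

```
# Keep shallow-scrub chunks small so the dump commands can observe
# intermediate scrub state between chunks.
run_osd $dir $id --osd_shallow_scrub_chunk_max=3 || return 1
```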
Fixes: https://tracker.ceph.com/issues/58797
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Generally, it's more portable not to rely on specific system files to be
readable. Specifically, container environments may not have an fstab.
Instead, just generate another random file.
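A sketch of the replacement input (the path and size are arbitrary):

```
# Generate a small random file instead of reading /etc/fstab.
dd if=/dev/urandom of=$dir/random-input bs=4096 count=4 2>/dev/null
```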
Signed-off-by: Samuel Just <sjust@redhat.com>
Using the existing common default chunk size for scrubs that are
not deep scrubs is wasteful: a high ratio of inter-OSD messages
per chunk, while the actual OSD work per chunk is minimal.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
The QoS parameters (res, wgt, lim) of mClock profiles are not allowed to
be modified by users with commands like "config set" or via the admin
socket. handle_conf_change() does not allow changes to any built-in mClock
profile at the mClock scheduler. But the config subsystem still showed the
change as expected for the built-in mClock profile QoS parameters. This
misled the user into thinking that the change was made at the mClock
server when that was not the case.
The above issue is the result of the config "levels" used by the config
subsystem. The initial built-in QoS params are set at the CONF_DEFAULT
level. This allows the user to modify the built-in QoS params using the
"config set" command, which sets values at the CONF_MON level, which has
a higher priority than the CONF_DEFAULT level. The new value is persisted
in the mon store and therefore the config subsystem shows the change when
the "config show" command is issued.
To prevent the above, this commit adds changes to restore the defaults set
for the built-in profiles by removing the new config changes from the MON
store. This results in the original defaults coming back into effect and
maintains a consistent view of the built-in profile across all levels.
To accomplish this, the mClock scheduler is provided with additional
information like the OSD id, shard id and a pointer to the MonClient,
which is used to execute the mon store command that removes the option.
A standalone test is added to verify that built-in params cannot be
modified and the original profile params are retained.
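An illustrative sequence (osd_mclock_scheduler_client_wgt is one of the built-in QoS parameters; the value is an example):

```
# Attempt to modify a built-in profile's QoS parameter.
ceph config set osd.0 osd_mclock_scheduler_client_wgt 6
# With this change the OSD removes the override from the mon store, so the
# built-in profile's value is reported again.
ceph config show osd.0 osd_mclock_scheduler_client_wgt
```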
Fixes: https://tracker.ceph.com/issues/57533
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Set the osd_mclock_override_recovery_settings option to true for tests
that modify recovery/backfill configuration options. This prevents the
cluster warning from being logged when modifying recovery/backfill limits.
Fixes: https://tracker.ceph.com/issues/57529
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
- Consolidate all mclock standalone tests under
qa/standalone/misc/mclock-config.sh.
- Revert existing tests in ceph-helpers.sh that verified the earlier hard
override of recovery/backfill limits.
- Add new tests to verify the procedure to change the recovery/backfill
limits with mclock scheduler.
Fixes: https://tracker.ceph.com/issues/57529
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Problem:
--mon-initial-members does nothing but cause the monmap
to populate ``removed_ranks``, because of the way we start
monitors in standalone tests, using ``run_mon $dir $id ..``
on each mon. Regardless of --mon-initial-members=a,b,c, if
we set --mon-host=$MONA,$MONB,$MONC (which we do in every single test),
every time we run a monitor (e.g., run mon.b) it will pre-build
our monmap with:
```
noname-a=mon.noname-a addrs v2:127.0.0.1:7127/0,
b=mon.b addrs v2:127.0.0.1:7128/0,
noname-c=mon.noname-c addrs v2:127.0.0.1:7129/0,
```
Now, with --mon-initial-members=a,b,c we are letting the
monmap know that the initial members should be named
a,b,c, of which only `b` is a match. So what
``MonMap::set_initial_members`` does is remove
noname-a and noname-c, which
populates `removed_ranks`.
Solution:
Remove all instances of --mon-initial-members
from the standalone tests, as it has no impact on
the nature of the tests themselves.
Fixes: https://tracker.ceph.com/issues/58132
Signed-off-by: Kamoltat <ksirivad@redhat.com>
This is based on two commits:
* 7bbc92eda3 and
* 6b22d47863, which seems to be
a fixup to the former one.
In contrast to them, in `OSDMonitor::create_initial()` I also updated
`newmap.require_osd_release` to pacific when
`mon_debug_no_require_reef` and `mon_debug_no_require_quincy` are set.
Please take an extra look at that during the review.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
The test (in the standalone/scrub suite) verifies that the scrubber
detects (and issues a cluster-log error) whenever a mapping entry
("SNA_") is missing in the SnapMapper DB.
Specifically, here the entry is corrupted - shortened as per
https://tracker.ceph.com/issues/56147.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Fix no-scrub & nodeep-scrub related code to match requirements:
- deep scrubs should be allowed to execute when no-scrub is set;
- some initiated scrubs (i.e. not periodic ones) might be changed
from the requested 'deep' to 'shallow'.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Create the initial mClock QoS params at CONF_DEFAULT level using
set_val_default(). This allows switching to a custom profile on a
running OSD and making the necessary changes to the desired QoS params.
Note that switching to the 'custom' profile and then subsequently changing
the QoS params using "config set osd.n ..." will be at a higher level, i.e.
at CONF_MON.
But when switching back to a built-in profile, the new values won't take
effect since CONF_DEFAULT < CONF_MON. For the values to take effect, the
config keys created as part of the 'custom' profile must be removed from
the ConfigMonitor store after switching back to a built-in profile.
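For illustration, the switch-and-revert sequence described above looks roughly like this (option names are real; values are examples):

```
# Switch to the custom profile and change a QoS parameter (stored at CONF_MON).
ceph config set osd osd_mclock_profile custom
ceph config set osd osd_mclock_scheduler_client_res 0.4
# When switching back to a built-in profile, remove the custom key from the
# ConfigMonitor store so the CONF_DEFAULT values take effect again.
ceph config set osd osd_mclock_profile high_client_ops
ceph config rm osd osd_mclock_scheduler_client_res
```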
- Added a couple of standalone tests to exercise the scenario.
- Updated the mClock configuration document and the mClock internal
documentation to fix a couple of typos relating to the best effort weights.
- Added new sections to the mClock configuration document outlining the
steps to switch between the built-in and custom profile and vice-versa.
Fixes: https://tracker.ceph.com/issues/55153
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Modify test_activate_osd() to get the type of scheduler in use and then
verify the value of osd_max_backfills. This is because the mclock
scheduler overrides this option to 1000 upon OSD initialization.
The test earlier used to pass because the OSD daemon was killed but not
marked down, and upon being brought up, the wait for the OSD up check
passed quickly. But the OSD still didn't have the latest config values.
Now, upon killing the OSD, the osd_fast_shutdown sequence notifies the
mon (see PR: https://github.com/ceph/ceph/pull/44807) and the OSD is
marked down and dead. Upon bringing it up, the wait for the OSD up check
takes longer, and this is sufficient for the config values to be updated.
This results in the correct values being read from the config 'Values'
map.
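A hedged sketch of the scheduler-aware check (commands are real; the expected values follow the behavior described above):

```
# Pick the expected osd_max_backfills based on the scheduler in use:
# mclock overrides it to 1000 during OSD initialization.
scheduler=$(ceph config get osd.0 osd_op_queue)
if [ "$scheduler" = "mclock_scheduler" ]; then
    expected_backfills=1000
else
    expected_backfills=1
fi
ceph config show osd.0 osd_max_backfills   # compare against $expected_backfills
```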
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Add the snaptrim duration to the json formatted output of the pg dump
stats. Define methods for a PG to set the snaptrim begin time and then to
calculate the total time spent to trim all the objects for the snaps in
the snap_trimq for the PG.
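The new field can be inspected like this (a sketch; the jq path assumes the json field is named snaptrim_duration):

```
# Show the snaptrim duration per PG from the json-formatted pg dump stats.
ceph pg dump pgs --format=json | \
    jq -r '.pg_stats[] | [.pgid, .snaptrim_duration] | @tsv'
```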
Tests:
- Librados C and C++ API tests to verify the time spent for a snaptrim
operation on a PG. These tests use the self-managed snaps APIs.
- Standalone tests to verify snaptrim duration using rados pool snaps.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Add a new column, OBJECTS_TRIMMED, to the pg dump stats that shows the
number of objects trimmed when a snap is removed.
When a pg splits, the stats from the parent pg are copied to the child
pg. In such a case, reset objects_trimmed to 0 for the child pg
(see PeeringState::split_into()). Otherwise, this would result in
incorrect stats being shown for a child pg after the split operation.
Tests:
- Librados C and C++ API tests to verify the number of objects trimmed
during snaptrim operation. These tests use the self-managed snaps APIs.
- Standalone tests to verify objects trimmed using rados pool snaps.
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Fix the expected log message to match the scrub code, by removing
the redundant part.
Fixes: https://tracker.ceph.com/issues/54458
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Update the allocation file when we expand a device:
add the expanded space to the allocator and then force an update to the
allocation file.
There is also a new standalone test case for expand.
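The expand flow exercised by the new test looks roughly like this (ceph-bluestore-tool and bluefs-bdev-expand are the real CLI; the OSD path is illustrative):

```
# After growing the underlying device/file, let bluestore pick up the new
# size; with this change the added space is also recorded in the allocation file.
ceph-bluestore-tool bluefs-bdev-expand --path dev/osd0
```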
Fixes: https://tracker.ceph.com/issues/53699
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>