624 Commits

Author SHA1 Message Date
Ronen Friedman
ffda64119f osd/scrub: create a separate chunk size conf for shallow scrubs
Using the existing common default chunk size for scrubs that are
not deep scrubs is wasteful: a high ratio of inter-OSD messages
per chunk, while the actual OSD work per chunk is minimal.
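The message-overhead argument can be illustrated with a little arithmetic. This is only a sketch: the object count, chunk sizes and per-chunk message count below are made-up numbers, not Ceph defaults, and the function names are hypothetical.

```python
import math


def chunks_needed(num_objects, chunk_size):
    # Number of scrub chunks required to cover every object in the PG.
    return math.ceil(num_objects / chunk_size)


def messages_exchanged(num_objects, chunk_size, msgs_per_chunk):
    # Assumption: each chunk costs a fixed number of inter-OSD messages,
    # independent of how many objects the chunk contains.
    return chunks_needed(num_objects, chunk_size) * msgs_per_chunk


# Hypothetical figures: 1000 objects, 4 messages per chunk.
small_chunks = messages_exchanged(1000, 25, 4)   # 40 chunks
large_chunks = messages_exchanged(1000, 100, 4)  # 10 chunks
print(small_chunks, large_chunks)
```

Under these assumptions a larger chunk size for shallow scrubs reduces message traffic in proportion to the number of chunks saved.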

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2023-02-14 07:58:01 +02:00
Sridhar Seshasayee
b12780667f osd: Restore defaults of mClock built-in profiles upon modification
The QoS parameters (res, wgt, lim) of mClock profiles are not allowed to
be modified by users using commands like "config set" or via admin socket.
handle_conf_change() does not allow changes to any built-in mClock profile
at the mClock scheduler. But the config subsystem showed the change as
expected for the built-in mClock profile QoS parameters. This misled the
user into thinking that the change was made at the mClock server when
it was not the case.

The above issue is the result of the config "levels" used by the config
subsystem. The initial built-in QoS params are set at the CONF_DEFAULT
level. This allows the user to modify the built-in QoS params using
"config set" command which sets values at CONF_MON level which has higher
priority than CONF_DEFAULT level. The new value is persisted on the mon
store and therefore the config subsystem shows the change when "config
show" command is issued.

To prevent the above, this commit adds changes to restore the defaults set
for the built-in profiles by removing the new config changes from the MON
store. This brings the original defaults back into effect and
maintains a consistent view of the built-in profile across all levels.
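The level behaviour described above can be modelled roughly as follows. This is a toy model, not the Ceph config subsystem: the `LayeredConfig` class, its method names, the option name `builtin_profile_res` and the values are all hypothetical. Only the ordering CONF_DEFAULT < CONF_MON comes from the text.

```python
from enum import IntEnum


class Level(IntEnum):
    # Only the two levels named above; the numeric values are illustrative,
    # what matters is that CONF_MON outranks CONF_DEFAULT.
    CONF_DEFAULT = 0
    CONF_MON = 1


class LayeredConfig:
    def __init__(self):
        self._values = {}  # option name -> {level: value}

    def set_val(self, name, level, value):
        self._values.setdefault(name, {})[level] = value

    def rm_val(self, name, level):
        # Removing a level that was never set is a no-op.
        self._values.get(name, {}).pop(level, None)

    def get_val(self, name):
        # The value at the highest level that has been set wins.
        by_level = self._values[name]
        return by_level[max(by_level)]


cfg = LayeredConfig()
cfg.set_val("builtin_profile_res", Level.CONF_DEFAULT, 1)  # built-in default
cfg.set_val("builtin_profile_res", Level.CONF_MON, 50)     # user "config set"
shadowed = cfg.get_val("builtin_profile_res")              # MON value shadows the default
cfg.rm_val("builtin_profile_res", Level.CONF_MON)          # what this commit does
restored = cfg.get_val("builtin_profile_res")              # default is visible again
print(shadowed, restored)
```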

To accomplish this, the mClock scheduler is provided with additional
information such as the OSD id, shard id and a pointer to the MonClient,
through which the Mon store command to remove the option is executed.

A standalone test is added to verify that built-in params cannot be
modified and the original profile params are retained.

Fixes: https://tracker.ceph.com/issues/57533
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2023-01-09 19:54:44 +05:30
Kamoltat Sirivadhna
4aa8af29ae
Merge pull request #48991 from kamoltat/wip-ksirivad-fix-bz-2121452
mon/Elector: Change how we handle removed_ranks and notify_rank_removed()
Reviewed-by: Gregory Farnum <gfarnum@redhat.com>
2022-12-15 17:07:38 -05:00
Sridhar Seshasayee
5b2fee21e8 qa: Allow tests to override recovery configs with mClock scheduler enabled
Set osd_mclock_override_recovery_settings option to true for tests that
modify recovery/backfill configuration options. This prevents logging of
the cluster warning when modifying recovery/backfill limits.

Fixes: https://tracker.ceph.com/issues/57529
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2022-12-12 18:12:46 +05:30
Sridhar Seshasayee
9c72116b1c qa/standalone: Add/Modify tests to verify mclock recovery/backfill limits
- Consolidate all mclock standalone tests under
  qa/standalone/misc/mclock-config.sh.
- Revert existing tests in ceph-helpers.sh that verified the earlier hard
  override of recovery/backfill limits.
- Add new tests to verify the procedure to change the recovery/backfill
  limits with mclock scheduler.

Fixes: https://tracker.ceph.com/issues/57529
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2022-12-12 12:43:50 +05:30
Kamoltat
4d8b0c29bf qa/standalone/mon: remove --mon-initial-members setting
Problem:

--mon-initial-members does nothing except cause the monmap
to populate ``removed_ranks``, because we start
monitors in standalone tests by calling ``run_mon $dir $id ..``
for each mon. Regardless of --mon-initial-members=a,b,c, if
we set --mon-host=$MONA,$MONB,$MONC (which we do in every single test),
every time we run a monitor (e.g., run mon.b) it will pre-build
our monmap with

```
noname-a=mon.noname-a addrs v2:127.0.0.1:7127/0,
b=mon.b addrs v2:127.0.0.1:7128/0,
noname-c=mon.noname-c addrs v2:127.0.0.1:7129/0,
```

Now, with --mon-initial-members=a,b,c we are telling the
monmap that the initial members should be named
a,b,c, of which only `b` matches. So
``MonMap::set_initial_members`` removes
noname-a and noname-c, which
populates `removed_ranks`.
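A rough sketch of the filtering step described above. This is not the real ``MonMap::set_initial_members``: the Python function below is a hypothetical stand-in, and "rank" here simply means the position in the pre-built list shown earlier, which may differ from how Ceph actually records removed ranks.

```python
def set_initial_members(monmap_names, initial_members):
    """Keep only monitors whose name is an initial member.

    Returns (kept_names, removed_ranks), where a rank is the index of the
    monitor in the pre-built map (an assumption made for this sketch).
    """
    kept, removed_ranks = [], []
    for rank, name in enumerate(monmap_names):
        if name in initial_members:
            kept.append(name)
        else:
            removed_ranks.append(rank)
    return kept, removed_ranks


# Monmap as pre-built when starting mon.b with --mon-host listing three addresses.
prebuilt = ["noname-a", "b", "noname-c"]
kept, removed_ranks = set_initial_members(prebuilt, {"a", "b", "c"})
print(kept, removed_ranks)
```

Only `b` survives, because the placeholder names `noname-a` and `noname-c` do not match `a` and `c`.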

Solution:

Remove all instances of --mon-initial-members
in the standalone tests, as the option has no impact on
the nature of the tests themselves.

Fixes: https://tracker.ceph.com/issues/58132

Signed-off-by: Kamoltat <ksirivad@redhat.com>
2022-12-09 15:43:45 +00:00
Laura Flores
c1e6f7c470 qa/standalone/erasure-code: give osdmap 5 seconds to refresh
Fixes: https://tracker.ceph.com/issues/57883
Signed-off-by: Laura Flores <lflores@redhat.com>
2022-10-25 17:03:24 +00:00
Radoslaw Zarzynski
bf46d3736d
Merge pull request #47458 from rzarzynski/wip-all-kickoff-r
kickoff v18 reef

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Guillaume Abrioux <gabrioux@redhat.com>
Reviewed-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Laura Flores <lflores@redhat.com>
2022-10-04 22:39:19 +02:00
Radoslaw Zarzynski
130704e815 doc, qa/standalone/mon/misc: verify that len(monmap.features.persistent) == 10
Also updates the release checklist.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2022-10-04 00:27:28 +02:00
Yuri Weinstein
0d5e2e5dc9
Merge pull request #47340 from kamoltat/wip-ksirivad-recreate-zilla-2104207
mon/OSDMonitor: Added extra check before mon.go_recovery_stretch_mode()

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
2022-10-03 13:18:18 -07:00
Radoslaw Zarzynski
905540db14 doc, common, mon, qa: Mon-related updates for reef
This is based on two commits:
  * 7bbc92eda3 and
  * 6b22d47863, which seems to be
    a fixup to the former.

In contrast to them, in `OSDMonitor::create_initial()` I also updated
`newmap.require_osd_release` to pacific when
`mon_debug_no_require_reef` and `mon_debug_no_require_quincy` are set.
Please take an extra look at that during the review.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2022-09-20 14:26:59 +00:00
Ronen Friedman
a85ef8e798 osd/scrub: modify SnapMapper.cc to use ceph::buffer::list
... systematically, over ceph::bufferlist.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2022-09-02 10:40:54 +03:00
Ronen Friedman
84d9c4d177 tests/osd: creating a Teuthology test re missing SnapMapper entries
The test (in the standalone/scrub suite) verifies that the scrubber
detects (and issues a cluster-log error) whenever a mapping entry
("SNA_") is missing in the SnapMapper DB.

Specifically, here the entry is corrupted - shortened as per
https://tracker.ceph.com/issues/56147.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2022-09-02 10:40:54 +03:00
Laura Flores
8ccd4e2533
Merge pull request #47046 from rzarzynski/wip-dup-trimming-test2
osd, tools, kv: non-aggressive, on-line trimming of accumulated dups
2022-08-26 14:07:44 -05:00
NitzanMordhai
c916f568aa standalone/osd: Test adjust with new trimming function
Change the number of dups trimmed according to the new loop.

Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
2022-08-24 08:19:18 +00:00
Kamoltat
62fe3cb8b9 qa/standalone/mon: init mon-stretched-cluster.sh
Added bug reproducer for
https://bugzilla.redhat.com/show_bug.cgi?id=2104207

Added more logs in MON.

Signed-off-by: Kamoltat <ksirivad@redhat.com>
2022-08-09 18:27:17 +00:00
Ronen Friedman
ba77676163 osd/scrub: modify scrub behaviour under no-scrub
Fix no-scrub & nodeep-scrub related code to match requirements:
- deep scrubs should be allowed to execute when no-scrub is set;
- some initiated scrubs (i.e. not periodic ones) might be changed
  from the requested 'deep' to 'shallow'.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2022-08-08 09:55:10 +00:00
Ronen Friedman
b04ef6ebfc tests/osd: fix a test to follow an output formatting change
PR#47255 modified the formatting of std::set-s. That broke
the osd-scrub-snaps.sh standalone test.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2022-08-04 15:46:11 +00:00
Yuri Weinstein
48d78184f0
Merge pull request #46561 from NitzanMordhai/wip-nitzan-add-pglog-dups-length
osd, mon: add pglog dups length

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2022-07-21 13:29:07 -07:00
Sridhar Seshasayee
e0b5316171 osd: Set initial mClock QoS params at CONF_DEFAULT level
Create the initial mClock QoS params at CONF_DEFAULT level using
set_val_default(). This allows switching to a custom profile on a
running OSD and to make necessary changes to the desired QoS params.
Note that switching to ‘custom’ profile and then subsequently changing
the QoS params using “config set osd.n …” will be at a higher level i.e.
at CONF_MON.

But when switching back to a built-in profile, the new values won’t take
effect since CONF_DEFAULT < CONF_MON. For the values to take effect, the
config keys created as part of the ‘custom’ profile must be removed from
the ConfigMonitor store after switching back to a built-in profile.

- Added a couple of standalone tests to exercise the scenario.
- Updated the mClock configuration document and the mClock internal
  documentation to fix a couple of typos relating to the best effort weights.
- Added new sections to the mClock configuration document outlining the
  steps to switch between the built-in and custom profile and vice-versa.

Fixes: https://tracker.ceph.com/issues/55153
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2022-07-06 16:15:58 +05:30
NitzanMordhai
2aecf0d7b6 standalone/osd: Test log_dups_size output from pg query
Add to the current test of log_size the log_dups_size output test

Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
2022-06-30 08:10:29 +00:00
Sridhar Seshasayee
3aa2df2e0f qa/standalone: Fix test_activate_osd() test in ceph-helpers.sh
Modify test_activate_osd() to get the type of scheduler in use and then
verify the value of osd_max_backfills. This is because mclock scheduler
overrides this option to 1000 upon OSD initialization.

The test earlier used to pass because the OSD daemon was killed but not
marked down and upon being brought up, the wait for OSD up check was
passing quickly. But the OSD still didn't have the latest config values.

But now upon killing the OSD, the osd_fast_shutdown sequence notifies the
mon (see PR: https://github.com/ceph/ceph/pull/44807) and is marked down
and dead. Upon bringing it up, the wait for OSD up check takes a longer
time and this is sufficient for the config values to be updated. This
results in the correct values being read from the config 'Values' map.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2022-03-25 22:10:31 +05:30
Sridhar Seshasayee
a86ead953d osd: Add snaptrim duration to pg dump stats.
Add the snaptrim duration to the json formatted output of the pg dump
stats. Define methods for a PG to set the snaptrim begin time and then to
calculate the total time spent to trim all the objects for the snaps in
the snap_trimq for the PG.

Tests:
  - Librados C and C++ API tests to verify the time spent for a snaptrim
    operation on a PG. These tests use the self-managed snaps APIs.
  - Standalone tests to verify snaptrim duration using rados pool snaps.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2022-03-16 00:33:24 +05:30
Sridhar Seshasayee
00249dc0cc mon, osd: Add objects trimmed to pg dump stats.
Add a new column, OBJECTS_TRIMMED, to the pg dump stats that shows the
number of objects trimmed when a snap is removed.

When a pg splits, the stats from the parent pg are copied to the child
pg. In such a case, reset objects_trimmed to 0 for the child pg
(see PeeringState::split_into()). Otherwise, incorrect stats would be
shown for a child pg after the split operation.
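The split behaviour can be sketched as below. This is a simplified stand-in, not PeeringState::split_into(): the `PGStats` dataclass and its `other_stat` field are hypothetical, and real splits adjust other stats as well.

```python
from dataclasses import dataclass, replace


@dataclass
class PGStats:
    other_stat: int       # placeholder for stats that are copied as-is
    objects_trimmed: int  # objects trimmed when a snap is removed


def split_into(parent: PGStats) -> PGStats:
    # Copy the parent's stats to the child, but reset objects_trimmed so the
    # child does not report trimming work that was done by the parent.
    return replace(parent, objects_trimmed=0)


parent = PGStats(other_stat=42, objects_trimmed=7)
child = split_into(parent)
print(parent, child)
```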

Tests:
 - Librados C and C++ API tests to verify the number of objects trimmed
   during snaptrim operation. These tests use the self-managed snaps APIs.
 - Standalone tests to verify objects trimmed using rados pool snaps.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2022-03-16 00:30:56 +05:30
Ronen Friedman
d654839222 test: osd-scrub-snaps.sh: fix expected 'missing snaps' log string
Fix the expected log message to match the scrub code, by removing
the redundant part.

Fixes: https://tracker.ceph.com/issues/54458

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2022-03-03 08:03:00 +00:00
Yuri Weinstein
e419a29be5
Merge pull request #42735 from amathuria/wip-amathuria-scrub-stats
osd/scrub: Add stats to PG dump for number of objects scrubbed

Reviewed-by: Ronen Friedman <rfriedma@redhat.com>
2022-01-14 10:46:28 -08:00
Gabriel BenHanokh
a39b1f3cf7 tools/ceph-bluestore-tool: Fix bluefs-bdev-expand command
Update allocation file when we expand-device
Add the expanded space to the allocator and then force an update to the allocation file

There is also a new standalone test case for expand

Fixes: https://tracker.ceph.com/issues/53699
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
2022-01-12 18:07:59 +02:00
Aishwarya Mathuria
91885f1a87 qa/standalone: add test to check if objects_scrubbed is equal to number of objects in a PG once a scrub finishes
Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
2022-01-12 14:57:40 +05:30
Yuri Weinstein
58faf5712e
Merge pull request #43919 from ronen-fr/wip-rf-test-nodeep
osd/scrub (& qa/standalone): test for scrub behavior when no-scrub is set but no-deep-scrub is not


Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Matan Breizman <Matan.Brz@gmail.com>
2021-12-08 13:04:57 -08:00
Neha Ojha
89d5b2a79e
Merge pull request #43336 from ifed01/wip-fix-bluefs-volumes-ops
qa/osd-bluefs-volume-ops: fix bluefs volumes ops test case

Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
2021-12-02 08:39:41 -08:00
Ronen Friedman
20dd022715 qa/standalone: osd-scrub-repair.sh: fix expected "not scrubbed since" warnings count
Following PR#43244, the 'ceph tell pg deep_scrub' now sets both
deep-scrub and "regular" scrub time-stamps. This necessitated a modification
to TEST_scrub_warning, as more PGs in this test are late for their regular scrubbing.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2021-11-23 16:43:21 +00:00
Ronen Friedman
7008b85fc5 qa/standalone: test for scrub behavior when noscrub is set but nodeepscrub is not
A bug (https://tracker.ceph.com/issues/52901 - now fixed) resulted in
this combination of conditions leaving the PG in "scrubbing" state
forever. That bug was fixed by PR#43521. The patch here adds a
test to detect the (now fixed) wrong behavior.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2021-11-23 15:08:42 +00:00
Ronen Friedman
9dda986bd5 qa/standalone: fix scrub/osd-scrub-dump following changes to 'pg dump pgs' output
Make osd-scrub-dump test ignore the 'scrubbing' that might be late to disappear
from the modified (PR #43403) 'pg dump' output.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2021-11-12 18:43:41 +00:00
Ronen Friedman
10909c3cba osd/scrub: update the stand-alone tests to check 'scrub scheduling' entries
Analyzing and verifying the relevant entries in 'pg query' and
'pg dump' output.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2021-11-05 17:07:57 +02:00
Igor Fedotov
efb67445c2 qa/osd-bluefs-volume-ops: retry data writing if spillover hasn't happened.

Fixes: https://tracker.ceph.com/issues/52676
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2021-11-02 17:26:39 +03:00
Ronen Friedman
52e9fa16ef tests: modify osd-scrub-repair to match PR #43239 changes
PR #43239 has modified ECBackend::get_hash_info() behavior.
Modified the standalone scrub test to match.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2021-10-20 06:42:51 +00:00
Zack Cerza
b57539dc94 Revert "qa: support isal ec test for aarch64"
This commit has been causing scheduled jobs to request e.g. aarch64
smithi machines, which don't exist. The dispatcher then tries to find
them forever, requiring the dispatcher to be killed and restarted. The
queue will sit idle until someone notices the problem.

Signed-off-by: Zack Cerza <zack@redhat.com>
2021-10-12 12:53:58 -06:00
Dai Zhiwei
eaa385f3da qa: support isal ec test for aarch64
modified:   qa/standalone/erasure-code/test-erasure-code-plugins.sh
new file:   qa/suites/rados/thrash-erasure-code-isa/arch/aarch64.yaml

Signed-off-by: Dai Zhiwei <daizhiwei3@huawei.com>
2021-10-08 14:37:25 +08:00
Aishwarya Mathuria
1b4e416f81 osd/scrub: Add scrub duration to pg dump stats
Addition of a new column, SCRUB_DURATION, to the pg stats that stores the time taken for a PG scrub.

Fixes: https://tracker.ceph.com/issues/52605
Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
2021-10-01 13:27:27 +05:30
Neha Ojha
e273418bbb
Merge pull request #42604 from sseshasa/wip-skip-osd-benchmark
osd: Add config option to skip running the osd benchmark during init and update documentation.

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-09-08 11:03:09 -07:00
Sridhar Seshasayee
f539bedc96 qa/standalone: Add standalone test to validate osd-mclock-skip-benchmark option
Add a standalone test - test_activate_osd_skip_benchmark() in ceph-helpers.sh
that exercises the osd-mclock-skip-benchmark option.

Fixes: https://tracker.ceph.com/issues/52025
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-09-01 14:19:03 +05:30
Igor Fedotov
0b0f8ef12f qa/osd-bluefs-volume-ops: reproduce bluefs migrate bug
Reproduces: https://tracker.ceph.com/issues/40434
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2021-08-31 16:23:22 +03:00
Sridhar Seshasayee
464e9ea6c0 qa/standalone/misc: ver-health.sh: Increase wait_for_health_string() timeout
Modified test cases:

1. ver-health.sh:
  a. TEST_check_version_health_1():
    To avoid intermittent timeouts observed in wait_for_health_string(),
    increase the wait time to 20 secs.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
Sridhar Seshasayee
33d2a2c93b qa/standalone/scrub: Force a subset of scrub tests to use "wpq" scheduler
The following tests in the test files mentioned below use the
"osd_scrub_sleep" option to introduce delays during scrubbing to help
determine scrubbing states, validate reservations during scrubbing etc.
This works when using the "wpq" scheduler.

But when the "mclock_scheduler" is enabled, the "osd_scrub_sleep" is
disabled and overridden to 0. This is done to delegate the scheduling of
the background scrubs to the "mclock_scheduler" based on the set QoS
parameters. Due to this, the checks to verify the scrub states,
reservations etc. fail since the window to check them is very short
due to scrubs completing very quickly. This affects a small subset of
scrub tests, listed below:

1. osd-scrub-dump.sh -> TEST_recover_unexpected()
2. osd-scrub-repair.sh -> TEST_auto_repair_bluestore_tag()
3. osd-scrub-test.sh -> TEST_scrub_abort(), TEST_deep_scrub_abort()

Only for the above tests, until there's a reliable way to query scrub
states with "--osd-scrub-sleep" set to 0, the "osd_op_queue" config
option is set to "wpq".

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
Sridhar Seshasayee
f658ff3511 qa/standalone/erasure-code: Modify erasure-code tests for mclock scheduler
Modified test cases:

1. test-erasure-eio.sh:
  a. TEST_ec_backfill_unfound():
    - Set osd_mclock_profile to high_recovery_ops profile.
    - Increase the wait for backfill_unfound timeout to 240 secs.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
Sridhar Seshasayee
bdf36cf045 qa/standalone/osd-backfill: Modify backfill tests for mclock scheduler
Modified test cases:

1. osd-backfill-prio.sh:
  Set osd_op_queue = wpq for all tests since the mclock doesn't
  consider recovery priority as part of its scheduling algorithm.

2. osd-backfill-space.sh:
  Set osd_mclock_profile to high_recovery_ops and increase the wait
  for backfills timeout to 1200 secs for the following tests:
  - TEST_backfill_test_simple()
  - TEST_backfill_test_multi()
  - TEST_backfill_test_sametarget()
  - TEST_backfill_multi_partial()
  - TEST_ec_backfill_simple()
  - TEST_ec_backfill_multi()
  - SKIP_TEST_ec_backfill_multi_partial()

3. osd-backfill-stats:
  - TEST_backfill_ec_down_all_out():
   Set osd_mclock_profile to high_recovery_ops and increase the wait
   for recovery timeout to 240 secs.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
Sridhar Seshasayee
2c577040cb qa/standalone/osd: Modify osd tests for mclock scheduler
Modified test cases:
1. osd-recovery-prio.sh:
   Set osd_op_queue = wpq for all tests since mclock
   doesn't consider recovery priority as part of its
   scheduling algorithm.

2. osd-recovery-stats.sh:
   a. TEST_recovery_undersized():
     - Set osd_mclock_profile to high_recovery_ops profile.
     - Increase wait for recovery timeout to 300 secs.

3. osd-rep-recov-eio.sh:
   a. TEST_rep_backfill_unfound():
     - Set osd_mclock_profile to high_recovery_ops profile.
     - Increase wait for backfill_unfound to 360 secs.

4. repeer-on-acting-back.sh:
   a. TEST_repeer_on_down_act():
     - Set osd_mclock_profile to high_recovery_ops profile.
       (To improve the test duration)

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
Sridhar Seshasayee
5a85a6a035 qa/standalone: Modify ceph-helpers.sh tests for mclock scheduler.
List of changes:

1. Remove the enforcement to use osd_op_queue=wpq when an osd is brought
   up in the following functions:
   - run_osd()
   - run_osd_filestore() and
   - activate_osd()

2. New functions:
   - get_op_scheduler() - Get the current osd_op_queue for an osd.

3. Modified test cases:
   - test_run_osd() - Add check for the osd_max_backfills count.
     The mclock scheduler overrides the count to 1000.

4. New test cases:
   - test_activate_osd_after_mark_down()
   - test_get_op_scheduler()

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-30 18:16:00 +05:30
Neha Ojha
2c528248df
Merge pull request #42410 from ronen-fr/wip-ronenf-standalone-repair
qa/standalone: fixing the timings when waiting for deep-scrub to start

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-07-21 06:57:41 -07:00
Ronen Friedman
ed45acee34 qa/standalone: fixing the timings when waiting for deep-scrub to start
initiate_and_fetch_state() initiates a scrub, then polls the published
PG state looking for 'scrubbing'. Calling flush_pg_stats() as part of
the polling process might cause the scrub and the following recovery to
be missed altogether.

Note: this polling mechanism is definitely not robust. Will be
redesigned in the future.
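A minimal sketch of why polling for a transient state is fragile. The helper below is hypothetical and is not initiate_and_fetch_state(); the state strings are illustrative and the state source is faked with an iterator.

```python
import time


def wait_for_state(get_state, wanted, timeout=1.0, interval=0.01):
    # Poll get_state() until `wanted` appears in the reported state string
    # or the timeout expires. Returns True if the state was observed.
    deadline = time.monotonic() + timeout
    while True:
        if wanted in get_state():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Case 1: the poller samples the PG while it is still scrubbing.
states = iter(["active+clean", "active+clean+scrubbing+deep"])
seen = wait_for_state(lambda: next(states, "active+clean"), "scrubbing")

# Case 2: the scrub finished before the first sample (for example because an
# extra step delayed the poll), so 'scrubbing' is never observed.
missed = wait_for_state(lambda: "active+clean", "scrubbing", timeout=0.05)
print(seen, missed)
```

Anything that lengthens the gap between samples widens the window in which a short scrub can start and finish unobserved, which is the failure mode this commit addresses.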

Fixes: https://tracker.ceph.com/issues/51581
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2021-07-20 08:57:37 +03:00