ceph/qa/suites/rados
Laura Flores 40062676c2 qa/suites/rados/thrash-erasure-code-big/thrashers: add osd max backfills setting to mapgap and pggrow
All `rados/thrash-erasure-code-big` tests that die due to the “wait_for_recovery” timeout have one thing in common: They contain either `thrashers/pggrow` or `thrashers/mapgap`.

What sets pggrow and mapgap apart from the non-offending thrashers (default, careful, fastread, and morepggrow) is that they lack an override for `osd max backfills`. `osd max backfills` is the maximum number of backfill operations allowed to or from a single OSD; the higher the value, the faster recovery can proceed. The default is 1. Each of the non-offending thrashers overrides this default in its .yaml file with a value greater than 1, while pggrow and mapgap leave it at 1.
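For illustration, an override of this kind in a thrasher file such as thrashers/pggrow.yaml might look roughly like the following teuthology fragment. This is a sketch: the value 6 is an assumed example, since this message only says the other thrashers use a value greater than 1.

```yaml
# Sketch of an `osd max backfills` override in a thrasher .yaml file.
# The value 6 is illustrative only; any value > 1 raises the default of 1.
overrides:
  ceph:
    conf:
      osd:
        osd max backfills: 6
```

If the file already has an `overrides: ceph: conf: osd:` section, the key would go alongside the existing OSD settings there rather than in a second `overrides` block.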

The mclock op scheduler is known to override `osd max backfills` with a high value, but all of the thrash-erasure-code-big thrashers set their op queue to “debug_random”, which chooses randomly between op queues (debug_random is set to override the default mclock_scheduler in qa/config/rados.yaml). Whenever debug_random picks a queue other than mclock, nothing raises `osd max backfills` above its default of 1, and that low setting is causing some tests to time out in recovery.
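The op queue override mentioned above is set in qa/config/rados.yaml. A sketch of the relevant part, assuming the standard teuthology `overrides` layout (the file itself may contain additional settings not shown here):

```yaml
# Sketch: replace the default mclock_scheduler with debug_random,
# which chooses an op queue at random instead of always using mclock.
overrides:
  ceph:
    conf:
      osd:
        osd op queue: debug_random
```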

WITHOUT `osd max backfills` (the current state), “mapgap” and “pggrow” tests die due to a recovery timeout about 17 times out of 100, as seen here in a pggrow run: http://pulpito.front.sepia.ceph.com/lflores-2022-05-18_14:24:29-rados:thrash-erasure-code-big-master-distro-default-smithi/

WITH `osd max backfills` specified, as I have suggested in this PR, 99/100 tests passed, with one test failing for a different reason:
http://pulpito.front.sepia.ceph.com/lflores-2022-05-17_22:40:27-rados:thrash-erasure-code-big-master-distro-default-smithi/

I also scheduled 145 tests WITH `osd max backfills`, using a mix of pggrow and mapgap thrashers. 144/145 tests passed, with one test failing for a different reason: http://pulpito.front.sepia.ceph.com/lflores-2022-05-17_15:27:54-rados:thrash-erasure-code-big-master-distro-default-smithi/

Fixes: https://tracker.ceph.com/issues/51076
Signed-off-by: Laura Flores <lflores@redhat.com>
2022-05-19 18:29:00 -05:00
basic Merge pull request #38120 from kiizawa/wip-cls-remote-read 2021-04-12 16:42:52 +08:00
cephadm qa/suites/rados: reduce the number of cephadm tests 2022-01-21 23:38:53 +00:00
dashboard Merge pull request #43987 from rhcs-dashboard/53123-dashboard-nfs-cleanup 2021-11-19 20:40:41 +01:00
mgr qa: fix or add missing .qa links 2022-02-03 10:08:30 -05:00
monthrash Merge pull request #38120 from kiizawa/wip-cls-remote-read 2021-04-12 16:42:52 +08:00
multimon qa,pybind/mgr: allow disabling .mgr pool 2021-06-11 19:35:17 -07:00
objectstore qa: Use osd_op_queue=wpq for tests using filestore backend. 2021-09-02 18:15:54 +05:30
perf qa: fix or add missing .qa links 2022-02-03 10:08:30 -05:00
rest qa: log-whitelist -> log-ignorelist 2020-08-24 19:53:08 +00:00
singleton qa: Added workunit test for noautoscale flag 2021-12-22 21:42:28 +00:00
singleton-bluestore qa/suites: move RADOS tests to use new debug log objectstores 2021-03-03 14:47:59 -05:00
singleton-nomsgr qa/suites/rados: add crushdiff test 2021-08-27 17:45:40 +03:00
standalone qa/suites/rados/standalone: remove mon_election symlink 2021-05-07 00:42:53 +00:00
thrash qa: fix or add missing .qa links 2022-02-03 10:08:30 -05:00
thrash-erasure-code qa,pybind/mgr: allow disabling .mgr pool 2021-06-11 19:35:17 -07:00
thrash-erasure-code-big qa/suites/rados/thrash-erasure-code-big/thrashers: add osd max backfills setting to mapgap and pggrow 2022-05-19 18:29:00 -05:00
thrash-erasure-code-isa Revert "qa: support isal ec test for aarch64" 2021-10-12 12:53:58 -06:00
thrash-erasure-code-overwrites test: add a mon_election directory to the rados and upgrade suites 2020-07-08 04:26:03 +00:00
thrash-erasure-code-shec Merge pull request #39757 from aclamk/wip-qa-test-bluestore-reshard 2021-03-17 22:41:34 +08:00
thrash-old-clients qa/suites/rados/thrash-old-clients: remove centos_8.3_container_tools_3.0 2022-02-02 23:26:54 +00:00
upgrade qa/tests: changed simlink to upgrade/parallel only 2021-04-23 08:20:01 -07:00
valgrind-leaks qa: fix or add missing .qa links 2022-02-03 10:08:30 -05:00
verify Merge pull request #38120 from kiizawa/wip-cls-remote-read 2021-04-12 16:42:52 +08:00
.qa
rook qa/suites/rados: include rook test in rados 2021-05-20 12:41:52 -05:00