18 Commits

Author SHA1 Message Date
Kamoltat
f06da20dff pybind/mgr/progress: disable pg recovery event by default
The progress module now disables the pg recovery event by default,
since the event is expensive and has interrupted other services
when OSDs are marked in/out of the cluster.

To turn the event on manually:

ceph config set mgr mgr/progress/allow_pg_recovery_event true

Updated qa/tasks/mgr/test_progress.py to enable
the pg recovery event when testing the progress module.
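
As a rough illustration of how a test could switch the event back on, the sketch below only builds the argument list for the command quoted above. The helper name `allow_pg_recovery_event_cmd` is hypothetical and is not taken from qa/tasks/mgr/test_progress.py.

```python
# Option name as quoted in the commit message above.
PG_RECOVERY_OPTION = "mgr/progress/allow_pg_recovery_event"


def allow_pg_recovery_event_cmd(enabled=True):
    """Build the argv for toggling the pg recovery event (hypothetical helper)."""
    value = "true" if enabled else "false"
    return ["ceph", "config", "set", "mgr", PG_RECOVERY_OPTION, value]


# The list can be handed to subprocess.run() on a host with a working ceph CLI.
enable_cmd = allow_pg_recovery_event_cmd()
```

Building the argument list separately from running it keeps the sketch usable without a cluster.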

Signed-off-by: Kamoltat <ksirivad@redhat.com>
2022-02-03 17:51:42 +00:00
Kamoltat
5f33f2f6e0 mgr/test_progress.py: Delay recover in test_progress
Changes some of the tests in teuthology to make
them more deterministic.
The tests now use

`ceph osd set norecover` and
`ceph osd set nobackfill` when marking osds in
or out. This delays the recovery and makes
sure the test cases get the chance to check
that events are actually popping up in
the progress module.
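
A minimal sketch of the idea, assuming a caller-supplied `run_ceph` function that executes a ceph subcommand: the flags are set before the OSD is marked out and cleared afterwards. The names `recovery_delayed` and `fake_run_ceph` are hypothetical, and the fake runner only records commands instead of talking to a cluster.

```python
from contextlib import contextmanager

RECOVERY_FLAGS = ("norecover", "nobackfill")


@contextmanager
def recovery_delayed(run_ceph):
    """Hold back recovery and backfill for the duration of the with-block."""
    for flag in RECOVERY_FLAGS:
        run_ceph("osd", "set", flag)
    try:
        yield
    finally:
        # Always clear the flags, even if the body raised.
        for flag in RECOVERY_FLAGS:
            run_ceph("osd", "unset", flag)


# Stand-in runner that records the commands it is asked to run.
issued = []


def fake_run_ceph(*args):
    issued.append(" ".join(("ceph",) + args))


with recovery_delayed(fake_run_ceph):
    fake_run_ceph("osd", "out", "0")  # mark an OSD out while recovery is held back
```

While the flags are set, a test has time to check that an event exists before recovery completes.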

Took out test_osd_cannot_recover from
tasks/mgr/test_progress.py since it is no longer
a relevant test case: recovery will get
triggered regardless of whether the pg is unmoved.

Ignoring `OSDMAP_FLAGS` in teuthology
because we are using norecover and nobackfill
to delay the recovery process; otherwise the
flags would create a health warning and fail the
teuthology test.

Signed-off-by: Kamoltat <ksirivad@redhat.com>
2021-07-13 19:33:20 +00:00
Kamoltat
4b00f1c2bd pybind/mgr/progress: Disregard unreported pgs
The global recovery event progress calculation only
takes into account pgs with `reported_epoch < start_epoch_of_event`,
but sometimes the pgs don't get moved before or after the creation
of the global recovery event. This might result in a bug
where the global event gets stuck forever unless there is another
event that specifically makes the stuck pgs move and update
their `reported_epoch`.

Therefore, we decided to disregard pgs that are in active+clean state
but have `reported_epoch < start_epoch_of_event`.
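
The rule can be illustrated with a small sketch. It is not the module's code: the function name, the shape of the pg dictionaries, and the way the remaining pgs are counted are all assumptions made for illustration. Only the skip rule itself (active+clean with an old `reported_epoch`) comes from the message above.

```python
def global_recovery_progress(pgs, start_epoch):
    """Fraction of counted pgs that are active+clean (illustrative only)."""
    counted = 0
    recovered = 0
    for pg in pgs:
        states = pg["state"].split("+")
        is_clean = "active" in states and "clean" in states
        is_unreported = pg["reported_epoch"] < start_epoch
        if is_clean and is_unreported:
            # Disregard: this pg never moved, so it must not hold the event open.
            continue
        counted += 1
        if is_clean:
            recovered += 1
    # Assumption: with nothing left to count, the event is treated as complete.
    return 1.0 if counted == 0 else recovered / counted


example_pgs = [
    {"pgid": "1.0", "state": "active+clean", "reported_epoch": 5},
    {"pgid": "1.1", "state": "active+clean", "reported_epoch": 12},
    {"pgid": "1.2", "state": "active+recovering", "reported_epoch": 12},
]
example_progress = global_recovery_progress(example_pgs, start_epoch=10)
```

In this example pg 1.0 is skipped, so progress is computed over the two pgs that reported after the event started.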

Fixes: https://tracker.ceph.com/issues/49988

Signed-off-by: Kamoltat <ksirivad@redhat.com>
2021-06-09 15:11:32 +00:00
Sridhar Seshasayee
328271d587 qa/tasks: Enhance wait_until_true() to check & retry recovery progress
With the mclock scheduler enabled, the recovery throughput is throttled based
on factors like the type of mclock profile enabled and the OSD capacity, among
others. Because of this, recovery times may vary, and the existing
timeout of 120 secs may not be sufficient.

To address the above, a new method called _is_inprogress_or_complete() is
introduced in the TestProgress Class that checks if the event with the
specified 'id' is in progress by checking the 'progress' key of the
progress command response. This method also handles the corner case where
the event completes just before it's called.
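
A hedged sketch of such a check, written as a plain function over an already parsed progress response. The response layout (an `events` list and a `completed` list, each entry carrying `id` and `progress`) and the rule that a nonzero `progress` value counts as "in progress" are assumptions, not details confirmed by this commit message.

```python
def is_inprogress_or_complete(progress_response, ev_id):
    """Return True if the event is making progress or has already completed."""
    for ev in progress_response.get("events", []):
        if ev["id"] == ev_id:
            # Assumption: any nonzero value means recovery has advanced.
            return ev["progress"] > 0
    # Corner case: the event finished just before this check was made.
    return any(ev["id"] == ev_id for ev in progress_response.get("completed", []))


sample_response = {
    "events": [{"id": "ev-running", "progress": 0.25},
               {"id": "ev-stalled", "progress": 0.0}],
    "completed": [{"id": "ev-done", "progress": 1.0}],
}
```

With this sample, `ev-running` and `ev-done` pass the check, while `ev-stalled` and an unknown id do not.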

The existing wait_until_true() method in the CephTestCase Class is
modified to accept another function argument called "check_fn". This is
set to the _is_inprogress_or_complete() function described earlier in the
"test_turn_off_module" test that has been observed to fail due to the
reasons already described above. A retry mechanism of a maximum of 5
attempts is introduced after the first timeout is hit. This means that
the wait can extend up to a maximum of 600 secs (120 secs * 5) as long as
there is recovery progress reported by the 'ceph progress' command result.
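
The retry behaviour can be sketched as follows. This is not the CephTestCase implementation: the `WaitTimeout` exception, the `max_windows` and `sleep` parameters, and the return value are hypothetical. The sketch also assumes that the 600-second bound means five 120-second windows in total; the real helper may count attempts differently.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition is not met in time (hypothetical name)."""


def wait_until_true(condition, timeout, check_fn=None, period=5,
                    max_windows=5, sleep=time.sleep):
    """Poll condition(); on timeout, start a new window only if check_fn() is true."""
    elapsed = 0   # seconds spent in the current timeout window
    windows = 1   # timeout windows started so far
    while not condition():
        if elapsed >= timeout:
            making_progress = check_fn is not None and check_fn()
            if not making_progress or windows >= max_windows:
                raise WaitTimeout(
                    "gave up after {} window(s) of {}s".format(windows, timeout))
            windows += 1
            elapsed = 0
        sleep(period)
        elapsed += period
    return windows


# Demo with a fake sleep that records durations instead of blocking.
slept = []
windows_used = wait_until_true(lambda: sum(slept) >= 300, timeout=120,
                               check_fn=lambda: True, sleep=slept.append)
```

In the demo the condition becomes true after 300 simulated seconds, so the wait succeeds in its third window instead of failing at 120 seconds.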

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
2021-06-02 14:19:48 +05:30
Neha Ojha
1523bf9bdb Merge pull request #38107 from ceph/wip-mgr-progress-fix-48217
qa/mgr/test_progress: add _get_osd_in_out_events to account for osd marked in/out events

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-11-18 07:15:22 -08:00
Kamoltat
52fe9dbdae qa/mgr/test_progress: fix bug 48217
Fixes a failing test case regarding an osd coming back
after being marked out. The old test case did not account
for a specific event, which resulted in the failure.
The fix accounts for the specific event of an osd being
marked in/out.

Fixes: https://tracker.ceph.com/issues/48217

Signed-off-by: Kamoltat <ksirivad@redhat.com>
2020-11-17 07:54:35 +00:00
Kamoltat
993bb02b30 mgr/progress: introduce turn off/on feature
The progress module can be turned off/on by using
the commands 'progress off' and 'progress on'.

Also refactors the teuthology test suite
to prevent possible future bugs.

fixes: https://tracker.ceph.com/issues/47238

Signed-off-by: kamoltat <ksirivad@redhat.com>
2020-11-16 03:46:42 +00:00
Kamoltat
2af2afa5e9 mgr/progress: Global Recovery Event in ceph -s
Modified the progress module and BaseMgrModule to
support the Global Recovery Event, adding more arguments
to update_progress_event and ceph_update_progress_event
so that only the global recovery event progress is shown with `ceph -s`.
All sub events have been moved to `ceph progress`.

Signed-off-by: Kamoltat <ksirivad@redhat.com>
2020-10-22 16:44:50 +00:00
Kefu Chai
7d37226548 qa/tasks/mgr: use relative import
for better readability, and to spare developers from tracking back
to the top-level python package when referencing a submodule

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-03-27 14:51:24 +08:00
Kefu Chai
947a74349d qa: import with full path
to be py3 compatible

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-03-24 18:27:55 +08:00
Kefu Chai
7d262db114 qa/tasks: call super class's setUp()
to address the regression introduced by
8729281121

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-02-15 12:39:08 +08:00
Ricardo Dias
b03537949a qa/mgr/progress: fix timeout error when waiting for osd in event
Fixes: https://tracker.ceph.com/issues/40618

Signed-off-by: Ricardo Dias <rdias@suse.com>
2019-09-03 11:44:05 +01:00
Kamoltat (Junior) Sirivadhna
baa714117c qa/tasks/mgr/test_progress.py: fix bug in 9b4dbf0
follow-up-fix for 9b4dbf0

Basically, we want to change the list we look at from in-progress events only to in-progress plus completed events.

Fixes: http://pulpito.ceph.com/kchai-2019-07-28_14:30:09-rados-wip-kefu2-testing-2019-07-28-1941-distro-basic-mira/4160881/

Signed-off-by: Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>
2019-08-05 11:03:33 -04:00
Kefu Chai
9b4dbf0749 qa/tasks/mgr/test_progress.py: s/ev/new_event/
as a follow-up fix for 5604ba4e

Fixes: http://tracker.ceph.com/issues/40618
Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-07-28 19:32:24 +08:00
Kamoltat (Junior) Sirivadhna
5604ba4ec1 qa/mgr/progress: Update the test suite for progress module
Update the test suite to reflect a feature
change that has been merged to master in
the progress module, where an event is created
when an osd is marked in.

Fixes: http://tracker.ceph.com/issues/40618
with a successful QA run in sepia:
http://pulpito.ceph.com/kchai-2019-07-16_10:10:01-rados-master-distro-basic-mira/

Signed-off-by: Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>
2019-07-22 11:09:42 -04:00
Patrick Donnelly
1071f73c76 qa: use skipTest method instead of exception
This is the recommended method to skip a test according to [1]. It also lets us
avoid an unnecessary import.

[1] https://docs.python.org/2/library/unittest.html#unittest.TestCase.skipTest
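
For illustration, a minimal self-contained test that skips itself with `skipTest()`. The class name, the OSD counts and the skip message are made up for the example. The older style presumably raised a skip exception that had to be imported first, whereas `skipTest()` is available on every `TestCase` instance.

```python
import unittest


class TestProgressExample(unittest.TestCase):
    MIN_OSDS = 3
    available_osds = 1  # hypothetical value standing in for a cluster query

    def test_needs_several_osds(self):
        if self.available_osds < self.MIN_OSDS:
            # No extra import needed: skipTest() is a TestCase method.
            self.skipTest("Not enough OSDs for this test")
        self.assertGreaterEqual(self.available_osds, self.MIN_OSDS)


# Run the case programmatically and collect the outcome.
skip_result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(TestProgressExample).run(skip_result)
```

The test is reported as skipped rather than failed, and the reason string is kept in the result.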

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2019-04-24 09:38:52 -07:00
Sage Weil
1d305f1264 mgr/progress: revise message syntax a bit
"osd.0", not "OSD 0"

Signed-off-by: Sage Weil <sage@redhat.com>
2019-02-08 13:50:27 -06:00
John Spray
5ecd69099d qa: add tests for progress module
Signed-off-by: John Spray <john.spray@redhat.com>
2018-09-11 11:21:35 +01:00