With the mclock scheduler enabled, recovery throughput is throttled based
on factors such as the mclock profile in use and the OSD capacity, among
others. As a result, recovery times can vary, and the existing timeout of
120 secs may not be sufficient.
To address this, a new method called _is_inprogress_or_complete() is
introduced in the TestProgress class. It checks whether the event with the
specified 'id' is in progress by examining the 'progress' key of the
progress command response. The method also handles the corner case where
the event completes just before it is called.
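A minimal sketch of such a check follows. It is not the TestProgress code.
It assumes the progress command returns JSON with an "events" list of
in-flight events (each with "id" and a fractional "progress" key) and a
"completed" list of finished events, and it takes that JSON text as an
argument instead of running the command. The function name
is_inprogress_or_complete is hypothetical.

```python
import json


def is_inprogress_or_complete(progress_json, ev_id):
    """Return True if event ev_id is making progress or has completed.

    Assumes progress_json has the shape:
        {"events": [{"id": ..., "progress": 0.0..1.0}, ...],
         "completed": [{"id": ...}, ...]}
    """
    data = json.loads(progress_json)
    for ev in data.get("events", []):
        if ev["id"] == ev_id:
            # Still listed as in flight: count it as in progress only if
            # some recovery progress has been reported.
            return ev["progress"] > 0
    # Corner case: the event may have finished just before this call,
    # in which case it has moved to the "completed" list.
    return any(ev["id"] == ev_id for ev in data.get("completed", []))
```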
The existing wait_until_true() method in the CephTestCase class is
modified to accept an additional function argument called "check_fn". The
"test_turn_off_module" test, which has been observed to fail for the
reasons described above, sets check_fn to _is_inprogress_or_complete().
Once the first timeout is hit, a retry mechanism allowing a maximum of 5
attempts takes effect. This means the wait can extend to a maximum of
600 secs (120 secs * 5) as long as the 'ceph progress' command reports
recovery progress.
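The timeout-and-retry logic can be sketched as below. This is a simplified
stand-in, not the CephTestCase implementation: TestTimeoutError, the
max_attempts parameter and the injectable sleep function are assumptions
made for illustration. Each timeout window lasts `timeout` secs, and a new
window starts only if check_fn() reports progress.

```python
import logging
import time

log = logging.getLogger(__name__)


class TestTimeoutError(RuntimeError):
    """Stand-in timeout exception; still a RuntimeError."""


def wait_until_true(condition, timeout, check_fn=None, period=5,
                    max_attempts=5, sleep=time.sleep):
    """Poll condition() every `period` secs, allowing `timeout` secs per window.

    When a window expires and check_fn() reports progress, start another
    window, up to `max_attempts` windows in total. Returns the number of
    windows used, or raises TestTimeoutError.
    """
    log.debug("waiting up to %ss per attempt, %d attempts max",
              timeout, max_attempts)
    elapsed = 0
    attempt = 1
    while True:
        if condition():
            return attempt
        if elapsed >= timeout:
            if check_fn is not None and attempt < max_attempts and check_fn():
                # Still making progress: reset the clock for another window.
                attempt += 1
                elapsed = 0
            else:
                raise TestTimeoutError(
                    "Timed out after {0} attempt(s) of {1}s".format(
                        attempt, timeout))
        sleep(period)
        elapsed += period
```

With timeout=120 and max_attempts=5, the total sleep time is bounded by
120 * 5 = 600 secs.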
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
* refs/pull/38443/head:
qa: set "shell" to False for run_ceph_w()
vstart_runner: make "shell" a default argument
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Setting shell to True in the call to run() in LocalCephManager.run_ceph_w()
leads to a crash when self.subproc.communicate() is executed for the
process created by running "ceph -w".
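As a rough illustration of making "shell" a default argument, here is a
hypothetical run() wrapper built on the standard subprocess module. It is
not vstart_runner's run(); it only shows shell defaulting to False, so an
argv list is executed directly and the caller reads output with
communicate().

```python
import shlex
import subprocess
import sys


def run(args, shell=False):
    """Hypothetical wrapper: start a process, with shell defaulting to False."""
    if shell:
        # A shell expects a single command string; quote each argument.
        args = shlex.join(args)
    return subprocess.Popen(args, shell=shell,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)


# The default (shell=False) runs the argv list directly, without /bin/sh.
proc = run([sys.executable, "-c", "print('hello')"])
out, err = proc.communicate()
```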
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Also filter out client IDs starting with "mirror" when cleaning up
leftover auth IDs, since teuthology is configured to create the
client.mirror and client.mirror_remote clients before executing
mirroring tests.
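A minimal sketch of the filtering rule, assuming the auth entities are
available as "type.id" strings. The function name auth_ids_to_delete, the
mount_ids argument and the extra exclusion of client.admin are assumptions
for illustration; only the "mirror" prefix rule comes from the text above.

```python
def auth_ids_to_delete(entities, mount_ids):
    """Return client entities that look like leftovers from earlier tests.

    Keeps (does not delete): non-client entities, client IDs used by the
    test's own mounts, client.admin (assumed), and any client ID starting
    with "mirror", e.g. client.mirror and client.mirror_remote.
    """
    leftovers = []
    for entity in entities:
        ent_type, _, ent_id = entity.partition(".")
        if ent_type != "client":
            continue
        if ent_id in mount_ids or ent_id == "admin" or ent_id.startswith("mirror"):
            continue
        leftovers.append(entity)
    return leftovers
```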
Signed-off-by: Venky Shankar <vshankar@redhat.com>
This makes it easier to catch. It is still a RuntimeError, so it should
not affect current tests by default.
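Why a RuntimeError subclass keeps existing handlers working can be shown in
a few lines. This sketch assumes the exception in question is a timeout
raised by a test wait helper; the names TestTimeoutError and wait_or_fail
are hypothetical.

```python
class TestTimeoutError(RuntimeError):
    """Hypothetical dedicated timeout exception; still a RuntimeError."""


def wait_or_fail(ready):
    if not ready:
        raise TestTimeoutError("timed out")


# New code can catch the timeout specifically...
try:
    wait_or_fail(False)
except TestTimeoutError as e:
    caught_specific = str(e)

# ...while existing "except RuntimeError" handlers still catch it.
try:
    wait_or_fail(False)
except RuntimeError as e:
    caught_generic = type(e).__name__
```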
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This provides a generic framework for making Ceph configuration changes
in tests through the monitors rather than through the admin socket (asok)
interface or local ceph.conf changes. Any changes are reverted during test
teardown.
A future patch will convert existing tests that manipulate the local
ceph.conf or the admin socket.
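The record-and-revert idea can be sketched with a plain dict standing in
for the monitors' configuration database. ConfigChangeTracker, its
config_set() method and the option names used in the example are
hypothetical; a real test would issue the equivalent monitor commands
instead of writing to a dict.

```python
class ConfigChangeTracker:
    """Hypothetical sketch: apply config changes, then revert all at teardown."""

    def __init__(self, store):
        self._store = store  # stands in for the monitors' config database
        self._saved = []     # (section, key, previous value or None)

    def config_set(self, section, key, value):
        # Remember the prior value before overwriting it.
        self._saved.append((section, key, self._store.get((section, key))))
        self._store[(section, key)] = value

    def teardown(self):
        # Undo in reverse order so repeated changes to one key restore
        # the original value.
        while self._saved:
            section, key, old = self._saved.pop()
            if old is None:
                self._store.pop((section, key), None)
            else:
                self._store[(section, key)] = old
```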
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Some tests set this to a dynamic value, so it's helpful to know how long
a test is planning to wait.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Move it up into CephTestCase so that mgr tests can
use it too, and pick it up in vstart_runner.py so
that these tests will work neatly there.
Signed-off-by: John Spray <john.spray@redhat.com>
With this change, we avoid disabling and re-enabling the ceph-mgr module
under test for every test function declared in each test case. Now the
module under test is disabled/enabled only once per test case.
Signed-off-by: Ricardo Dias <rdias@suse.com>
Add support for testing recovery of CephFS metadata into an alternate
RADOS pool, useful as a disaster recovery mechanism that avoids
modifying the metadata in-place.
Signed-off-by: Douglas Fuller <dfuller@redhat.com>