Change TEST_recovery_scrub_2 to create more objects and use
osd_recovery_sleep to prevent recovery from finishing before
we start to scrub. Verify that at least 1 scrub was started
while the pg was recovering.
Fixes: https://tracker.ceph.com/issues/49779
Signed-off-by: David Zafman <dzafman@redhat.com>
This reverts commit 1323bdb839.
The test needs to scrub while recovery is in progress, so detecting
recovery from the logs after the fact isn't the proper setup.
We can use the osd_recovery_sleep config option instead.
Signed-off-by: David Zafman <dzafman@redhat.com>
Given an initial (set of) OSD(s), provide up to N OSDs that can be
stopped together without making PGs become unavailable.
This can be used to quickly identify larger batches of OSDs that can be
stopped together (for example, to upgrade them).
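A minimal sketch of the batching idea, under simplifying assumptions: each PG is modelled only by its acting set, a PG counts as available while at least `min_size` of its acting OSDs keep running, and candidate OSDs are tried greedily in ID order. The names `pgs_available` and `batch_ok_to_stop` are hypothetical; this illustrates the idea and is not the monitor's implementation.

```python
from typing import Dict, List, Set


def pgs_available(pgs: Dict[str, List[int]], min_size: int, stopped: Set[int]) -> bool:
    """True if every PG keeps at least min_size acting OSDs running."""
    return all(
        sum(1 for osd in acting if osd not in stopped) >= min_size
        for acting in pgs.values()
    )


def batch_ok_to_stop(pgs: Dict[str, List[int]], min_size: int,
                     initial: List[int], max_osds: int) -> List[int]:
    """Greedily grow `initial` into a batch of at most max_osds OSDs.

    Returns [] if stopping the initial set alone would make a PG unavailable.
    The search is greedy in ID order, so the batch is not guaranteed to be
    the largest possible one.
    """
    stopped = set(initial)
    if not pgs_available(pgs, min_size, stopped):
        return []
    candidates = sorted({osd for acting in pgs.values() for osd in acting})
    for osd in candidates:
        if len(stopped) >= max_osds:
            break
        # Only add an OSD if every PG would still have min_size OSDs running.
        if osd not in stopped and pgs_available(pgs, min_size, stopped | {osd}):
            stopped.add(osd)
    return sorted(stopped)
```

For example, with two PGs on the disjoint OSD sets {0, 1, 2} and {3, 4, 5}, min_size 2, a starting set of [0] and a cap of 3, the result is [0, 3]: adding 1 or 2 would leave the first PG with one running OSD, and adding 4 or 5 after 3 would do the same to the second.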
Signed-off-by: Sage Weil <sage@newdream.net>
in beb62c029a, FEATURE_QUINCY was added to
ceph::features::mon::get_persistent(), so update the test accordingly.
Signed-off-by: Kefu Chai <kchai@redhat.com>
The 'recovering' state is transitory. The existing code looks for it by
polling 'pg stat', and occasionally misses it.
The new version searches the tails of the relevant OSDs' logs.
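A rough sketch of a log-tail search, assuming plain-text OSD logs in which a relevant line mentions both the PG ID and the state name. The real log format and the helper names `tail_matches` and `saw_state_in_logs` are assumptions made for illustration.

```python
from collections import deque
from typing import Iterable


def tail_matches(lines: Iterable[str], pgid: str, state: str, n: int) -> bool:
    """True if one of the last n lines mentions both pgid and state.

    Uses plain substring checks; a real test would match the actual log
    format more precisely (e.g. "1.0" is also a substring of "11.0").
    """
    tail = deque(lines, maxlen=n)  # keeps only the last n lines in memory
    return any(pgid in line and state in line for line in tail)


def saw_state_in_logs(paths: Iterable[str], pgid: str,
                      state: str = "recovering", n: int = 1000) -> bool:
    """Search the tail of each OSD log file for the transient state."""
    for path in paths:
        with open(path, errors="replace") as f:
            if tail_matches(f, pgid, state, n):
                return True
    return False
```

Unlike polling, this can still find the state after it has passed, as long as the line is within the last `n` lines of a log.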
Fixes: https://tracker.ceph.com/issues/48719
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
While the relevant comment says:
'# Execute the command and prepend the output with its pid'
the actual PID logged is the same for all background processes,
which isn't very helpful.
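A minimal illustration of the intended behaviour: several commands run in the background at once, and each output line is prefixed with the PID of the child that produced it (`Popen.pid`), not the PID of the launching process. The helper name `run_background` is hypothetical, and this Python sketch is not the change that was made to the test helpers.

```python
import subprocess
import sys
from typing import List


def run_background(cmds: List[List[str]]) -> List[str]:
    """Start every command, then tag each output line with its child's PID."""
    # Launch all children before waiting on any, so they run concurrently.
    procs = [subprocess.Popen(cmd, stdout=subprocess.PIPE) for cmd in cmds]
    lines = []
    for proc in procs:
        out, _ = proc.communicate()
        lines.extend(f"{proc.pid}: {line}" for line in out.decode().splitlines())
    return lines


if __name__ == "__main__":
    for line in run_background([[sys.executable, "-c", "print('a')"],
                                [sys.executable, "-c", "print('b')"]]):
        print(line)
```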
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Stop waiting for a scrub to happen if the Primary for the target
PG changes.
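A minimal sketch of such a wait loop, assuming two hypothetical callables: `get_primary` returns the current primary OSD of the target PG and `scrub_done` reports whether the awaited scrub has happened. How a real test obtains these values is not shown here.

```python
import time
from typing import Callable


def wait_for_scrub(get_primary: Callable[[], int],
                   scrub_done: Callable[[], bool],
                   timeout: float = 300.0,
                   interval: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the scrub completes, the timeout expires, or the primary moves.

    Returns "done", "timeout" or "primary_changed". Once the primary has
    changed, the scrub being waited for may never happen, so keep no waiting.
    """
    primary = get_primary()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if scrub_done():
            return "done"
        if get_primary() != primary:
            return "primary_changed"
        sleep(interval)
    return "timeout"
```

Returning a distinct status lets the caller tell "the scrub never ran" apart from "the premise of the wait no longer holds".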
Fixes: https://tracker.ceph.com/issues/48720
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
in my test bed, it takes 11 seconds to boot the 3 OSDs and restart
one of them, which makes the test fail.
so we need to take this time into account. in this change, the
delay is added to the total "warn_older_version_delay", so the monitor
does not start sending warnings earlier than expected.
Signed-off-by: Kefu Chai <kchai@redhat.com>
in e5b1ae5554, a new option named
"debug_version_for_testing" was introduced to override the version so
we can test the version check.
in crimson, we have two families of shared functions.
- one of them is used by alien store. they are compiled with
-DWITH_SEASTAR and -DWITH_ALIEN, to enable the shim code between
seastar and POSIX threads.
- the other is used by crimson in general, where no locking is allowed.
currently, we use the "crimson" and "ceph" namespaces to differentiate
these two families of functions, so they can colocate in the same
executable without violating the ODR. see src/include/common_fwd.h for
more details.
the functions defined in src/common/version.cc are also shared by
alien store and crimson code. because we have different
implementations of `CephContext` in crimson and in the classic OSD (i.e.
alienstore), we would need different implementations of these functions
as well if we followed the same approach. but since these functions are
very simple and non-blocking, there is not much value in
differentiating them; it is better to inject the test settings using an
environment variable instead of the ceph option subsystem.
in this change, the "ceph_debug_version_for_testing" environment variable
is checked instead, so that crimson and alienstore can share the same
compilation unit of version.cc, and the "debug_version_for_testing"
option is removed.
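The environment-variable approach can be mimicked in a few lines of Python. Only the variable name comes from the commit message; the function `effective_version` is hypothetical, and the actual change is in C++ in src/common/version.cc.

```python
import os
from typing import Mapping

# Name taken from the commit message above.
VERSION_OVERRIDE_ENV = "ceph_debug_version_for_testing"


def effective_version(built_in: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the built-in version unless the test override variable is set.

    The override comes from the process environment rather than from a
    per-context config option, so no context object is needed to read it.
    An empty value is treated as "not set".
    """
    override = env.get(VERSION_OVERRIDE_ENV)
    return override if override else built_in
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.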
Signed-off-by: Kefu Chai <kchai@redhat.com>
Add a test case for permitted hours to make sure scrub doesn't start
outside them.
Remove permitted hours from the extended sleep test.
Fixes: https://tracker.ceph.com/issues/48077
Signed-off-by: David Zafman <dzafman@redhat.com>
When creating an erasure-coded profile, make sure
that the user specifies a valid crush-failure-domain.
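A hedged sketch of such a check: the profile arrives as `key=value` strings, and the set of valid CRUSH type names is passed in (in a real cluster it would come from the CRUSH map). `parse_profile` and `failure_domain_error` are hypothetical names, not Ceph functions.

```python
from typing import Dict, Iterable, Optional, Set


def parse_profile(args: Iterable[str]) -> Dict[str, str]:
    """Turn ["k=2", "m=1", "crush-failure-domain=host"] into a dict."""
    profile = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {arg!r}")
        profile[key] = value
    return profile


def failure_domain_error(profile: Dict[str, str], crush_types: Set[str]) -> Optional[str]:
    """Return an error message if crush-failure-domain is not a known CRUSH type."""
    domain = profile.get("crush-failure-domain")
    if domain is None:
        return None  # not specified: left to the default
    if domain not in crush_types:
        return (f"crush-failure-domain={domain!r} is not a CRUSH type; "
                f"expected one of {sorted(crush_types)}")
    return None
```

Rejecting the profile up front means a typo such as `hots` is reported when the profile is created instead of surfacing later.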
Fixes: https://tracker.ceph.com/issues/47452
Signed-off-by: Prashant Dhange <pdhange@redhat.com>
This overrides what the CephContext believes to be the current quorum of
monitors (retrieved from other instances of the MonClient), introduced
by [1]. Tests need to be able to target a specific monitor for
exercising forwarding and other things.
[1] 731e2db9fb
Fixes: https://tracker.ceph.com/issues/47180
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
The test should mark the OSD out to check that only "in" OSDs are considered by
the osdmap trimming logic.
Fixes: https://tracker.ceph.com/issues/47309
Signed-off-by: Neha Ojha <nojha@redhat.com>
we could pass `text=True` for better readability, but that was introduced
in python3.7, or pass `errors="ignore"`, but that's too long.
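A minimal Python 3.6-compatible sketch of capturing a command's output as `str`. The helper name `capture_stdout` is hypothetical. `subprocess.run` accepts `text=True` only from Python 3.7; `universal_newlines=True` and the `encoding`/`errors` keywords are available from 3.6.

```python
import subprocess
import sys
from typing import List


def capture_stdout(cmd: List[str]) -> str:
    """Run cmd and return its stdout as str, working on Python 3.6+.

    Decoding the captured bytes explicitly avoids text=True (3.7+).
    decode() is strict UTF-8 here; decode(errors="ignore") would drop
    undecodable bytes instead of raising.
    """
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    return proc.stdout.decode()


if __name__ == "__main__":
    print(capture_stdout([sys.executable, "-c", "print('hi')"]), end="")
```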
Signed-off-by: Kefu Chai <kchai@redhat.com>
no need to check for their existence and prepare a replacement,
because we've migrated to python3, and we only support python3.6 and up.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Test that the osd doesn't crash when it gets a bad incremental osdmap.
Related-to: https://tracker.ceph.com/issues/46443
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>