The 'recovering' state is transitory. Existing code looks for it by
polling 'pg stat', missing from time to time.
New version searches the tails of the relevant OSDs' logs.
Fixes: https://tracker.ceph.com/issues/48719
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
While the relevant comment says:
'# Execute the command and prepend the output with its pid'
the actual PID logged is the same for all background processes,
which isn't very helpful.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Stop waiting for a scrub to happen if the Primary for the target
PG changes.
Fixes: https://tracker.ceph.com/issues/48720
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
in my test bed, it takes 11 seconds to boot the 3 OSDs and to restart
one of them, this fails the test.
so we need to take the time into consideration. in this change, the
delay is added to the total "warn_older_version_delay", so the monitor
does not start sending warning earlier than expected.
Signed-off-by: Kefu Chai <kchai@redhat.com>
in e5b1ae5554, a new option named
"debug_version_for_testing" is introduced to override the version so
we can test version check.
in crimson, we have two families of shared functions.
- one of them is used by alien store. they are compiled with
-DWITH_SEASTAR and -DWITH_ALIEN, to enable the shim code between
seastar and POSIX thread.
- another is used by crimson in general. where no lock is allowed.
currently, we use the "crimson" and "ceph" namespace to differentiate
these two families of functions, so they can colocate in the same
executable without violating the ODR. see src/include/common_fwd.h for
more details.
the functions defined in src/common/version.cc are also shared by
alien store and crimson code. and because we have different
implementations of `CephContext` in crimson and in classic OSD (i.e.
alienstore), we have to have different implementations of this function
as well, if we follow the same approach. but since these functions are
very simple and are non-blocking, there is not much value in
differentiating them, it is better to inject the test settings using
environment variable instead of using ceph option subsystem.
in this change, "ceph_debug_version_for_testing" environment variable is
checked instead, so that crimson and alienstore can share the same
compilation unit of version.cc. and "debug_version_for_testing" option
is removed.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Add test case for permitted hours to make sure scrub doesn't start
Remove permitted hours in extended sleep test
Fixes: https://tracker.ceph.com/issues/48077
Signed-off-by: David Zafman <dzafman@redhat.com>
While creating erasure-coded profile make sure
that user is specifying valid crush-failure-domain.
Fixes: https://tracker.ceph.com/issues/47452
Signed-off-by: Prashant Dhange <pdhange@redhat.com>
This overrides what the CephContext believes to be the current quorum of
monitors (retrieved from other instances of the MonClient), introduced
by [1]. Tests need to be able to target a specific monitor for
exercising forwarding and other things.
[1] 731e2db9fb
Fixes: https://tracker.ceph.com/issues/47180
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
The test should mark the OSD out to check if only "in" OSDs are considered by
the osdmap trimming logic.
Fixes: https://tracker.ceph.com/issues/47309
Signed-off-by: Neha Ojha <nojha@redhat.com>
we could pass `text=True` for better readability, but that's introduced
in python3.7, or pass `error="ignore"` but it's too long.
Signed-off-by: Kefu Chai <kchai@redhat.com>
no need to check for their existence, and prepare a replacement.
because we've migrated to python3. and we only support python3.6 and up.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Test that the osd doesn't crash when it gets a bad incremental osdmap.
Related-to: https://tracker.ceph.com/issues/46443
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
I have absolutely no idea why it's counting features, but
apparently it is and bumping the value to 7 makes it pass.
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
Include test case
Configurable by setting mon_osd_warn_num_repaired (default 10)
Ignore new health warning with random eio injection test
Fixes: https://tracker.ceph.com/issues/41564
Signed-off-by: David Zafman <dzafman@redhat.com>