Commit Graph

559 Commits

Author SHA1 Message Date
David Zafman
eec821b6e5 test: osd-recovery-scrub.sh: Test fails if no scrubs happened for a recovering pg
Change TEST_recovery_scrub_2 to create more objects and use
osd_recovery_sleep to prevent recovery from finishing before
we start to scrub.  Verify that at least 1 scrub was started
while the pg was recovering.

Fixes: https://tracker.ceph.com/issues/49779

Signed-off-by: David Zafman <dzafman@redhat.com>
2021-03-14 16:19:46 -07:00
David Zafman
a4fd1d650e Revert "qa/standalone/scrub/osd-recovery-scrub: fix unnoticed recovery state"
This reverts commit 1323bdb839.

The test needs to scrub while recovery is in progress, so catching
recovery from the logs after the fact isn't the proper setup.
We can use osd_recovery_sleep config.

Signed-off-by: David Zafman <dzafman@redhat.com>
2021-03-13 11:40:55 -08:00
David Zafman
dd63577ab3 test: Add test for scrub parallelism
Signed-off-by: David Zafman <dzafman@redhat.com>
2021-03-05 11:41:26 -08:00
Sage Weil
5e197a21e6 Merge PR #39455 into master
* refs/pull/39455/head:
	doc/man/8/ceph: document --max option
	src/test/osd/safe-to-destroy: adjust test
	ceph: print command output to stdout even on error
	mgr/DaemonServer: include details in 'osd ok-to-stop' output
	mgr: add --max <n> to 'osd ok-to-stop' command
	mgr: relax osd ok-to-stop condition on degraded pgs

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2021-02-27 10:15:27 -05:00
Sage Weil
33dee7d7bf crush/CrushWrapper: update shadow trees on update_item()
insert_item() already does this, but update_item did not.

Fixes: https://tracker.ceph.com/issues/48065
Signed-off-by: Sage Weil <sage@newdream.net>
2021-02-22 14:21:04 -06:00
Sage Weil
722f57dee1 mgr: add --max <n> to 'osd ok-to-stop' command
Given an initial (set of) osd(s), provide up to N OSDs that can be
stopped together without making PGs become unavailable.

This can be used to quickly identify large(r) batches of OSDs that can be
stopped together to (for example) upgrade.

Signed-off-by: Sage Weil <sage@newdream.net>
2021-02-20 09:53:51 -05:00
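Illustration for 722f57dee1 (not taken from the commit): a minimal sketch of how an initial set of OSDs could be grown into a batch of at most N OSDs that is still safe to stop. The PG model (a tuple of OSD ids plus a `min_size`), the greedy strategy, and the names `ok_to_stop` and `expand_ok_to_stop` are all assumptions made for this example; the mgr's real logic may differ.

```python
def ok_to_stop(pgs, stopped):
    """Return True if every PG keeps at least min_size OSDs running.

    pgs is a list of (osd_ids, min_size) pairs (hypothetical model).
    """
    return all(len(set(osds) - stopped) >= min_size for osds, min_size in pgs)


def expand_ok_to_stop(pgs, initial, candidates, max_osds):
    """Greedily grow `initial` to at most `max_osds` OSDs that can stop together."""
    stopped = set(initial)
    if not ok_to_stop(pgs, stopped):
        return set()  # even the initial set is unsafe
    for osd in candidates:
        if len(stopped) >= max_osds:
            break
        if osd in stopped:
            continue
        # Keep the candidate only if no PG would drop below min_size.
        if ok_to_stop(pgs, stopped | {osd}):
            stopped.add(osd)
    return stopped


# Three PGs with three replicas each; every PG needs 2 running OSDs.
pgs = [((0, 1, 2), 2), ((1, 2, 3), 2), ((3, 4, 5), 2)]
batch = expand_ok_to_stop(pgs, initial=[0], candidates=range(6), max_osds=3)
print(sorted(batch))
```

In this toy layout only OSD 3 can join OSD 0, because every other candidate shares a PG with an OSD that is already in the batch.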
Kefu Chai
8dc097ff46 qa/standalone/mon/misc: verify that len(monmap.features.persistent) == 9
in beb62c029a, FEATURE_QUINCY was added to
ceph::features::mon::get_persistent(), so update the test accordingly.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-01-30 22:45:20 +08:00
Sage Weil
7bbc92eda3 mon: updates for quincy
Signed-off-by: Sage Weil <sage@newdream.net>
2021-01-28 13:29:28 -06:00
Neha Ojha
5c11f40c12
Merge pull request #38856 from dzafman/wip-48789
test: Fix osd-scrub-snaps.sh to handle DB format change

Reviewed-by: Ronen Friedman <rfriedma@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2021-01-15 16:27:59 -08:00
Neha Ojha
6fc9166af4
Merge pull request #38726 from ronen-fr/wip-ronenf-48720
qa/standalone/scrub/osd-recovery-scrub: handle primary change when waiting for scrub

Reviewed-by: David Zafman <dzafman@redhat.com>
2021-01-15 13:46:30 -08:00
David Zafman
af9befb0f4 test: Fix osd-scrub-snaps.sh to handle DB format change
Caused by: f9c95fa7fc

Fixes: https://tracker.ceph.com/issues/48789

Signed-off-by: David Zafman <dzafman@redhat.com>
2021-01-15 10:35:30 -08:00
David Zafman
4814648155 test: osd-recovery-prio.sh replace sleep with wait for both PGs
recovering

fixes: https://tracker.ceph.com/issues/48842

Signed-off-by: David Zafman <dzafman@redhat.com>
2021-01-11 17:30:00 -08:00
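Illustration for 4814648155 (not taken from the commit): the general idea of replacing a fixed sleep with a bounded wait on a condition can be sketched as below. The helper name `wait_for` and its parameters are assumptions for this example; the actual test is a shell script with its own helpers.

```python
import time


def wait_for(predicate, timeout=10.0, interval=0.1):
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Returns True on success and False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for "both PGs are recovering": becomes true on the third poll.
polls = []


def both_recovering():
    polls.append(1)
    return len(polls) >= 3


print(wait_for(both_recovering, timeout=2.0, interval=0.01))
```

Unlike a plain sleep, the wait returns as soon as the condition holds and reports a timeout explicitly instead of letting later checks fail for unclear reasons.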
Ronen Friedman
1323bdb839 qa/standalone/scrub/osd-recovery-scrub: fix unnoticed recovery state
The 'recovering' state is transitory. Existing code looks for it by
polling 'pg stat', missing it from time to time.
New version searches the tails of the relevant OSDs' logs.

Fixes: https://tracker.ceph.com/issues/48719
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2021-01-04 13:29:41 +02:00
Ronen Friedman
bb848cfd90 qa/standalone/ceph-helpers.sh: log meaningful PIDs for run_in_background()
While the relevant comment says:
'# Execute the command and prepend the output with its pid'
the actual PID logged is the same for all background processes,
which isn't very helpful.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2020-12-28 10:47:02 +02:00
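Illustration for bb848cfd90 (not taken from the commit): the fix concerns a shell helper, but the intent, prefixing each background command's output with that command's own PID, can be sketched in Python. The function below is a hypothetical analogue written for this example, not a translation of `run_in_background()` from ceph-helpers.sh.

```python
import subprocess
import sys


def run_in_background(commands):
    """Start all commands at once, then tag each output line with its child's PID."""
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.PIPE, universal_newlines=True)
        for cmd in commands
    ]
    lines = []
    for proc in procs:
        out, _ = proc.communicate()
        for line in out.splitlines():
            # proc.pid is the PID of this child, not of the parent process.
            lines.append("{}: {}".format(proc.pid, line))
    return procs, lines


procs, lines = run_in_background([
    [sys.executable, "-c", "print('a')"],
    [sys.executable, "-c", "print('b')"],
])
print("\n".join(lines))
```

Because every child is started before any of them is reaped, the two PIDs are guaranteed to differ, which is what makes the prefix useful when reading interleaved logs.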
Ronen Friedman
445db7f171 qa/standalone/scrub/osd-recovery-scrub: handle a Primary change
Stop waiting for a scrub to happen if the Primary for the target
PG changes.

Fixes: https://tracker.ceph.com/issues/48720
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2020-12-28 10:42:41 +02:00
Ronen Friedman
dff7faaf3c qa/standalone/scrub/osd-scrub-snaps.sh: fix Python print syntax
Fixes: https://tracker.ceph.com/issues/48690
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2020-12-21 16:52:27 +02:00
Kefu Chai
694ed23e9d qa/standalone/misc/ver-health.sh: include the bootup-time
in my test bed, it takes 11 seconds to boot the 3 OSDs and to restart
one of them, this fails the test.

so we need to take the time into consideration. in this change, the
delay is added to the total "warn_older_version_delay", so the monitor
does not start sending warning earlier than expected.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-12-11 16:14:03 +08:00
Kefu Chai
4bcfa139ab mon/HealthMonitor: use timespan for mon_warn_older_version_delay
for better user experience

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-12-11 16:12:47 +08:00
Kefu Chai
1f5406a752 src/*: do not pass cct to ceph_version_to_str()
in e5b1ae5554, a new option named
"debug_version_for_testing" is introduced to override the version so
we can test version check.

in crimson, we have two families of shared functions.

- one of them is used by alien store. they are compiled with
  -DWITH_SEASTAR and -DWITH_ALIEN, to enable the shim code between
  seastar and POSIX thread.
- the other is used by crimson in general, where no lock is allowed.

currently, we use the "crimson" and "ceph" namespace to differentiate
these two families of functions, so they can colocate in the same
executable without violating the ODR. see src/include/common_fwd.h for
more details.

the functions defined in src/common/version.cc are also shared by
alien store and crimson code. and because we have different
implementations of `CephContext` in crimson and in classic OSD (i.e.
alienstore), we have to have different implementations of this function
as well, if we follow the same approach. but since these functions are
very simple and are non-blocking, there is not much value in
differentiating them, it is better to inject the test settings using
environment variable instead of using ceph option subsystem.

in this change, "ceph_debug_version_for_testing" environment variable is
checked instead, so that crimson and alienstore can share the same
compilation unit of version.cc. and "debug_version_for_testing" option
is removed.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-12-10 18:26:39 +08:00
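Illustration for 1f5406a752 (not taken from the commit): the real change is in C++ (src/common/version.cc), but the pattern it describes, reading a test-only override from the environment instead of from a context-bound option, can be sketched in a few lines of Python. The environment variable name comes from the commit message; `BUILT_IN_VERSION` and `version_to_str` are placeholders invented for this example.

```python
import os

BUILT_IN_VERSION = "16.0.0"  # placeholder for the compiled-in version string
ENV_OVERRIDE = "ceph_debug_version_for_testing"  # name given in the commit message


def version_to_str(environ=None):
    """Return the version string, honouring the test override if it is set.

    No context object is needed, so any caller can share this function.
    """
    env = os.environ if environ is None else environ
    override = env.get(ENV_OVERRIDE)
    return override if override else BUILT_IN_VERSION


print(version_to_str({}))
print(version_to_str({ENV_OVERRIDE: "14.2.0"}))
```

The design point is that the function depends only on process-wide state, so there is no need for one variant per `CephContext` implementation.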
Ronen Friedman
43b1129030 test: cancelling both noscrub *and* nodeep-scrub
as part of osd-scrub-test.sh.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2020-12-09 20:16:23 +02:00
haoyixing
0e7e036aa7 doc/dev: use http://docs.ceph.com/en/latest/ instead of /docs/master/ for docs
Several links under http://docs.ceph.com/docs/master/ were inaccessible.
Change them to http://docs.ceph.com/en/latest so we can access them directly.

Signed-off-by: haoyixing <haoyixing@kuaishou.com>
2020-11-24 12:49:47 +08:00
David Zafman
89af82bf4f
Merge pull request #38054 from dzafman/wip-test-fixes
test: Fix osd-scrub-test.sh and ver-health.sh tests

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-11-18 08:52:28 -08:00
David Zafman
38c3130654 test: Fix TEST_scrub_extended_sleep test (corrected test name)
Didn't really test extended sleep in original code:
Caused by: 3bfb5c2621

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-11-16 18:30:14 -08:00
David Zafman
0a0ed890c2 test: Improve version checking test, to improve reliability
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-11-16 18:30:14 -08:00
Kefu Chai
0463a774c9
Merge pull request #37908 from dzafman/wip-47930
test: Fix race in TEST_recovery_scrub test

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-11-16 01:00:56 +08:00
David Zafman
870bde04a5 test: Changes based on code review comments
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-11-11 15:31:26 -08:00
David Zafman
93373746f5 osd test: Delay reporting until mon_warn_older_version_delay has passed
Move release notes description to 16.0.0 and update
Update documentation

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-11-11 15:10:11 -08:00
David Zafman
9d988c3dbc test: Simple test case for version health warning
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-11-11 15:10:11 -08:00
David Zafman
410e230d09 test: Fix race in TEST_recovery_scrub test
Fixes: https://tracker.ceph.com/issues/47930

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-11-10 00:45:13 +00:00
David Zafman
d3cc647583 osd: Eliminate day of week 7 and hour 24
Add test case for permitted hours to make sure scrub doesn't start
Remove permitted hours in extended sleep test

Fixes: https://tracker.ceph.com/issues/48077

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-11-09 22:47:00 +00:00
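Illustration for d3cc647583 (not taken from the commit): a minimal sketch of a permitted-hours check in which hours run from 0 to 23 only, so that 24 is rejected rather than treated as an alias for 0. The function name, the wrap-around rule, and the "equal bounds mean all day" rule are assumptions for this example and may not match the OSD's behaviour.

```python
def hour_permitted(hour, begin, end):
    """Return True if `hour` falls in the window [begin, end), wrapping at midnight."""
    for value in (hour, begin, end):
        if not 0 <= value <= 23:
            raise ValueError("hour must be in 0..23, got {}".format(value))
    if begin == end:
        return True  # assumption: equal bounds permit the whole day
    if begin < end:
        return begin <= hour < end
    # Window wraps past midnight, e.g. 22 -> 2.
    return hour >= begin or hour < end


print(hour_permitted(23, 22, 2))
print(hour_permitted(3, 22, 2))
```

Restricting the range to 0..23 removes the ambiguity of having two spellings for midnight; the same reasoning applies to days of the week numbered 0..6.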
David Zafman
ef47a3e708 test: set mon_allow_pool_size_one for consistency with original test intention
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-11-03 21:49:00 +00:00
Neha Ojha
343107766e
Merge pull request #37483 from dzafman/wip-46405
osd/osd-rep-recov-eio.sh: TEST_rados_repair_warning:  return 1

Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-10-08 11:44:00 -07:00
David Zafman
3ba7ebd3e2 test: Avoid races by waiting for PGs to go clean before query
Fixes: https://tracker.ceph.com/issues/46405

Signed-off-by: David Zafman <dzafman@redhat.com>
2020-10-01 19:43:57 +00:00
David Zafman
b20a277f05 test: Inconsequential change to get object names as desired
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-09-29 18:01:24 +00:00
Prashant D
f8b7fddc4c mon: validate crush-failure-domain
When creating an erasure-coded profile, make sure
that the user specifies a valid crush-failure-domain.

Fixes: https://tracker.ceph.com/issues/47452

Signed-off-by: Prashant Dhange <pdhange@redhat.com>
2020-09-22 07:27:22 -04:00
Patrick Donnelly
7eceaf45de
Merge PR #37202 into master
* refs/pull/37202/head:
	mon: allow overriding the initial mon_host

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-09-18 18:54:57 -07:00
Neha Ojha
8ba0a61a51
Merge pull request #35906 from gregsfortytwo/wip-stretch-mode
Add a new stretch mode for 2-site Ceph clusters

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2020-09-18 14:31:45 -07:00
Patrick Donnelly
ed3782e60a
mon: allow overriding the initial mon_host
This overrides what the CephContext believes to be the current quorum of
monitors (retrieved from other instances of the MonClient), introduced
by [1]. Tests need to be able to target a specific monitor for
exercising forwarding and other things.

[1] 731e2db9fb
Fixes: https://tracker.ceph.com/issues/47180
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2020-09-16 18:34:23 -07:00
Greg Farnum
9506d09e3b Merge remote-tracking branch 'origin/master' into wip-stretch-mode
Conflicts:
	src/include/ceph_features.h

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
2020-09-15 02:25:07 +00:00
David Zafman
5b0ba0e5a8 test: Modify test to check new feature might_have_unfound added to list_unfound
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-09-14 13:06:29 -07:00
Greg Farnum
d02625331c Merge remote-tracking branch 'origin/master' into wip-stretch-mode
2020-09-14 02:32:19 +00:00
Kefu Chai
e5b9b08cc4
Merge pull request #36962 from tchaikov/wip-qa-py3-cleanup
qa: py3 cleanups

Reviewed-by: Neha Ojha <nojha@redhat.com>
2020-09-10 09:39:20 +08:00
Neha Ojha
21c08f0be2 qa/*/mon/mon-last-epoch-clean.sh: mark osd out instead of down
The test should mark the OSD out to check if only "in" OSDs are considered by
the osdmap trimming logic.

Fixes: https://tracker.ceph.com/issues/47309
Signed-off-by: Neha Ojha <nojha@redhat.com>
2020-09-04 22:09:05 +00:00
Kefu Chai
5c758f63aa qa/standalone: always decode output from check_output()
we could pass `text=True` for better readability, but that was introduced
in python3.7; or pass `errors="ignore"`, but that is too long.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-09-03 13:09:16 +08:00
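Illustration for 5c758f63aa (not taken from the commit): `subprocess.check_output()` returns `bytes` by default, so a Python 3.6-compatible caller has to decode the result before treating it as text. The helper name `run_text` is made up for this example.

```python
import subprocess
import sys


def run_text(cmd):
    """Run cmd and return its stdout as str rather than bytes."""
    return subprocess.check_output(cmd).decode()


out = run_text([sys.executable, "-c", "print('hello')"])
print(out.strip())
```

On Python 3.7 and later, `subprocess.check_output(cmd, text=True)` gives the same result without the explicit `.decode()`.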
Kefu Chai
eda90040ad qa: always use subprocess.{DEVNULL,check_output}
no need to check for their existence and prepare a replacement,
because we've migrated to python3 and only support python3.6 and up.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-09-03 13:09:16 +08:00
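Illustration for eda90040ad (not taken from the commit): with Python 3 as the baseline, `subprocess.DEVNULL` can be used directly to discard output, with no fallback such as opening `os.devnull` by hand. The helper name `run_quietly` is made up for this example.

```python
import subprocess
import sys


def run_quietly(cmd):
    """Run cmd, discard stdout and stderr, and return the exit status."""
    return subprocess.call(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )


status = run_quietly(
    [sys.executable, "-c", "print('noise'); raise SystemExit(3)"]
)
print(status)
```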
Kefu Chai
4f6443737e
Merge pull request #30838 from ifed01/wip-ifed-single-alloc
os/bluestore: use single allocator for shared bluestore/bluefs device

Reviewed-by: Sage Weil <sage@redhat.com>
2020-08-03 18:00:16 +08:00
Igor Fedotov
9a8f1ae492 os/bluestore: fix bluefs migrate/expand to match single allocator.
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2020-07-31 15:36:47 +03:00
Dan van der Ster
b550112dba qa/standalone/osd: add bad-inc-map.sh
Test that the osd doesn't crash when it gets a bad incremental osdmap.

Related-to: https://tracker.ceph.com/issues/46443
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
2020-07-28 23:15:42 +02:00
David Zafman
365e48d6ec test: Check for interruption of scrubs with noscrub/nodeep_scrub
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-07-24 11:41:20 -07:00
David Zafman
f272768802 test: mon-last-epoch-clean.sh fixed to avoid shell globbing
Signed-off-by: David Zafman <dzafman@redhat.com>
2020-07-24 11:40:24 -07:00