Commit Graph

352 Commits

Author SHA1 Message Date
David Zafman
590b4138ae
Merge pull request #28302 from dzafman/wip-40078
test: Make sure that extra scheduled scrubs don't confuse test

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-06-05 14:43:30 -07:00
Kefu Chai
cdba0f1420 qa/standalone/ceph-helpers: resurrect all OSD before waiting for health
address the regression introduced by e62cfceb
in e62cfceb, we wanted to test the newly introduced TOO_FEW_OSDS
warning, so we increased the number of OSD to the size of pool, so if
the number of OSD is less than pool size, monitor will send a warning
message.

but we need to bring all OSDs back if we are expecting a healthy
cluster. in this change, all OSDs are resurrect before
`wait_for_health_ok`.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-05-30 23:52:36 +08:00
Kefu Chai
f6b022bdbe
Merge pull request #27806 from ashitakasam/add-osd-alarm
osd: Better error message when OSD count is less than osd_pool_default_size

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-05-30 21:28:54 +08:00
David Zafman
893d227c82 test: Make sure that extra scheduled scrubs don't confuse test
Fixes: http://tracker.ceph.com/issues/40078

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-05-29 14:03:57 -07:00
David Zafman
7959159e83 test: Adding standalone test of log copy handling
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-05-10 15:31:51 -07:00
zjh
e62cfceb95 qa/standalone: remove osd_pool_default_size in test_wait_for_health_ok
Signed-off-by: zjh <jhzeng93@foxmail.com>
2019-05-06 14:35:54 +08:00
Samuel Just
5ea5c47152 test-erasure-eio: first eio may be fixed during recovery
The changes to the way EC/ReplicatedBackend communicate read
t showerrors had a side effect of making first eio on the object in
TEST_rados_get_subread_eio_shard_[01] repair itself depending
on the timing of the killed osd recovering.  The test should
be improved to actually test that behavior at some point.

Signed-off-by: Samuel Just <sjust@redhat.com>
2019-05-01 11:22:28 -07:00
sjust@redhat.com
252d5c20cf osd/: move stat updates and publishing to PeeringState
Signed-off-by: Samuel Just <sjust@redhat.com>
2019-05-01 11:22:24 -07:00
David Zafman
66b041fa4a
Merge pull request #27769 from dzafman/wip-39333
osd-backfill-space.sh test failed in TEST_backfill_multi_partial()

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-04-26 11:55:04 -07:00
David Zafman
9931023457 test: osd-backfill-spsace.sh doesn't matter which PG wins the race
Fixes: http://tracker.ceph.com/issues/39333

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-26 10:11:00 -07:00
David Zafman
39cc14bdc1
Merge pull request #27503 from dzafman/wip-39099
osd: Give recovery for inactive PGs a higher priority

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-04-25 15:06:56 -07:00
David Zafman
71d254647a test: osd-recovery-scrub.sh ignore error from kill_daemons()
Another work around for http://tracker.ceph.com/issues/38195

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-25 13:53:27 -07:00
David Zafman
71d82dbeb9 test: Add tests for pool recovery priority conversion
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-25 13:53:27 -07:00
David Zafman
444aa9f9fe osd, mon: New pool recovery priority range -10 to 10
Use OSD_POOL_PRIORITY_MAX and OSD_POOL_PRIORITY_MIN constants
Scale legacy priorities if exceeds maximum

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-25 13:53:27 -07:00
David Zafman
3a234164d0
Merge pull request #27279 from dzafman/wip-divergent
Improvements to standalone tests

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-04-24 10:58:11 -07:00
Sage Weil
a3a4af3454 Merge PR #27656 into master
* refs/pull/27656/head:
	doc/dev/erasure-coded-pool: update
	doc/rados/operations/erasure-code*: update default ec profile references
	common/options: change default erasure-code-profile to k=2 m=2

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-04-24 08:14:55 -05:00
David Zafman
7e77898001 test: Divergent testing of _merge_object_divergent_entries() cases
Case 1: A more recent update exists
Case 2: The first entry in the divergent sequence is a create
Case 3  NOT TESTED - Ohject currently missing
Case 4: We can rollback all of the entries
Case 5: We cannot rollback at least 1 of the entries

Support starting OSDs even when "noup" is set (don't wait for up).
Move create_ec_pool() to ceph-helpers.sh

Fixes: https://tracker.ceph.com/issues/39162

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-22 18:50:24 -07:00
Sage Weil
755e8c4ef2 Merge PR #27595 into master
* refs/pull/27595/head:
	osd: add 'ceph osd stop <osd.nnn>' command

Reviewed-by: Sage Weil <sage@redhat.com>
2019-04-20 08:52:01 -05:00
Sage Weil
3e86be7d50 common/options: change default erasure-code-profile to k=2 m=2
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-19 16:47:57 -05:00
xie xingguo
5dbae13ce0 osd: add 'ceph osd stop <osd.nnn>' command
stop command can be used to force stopping a specified osd daemon, e.g.,
you don't have to pre-figure out where it located.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-04-18 13:55:02 +08:00
David Zafman
96861a8116 ceph-objectstore-tool: Rename dump-import to dump-export
If user specifies dump-import it will still work, but isn't
in the usage that way.

Fixes: http://tracker.ceph.com/issues/39284

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-12 13:17:45 -07:00
Sage Weil
dc97651cbd Merge PR #27499 into master
* refs/pull/27499/head:
	qa/standalone/osd/osd-markdown: fix dup command disabling

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-04-12 06:54:58 -05:00
Sage Weil
f7216d0b2c qa/standalone/osd/osd-markdown: fix dup command disabling
The ceph cli tool checks for the presence of the variable, not its value.

Fixes: http://tracker.ceph.com/issues/38359
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-10 16:44:38 -05:00
David Zafman
69fa515c95 test: Make most tests use default objectstore bluestore
Change run_osd() to default objectstore bluestore
Use run_osd_filestore() to use the non-default objectstore
Fix inject_eio to handle any objectstore if config prefixed with type

Remaining tests using filestore:
	osd-pool-create.sh TEST_pool_create_rep_expected_num_objects
		Test filestore directory creation
	qa/standalone/osd/osd-dup.sh TEST_filestore_to_bluestore
		Obvious
	qa/standalone/osd/osd-rep-recov-eio.sh TEST_rep_read_unfound
		Requires data digest in object info
	qa/standalone/scrub/osd-scrub-repair.sh multiple tests
		Erasure code pools append mode for filestore is tested
	qa/standalone/special/ceph_objectstore_tool.py
		Test code verifies COT by directly examining filestore contents

Fixes: https://tracker.ceph.com/issues/39162

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-10 08:55:04 -07:00
Kefu Chai
3805935ae0
Merge pull request #26806 from xiexingguo/wip-repair-eio-rep
osd: automatically repair replicated replica on pulling error

Reviewed-by: David Zafman <dzafman@redhat.com>
2019-04-08 19:46:36 +08:00
xie xingguo
6a8aedc107 qa: add new test case for pulling error
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-04-04 11:04:43 +08:00
David Zafman
11f072fee1 Add checking of num_shards_repaired in osd stats
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-04 11:04:42 +08:00
Sage Weil
3c9db396ae Merge PR #27141 into master
* refs/pull/27141/head:
	mon/OSDMonitor: fix osd boot feature vs require_osd_release check
	include/ceph_features: retire 7 other old features
	include/ceph_features: retire ERASURE_CODE_PLUGINS_V2
	include/ceph_features: retire OSD_ERASURE_CODES
	include/ceph_features: update comment to align with N+2 upgrades
	include/ceph_features: adjust whitespace for retired and now usable features
	mon: remove check for jewel mons
	mds/FSMap: remove support for encoding jewel FSMap
	include/ceph_features: enable SERVER_OCTOPUS
	test/cli/osdmaptool/feature-set-unset-list: add octopus to output
	test/cli/osdmaptool/feature-set-unset-list: change unknown feature bit
	qa/releases/octopus.yaml: add octopus upgrade final step
	osd/OSDMap: octopus encoding features
	mon/OSDMonitor: add mon_debug_no_require_octopus
	mon/OSDMonitor: allow 'osd require-osd-release octopus'
	mon: add ondisk incompat octopus feature
	mon/mon_types: add mon feature for octopus
	include/ceph_features: SERVER_O -> SERVER_OCTOPUS

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-04-03 14:59:03 -05:00
Sage Weil
d667228c2e Merge PR #27146 into master
* refs/pull/27146/head:
	mon/MonMap: add min_quorum_size() helper
	mon/MDSMonitor: add 'mds ok-to-stop' command
	mon: add 'mon ok-to-{stop,add-offline,rm}' commands

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2019-04-03 13:49:19 -05:00
Sage Weil
3760e8f918 mon/OSDMonitor: add mon_debug_no_require_octopus
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-02 16:19:43 -05:00
Sage Weil
aa33a26e32 mon/MDSMonitor: add 'mds ok-to-stop' command
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-01 14:58:50 -05:00
Sage Weil
fbfa772047 mon/mon_types: add mon feature for octopus
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-01 11:26:33 -05:00
Sage Weil
cfba0acc01 mon: add 'mon ok-to-{stop,add-offline,rm}' commands
Helpers to decide when it is safe to stop a mon, add a mon that is
not started, or remove a mon.  (Adding and start a mon would always
be safe, but it takes time to sync, so it's not really possible to do
quickly.)

Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-01 11:05:52 -05:00
Sage Weil
420edba243 Merge PR #27169 into master
* refs/pull/27169/head:
	common/config: parse --default-$option as a default value

Reviewed-by: Sébastien Han <seb@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-03-27 09:48:33 -05:00
Sage Weil
fdd2000631 common/config: parse --default-$option as a default value
Sometimes it is useful to specify an alternative default value for an
option via the command line such that it has a lower priority than the
mon config database, config file, the rest of the command line, or the
environment.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-26 11:00:27 -05:00
David Zafman
57abdb11fa osd, test: Add num_shards_repaired to osd_stat_t for pushes with repair set 3(3)
Fixes: http://tracker.ceph.com/issues/38616

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-03-25 16:03:36 -07:00
David Zafman
d2ca3d2feb osd: Track num_objects_repaired in pg stats 2(3)
Leave repair pg state on until recovery finishes or a new scrub starts

Fixes: http://tracker.ceph.com/issues/38616

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-03-25 16:03:36 -07:00
David Zafman
2202e5d0b1 test, osd: Improvements to auto_repair 1(3)
Allow auto_repair for replicated bluestore pools
Regular scrub within auto repair parameters will trigger deep scrub
New state failed_repair if PG repair attempt could not fix everything
Set failed_repair if not possible to repair anything

Fixes: http://tracker.ceph.com/issues/38616

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-03-23 09:52:40 -07:00
David Zafman
315d324889 test: osd-scrub-repair.sh: use corrupt_and_repair_lrc for lrc tests
Fix for argument handling of create_ec_pool()
Always pass a value for allow_overwrites for consistency

Caused by: 3ca750d41d

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-03-23 09:52:40 -07:00
Sage Weil
be1187575b Merge PR #27021 into master
* refs/pull/27021/head:
	msg: remove XioMessenger
	qa/suites/rados/thrash-old-clients: add nautilus
	qa/suites/rados/thrash-old-clients: add mimic v1 variant
	qa/suites/rados/thrash-old-clients: add mimic
	qa/suites/rados/thrash-old-clients: collapse msgr and client choice
	qa: remove simplemessenger tests
	ceph_test_msgr: remove simple
	msg: remove SimpleMessenger

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
Reviewed-by: Matt Benjamin <mbenjami@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-03-22 04:42:30 -05:00
Kefu Chai
f2b3bfa3aa
Merge pull request #26955 from liewegas/wip-slow-add
crush: various fixes for weight-sets, the osd_crush_update_weight_set option, and tests

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-03-22 15:42:13 +08:00
Sage Weil
28b4392a71 qa: remove simplemessenger tests
Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-20 06:10:25 -05:00
Sage Weil
4c741c109d qa/standalone/crush/crush-choose-args: add weight-set tests
Verify we have the expected behavior for creates and moves that
maintain bucket summation, both with and without the
osd_crush_update_weight_set option enabled.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-20 04:57:51 -05:00
Sage Weil
f20c736e99 qa/standalone/crush/crush-choose-args: fix test
- Make the initial weight-set actually consistent (summing)
- Fix the intermediate state so that it reflects a correctly
  maintained summation.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-20 04:57:51 -05:00
Sage Weil
13d7c4f4ec Merge PR #26898 into nautilus
* refs/pull/26898/head:
	osd/PG: invalidate PG if merging with unexpected version
	osd,mon: include more pg merge metadata in pg_pool_t
	qa/standalone/osd/pg-split-merge.sh: reproduce pg merge problem with empty pgs
	osd: add osd_debug_no_{acting_change,purge_strays}

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-03-14 22:37:18 -05:00
Sage Weil
4bb4f7a891 Merge PR #26894 into nautilus
* refs/pull/26894/head:
	qa/standalone/erasure-code/test-erasure-code: adjust test to avoid m=0
	erasure-code: ensure m >= 1
	mon/OSDMonitor: set ec min_size to k + min(1, m - 1)

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-03-13 22:07:45 -05:00
Sage Weil
52d5797c3d qa/standalone/erasure-code/test-erasure-code: adjust test to avoid m=0
_DD is k=2 m=0, which we don't allow.  Switch it to cDD.

I confess I don't fully understand why this was _DD to begin with, but
I'm pretty sure mapping is there to control the order of results so that
it can be mapped to the CRUSH rule output sanely, and the coding portion
is not relevant to the test.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-13 12:46:50 -05:00
Sage Weil
fb915c4805 osd/PG: invalidate PG if merging with unexpected version
If the source or target PG version is 0'0, we may silently take the max
of the source and target and still leave the PG complete.  This
specifically can happen with an empty PG, as seen with bug 38655.  In
theory we could encounter one of the PGs with some other last_update
that doesn't match what we expect.  If that ever happens, make sure the
result is incomplete so that backfill can clean up.

Additionally check that the pool metadata for the last merge matches the
PGs at all.  This could mismatch if we have an osdmap gap and are forced
to do some merge without merge info at all... in which case we should
definitely invalidate: there should be newer copies of the PG(s), and we
have no idea whether the PGs we are merging are what we want.  If this is
some disaster recovery situation, an operator is always free to use
ceph-objectstore-tool to re-mark a PG complete (at their own peril!).

Fixes: http://tracker.ceph.com/issues/38655
Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-12 10:08:46 -05:00
David Zafman
51a45e796e qa/test-erasure-code.sh: Don't grep entire bluestore directory
Bluestore caused grep crash with "grep: memory exhausted" due to
size of "block" storage.

Fixes: http://tracker.ceph.com/issues/38678

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-03-11 18:47:29 -07:00
David Zafman
d4915ee503 qa: Don't create rbd pool because it creates an object
This also reverts commit 10b9626ea7.

Fixes: http://tracker.ceph.com/issues/38631

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-03-11 16:57:51 -07:00