Commit Graph

365 Commits

Author SHA1 Message Date
Sage Weil
ff7813aa14 qa/standalone/scrub/osd-scrub-snaps.sh: adjust expected output
SnapSet now dumps just seq, not a (fake) SnapContext.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-07-12 09:55:06 -05:00
Sage Weil
03b9c66080 ceph-objectstore-tool: fix use of SnapSet::snaps
Instead, use clone_snaps to identify clones.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-07-12 09:55:06 -05:00
Sage Weil
23eaf7c498 qa/standalone/scrub/osd-scrub-snaps: fix kv grep
SnapMapper keys are now SNA_, not MAP_.

Fixes: http://tracker.ceph.com/issues/40725
Signed-off-by: Sage Weil <sage@redhat.com>
2019-07-12 08:11:21 -05:00
Sage Weil
b2eb5232de Merge PR #28901 into master
* refs/pull/28901/head:
	qa/standalone/scrub/osd-scrub-repair: fix 'scrub ok' grep
	osd/osd_types: remove 'snap_context' from SnapSet::dump()

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-07-08 08:36:05 -05:00
Sage Weil
a960f2faa7 qa/standalone/scrub/osd-scrub-repair: fix 'scrub ok' grep
The log now also has a 'purged_snaps scrub ok' message that (generally)
precedes the first scrubbed PG.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-07-04 18:27:37 -05:00
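The grep problem described in a960f2faa7 can be illustrated with a minimal Python sketch. It is not the test's actual fix: the log lines and the helper name `count_pg_scrub_ok` are hypothetical, and the only fact taken from the commit message is that a 'purged_snaps scrub ok' line now also appears in the log.

```python
def count_pg_scrub_ok(lines):
    """Count 'scrub ok' lines, skipping the 'purged_snaps scrub ok' message."""
    return sum(
        1
        for line in lines
        if "scrub ok" in line and "purged_snaps scrub ok" not in line
    )


# Hypothetical log excerpt; real OSD log lines look different.
log = [
    "osd.0 purged_snaps scrub ok",
    "pg 1.0 scrub ok",
    "pg 1.1 scrub starts",
    "pg 1.1 scrub ok",
]
print(count_pg_scrub_ok(log))
```

A plain substring match on 'scrub ok' would count the first line as well, which is the kind of mismatch the commit addresses.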
Sage Weil
70ad54a0b3 osd/osd_types: remove 'snap_context' from SnapSet::dump()
We no longer have a snaps field with real values, so dumping this as a
"snap_context" is silly.  Instead, just dump the seq.

Adjust qa/standalone/scrub/osd-scrub-repair.sh accordingly.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-07-04 18:24:41 -05:00
Sage Weil
71e5cba00b Merge PR #28867 into master
* refs/pull/28867/head:
	qa/standalone/ceph-helpers: more osd debug

Reviewed-by: David Zafman <dzafman@redhat.com>
2019-07-03 21:27:20 -05:00
David Zafman
fe3b693d0f
Merge pull request #28334 from dzafman/wip-40073
osd: Fix the way that auto repair triggers after regular scrub

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-07-03 15:27:27 -07:00
Sage Weil
0d0759531a qa/standalone/ceph-helpers: more osd debug
debug_ms=1
debug_monc=20

Hunting down http://tracker.ceph.com/issues/40666

Signed-off-by: Sage Weil <sage@redhat.com>
2019-07-03 16:53:00 -05:00
David Zafman
27918bb906 osd: Handle scrub interval changes
Global changes reschedule all PG scrubs
Pool changes reschedule pool PG scrubs

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-06-27 14:20:54 -07:00
Neha Ojha
bd15824567
Merge pull request #28204 from dzafman/wip-39555
mon: Improve health status for backfill_toofull and recovery_toofull

Reviewed-by: Joao Eduardo Luis <joao@suse.de>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-06-20 11:12:10 -07:00
David Zafman
fa698e18e1 mon: Improve health status for backfill_toofull and recovery_toofull
Treat backfill_toofull as a warning condition because it can resolve itself.
Includes test case for PG_BACKFILL_FULL
Includes test case for recovery_toofull / PG_RECOVERY_FULL

Fixes: https://tracker.ceph.com/issues/39555

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-06-20 02:22:01 +00:00
xie xingguo
ec27a162de mgr, osd: 'ceph osd df' by pool
Our test admin has been asking for this for the past few years :-)
Besides, this is also useful for operating large Ceph clusters with
multiple storage pools possibly spanning all OSDs.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-06-18 20:29:40 +08:00
David Zafman
590b4138ae
Merge pull request #28302 from dzafman/wip-40078
test: Make sure that extra scheduled scrubs don't confuse test

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-06-05 14:43:30 -07:00
Kefu Chai
cdba0f1420 qa/standalone/ceph-helpers: resurrect all OSD before waiting for health
Address the regression introduced by e62cfceb.
In e62cfceb, we wanted to test the newly introduced TOO_FEW_OSDS
warning, so we increased the number of OSDs to the size of the pool; if
the number of OSDs is less than the pool size, the monitor sends a warning
message.

But we need to bring all OSDs back if we are expecting a healthy
cluster. In this change, all OSDs are resurrected before
`wait_for_health_ok`.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2019-05-30 23:52:36 +08:00
Kefu Chai
f6b022bdbe
Merge pull request #27806 from ashitakasam/add-osd-alarm
osd: Better error message when OSD count is less than osd_pool_default_size

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-05-30 21:28:54 +08:00
David Zafman
893d227c82 test: Make sure that extra scheduled scrubs don't confuse test
Fixes: http://tracker.ceph.com/issues/40078

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-05-29 14:03:57 -07:00
David Zafman
7959159e83 test: Adding standalone test of log copy handling
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-05-10 15:31:51 -07:00
zjh
e62cfceb95 qa/standalone: remove osd_pool_default_size in test_wait_for_health_ok
Signed-off-by: zjh <jhzeng93@foxmail.com>
2019-05-06 14:35:54 +08:00
Samuel Just
5ea5c47152 test-erasure-eio: first eio may be fixed during recovery
The changes to the way EC/ReplicatedBackend communicate read
errors had a side effect of making the first eio on the object in
TEST_rados_get_subread_eio_shard_[01] repair itself depending
on the timing of the killed osd recovering.  The test should
be improved to actually test that behavior at some point.

Signed-off-by: Samuel Just <sjust@redhat.com>
2019-05-01 11:22:28 -07:00
sjust@redhat.com
252d5c20cf osd/: move stat updates and publishing to PeeringState
Signed-off-by: Samuel Just <sjust@redhat.com>
2019-05-01 11:22:24 -07:00
David Zafman
66b041fa4a
Merge pull request #27769 from dzafman/wip-39333
osd-backfill-space.sh test failed in TEST_backfill_multi_partial()

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-04-26 11:55:04 -07:00
David Zafman
9931023457 test: osd-backfill-space.sh doesn't matter which PG wins the race
Fixes: http://tracker.ceph.com/issues/39333

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-26 10:11:00 -07:00
David Zafman
39cc14bdc1
Merge pull request #27503 from dzafman/wip-39099
osd: Give recovery for inactive PGs a higher priority

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-04-25 15:06:56 -07:00
David Zafman
71d254647a test: osd-recovery-scrub.sh ignore error from kill_daemons()
Another work around for http://tracker.ceph.com/issues/38195

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-25 13:53:27 -07:00
David Zafman
71d82dbeb9 test: Add tests for pool recovery priority conversion
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-25 13:53:27 -07:00
David Zafman
444aa9f9fe osd, mon: New pool recovery priority range -10 to 10
Use OSD_POOL_PRIORITY_MAX and OSD_POOL_PRIORITY_MIN constants
Scale legacy priorities if they exceed the maximum

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-25 13:53:27 -07:00
David Zafman
3a234164d0
Merge pull request #27279 from dzafman/wip-divergent
Improvements to standalone tests

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-04-24 10:58:11 -07:00
Sage Weil
a3a4af3454 Merge PR #27656 into master
* refs/pull/27656/head:
	doc/dev/erasure-coded-pool: update
	doc/rados/operations/erasure-code*: update default ec profile references
	common/options: change default erasure-code-profile to k=2 m=2

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-04-24 08:14:55 -05:00
David Zafman
7e77898001 test: Divergent testing of _merge_object_divergent_entries() cases
Case 1: A more recent update exists
Case 2: The first entry in the divergent sequence is a create
Case 3: NOT TESTED - Object currently missing
Case 4: We can rollback all of the entries
Case 5: We cannot rollback at least 1 of the entries

Support starting OSDs even when "noup" is set (don't wait for up).
Move create_ec_pool() to ceph-helpers.sh

Fixes: https://tracker.ceph.com/issues/39162

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-22 18:50:24 -07:00
Sage Weil
755e8c4ef2 Merge PR #27595 into master
* refs/pull/27595/head:
	osd: add 'ceph osd stop <osd.nnn>' command

Reviewed-by: Sage Weil <sage@redhat.com>
2019-04-20 08:52:01 -05:00
Sage Weil
3e86be7d50 common/options: change default erasure-code-profile to k=2 m=2
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-19 16:47:57 -05:00
xie xingguo
5dbae13ce0 osd: add 'ceph osd stop <osd.nnn>' command
The stop command can be used to force-stop a specified osd daemon, i.e.,
you don't have to figure out in advance where it is located.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-04-18 13:55:02 +08:00
David Zafman
96861a8116 ceph-objectstore-tool: Rename dump-import to dump-export
If the user specifies dump-import it will still work, but it is no
longer listed that way in the usage.

Fixes: http://tracker.ceph.com/issues/39284

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-12 13:17:45 -07:00
Sage Weil
dc97651cbd Merge PR #27499 into master
* refs/pull/27499/head:
	qa/standalone/osd/osd-markdown: fix dup command disabling

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-04-12 06:54:58 -05:00
Sage Weil
f7216d0b2c qa/standalone/osd/osd-markdown: fix dup command disabling
The ceph cli tool checks for the presence of the variable, not its value.

Fixes: http://tracker.ceph.com/issues/38359
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-10 16:44:38 -05:00
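The distinction behind f7216d0b2c, checking whether a variable is present rather than what it is set to, can be sketched in Python. The variable name `EXAMPLE_FLAG` and both helper functions are hypothetical and are not taken from the ceph CLI.

```python
def enabled_by_presence(env, name="EXAMPLE_FLAG"):
    """Feature is on whenever the variable exists, whatever its value."""
    return name in env


def enabled_by_value(env, name="EXAMPLE_FLAG"):
    """Feature is on only when the variable holds a truthy string."""
    return env.get(name, "").lower() in ("1", "true", "yes")


# Setting the variable to "0" does not disable a presence-based check.
env = {"EXAMPLE_FLAG": "0"}
print(enabled_by_presence(env))
print(enabled_by_value(env))
```

Under a presence check, the only way to turn the behaviour off is to unset the variable; assigning it a false-looking value has no effect.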
David Zafman
69fa515c95 test: Make most tests use default objectstore bluestore
Change run_osd() to default objectstore bluestore
Use run_osd_filestore() to use the non-default objectstore
Fix inject_eio to handle any objectstore if config prefixed with type

Remaining tests using filestore:
	osd-pool-create.sh TEST_pool_create_rep_expected_num_objects
		Test filestore directory creation
	qa/standalone/osd/osd-dup.sh TEST_filestore_to_bluestore
		Obvious
	qa/standalone/osd/osd-rep-recov-eio.sh TEST_rep_read_unfound
		Requires data digest in object info
	qa/standalone/scrub/osd-scrub-repair.sh multiple tests
		Erasure code pools append mode for filestore is tested
	qa/standalone/special/ceph_objectstore_tool.py
		Test code verifies COT by directly examining filestore contents

Fixes: https://tracker.ceph.com/issues/39162

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-10 08:55:04 -07:00
Kefu Chai
3805935ae0
Merge pull request #26806 from xiexingguo/wip-repair-eio-rep
osd: automatically repair replicated replica on pulling error

Reviewed-by: David Zafman <dzafman@redhat.com>
2019-04-08 19:46:36 +08:00
xie xingguo
6a8aedc107 qa: add new test case for pulling error
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-04-04 11:04:43 +08:00
David Zafman
11f072fee1 Add checking of num_shards_repaired in osd stats
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-04 11:04:42 +08:00
Sage Weil
3c9db396ae Merge PR #27141 into master
* refs/pull/27141/head:
	mon/OSDMonitor: fix osd boot feature vs require_osd_release check
	include/ceph_features: retire 7 other old features
	include/ceph_features: retire ERASURE_CODE_PLUGINS_V2
	include/ceph_features: retire OSD_ERASURE_CODES
	include/ceph_features: update comment to align with N+2 upgrades
	include/ceph_features: adjust whitespace for retired and now usable features
	mon: remove check for jewel mons
	mds/FSMap: remove support for encoding jewel FSMap
	include/ceph_features: enable SERVER_OCTOPUS
	test/cli/osdmaptool/feature-set-unset-list: add octopus to output
	test/cli/osdmaptool/feature-set-unset-list: change unknown feature bit
	qa/releases/octopus.yaml: add octopus upgrade final step
	osd/OSDMap: octopus encoding features
	mon/OSDMonitor: add mon_debug_no_require_octopus
	mon/OSDMonitor: allow 'osd require-osd-release octopus'
	mon: add ondisk incompat octopus feature
	mon/mon_types: add mon feature for octopus
	include/ceph_features: SERVER_O -> SERVER_OCTOPUS

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-04-03 14:59:03 -05:00
Sage Weil
d667228c2e Merge PR #27146 into master
* refs/pull/27146/head:
	mon/MonMap: add min_quorum_size() helper
	mon/MDSMonitor: add 'mds ok-to-stop' command
	mon: add 'mon ok-to-{stop,add-offline,rm}' commands

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2019-04-03 13:49:19 -05:00
Sage Weil
3760e8f918 mon/OSDMonitor: add mon_debug_no_require_octopus
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-02 16:19:43 -05:00
Sage Weil
aa33a26e32 mon/MDSMonitor: add 'mds ok-to-stop' command
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-01 14:58:50 -05:00
Sage Weil
fbfa772047 mon/mon_types: add mon feature for octopus
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-01 11:26:33 -05:00
Sage Weil
cfba0acc01 mon: add 'mon ok-to-{stop,add-offline,rm}' commands
Helpers to decide when it is safe to stop a mon, add a mon that is
not started, or remove a mon.  (Adding and starting a mon would always
be safe, but it takes time to sync, so it's not really possible to do
quickly.)

Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-01 11:05:52 -05:00
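The kind of check an 'ok-to-stop' helper needs can be sketched as follows. This is an illustration under stated assumptions, not the monitor's implementation: it assumes quorum requires a strict majority of the monitors in the map, and the Python functions below (including this `min_quorum_size`) are hypothetical stand-ins; the real commands may apply further checks.

```python
def min_quorum_size(num_mons):
    """Smallest strict majority of the monitors in the map (assumed rule)."""
    return num_mons // 2 + 1


def ok_to_stop(num_mons, in_quorum, stopping=1):
    """True if quorum would survive stopping `stopping` in-quorum monitors."""
    return in_quorum - stopping >= min_quorum_size(num_mons)


# Three monitors, all in quorum: stopping one leaves two, still a majority.
print(ok_to_stop(num_mons=3, in_quorum=3))
# Three monitors, one already down: stopping another would lose quorum.
print(ok_to_stop(num_mons=3, in_quorum=2))
```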
Sage Weil
420edba243 Merge PR #27169 into master
* refs/pull/27169/head:
	common/config: parse --default-$option as a default value

Reviewed-by: Sébastien Han <seb@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-03-27 09:48:33 -05:00
Sage Weil
fdd2000631 common/config: parse --default-$option as a default value
Sometimes it is useful to specify an alternative default value for an
option via the command line such that it has a lower priority than the
mon config database, config file, the rest of the command line, or the
environment.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-26 11:00:27 -05:00
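The precedence described in fdd2000631 can be sketched in Python. This is a simplified model, not Ceph's config parser: the function names, the `NAME=VALUE` argument form, and the option `example_opt` are hypothetical, and the relative order of the higher-priority sources is an assumption. The only property taken from the commit message is that a `--default-` value ranks below every other explicit source.

```python
def split_args(argv):
    """Separate --default-NAME=VALUE arguments from regular --NAME=VALUE ones."""
    defaults, overrides = {}, {}
    for arg in argv:
        if not arg.startswith("--") or "=" not in arg:
            continue
        key, value = arg[2:].split("=", 1)
        if key.startswith("default-"):
            defaults[key[len("default-"):]] = value
        else:
            overrides[key] = value
    return defaults, overrides


def resolve(name, builtin, defaults, other_sources, overrides):
    """Return the first value found, searching from highest to lowest priority."""
    # Assumed order: command line, then other sources (mon config database,
    # config file, environment), then --default- values, then built-in defaults.
    for source in (overrides, other_sources, defaults):
        if name in source:
            return source[name]
    return builtin[name]


defaults, overrides = split_args(["--default-example_opt=5", "--other_opt=1"])
builtin = {"example_opt": "1"}
print(resolve("example_opt", builtin, defaults, {}, overrides))
print(resolve("example_opt", builtin, defaults, {"example_opt": "7"}, overrides))
```

With no other source set, the `--default-` value replaces the built-in default; as soon as any other source supplies the option, that source wins.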
David Zafman
57abdb11fa osd, test: Add num_shards_repaired to osd_stat_t for pushes with repair set 3(3)
Fixes: http://tracker.ceph.com/issues/38616

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-03-25 16:03:36 -07:00
David Zafman
d2ca3d2feb osd: Track num_objects_repaired in pg stats 2(3)
Leave repair pg state on until recovery finishes or a new scrub starts

Fixes: http://tracker.ceph.com/issues/38616

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-03-25 16:03:36 -07:00