Commit Graph

101 Commits

Author SHA1 Message Date
David Zafman
fa698e18e1 mon: Improve health status for backfill_toofull and recovery_toofull
Treat backfull_toofull as a warning condition because it can resolve itself.
Includes test case for PG_BACKFILL_FULL
Includes test case for recovery_toofull / PG_RECOVERY_FULL

Fixes: https://tracker.ceph.com/issues/39555

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-06-20 02:22:01 +00:00
David Zafman
7959159e83 test: Adding standalone test of log copy handling
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-05-10 15:31:51 -07:00
sjust@redhat.com
252d5c20cf osd/: move stat updates and publishing to PeeringState
Signed-off-by: Samuel Just <sjust@redhat.com>
2019-05-01 11:22:24 -07:00
David Zafman
66b041fa4a
Merge pull request #27769 from dzafman/wip-39333
osd-backfill-space.sh test failed in TEST_backfill_multi_partial()

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-04-26 11:55:04 -07:00
David Zafman
9931023457 test: osd-backfill-spsace.sh doesn't matter which PG wins the race
Fixes: http://tracker.ceph.com/issues/39333

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-26 10:11:00 -07:00
David Zafman
39cc14bdc1
Merge pull request #27503 from dzafman/wip-39099
osd: Give recovery for inactive PGs a higher priority

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-04-25 15:06:56 -07:00
David Zafman
444aa9f9fe osd, mon: New pool recovery priority range -10 to 10
Use OSD_POOL_PRIORITY_MAX and OSD_POOL_PRIORITY_MIN constants
Scale legacy priorities if exceeds maximum

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-25 13:53:27 -07:00
David Zafman
3a234164d0
Merge pull request #27279 from dzafman/wip-divergent
Improvements to standalone tests

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-04-24 10:58:11 -07:00
David Zafman
7e77898001 test: Divergent testing of _merge_object_divergent_entries() cases
Case 1: A more recent update exists
Case 2: The first entry in the divergent sequence is a create
Case 3  NOT TESTED - Ohject currently missing
Case 4: We can rollback all of the entries
Case 5: We cannot rollback at least 1 of the entries

Support starting OSDs even when "noup" is set (don't wait for up).
Move create_ec_pool() to ceph-helpers.sh

Fixes: https://tracker.ceph.com/issues/39162

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-22 18:50:24 -07:00
Sage Weil
755e8c4ef2 Merge PR #27595 into master
* refs/pull/27595/head:
	osd: add 'ceph osd stop <osd.nnn>' command

Reviewed-by: Sage Weil <sage@redhat.com>
2019-04-20 08:52:01 -05:00
xie xingguo
5dbae13ce0 osd: add 'ceph osd stop <osd.nnn>' command
stop command can be used to force stopping a specified osd daemon, e.g.,
you don't have to pre-figure out where it located.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-04-18 13:55:02 +08:00
Sage Weil
dc97651cbd Merge PR #27499 into master
* refs/pull/27499/head:
	qa/standalone/osd/osd-markdown: fix dup command disabling

Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-04-12 06:54:58 -05:00
Sage Weil
f7216d0b2c qa/standalone/osd/osd-markdown: fix dup command disabling
The ceph cli tool checks for the presence of the variable, not its value.

Fixes: http://tracker.ceph.com/issues/38359
Signed-off-by: Sage Weil <sage@redhat.com>
2019-04-10 16:44:38 -05:00
David Zafman
69fa515c95 test: Make most tests use default objectstore bluestore
Change run_osd() to default objectstore bluestore
Use run_osd_filestore() to use the non-default objectstore
Fix inject_eio to handle any objectstore if config prefixed with type

Remaining tests using filestore:
	osd-pool-create.sh TEST_pool_create_rep_expected_num_objects
		Test filestore directory creation
	qa/standalone/osd/osd-dup.sh TEST_filestore_to_bluestore
		Obvious
	qa/standalone/osd/osd-rep-recov-eio.sh TEST_rep_read_unfound
		Requires data digest in object info
	qa/standalone/scrub/osd-scrub-repair.sh multiple tests
		Erasure code pools append mode for filestore is tested
	qa/standalone/special/ceph_objectstore_tool.py
		Test code verifies COT by directly examining filestore contents

Fixes: https://tracker.ceph.com/issues/39162

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-10 08:55:04 -07:00
xie xingguo
6a8aedc107 qa: add new test case for pulling error
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2019-04-04 11:04:43 +08:00
David Zafman
11f072fee1 Add checking of num_shards_repaired in osd stats
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-04 11:04:42 +08:00
Sage Weil
420edba243 Merge PR #27169 into master
* refs/pull/27169/head:
	common/config: parse --default-$option as a default value

Reviewed-by: Sébastien Han <seb@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-03-27 09:48:33 -05:00
Sage Weil
fdd2000631 common/config: parse --default-$option as a default value
Sometimes it is useful to specify an alternative default value for an
option via the command line such that it has a lower priority than the
mon config database, config file, the rest of the command line, or the
environment.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-26 11:00:27 -05:00
David Zafman
d2ca3d2feb osd: Track num_objects_repaired in pg stats 2(3)
Leave repair pg state on until recovery finishes or a new scrub starts

Fixes: http://tracker.ceph.com/issues/38616

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-03-25 16:03:36 -07:00
Sage Weil
be1187575b Merge PR #27021 into master
* refs/pull/27021/head:
	msg: remove XioMessenger
	qa/suites/rados/thrash-old-clients: add nautilus
	qa/suites/rados/thrash-old-clients: add mimic v1 variant
	qa/suites/rados/thrash-old-clients: add mimic
	qa/suites/rados/thrash-old-clients: collapse msgr and client choice
	qa: remove simplemessenger tests
	ceph_test_msgr: remove simple
	msg: remove SimpleMessenger

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
Reviewed-by: Matt Benjamin <mbenjami@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2019-03-22 04:42:30 -05:00
Sage Weil
28b4392a71 qa: remove simplemessenger tests
Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-20 06:10:25 -05:00
Sage Weil
fb915c4805 osd/PG: invalidate PG if merging with unexpected version
If the source or target PG version is 0'0, we may silently take the max
of the source and target and still leave the PG complete.  This
specifically can happen with an empty PG, as seen with bug 38655.  In
theory we could encounter one of the PGs with some other last_update
that doesn't match what we expect.  If that ever happens, make sure the
result is incomplete so that backfill can clean up.

Additionally check that the pool metadata for the last merge matches the
PGs at all.  This could mismatch if we have an osdmap gap and are forced
to do some merge without merge info at all... in which case we should
definitely invalidate: there should be newer copies of the PG(s), and we
have no idea whether the PGs we are merging are what we want.  If this is
some disaster recovery situation, an operator is always free to use
ceph-objectstore-tool to re-mark a PG complete (at their own peril!).

Fixes: http://tracker.ceph.com/issues/38655
Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-12 10:08:46 -05:00
Sage Weil
f978b27d2b qa/standalone/osd/pg-split-merge.sh: reproduce pg merge problem with empty pgs
This reproduces http://tracker.ceph.com/issues/38655

Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-11 17:10:28 -05:00
Sage Weil
bf74c1adc4 qa/standalone/osd/osd-rep-recov-eio: fix better
- no need for the default pool size
- no initial osds or it will collide with setup_osds later
- no need for rbd pool at all

Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-08 17:41:11 -06:00
Sage Weil
b59ff3860f qa/standalone/osd/osd-force-create-pg: create more pgs
Avoid warnings about too few pgs.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-06 16:27:56 -06:00
Sage Weil
cba0483b09 qa/standalone: make sure an osd is running before create_rbd_pool
'rbd pool init' now does IO.  Drop the pool, or change the pool size to 1.

Fixes: http://tracker.ceph.com/issues/38585
Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-06 16:27:56 -06:00
Sage Weil
01316aa7bd qa/standalone/osd/pg-split-merge: fix import_after_merge_and_gap
This test introduces a map gap.  What *should* happen is that when there is
such a gap, we cannot import.  Previously, the test didn't reliably produce
a map gap at all, and didn't check that import failed--it verified that it
passed.

Fix the test so that it reliably produces a gap *and* reports
min_last_epoch_clean to the mon so we can trim.  Then verify we fail to
import, but can with --force.  But remove the pg again, because if we
force an import with a map gap the osd will refuse to start.

Fixes: http://tracker.ceph.com/issues/38525
Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-03 10:23:27 -06:00
Sage Weil
c6a7b2cbd1 qa/standalone/osd/osd-markdown: disable CLI command dups
The markdown test is based on marking down a specific number of times, but
the duplicate commands from the CLI may not get absorbed/batched by the
mon, breaking the test.  Override the default qa/tasks/workunit.py
behavior of sending dups.

Fixes: http://tracker.ceph.com/issues/38359
Signed-off-by: Sage Weil <sage@redhat.com>
2019-02-18 15:02:25 -06:00
David Zafman
64beabc4c6 test: Limit loops waiting for force-backfill/force-recovery to happen
Fixes: http://tracker.ceph.com/issues/38309

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-02-13 17:44:53 -08:00
David Zafman
910a95b9c8 test: osd-backfill-stats.sh Fix check of multi backfill OSDs, skip remapped test
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-02-07 20:05:58 -08:00
David Zafman
690ff9a21f
Merge pull request #26213 from dzafman/wip-38041
osd: Fix recovery and backfill priority handling

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-02-07 17:26:34 -08:00
David Zafman
ca5cf14fa8 test: Add scripts to test backfill/recovery priority handling
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-02-07 15:46:23 -08:00
David Zafman
36e305c4b6 test: Ignore kill_daemons() error
Workaround for: http://tracker.ceph.com/issues/38195

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-02-05 11:31:32 -08:00
David Zafman
cc6339c0cd test: Increase timeouts in osd-backfill-space.sh because of failure seen
Fixes: http://tracker.ceph.com/issues/38027

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-02-05 11:29:32 -08:00
David Zafman
99ddd3666b
Merge pull request #22797 from dzafman/wip-19753
osd: Deny reservation if expected backfill size would put us over bac…

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-01-18 07:42:00 -08:00
Vikhyat Umrao
8a694fc2f9 qa: specify filestore for misc tests
Signed-off-by: Vikhyat Umrao <vumrao@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
2019-01-16 13:09:19 -06:00
Sage Weil
b92be2ca9b qa/standalone/osd/osd-fast-mark-down: use v1 addr w/ simplemessenger
Signed-off-by: Sage Weil <sage@redhat.com>
2019-01-03 11:17:31 -06:00
David Zafman
094d39aa09 test: Add testing for erasure code backfill out of space detection
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-12-18 09:30:44 -08:00
David Zafman
3b8f86c8b0 test: Add testing for backfill out of space detection
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-12-18 09:30:44 -08:00
Igor Fedotov
d07c10dfc0 os/bluestore: add main device expand capability.
One can do that via ceph-bluestore-tool's bluefs-bdev-expand command

Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2018-11-29 12:48:20 +03:00
Sage Weil
c8a8dc21fd Merge PR #24828 into master
* refs/pull/24828/head:
	qa/osd-bluefs-volume-ops: use ceph-bluestore-tool for fsck
	qa/osd-bluefs-volume-ops: reduce space usage for the test case

Reviewed-by: David Zafman <dzafman@redhat.com>
2018-11-08 16:26:52 -06:00
Sage Weil
9ab9dcfc0d Merge PR #24809 into master
* refs/pull/24809/head:
	os/bluestore: omit redundant '/' in OSD path for ceph-bluestore-tool if
	os/bluestore: improve error handling for migrate ops in
	qa/standtalone/osd-bluefs-volume-ops: remove redundant code.

Reviewed-by: Sage Weil <sage@redhat.com>
2018-10-30 15:09:45 -05:00
Igor Fedotov
f5520ea304 qa/osd-bluefs-volume-ops: use ceph-bluestore-tool for fsck
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2018-10-30 15:38:16 +03:00
Igor Fedotov
80e67abdfd qa/osd-bluefs-volume-ops: reduce space usage for the test case
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2018-10-30 15:38:15 +03:00
Sage Weil
c40685ebdd Merge PR #24787 into master
* refs/pull/24787/head:
	Merge PR #24796 into nautilus
	osd: fix heartbeat_reset unlock
	Merge PR #24780 into nautilus
	Merge PR #24761 into nautilus
	Merge PR #24651 into nautilus
	osd: fix race between op_wq and context_queue
	test: Make sure kill_daemons failure will be easy to find
	test: Add flush_pg_stats to make test more deterministic
2018-10-29 08:36:34 -05:00
Igor Fedotov
5d38f8b49b qa/standtalone/osd-bluefs-volume-ops: remove redundant code.
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2018-10-29 16:30:36 +03:00
Kefu Chai
4af71e7c00
Merge pull request #23103 from ifed01/wip-ifed-bluefs-migrate
os/bluestore: allow ceph-bluestore-tool to coalesce, add and migrate BlueFS backing volumes

Reviewed-by:  Sage Weil <sage@redhat.com>
2018-10-22 22:33:08 +08:00
David Zafman
da3c556aa2 test: Make sure kill_daemons failure will be easy to find
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-10-17 16:54:45 -07:00
David Zafman
b33edbc4f6 test: Add flush_pg_stats to make test more deterministic
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-10-17 16:54:45 -07:00
Igor Fedotov
02b5768a4f tests: add qa test case for bluefs volume coalescence
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2018-10-17 22:39:27 +03:00