Commit Graph

85 Commits

Author SHA1 Message Date
Sage Weil
23eaf7c498 qa/standalone/scrub/osd-scrub-snaps: fix kv grep
SnapMapper keys are now SNA_, not MAP_.

Fixes: http://tracker.ceph.com/issues/40725
Signed-off-by: Sage Weil <sage@redhat.com>
2019-07-12 08:11:21 -05:00
Sage Weil
b2eb5232de Merge PR #28901 into master
* refs/pull/28901/head:
	qa/standalone/scrub/osd-scrub-repair: fix 'scrub ok' grep
	osd/osd_types: remove 'snap_context' from SnapSet::dump()

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-07-08 08:36:05 -05:00
Sage Weil
a960f2faa7 qa/standalone/scrub/osd-scrub-repair: fix 'scrub ok' grep
The log now also has a 'purged_snaps scrub ok' message that (generally)
precedes the first scrubbed PG.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-07-04 18:27:37 -05:00
Sage Weil
70ad54a0b3 osd/osd_types: remove 'snap_context' from SnapSet::dump()
We no longer have a snaps field with real values, so dumping this as a
"snap_context" is silly.  Instead, just dump the seq.

Adjust qa/standalone/scrub/osd-scrub-repair.sh accordingly.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-07-04 18:24:41 -05:00
David Zafman
fe3b693d0f
Merge pull request #28334 from dzafman/wip-40073
osd: Fix the way that auto repair triggers after regular scrub

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2019-07-03 15:27:27 -07:00
David Zafman
27918bb906 osd: Handle scrub interval changes
Global changes reschedule all PG scrubs
Pool changes reschedule pool PG scrubs

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-06-27 14:20:54 -07:00
David Zafman
893d227c82 test: Make sure that extra scheduled scrubs don't confuse test
Fixes: http://tracker.ceph.com/issues/40078

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-05-29 14:03:57 -07:00
David Zafman
39cc14bdc1
Merge pull request #27503 from dzafman/wip-39099
osd: Give recovery for inactive PGs a higher priority

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
2019-04-25 15:06:56 -07:00
David Zafman
71d254647a test: osd-recovery-scrub.sh ignore error from kill_daemons()
Another work around for http://tracker.ceph.com/issues/38195

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-25 13:53:27 -07:00
David Zafman
7e77898001 test: Divergent testing of _merge_object_divergent_entries() cases
Case 1: A more recent update exists
Case 2: The first entry in the divergent sequence is a create
Case 3  NOT TESTED - Ohject currently missing
Case 4: We can rollback all of the entries
Case 5: We cannot rollback at least 1 of the entries

Support starting OSDs even when "noup" is set (don't wait for up).
Move create_ec_pool() to ceph-helpers.sh

Fixes: https://tracker.ceph.com/issues/39162

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-22 18:50:24 -07:00
David Zafman
69fa515c95 test: Make most tests use default objectstore bluestore
Change run_osd() to default objectstore bluestore
Use run_osd_filestore() to use the non-default objectstore
Fix inject_eio to handle any objectstore if config prefixed with type

Remaining tests using filestore:
	osd-pool-create.sh TEST_pool_create_rep_expected_num_objects
		Test filestore directory creation
	qa/standalone/osd/osd-dup.sh TEST_filestore_to_bluestore
		Obvious
	qa/standalone/osd/osd-rep-recov-eio.sh TEST_rep_read_unfound
		Requires data digest in object info
	qa/standalone/scrub/osd-scrub-repair.sh multiple tests
		Erasure code pools append mode for filestore is tested
	qa/standalone/special/ceph_objectstore_tool.py
		Test code verifies COT by directly examining filestore contents

Fixes: https://tracker.ceph.com/issues/39162

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-04-10 08:55:04 -07:00
David Zafman
57abdb11fa osd, test: Add num_shards_repaired to osd_stat_t for pushes with repair set 3(3)
Fixes: http://tracker.ceph.com/issues/38616

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-03-25 16:03:36 -07:00
David Zafman
d2ca3d2feb osd: Track num_objects_repaired in pg stats 2(3)
Leave repair pg state on until recovery finishes or a new scrub starts

Fixes: http://tracker.ceph.com/issues/38616

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-03-25 16:03:36 -07:00
David Zafman
2202e5d0b1 test, osd: Improvements to auto_repair 1(3)
Allow auto_repair for replicated bluestore pools
Regular scrub within auto repair parameters will trigger deep scrub
New state failed_repair if PG repair attempt could not fix everything
Set failed_repair if not possible to repair anything

Fixes: http://tracker.ceph.com/issues/38616

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-03-23 09:52:40 -07:00
David Zafman
315d324889 test: osd-scrub-repair.sh: use corrupt_and_repair_lrc for lrc tests
Fix for argument handling of create_ec_pool()
Always pass a value for allow_overwrites for consistency

Caused by: 3ca750d41d

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-03-23 09:52:40 -07:00
David Zafman
d4915ee503 qa: Don't create rbd pool because it creates an object
This also reverts commit 10b9626ea7.

Fixes: http://tracker.ceph.com/issues/38631

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-03-11 16:57:51 -07:00
Sage Weil
10b9626ea7 qa/standalone/scrub/osd-scrub-repair: fix unfound grep
It's now "1/2 unfound":

             1/2 objects unfound (50.000%)

..presumably due to the rbd pool init creating the rbd_directory.

Signed-off-by: Sage Weil <sage@redhat.com>
2019-03-08 18:23:48 -06:00
David Zafman
ef2dc05de0 osd, test: Add test case with osd support for overdue PG scrubs and deep scrubs
Add trigger_deep_scrub osd command for testing
Publish stats when trigger_scrub/trigger_deep_scrub is used for testing
Add optional argument to trigger_scrub/trigger_deep_scrub
for amount of extra time to change last scrub stamps

Signed-off-by: David Zafman <dzafman@redhat.com>
2019-01-23 16:49:33 -08:00
David Zafman
879d89aace test: Correct typo trying to call flush_pg_stats
Signed-off-by: David Zafman <dzafman@redhat.com>
2019-01-23 16:49:33 -08:00
Vikhyat Umrao
8a694fc2f9 qa: specify filestore for misc tests
Signed-off-by: Vikhyat Umrao <vumrao@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
2019-01-16 13:09:19 -06:00
David Zafman
554ea73cb5 test: Disable duplicate request command test during scrub testing
Scrub testing requires an orderly control of scrubbing.  Most but not
all the time, the duplicate scrub request is ignored because the first
request hasn't finished.  Teuthology enables this environment variable
in the workunit handling.

Fixes: https://tracker.ceph.com/issues/36525

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-12-21 18:28:23 -08:00
David Zafman
975dbc5841 test: Minor improvement to create_ec_pool()
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-12-10 20:16:01 -08:00
David Zafman
1841928e28 test: Add test for requested scrub priority
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-11-14 23:57:20 -08:00
David Zafman
a159f162c5 test: osd-scrub-snaps.sh: After snapshot removal wait for snaptrim to complete
Due to deliberate corruptions snaptrim_error means snaptrim is done

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-11-08 14:48:20 -08:00
David Zafman
e37f95ac27 test: osd-scrub-snaps.sh: Testing with new --rmtype in ceph-objectstore-tool
Use --rmtype snapmap with new obj16 to remove snapmap only, check for repair message
Use --rmtype nosnapmap to remove obj5 while leaving snapmap behind

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-11-08 14:48:20 -08:00
David Zafman
f43faf4ad7 test: cleanup: Remove redundant cat of log and handle errors in create_scenario()
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-11-08 14:48:19 -08:00
Kefu Chai
1578875194
Merge pull request #24013 from dzafman/wip-35845
test: Use a grep pattern that works across releases

Reviewed-by: Kefu Chai <kchai@redhat.com>
2018-09-12 23:00:39 +08:00
Kefu Chai
510d9e1345
Merge pull request #23723 from xiexingguo/wip-list-missing
osd/PrimaryLogPG: rename list_missing -> list_unfound command

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2018-09-11 20:25:21 +08:00
David Zafman
dc80f8585a test: Use a grep pattern that works across releases
Fixes: http://tracker.ceph.com/issues/35845

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-09-10 08:21:36 -07:00
Sage Weil
4fc02a7f48 osd/OSDMap: include age in up and in counts for ceph status
Signed-off-by: Sage Weil <sage@redhat.com>
2018-09-07 09:07:50 -05:00
xie xingguo
85ba2f0a82 osd/PrimaryLogPG: s/list_missing/list_unfound/
Also:
- Do not print **offset** until specified
- Count missing objects correctly (used to be primary's local missing)

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
2018-09-06 09:52:20 +08:00
Sage Weil
2c26fb0fe1 rados: drop mkpool, rmpool commands
- mkpool and rmpool users should use the normal cli/mon commands

Signed-off-by: Sage Weil <sage@redhat.com>
2018-08-31 09:27:36 -05:00
David Zafman
687f63e599 test: Update tests for error message changes
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-08-23 11:09:22 -07:00
David Zafman
58c4d32203 test: Verify cluster logging of scrub error messages
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-08-23 11:09:22 -07:00
David Zafman
67d9e44de6 test: Add test for repair of bad object info data_digest on all copies
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-07-26 07:50:23 -07:00
David Zafman
ebb05b2542 test: When possible show side-by-side diff in addition to regular diff
Fixes: https://tracker.ceph.com/issues/21664

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-06-26 18:23:07 -07:00
David Zafman
fe09fc5e9d test: Fail immediately if some operations fail
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-06-18 14:09:14 -07:00
David Zafman
39fc43556f test: Put files in private test directory
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-06-18 14:08:23 -07:00
David Zafman
c1e96ae7cb test: Use a file that should be on all OSes
Also, create temporary files in test specific dir and remove

Caused by: 154330fd68

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-06-05 11:27:12 -07:00
Sage Weil
154330fd68 osd/PrimaryLogPG: fix on_local_recover crash on stray clone
If there is a stray clone (one that does not appear in the SnapSet) and
we do any sort of recovery on it the OSD will crash.  Log an error instead
but continue.

This addresses a problem where a cluster has both (1) an unexpected clone
and (2) the clone is not present on all replicas.  Doing repair on that
PG will both not fix the unexpected clone and also cause the remaining
OSDs to crash trying to recover it.

Include a test.

Fixes: https://tracker.ceph.com/issues/24396
Signed-off-by: Sage Weil <sage@redhat.com>
2018-06-05 11:09:01 -05:00
David Zafman
843598b69b Revert "qa/standalone/scrub/osd-scrub-repair.sh: drop omap_digest flag"
This reverts commit 886606bfd7.

Signed-off-by: David Zafman <dzafman@redhat.com>

Conflicts:
	qa/standalone/scrub/osd-scrub-repair.sh (manually made equivalent changes)
2018-05-31 12:01:53 -07:00
David Zafman
1a7fa9a62a test: Add test cases for multiple copy pool and snapshot errors
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-28 16:42:19 -07:00
David Zafman
2fa596dc0c test: Prepare for second test and minor improvements
Check list-inconsistent-obj output
Check how many _scan_snap groupings
Use more general check for crashed osd(s)

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-28 16:42:19 -07:00
David Zafman
bae4940574 test: Fix comment at end of scrub test scripts
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-28 16:42:19 -07:00
David Zafman
458babe7ee test: Use jq in a compatible way and for easier diff analysis
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-16 08:11:24 -07:00
David Zafman
22ddc6da5f osd: Change shard digests to hex like object info digests
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-12 07:59:21 -07:00
David Zafman
9c5ef19f93 test: Be smarter about when jsonschema can be used
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-10 13:52:10 -07:00
David Zafman
60ae2b8eb3 osd rados command: Show snapset in list-inconsistent-snapset
Add SnapSet bufferlist to inconsistent_snapset_t

Partial fix for http://tracker.ceph.com/issues/23428

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-10 13:51:48 -07:00
David Zafman
1b1d45bf51 test: Add getjson variable to save output
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-10 13:26:08 -07:00
David Zafman
007cb45fe5 osd rados command: Change error name snapset_mismatch to snapset_error
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-10 13:26:08 -07:00