Commit Graph

164 Commits

Author SHA1 Message Date
Sage Weil
c3164df959 qa/standalone/mon/misc: fix features test
Signed-off-by: Sage Weil <sage@redhat.com>
2018-05-25 17:02:49 -05:00
David Zafman
1a7fa9a62a test: Add test cases for multiple copy pool and snapshot errors
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-28 16:42:19 -07:00
David Zafman
2fa596dc0c test: Prepare for second test and minor improvements
Check list-inconsistent-obj output
Check how many _scan_snap groupings
Use more general check for crashed osd(s)

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-28 16:42:19 -07:00
David Zafman
bae4940574 test: Fix comment at end of scrub test scripts
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-28 16:42:19 -07:00
Sage Weil
27e91a99f5
Merge pull request #21273 from jdurgin/wip-23195
osd/ECBackend: only check required shards when finishing recovery reads

Reviewed-by: David Zafman <dzafman@redhat.com>
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2018-04-24 17:20:25 -05:00
Josh Durgin
d4808256d2 osd/ECBackend: preserve requests for other objects when sending extra reads
When multiple objects are in flight for the same ReadOp, swap() on the
map<hobject_t, read_request_t> would remove requests for all objects.

We just want to replace the requests for the single object we're
dealing with in send_all_remaining_reads().

This prevents crashing trying to look up rop.to_read[hoid] when another
object in the same ReadOp gets an EIO and tries to send more requests.

Test this by using osd-recovery-max-single-start to bundle multiple
reads into one ReadOp. Save and restore CEPH_ARGS so custom settings
are reset for each test.

Fixes: http://tracker.ceph.com/issues/23195 (the 2nd crash there)
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2018-04-20 19:42:15 -04:00
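Note on d4808256d2: the difference between swapping the whole request map and replacing a single entry can be modelled in a few lines. This is only a Python sketch of the idea, not the ECBackend code; the dictionary stands in for the C++ `map<hobject_t, read_request_t>`, and the function names and request strings are hypothetical.

```python
# Toy model of rop.to_read: object id -> pending read request.
# All names and values below are illustrative assumptions.

def resend_with_swap(to_read, hoid, new_request):
    """Model of the old behaviour: the whole map is exchanged for one
    that only holds the request for hoid, so other objects vanish."""
    replacement = {hoid: new_request}
    return replacement


def resend_single_object(to_read, hoid, new_request):
    """Model of the fixed behaviour: only the entry for hoid is replaced."""
    updated = dict(to_read)
    updated[hoid] = new_request
    return updated


to_read = {"object_a": "first read", "object_b": "first read"}

after_swap = resend_with_swap(to_read, "object_a", "extra read")
after_fix = resend_single_object(to_read, "object_a", "extra read")

print("object_b" in after_swap)  # object_b's request was dropped
print("object_b" in after_fix)   # object_b's request is preserved
```

In the swap model, a later lookup of `object_b` has nothing to find, which corresponds to the failed `rop.to_read[hoid]` lookup the commit message describes.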
Josh Durgin
b162a5478d osd/ECBackend: recover from EIO based on the minimum data necessary
Discount shards that already returned EIO, and use minimum_to_decode()
to request just what is necessary to recover or read the originally
requested extents of the object.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2018-04-20 19:42:14 -04:00
Josh Durgin
468ad4b410 osd/ECBackend: only check required shards when finishing recovery reads
1235810c2ad08ccb7ef5946686eb2b85798f5bca allowed recovery to use
multiple passes of reads to handle EIO, but the end condition for
checking whether we finished reading requires the full data to be
decodable (this is what get_want_to_read_shards returns).

This is just a loss of efficiency normally, since when there is only
one object the subsequent read works, and grabs all the data
necessary. The crash comes from having multiple objects in the same
ReadOp - in this case the sequence of events is:

- start recovery of two objects (osd_recovery_max_single_start > 1)
- read object a shard 3
- read object b shard 3
- fail minimum_to_decode because shard 3 can't reconstruct all of object a
- re-read all of object a, marking more reads in progress
- fail minimum_to_decode because shard 3 can't reconstruct all of object b
- skip re-reading object because there are now reads in progress
- finish reading k shards of object a
- still fail minimum_to_decode for object b, so no extra data was read
- send_all_remaining_reads tries to lookup object b in ReadOp object
- crash dereferencing to_read[object b], since this was cleared after handling the original object b read reply

This patch fixes the immediate inefficiency and crash by only checking
for the missing shards that were requested, rather than the entire
object, for recovery reads.

Fixes: http://tracker.ceph.com/issues/23195 (first crash)
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2018-04-20 19:42:14 -04:00
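Note on 468ad4b410: the end-condition change can be illustrated with a simplified model of `minimum_to_decode()`. The sketch below assumes a k-of-n code in which a wanted shard that is already available needs nothing else, and any k available shards can rebuild the rest. The value of `K`, the shard numbers, and the helper `reads_finished` are assumptions chosen for illustration, not taken from the Ceph source.

```python
# Simplified model, not the Ceph implementation.

def minimum_to_decode(want, available, k):
    """Return a set of shards sufficient to produce `want`, or raise."""
    want, available = set(want), set(available)
    if want <= available:
        return want                       # everything wanted was read directly
    if len(available) < k:
        raise ValueError("not enough shards to decode")
    return set(sorted(available)[:k])     # any k shards can decode


def reads_finished(check_shards, received, k):
    """True when `received` is enough to produce `check_shards`."""
    try:
        minimum_to_decode(check_shards, received, k)
        return True
    except ValueError:
        return False


K = 2                    # hypothetical number of data shards
ALL_DATA_SHARDS = {0, 1}
requested = {3}          # the recovery read only asked for shard 3
received = {3}

old_check = reads_finished(ALL_DATA_SHARDS, received, K)  # whole object
new_check = reads_finished(requested, received, K)        # requested shards only

print(old_check, new_check)
```

Under these assumptions the whole-object check reports the read as unfinished even though the requested shard arrived, which is the condition that triggered the extra reads described in the commit message.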
Nathan Cutler
f03b9028f5 qa/standalone/ceph-helpers.sh: provide argument to dirname
Fixes: http://tracker.ceph.com/issues/23805
Signed-off-by: Nathan Cutler <ncutler@suse.com>
2018-04-20 10:10:15 +02:00
David Zafman
458babe7ee test: Use jq in a compatible way and for easier diff analysis
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-16 08:11:24 -07:00
David Zafman
c6207d21a8
Merge pull request #21362 from dzafman/wip-hex-digest
osd: Change shard digests to hex like object info digests

Reviewed-by: Kefu Chai <kchai@redhat.com>
2018-04-12 16:07:36 -07:00
David Zafman
22ddc6da5f osd: Change shard digests to hex like object info digests
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-12 07:59:21 -07:00
Kefu Chai
4cc3dab070
Merge pull request #21318 from badone/wip-qa-mon-misc-add-osdmap-prune-tests
qa/standalone/mon/misc.sh: Add osdmap-prune tests

Reviewed-by: Joao Eduardo Luis <joao@suse.de>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2018-04-11 23:08:33 +08:00
David Zafman
9c5ef19f93 test: Be smarter about when jsonschema can be used
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-10 13:52:10 -07:00
David Zafman
60ae2b8eb3 osd rados command: Show snapset in list-inconsistent-snapset
Add SnapSet bufferlist to inconsistent_snapset_t

Partial fix for http://tracker.ceph.com/issues/23428

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-10 13:51:48 -07:00
David Zafman
1b1d45bf51 test: Add getjson variable to save output
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-10 13:26:08 -07:00
David Zafman
007cb45fe5 osd rados command: Change error name snapset_mismatch to snapset_error
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-10 13:26:08 -07:00
David Zafman
0c7ac9db3b test: Clean-up test and use local values for number of objects and osds
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-10 13:26:08 -07:00
David Zafman
982509514c osd rados command: list-inconsistent-obj attribute improvements
System attributes shown as "object_info", "snapset" and "hashinfo"
Only output user attributes as "attrs"
	Drop leading underscore "_" for user attribute keys
Improve logic as to when to show user attributes or specific system attributes

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-10 13:26:08 -07:00
David Zafman
01687b052f osd rados command: Change "oi" to "info" in scrub handling errors
data_digest_mismatch_oi -> data_digest_mismatch_info
omap_digest_mismatch_oi -> omap_digest_mismatch_info
size_mismatch_oi -> size_mismatch_info
obj_size_oi_mismatch -> obj_size_info_mismatch

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-10 13:26:08 -07:00
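Note on 01687b052f: scripts that compare saved scrub output against newer output may need to translate the old error names. A minimal sketch, assuming the errors are available as a plain list of strings; the helper name `modernize_errors` and the placeholder `some_other_error` are hypothetical, while the four renames come from the commit message above.

```python
# Old name -> new name, as listed in the commit message.
RENAMED_ERRORS = {
    "data_digest_mismatch_oi": "data_digest_mismatch_info",
    "omap_digest_mismatch_oi": "omap_digest_mismatch_info",
    "size_mismatch_oi": "size_mismatch_info",
    "obj_size_oi_mismatch": "obj_size_info_mismatch",
}


def modernize_errors(errors):
    """Replace renamed error names; leave every other name untouched."""
    return [RENAMED_ERRORS.get(name, name) for name in errors]


print(modernize_errors(["size_mismatch_oi", "some_other_error"]))
```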
David Zafman
273f6213ea osd rados command: Change "oi_attr" to "info" in scrub handling errors
oi_attr_missing -> info_missing
oi_attr_corrupted -> info_corrupted

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-10 13:26:08 -07:00
David Zafman
bec67e3d40 osd rados command: Rename ss_attr_missing/ss_attr_corrupted to snapset_missing/snapset_corrupted
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-10 13:26:08 -07:00
David Zafman
d713c7dad0 osd rados command: Improve scrub handling of HashInfo (hinfo_key xattr)
Fixes: http://tracker.ceph.com/issues/23364

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-10 13:26:08 -07:00
David Zafman
be815f9b2b test: Remove check that masks differences (let diff fail)
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-10 13:26:08 -07:00
Brad Hubbard
c0dac8ecd2 qa/standalone/mon/misc.sh: Add osdmap-prune tests
Fixes: http://tracker.ceph.com/issues/23621

Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
2018-04-10 14:26:53 +10:00
Kefu Chai
9e840c4382
Merge pull request #21274 from dzafman/wip-cot-config
tools: Use --no-mon-config so ceph_objectstore_tool.py test doesn't hang

Reviewed-by: Kefu Chai <kchai@redhat.com>
2018-04-07 11:59:28 +08:00
David Zafman
a8d26122dc tools: Use --no-mon-config so ceph_objectstore_tool.py test doesn't hang
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-06 11:52:10 -07:00
Joao Eduardo Luis
940dd941ef
Merge pull request #19331 from jecluis/wip-mon-osdmap-prune
mon: osdmap prune

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2018-04-06 15:22:28 +01:00
Joao Eduardo Luis
2ffed4c98f qa: mon: osdmap pruning standalone/workunit
Keep a standalone wrapper for the workunit, so we can test it locally,
leveraging the ceph-helpers to do the setup. Keep a workunit to be
exercised by teuthology.

Signed-off-by: Joao Eduardo Luis <joao@suse.de>
2018-04-06 04:18:23 +01:00
David Zafman
5cfb8241f4 osd: Fix stale scrub stats when a primary takes over
Fixes: http://tracker.ceph.com/issues/23267

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-04-03 12:51:06 -07:00
David Zafman
ce9c029858 test: Eliminate use of bc (use awk) in get_timeout_delays()
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-28 10:24:33 -07:00
David Zafman
293ac9895f test: Replace bc command with printf command
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-22 17:19:56 -07:00
Sage Weil
038afcbc2a Merge remote-tracking branch 'gh/mimic-dev2'
2018-03-18 18:39:46 -05:00
Sage Weil
b1c045cf33 qa/standalone/mon/osd-pool-create: debug and increase delay
Signed-off-by: Sage Weil <sage@redhat.com>
2018-03-16 10:35:32 -05:00
Neha Ojha
7f6f4f90fe qa: modify TEST_recovery_sizeup() to handle async recovery
Signed-off-by: Neha Ojha <nojha@redhat.com>
2018-03-15 11:13:34 -07:00
Sage Weil
69765a788e qa/standalone/osd/repro_long_log: no-mon-config for cot
Signed-off-by: Sage Weil <sage@redhat.com>
2018-03-15 08:42:14 -05:00
David Zafman
8a7e6c2349
Merge pull request #20220 from dzafman/wip-calc-stats3
osd: Improve recovery stat handling by using peer_missing and missing_loc info

Reviewed-by: Sage Weil <sage@redhat.com>
2018-03-14 11:07:44 -07:00
David Zafman
af85f3cc48 test: osd-backfill-stats.sh parallel osd-recovery-stats.sh check() changes
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-14 10:07:11 -07:00
David Zafman
acc1f80684 test: Use "(est)" in log message when an osd doesn't have peer_missing
Consolidate check() code and common script code
TEST_recovery_multi() wasn't reliable due to delayed peer_missing

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-14 10:07:11 -07:00
David Zafman
12e331b742 test: osd-recovery-stats.sh: New test with different missing objs on multiple OSDs
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-14 10:07:11 -07:00
David Zafman
09b5697ba2 test: Correction for better degraded/misplaced handling
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-14 10:07:11 -07:00
David Zafman
d7fd9174b9 osd: Fix for handling more than 1 missing target
Fix test case to test more than 1 target

Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-14 10:07:03 -07:00
David Zafman
51b740ad41 test: Fail upon flush_pg_stats timeout
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-11 16:26:11 -07:00
Josh Durgin
1c15458a00 PrimaryLogPG: only trim up to osd_pg_log_trim_max entries at once
This prevents the fix for http://tracker.ceph.com/issues/22050 or
potential future bugs from causing too much latency by trimming too
many log entries at once.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2018-03-09 19:14:28 -05:00
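Note on 1c15458a00: bounding the work done per trim can be sketched as follows. This is a Python model only; the log is a deque of integer versions, and `trim_batch` with its parameters is a hypothetical stand-in for the behaviour controlled by `osd_pg_log_trim_max`.

```python
from collections import deque


def trim_batch(log, trim_to, trim_max):
    """Drop entries with version <= trim_to from the front of the log,
    but never more than trim_max in one call. Returns the count removed."""
    removed = 0
    while log and log[0] <= trim_to and removed < trim_max:
        log.popleft()
        removed += 1
    return removed


log = deque(range(1, 11))          # versions 1..10
first = trim_batch(log, trim_to=8, trim_max=3)
second = trim_batch(log, trim_to=8, trim_max=3)
print(first, second, list(log))
```

Each call removes at most `trim_max` entries, so a large backlog is worked off over several calls instead of in one long operation.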
Josh Durgin
b50186bfe6 PG, PrimaryLogPG: trim log and rollback info for error log entries
Regular updates piggyback some osd state for this purpose with
MOSDRepOp[Reply]. Do the same thing for pure log entry updates (write
errors and lost/revert additions) via MOSDPGUpdateLogMissing[Reply].

Fixes: http://tracker.ceph.com/issues/22050
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2018-03-09 17:54:08 -05:00
Josh Durgin
2067f7c679
Merge pull request #20786 from dzafman/wip-zafman-log-trim
tools/ceph-objectstore-tool: command to trim the pg log

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2018-03-08 16:42:31 -08:00
Josh Durgin
b01e4ea5e2 tools: Add pg log trim command to ceph-objectstore-tool
Add test script that verifies the command in qa/standalone/osd

Fixes: http://tracker.ceph.com/issues/23242

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
Signed-off-by: David Zafman <dzafman@redhat.com>
2018-03-08 15:58:55 -08:00
David Zafman
317b3d3b36
Merge pull request #20759 from dzafman/wip-cleanup
test: Make clearer by moving code out of loop

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
2018-03-08 10:45:38 -08:00
Sage Weil
c9e974800f qa: --no-mon-config for ceph-objectstore-tool --op mkfs ..
Signed-off-by: Sage Weil <sage@redhat.com>
2018-03-06 14:44:50 -06:00
Sage Weil
5ee5bbace1 qa/standalone: drop CEPH_LIB hacks
Signed-off-by: Sage Weil <sage@redhat.com>
2018-03-06 14:44:49 -06:00