ceph/qa
Josh Durgin 468ad4b410 osd/ECBackend: only check required shards when finishing recovery reads
1235810c2a allowed recovery to use
multiple passes of reads to handle EIO, but the end condition for
checking whether we finished reading requires the full data to be
decodable (this is what get_want_to_read_shards returns).

This is just a loss of efficiency normally, since when there is only
one object the subsequent read works, and grabs all the data
necessary. The crash comes from having multiple objects in the same
ReadOp - in this case the sequence of events is:

- start recovery of two objects (osd_recovery_max_single_start > 1)
- read object a shard 3
- read object b shard 3
- fail minimum_to_decode because shard 3 can't reconstruct all of object a
- re-read all of object a, marking more reads in progress
- fail minimum_to_decode because shard 3 can't reconstruct all of object b
- skip re-reading object because there are now reads in progress
- finish reading k shards of object a
- still fail minimum_to_decode for object b, so no extra data was read
- send_all_remaining_reads tries to lookup object b in ReadOp object
- crash dereferencing to_read[object b], since this was cleared after handling the original object b read reply

This patch fixes the immediate inefficiency and crash by only checking
for the missing shards that were requested, rather than the entire
object, for recovery reads.

Fixes: http://tracker.ceph.com/issues/23195 (first crash)
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
2018-04-20 19:42:14 -04:00

ceph-qa-suite
-------------

clusters/    - some predefined cluster layouts
suites/      - hierarchical collection of test suites

The suites directory has a hierarchical collection of tests.  This can be
freeform, but generally follows the convention of

  suites/<test suite name>/<test group>/...

A test is described by a yaml fragment.

A test can exist as a single .yaml file in the directory tree.  For example:

 suites/foo/one.yaml
 suites/foo/two.yaml

is a simple group of two tests.

A directory with a magic '+' file represents a test that combines all
other items in the directory into a single yaml fragment.  For example:

 suites/foo/bar/+
 suites/foo/bar/a.yaml
 suites/foo/bar/b.yaml
 suites/foo/bar/c.yaml

is a single test consisting of a + b + c.
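
As a rough illustration of the '+' rule, the sketch below takes the entries of
one directory and reports the single combined test.  The function name
concat_test and the use of bare file names instead of parsed yaml are
assumptions made for this example; this is not the teuthology implementation.

```python
import os


def concat_test(entries):
    """Describe the single test formed by a directory containing '+'.

    `entries` is a list of names in one directory (hypothetical input;
    a real tool would read the directory and merge the yaml contents).
    """
    if "+" not in entries:
        raise ValueError("directory has no '+' marker file")
    # Every item other than the marker contributes to the one test.
    items = sorted(e for e in entries if e != "+")
    # Drop the .yaml extension so the result reads like the README notation.
    return " + ".join(os.path.splitext(name)[0] for name in items)


print(concat_test(["+", "a.yaml", "b.yaml", "c.yaml"]))
```

For the suites/foo/bar example this prints "a + b + c".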

A directory with a magic '%' file represents a test matrix formed from
all other items in the directory.  For example,

 suites/baz/%
 suites/baz/a.yaml
 suites/baz/b/b1.yaml
 suites/baz/b/b2.yaml
 suites/baz/c.yaml
 suites/baz/d/d1.yaml
 suites/baz/d/d2.yaml

is a 4-dimensional test matrix.  Two dimensions (a, c) are trivial (1
item), so this is really 2x2 = 4 tests, which are

  a + b1 + c + d1
  a + b1 + c + d2
  a + b2 + c + d1
  a + b2 + c + d2
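
The expansion above is a Cartesian product over the dimensions.  A minimal
sketch, assuming each dimension has already been reduced to a list of item
names (expand_matrix and the nested-list input are hypothetical, chosen only
for illustration and not taken from teuthology):

```python
import itertools


def expand_matrix(dimensions):
    """Return one test description per combination of items.

    `dimensions` is a list of lists; each inner list holds the items of
    one dimension (a single file is a dimension with one item).
    """
    return [" + ".join(combo) for combo in itertools.product(*dimensions)]


# The suites/baz example: a and c are single files, b and d are directories.
baz = [["a"], ["b1", "b2"], ["c"], ["d1", "d2"]]
for test in expand_matrix(baz):
    print(test)
```

The number of tests is the product of the dimension sizes, here
1 x 2 x 1 x 2 = 4.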

A directory with a magic '$' file represents a test where one of the other
items is chosen randomly. For example,

 suites/foo/$
 suites/foo/a.yaml
 suites/foo/b.yaml
 suites/foo/c.yaml

is a single test.  It will be either a.yaml, b.yaml or c.yaml.  This can be
used in conjunction with the '%' file in other directories to run a series of
tests without causing an unwanted increase in the total number of jobs run.
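
A small sketch of the '$' rule, assuming the directory entries are given as a
list of names.  The function pick_one and its rng parameter are hypothetical
and only illustrate the idea; how teuthology seeds or performs the selection
is not described here.

```python
import random


def pick_one(entries, rng=random):
    """Choose one item at random from a directory containing '$'.

    `rng` may be the random module or a random.Random instance; passing
    a seeded instance makes the choice reproducible.
    """
    if "$" not in entries:
        raise ValueError("directory has no '$' marker file")
    items = sorted(e for e in entries if e != "$")
    return rng.choice(items)


chosen = pick_one(["$", "a.yaml", "b.yaml", "c.yaml"], random.Random(0))
print(chosen)
```

Because the directory yields one test instead of three, using it as one
dimension of a '%' matrix leaves the job count unchanged, where a plain
subdirectory with the same three files would multiply it by three.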

Symlinks are okay.

The teuthology code can be found at https://github.com/ceph/teuthology.git