ceph/erasure-code at 468ad4b41010488c8d48ef65ccbebfdb4270690f - ceph

mirror of https://github.com/ceph/ceph synced 2025-01-02 17:12:31 +00:00


Josh Durgin 468ad4b410 osd/ECBackend: only check required shards when finishing recovery reads	2018-04-20 19:42:14 -04:00

`1235810c2a` allowed recovery to use multiple passes of reads to handle EIO, but the end condition for checking whether we finished reading requires the full data to be decodable (this is what get_want_to_read_shards returns). This is just a loss of efficiency normally, since when there is only one object the subsequent read works, and grabs all the data necessary.

The crash comes from having multiple objects in the same ReadOp - in this case the sequence of events is:

- start recovery of two objects (osd_recovery_max_single_start > 1)
- read object a shard 3
- read object b shard 3
- fail minimum_to_decode because shard 3 can't reconstruct all of object a
- re-read all of object a, marking more reads in progress
- fail minimum_to_decode because shard 3 can't reconstruct all of object b
- skip re-reading object because there are now reads in progress
- finish reading k shards of object a
- still fail minimum_to_decode for object b, so no extra data was read
- send_all_remaining_reads tries to lookup object b in ReadOp object
- crash dereferencing to_read[object b], since this was cleared after handling the original object b read reply

This patch fixes the immediate inefficiency and crash by only checking for the missing shards that were requested, rather than the entire object, for recovery reads.

Fixes: http://tracker.ceph.com/issues/23195 (first crash)
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
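
The difference between the two end conditions in the commit message above can be illustrated with a toy model. This is only a sketch, not Ceph's C++ code: `can_decode` is a hypothetical stand-in for a minimum_to_decode-style check, it assumes a code in which any k shards can rebuild any other shard, and the shard numbers and the value of `K` are made up for illustration.

```python
def can_decode(want, have, k):
    """Toy stand-in for a minimum_to_decode-style check (hypothetical).

    Assumption: any k available shards are enough to reconstruct every
    shard. The check succeeds when all wanted shards are already in hand,
    or when at least k shards are available.
    """
    want, have = set(want), set(have)
    return want <= have or len(have) >= k


K = 2                    # hypothetical number of data shards
FULL_OBJECT = {0, 1}     # toy "whole object" set, as a full-decode check would want
have = {3}               # only shard 3 has been read so far
requested = {3}          # hypothetical: the shards this recovery read asked for

# Old end condition: require the full object to be decodable.
full_check = can_decode(FULL_OBJECT, have, K)

# New end condition: only require the shards that were requested.
required_check = can_decode(requested, have, K)

print("full-object check passes:", full_check)
print("required-shards check passes:", required_check)
```

In this toy setup the full-object check fails with a single shard in hand while the required-shards check succeeds, which mirrors why the stricter condition kept triggering extra reads.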
test-erasure-code-plugins.sh	scripts: fix bash path in shebangs	2017-07-27 13:24:26 -06:00
test-erasure-code.sh	qa/standalone: extract delete_pool()	2018-02-28 15:40:28 +08:00
test-erasure-eio.sh	osd/ECBackend: only check required shards when finishing recovery reads	2018-04-20 19:42:14 -04:00