Discount shards that already returned EIO, and use minimum_to_decode()
to request just what is necessary to recover or read the originally
requested extents of the object.
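
A minimal sketch of the selection idea, not the ECBackend code. It assumes
a code in which any k distinct shards are enough to decode; the helper name
shards_to_read, its parameters, and the integer shard ids are all invented
for illustration and do not mirror the real minimum_to_decode() signature.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical helper. Picks the shards to read so that the wanted shards
// can be served directly or reconstructed, assuming any k shards decode.
// Shards that already returned EIO are discounted before choosing.
// Returns false when too few usable shards remain.
bool shards_to_read(const std::set<int>& want,       // shards holding the requested extents
                    const std::set<int>& available,  // shards we could ask
                    const std::set<int>& errored,    // shards that already returned EIO
                    std::size_t k,
                    std::set<int>* to_read)
{
  assert(k > 0 && to_read != nullptr);
  to_read->clear();

  // Discount the shards that already failed.
  std::set<int> usable;
  for (int s : available)
    if (errored.count(s) == 0)
      usable.insert(s);

  // If every wanted shard is still usable, request exactly those.
  bool direct = true;
  for (int s : want)
    if (usable.count(s) == 0) { direct = false; break; }
  if (direct) {
    *to_read = want;
    return true;
  }

  // Otherwise we have to decode, which needs k usable shards.
  if (usable.size() < k)
    return false;

  // Prefer wanted shards that survive, then top up to k with the rest.
  for (int s : want)
    if (usable.count(s) != 0 && to_read->size() < k)
      to_read->insert(s);
  for (int s : usable) {
    if (to_read->size() >= k)
      break;
    to_read->insert(s);
  }
  return true;
}
```

With want = {0, 1}, six available shards, and k = 4, the sketch asks for
only {0, 1} while both are healthy, and for four shards once shard 1 has
returned EIO.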
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
Added const references to various function parameters to avoid
copying data unnecessarily and to improve performance
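
To illustrate the effect, here is a small sketch with a made-up type
(Tracked) that counts its own copies. Neither the type nor the two
functions come from the patch; they only contrast pass-by-value with
pass-by-const-reference.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type that records how many times it has been copied.
struct Tracked {
  static int copies;
  std::vector<char> payload;
  explicit Tracked(std::size_t n) : payload(n) {}
  Tracked(const Tracked& o) : payload(o.payload) { ++copies; }
};
int Tracked::copies = 0;

// Passing by value copies the whole payload on every call with an lvalue.
std::size_t size_by_value(Tracked t) { return t.payload.size(); }

// Passing by const reference reads the caller's object in place.
std::size_t size_by_cref(const Tracked& t) { return t.payload.size(); }
```

Calling size_by_value() with an existing object bumps Tracked::copies by
one; calling size_by_cref() leaves it unchanged.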
Signed-off-by: Wilson E. Alvarez <wilson.e.alvarez1@gmail.com>
This removes a ton of tracking for ReplicatedBackend. ECBackend needs
to keep most of it so that it can track in-flight applies on legacy
peer OSDs. We can remove this post-nautilus.
Signed-off-by: Sage Weil <sage@redhat.com>
When we call handle_sub_write after a write completion, we may
do a sync read completion and then call back into check_ops(). Attaching
the on_write events to the op we're applying means that we don't ensure
that the on_write event(s) happen before the next write in the queue
is submitted (when we call back into check_ops()).
For example, if we have op A, on_write event W, then op B, a sync
applied completion would mean that we would queue the write for A, call
back into SubWriteApplied -> handle_sub_write_reply -> check_ops and then
process B... before getting to W.
Resolve this by attaching the on_write callback to a separate Op that is
placed into the queue, just like any other Op. This keeps the ordering
logic clean, although it is a bit ugly with the polymorphism around Op
being either an Op or an on_write callback.
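
A toy model of the ordering argument, assuming a single FIFO queue. The
Op and Pipeline types below are simplified stand-ins invented for this
sketch, and sub_write_applied() only imitates a synchronous completion
that re-enters check_ops(); none of it is the actual ECBackend code.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical queue entry: either a write (on_write empty) or a bare
// on_write callback (on_write set, name unused).
struct Op {
  std::string name;
  std::function<void()> on_write;
};

struct Pipeline {
  std::deque<Op> waiting;
  std::vector<std::string> trace;  // order in which entries actually ran

  void queue_write(std::string name) {
    waiting.push_back(Op{std::move(name), nullptr});
  }
  void queue_on_write(std::function<void()> cb) {
    waiting.push_back(Op{std::string(), std::move(cb)});
  }

  // Imitates a synchronous applied completion calling back into check_ops().
  void sub_write_applied() { check_ops(); }

  // Process entries strictly in queue order. Because the callback is its
  // own entry, a re-entrant call reaches it before any later write.
  void check_ops() {
    while (!waiting.empty()) {
      Op op = std::move(waiting.front());
      waiting.pop_front();
      if (op.on_write) {
        op.on_write();
      } else {
        trace.push_back("write " + op.name);
        sub_write_applied();  // re-enters check_ops() before we return
      }
    }
  }
};
```

Queuing A, then an on_write callback W, then B, and calling check_ops()
once runs them as A, W, B even though the completion for A re-enters
check_ops() immediately.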
Signed-off-by: Sage Weil <sage@redhat.com>
with new decode, minimum_to_decode in ErasureCodeInterface.
Updated ECBackend, ECUtil to use the new functions. Fixed the test cases
to use the new functions.
Fixed the review comments.
Authors: Myna, Elita.
Signed-off-by: Myna Vajha <mynaramana@gmail.com>
If we know early in be_large_omap_check() that none of the scrub maps
have errors, we can return without doing a lot of unnecessary
work.
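
The shape of the early return can be sketched as follows. ShardScrubResult
and large_omap_check_sketch() are hypothetical names, and the per-object
work is reduced to a counter; the real function operates on scrub maps
with a different layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical summary of one shard's scrub map.
struct ShardScrubResult {
  bool has_errors;
  std::vector<std::string> objects;
};

// Returns how many objects were examined. When no map reports an error,
// bail out before walking any objects.
std::size_t large_omap_check_sketch(const std::vector<ShardScrubResult>& maps)
{
  const bool any_errors =
      std::any_of(maps.begin(), maps.end(),
                  [](const ShardScrubResult& m) { return m.has_errors; });
  if (!any_errors)
    return 0;  // nothing to report, skip the per-object walk

  std::size_t examined = 0;
  for (const auto& m : maps)
    examined += m.objects.size();  // stand-in for the expensive work
  return examined;
}
```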
Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
Deletes are the same for EC and replicated pools, so add logic for
handling MOSDPGRecoveryDelete[Reply] to the base PGBackend
class.
Within PrimaryLogPG, add parallel paths for starting deletes,
recover_missing() and prep_object_replica_deletes(), and update the
local and global recovery callbacks to deal with lacking an
ObjectContext after a delete has been performed.
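
A rough sketch of the class layout this implies: the base class handles
the delete messages itself and hands everything else to the subclass.
MsgType, BackendSketch, and the two derived classes are illustrative
names only, not the real PGBackend interface.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical message kinds for this sketch.
enum class MsgType { RecoveryDelete, RecoveryDeleteReply, Other };

class BackendSketch {
public:
  virtual ~BackendSketch() = default;

  // Shared entry point: deletes are handled once here for every pool type.
  bool handle_message(MsgType t) {
    switch (t) {
    case MsgType::RecoveryDelete:
      log.push_back("base: recovery delete");
      return true;
    case MsgType::RecoveryDeleteReply:
      log.push_back("base: recovery delete reply");
      return true;
    default:
      return handle_backend_message(t);  // pool-specific handling
    }
  }

  std::vector<std::string> log;

protected:
  virtual bool handle_backend_message(MsgType t) = 0;
};

class ReplicatedSketch : public BackendSketch {
protected:
  bool handle_backend_message(MsgType) override {
    log.push_back("replicated: other");
    return true;
  }
};

class ECSketch : public BackendSketch {
protected:
  bool handle_backend_message(MsgType) override {
    log.push_back("ec: other");
    return true;
  }
};
```

Both subclasses take the same path for delete messages and differ only
in how they handle the remaining message types.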
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
It's been confusing for a long time that EC pools are implemented by
ReplicatedPG. What PG/ReplicatedPG really implement is the concept
of a PG where consistency is managed by the primary via a log.
Signed-off-by: Samuel Just <sjust@redhat.com>
Implements the rmw pipeline and integrates the cache.
HashInfo now maintains a projected size for use during the planning
phase of the pipeline.
(Doesn't build without subsequent patches, not worth stubbing out
the interfaces)
Signed-off-by: Samuel Just <sjust@redhat.com>
trim_rollback_to was not a terrible name before, in that all
it ever did was (possibly) trim the stashed version of the
object. Now, however, it is going to encompass, in general,
the roll_forward part of a tpc (which will still be to
delete the stashed object in cases where that is
appropriate).
Signed-off-by: Samuel Just <sjust@redhat.com>
This patch removes ReplicatedBackend::PGTransaction and implementations
and switches over all users. Happily, do_osd_ops loses the mod_desc
cruft and OpContext::pending_attrs. PGTransaction doesn't really
have a natural way to implement append, however. In reality, I think
this is probably an improvement, but it does mean that copy_from's
final transaction is now filled in by a lambda rather than by
appending a transaction fragment.
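
The lambda approach can be sketched roughly like this. Transaction,
Filler, and finish_copy() are simplified, hypothetical stand-ins and do
not reflect the real PGTransaction or copy_from interfaces.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical transaction with no way to append another transaction.
struct Transaction {
  std::vector<std::string> ops;
  void write(const std::string& what) { ops.push_back("write " + what); }
  void setattr(const std::string& key) { ops.push_back("setattr " + key); }
};

// The caller supplies the remaining steps as a callback instead of a
// prebuilt fragment to be appended.
using Filler = std::function<void(Transaction&)>;

Transaction finish_copy(const Filler& fill)
{
  Transaction t;
  t.write("object data");  // steps the copy path always performs
  if (fill)
    fill(t);               // caller-provided steps, applied in place
  return t;
}
```

A caller would pass something like
[](Transaction& t) { t.setattr("user.key"); } and receive a single
transaction holding both the fixed and the caller-supplied operations.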
Signed-off-by: Samuel Just <sjust@redhat.com>