After removing the last snapshot linked to a parent image,
don't clear the CLONE_CHILD op feature bit if the image HEAD
is still linked to the parent.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The onreadable completions go through a finisher; add a final event
in that stream that keeps the PG alive while prior events flush.
flush() isn't quite sufficient since it doesn't wait for the finisher
events to flush too--only for the actual apply to have happened.
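
A minimal sketch of the idea, not the actual Ceph code: it assumes a toy
single-threaded FIFO finisher and uses std::shared_ptr in place of the
real PG reference type. The names Finisher, PG and queue_keepalive are
hypothetical stand-ins. The final queued event owns a reference, so the
object outlives every event queued before it:

```cpp
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Stand-in for a PG; only its lifetime matters in this sketch.
struct PG { int id; };

// Toy finisher: runs queued callbacks strictly in FIFO order.
class Finisher {
  std::deque<std::function<void()>> q_;
public:
  void queue(std::function<void()> fn) { q_.push_back(std::move(fn)); }
  void drain() {
    while (!q_.empty()) {
      auto fn = std::move(q_.front());
      q_.pop_front();
      fn();
    }  // fn (and anything it captured) is destroyed here
  }
};

// Queue a final no-op event that holds a reference to the PG. Because the
// queue is FIFO, that reference is dropped only after all earlier events
// have run.
inline void queue_keepalive(Finisher& f, std::shared_ptr<PG> pg) {
  f.queue([pg]() { (void)pg; });
}
```

A caller would queue its onreadable completions first, then call
queue_keepalive(), and only then drop its own reference.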
Signed-off-by: Sage Weil <sage@redhat.com>
Note that we don't need to worry about the internal get_omap_iterator
callers (e.g., omap_rmkeyrange) because the apply thread does these
ops sequentially and in order.
Signed-off-by: Sage Weil <sage@redhat.com>
- keep mapper around for duration of import
- flush in-flight requests before tearing it down. This is necessary
because the mapper still uses onreadable.
Signed-off-by: Sage Weil <sage@redhat.com>
This removes a ton of tracking for ReplicatedBackend. ECBackend needs
to keep most of it so that it can track in-flight applies on legacy
peer OSDs. We can remove this post-nautilus.
Signed-off-by: Sage Weil <sage@redhat.com>
PrimaryLogPG calls it synchronously, on its own, after
submit_transaction. That means the backends no longer need to
track it or call back to it.
Signed-off-by: Sage Weil <sage@redhat.com>
This is no longer needed. FileStore was the only backend doing async
applies, and it now blocks until the apply completes on its own.
Signed-off-by: Sage Weil <sage@redhat.com>
bluestore and memstore are the only backends to implement
open_collection, and both of them can issue a handle immediately
after queue_transaction. Do that!
Signed-off-by: Sage Weil <sage@redhat.com>
Prevent a collection delete + recreate sequence from allowing two
conflicting OpSequencers for the same collection to exist as this
can lead to racing async apply threads.
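
One way to get that property, sketched under assumptions: keep a single
sequencer per collection id in a registry, so a delete followed by a
recreate looks up the same object instead of making a second one. The
SequencerRegistry class and the simplified OpSequencer struct below are
hypothetical stand-ins, not the types in the tree:

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Simplified stand-in for a per-collection sequencer.
struct OpSequencer { std::string cid; };

// Hypothetical registry: at most one sequencer per collection id.
class SequencerRegistry {
  std::mutex lock_;
  std::map<std::string, std::shared_ptr<OpSequencer>> by_cid_;
public:
  // Return the existing sequencer for cid, creating it only if none exists.
  std::shared_ptr<OpSequencer> get_or_create(const std::string& cid) {
    std::lock_guard<std::mutex> l(lock_);
    auto& slot = by_cid_[cid];
    if (!slot)
      slot = std::make_shared<OpSequencer>(OpSequencer{cid});
    return slot;
  }
};
```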
Signed-off-by: Sage Weil <sage@redhat.com>
Note that this is *slight* overkill in that a *source* object of a clone
will also appear in the applying map, even though it is not being
modified. Given that those clone operations are normally coupled with
another transaction that does write (which is why we are cloning in the
first place) this should not make any difference.
Signed-off-by: Sage Weil <sage@redhat.com>
mon,osd: do not use crush_device_class file to initialize class for new osds
Reviewed-by: Alfredo Deza <adeza@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Andrew Schoen <aschoen@redhat.com>
We have several API functions that allow the caller to request I/Os
larger than INT_MAX bytes, but that return an int. Ensure that we don't
try to do more I/O than we can represent in the return value.
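
A minimal sketch of such a cap (the helper name clamp_io_len is
hypothetical and not taken from the patch):

```cpp
#include <algorithm>
#include <climits>
#include <cstdint>

// Hypothetical helper: cap a requested I/O length so that the number of
// bytes transferred always fits in an int return value.
inline uint64_t clamp_io_len(uint64_t requested) {
  return std::min<uint64_t>(requested, static_cast<uint64_t>(INT_MAX));
}
```

A request larger than INT_MAX then becomes a short read or write, which
callers of these APIs already have to handle.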
Tracker: http://tracker.ceph.com/issues/22948
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Break the core of _preadv_pwritev out into a function that takes a Fh.
Make _preadv_pwritev into a wrapper around that.
Then add in plumbing for ceph_ll_readv and ceph_ll_writev.
Tracker: http://tracker.ceph.com/issues/22948
Signed-off-by: Jeff Layton <jlayton@redhat.com>
In the read codepath, bl->length() returns an unsigned value, and that
could end up looking negative when cast to int. On the write side,
totalwritten is a uint64_t, which could look negative when cast to int.
Have the underlying layers pass back an int64_t and convert them to
int at a higher level. This prepares the underlying infrastructure for
ceph_ll_readv and ceph_ll_writev support.
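
A rough sketch of the layering, assuming hypothetical helpers
low_level_read and narrow_result (neither is a real Client method): the
lower layer reports byte counts as int64_t, and the narrowing to int
happens in one place, after the request has been capped:

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical lower layer: returns the byte count as int64_t so a large
// unsigned length is never reinterpreted as a negative int. Negative
// values stay reserved for error codes.
inline int64_t low_level_read(uint64_t available, uint64_t wanted) {
  uint64_t n = wanted < available ? wanted : available;
  if (n > static_cast<uint64_t>(INT64_MAX))
    n = static_cast<uint64_t>(INT64_MAX);
  return static_cast<int64_t>(n);
}

// Hypothetical higher layer: narrows to int. Assumes the caller already
// capped the request at INT_MAX, so the value is known to fit.
inline int narrow_result(int64_t r) {
  assert(r <= INT_MAX && r >= INT_MIN);
  return static_cast<int>(r);
}
```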
Tracker: http://tracker.ceph.com/issues/22948
Signed-off-by: Jeff Layton <jlayton@redhat.com>