If a PG has recnetly split and has invalid stats, scrub it now, even if
it has scrubbed recently. This helps the stats become valid again soon.
Fixes: #8147
Signed-off-by: Sage Weil <sage@inktank.com>
Anchor table was the main user of MDCache::discover_ino(), it has
been removed. MDCache::discover_path() can replace discover_ino()
in remaining places. This patch removes discover ino related code.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Under rbd usage, if a volume has tens of thousands of objects and each 4M
object only has several KB(run fio on this volume or other cases), this volume
will be very low performance during a long time after create snapshot on
this volume. The OSD will be busy with large bandwidth read/write although
the object actually has few bytes needed to be copied.
This commit try to use fiemap if backend fs support, it can skip unnecessary
range to write. It also can be beneficial to space effective, because the copied
object will be regard as snapshot object which is access infrequently.
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Added a new command line parameter (-i or --image=) that allows rbd-fuse...
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
This is useful for qemu to guarantee live migration with caching is
safe, by invalidating the cache on the destination before starting it.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
This will only happen when shrinking or rolling back an image is done
while other I/O is in flight to the same ImageCtx. This is unsafe, so
return an error before performing the resize or rollback.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
the mount directory. The purpose of this is to allow a single RBD to be "mounted" in userspace without opening (and locking)
the other RBDs in the pool.
This is accomplished by performing a case-sensitive string compare in enumerate_images() when an image name is
specified on the command line. If no image name is specified, all images appear in the mount directory. If a non-existent
image name is specified, the mount directory is empty.
Signed-off-by: Stephen F Taylor <steveftaylor@gmail.com>
Any information stashed in the OpContext may be obsolete by the time we
actually mark the object clean. Instead, let the start_flush caller
clean up its OpContext and in try_flush_mark_clean we'll create a new
one. The primary reason to keep the OpContext would have been locking,
but we can set the obc as blocking without holding an OpContext, and
that would allow trimming to happen in the mean time (which is good
since trim_object does not respect rw locks since it doesn't change user
visible state). In try_flush_mark_clean, we requeue the fop->op along
with (but ahead of) the fop->dup_ops.
Fixes: #8068
Signed-off-by: Samuel Just <sam.just@inktank.com>
Fixes: #8202
This fixes the radosgw side of issue #8202. Needed to cast value
to unsigned char, otherwise it'd get padded.
Backport: dumpling
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
...so that they pass when they get unlucky with thrashing.
This will vastly decrease the probability of failure, but failure will
always be possible when a timeout is in place.
Fixes: #8193
Signed-off-by: Sage Weil <sage@inktank.com>
We need to ensure that even with pool snaps, we use the snapc provided in order
to ensure that the clones are written back correctly.
Fixes: #7941
Signed-off-by: Samuel Just <sam.just@inktank.com>
get_min_avail_to_read_shards might return an error if there are
no longer enough sources to reconstruct the missing shards.
This is possible if osds went down while we were writing the
previous chunk -- we already notice in check_recovery_sources
if a source goes down during a read.
Fixes: #8161
Signed-off-by: Samuel Just <sam.just@inktank.com>