When a scrub is requested, flag it and move it to the front of the
scrub schedule instead of immediately queuing it. This avoids
bypassing the scrub reservation framework, which can lead to a heavier
impact on performance.
Signed-off-by: Sage Weil <sage@inktank.com>
This is more often the case than not, and we don't have a good way to
magically know what size of cluster the user will be creating. Better to
err on the side of doing the right thing for more people.
Fixes: #3785
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Check for existence of /sys/bus/rbd first to avoid unnecessary calls
Fixes: #3784
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
When we map/unmap devices, udev gets called to manage device nodes;
this will allow the command to wait for those manipulations to complete,
particularly for test runs, so that the device tree is stable by the
time the command exits.
--no-settle is also provided to avoid this behavior if desired (say,
for a series of 'map' commands, perhaps the user wants to wait for
settling only on the last of the series).
Fixes: #3635
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
map_cache.cached_lb() provides us with a lower bound across
all pgs for in-use osdmaps. We cannot trim past this since
those maps are still in use.
backport: bobtail
Fixes: #3770
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
This setting was intended to prevent recovery from overwhelming peering traffic
by delaying the recovery_wq until osd_recovery_delay_start seconds after pgs
stop being added to it. This should be less necessary now that recovery
messages are sent with strictly lower priority then peering messages.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Gregory Farnum <greg@inktank.com>
The previous logic was both complicated and not correct. Consequently,
we have been tending to drop snapcollection links in some cases. This
has resulted in clones incorrectly not being trimmed. This patch
replaces the logic with something less efficient but hopefully a bit
clearer.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
The new get_cluster_stats() method on the rados.Rados object calls
the rados_cluster_stat() function in the librados library.
Signed-off-by: Christopher Glass <christopher.glass@canonical.com>
Implement aio stat and also export this functionality to the C API.
Signed-off-by: Filippos Giannakos <philipgian@grnet.gr>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
If we encounter a scrub without a preceeding head, warn instead of
crashing. Note that this is still something we can't repair.
See #3705.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
If the lock class isn't present, EOPNOTSUPP is returned for lock calls
on newer OSDs, but sadly EIO on older; we need to treat both as
acceptable failures for RBD images. rados lock list will still fail.
Fixes#3744.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Otherwise, if the acting set does not change, the pg might
not show up as degraded if the pool size now exceeds the
acting set size.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Document that ceph_get_stripe_unit_granularity may return an error code
(e.g. -ENOTCONN). The interface requires a mount, but currently we
return a compile-time constant. Other error codes may be possible in the
future.
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>