The previous 'osd scrub min interval' was effectively meaningless.
Meanwhile, the 'osd scrub max interval' would only trigger a scrub if the
load was sufficiently low; if it was high, the PG might *never* scrub.
Instead, give the 'min' the meaning the max used to have: if it has
been more than this many seconds and the load is low, scrub. Add a
second condition: if it has been more than the max threshold, scrub
the PG regardless of the load.
Note that this does not change the default scrub interval for less-loaded
clusters, but it *does* change the meaning of existing config options.
Fixes: #3786
Signed-off-by: Sage Weil <sage@inktank.com>
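A minimal sketch of the resulting check (names and types here are
illustrative, not taken from the patch):

    // Hedged sketch of the new scheduling rule.
    bool should_scrub(double secs_since_scrub, double loadavg,
                      double min_interval, double max_interval,
                      double load_threshold)
    {
      if (secs_since_scrub > max_interval)
        return true;            // overdue: scrub regardless of load
      if (secs_since_scrub > min_interval && loadavg < load_threshold)
        return true;            // due, and the load is low enough
      return false;             // otherwise, wait
    }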
This was already a no-op: we don't call PG::scrub_sched() unless it has
been osd_scrub_max_interval seconds since we last scrubbed. Unless we
explicitly requested it, in which case we don't want this check anyway.
Signed-off-by: Sage Weil <sage@inktank.com>
When a scrub is requested, flag it and move it to the front of the
scrub schedule instead of queueing it immediately. This keeps the
request inside the scrub reservation framework; bypassing that
framework can have a heavier impact on performance.
Signed-off-by: Sage Weil <sage@inktank.com>
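As a self-contained sketch of the idea (the schedule here is modeled
as a sorted set of (last-scrub stamp, pg id) pairs; the real
structures differ):

    #include <set>
    #include <utility>

    using ScrubSchedule = std::set<std::pair<double, int>>;

    // Hedged sketch: flag the request and re-register the PG with a
    // zero stamp so it sorts to the front of the schedule while still
    // going through the normal reservation machinery.
    void request_scrub(ScrubSchedule &sched, double old_stamp, int pgid,
                       bool &must_scrub)
    {
      must_scrub = true;
      sched.erase({old_stamp, pgid});
      sched.insert({0.0, pgid});
    }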
This is more often the case than not, and we don't have a good way to
magically know what size of cluster the user will be creating. Better to
err on the side of doing the right thing for more people.
Fixes: #3785
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Check for the existence of /sys/bus/rbd first to avoid unnecessary calls.
Fixes: #3784
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
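A sketch of the check (assuming a plain access(2) probe):

    #include <unistd.h>

    // Skip kernel rbd queries entirely when the sysfs interface is
    // absent (i.e. the rbd module is not loaded).
    static bool have_sysfs_rbd()
    {
      return access("/sys/bus/rbd", F_OK) == 0;
    }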
When we map/unmap devices, udev is invoked to manage the device nodes;
this change makes the command wait for those manipulations to complete,
particularly for test runs, so that the device tree is stable by the
time the command exits.
--no-settle is also provided to skip this behavior if desired (say,
when issuing a series of 'map' commands and waiting for settling only
after the last of the series).
Fixes: #3635
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
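One plausible implementation of the wait, as a sketch (assuming the
command shells out to udevadm; the actual mechanism may differ):

    #include <stdlib.h>

    // Block until udev has drained its event queue, so device nodes
    // created or removed by map/unmap are in place before we exit.
    static int wait_for_udev(void)
    {
      return system("udevadm settle");
    }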
map_cache.cached_lb() provides us with a lower bound across
all pgs for in-use osdmaps. We cannot trim past this since
those maps are still in use.
backport: bobtail
Fixes: #3770
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
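The trim target is therefore clamped, roughly like this (names are
illustrative):

    #include <algorithm>

    // Never trim past the lowest epoch any PG still has in use.
    unsigned safe_trim_to(unsigned wanted, unsigned cached_lb)
    {
      return std::min(wanted, cached_lb);
    }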
This setting was intended to prevent recovery from overwhelming peering traffic
by delaying the recovery_wq until osd_recovery_delay_start seconds after pgs
stop being added to it. This should be less necessary now that recovery
messages are sent with strictly lower priority than peering messages.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Gregory Farnum <greg@inktank.com>
The previous logic was both complicated and incorrect. Consequently,
we have tended to drop snap collection links in some cases. This
has resulted in clones incorrectly not being trimmed. This patch
replaces the logic with something less efficient but hopefully a bit
clearer.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
The new get_cluster_stats() method on the rados.Rados object calls
the rados_cluster_stat() function in the librados library.
Signed-off-by: Christopher Glass <christopher.glass@canonical.com>
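For reference, the underlying C call looks like this ('cluster' is
assumed to be an already-connected rados_t handle):

    #include <rados/librados.h>
    #include <stdio.h>

    void print_cluster_stats(rados_t cluster)
    {
      struct rados_cluster_stat_t st;
      if (rados_cluster_stat(cluster, &st) == 0)
        printf("kb=%llu used=%llu avail=%llu objects=%llu\n",
               (unsigned long long)st.kb,
               (unsigned long long)st.kb_used,
               (unsigned long long)st.kb_avail,
               (unsigned long long)st.num_objects);
    }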
Implement aio stat and also export this functionality to the C API.
Signed-off-by: Filippos Giannakos <philipgian@grnet.gr>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
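A small usage sketch of the C-API entry point ('io' is assumed to be
an open rados_ioctx_t; the synchronous wait is only for illustration):

    #include <rados/librados.h>
    #include <stdint.h>
    #include <time.h>

    int stat_object_async(rados_ioctx_t io, const char *oid,
                          uint64_t *psize, time_t *pmtime)
    {
      rados_completion_t c;
      int r = rados_aio_create_completion(NULL, NULL, NULL, &c);
      if (r < 0)
        return r;
      // Size and mtime are filled in once the completion fires.
      r = rados_aio_stat(io, oid, c, psize, pmtime);
      if (r == 0) {
        rados_aio_wait_for_complete(c);
        r = rados_aio_get_return_value(c);
      }
      rados_aio_release(c);
      return r;
    }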
If scrub encounters a clone without a preceding head, warn instead of
crashing. Note that this is still something we can't repair.
See #3705.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
If the lock class isn't present, EOPNOTSUPP is returned for lock calls
on newer OSDs, but sadly EIO on older ones; we need to treat both as
acceptable failures for RBD images. 'rados lock list' will still fail.
Fixes: #3744
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
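The acceptance test reduces to something like this sketch:

    #include <cerrno>

    // Both codes mean "lock class unavailable" here; anything else is
    // a real failure and is propagated.
    bool lock_error_acceptable(int r)
    {
      return r == -EOPNOTSUPP || r == -EIO;
    }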