This setting was intended to prevent recovery from overwhelming peering traffic
by delaying the recovery_wq until osd_recovery_delay_start seconds after pgs
stop being added to it. This should be less necessary now that recovery
messages are sent with strictly lower priority then peering messages.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Gregory Farnum <greg@inktank.com>
The previous logic was both complicated and not correct. Consequently,
we have been tending to drop snapcollection links in some cases. This
has resulted in clones incorrectly not being trimmed. This patch
replaces the logic with something less efficient but hopefully a bit
clearer.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
The new get_cluster_stats() method on the rados.Rados object calls
the rados_cluster_stat() function in the librados library.
Signed-off-by: Christopher Glass <christopher.glass@canonical.com>
Implement aio stat and also export this functionality to the C API.
Signed-off-by: Filippos Giannakos <philipgian@grnet.gr>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
If we encounter a scrub without a preceeding head, warn instead of
crashing. Note that this is still something we can't repair.
See #3705.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
If the lock class isn't present, EOPNOTSUPP is returned for lock calls
on newer OSDs, but sadly EIO on older; we need to treat both as
acceptable failures for RBD images. rados lock list will still fail.
Fixes#3744.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Otherwise, if the acting set does not change, the pg might
not show up as degraded if the pool size now exceeds the
acting set size.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Document that ceph_get_stripe_unit_granularity may return an error code
(e.g. -ENOTCONN). The interface requires a mount, but currently we
return a compile-time constant. Other error codes may be possible in the
future.
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
We cannot trust the Message bufferlists or other structures to be
stable without pipe_lock, as another Pipe may claim and modify the sent
list items while we are writing to the socket.
Related to #3678.
Signed-off-by: Sage Weil <sage@inktank.com>
Fill out the Message header, footer, and calculate CRCs during
encoding, not write_message(). This removes most modifications from
Pipe::write_message().
Signed-off-by: Sage Weil <sage@inktank.com>
The call to check_linger_pool_dne() may unregister the linger request,
invalidating the iterator. To avoid this, increment the iterator at
the top of the loop.
This mirror the fix in 4bf9078286 for
regular non-linger ops.
Fixes: #3734
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
To not conflict with future linuxbox pull for nfs-ganesha.
Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
This modifies bufferlists in the Message struct, and it is possible
for multiple instances of the Pipe to get references on the Message;
make sure they don't modify those bufferlists concurrently.
Signed-off-by: Sage Weil <sage@inktank.com>
Associate a sending message with the connection inside the pipe_lock.
This way if a racing thread tries to steal these messages it will
be sure to reset the con point *after* we do such that it the con
pointer is valid in encode_payload() (and later).
This may be part of #3678.
Signed-off-by: Sage Weil <sage@inktank.com>