Kill a set of parallel methods that are using the old addr/inst-based
msgr APIs, and instead use Connection handles. This is much safer and gets
us closer to killing the old msgr API.
Signed-off-by: Sage Weil <sage@inktank.com>
Previously we were sending these maps to dead osds via their old addrs
using a new outgoing connection and setting the flags so that the msgr
would clean up. That mechanism is possibly buggy and fragile, and we can
avoid it entirely if we just reuse the existing heartbeat Connection.
Signed-off-by: Sage Weil <sage@inktank.com>
If we receive an OSDMap on the cluster connection, requeue it for the
cluster messenger, and process it there where we normally do. This avoids
any concerns about locking and ordering rules.
Signed-off-by: Sage Weil <sage@inktank.com>
This reverts commit 44dca5c8c5.
The allowance is not only added for btrfs as of commit
e639254a0c5f8e3528fa8f2b2b451296653556bc, which makes us happy
for both non-btrfs (lower latency) and btrfs (better small io
throughput, no big stall during commit).
Signed-off-by: Sage Weil <sage@inktank.com>
We only need to adjust up the op queue limits during commit for btrfs,
because the snapshot initiation (async create) is currently
high-latency and the op queue is quiesced during that period.
This lets us revert 44dca5c, which disabled the extra allowance because
it is generally bad for non-btrfs writeahead mode.
Signed-off-by: Sage Weil <sage@inktank.com>
Otherwise we try to read the whole object in one go, which doesn't bode
well for large objects (either non-optimal or simply broken).
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Each completed operation in the transaction proves thread
liveness, a stuck thread should still trigger the timeouts.
Fixes: #3928
Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Implement _process overload with TPHandle argument and use
that to ping the hb map between pgs and between map epochs
when advancing a pg. The thread will still timeout if
genuinely stuck at any point.
Fixes: 3905
Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Some HTTP servers, notabily lighttp, do not set SCRIPT_URI, make the fallback
string configurable.
Signed-off-by: caleb miles <caleb.miles@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Previously, we started scanning omap after omap_recovered_to.
This is a problem since the break in the loop implies that
omap_recovered_to is the first key not recovered.
Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
This is a very easy way for users to accidentally to a *lot* of damage.
Make it an annoying manual process to actually do this.
Signed-off-by: Sage Weil <sage@inktank.com>
Allow admin to artificially induce a stall in the op queue. Forces the
thread(s) to sleep for N seconds. We pause for 1 second increments and
recheck the value so that a previously stalled thread can be unwedged by
reinjecting a lower value (or 0). To stall indefinitely, just injust
very large number.
Signed-off-by: Sage Weil <sage@inktank.com>
This probably doesn't strictly matter because start_boot doesn't need the
lock (currently) and few other threads should be running, but it is
better to be consistent.
Signed-off-by: Sage Weil <sage@inktank.com>
If we find that our internal threads are stalled, do not reply to ping
requests. If we do this long enough, peers will mark us down. If we are
only transiently unhealthy, we will reply to the next ping and they will
be satisfied. If we are unhealthy and marked down, and eventually recover,
we will mark ourselves back up.
Signed-off-by: Sage Weil <sage@inktank.com>
If the thread stalls for 15 seconds, let our internal heartbeat fail.
This will let us internally respond more quickly to a stalled or failing
disk.
Signed-off-by: Sage Weil <sage@inktank.com>
Add ScrubMap encode/decode v4 message with omap digest
Compute digest of header and key/value. Use bufferlist
to reflect structure and compute as we go, clearing
bufferlist to reduce memory usage.
Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>