This reverts commit 44dca5c8c5.
The allowance is now only added for btrfs as of commit
e639254a0c5f8e3528fa8f2b2b451296653556bc, which makes us happy
for both non-btrfs (lower latency) and btrfs (better small io
throughput, no big stall during commit).
Signed-off-by: Sage Weil <sage@inktank.com>
We only need to adjust up the op queue limits during commit for btrfs,
because the snapshot initiation (async create) is currently
high-latency and the op queue is quiesced during that period.
This lets us revert 44dca5c, which disabled the extra allowance because
it is generally bad for non-btrfs writeahead mode.
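
The policy above can be sketched as follows. This is an illustration only,
written in Python for brevity; the names `max_ops`, `committing_max_ops`,
`is_btrfs` and `committing` are hypothetical and are not the actual FileStore
option names.

```python
def op_queue_limit(max_ops: int, committing_max_ops: int,
                   is_btrfs: bool, committing: bool) -> int:
    """Return the op queue limit to enforce right now.

    Assumption: the extra allowance is a simple additive bump. It is
    granted only while a commit is in progress on btrfs; every other
    combination gets the base limit.
    """
    if is_btrfs and committing:
        return max_ops + committing_max_ops
    return max_ops
```

With a base limit of 50 and an allowance of 500, only the btrfs-during-commit
case sees 550; non-btrfs writeahead mode always stays at 50.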
Signed-off-by: Sage Weil <sage@inktank.com>
Otherwise we try to read the whole object in one go, which doesn't bode
well for large objects (either non-optimal or simply broken).
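
A minimal sketch of reading in bounded chunks instead of one large read.
It is written in Python against a generic file-like object; the helper name
`read_in_chunks` and the 4 MiB default are assumptions for illustration, not
values taken from this change.

```python
import io
from typing import BinaryIO, Iterator


def read_in_chunks(obj: BinaryIO,
                   chunk_size: int = 4 * 1024 * 1024) -> Iterator[bytes]:
    """Yield the object's contents in pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = obj.read(chunk_size)
        if not chunk:
            # An empty read means we reached the end of the object.
            break
        yield chunk
```

Memory use is bounded by `chunk_size` regardless of how large the object is.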
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Each completed operation in the transaction proves thread
liveness; a stuck thread should still trigger the timeouts.
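
The idea can be sketched with a simulated clock. This is a hypothetical
model in Python, not the OSD's heartbeat map: `Watchdog` and
`apply_transaction` are illustrative names, and op durations are passed in
rather than measured.

```python
from typing import Iterable


class Watchdog:
    """Tracks a deadline that the worker must keep pushing forward."""

    def __init__(self, grace: float) -> None:
        self.grace = grace
        self.deadline = grace

    def reset(self, now: float) -> None:
        # Proof of liveness: move the deadline one grace period ahead.
        self.deadline = now + self.grace

    def expired(self, now: float) -> bool:
        return now > self.deadline


def apply_transaction(op_durations: Iterable[float],
                      watchdog: Watchdog) -> bool:
    """Simulate a transaction; return False if any single op times out."""
    now = 0.0
    watchdog.reset(now)
    for duration in op_durations:
        now += duration              # the op finishes at this time
        if watchdog.expired(now):
            return False             # this one op exceeded the grace
        watchdog.reset(now)          # completed op proves liveness
    return True
```

A long transaction made of many short ops passes, while a single op that
outlasts the grace period still trips the timeout.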
Fixes: #3928
Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Implement _process overload with TPHandle argument and use
that to ping the hb map between pgs and between map epochs
when advancing a pg. The thread will still time out if
genuinely stuck at any point.
Fixes: #3905
Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Some HTTP servers, notably lighttpd, do not set SCRIPT_URI; make the fallback
string configurable.
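
A minimal sketch of the lookup, in Python. The function name `script_uri`
and the `default` parameter are hypothetical; the real option name is not
stated in this message.

```python
from typing import Mapping


def script_uri(env: Mapping[str, str], default: str) -> str:
    """Return SCRIPT_URI from the request environment.

    If the web server did not set it, fall back to the configured
    default instead of a hard-coded string.
    """
    return env.get("SCRIPT_URI", default)
```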
Signed-off-by: caleb miles <caleb.miles@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Previously, we started scanning omap after omap_recovered_to.
This is a problem since the break in the loop implies that
omap_recovered_to is the first key not recovered.
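
The off-by-one can be illustrated with a sorted key list. This is a sketch
in Python under the assumption that keys are recovered in sorted order in
bounded batches; `recover_batch` and `max_keys` are hypothetical names.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def recover_batch(sorted_keys: List[str], omap_recovered_to: str,
                  max_keys: int) -> Tuple[List[str], Optional[str]]:
    """Recover up to max_keys keys starting AT omap_recovered_to.

    Returns the batch and the new omap_recovered_to, which is the first
    key NOT yet recovered (None when everything has been recovered).
    """
    # bisect_left makes the start inclusive. Starting strictly after
    # omap_recovered_to would skip the first unrecovered key.
    start = bisect_left(sorted_keys, omap_recovered_to)
    end = start + max_keys
    batch = sorted_keys[start:end]
    next_key = sorted_keys[end] if end < len(sorted_keys) else None
    return batch, next_key
```

Chaining batches this way visits every key exactly once, because each batch
begins at the key the previous batch stopped short of.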
Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
This is a very easy way for users to accidentally do a *lot* of damage.
Make it an annoying manual process to actually do this.
Signed-off-by: Sage Weil <sage@inktank.com>
Allow admin to artificially induce a stall in the op queue. Forces the
thread(s) to sleep for N seconds. We pause in 1-second increments and
recheck the value so that a previously stalled thread can be unwedged by
reinjecting a lower value (or 0). To stall indefinitely, just inject a
very large number.
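
The recheck loop can be sketched as follows. This is an illustrative Python
model, not the actual implementation: `run_injected_stall` and
`get_stall_seconds` are hypothetical names, and the sleep function is a
parameter so the loop can be exercised without really waiting.

```python
import time
from typing import Callable


def run_injected_stall(get_stall_seconds: Callable[[], int],
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Stall in 1-second steps; return how many seconds were slept.

    The configured value is re-read before every step, so lowering it
    (or setting it to 0) releases a thread that is already stalled.
    """
    slept = 0
    while slept < get_stall_seconds():
        sleep(1)
        slept += 1
    return slept
```

Because the target is consulted on every iteration, a stall injected with a
very large value ends within about one second of the value being reset.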
Signed-off-by: Sage Weil <sage@inktank.com>
This probably doesn't strictly matter because start_boot doesn't need the
lock (currently) and few other threads should be running, but it is
better to be consistent.
Signed-off-by: Sage Weil <sage@inktank.com>
If we find that our internal threads are stalled, do not reply to ping
requests. If we do this long enough, peers will mark us down. If we are
only transiently unhealthy, we will reply to the next ping and they will
be satisfied. If we are unhealthy and marked down, and eventually recover,
we will mark ourselves back up.
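
A minimal sketch of both sides of this exchange, in Python. The message
shape, `handle_ping`, `peer_marks_down` and the `grace` parameter are all
assumptions made for illustration.

```python
from typing import Optional


def handle_ping(internally_healthy: bool, stamp: float) -> Optional[dict]:
    """Reply to a ping only when our internal threads are healthy."""
    if not internally_healthy:
        # Stay silent. A transient problem costs one missed reply;
        # a persistent one lets peers time us out and mark us down.
        return None
    return {"type": "ping_reply", "stamp": stamp}


def peer_marks_down(last_reply_at: float, now: float, grace: float) -> bool:
    """Peer-side view: report us down once replies stop for too long."""
    return now - last_reply_at > grace
```

A single skipped reply stays inside the peer's grace window, so only a
sustained stall leads to being marked down.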
Signed-off-by: Sage Weil <sage@inktank.com>
If the thread stalls for 15 seconds, let our internal heartbeat fail.
This will let us internally respond more quickly to a stalled or failing
disk.
Signed-off-by: Sage Weil <sage@inktank.com>
Add ScrubMap encode/decode v4 message with omap digest
Compute a digest of the header and the key/value pairs. Use a bufferlist
to reflect structure and compute as we go, clearing the
bufferlist to reduce memory usage.
Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>