We have the con handy; use it. This avoids generating a spurious RESET
event, which we neither need nor do anything useful with. Note that in this
case we are not attaching anything to the Connection priv field.
Signed-off-by: Sage Weil <sage@inktank.com>
If we get a reset during shutdown, we should still break the cycle to avoid
tripping the valgrind leak detection. Note that we are touching no
internal Monitor state here and the locking has not changed.
Signed-off-by: Sage Weil <sage@inktank.com>
Document these in the interface, not the implementation; having two copies
clutters the header and invites them to get out of sync.
Signed-off-by: Sage Weil <sage@inktank.com>
If the caller is marking down an addr, they presumably don't have the
Connection* handy, so we should generate a reset event to help them
clean up con <-> session ref cycles.
Signed-off-by: Sage Weil <sage@inktank.com>
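Taken together with the mark_down(con) change above, the intended asymmetry
might be sketched like this (illustrative C++ only, not the actual
SimpleMessenger code; Connection, conns, and deliver_reset are stand-ins):

    #include <functional>
    #include <map>
    #include <string>

    struct Connection { std::string peer_addr; };

    std::map<std::string, Connection*> conns;        // addr -> open connection
    std::function<void(Connection*)> deliver_reset;  // stand-in for the RESET upcall

    void mark_down(Connection* con) {
      // the caller already holds the Connection* and can break its own
      // con <-> session ref cycle itself, so no RESET event is generated
      conns.erase(con->peer_addr);
    }

    void mark_down(const std::string& addr) {
      auto it = conns.find(addr);
      if (it == conns.end())
        return;
      Connection* con = it->second;
      conns.erase(it);
      deliver_reset(con);  // caller lacks the Connection*; help it clean up
    }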
It's possible for us to simply be very slow getting the reply to the
first op or performing the second op, resulting in a legitimately
successful lock. If we do get a success, assert that at least the lock
duration has passed, to avoid any false positives.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
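The shape of the check, as a minimal self-contained sketch (hypothetical
names; the real test drives rados lock operations):

    #include <cassert>
    #include <chrono>
    #include <thread>

    // stand-in for the second lock attempt in the test
    static bool try_relock() { return true; }

    int main() {
      const auto lock_duration = std::chrono::seconds(5);
      const auto start = std::chrono::steady_clock::now();

      std::this_thread::sleep_for(lock_duration);  // simulate being very slow

      if (try_relock()) {
        // success is only legitimate if enough time has passed for the
        // first lock to have expired; anything faster is a real failure
        assert(std::chrono::steady_clock::now() - start >= lock_duration);
      }
      return 0;
    }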
In the event of a split or collection rename, we need to ensure that
we don't replay any operations on objects within those collections
prior to that point. Thus, we mark a global replay guard on the
collection after doing a syncfs and make sure to check that in
_check_replay_guard() for all object operations.
Fixes: #5154
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
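A heavily simplified sketch of the guard idea (hypothetical names and
types; the real FileStore logic also persists the guard and handles
in-progress flags):

    #include <cstdint>
    #include <map>
    #include <string>

    // collection -> op_seq at which its contents were made durable via syncfs
    static std::map<std::string, uint64_t> global_replay_guard;

    void set_global_replay_guard(const std::string& coll, uint64_t op_seq) {
      // called after syncfs(), so everything up to op_seq is on disk
      global_replay_guard[coll] = op_seq;
    }

    // returns false if a replayed object operation should be skipped
    bool check_replay_guard(const std::string& coll, uint64_t replay_op_seq) {
      auto it = global_replay_guard.find(coll);
      if (it != global_replay_guard.end() && replay_op_seq <= it->second)
        return false;  // op predates the guard; already reflected on disk
      return true;
    }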
We shouldn't hold the pipe_lock while doing the ms_verify_authorizer
upcalls.
Fix by unlocking a bit earlier, and verifying our state is still correct
in the failure path.
This regression was introduced by ecab4bb951.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
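The general pattern of the fix, sketched with std::mutex (illustrative
only; the real code lives in Pipe::accept() and uses pipe_lock):

    #include <mutex>

    enum class State { ACCEPTING, CLOSED };

    static std::mutex pipe_lock;
    static State state = State::ACCEPTING;

    // stand-in for the (possibly blocking) ms_verify_authorizer upcall
    static bool verify_authorizer_upcall() { return true; }

    bool accept_step() {
      std::unique_lock<std::mutex> l(pipe_lock);
      // ... copy out whatever the upcall needs while still locked ...
      l.unlock();                      // never hold pipe_lock across the upcall
      bool ok = verify_authorizer_upcall();
      l.lock();
      if (state != State::ACCEPTING)   // someone closed us while unlocked
        return false;
      return ok;
    }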
This is what e213b1bc25 intended to do
but managed to bungle by using >= instead of >.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
osd: include op queue age histogram in osd_stat_t
Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
rgw/rgw_rados.cc: In member function 'virtual int RGWPutObjProcessor_Atomic::handle_data(ceph::bufferlist&, off_t, void**)':
rgw/rgw_rados.cc:648:5: warning: parameter 'ofs' set but not used [-Wunused-but-set-parameter]
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
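For reference, the pattern gcc is complaining about, in miniature
(illustrative only; not the rgw code):

    // 'ofs' is written but its new value is never read afterwards
    int handle_data(int ofs) {
      ofs += 42;  // dead store: either use ofs below or drop the assignment
      return 0;
    }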
Currently we see slow request warnings go by in the cluster log, but they
are not reflected by 'ceph health'. Use the new op queue histograms to
raise a flag there as well.
For example:
HEALTH_WARN 59 requests are blocked > 32 sec; 2 osds have slow requests
21 ops are blocked > 65.536 sec
38 ops are blocked > 32.768 sec
16 ops are blocked > 65.536 sec on osd.1
23 ops are blocked > 32.768 sec on osd.1
5 ops are blocked > 65.536 sec on osd.2
15 ops are blocked > 32.768 sec on osd.2
2 osds have slow requests
Fixes: #5505
Signed-off-by: Sage Weil <sage@inktank.com>
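Roughly how such counts can be derived from the histograms (a sketch under
assumed bucket semantics, not the actual monitor code). Note that 32.768
and 65.536 sec in the output are the 2^15 ms and 2^16 ms bucket boundaries:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // hypothetical per-OSD histogram: bucket i counts ops whose queue age
    // in milliseconds falls in [2^i, 2^(i+1))
    uint64_t ops_blocked_longer_than(const std::vector<uint64_t>& h,
                                     uint64_t threshold_ms) {
      uint64_t n = 0;
      for (size_t i = 0; i < h.size(); ++i)
        if ((1ULL << i) >= threshold_ms)  // whole bucket lies past threshold
          n += h[i];
      return n;
    }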
This adds a simple power-of-2 histogram of op ages in the op queue to
osd_stat_t. It can be used for a coarse view of overall cluster
performance (the histograms get summed by the mon), to identify specific
outlier osds with higher latency than the others, or to identify stuck ops.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
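A sketch of the power-of-2 idea (the real implementation is Ceph's
pow2_hist_t; this compacted version assumes millisecond ages):

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct Pow2Hist {
      std::vector<uint64_t> h;

      void add(uint64_t v) {
        int b = v ? 63 - __builtin_clzll(v) : 0;  // floor(log2(v))
        if (static_cast<size_t>(b) >= h.size())
          h.resize(b + 1, 0);
        h[b]++;
      }

      void add_hist(const Pow2Hist& o) {  // mon-side aggregation: sum buckets
        if (o.h.size() > h.size())
          h.resize(o.h.size(), 0);
        for (size_t i = 0; i < o.h.size(); ++i)
          h[i] += o.h[i];
      }
    };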
Once we start serving reads, stray objects must have already
been removed. Therefore, we have to flush all operations
up to the transaction writing out the authoritative log.
On replicas, we flush in Stray() if we will not eventually
be activated and in ReplicaActive if we are in the acting
set. This way a replica won't serve a replica read until
the store is consistent.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
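The underlying pattern, reduced to an illustrative gate (not the real PG
state machine code):

    #include <functional>
    #include <queue>
    #include <utility>

    // reads that arrive before the store is known-consistent are deferred
    struct ReplicaReadGate {
      bool flushed = false;
      std::queue<std::function<void()>> waiting;

      void on_read(std::function<void()> read) {
        if (flushed)
          read();                         // store is consistent; serve now
        else
          waiting.push(std::move(read));  // defer until the flush completes
      }

      void on_flushed() {
        // ops up to the authoritative-log transaction are now stable
        flushed = true;
        while (!waiting.empty()) {
          waiting.front()();
          waiting.pop();
        }
      }
    };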
Stray objects should have been cleaned up in the merge_log
transactions. Only on the primary have those operations
necessarily been flushed at activate().
Fixes: #5084
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Previously we did not bother with locking for accept() because we were
not visible to any other threads. However, we need to close accepting
Pipes from mark_down_all(), which means we need to handle interference.
Fix up the locking so that we hold pipe_lock when looking at Pipe state
and verify that we are still in the ACCEPTING state any time we retake
the lock.
Signed-off-by: Sage Weil <sage@inktank.com>
New pipes exist in a sort of limbo before we know who the peer is and
add them to rank_pipe. Keep a list of them in accepting_pipes for that
period.
Signed-off-by: Sage Weil <sage@inktank.com>
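The bookkeeping, sketched (simplified; the real list lives in
SimpleMessenger and is protected by its lock):

    #include <memory>
    #include <mutex>
    #include <set>

    struct Pipe {};  // placeholder for the real Pipe

    static std::mutex msgr_lock;
    static std::set<std::shared_ptr<Pipe>> accepting_pipes;

    void note_accepting(const std::shared_ptr<Pipe>& p) {
      std::lock_guard<std::mutex> l(msgr_lock);
      accepting_pipes.insert(p);  // in limbo until the peer identifies itself
    }

    void promote_to_rank_pipe(const std::shared_ptr<Pipe>& p) {
      std::lock_guard<std::mutex> l(msgr_lock);
      accepting_pipes.erase(p);   // now registered in rank_pipe instead
      // ... insert into rank_pipe keyed by peer addr ...
    }

    void mark_down_all() {
      std::lock_guard<std::mutex> l(msgr_lock);
      // close each accepting pipe (closing elided in this sketch),
      // then drop our refs
      accepting_pipes.clear();
      // ... same for everything registered in rank_pipe ...
    }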
We can have a situation where:
- we have a pipe to a peer
- pipe goes to standby (on peer)
- we rebind to a new port
- ....
- we rebind again to the same old port
- we connect to peer
and get reattached to the ancient pipe from two instances back. Avoid that
by picking a new nonce each time we rebind.
Add 1,000,000 each time so that the port is still legible in the printed
output.
Signed-off-by: Sage Weil <sage@inktank.com>
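The nonce trick itself is tiny; as described above:

    #include <cstdint>

    static uint64_t nonce = 6789;  // starts out carrying the port/pid digits

    void on_rebind() {
      // 6789 -> 1006789 -> 2006789 -> ...: unique per rebind, while the
      // low digits still show the original value in printed output
      nonce += 1000000;
    }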
If we are shutting down all old connections and binding to new ports,
we want to avoid a sequence like:
- close all previous connections
- new connection comes in on old port
- rebind to new ports
-> connection from old port leaks through
As a first step, close all connections after we shut down the old
accepter and before we start the new one.
Signed-off-by: Sage Weil <sage@inktank.com>
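The resulting ordering, sketched with stub helpers (hypothetical names):

    // stubs standing in for the real messenger operations
    static void stop_accepter() {}
    static void mark_down_all() {}
    static void bind_new_ports() {}
    static void start_accepter() {}

    void rebind() {
      stop_accepter();    // nothing new can arrive on the old port
      mark_down_all();    // close every previously accepted connection
      bind_new_ports();
      start_accepter();   // only now accept on the new ports
    }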
We need to maintain the invariant that the sub-queues in out_q are never
empty. Fix discard_requeued_up_to() so that it does not create an empty
sub-queue as a side effect of looking one up.
This bug leads to an incorrect reconnect attempt when
- we accept a pipe (lossless peer)
- they send some stuff, maybe
- fault
- we initiate reconnect, even though we have nothing queued
In particular, we shouldn't reconnect because we aren't checking for
resets, and the fact that our out_seq is 0 while the peer's might be
something else entirely will trigger asserts later.
This fixes at least one source of #5626, and possibly #5517.
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
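The C++ footgun at the heart of this, sketched with simplified types
(the real out_q maps priority to a list of Messages):

    #include <cstdint>
    #include <list>
    #include <map>
    #include <string>

    // priority -> queued messages; invariant: no sub-queue is ever empty,
    // so out_q.empty() really means "nothing to send"
    static std::map<int, std::list<std::string>> out_q;

    void discard_requeued_up_to(int prio, uint64_t seq) {
      auto it = out_q.find(prio);  // NOT out_q[prio]: operator[] would insert
      if (it == out_q.end())       // an empty sub-queue and make a faulted
        return;                    // pipe think it has work to reconnect for
      // ... drop messages up to seq from it->second ...
      if (it->second.empty())
        out_q.erase(it);           // preserve the invariant
    }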
We were returning success without waiting if the pending pool state had
the snap.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
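The pattern the fix restores, in an illustrative form (not the actual
OSDMonitor code; the real version waits on the paxos proposal):

    #include <functional>
    #include <set>
    #include <utility>
    #include <vector>

    static std::set<int> committed_snaps, pending_snaps;
    static std::vector<std::function<void()>> waiting_for_commit;

    void handle_create_snap(int snapid, std::function<void()> reply_success) {
      if (committed_snaps.count(snapid)) {
        reply_success();  // already durable: safe to reply immediately
        return;
      }
      pending_snaps.insert(snapid);
      // even if the snap was already pending, do not reply yet: wait for
      // the proposal to commit (replying early was the bug)
      waiting_for_commit.push_back(std::move(reply_success));
    }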
A monc/mon connection fault or the dup command test flag may mean an extra
osd id is created that isn't actually up; reorder the test so that this
doesn't screw up 'osd ls'.
Signed-off-by: Sage Weil <sage@inktank.com>
This bug prevented resurrection of ancestor pgs where
necessary.
Fixes: #5269
This may result in pg A being created just before pg B
is resurrected and split into A and B, resulting in one
or the other operation getting an EEXIST.
Backport: cuttlefish
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Monitor commands need to be idempotent. This helps us test that by
simply issuing every successful command a second time, so that we notice
when a dup submission fails.
Signed-off-by: Sage Weil <sage@inktank.com>
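One way the dup check can be wired up (hypothetical helper; the real flag
lives in the test harness):

    #include <cassert>
    #include <string>

    // stand-in for actually submitting a command to the monitor
    static int run_mon_command(const std::string& cmd) { (void)cmd; return 0; }

    void run_with_dup_check(const std::string& cmd) {
      if (run_mon_command(cmd) == 0) {
        // an idempotent command must also succeed when submitted again
        assert(run_mon_command(cmd) == 0);
      }
    }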