missing_loc/missing_loc_sources must also be cleaned up
if a peer goes down during peering:
1) pg is in GetInfo, acting is [3,1]
2) we find object A on osd [0] in GetInfo
3) osd 0 goes down; there is no new peering interval since it is neither
up nor acting, but peer_missing[0] is removed.
4) pg goes active and tries to pull A from osd 0, since missing_loc did
not get cleaned up.
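The cleanup described above can be sketched as follows. This is a minimal illustration, not the actual Ceph code: `PeeringState`, `forget_peer`, and the string-keyed maps are simplified stand-ins for the real types.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical sketch: when a peer goes away, both peer_missing and the
// per-object source sets in missing_loc must forget it, or the PG may
// later try to pull an object from a dead OSD.
struct PeeringState {
  std::map<int, std::set<std::string>> peer_missing;  // osd -> objects it is missing
  std::map<std::string, std::set<int>> missing_loc;   // object -> osds known to hold it

  void forget_peer(int osd) {
    peer_missing.erase(osd);
    for (auto it = missing_loc.begin(); it != missing_loc.end(); ) {
      it->second.erase(osd);
      if (it->second.empty())
        it = missing_loc.erase(it);  // no known source left for this object
      else
        ++it;
    }
  }
};
```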
Backport: bobtail
Fixes: #4371
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
No further refs to the object can remain at this point.
Furthermore, the callbacks might lock mutexes of their
own.
Backport: bobtail
Fixes: #4378
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
If there is a fault while delivering the message, close the con. This will
clean up the Session state from memory. If the client doesn't get the
CLOSED message, they will reconnect (from their perspective, it is still
a lossless connection) and get a remote_reset event telling them that the
session is gone. The client code already handles this case properly.
Note that way back in 4ac45200f1 we removed
this because the client would reuse the same connection when it reopened
the session. Now the client never does that; it will mark_down the con
as soon as it is closed and open a new one for a new session... which means
the MDS will get a remote_reset and close out the old session.
Signed-off-by: Sage Weil <sage@inktank.com>
For every message handler, look up the MetaSession by int mds and verify
that the Connection* matches properly. If so, proceed; otherwise, discard
the message.
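The check described above can be sketched roughly as below. This is an illustrative skeleton, not the real Client code: `Client`, `MetaSession`, and the counters are simplified stand-ins.

```cpp
#include <cassert>
#include <map>

// Hypothetical sketch: on each incoming message, look up the MetaSession
// for the mds rank and discard the message unless the Connection it
// arrived on matches the session's connection.
struct Connection {};
struct MetaSession { Connection *con = nullptr; };

struct Client {
  std::map<int, MetaSession> mds_sessions;
  int handled = 0, dropped = 0;

  void handle_message(int mds, Connection *con) {
    auto it = mds_sessions.find(mds);
    if (it == mds_sessions.end() || it->second.con != con) {
      ++dropped;  // unknown session or stale connection: discard
      return;
    }
    ++handled;    // connection matches the session: proceed
  }
};
```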
In the future, we probably want to link the MetaSession to the Connection's
priv field, but that can come later.
Clean up a bunch of submethods that take int mds while we're here.
Signed-off-by: Sage Weil <sage@inktank.com>
Resending the request in the reply handler is a bit fugly and throws a
small wrench into moving to a MetaSession*-based approach. Check for
the case(s) where we *do* return ESTALE explicitly and fall through.
Otherwise, kick the caller and let them retry.
Signed-off-by: Sage Weil <sage@inktank.com>
This ensures we don't stall out waiting for a lock state to change.
This fixes ~4-5 second stalls easily reproducible and visible with
ceph-fuse and 'dbench 1'.
Signed-off-by: Sage Weil <sage@inktank.com>
The previous kludge where a waiting_for_session key indicated that we
had an open in progress was... kludgey.
Introduce some helpers to do the session creation/open.
Move the waiting list to be a session member, and clean up associated
code.
Signed-off-by: Sage Weil <sage@inktank.com>
This is mostly just shuffling argument types around. In a few cases we
now assert that the session actually exists; previously these cases would
either have been problematic (e.g., calling get_inst() on bad addrs) or
were silently ignored bugs.
Signed-off-by: Sage Weil <sage@inktank.com>
Bump the max before we run out of IDs to allocate. This avoids a stall in
authentication every N new clients.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Luis <joao.luis@inktank.com>
Simplify the logic a bit so it is easier to follow.
Small behavior change: we will successfully allocate and return a gid that
== the max when we can't bump it.
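The allocation behavior described above can be sketched as follows. This is a toy model under assumed names (`GidAllocator`, `can_bump`, `increment`), not the monitor's actual code; the point is that the max is bumped ahead of demand and that a gid equal to the max is still handed out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: hand out the next global_id, bumping the max
// before we actually run out so new clients do not stall waiting for
// the new max to commit.
struct GidAllocator {
  uint64_t next_gid = 1;
  uint64_t max_gid = 4096;
  uint64_t increment = 4096;
  bool can_bump = true;  // e.g. only the leader may propose a new max

  uint64_t allocate() {
    // bump early, before the pool is exhausted
    if (can_bump && next_gid + increment / 2 > max_gid)
      max_gid += increment;
    if (next_gid > max_gid)
      return 0;          // out of ids; caller must wait
    return next_gid++;   // returning a gid == max_gid is allowed
  }
};
```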
Signed-off-by: Sage Weil <sage@inktank.com>
This only happens on the Leader and leads to duplicate global_ids.
Fixes: #4285
Signed-off-by: Joao Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
When proposing an older value learned during recovery, we don't create
a queued proposal -- we go straight through Paxos. Therefore, when
finishing a proposal, we must be sure that we have a proposal in the queue
before dereferencing it, otherwise we will segfault.
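The guard described above amounts to something like the sketch below. This is an illustrative simplification, not the actual Paxos code: `ProposalQueue` and `finish_proposal` are assumed names standing in for the real queued-proposal machinery.

```cpp
#include <cassert>
#include <functional>
#include <list>

// Hypothetical sketch: a value learned during recovery is proposed
// straight through Paxos, so when a proposal finishes the pending queue
// may be empty; check before dereferencing the front.
struct ProposalQueue {
  std::list<std::function<void()>> pending;

  void finish_proposal() {
    if (pending.empty())
      return;              // proposed directly through Paxos; nothing queued
    auto cb = pending.front();
    pending.pop_front();
    cb();
  }
};
```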
Fixes: #4250
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Fixes: #4425
Backport: bobtail
Apparently, libcurl needs that in order to be thread safe. The side
effect is that if libcurl is not compiled with c-ares support, domain
name lookups will not time out.
The issue affected keystone.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>