We shouldn't look at Pipe::state in SimpleMessenger::_lookup_pipe() without
holding pipe_lock. Instead, use an atomic that we set to non-zero only
when transitioning to the terminal STATE_CLOSED state.
Signed-off-by: Sage Weil <sage@inktank.com>
Exercise some rare races by injecting delays before taking locks
via the 'ms inject internal delays' option.
Signed-off-by: Sage Weil <sage@inktank.com>
When a pipe is faulting and shutting down, we have to drop pipe_lock to
take msgr lock and then remove the entry. The Pipe in this case will
have STATE_CLOSED. Handle this case in all places we do a lookup on
the rank_pipe hash so that we effectively ignore entries that are
CLOSED.
This fixes a race introduced by the previous commit where we won't use
the CLOSED pipe and try to register a new one, but the old one is still
registered.
See bug #3675.
Signed-off-by: Sage Weil <sage@inktank.com>
If we have a con that refs a pipe but it is closed, don't use it. If
the ref is still there, it is only because we are racing with fault()
and it is about to (or just was) be detached. Either way,
Signed-off-by: Sage Weil <sage@inktank.com>
Remove the special-case check, which does not inform the peer what
protocol features are missing. It also enforces this requirement even
when we negotiate auth none.
Reported as part of bug #3657.
Signed-off-by: Sage Weil <sage@inktank.com>
This won't bite us for a while yet (we're on bit 26), but it will soon!
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
If we are failing a pipe, flush the incoming messages before we try to
reconnect. Similarly, flush queued messages on an existing pipe beore we
replace it. This ensures that when we get a socket failure and reconnect
the delayed messages are handled in the normal fashion.
Specifically, it fixes a situation like:
- read msg, update in_seq etc.
- delay msg
- pipe faults
- peer reconnects, we replace existing pipe, discard delayed msgs
- peer resends msgs
- we discard, because they are < in_seq
Signed-off-by: Sage Weil <sage@inktank.com>
- move all delay state into a single class
- create thread once and only once per Pipe
- adjust debug levels
- discard messages at the appropriate times
Signed-off-by: Sage Weil <sage@inktank.com>
Its life-cycle matches that of delay_queue, and the delayed_delivery
function respects it. For now queue_received is just setting it to
delay everything by 1 second.
Signed-off-by: Greg Farnum <greg@inktank.com>
After some brief thought, I believe deleting any messages in the
delay queue is correct -- we are trying to simulate line delays
in delivery and so anything still in the queue has supposedly
not arrived yet. So delete them when we stop the Pipe for
any reason.
Signed-off-by: Greg Farnum <greg@inktank.com>
The Pipe doesn't know the peer type in the constructor. It
doesn't always know in start_reader either, so this needs more work,
but at least it knows more frequently than it did.
Signed-off-by: Greg Farnum <greg@inktank.com>
When ms_inject_delay_type matches that of the incoming Connection,
the Pipe sets up a delay queue that it shuttles all Messages through.
This lets us check cleanup and some notification code but doesn't
actually generate any delays.
Signed-off-by: Greg Farnum <greg@inktank.com>
We need to drop the Message ref() here; the msgr owns one ref
independent of those from the intrusive_ptr's in the queue itself.
Signed-off-by: Sage Weil <sage@inktank.com>
It is possible for rebind() to fail, in which case the OSD will go through
it's shutdown procedure and call stop(). This is simpler than trying to
avoid calling stop() when rebind() fails.
Fixes: #3504
Signed-off-by: Sage Weil <sage@inktank.com>
Start the accepter thread when the first dispatcher is ready. This ensures
that there will be someone around to verify authorizers for incoming
connections, and means we have a bit less failure noise on the monitors
as a result.
Signed-off-by: Sage Weil <sage@inktank.com>
The kernel client expects seq #'s to start at 1 or else it is unhappy.
So, only randomize these values if the MSG_AUTH feature is present--that is
the only time it matters anyway.
Signed-off-by: Sage Weil <sage@inktank.com>
The fault() call in connect should not set onread=true since connect is
effectively a write path. This was forcing the writer() into a tight
loop that repeatedly would call connect(); not very polite.
Changing that, we want to avoid treating this as a normal fault (with the
failure callback) and instead back off.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Cast to (unsigned long) when checking for magic values, so
real ptrs don't get sign-extended. Avoids triggering
assert(inq == &local_queue) failure.
Fixes: #3251
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage.weil@inktank.com>
CID 717049: Big parameter passed by value (PASS_BY_VALUE)
At (1): Passing parameter bind_addr of type entity_addr_t (size 136 bytes) by value.
Signed-off-by: Sage Weil <sage@inktank.com>
CID 717442: Other violation (CHECKED_RETURN)
At (10): Calling function "getsockname(this->listen_sd, (sockaddr *)listen_addr.ss_addr(), &llen)" without checking return value. This library function may fail and return an error code.
At (11): No check of the return value of "getsockname(this->listen_sd, (sockaddr *)listen_addr.ss_addr(), &llen)".
Signed-off-by: Sage Weil <sage@inktank.com>
Previously, a new osd would be bombarded by backfills from many osds
simultaneously, resulting in excessively high load. Instead, we
want to limit the number of backfills coming into and going out
from a single osd.
To that end, each OSDService now has two AsyncReserver instances: one
for backfills going from the osd (local_reserver) and one for backfills
going to the osd (remote_reserver). For a primary to initiate a
backfill, it must first obtain a reservation from its own
local_reserver. Then, it must obtain a reservation from the backfill
target's remote_reserver via a MBackfillReserve message. This process is
managed by substates of Active and ReplicaActive (see the changes in
PG.h). The reservations are dropped either on the Backfilled event,
which is sent on the primary before calling recovery_complete and on the
replica on receipt of the BackfillComplete progress message), or upon
leaving Active or ReplicaActive.
It's important that we always grab the local reservation before the
remote reservation in order to prevent a circular dependency.
Signed-off-by: Samuel Just <sam.just@inktank.com>
We want to avoid a race like:
- entry() starts, populates pfd with listen_sd, gets past !done check
- stop() does shutdown + close on listen_sd
- someone else opens a new fd
- entry() thread calls poll(2) on wrong sd
- stop() calls join, waits forever for entry thread
Signed-off-by: Sage Weil <sage@inktank.com>
Do not special case failure during connect. In particular, we may be
reconnecting and experience a second fault, and wipe out our session
(e.g., between the fs client and the mds) and destroy important session
state.
This logic dates back to the original patch in '08 when the standby
state was introduced.
Bug: #3070
Signed-off-by: Sage Weil <sage@inktank.com>
lcok isn't held during dispatch, so we can take it unconditionally. THis
also makes coverity happier:
CID 716966: Data race condition (MISSING_LOCK)
At (4): Accessing "this->stop" ("DispatchQueue.stop") requires the "Mutex._m" lock.
Signed-off-by: Sage Weil <sage@inktank.com>
cct may be null
CID 716930: Dereference after null check (FORWARD_NULL)
At (11): Dereferencing null pointer "cct".
Signed-off-by: Sage Weil <sage@inktank.com>
CID 717048: Big parameter passed by value (PASS_BY_VALUE)
At (1): Passing parameter a of type entity_addr_t (size 136 bytes) by value.
Signed-off-by: Sage Weil <sage@inktank.com>
This was suggested by Greg too but I was too lazy.
CID 717331: Uninitialized scalar field (UNINIT_CTOR)
At (2): Non-static class member "nonce" is not initialized in this constructor nor in any functions that it calls.
Signed-off-by: Sage Weil <sage@inktank.com>
CID 717023: Out-of-bounds access (OVERRUN_DYNAMIC)
At (4): Allocating insufficient memory for the terminating null of the string.
This appears to be a false positive (we don't interpret the buffer as a
string, ever), but it will make coverity happier.
Signed-off-by: Sage Weil <sage@inktank.com>
CCID 716856: Other violation (CHECKED_RETURN)
At (7): Calling function "setsockopt(this->listen_sd, 1, 2, &on, 4U)" without checking return value. This library function may fail and return an error code.
Signed-off-by: Sage Weil <sage@inktank.com>
Lossless peers (osd<->osd, mds<->mds, mon<->mon) never reset sessions
to each other. In the osd and mds cases, there is no need to check for
session resets. More significantly, these checks can trigger with an
unfortunately sequence of socket failures. In particular,
- A sends connect request to B
- B accepts, increments connect_seq, then has a socket failure
before telling A
- A reconnects, stil with connect_seq == 0
- B sees connect_seq == 0 and thinks there was a reset
This warrants a closer look in the fs client <-> mds case, but for now,
in the cluster-internal communications, it is moot, since reset
detection is unnecessary.
In the monitor case: we do need to check with resets because the peers
reuse the same entity_addr_t's (nonce==0), which means that a daemon
restart is effectively a reset. In that case, use a different policy
that continues to check for resets.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
This helps correlate log output with specific tcp sessions as reported by
e.g. netstat or tcpdump or whatever.
Signed-off-by: Sage Weil <sage@inktank.com>
Add locking in set_policy_throttler.
Also, make it act on the default policy when the specified type does not
have a sepcific policy set for it.
Signed-off-by: Sage Weil <sage@inktank.com>
Introduce a policy_lock in SimpleMessenger to make this safe even after
the messenger has been started up. The user needs to be aware that
policy changes will not affect connections that are already established.
Signed-off-by: Sage Weil <sage@inktank.com>
This needs to be provided to the Accepter at bind time, not by start().
Otherwise the nonce is effectively always 0, which is useless and breaks
all sorts of things. Broken by 8453a8198c.
Signed-off-by: Sage Weil <sage@inktank.com>
We only discard outgoing messages; incoming messages are handled by the
IncomingQueue.. but this method doesn't touch that.
Signed-off-by: Sage Weil <sage@inktank.com>
Normally we never go from need_addr == false to need_addr == true.
It always starts out as true, so this else is useless on the first
call to Accepter::bind().
The only exception is rebind(). Add an unlearn_addr() that will clear
need_addr. This is almost unnecessary, but doing so fixes a small bug
where the local_connection->peer_addr doesn't get updated when we do a
rebind().
Drop now-unused set_need_addr(). We keep get_need_addr() only because
it is useful in the debug output and for the assert.
Signed-off-by: Sage Weil <sage@inktank.com>