The queue may have been previously stopped (by discard_queue()), and needs
to be restarted.
Fixes consistent failures from the mon_recovery.py integration tests.
Signed-off-by: Sage Weil <sage@inktank.com>
If the connect_seq matches, but our existing connection is in STANDBY, take
the incoming one. Otherwise, the other end will wait indefinitely for us
to connect, but we never will.
Alternatively, we could "win" the race and trigger a connection by sending
a keepalive (or similar), but that is more work; we may as well accept the
incoming connection we have now.
This removes STANDBY from the acceptable WAIT case states. It also keeps
responsibility squarely on the shoulders of the peer with something to
deliver.
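The decision described above can be sketched roughly as follows. This is a simplified model, not the actual accept() code: the enum names, the function name, and the we_win_race tie-break flag are all hypothetical.

```cpp
// Hypothetical, simplified model of the accept-side decision when the
// incoming connect_seq equals that of an existing pipe to the same peer.
// None of these names are taken from the real messenger code.
enum class ExistingState { Connecting, Open, Standby };

enum class AcceptAction { ReplaceExisting, TellPeerToWait };

// we_win_race: assumed result of whatever tie-break the messenger uses
// to decide which side's connection attempt should survive.
inline AcceptAction resolve_equal_connect_seq(ExistingState existing,
                                              bool we_win_race) {
  // A pipe in Standby is not trying to connect. Telling the peer to WAIT
  // would leave both ends idle, so take the incoming connection instead.
  if (existing == ExistingState::Standby)
    return AcceptAction::ReplaceExisting;

  // Otherwise the usual race resolution applies.
  return we_win_race ? AcceptAction::TellPeerToWait
                     : AcceptAction::ReplaceExisting;
}
```

In this model, Standby never leads to TellPeerToWait, which mirrors removing STANDBY from the acceptable WAIT case states.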
Without this patch, a 3-osd vstart cluster with
'ms inject socket failures = 100' and rados bench write -b 4096 would start
generating slow request warnings after a few minutes due to the osds
failing to connect to each other. With the patch, I complete a 10 minute
run without problems.
Signed-off-by: Sage Weil <sage@inktank.com>
If we replace an existing pipe with a new one, move the incoming queue
of messages that have not yet been dispatched over to the new Pipe so that
they are not lost.
Alternatively, we could set in_seq = existing->in_seq - existing->in_qlen,
but that would make the other end resend those messages, which is a waste
of bandwidth.
The original bug is very easy to reproduce with 'ms inject socket failures'.
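As a rough illustration of the two options, here is a toy model. ToyPipe, adopt_incoming() and rewound_in_seq() are hypothetical names, and the queue length (in_qlen above) is modeled as the size of a std::deque; carrying in_seq over unchanged is an assumption of the sketch.

```cpp
#include <cstdint>
#include <deque>
#include <string>
#include <utility>

// Toy stand-in for a pipe; not the real Pipe class.
struct ToyPipe {
  std::deque<std::string> in_q;  // received but not yet dispatched
  std::uint64_t in_seq = 0;      // highest sequence number received
};

// Option taken here: hand the undispatched messages to the replacement
// pipe, preserving their order, and keep in_seq as it was (assumed).
inline void adopt_incoming(ToyPipe& existing, ToyPipe& replacement) {
  replacement.in_seq = existing.in_seq;
  for (auto& m : existing.in_q)
    replacement.in_q.push_back(std::move(m));
  existing.in_q.clear();
}

// Rejected alternative: rewind in_seq past the undispatched messages so
// the peer resends them, at the cost of extra bandwidth.
inline std::uint64_t rewound_in_seq(const ToyPipe& existing) {
  return existing.in_seq - existing.in_q.size();
}
```

With in_seq = 10 and two queued messages, the alternative would rewind to 8 and force two resends, whereas adopting the queue keeps in_seq at 10 and resends nothing.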
Signed-off-by: Sage Weil <sage@inktank.com>
This extricates the incoming queue and its funky relationship with
DispatchQueue from Pipe and moves it into IncomingQueue. There is now a
single IncomingQueue attached to each Pipe. DispatchQueue is now no
longer tied to Pipe.
This modularizes the code a bit better (though that is still a work in
progress) and (more importantly) will make it possible to move the
incoming messages from one pipe to another in accept().
Signed-off-by: Sage Weil <sage@inktank.com>
A while ago we inadvertently broke ms_handle_connect() callbacks because
of a check for m being non-zero in the dispatch_entry() thread. Adjust the
enums so that they get delivered again.
This fixes hangs when, for example, the ceph tool sends a command, gets a
connection reset, and doesn't get the connect callback to resend after
reconnecting to a new monitor.
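A minimal sketch of how such a bug can arise, assuming event codes share a queue slot with message pointers and the connect code happened to be 0. The enum names and values below are illustrative only, and an integer slot stands in for the pointer.

```cpp
#include <cstdint>

// Stands in for a Message* queue slot; 0 models a null pointer.
using Slot = std::uintptr_t;

// Assumed "before" layout: the connect event encodes as 0.
enum BrokenCode : Slot { B_CONNECT = 0, B_BAD_REMOTE_RESET, B_BAD_RESET };

// Assumed "after" layout: every event code is non-zero.
enum FixedCode : Slot { F_CONNECT = 1, F_BAD_REMOTE_RESET, F_BAD_RESET };

// Mimics a dispatch loop that skips empty slots before looking at them.
inline bool would_be_delivered(Slot m) {
  if (!m)
    return false;  // a code of 0 is indistinguishable from "no entry"
  return true;
}
```

Under these assumptions, would_be_delivered(B_CONNECT) is false while would_be_delivered(F_CONNECT) is true, which is the effect of shifting the enum values.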
Signed-off-by: Sage Weil <sage@inktank.com>
We may replace an existing pipe in the STANDBY state if the previous
attempt failed during accept() (see previous patches).
This might fix #1378.
Signed-off-by: Sage Weil <sage@inktank.com>
If we have a con with a closed pipe, drop the message. For lossless
sessions, the state will be STANDBY if we should reconnect. For lossy
sessions, we will end up with CLOSED and we *should* drop the message.
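The rule above could be modeled like this. It is only a sketch: ConPipeState, SendAction and on_submit() are hypothetical names, and the real send path has more states than shown.

```cpp
// Hypothetical model of what to do with an outgoing message, based on
// the state of the pipe attached to the connection.
enum class ConPipeState { Open, Standby, Closed };

enum class SendAction { QueueOnPipe, ReconnectAndQueue, Drop };

inline SendAction on_submit(ConPipeState state) {
  switch (state) {
    case ConPipeState::Closed:
      // Lossy sessions end up here after a failure: drop the message.
      return SendAction::Drop;
    case ConPipeState::Standby:
      // Lossless sessions park here when a reconnect is still wanted.
      return SendAction::ReconnectAndQueue;
    case ConPipeState::Open:
      return SendAction::QueueOnPipe;
  }
  return SendAction::Drop;  // unreachable for valid states
}
```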
Signed-off-by: Sage Weil <sage@inktank.com>
If we replace an existing pipe during accept() and then fail, move to
STANDBY so that our connection state (connect_seq, etc.) is preserved.
Otherwise, we will throw out that information and falsely trigger a
RESETSESSION on the next connection attempt.
Signed-off-by: Sage Weil <sage@inktank.com>
LGPLv2 in spec file is not correct, because some of the included
packages/binaries are GPLv2. For example:
src/mount/mtab.c -> package ceph, binary mount.ceph
src/common/fiemap.cc -> package ceph, binary rbd
Also use SPDX format (http://www.spdx.org/licenses) for the sub-package
licenses.
Signed-off-by: Holger Macht <hmacht@suse.de>
This could cause us to incorrectly encode new features into the monstore
that an old mon won't understand.
This is overly conservative; we probably need to persist the set of quorum
features that are supported and use those.
Signed-off-by: Sage Weil <sage@inktank.com>
I'm still not sure about the names for the command line
operations, but they can be changed later if better ones
come up.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
If a write is smaller than some threshold, do not bother to flush it; let
the fs do that (efficiently, we hope) at commit time. Focus on the big
writes.
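A minimal sketch of the policy, assuming a simple byte threshold. The name kFlushMinBytes and its 64 KiB value are made up for illustration; the actual threshold and option name are not given here.

```cpp
#include <cstddef>

// Hypothetical threshold; the real value and its config option may differ.
constexpr std::size_t kFlushMinBytes = 64 * 1024;

// Returns true if a write of write_len bytes should be flushed right away.
// Smaller writes are left for the filesystem to flush at commit time.
inline bool should_flush_now(std::size_t write_len,
                             std::size_t threshold = kFlushMinBytes) {
  return write_len >= threshold;
}
```

With this threshold, a 4 KiB write is skipped and a 1 MiB write is flushed immediately.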
Signed-off-by: Sage Weil <sage@newdream.net>
A clone can have a prior_version after log_tail and still not have
a corresponding log entry since the prior_version would be the
head object.
Signed-off-by: Samuel Just <sam.just@inktank.com>