CephPeeringEvt is now the supertype for all peering state machine
events. This will allow us to generalize checking for stale peering
events and delaying events for future maps.
Signed-off-by: Samuel Just <sam.just@inktank.com>
The date format is now "YYYY-MM-DD[ hh:mm:ss]". Also drop the
--time parameter that existed for the old ops log code.
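
A minimal sketch of accepting the new format, where the time part is
optional. This is Python for illustration only, not the radosgw-admin
code; the helper name parse_date is hypothetical.

```python
from datetime import datetime

def parse_date(text):
    """Parse 'YYYY-MM-DD' with an optional ' hh:mm:ss' suffix (hypothetical helper)."""
    # Try the longer form first, then fall back to the date-only form.
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("expected YYYY-MM-DD[ hh:mm:ss], got %r" % text)
```

A date without a time part parses to midnight of that day.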
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Conflicts:
src/test/cli/radosgw-admin/help.t
Uses gdisk, as it seems to be the only tool that can automate GPT uuid
changes. Needs to run as root.
Adds Recommends: gdisk to ceph.deb.
Closes: #2547
Signed-off-by: Tommi Virtanen <tv@inktank.com>
Remove calls to Boost libraries for computing Chi-squared statistics and
producing discrete random variables with a given probability distribution.
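
For reference, both operations are small enough to write by hand. The
sketch below is Python for illustration and is not the replacement code
from this commit; the names sample_discrete and chi_squared are
hypothetical, and inverse-CDF sampling is an assumed technique.

```python
import bisect
import itertools
import random

def sample_discrete(probs, rng):
    """Draw index i with probability proportional to probs[i] (inverse CDF)."""
    cdf = list(itertools.accumulate(probs))
    u = rng.random() * cdf[-1]          # uniform in [0, total weight)
    # First index whose cumulative weight exceeds u; clamp guards rounding.
    return min(bisect.bisect_right(cdf, u), len(probs) - 1)

def chi_squared(observed, expected):
    """Pearson statistic: sum over bins of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

Drawing many samples, counting them per index, and passing the counts
with the expected counts to chi_squared gives the usual goodness-of-fit
statistic.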
Signed-off-by: caleb miles <caleb.miles@inktank.com>
This assert assumed that all ops submitted before MOSDRepScrub was
submitted were processed by the time that MOSDRepScrub was
processed. In fact, MOSDRepScrub's scrub_to may refer to a
last_update yet to be seen by the replica.
Bug #2693
Signed-off-by: Samuel Just <sam.just@inktank.com>
The queue may have been previously stopped (by discard_queue()), and needs
to be restarted.
Fixes consistent failures from the mon_recovery.py integration tests.
Signed-off-by: Sage Weil <sage@inktank.com>
If the connect_seq matches, but our existing connection is in STANDBY, take
the incoming one. Otherwise, the other end will wait indefinitely for us
to connect, and we never will.
Alternatively, we could "win" the race and trigger a connection by sending
a keepalive (or similar), but that is more work; we may as well accept the
incoming connection we have now.
This removes STANDBY from the acceptable WAIT case states. It also keeps
responsibility squarely on the shoulders of the peer with something to
deliver.
Without this patch, a 3-osd vstart cluster with
'ms inject socket failures = 100' and rados bench write -b 4096 would start
generating slow request warnings after a few minutes due to the osds
failing to connect to each other. With the patch, a 10-minute run
completes without problems.
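
The rule for the matching connect_seq case can be sketched as below.
This is illustrative Python, not the messenger code: the function name,
the string state names, and the incoming_wins_tiebreak flag (standing in
for whatever ordinarily decides the race) are assumptions.

```python
def resolve_equal_seq(existing_state, incoming_wins_tiebreak):
    """Decide what to do with an incoming connection whose connect_seq
    equals that of the existing one (hypothetical sketch)."""
    if existing_state == "STANDBY":
        # We have nothing to send and will not connect on our own,
        # so telling the peer to WAIT would stall it forever.
        return "REPLACE"
    if incoming_wins_tiebreak:
        return "REPLACE"
    # The existing connection is actively connecting; the peer can wait.
    return "WAIT"
```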
Signed-off-by: Sage Weil <sage@inktank.com>
If we replace an existing pipe with a new one, move the incoming queue
of messages that have not yet been dispatched over to the new Pipe so that
they are not lost.
Alternatively, we could set in_seq = existing->in_seq - existing->in_qlen,
but that would make the other end resend those messages, which is a waste
of bandwidth.
The original bug is very easy to reproduce with 'ms inject socket failures'.
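
A minimal sketch of the idea, in Python for illustration rather than the
actual Pipe code. The class layout and the replace_pipe helper are
hypothetical, and carrying in_seq over unchanged is an assumption drawn
from the alternative described above.

```python
from collections import deque

class Pipe:
    """Toy stand-in for a messenger pipe (hypothetical)."""
    def __init__(self):
        self.in_seq = 0        # sequence number of the last message received
        self.in_q = deque()    # received but not yet dispatched

    def receive(self, msg):
        self.in_seq += 1
        self.in_q.append(msg)

def replace_pipe(existing, new):
    """Hand undispatched messages to the new pipe instead of dropping them."""
    new.in_seq = existing.in_seq      # peer need not resend anything
    new.in_q.extend(existing.in_q)    # preserves arrival order
    existing.in_q.clear()
```

Because in_seq is not rolled back, the peer resends nothing, and the
queued messages are still dispatched in their original order.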
Signed-off-by: Sage Weil <sage@inktank.com>