If a message is working its way through _dispatch(), and another thread
requeues waiting messages under pg->lock (e.g.
osd->take_waiting(waiting_for_active)), the requeued ops are processed
after the one _dispatch() is chewing on, breaking client ordering.
Instead, add a new OSD::requeue_ops() that reinjects ops back into the
op queue by feeding them to the _handle_*() helpers. Those do last-minute
checks before enqueuing the ops.
Fixes: #1490 (again)
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
We remove it anyway. If it's missing entirely, just continue and roll
back to the latest snap_ when the user passes --osd-use-stale-snap.
Signed-off-by: Sage Weil <sage@newdream.net>
Static classes with constructors and destructors are dangerous. Explicitly
manage these as part of the server components (OSD, MDS).
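The hazard here is general C++ behavior: objects with static storage duration in different translation units are constructed in an unspecified relative order and destroyed at process exit, possibly while other threads still use them. A minimal sketch of the explicit-ownership alternative follows. The names Component and Server are hypothetical stand-ins, not the actual Ceph classes.

```cpp
#include <cassert>
#include <memory>

// Hypothetical component that previously might have lived as a static object.
struct Component {
  bool running = false;
  void start() { running = true; }
  void stop() { running = false; }
};

// Hypothetical server (think OSD or MDS) that owns the component and
// controls exactly when it is created and torn down.
class Server {
  std::unique_ptr<Component> component_;

public:
  // Construct the component at a well-defined point during startup.
  void init() {
    component_.reset(new Component);
    component_->start();
  }

  // Stop and destroy it at a well-defined point during shutdown,
  // rather than leaving destruction to static teardown at exit.
  void shutdown() {
    if (component_) {
      component_->stop();
      component_.reset();
    }
  }

  bool has_component() const { return component_ != nullptr; }
};
```

With this shape the component's lifetime is bounded by init() and shutdown(), so nothing depends on static initialization or destruction order.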
Fixes: #1608
Signed-off-by: Sage Weil <sage@newdream.net>
During handle_notify_timeout or ms_handle_reset, watchers are now marked
unconnected via pg->register_unconnected_watcher. A safe timer event has
been added to trigger OSD::handle_watch_timeout.
remove_watchers_and_notifies (called on role change) cleans up these
events before peering.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Pass correct path to configure (fixes SuSE builds).
Use %doc command to install sample.ceph.conf and sample.fetch_conf.
Signed-off-by: Sage Weil <sage@newdream.net>
Use date(1) codes for object name, plus %i and %n for bucket id/name, and
make UTC vs localtime configurable.
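One way such a format string could be expanded is sketched below. This is an illustration under stated assumptions, not the code from this change: render_log_object_name and its parameters are hypothetical names, %i and %n are substituted first, and everything else is handed to strftime().

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>

// Double every '%' so strftime() treats substituted text literally.
static std::string escape_percent(const std::string& s) {
  std::string out;
  for (char c : s) {
    out += c;
    if (c == '%') out += '%';
  }
  return out;
}

// Hypothetical helper: expand %i (bucket id) and %n (bucket name), then let
// strftime() handle the remaining date(1)-style codes.
std::string render_log_object_name(const std::string& fmt, std::time_t t,
                                   bool utc, const std::string& bucket_id,
                                   const std::string& bucket_name) {
  std::string expanded;
  for (std::size_t i = 0; i < fmt.size(); ++i) {
    if (fmt[i] != '%') {
      expanded += fmt[i];
    } else if (i + 1 == fmt.size()) {
      expanded += "%%";  // lone trailing '%': emit it literally
    } else {
      char code = fmt[++i];
      if (code == 'i') {
        expanded += escape_percent(bucket_id);
      } else if (code == 'n') {
        expanded += escape_percent(bucket_name);
      } else {
        expanded += '%';  // pass other codes (including "%%") to strftime()
        expanded += code;
      }
    }
  }
  // std::gmtime/std::localtime are not thread-safe; acceptable for a sketch.
  const std::tm* tm = utc ? std::gmtime(&t) : std::localtime(&t);
  if (!tm) return std::string();
  char buf[512];
  std::size_t n = std::strftime(buf, sizeof(buf), expanded.c_str(), tm);
  return std::string(buf, n);
}
```

Note that in this sketch %n means the bucket name, so strftime()'s own %n (newline) is not reachable.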
Signed-off-by: Sage Weil <sage@newdream.net>
This adds a query_epoch to notify and log messages, which are
sent in response to queries from the primary during peering. To
guarantee we don't try to process old logs and notifies after
restarting peering, query_epoch is set to the epoch at which the
query was sent. If query_epoch is less than last_peering_reset,
the primary discards the message.
Without this check, the primary hit a "bad state machine event" crash
in the following scenario:
1. Primary tells a stray to generate a backlog at epoch 199.
2. The up set changes because a stray goes up.
3. Primary restarts peering at epoch 200.
4. Stray gets new map for epoch 200, sees that the acting set did not
change, and sends log to primary.
5. Primary crashes.
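The staleness check described above can be sketched as follows. The struct and function names are hypothetical simplifications; only query_epoch and last_peering_reset come from this message.

```cpp
#include <cassert>
#include <cstdint>

using epoch_t = std::uint32_t;

// Hypothetical, simplified view of a notify or log message from a replica.
struct PeeringReply {
  epoch_t query_epoch;  // epoch at which the primary sent the query
};

// Hypothetical, simplified view of the primary's peering state.
struct PrimaryState {
  epoch_t last_peering_reset;  // epoch at which peering last restarted
};

// A reply is stale if it answers a query sent before the last peering reset.
bool should_discard(const PrimaryState& pg, const PeeringReply& m) {
  return m.query_epoch < pg.last_peering_reset;
}
```

In the scenario above, the log sent in step 4 answers a query from epoch 199 while last_peering_reset is 200, so the primary would drop it instead of feeding it to the state machine.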
Related to #1403, #1449
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
These names make more sense, since last_warm_restart was updated
outside of the warm_restart function.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>