The message encoding may depend on the target features. Clear the
payload so that the Message gets reencoded appropriately.
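Roughly, the idea (a minimal sketch with stand-in types, not the real Message
code):

  #include <cstdint>
  #include <string>

  // Minimal model of a message that caches its encoded form, assuming a
  // Ceph-like clear_payload()/encode(features) split.
  struct Msg {
    std::string payload;           // cached encoding; empty means "not encoded"
    uint64_t encoded_features = 0; // features the cache was built for

    void encode(uint64_t target_features) {
      if (!payload.empty())
        return;                    // reuse the cached encoding as-is
      // ... pick the wire format based on target_features and fill payload ...
      encoded_features = target_features;
      payload = "encoded bytes";
    }

    // Drop the cache so the next encode() regenerates the payload for the
    // peer's feature bits instead of reusing an incompatible encoding.
    void clear_payload() { payload.clear(); }
  };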
Signed-off-by: Sage Weil <sage@newdream.net>
The Incremental may have a bufferlist containing a full map; reencode
that too if we are reencoding for old clients.
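A minimal sketch of the intent, using hypothetical FullMap/Incremental
stand-ins rather than the real OSDMap types:

  #include <cstdint>
  #include <string>

  // Stand-in for a full map that knows how to encode itself for a given
  // feature set (hypothetical methods, not the real OSDMap interface).
  struct FullMap {
    void decode(const std::string& bl) { /* parse the buffer */ }
    std::string encode(uint64_t features) const {
      // choose the old or new wire format based on 'features'
      return "map bytes";
    }
  };

  struct Incremental {
    std::string fullmap;   // optional embedded, already-encoded full map

    // When re-encoding the Incremental for old clients, the embedded full
    // map must be decoded and re-encoded with the old feature bits too.
    void reencode_fullmap_for(uint64_t features) {
      if (fullmap.empty())
        return;
      FullMap m;
      m.decode(fullmap);
      fullmap = m.encode(features);
    }
  };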
Signed-off-by: Sage Weil <sage@newdream.net>
After we recover each object, we try to raise the last_complete value
(and matching complete_to iterator). If our log was purely a backlog, this
won't necessarily end up bringing last_complete all the way up to the
last_update value, and we'll fail an assert later.
If complete_to does reach the end of the log, then we fast-forward
last_complete to last_update.
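The intended logic, sketched with simplified stand-ins for the log,
complete_to, and info fields:

  #include <list>
  #include <set>
  #include <string>

  // Simplified stand-ins, not the real PG/ObjectStore types.
  struct Entry { unsigned version; std::string oid; };

  struct PGModel {
    std::list<Entry> log;                    // may be purely a backlog
    std::list<Entry>::iterator complete_to;  // first entry still missing
    unsigned last_complete = 0;
    unsigned last_update = 0;
    std::set<std::string> missing;

    // Called after each object is recovered.
    void got(const std::string& oid) {
      missing.erase(oid);
      // Raise last_complete (and complete_to) past recovered entries.
      while (complete_to != log.end() && !missing.count(complete_to->oid)) {
        last_complete = complete_to->version;
        ++complete_to;
      }
      // A pure backlog may never carry last_complete up to last_update, so
      // once complete_to reaches the end of the log, fast-forward.
      if (complete_to == log.end())
        last_complete = last_update;
    }
  };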
The crash we were hitting was in finish_recovery(), and looked something
like
osd/PG.cc: In function 'void PG::finish_recovery(ObjectStore::Transaction&, std::list<Context*, std::allocator<Context*> >&)', in thread '0x7f4573df7700'
osd/PG.cc: 1800: FAILED assert(info.last_complete == info.last_update)
ceph version 0.36-251-g6e29c28 (commit:6e29c2826066a7723ed05b60b8ac0433a04c3c13)
1: (PG::finish_recovery(ObjectStore::Transaction&, std::list<Context*, std::allocator<Context*> >&)+0x8d) [0x6ff0ed]
2: (PG::RecoveryState::Active::react(PG::RecoveryState::ActMap const&)+0x316) [0x729196]
3: (boost::statechart::simple_state<PG::RecoveryState::Active, PG::RecoveryState::Primary, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x21b) [0x759c0b]
4: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x8d) [0x7423dd]
5: (PG::RecoveryState::handle_activate_map(PG::RecoveryCtx*)+0x183) [0x711f43]
6: (OSD::activate_map(ObjectStore::Transaction&, std::list<Context*, std::allocator<Context*> >&)+0x674) [0x579884]
7: (OSD::handle_osd_map(MOSDMap*)+0x2270) [0x57bd50]
8: (OSD::_dispatch(Message*)+0x4d0) [0x596bb0]
9: (OSD::ms_dispatch(Message*)+0x17b) [0x59803b]
10: (SimpleMessenger::dispatch_entry()+0x9c2) [0x617562]
11: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x4a3dec]
12: (Thread::_entry_func(void*)+0x12) [0x611a92]
13: (()+0x7971) [0x7f457f87b971]
14: (clone()+0x6d) [0x7f457e10b92d]
Fixes: #1609
Signed-off-by: Sage Weil <sage@newdream.net>
We stop working backwards when we hit last_epoch_clean, which means for the
oldest interval first_epoch may not be the _real_ first_epoch. (We can't
continue working backward because we may have thrown out those maps
entirely.)
However, if the last_epoch_clean epoch is contained within that interval,
we know that the OSD did in fact go rw because it had to have completed
recovery (and thus peering) to set last_clean_epoch in the first place.
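In rough terms (field names only approximate the real past-interval
bookkeeping):

  #include <cstdint>

  using epoch_t = uint32_t;

  struct Interval {
    epoch_t first = 0;          // for the oldest interval, possibly too high
    epoch_t last = 0;
    bool maybe_went_rw = false;
  };

  // Oldest past interval: we stopped walking backwards at last_epoch_clean,
  // so 'first' may not be the real first epoch.  But if last_epoch_clean
  // falls inside the interval, the OSD must have peered and gone rw in it.
  void mark_oldest_interval(Interval& i, epoch_t last_epoch_clean) {
    if (last_epoch_clean >= i.first && last_epoch_clean <= i.last)
      i.maybe_went_rw = true;
  }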
This fixes cases where two different nodes have slightly different
past intervals, generate different prior probe sets as a result, and
flip/flop on the acting set choice. (It may eventually have resolved itself
once the wrongly excluded node's notify raced in and arrived in time to be
considered, but that's still clearly no good.)
This does leave the start epoch for that oldest interval incorrect. That
doesn't currently matter except that it's confusing, but I'm not sure how
to mark it properly, or if it's worth the effort.
Signed-off-by: Sage Weil <sage@newdream.net>
Use a helper to determine when we should discard an op due to the client
being disconnected. Use this when the op is first received, (re)queued,
and dequeued.
Fix the check to keep ops that are replayed ACKs, as we should make every
effort to reapply those even when the client goes away.
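Something along these lines (illustrative names, not the actual OSD helper):

  // Illustrative stand-ins; the real op and session types carry much more.
  struct Op {
    bool is_replay = false;        // resent op the client was already acked for
    bool client_connected = true;  // is the client session still open?
  };

  // True if the op can be thrown away because its client went away.
  // Replayed ops are always kept: even if the client is gone, we make
  // every effort to reapply a write it was already acked for.
  bool should_discard_op(const Op& op) {
    if (op.is_replay)
      return false;
    return !op.client_connected;
  }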
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
This simplifies things and renames the checks to make it clear that we are
doing validation checks only, with no side effects allowed.
Also move some checks into the parent handle_op() to further simplify the
(re)queue checks.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
The _handle_op() method (and friends) is called when an op is initially
queued and when it is requeued. In the requeue case we have to be more
careful because the caller may be in the middle of doing all sorts of
random stuff. That means we need to limit ourselves to queueing or
discarding the op, and refrain from doing anything else with dangerous
side effects.
This fixes a crash like
osd/ReplicatedPG.cc: In function 'void ReplicatedPG::recover_primary_got(hobject_t, eversion_t)', in thread '7f21d0189700'
osd/ReplicatedPG.cc: 4109: FAILED assert(missing.num_missing() == 0)
ceph version 0.37-105-gc2069eb (commit:c2069eb1e562ba7d753c9b5ce5c904f4f5ef6abe)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x76) [0x8ab95a]
2: (ReplicatedPG::recover_primary_got(hobject_t, eversion_t)+0x62e) [0x767eea]
3: (ReplicatedPG::sub_op_push(MOSDSubOp*)+0x2b79) [0x76abeb]
4: (ReplicatedPG::do_sub_op(MOSDSubOp*)+0x1ab) [0x74761b]
5: (OSD::dequeue_op(PG*)+0x47d) [0x820ac3]
6: (OSD::OpWQ::_process(PG*)+0x27) [0x82cc8b]
due to an object being pushed to a replica before it is activated.
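The shape of the (re)queue path after this change, sketched with hypothetical
helpers:

  #include <deque>

  struct Op { bool is_replay = false; bool client_connected = true; };

  static std::deque<Op> op_queue;

  static bool should_discard_op(const Op& op) {
    return !op.is_replay && !op.client_connected;
  }

  // Shared by the initial-queue and requeue paths.  On requeue the caller
  // may be mid-way through other work, so this is limited to discarding
  // the op or putting it (back) on the queue; anything with side effects
  // (pushing objects, replying, starting recovery) waits for dequeue.
  static void handle_op(const Op& op) {
    if (should_discard_op(op))
      return;                   // discard
    op_queue.push_back(op);     // queue; nothing else happens here
  }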
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Using an empty snap context led to the failure of
test_rbd.TestImage.test_rollback_with_resize, since clones weren't
created when deleting objects. This test now passes.
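A toy model of why the snap context matters here (not the librados API, just
the clone-on-delete rule it implements):

  #include <cstdint>
  #include <map>
  #include <string>
  #include <vector>

  using snapid_t = uint64_t;

  struct SnapContext {
    snapid_t seq = 0;              // most recent snapshot id
    std::vector<snapid_t> snaps;   // existing snapshots, newest first
  };

  struct Object {
    std::string head;                        // current data ("" == deleted)
    std::map<snapid_t, std::string> clones;  // data preserved per snapshot
  };

  // Deleting under a non-empty snap context clones the head first, so a
  // later rollback to that snapshot can restore the data.  With an empty
  // snap context (the bug), no clone is made and rollback finds nothing.
  void remove(Object& o, const SnapContext& snapc) {
    if (!snapc.snaps.empty() && !o.head.empty())
      o.clones[snapc.seq] = o.head;  // preserve the pre-delete contents
    o.head.clear();
  }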
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>