The election and some other code depend on msg->get_source().num() to get
the peer rank, and that is part of the connection state. If it changes,
we need to close old connections and open new ones so that we aren't
taken for someone else (like mon.-1).
Signed-off-by: Sage Weil <sage@newdream.net>
The rank can change either because we probe and get a new monmap, or
because we get one via paxos. Move the checks to bootstrap() to catch
both cases.
Signed-off-by: Sage Weil <sage@newdream.net>
If we are joining an existing cluster, we can pick whatever address we
want (e.g., one specified by public_addr or public_network).
Signed-off-by: Sage Weil <sage@newdream.net>
Because we have called osr.flush(), it's safe to tag map.valid_through
as last_update. We will still have to catch up once we have stopped
writes and allowed the filestore to catch up anyway.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Instead of specifying an IP address in ceph.conf like

 [global]
   cluster_addr = 10.1.2.3

you can now avoid the node-specific configuration and just say

 [global]
   cluster_network = 10.1.2.0/24

The *_network variables can also take a whitespace-separated list of
networks, to be checked in that order:

 [global]
   cluster_network = 10.1.2.0/24 192.168.42.192/26
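
The ordered lookup described above can be sketched as follows. This is only
an illustration of "check the networks in order", not the actual Ceph
implementation; the function name pick_address and the local_addrs argument
are hypothetical, and a real daemon would enumerate the host's interfaces
itself.

```python
import ipaddress


def pick_address(networks, local_addrs):
    """Return the first local address inside the first matching network.

    networks: whitespace-separated CIDR list, as in a *_network option.
    local_addrs: address strings configured on this host (assumed input).
    """
    # Networks are tried in the order they were written in the config.
    for net_str in networks.split():
        net = ipaddress.ip_network(net_str)
        for addr_str in local_addrs:
            if ipaddress.ip_address(addr_str) in net:
                return addr_str
    # No local address belongs to any listed network.
    return None
```

With cluster_network = "10.1.2.0/24 192.168.42.192/26", a host that has both
10.1.2.3 and 192.168.42.200 would pick 10.1.2.3, because the first listed
network wins.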
pending_ops was protected by osd_lock, but it tracks something in the
queue, which has its own lock. Messy. Also, useless, since
wait_for_no_ops had a single caller in shutdown() that op_wq.drain() can
do for us.
Rip it out, and track queue size under the queue lock.
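
As a rough illustration of the idea (the count lives under the same lock as
the queue, and drain() waits on it), here is a minimal sketch. The class
OpQueue and its method names are hypothetical and are not the OSD's work
queue API.

```python
import threading
from collections import deque


class OpQueue:
    """Toy work queue whose size is tracked under the queue's own lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._empty = threading.Condition(self._lock)
        self._items = deque()
        self._in_flight = 0  # ops dequeued but not yet finished

    def enqueue(self, op):
        with self._lock:
            self._items.append(op)

    def size(self):
        with self._lock:
            return len(self._items) + self._in_flight

    def process_one(self):
        """Run one queued op; return False if the queue was empty."""
        with self._lock:
            if not self._items:
                return False
            op = self._items.popleft()
            self._in_flight += 1
        try:
            op()  # run the op without holding the lock
        finally:
            with self._lock:
                self._in_flight -= 1
                if not self._items and self._in_flight == 0:
                    self._empty.notify_all()
        return True

    def drain(self):
        """Block until nothing is queued or in flight."""
        with self._lock:
            while self._items or self._in_flight:
                self._empty.wait()
```

Because the counter and the queue share one lock, there is no second lock to
keep in sync, and a shutdown path only needs to call drain().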
Fixes: #1727
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
The op queue is shut down, so this is mostly safe, unless someone comes
through and does requeue_ops() from a callback or something.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
This can happen when multiple C_Active events are queued, and the first
does a propose_pending() (moving us into updating state).
Signed-off-by: Sage Weil <sage@newdream.net>
We can learn either an uncommitted or committed value during the
collect/last recovery phase. For the committed values, we need to remember
each peer's first/last_committed and share only at the end to avoid a
situation like:
- mon.1 has same last_committed as us
- mon.2 has newer last_committed, we save it
- mon.3 has same last_committed as mon.1, we share new value
- done... but mon.1 never got mon.2's newer commit.
Instead, save the commit sharing until the collect process completes, so
we know that any committed value learned from anyone is shared with
everyone who needs it.
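
The difference between the two orderings can be shown with a toy model that
tracks only last_committed as an integer. Both function names are
hypothetical and nothing here mirrors the real Paxos.cc code; it only
replays the scenario listed above.

```python
def share_eagerly(leader_lc, replies):
    """Share with each peer as its reply arrives (the problematic order)."""
    lc = leader_lc
    final = {}
    for peer, peer_lc in replies:
        lc = max(lc, peer_lc)           # learn a newer commit, if any
        final[peer] = max(peer_lc, lc)  # share only what we know right now
    return lc, final


def share_after_collect(leader_lc, replies):
    """Remember each peer's last_committed and share once collect is done."""
    lc = leader_lc
    seen = {}
    for peer, peer_lc in replies:
        lc = max(lc, peer_lc)
        seen[peer] = peer_lc            # just remember; do not share yet
    # Collect is complete: bring every lagging peer up to the newest commit.
    final = {peer: max(peer_lc, lc) for peer, peer_lc in seen.items()}
    return lc, final
```

With the leader at 10 and replies from mon.1 (10), mon.2 (11), mon.3 (10) in
that order, the eager variant leaves mon.1 at 10 while the deferred variant
leaves every peer at 11.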
This fixes a crash like
mon/Paxos.cc: In function 'void Paxos::handle_begin(MMonPaxos*)', in thread '7fd91192c700'
mon/Paxos.cc: 400: FAILED assert(begin->last_committed == last_committed)
ceph version 0.38-208-g9aabd39 (commit:9aabd3982cceb7e8489412b4bfbb4c2387880de2)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x76) [0x72454e]
2: (Paxos::handle_begin(MMonPaxos*)+0x363) [0x6499ef]
3: (Paxos::dispatch(PaxosServiceMessage*)+0x2b4) [0x64db2c]
4: (Monitor::_ms_dispatch(Message*)+0xdc6) [0x6205c2]
5: (Monitor::ms_dispatch(Message*)+0x3a) [0x62831a]
6: (Messenger::ms_deliver_dispatch(Message*)+0x63) [0x7d1f31]
7: (SimpleMessenger::dispatch_entry()+0x7c2) [0x7bb786]
8: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x6070fa]
9: (Thread::_entry_func(void*)+0x23) [0x6f3f69]
10: (()+0x7971) [0x7fd9153a1971]
11: (clone()+0x6d) [0x7fd913c3092d]
Signed-off-by: Sage Weil <sage@newdream.net>
Use the same callback for when paxos goes active and for when it commits
something. The response in both cases is the same.
Signed-off-by: Sage Weil <sage@newdream.net>
Looks like this conditional was just set backwards by mistake. There
have been a number of issues with OSDMap versions that are probably
related to this...
(Thanks to some smarts in trim_to, we at least did not trim ALL of
our maps. But on every tick prior to epoch 500 [that's the default]
the leader was trimming all old maps off the system.)
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
This reverts commit dd5087fabb.
Calling osr.flush() is not quite enough since the onreadable callbacks
may not have been called (thus, last_update_applied may still lag behind
the tail of the log).
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Many users only set oncommit acks, so if they get an error code
(which comes only as a CEPH_OSD_OP_ACK right now) the request
disappears into the ether.
(And remove stupid debug statements while we're at it.)
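
A minimal sketch of the behaviour this is aiming for, under the assumption
that an error reply should complete every callback the caller registered.
The function deliver_reply and its parameters are hypothetical and are not
the Objecter's real interface.

```python
def deliver_reply(result, is_commit, onack=None, oncommit=None):
    """Dispatch one reply; return how many callbacks were invoked."""
    calls = 0
    if result < 0:
        # Assumption: on error no further reply will arrive, so every
        # registered callback must see the error code now.
        for cb in (onack, oncommit):
            if cb is not None:
                cb(result)
                calls += 1
        return calls
    # Success: an ack completes onack, a commit completes oncommit.
    cb = oncommit if is_commit else onack
    if cb is not None:
        cb(result)
        calls += 1
    return calls
```

A caller that registered only oncommit now sees an error delivered in an ack,
instead of waiting forever for a commit that will never come.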
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
This avoids a scenario like:
- _active()
- proposes value
- _commit()
- creates new pending, even though in updating state
Signed-off-by: Sage Weil <sage@newdream.net>
Proposing a new state from within update_from_paxos() confuses some callers,
like PaxosService::_active(). Instead, do it in the on_active() callback.
This also lets us collapse the check_osd_map() caller into on_active(),
and makes it happen on leaders and peons alike, which ought to avoid some
of the pg creation lag we see sometimes (presumably when the osds have
sessions with peons instead of the leader).
Fixes: #1708
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Ignore ENOSPC generated by our own callback, as it is only used to
terminate the loop.
Broken by commit cd90061239.
Fixes: #1728
Signed-off-by: Sage Weil <sage@newdream.net>
Remove open-coded trimming of old states and use our method (that also
removes additional per-state files). Fixes old stray state files.
Signed-off-by: Sage Weil <sage@newdream.net>
This ensures that we resend _all_ requests, since we aren't sure which
may have mapped to a different primary and then back. This was missed in
the original implementation in 4fe9cca5dd.
Signed-off-by: Sage Weil <sage@newdream.net>