pending_ops was protected by osd_lock, but it tracks something in the
queue, which has it's own lock. Messy. Also, useless, since
wait_for_no_ops had a single caller in shutdown() that op_wq.drain() can
do for us.
Rip it out, and track queue size under the queue lock.
Fixes: #1727
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
The op queue is shut down, so this is mostly safe, unless someone comes
through and does requeue_ops() from a callback or something.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Looks like this conditional was just set backwards by mistake. There
have been a number of issues with OSDMap versions that are probably
related to this...
(Thanks to some smarts in trim_to, we at least did not trim ALL of
our maps. But on every tick prior to epoch 500 [that's the default]
the leader was trimming all old maps off the system.)
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
This reverts commit dd5087fabb.
Calling osr.flush() is not quite enough since the onreadable callbacks
may not have been called (thus, last_update_applied may still lag behind
the tail of the log).
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Many users only set oncommit acks, so if they get an error code
(which comes only as a CEPH_OSD_OP_ACK right now) the request
disappears into the ether.
(And remove stupid debug statements while we're at it.)
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
This avoids a scenario like:
- _active()
- proposes value
- _commit()
- creates new pending, even though in updating state
Signed-off-by: Sage Weil <sage@newdream.net>
Proposing a new state from within update_from_paxos() confuses some callers,
like PaxosService::_active(). Instead, do it in the on_active() callback.
This also let's us collapse the check_osd_map() caller into on_active(),
and makes it happen on leaders and peons alike, which ought to avoid some
of the pg creation lag we see sometimes (presumably when the osds have
sessions with peons instead of the leader).
Fixes: #1708
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Ignore ENOSPC generated by our own callback, as it is only used to
terminate the loop.
Broken by commit cd90061239.
Fixes: #1728
Signed-off-by: Sage Weil <sage@newdream.net>
Remove open-coded trimming of old states and use our method (that also
removes additional per-state files). Fixes old stray state files.
Signed-off-by: Sage Weil <sage@newdream.net>
This ensures that we resend _all_ requests, since we aren't sure which
may have mapped to a different primary and then back. This was missed in
the original implementation in 4fe9cca5dd.
Signed-off-by: Sage Weil <sage@newdream.net>
This may address #1732 indirectly because we have a Connection* reference
here. However, it's still not clear how we ended up with an OSDSession*
for an osd that doesn't exist. :/
Signed-off-by: Sage Weil <sage@newdream.net>
The slurp process can happen after the monitor has started and has some
in-memory version of the state, and that process may wipe out old
incrementals and change the stashed version. That means that in
update_from_paxos, we need to pull the stashed version if it doesn't
match what we currently have or else we may not have the incrementals we
need to get up to date.
This simplifies and cleans up that code a bit so it is not specific to
monitor startup.
Signed-off-by: Sage Weil <sage@newdream.net>
* a refactor in e2100bce left the mod_ptr and unmod_ptr members set
incorrectly in RGWCopyObj::init_common
* a fix in 6752babd aggregated error returns, but then failed to do
anything with them
Signed-off-by: Josh Pieper <jjp@pobox.com>
Signed-off-by: Sage Weil <sage@newdream.net>
These should have no functional changes:
* Check errors from functions that currently cannot return any
* Initialize variables that gcc can't determine will be initialized
in a following function call
* Remove unused variables
Signed-off-by: Josh Pieper <jjp@pobox.com>
Signed-off-by: Sage Weil <sage@newdream.net>