- new split priming machinery (see the sketch below)
- new primed split cleanup on pg removal
- cover the pg creation path
The old split tracking is now totally unused; it will be removed in the next
patch.
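A rough standalone sketch of the priming/cleanup idea (simplified; the type
and function names here are illustrative, not the actual OSD interfaces): the
shard pre-creates slots for the split children so events can queue against
them, and tears them down again if the parent PG is removed before the split
completes.

    #include <map>
    #include <set>
    #include <string>

    struct Slot {
      bool waiting_for_split = false;   // hold events until the child PG exists
    };

    struct Shard {
      std::map<std::string, Slot> slots;                     // keyed by pg id
      std::map<std::string, std::set<std::string>> primed;   // parent -> children

      // Prime the children of an upcoming split so their events have a home.
      void prime_splits(const std::string& parent,
                        const std::set<std::string>& children) {
        for (const auto& child : children) {
          slots[child].waiting_for_split = true;
          primed[parent].insert(child);
        }
      }

      // Cleanup path: the parent PG went away before its split finished.
      void cleanup_primed_splits(const std::string& parent) {
        auto p = primed.find(parent);
        if (p == primed.end())
          return;
        for (const auto& child : p->second)
          slots.erase(child);           // drop the never-used child slots
        primed.erase(p);
      }
    };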
Signed-off-by: Sage Weil <sage@redhat.com>
This doesn't make sense... although it's the same behavior as
luminous.
The point of the releases here is that if we drop something that is in
the queue, we drop the recovery_ops_reserved counter by that much. However,
if something is in the queue and waiting, and we wake it back up, there
is no net change to _reserved... which is only decremented when we
actually dequeue something.
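As a concrete (simplified, hypothetical) illustration of that invariant: the
reservation count only changes when an op enters the queue, is dropped from
it, or is actually dequeued; bouncing an op between the queue and a wait list
is neutral.

    #include <deque>

    struct RecoveryReservations {
      int recovery_ops_reserved = 0;
      std::deque<int> queue;      // ops ready to run
      std::deque<int> waiting;    // ops parked on a wait list

      void enqueue(int op) {
        ++recovery_ops_reserved;  // reserved when the op first enters the queue
        queue.push_back(op);
      }
      void park_front() {         // queued -> waiting: no net change to _reserved
        if (queue.empty()) return;
        waiting.push_back(queue.front());
        queue.pop_front();
      }
      void wake_waiters() {       // waiting -> queued: still no net change
        while (!waiting.empty()) {
          queue.push_back(waiting.front());
          waiting.pop_front();
        }
      }
      void drop_queued() {        // dropped from the queue: release the reservation
        if (queue.empty()) return;
        queue.pop_front();
        --recovery_ops_reserved;
      }
      void dequeue_and_run() {    // the only other place the count goes down
        if (queue.empty()) return;
        queue.pop_front();
        --recovery_ops_reserved;
      }
    };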
Signed-off-by: Sage Weil <sage@redhat.com>
This resolves a problem where delivering a peering event triggers
advance_pg, which requeues waiting events *behind* the event we are
currently processing. It also reduces the number of wait lists by one,
yay!
Signed-off-by: Sage Weil <sage@redhat.com>
This can be missed on a RepRecovering -> RepNotRecovering ->
RepWaitBackfillReserved transition. Catch any straggler events in
ReplicaActive.
Signed-off-by: Sage Weil <sage@redhat.com>
Rethink the way we wait for PGs. We need to order peering events relative to
each other; keep them in a separate queue in the pg_slot.
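Roughly what that looks like as a data structure (simplified sketch; the
member names are illustrative, not the actual pg_slot definition):

    #include <deque>
    #include <memory>
    #include <string>

    struct PGPeeringEvent { std::string desc; };
    struct OpRequest      { std::string desc; };

    struct pg_slot {
      // ordinary items waiting for the PG (or a newer map)
      std::deque<std::shared_ptr<OpRequest>> waiting;
      // peering events get their own FIFO so a requeue of the list above
      // can never reorder them relative to each other
      std::deque<std::shared_ptr<PGPeeringEvent>> waiting_peering;
    };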
Signed-off-by: Sage Weil <sage@redhat.com>
We have a specific PGRecoveryContext type/event--even though we are just
calling a GenContext--so that we can distinguish the event type properly.
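A minimal sketch of the wrapping (simplified; the real GenContext is a Ceph
template, here it is just a stand-in): the wrapper adds nothing but a
distinct type, which is exactly the point.

    #include <functional>
    #include <memory>
    #include <utility>

    using GenContext = std::function<void()>;   // stand-in for Ceph's GenContext<>

    struct PGRecoveryContext {
      std::unique_ptr<GenContext> c;
      explicit PGRecoveryContext(std::unique_ptr<GenContext> ctx)
        : c(std::move(ctx)) {}
      void run() { (*c)(); }   // still "just calls" the wrapped context
    };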
Signed-off-by: Sage Weil <sage@redhat.com>
The mgr code is updated to send spg_t's instead of pg_t's (and is slightly
refactored/cleaned).
The PG events are added to the Primary state, unless we're also in the
Clean substate, in which case they are ignored.
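A much-simplified sketch of that handling rule (not the actual
boost::statechart code; the event name below is hypothetical):

    struct PGStateSketch {
      bool is_primary = false;
      bool is_clean = false;

      void handle_mgr_pg_event() {   // hypothetical stand-in for the PG events
        if (!is_primary) return;     // only handled in the Primary state
        if (is_clean) return;        // ignored while in the Clean substate
        // ... act on the event ...
      }
    };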
Signed-off-by: Sage Weil <sage@redhat.com>
Once we know which PGs are about to be created, we instantiate their
pg_slot and mark them waiting_pg, which blocks all incoming events until
the split completes, the PG is installed, and we call wake_pg_waiters().
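The flow, as a simplified standalone sketch (names illustrative, not the real
OSD code): events for a primed child accumulate on its slot until the PG is
installed and the waiters are woken.

    #include <deque>
    #include <map>
    #include <string>

    struct Event { std::string desc; };

    struct ChildSlot {
      bool waiting_pg = false;       // block everything until the PG exists
      std::deque<Event> waiting;
    };

    struct SplitShard {
      std::map<std::string, ChildSlot> slots;

      void prime_child(const std::string& child) {
        slots[child].waiting_pg = true;          // slot exists before the PG does
      }
      void enqueue(const std::string& pgid, Event e) {
        ChildSlot& s = slots[pgid];
        if (s.waiting_pg) {
          s.waiting.push_back(std::move(e));     // park until wake_pg_waiters()
          return;
        }
        // ... otherwise dispatch to the installed PG ...
      }
      void wake_pg_waiters(const std::string& pgid) {
        ChildSlot& s = slots[pgid];
        s.waiting_pg = false;                    // the PG is now installed
        // ... requeue s.waiting into the normal work queue ...
        s.waiting.clear();
      }
    };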
Signed-off-by: Sage Weil <sage@redhat.com>
Add a new MOSDPGCreate2 message that sends the spg_t (not just pg_t) and
includes only the info we need. Fast dispatch it.
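Roughly the shape of such a message (standalone sketch; the field names are
illustrative, the real MOSDPGCreate2 lives in the Ceph message headers): it
is keyed by spg_t rather than pg_t and carries only what the OSD needs to
instantiate the PG.

    #include <cstdint>
    #include <map>
    #include <tuple>
    #include <utility>

    struct spg_t {
      uint64_t pool;
      uint32_t seed;
      int8_t   shard;
    };
    inline bool operator<(const spg_t& a, const spg_t& b) {
      return std::tie(a.pool, a.seed, a.shard) < std::tie(b.pool, b.seed, b.shard);
    }

    struct MOSDPGCreate2 {
      uint32_t epoch = 0;                                  // map epoch of the creates
      // pg -> per-pg creation info only (illustrative placeholder fields)
      std::map<spg_t, std::pair<uint32_t, uint32_t>> pgs;
    };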
Signed-off-by: Sage Weil <sage@redhat.com>
Queue a null event tagged with create_info, eliminating the special
legacy path.
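Conceptually (simplified sketch, names hypothetical): the creation parameters
ride along on an otherwise empty queued event, and the queue's ordinary
PG-instantiation logic picks them up instead of a dedicated legacy path.

    #include <cstdint>
    #include <memory>
    #include <optional>
    #include <string>

    struct PGCreateInfo {
      std::string pgid;
      uint32_t epoch = 0;
      // ... history, past intervals, whether the create came from the mon ...
    };

    struct QueuedPGEvent {
      std::shared_ptr<void> evt;                // null for a pure "create" event
      std::optional<PGCreateInfo> create_info;  // present => may instantiate a PG
    };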
These are still not fast dispatch because we need an spg (not a pg) to queue
an event, and we need a current osdmap in order to calculate that. That
isn't possible (or a good idea) in fast dispatch. In a subsequent patch we'll
create a new pg create message that includes the correct information and
can be fast dispatched, allowing this path to die off post-nautilus.
Also, improve things so that we ack the pg creation only after the PG has
gone active, meaning it is fully replicated (on at least min_size OSDs).
Signed-off-by: Sage Weil <sage@redhat.com>
This actually puts the remaining peering events into fast dispatch. The
only remaining event is the pg create from the mon.
Signed-off-by: Sage Weil <sage@redhat.com>
This is a big commit that lays out the infrastructure changes to fast
dispatch the remaining peering events. It's hard to separate it all out
so this probably doesn't quite build; it's just easier to review as a
separate patch.
- lock ordering for pg_map has changed:
    before:
      OSD::pg_map_lock
      PG::lock
      ShardData::lock
    after:
      PG::lock
      ShardData::lock
      OSD::pg_map_lock
- queue items are now annotated with whether they can proceed without a
pg at all (e.g., query) or can instantiate a pg (e.g., notify, log, etc.);
see the sketch after this list.
- There is some wonkiness around getting the initial Initialize event to
a newly-created PG. I don't love it but it gets the job done for now.
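A simplified sketch of those annotations and how a dispatcher might consult
them (illustrative only, not the actual queue item interface):

    struct PG;   // opaque here

    struct OpQueueItem {
      virtual ~OpQueueItem() = default;
      virtual bool can_proceed_without_pg() const { return false; }  // e.g. query
      virtual bool can_create_pg() const { return false; }           // e.g. notify/log
      virtual void run(PG* pg) = 0;
    };

    // Dispatcher-side decision, roughly:
    inline void dispatch(OpQueueItem& item, PG* pg /* may be null */) {
      if (pg) {
        item.run(pg);
      } else if (item.can_proceed_without_pg()) {
        item.run(nullptr);            // e.g. answer a query with "don't have it"
      } else if (item.can_create_pg()) {
        // ... instantiate the PG, then run the item against it ...
      } else {
        // ... park the item until a PG (or a newer map) shows up ...
      }
    }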
Signed-off-by: Sage Weil <sage@redhat.com>