Old clients do not expect mixed-epoch compound messages. Thus, we
send each sub-message independently.
Signed-off-by: Samuel Just <sam.just@inktank.com>
For a peer, we don't ensure that the flush completes before activation;
we merely ensure that we don't serve any ops until the flush completes.
Signed-off-by: Samuel Just <sam.just@inktank.com>
This allows us to pass to activate() the epoch in which the
message triggering activation occurred, allowing us to mark
the activate committed callback with the right query_epoch.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Rather than dispatching one item at a time to process, etc.,
BatchWorkQueue dispatches up to a configurable number of
items at once.
Signed-off-by: Samuel Just <sam.just@inktank.com>
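A minimal sketch of the batching idea (names and structure here are illustrative, not the actual Ceph WorkQueue API): the dequeue step pops up to a configured number of items per dispatch instead of exactly one.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical batch work queue: dequeue_batch() hands the worker up to
// batch_size items per dispatch. Real code would hold the queue lock in
// dequeue_batch() and release it while the worker processes the batch.
template <typename T>
class BatchWorkQueue {
  std::deque<T> items;
  std::size_t batch_size;  // configurable per-dispatch upper bound
public:
  explicit BatchWorkQueue(std::size_t batch) : batch_size(batch) {}
  void queue(const T &item) { items.push_back(item); }
  std::vector<T> dequeue_batch() {
    std::vector<T> batch;
    while (!items.empty() && batch.size() < batch_size) {
      batch.push_back(items.front());
      items.pop_front();
    }
    return batch;
  }
  bool empty() const { return items.empty(); }
};
```

Batching amortizes the lock acquisition and wakeup cost over several items while keeping the dispatch size bounded.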
Otherwise, we would need to synchronize access to the shared PGPool objects.
The wasted memory is probably preferable to the synchronization overhead.
Signed-off-by: Samuel Just <sam.just@inktank.com>
First, we don't really want to remove the pg if we can use it. Second,
there might be messages in the pg peering queue for the next interval.
If one of those happens to be an info request or notify, we would lose
the peering message.
If the message falls in the current interval as determined by the
current osdmap, then we know that any messages currently queued must be
obsolete and can safely be discarded.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Previously, we set last_peering_reset based on the epoch in which the pg
was created. We now pass the map from the query_epoch to the creation
methods and set last_peering_reset based on that.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Ops and some subops need to wait for active to ensure correct ordering
with respect to peering operations.
Signed-off-by: Samuel Just <sam.just@inktank.com>
hobject_t must now be globally unique in the filestore. Thus, if we
start creating objects in a pg before the removal collections for the
previous incarnation are fully removed, we might end up with a second
instance of the same hobject, violating the filestore rules.
Signed-off-by: Samuel Just <sam.just@inktank.com>
PG opsequencers will be used for removing a pg. If the pg is recreated
before the removal is complete, we need the new pg incarnation to be
able to inherit the osr of its predecessor.
Previously, we queued the pg for removal and only rendered it unusable
after the contents were fully removed. Now, we synchronously remove it
from the map and queue a transaction renaming the collections. We then
asynchronously clean up those collections. If the pg is recreated, it
will inherit the same osr until the cleanup is complete, ensuring correct
op ordering with respect to the collection rename.
Signed-off-by: Samuel Just <sam.just@inktank.com>
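The inheritance step can be sketched as a per-pg registry of ref-counted sequencers (the registry, names, and string pgid key are illustrative assumptions, not the actual Ceph interface): while cleanup of the old incarnation is still pending, a recreated pg looks up and reuses the existing sequencer, so its ops are ordered after the collection rename.

```cpp
#include <map>
#include <memory>
#include <string>

// Stand-in for the filestore's per-pg op sequencer.
struct OpSequencer {};

// Hypothetical registry: pgid -> live sequencer. Entries would be
// erased only once the asynchronous cleanup completes.
std::map<std::string, std::shared_ptr<OpSequencer>> osr_registry;

std::shared_ptr<OpSequencer> get_osr(const std::string &pgid) {
  auto it = osr_registry.find(pgid);
  if (it != osr_registry.end())
    return it->second;  // recreated pg inherits its predecessor's osr
  auto osr = std::make_shared<OpSequencer>();
  osr_registry[pgid] = osr;
  return osr;
}
```

Because both incarnations share one sequencer, the new pg's first transactions cannot be reordered ahead of the queued rename/removal of the old collections.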
Rather than explicitly flushing the filestore, send a noop through the
filestore at the beginning of peering and, at the end, wait for it to
finish by adding an extra state.
Also, delay ops until flushed is true. Until we have finished flushing,
we cannot safely read objects.
Signed-off-by: Samuel Just <sam.just@inktank.com>
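The flush-by-noop trick relies on transactions committing in queue order: a minimal sketch under that assumption (FakeStore, start_flush, and the callback shape are illustrative, not the actual ObjectStore API). When the noop's commit callback fires, every transaction queued before it must also have committed, so the store is effectively flushed.

```cpp
#include <functional>
#include <queue>

// Toy store that commits queued transactions strictly in FIFO order,
// invoking each transaction's on-commit callback as it completes.
struct FakeStore {
  std::queue<std::function<void()>> pending;
  void queue_transaction(std::function<void()> on_commit) {
    pending.push(std::move(on_commit));
  }
  void commit_all() {
    while (!pending.empty()) {
      pending.front()();
      pending.pop();
    }
  }
};

// Queue an empty (noop) transaction whose only effect is to flip the
// flushed flag on commit; ops are held until flushed becomes true.
void start_flush(FakeStore &store, bool &flushed) {
  flushed = false;
  store.queue_transaction([&flushed] { flushed = true; });
}
```

This replaces a blocking flush call with an ordinary ordered transaction, so peering never has to stall the filestore itself, only the pg's own op delivery.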
This is simpler than having to update all of the RecoveryCtx users
whenever we change the types in RecoveryCtx.
Signed-off-by: Samuel Just <sam.just@inktank.com>
In order to clarify data structure locking, PGs will now access
OSDService rather than the OSD directly. Over time, more structures will
be moved to the OSDService. osd_lock can no longer be held while pg
locks are held.
Signed-off-by: Samuel Just <sam.just@inktank.com>
PGs have their map updates done in a different thread. Thus, we no
longer need to grab the pg locks. activate_map no longer requires
the map_lock in order to allow us to queue events for the pgs.
Signed-off-by: Samuel Just <sam.just@inktank.com>
_create_lock_pg might encounter a preexisting pg collection simply
because the removal transaction had not yet completed.
Signed-off-by: Samuel Just <sam.just@inktank.com>