Also, prevent OSD start if we have a PG whose pool has been deleted and no
stored pool info. (The user should downgrade, let PG deletion complete, then
upgrade.)
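A minimal sketch of what such a startup guard might look like. OSDMap::have_pg_pool() is a real call; pgids_on_disk and have_stored_pool_info() are illustrative stand-ins, not the actual OSD code:

    // Refuse to start if a PG's pool is gone from the OSDMap and we have
    // no stored pool info left to drive the remaining deletion work.
    for (const spg_t& pgid : pgids_on_disk) {           // illustrative list
      if (!osdmap->have_pg_pool(pgid.pool()) &&
          !have_stored_pool_info(pgid.pool())) {        // illustrative helper
        derr << "PG " << pgid << " belongs to a deleted pool and no pool info"
             << " is stored; downgrade, let deletion finish, then upgrade"
             << dendl;
        return -EIO;
      }
    }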
Signed-off-by: Sage Weil <sage@redhat.com>
Use the reserver so that delete competes for the same slot(s) as recovery
and such.
The priority is below recovery normally, unless the OSD is getting fullish,
in which case we set a very high priority. We have to be careful here because
backfill will back off when the OSD gets full(ish) but log recovery does
not.
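A sketch of the priority choice described above. AsyncReserver::request_reservation() is Ceph's real reserver interface; the priority constants and the fullness test are illustrative:

    // Delete normally queues below recovery, but jumps to a very high
    // priority when the OSD is getting full, since log recovery (unlike
    // backfill) won't back off on a full(ish) OSD and we need space back.
    unsigned get_delete_priority() {
      return osd_is_fullish()                  // illustrative fullness test
        ? OSD_DELETE_PRIORITY_FULLISH          // illustrative: above recovery
        : OSD_DELETE_PRIORITY;                 // illustrative: below recovery
    }
    // The delete then competes for the same reservation slots as recovery:
    reserver.request_reservation(pgid, on_grant, get_delete_priority());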
Signed-off-by: Sage Weil <sage@redhat.com>
Previously we wouldn't bother splitting if the pool was going away; now
we walk the PG forward and process the split even if the pool is
later deleted. Adjust the loop to terminate gracefully if that happens.
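Roughly, the map-walking loop gains a graceful exit when the pool disappears. A sketch; get_map() and the surrounding split bookkeeping are simplified:

    // Walk the PG forward one epoch at a time, recording any splits, and
    // stop quietly if the pool is deleted in a later map.
    for (epoch_t e = created_epoch + 1; e <= osdmap->get_epoch(); ++e) {
      OSDMapRef next = get_map(e);
      if (!next->have_pg_pool(pgid.pool())) {
        break;  // pool deleted later; no further splits to process
      }
      // ... compare pg_num between consecutive maps, record splits ...
    }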
Signed-off-by: Sage Weil <sage@redhat.com>
Say we get an osdmap indicating a PG will split, but the PG is deleting and
finishes its delete before it consumes that map. We need to clean up
the pending split state.
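Conceptually the cleanup is a single erase keyed by the spg_t when the delete finishes; a sketch with illustrative container and lock names:

    // When the delete completes before the split-inducing map is consumed,
    // drop the PG's pending-split record so nothing waits on it forever.
    void cancel_pending_splits_for(spg_t pgid) {
      std::lock_guard l(split_lock);       // illustrative lock
      pending_splits.erase(pgid);          // illustrative map<spg_t, ...>
    }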
Signed-off-by: Sage Weil <sage@redhat.com>
For filestore, waiting for onreadable ensures that (1) the backend has done
(all of) the deletion work (we are throttled) and (2) the flush() will
not block. So, all good.
For bluestore, onreadable happens at queue time, so the flush() was needed
to throttle progress. However, we don't want to block the op thread on
flush. And waiting for commit isn't sufficient because that would not
capture the filestore apply work.
Fix by waiting for both commit and readable before doing more deletion
work.
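One way to express "wait for both" is a shared two-way countdown that kicks off the next deletion chunk only when the second completion fires; a self-contained sketch with illustrative names:

    #include <atomic>
    #include <functional>

    // Continue deletion only after both the commit and the readable
    // callbacks have fired: filestore is paced by readable (apply),
    // bluestore by commit, so requiring both covers both backends.
    struct DeleteGate {
      std::atomic<int> pending{2};
      std::function<void()> delete_some_more;  // queues the next chunk
      void one_done() {
        if (--pending == 0)
          delete_some_more();
      }
    };
    // Register gate->one_done() as both the on-commit and the on-readable
    // callback of the same deletion transaction.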
Signed-off-by: Sage Weil <sage@redhat.com>
exit() can happen due to AdvMap and a peering interval change, but it
runs before we have updated any of our internal state about whether we
are the primary, whether our pool is deleted and the PG no longer exists,
and so on. The publish (1) requires that we be the primary, and (2) will
crash if the pool is gone from the OSDMap.
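The two preconditions, spelled out as a guard. is_primary(), have_pg_pool(), and publish_stats_to_osd() are real Ceph calls; the wrapper itself is illustrative:

    // Publishing stats is only safe once AdvMap processing has updated
    // our role and confirmed the pool still exists; in exit() neither
    // of those is guaranteed yet.
    void maybe_publish_stats(const OSDMapRef& osdmap) {
      if (!is_primary())
        return;                                    // (1) must be primary
      if (!osdmap->have_pg_pool(info.pgid.pool()))
        return;                                    // (2) pool may be gone
      publish_stats_to_osd();
    }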
Signed-off-by: Sage Weil <sage@redhat.com>
We do not need to look up the PG in order to queue a peering event now
that the queue is based on spg_t and handles the validity checks (based on
the epoch) for us.
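The caller side reduces to an enqueue keyed by spg_t, with staleness checked at dequeue time against the epoch; a sketch with an illustrative queue interface:

    // Before: look up and lock the PG, verify the epoch, then queue.
    // After: queue directly by spg_t; the work queue resolves the PG and
    // drops the event if the epoch shows it has gone stale.
    void enqueue_peering_evt(spg_t pgid, PGPeeringEventRef evt) {
      op_queue.enqueue(pgid, std::move(evt));  // illustrative interface
    }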
Signed-off-by: Sage Weil <sage@redhat.com>
A lot of awkward complexity is implemented in OSD to handle PGs that aren't in
pg_map and are in the process of being deleted. This is hard because if the
PG is recreated (or split, or whatever) then we need to stop the deletion and
create a fresh PG referencing the same data.
Instead, leave deleting PGs registered and Stray, with a new peering state
Stray/Deleting. Let them continue to process OSDMaps, splits, peering
intervals, and so on. If they are not fully deleted, they'll go back through
Reset -> Stray and so on, the new primary will get the notify, and it will
decide what to do with them (usually instruct them to delete again).
This (1) streamlines and cleans up the code structure, and (2) gets rid of
the special-purpose RemoveWQ, moving the delete work into the main op_wq
where it can be throttled and so on.
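Ceph's peering machine is a boost::statechart; a rough fragment of how a Deleting substate under Stray might be declared (the event names and reactions here are illustrative, not the exact implementation):

    // Deleting lives under Stray, so a deleting PG keeps processing
    // OSDMaps, interval changes, and resets like any other stray PG.
    struct Deleting : boost::statechart::state<Deleting, Stray> {
      typedef boost::mpl::list<
        boost::statechart::custom_reaction<DeleteSome>,  // delete a chunk
        boost::statechart::custom_reaction<AdvMap>       // may cancel
        > reactions;
      explicit Deleting(my_context ctx);
      boost::statechart::result react(const DeleteSome&);
      boost::statechart::result react(const AdvMap&);
      // On a new interval the machine falls back through Reset -> Stray;
      // the new primary's notify decides whether deletion restarts.
    };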
Signed-off-by: Sage Weil <sage@redhat.com>
- the epoch in which the pool is deleted is an interval change
- no changes are possible after that
Also, use a pg_pool_t pointer to avoid repeated lookups.
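A sketch of both points: the interval check when the pool vanishes, and caching the pg_pool_t pointer. OSDMap::get_pg_pool() is real; the surrounding code is illustrative:

    // Look the pool up once per map instead of repeatedly.
    const pg_pool_t *newpi = osdmap->get_pg_pool(pgid.pool());
    const pg_pool_t *oldpi = lastmap->get_pg_pool(pgid.pool());
    if (oldpi && !newpi) {
      return true;  // pool deleted in this epoch: new interval, and no
                    // further changes are possible after it
    }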
Signed-off-by: Sage Weil <sage@redhat.com>
rgw: drop dump_uri_from_state(), which is no longer used.
Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Introduce an error path in state transitions for reinitiating an image
mapping, possibly to a new peer. Also, a minor cleanup in
remove_instances() to check whether any actions are pending (rather than
scanning the actions list for a possible remove action).
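The remove_instances() change amounts to asking the pending-actions container directly rather than scanning it for a remove entry; a sketch with illustrative names:

    // Before: iterate the queued actions looking for a REMOVE action.
    // After: defer if anything at all is still pending for the instance.
    if (!pending_actions.empty()) {   // illustrative container
      return;                         // revisit once the actions drain
    }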
Signed-off-by: Venky Shankar <vshankar@redhat.com>