Commit Graph

84720 Commits

Author SHA1 Message Date
Sage Weil
4c465fbfac osd: OSDShard::pg_slot -> OSDShardPGSlot
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
973836c70d osd: change pg_slots unordered_map to use unique_ptr<>
This avoids moving slots around in memory in the unordered_map... they can
be big!

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
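[Editor's sketch] The motivation in this commit can be illustrated with a minimal stand-in (the `Slot` type and `get_slot` helper here are hypothetical, not Ceph's actual `OSDShardPGSlot` code): holding slots through `unique_ptr` means container operations shuffle only small pointers, and each slot stays at a stable address for its whole lifetime.

```cpp
#include <deque>
#include <memory>
#include <unordered_map>

// Hypothetical stand-in for the (large) per-PG slot structure.
struct Slot {
  std::deque<int> waiting;  // queued work items
  int epoch = 0;
};

// Map values are unique_ptr<Slot>: the map manages 8-byte pointers,
// while the big Slot objects never move, so raw Slot* handles held
// elsewhere stay valid across map growth.
using SlotMap = std::unordered_map<int, std::unique_ptr<Slot>>;

Slot* get_slot(SlotMap& m, int pgid) {
  auto& p = m[pgid];
  if (!p)
    p = std::make_unique<Slot>();
  return p.get();
}
```

A pointer taken from `get_slot` remains usable no matter how many other slots are inserted afterward.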
Sage Weil
7aeebde3fd osd: remove some unused methods
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
8433941c94 osd: remove created_pgs tracking in RecoveryCtx
Not needed or used!

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
3b2935951f osd: fix PG::ch init
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
2922b3be33 osd: use _attach_pg and _detach_pg helpers; keep PG::osd_shard ptr
Consolidate num_pgs updates (and fix a counting bug along the way).

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
b7dc3d0fab osd: remove old split tracking machinery
This infrastructure is no longer used; simpler split tracking now lives in
the shard's pg_slots directly.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
295dfe0372 osd: restructure consume_map in terms of shards
- new split primming machinery
- new primed split cleanup on pg removal
- cover the pg creation path

The old split tracking is now totally unused; will be removed in the next
patch.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
2a9c8d80ce osd: pass sdata into dequeue_peering_evt (and dequeue_delete)
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
e297b1e6ad osd: pass data into OpQueueItem::run()
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
8ee13128fe osd: kill pg_map
Split doesn't work quite right; the num_pgs count is probably off.  But things
mostly work.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
eed90d4a8f osd: rename OSDShard waiting_for_pg_osdmap -> osdmap
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
4fc459829f osd: use _get_pgs() where possible; avoid touching pg_map directly
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
bffa62233e osd: get _get_pgs() and _get_pgids()
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
c4960f03a2 osd: remove get_mapped_pools command
No in-tree users.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
0bf6ac893a osd: move ShardedOpWQ::ShardData -> OSDShard
Soon we will destroy pg_map!

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
d3bd637171 osd: kill _open_lock_pg
Move lock call to caller.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:51 -05:00
Sage Weil
d9dcaa79b7 osd: kill _create_lock_pg
Unused.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:51 -05:00
Sage Weil
0766f5b40c osd: do not release recovery_ops_reserved on requeue
This doesn't make sense, although it's the same behavior as
luminous.

The point of the releases here is that if we drop something that is in
the queue we drop the recovery_ops_reserved counter by that much.  However,
if something is in the queue and waiting, and we wake it back up, there
is no net change to _reserved... which is only decremented when we
actually dequeue something.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:51 -05:00
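[Editor's sketch] The accounting invariant described in this commit can be modeled with a toy queue (all names here are hypothetical, not the real OSD structures): `reserved` counts items accounted for in the queue, so parking a blocked item and later waking it is net-zero, and only a real dequeue releases a reservation.

```cpp
#include <deque>

// Toy model of the recovery reservation invariant:
// - enqueue takes a reservation
// - parking a blocked item and requeueing it later is net-zero
// - only an actual dequeue releases the reservation
struct RecoveryQueue {
  std::deque<int> q;        // runnable items
  std::deque<int> waiting;  // parked items, still accounted for
  int reserved = 0;

  void enqueue(int op) { q.push_back(op); ++reserved; }
  void park_front() {            // front is blocked; no release
    waiting.push_back(q.front());
    q.pop_front();
  }
  void requeue_waiters() {       // wake-up; still no net change
    while (!waiting.empty()) {
      q.push_front(waiting.back());
      waiting.pop_back();
    }
  }
  int dequeue() {                // only here does reserved drop
    int op = q.front();
    q.pop_front();
    --reserved;
    return op;
  }
};
```

Releasing on requeue, as the old code did, would double-count the wake-up path.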
Sage Weil
987490db3d osd: debug recovery_ops_reserved
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:51 -05:00
Sage Weil
765e16e04e osd: move PG peering waiters into op wq
This resolves problems with a peering event being delivered triggering
advance_pg which triggers a requeue of waiting events that are requeued
*behind* the event we are processing.  It also reduces the number of
wait lists by one, yay!

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:51 -05:00
Sage Weil
da47654c70 osd: store ec profile with final pool
We need this to reinstantiate semi-deleted ec backends.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:51 -05:00
Sage Weil
c5b3913919 osd/PG: ignore RecoveryDone in ReplicaActive too
This can be missed on a RepRecovering -> RepNotRecovering ->
RepWaitBackfillReserved transition.  Catch any straggler events in
ReplicaActive.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:51 -05:00
Sage Weil
68d89616cb osd/osd_types: include epoch_sent in pg_query_t operator<<
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:50 -05:00
Sage Weil
cea30e9e9e osd: restructure pg waiting more
Wait by epoch.  This is less kludgey than before!

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:50 -05:00
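[Editor's sketch] "Wait by epoch" can be sketched as follows (types and names are illustrative, not the real pg_slot code): an event that needs a future map epoch is parked keyed by that epoch, and advancing the map wakes exactly the events it now covers, in epoch order.

```cpp
#include <map>
#include <vector>

// Park events by the epoch they require; wake them as the map advances.
struct EpochWaiters {
  std::multimap<unsigned, int> waiting;  // epoch needed -> event id
  std::vector<int> runnable;

  void queue(int event, unsigned needs_epoch, unsigned cur_epoch) {
    if (needs_epoch <= cur_epoch)
      runnable.push_back(event);           // already satisfied
    else
      waiting.emplace(needs_epoch, event); // park until the map catches up
  }
  void advance_to(unsigned new_epoch) {
    auto end = waiting.upper_bound(new_epoch);
    for (auto i = waiting.begin(); i != end; ++i)
      runnable.push_back(i->second);
    waiting.erase(waiting.begin(), end);
  }
};
```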
Sage Weil
f1f0d30c35 osd: restructure pg waiting
Rethink the way we wait for PGs.  We need to order peering events relative to
each other; keep them in a separate queue in the pg_slot.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:50 -05:00
Sage Weil
c20251b949 osd: normal command uses slow dispatch (it can send messages)
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:50 -05:00
Sage Weil
560956572e osd/OSD,PG: get_osdmap()->get_epoch() -> get_osdmap_epoch()
Avoid wrangling shared_ptr!

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:50 -05:00
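[Editor's sketch] The shared_ptr wrangling being avoided looks roughly like this (stub types, not the real OSDMap/PG classes): reading the epoch through `get_osdmap()` copies a `shared_ptr` just to load one integer, whereas caching the epoch when the map is set lets hot paths skip the refcount traffic entirely.

```cpp
#include <memory>

struct OSDMapStub { unsigned epoch = 0; };  // stand-in for OSDMap

class PGStub {
  std::shared_ptr<const OSDMapStub> osdmap;
  unsigned osdmap_epoch = 0;  // cached whenever the map changes
public:
  void set_osdmap(std::shared_ptr<const OSDMapStub> m) {
    osdmap_epoch = m->epoch;
    osdmap = std::move(m);
  }
  // Copies the shared_ptr (atomic refcount bump) just to reach epoch.
  std::shared_ptr<const OSDMapStub> get_osdmap() const { return osdmap; }
  // Plain integer load; no refcount traffic.
  unsigned get_osdmap_epoch() const { return osdmap_epoch; }
};
```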
Sage Weil
065829dc11 osd: misc fixes
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:50 -05:00
Sage Weil
3a331c8be2 osd: kill disk_tp, recovery_gen_wq
Progress!

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:50 -05:00
Sage Weil
e5c336851c osd: move recovery contexts to normal wq
We have a specific PGRecoveryContext type/event (even though we are just
calling a GenContext) so that we can distinguish the event type properly.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:49 -05:00
Sage Weil
26938d54d4 osd: remove _ookup_lock_pg_with_map_lock_held()
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:49 -05:00
Sage Weil
60ea5e87b6 osd: new MOSDScrub2 message with spg_t, fast dispatch
Send new message to mimic+ OSDs.  Fast dispatch it at the OSD.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:49 -05:00
Sage Weil
62f79cae1b osd/PG: request scrub via a state machine event
Continuing effort to make PG interactions event based.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:49 -05:00
Sage Weil
fe5e361467 osd: use peering events for forced recovery
The mgr code is updated to send spg_t's instead of pg_t's (and is slightly
refactored/cleaned).

The PG events are added to the Primary state, unless we're also in the
Clean substate, in which case they are ignored.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:49 -05:00
Sage Weil
bd8d198c07 osd/OSDMap: get_primary_shart() variant that returns primary *and* shard
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:49 -05:00
Sage Weil
ae210722b4 osd: prime pg_slots for to-be-split children
Once we know which PGs are about to be created, we instantiate their
pg_slot and mark them waiting_pg, which blocks all incoming events until
the split completes, the PG is installed, and we call wake_pg_waiters().

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:49 -05:00
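[Editor's sketch] The priming behavior can be modeled minimally (hypothetical names, not the real slot code): while the slot is marked waiting, every incoming event is held back; installing the PG and waking the slot delivers them in arrival order.

```cpp
#include <deque>
#include <vector>

// A primed slot for a to-be-created split child: events queued
// while waiting_for_pg is set are deferred until the PG exists.
struct PrimedSlot {
  bool waiting_for_pg = true;   // primed before the child PG exists
  std::deque<int> deferred;
  std::vector<int> delivered;

  void queue(int ev) {
    if (waiting_for_pg)
      deferred.push_back(ev);
    else
      delivered.push_back(ev);
  }
  void wake_pg_waiters() {      // called once the PG is installed
    waiting_for_pg = false;
    while (!deferred.empty()) {
      delivered.push_back(deferred.front());
      deferred.pop_front();
    }
  }
};
```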
Sage Weil
718b6baa3c osd: remove obsolete slow dispatch path for most messages
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:48 -05:00
Sage Weil
025ca7e1e3 osd: fast dispatch M[Mon]Command
These just get dumped onto a work queue.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:48 -05:00
Sage Weil
f325b7dbc7 osd: fast dispatch ping
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:48 -05:00
Sage Weil
6710d34872 mon/OSDMOnitor: send MOSDPGCreate2 to mimic+ osds
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:48 -05:00
Sage Weil
974896a819 osd: handle MOSDPGCreate2 messages (fast dispatch!)
Add a new MOSDPGCreate2 message that sends the spg_t (not just pg_t) and
includes only the info we need.  Fast dispatch it.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:48 -05:00
Sage Weil
b4fcd6d702 osd/OSDMapMapping: a getter that returns a spg_t
Note whether a pool is erasure so that we can generate an appropriate
spg_t for a mapping.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:48 -05:00
Sage Weil
39945d299e osd: send pg creations through normal pg queue
Queue a null event tagged with create_info, eliminating the special
legacy path.

These are still not fast dispatch because we need an spg (not a pg) to queue
an event, and we need a current osdmap in order to calculate that.  That
isn't possible (or a good idea) in fast dispatch.  In a subsequent patch we'll
create a new pg create message that includes the correct information and
can be fast dispatched, allowing this path to die off post-nautilus.

Also, improve things so that we ack the pg creation only after the PG has
gone active, meaning it is fully replicated (by at least min_size PGs).

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:48 -05:00
Sage Weil
294bc5d631 osd: fix max pg check for peer events
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:48 -05:00
Sage Weil
9ab2400109 osd: use atomic for pg_map_size
This avoids the need for pg_map_lock in the max pg check.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:47 -05:00
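[Editor's sketch] The pattern here, shown with hypothetical names rather than the actual OSD code: the map still needs its lock for mutation, but a parallel atomic size lets the max-pg admission check read the count lock-free.

```cpp
#include <atomic>
#include <map>
#include <mutex>

struct PGRegistry {
  std::mutex lock;                  // guards pg_map mutation
  std::map<int, int> pg_map;
  std::atomic<size_t> pg_map_size{0};  // mirrors pg_map.size()

  void add(int id, int pg) {
    std::lock_guard<std::mutex> l(lock);
    pg_map[id] = pg;
    pg_map_size = pg_map.size();   // keep the atomic mirror current
  }
  // The admission check reads the atomic only; no lock needed.
  bool over_limit(size_t max) const {
    return pg_map_size.load() > max;
  }
};
```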
Sage Weil
828060749a osd/PGPeeringEvent: note mon- vs peer-initiated pg creates
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:47 -05:00
Sage Weil
9dc71e653a osd: fast dispatch peering events (part 2)
This actually puts the remaining peering events into fast dispatch.  The
only remaining event is the pg create from the mon.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:47 -05:00
Sage Weil
2284e133af osd: fast dispatch peering events (part 1)
This is a big commit that lays out the infrastructure changes to fast
dispatch the remaining peering events.  It's hard to separate it all out
so this probably doesn't quite build; it's just easier to review as a
separate patch.

- lock ordering for pg_map has changed:
  before:
    OSD::pg_map_lock
      PG::lock
        ShardData::lock

  after:
    PG::lock
      ShardData::lock
        OSD::pg_map_lock

- queue items are now annotated with whether they can proceed without a
pg at all (e.g., query) or can instantiate a pg (e.g., notify, log, etc.).

- There is some wonkiness around getting the initial Initialize event to
a newly-created PG.  I don't love it but it gets the job done for now.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-03 10:12:35 -05:00
Sage Weil
ac142c3cc0 osd: queue null events without PG lock
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-03 10:12:35 -05:00