
84976 Commits

Author SHA1 Message Date
Sage Weil
ce699ff870 osd: close split vs query race in consume_map
Consider the race:

- shard 0 consumes epoch E
- shard 1 consumes epoch E
  - shard 1 pg P will split to C
- shard 0 processes query on C, returns DNE
- shard 0 primes slot C

Close the race by priming split children before consuming the map into each
OSDShard.  That way the query will either (1) arrive before E and before
slot C is primed, and wait for E, or (2) find the slot present with
waiting_for_split true.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
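The ordering that ce699ff870 relies on can be sketched as a simplified model. This is not the OSD code. Shard, Slot, prime_split_children(), consume_map() and handle_query() are hypothetical stand-ins, shards are chosen by a made-up pg % nshards hash, and each shard's epoch is a plain integer. The model shows one property: if every child slot is primed before any shard advances its epoch, a query for a child can only see "wait for the map" or "waiting for split". It can never see "does not exist".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <vector>

// Hypothetical, heavily simplified model of per-shard PG slots.
using pg_id = uint32_t;
using epoch_t = uint32_t;

struct Slot {
  bool waiting_for_split = false;  // child exists but parent has not split yet
};

struct Shard {
  std::mutex lock;
  epoch_t osdmap_epoch = 0;        // last map epoch this shard consumed
  std::map<pg_id, Slot> slots;
};

enum class QueryResult { WaitForMap, WaitForSplit, Found, DoesNotExist };

// Hypothetical pg -> shard hash.
inline std::size_t shard_of(pg_id pg, std::size_t nshards) { return pg % nshards; }

// Step 1: prime a slot for every split child in the shard that will own it.
void prime_split_children(std::vector<Shard>& shards,
                          const std::vector<pg_id>& children) {
  for (pg_id child : children) {
    Shard& s = shards[shard_of(child, shards.size())];
    std::lock_guard<std::mutex> l(s.lock);
    s.slots[child].waiting_for_split = true;
  }
}

// Step 2: only after priming does each shard consume the new epoch.
void consume_map(std::vector<Shard>& shards, epoch_t e,
                 const std::vector<pg_id>& children) {
  prime_split_children(shards, children);
  for (Shard& s : shards) {
    std::lock_guard<std::mutex> l(s.lock);
    s.osdmap_epoch = e;
  }
}

// A query sees the primed slot, or waits for a map it has not consumed yet.
QueryResult handle_query(Shard& s, pg_id pg, epoch_t query_epoch) {
  std::lock_guard<std::mutex> l(s.lock);
  auto it = s.slots.find(pg);
  if (it != s.slots.end())
    return it->second.waiting_for_split ? QueryResult::WaitForSplit
                                        : QueryResult::Found;
  if (query_epoch > s.osdmap_epoch)
    return QueryResult::WaitForMap;
  return QueryResult::DoesNotExist;
}
```

In this model, a query for child 4 at epoch 2 returns WaitForMap before priming. It returns WaitForSplit after priming, both before and after the shard consumes epoch 2.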
Sage Weil
b4d96be92d osd: improve documentation for event queue ordering and requeueing rules
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
Sage Weil
ff0f798e1b osd/PG: flush sequencer/collection on shutdown
This should catch any in-flight work we have.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
Sage Weil
40a92a1f56 osd/PG: move shutdown into PG
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
Sage Weil
c454184d5e osd/osd_types: fix pg_t::pool() return type (uint64_t -> int64_t)
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
Sage Weil
38319f8300 mon/OSDMonitor: disallow pg_num changes until after pool is created
The OSD's pg create handling code does not handle races between a mon create
message and a split message.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
334bf7e3dc osd/PG: set send_notify on child
If we are a non-primary, we need to ensure the split children send
notifies.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
0713586300 osd: kill broken _process optimization; simplify null pg flow
- drop the fast queue-to-waiting-list optimization: it breaks ordering and is
a useless optimization
- restructure so that we don't drop the lock and revalidate the world if
pg == nullptr

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
f9667a9ef3 osd: fix fast pg create vs limits
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
b4af83d735 osd: (pre)publish map before distributing to shards (and pgs)
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
cf50361066 osd: update numpg_* counters when removing a pg
Usually on a pg create we see an OSDMap update; on PG removal completion
we may not.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
9d6425ab25 osd: decrement deleting pg count in _delete_some
The exit() method for the ToDelete state doesn't run on PG destruction.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
3b970e32b0 osd: clear shard osdmaps during shutdown
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
bfeae027aa osd: make safe osdmap accessor for OSDShard
advance_pg needs to get the shard osdmap without racing against
consume_map().

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
540b1bc9e6 osd: clean up mutex naming for OSDShard
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
183e7d7bc2 common/tracked_int_ptr: fix operator= return value
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
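The commit does not say what 183e7d7bc2 changed the return value to. The usual convention for a smart pointer's operator= is to return *this by reference, so that chained assignment behaves like it does for built-in types. The sketch below uses a hypothetical intrusive pointer, IntPtr, and a RefCounted type. It is not Ceph's TrackedIntPtr. It implements copy-and-swap and returns IntPtr&.

```cpp
#include <cassert>
#include <utility>

// Hypothetical ref-counted object; nref counts live IntPtr handles.
struct RefCounted {
  int nref = 0;
};

// Hypothetical intrusive pointer (not Ceph's TrackedIntPtr).
class IntPtr {
 public:
  IntPtr() = default;
  explicit IntPtr(RefCounted* p) : ptr(p) { if (ptr) ++ptr->nref; }
  IntPtr(const IntPtr& o) : ptr(o.ptr) { if (ptr) ++ptr->nref; }
  ~IntPtr() { release(); }

  // Copy-and-swap: the by-value parameter takes the new reference, the swap
  // hands our old pointer to the parameter (released when it goes out of
  // scope), and returning *this by reference makes a = b = c work.
  IntPtr& operator=(IntPtr o) {
    std::swap(ptr, o.ptr);
    return *this;
  }

  RefCounted* get() const { return ptr; }

 private:
  // This sketch only counts references; the object is owned elsewhere.
  void release() { if (ptr) --ptr->nref; }
  RefCounted* ptr = nullptr;
};
```

Copy-and-swap also makes self-assignment (a = a) safe without a special case.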
Sage Weil
3a0b197cd1 osd: fix pg removal vs _process race
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
7fb35ff961 osd: lookup_*pg must return PGRef
Otherwise it is fundamentally unsafe, as the PG might get destroyed out
from under us without a reference.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
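The hazard described in 7fb35ff961 can be illustrated with a small registry. PG, PGRef and PGRegistry here are hypothetical. The sketch uses std::shared_ptr in place of the OSD's intrusive PGRef. Because lookup() returns a reference-counted handle taken while the lock is held, a later remove() drops only the registry's reference, and the caller's PG stays alive.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical stand-ins; PGRef is a shared_ptr in this sketch.
struct PG {
  int id;
  explicit PG(int i) : id(i) {}
};
using PGRef = std::shared_ptr<PG>;

class PGRegistry {
 public:
  void add(int id) {
    std::lock_guard<std::mutex> l(lock);
    pgs[id] = std::make_shared<PG>(id);
  }

  void remove(int id) {
    std::lock_guard<std::mutex> l(lock);
    pgs.erase(id);  // drops only the registry's reference
  }

  // Returning PGRef rather than PG* gives the caller its own reference,
  // taken under the lock, so a concurrent remove() cannot free the PG.
  PGRef lookup(int id) {
    std::lock_guard<std::mutex> l(lock);
    auto it = pgs.find(id);
    if (it == pgs.end()) return nullptr;
    return it->second;
  }

 private:
  std::mutex lock;
  std::map<int, PGRef> pgs;
};
```

A raw PG* returned from lookup() would dangle as soon as remove() erased the last reference.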
Sage Weil
1270b49fb5 osd: kill pass-through _open_pg
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
486faa482a osd: remove old min pg epoch tracking
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
bc9436bcb5 osd/PG: remove RecoveryCtx on_applied and on_commit
These were awkward and unnecessary.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
7a9153c4b3 osd/PG: register delete completion directly on Transaction
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
ed72f30db7 osd: register split completion directly on Transaction
No need to use wonky RecoveryCtx C_Contexts

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
2c2378c49e osd/PG: drop unused context list accessors for RecoveryCtx
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
45e07480df osd/PG: register recovery finish context directly on Transaction
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
643714ff96 osd/PG: drop unused activate() context list arg
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
a5494b815c osd/PG: register flush completions directly on the Transaction
No need for an awkward list passed as an arg; all of these callbacks end up
on the Transaction anyway.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
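The pattern in this group of commits (7a9153c4b3, ed72f30db7, 45e07480df, a5494b815c) is to attach completion callbacks to the Transaction itself. They are no longer collected in a separate context list that callers thread through. The sketch below uses a hypothetical Transaction class with a register_on_commit() method taking std::function. It is loosely modeled on that idea and does not reproduce ObjectStore::Transaction's actual interface. queue_flush() is likewise a hypothetical helper.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical transaction: completions travel with the transaction.
class Transaction {
 public:
  void register_on_commit(std::function<void()> cb) {
    on_commit.push_back(std::move(cb));
  }

  // Called by a (hypothetical) store once the transaction is durable.
  // Callbacks run exactly once.
  void committed() {
    auto cbs = std::move(on_commit);
    on_commit.clear();
    for (auto& cb : cbs) cb();
  }

 private:
  std::vector<std::function<void()>> on_commit;
};

// A flush helper no longer needs an extra context-list parameter; it
// registers its completion directly on the transaction it is given.
void queue_flush(Transaction& t, int& flushed) {
  t.register_on_commit([&flushed] { ++flushed; });
}
```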
Sage Weil
6c52e5d1c7 osd: wait for pg epochs based on shard tracking
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
9895c9f1a9 osd: index pg (slots) by map epoch within each shard
This will replace the epoch tracking in OSDService shortly.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
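Indexing slots by map epoch within a shard (9895c9f1a9) makes "what is the oldest epoch any PG here still needs" a cheap query. 6c52e5d1c7 then uses that shard tracking to wait for PG epochs. The sketch below assumes an ordered multimap keyed by epoch plus a per-PG iterator, so the minimum is always at begin(). ShardEpochIndex and its methods are hypothetical names, not the OSDShard API.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

using epoch_t = uint32_t;
using pg_id = uint32_t;

// Hypothetical per-shard index of the epoch each PG slot has reached.
class ShardEpochIndex {
 public:
  void set_epoch(pg_id pg, epoch_t e) {
    clear(pg);
    by_pg[pg] = by_epoch.emplace(e, pg);
  }

  void clear(pg_id pg) {
    auto it = by_pg.find(pg);
    if (it == by_pg.end()) return;
    by_epoch.erase(it->second);  // other multimap iterators stay valid
    by_pg.erase(it);
  }

  // Oldest epoch any PG in this shard is still at; nullopt if empty.
  std::optional<epoch_t> min_epoch() const {
    if (by_epoch.empty()) return std::nullopt;
    return by_epoch.begin()->first;
  }

 private:
  std::multimap<epoch_t, pg_id> by_epoch;
  std::map<pg_id, std::multimap<epoch_t, pg_id>::iterator> by_pg;
};
```

A waiter wanting all PGs at epoch E would, under this model, wait until min_epoch() is empty or at least E in every shard.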
Sage Weil
e178a6d876 osd/PG: link back to pg slot
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
4c465fbfac osd: OSDShard::pg_slot -> OSDShardPGSlot
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
973836c70d osd: change pg_slots unordered_map to use unique_ptr<>
This avoids moving slots around in memory in the unordered_map... they can
be big!

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
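Holding each slot behind a unique_ptr means the slot lives in its own heap allocation. Handing a slot from one map to another therefore moves only the owning pointer, and the PGSlot object keeps its address. Any back pointer to it, such as a PG's link to its slot, stays valid. PGSlot, SlotMap and transfer_slot() below are hypothetical. The 4 KiB array merely stands in for "they can be big".

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>

using pg_id = uint32_t;

// Hypothetical large slot.
struct PGSlot {
  std::array<char, 4096> state{};
  bool waiting_for_split = false;
};

using SlotMap = std::unordered_map<pg_id, std::unique_ptr<PGSlot>>;

// Move a slot between two maps by transferring ownership of the pointer;
// the PGSlot object itself is never copied or moved.
PGSlot* transfer_slot(SlotMap& from, SlotMap& to, pg_id pg) {
  auto it = from.find(pg);
  if (it == from.end()) return nullptr;
  PGSlot* raw = it->second.get();
  to[pg] = std::move(it->second);
  from.erase(it);
  return raw;
}
```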
Sage Weil
7aeebde3fd osd: remove some unused methods
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
8433941c94 osd: remove created_pgs tracking in RecoveryCtx
Not needed or used!

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
3b2935951f osd: fix PG::ch init
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
2922b3be33 osd: use _attach_pg and _detach_pg helpers; keep PG::osd_shard ptr
Consolidate num_pgs updates (and fix a counting bug along the way).

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
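Routing every add and remove through two helpers is what keeps a counter like num_pgs consistent with the map it summarizes. The sketch below models that idea with hypothetical attach_pg()/detach_pg() functions and a PG that keeps an osd_shard back pointer while attached. These are illustrative stand-ins, not the _attach_pg/_detach_pg signatures in the OSD.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

struct PG;

// Hypothetical shard: owns the pg map and its counter.
struct Shard {
  std::map<uint32_t, PG*> pgs;
  int num_pgs = 0;
};

struct PG {
  uint32_t id;
  Shard* osd_shard = nullptr;  // set only while attached to a shard
};

// The only place that adds a PG, so the count cannot drift.
void attach_pg(Shard& s, PG* pg) {
  assert(pg->osd_shard == nullptr);
  s.pgs[pg->id] = pg;
  pg->osd_shard = &s;
  ++s.num_pgs;
}

// The only place that removes a PG; the back pointer says where from.
void detach_pg(PG* pg) {
  Shard* s = pg->osd_shard;
  assert(s != nullptr);
  s->pgs.erase(pg->id);
  pg->osd_shard = nullptr;
  --s->num_pgs;
}
```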
Sage Weil
b7dc3d0fab osd: remove old split tracking machinery
This infrastructure is no longer used; simpler split tracking now lives in
each shard's pg_slots directly.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
295dfe0372 osd: restructure consume_map in terms of shards
- new split priming machinery
- new primed split cleanup on pg removal
- cover the pg creation path

The old split tracking is now totally unused; will be removed in the next
patch.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
2a9c8d80ce osd: pass sdata into dequeue_peering_evt (and dequeue_delete)
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
e297b1e6ad osd: pass data into OpQueueItem::run()
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
8ee13128fe osd: kill pg_map
Split doesn't work quite right; num_pgs count is probably off.  But, things
mostly work.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
eed90d4a8f osd: rename OSDShard waiting_for_pg_osdmap -> osdmap
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
4fc459829f osd: use _get_pgs() where possible; avoid touching pg_map directly
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
bffa62233e osd: get _get_pgs() and _get_pgids()
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
c4960f03a2 osd: remove get_mapped_pools command
No in-tree users.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
0bf6ac893a osd: move ShardedOpWQ::ShardData -> OSDShard
Soon we will destroy pg_map!

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
d3bd637171 osd: kill _open_lock_pg
Move lock call to caller.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:51 -05:00
Sage Weil
d9dcaa79b7 osd: kill _create_lock_pg
Unused.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:51 -05:00
Sage Weil
0766f5b40c osd: do not release recovery_ops_reserved on requeue
This doesn't make sense, although it's the same behavior as in
luminous.

The point of the releases here is that if we drop something that is in
the queue, we decrement the recovery_ops_reserved counter by that much.
However, if something is in the queue and waiting, and we wake it back up,
there is no net change to recovery_ops_reserved, which is only decremented
when we actually dequeue something.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:51 -05:00
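The accounting rule in 0766f5b40c can be modeled with a small queue. Reservations are added on enqueue and released only when an item is dequeued or dropped. Parking an item and waking it back up leaves the counter unchanged. RecoveryQueue, RecoveryItem and the method names below are hypothetical, and this is not the OSD's ShardedOpWQ.

```cpp
#include <cassert>
#include <deque>

// Hypothetical queued recovery work and the pushes it reserved.
struct RecoveryItem {
  int reserved_pushes;
};

class RecoveryQueue {
 public:
  void enqueue(RecoveryItem item) {
    reserved += item.reserved_pushes;
    queue.push_back(item);
  }

  // Front item cannot run yet: park it. It is still queued work, so the
  // reservation is unchanged.
  void park_front() {
    if (queue.empty()) return;
    waiting.push_back(queue.front());
    queue.pop_front();
  }

  // Wake parked items in their original order. No net change to reserved.
  void requeue_waiting() {
    while (!waiting.empty()) {
      queue.push_front(waiting.back());
      waiting.pop_back();
    }
  }

  // Actually dequeuing an item is what releases its reservation.
  bool dequeue(RecoveryItem* out) {
    if (queue.empty()) return false;
    *out = queue.front();
    queue.pop_front();
    reserved -= out->reserved_pushes;
    return true;
  }

  // Dropping queued or parked work releases exactly what it reserved.
  void drop_all() {
    for (const auto& i : queue) reserved -= i.reserved_pushes;
    for (const auto& i : waiting) reserved -= i.reserved_pushes;
    queue.clear();
    waiting.clear();
  }

  int recovery_ops_reserved() const { return reserved; }

 private:
  std::deque<RecoveryItem> queue, waiting;
  int reserved = 0;
};
```

If requeue_waiting() also released reservations, a woken item would be released twice: once on requeue and again on dequeue. The commit avoids that double release.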
Sage Weil
987490db3d osd: debug recovery_ops_reserved
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:51 -05:00