Sage Weil
ce699ff870
osd: close split vs query race in consume_map
...
Consider the race:
- shard 0 consumes epoch E
- shard 1 consumes epoch E
- shard 1 pg P will split to C
- shard 0 processes query on C, returns DNE
- shard 0 primes slot C
Close the race by priming split children before consuming the map into each
OSDShard. That way the query will either (1) arrive before E and before
slot C is primed and wait for E, or (2) find the slot present with
waiting_for_split true.
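
A minimal single-threaded sketch of the ordering described above. The types and names here (Shard, Slot, handle_query, consume_map_primed) are hypothetical stand-ins for illustration, not the real OSDShard code; the query is assumed to require epoch E.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; not the real Ceph OSDShard types.
struct Slot {
  bool waiting_for_split = false;
};

struct Shard {
  unsigned epoch = 0;                 // latest map epoch this shard has consumed
  std::map<std::string, Slot> slots;  // pg id -> slot

  // Create (or mark) the slot for a split child before the map is consumed.
  void prime_split(const std::string& child) {
    slots[child].waiting_for_split = true;
  }
  void consume(unsigned e) { epoch = e; }
};

enum class QueryResult { WaitForMap, WaitForSplit, DoesNotExist, Found };

// What a query on `pg` that requires map epoch `need` would observe.
QueryResult handle_query(const Shard& s, const std::string& pg, unsigned need) {
  if (s.epoch < need) return QueryResult::WaitForMap;
  auto it = s.slots.find(pg);
  if (it == s.slots.end()) return QueryResult::DoesNotExist;
  return it->second.waiting_for_split ? QueryResult::WaitForSplit
                                      : QueryResult::Found;
}

// Fixed ordering: prime every split child first, then consume epoch e.
void consume_map_primed(Shard& s, const std::vector<std::string>& children,
                        unsigned e) {
  for (const auto& c : children) s.prime_split(c);
  s.consume(e);
}
```

With this ordering, at no point does the shard report epoch E without the child slot already being present, so the query never sees "does not exist" for C.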
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
Sage Weil
b4d96be92d
osd: improve documentation for event queue ordering and requeueing rules
...
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
Sage Weil
ff0f798e1b
osd/PG: flush sequencer/collection on shutdown
...
This should catch any in-flight work we have.
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
Sage Weil
40a92a1f56
osd/PG: move shutdown into PG
...
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
Sage Weil
c454184d5e
osd/osd_types: fix pg_t::pool() return type (uint64_t -> int64_t)
...
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
Sage Weil
38319f8300
mon/OSDMonitor: disallow pg_num changes until after pool is created
...
The OSD's pg create handling code does not handle races between a mon create
message and a split message.
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
334bf7e3dc
osd/PG: set send_notify on child
...
If we are a non-primary, we need to ensure the split children send
notifies.
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
0713586300
osd: kill broken _process optimization; simplify null pg flow
...
- drop fast queue to waiting list optimization: it breaks ordering and is
a useless optimization
- restructure so that we don't drop the lock and revalidate the world if
pg == nullptr
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
f9667a9ef3
osd: fix fast pg create vs limits
...
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
b4af83d735
osd: (pre)publish map before distributing to shards (and pgs)
...
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
cf50361066
osd: update numpg_* counters when removing a pg
...
Usually on a pg create we see an OSDMap update; on PG removal completion
we may not.
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
9d6425ab25
osd: decrement deleting pg count in _delete_some
...
The exit() method for ToDelete state doesn't run on PG destruction.
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
3b970e32b0
osd: clear shard osdmaps during shutdown
...
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
bfeae027aa
osd: make safe osdmap accessor for OSDShard
...
advance_pg needs to get the shard osdmap without racing against
consume_map().
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
540b1bc9e6
osd: clean up mutex naming for OSDShard
...
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
183e7d7bc2
common/tracked_int_ptr: fix operator= return value
...
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
3a0b197cd1
osd: fix pg removal vs _process race
...
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
7fb35ff961
osd: lookup_*pg must return PGRef
...
Otherwise it is fundamentally unsafe, as the PG might get destroyed out
from under us without a reference.
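
A rough sketch of why the lookup should hand back a counted reference. Here std::shared_ptr stands in for PGRef, and PGTable is a hypothetical container used only for illustration; it is not the OSD's actual lookup code.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>

struct PG {
  int id;
  explicit PG(int i) : id(i) {}
};

// Stand-in for PGRef: any reference-counted handle illustrates the point.
using PGRef = std::shared_ptr<PG>;

// Hypothetical table of PGs guarded by a lock.
class PGTable {
  std::mutex lock;
  std::map<int, PGRef> pgs;

public:
  void add(int id) {
    std::lock_guard<std::mutex> l(lock);
    pgs[id] = std::make_shared<PG>(id);
  }
  void remove(int id) {
    std::lock_guard<std::mutex> l(lock);
    pgs.erase(id);
  }
  // Returns a counted reference taken while the lock is held, so the PG
  // cannot be destroyed under the caller once the lock is released.
  PGRef lookup(int id) {
    std::lock_guard<std::mutex> l(lock);
    auto it = pgs.find(id);
    if (it == pgs.end()) return PGRef();
    return it->second;
  }
};
```

A raw PG* returned from the same function would dangle as soon as another thread called remove(); the counted reference keeps the object alive until the caller drops it.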
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
1270b49fb5
osd: kill pass-through _open_pg
...
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
486faa482a
osd: remove old min pg epoch tracking
...
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
bc9436bcb5
osd/PG: remove RecoveryCtx on_applied and on_commit
...
These were awkward and unnecessary.
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
7a9153c4b3
osd/PG: register delete completion directly on Transaction
...
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
ed72f30db7
osd: register split completion directly on Transaction
...
No need to use wonky RecoveryCtx C_Contexts
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
2c2378c49e
osd/PG: drop unused context list accessors for RecoveryCtx
...
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
45e07480df
osd/PG: register recovery finish context directly on Transaction
...
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
643714ff96
osd/PG: drop unused activate() context list arg
...
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
a5494b815c
osd/PG: register flush completions directly on the Transaction
...
No need for an awkward list passed as an arg; all of these callbacks end up
on the Transaction anyway.
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
6c52e5d1c7
osd: wait for pg epochs based on shard tracking
...
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
9895c9f1a9
osd: index pg (slots) by map epoch within each shard
...
This will replace the epoch tracking in OSDService shortly.
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
e178a6d876
osd/PG: link back to pg slot
...
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
4c465fbfac
osd: OSDShard::pg_slot -> OSDShardPGSlot
...
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
973836c70d
osd: change pg_slots unordered_map to use unique_ptr<>
...
This avoids moving slots around in memory in the unordered_map... they can
be big!
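
A small sketch of the idea, with a hypothetical Slot type and get_or_create helper (neither is taken from the OSD code): the map stores only a pointer-sized value, and the large slot object is allocated once on the heap and stays at the same address.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical "big" slot; the payload only makes the size visible.
struct Slot {
  char payload[4096];
  bool waiting_for_split = false;
};

using SlotMap = std::unordered_map<int, std::unique_ptr<Slot>>;

// Look up the slot for pgid, allocating it on first use.
Slot* get_or_create(SlotMap& m, int pgid) {
  auto& p = m[pgid];
  if (!p) p = std::make_unique<Slot>();
  return p.get();
}
```

Inserting into the map or moving the map itself then shuffles at most the small unique_ptr values, never the 4 KiB slot bodies.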
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
7aeebde3fd
osd: remove some unused methods
...
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
8433941c94
osd: remove created_pgs tracking in RecoveryCtx
...
Not needed or used!
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
3b2935951f
osd: fix PG::ch init
...
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
2922b3be33
osd: use _attach_pg and _detach_pg helpers; keep PG::osd_shard ptr
...
Consolidate num_pgs updates (and fix a counting bug along the way).
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
b7dc3d0fab
osd: remove old split tracking machinery
...
This infrastructure is no longer used; simpler split tracking now lives in
the shards' pg_slots directly.
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
295dfe0372
osd: restructure consume_map in terms of shards
...
- new split priming machinery
- new primed split cleanup on pg removal
- cover the pg creation path
The old split tracking is now totally unused; will be removed in the next
patch.
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
2a9c8d80ce
osd: pass sdata into dequeue_peering_evt (and dequeue_delete)
...
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
e297b1e6ad
osd: pass data into OpQueueItem::run()
...
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
8ee13128fe
osd: kill pg_map
...
Split doesn't work quite right; num_pgs count is probably off. But, things
mostly work.
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
eed90d4a8f
osd: rename OSDShard waiting_for_pg_osdmap -> osdmap
...
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
4fc459829f
osd: use _get_pgs() where possible; avoid touching pg_map directly
...
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
bffa62233e
osd: get _get_pgs() and _get_pgids()
...
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
c4960f03a2
osd: remove get_mapped_pools command
...
No in-tree users.
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
0bf6ac893a
osd: move ShardedOpWQ::ShardData -> OSDShard
...
Soon we will destroy pg_map!
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
d3bd637171
osd: kill _open_lock_pg
...
Move lock call to caller.
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:51 -05:00
Sage Weil
d9dcaa79b7
osd: kill _create_lock_pg
...
Unused.
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:51 -05:00
Sage Weil
0766f5b40c
osd: do not release recovery_ops_reserved on requeue
...
This doesn't make sense, although it's the same behavior as
luminous.
The point of the releases here is that if we drop something that is in
the queue we drop the recovery_ops_reserved counter by that much. However,
if something is in the queue and waiting, and we wake it back up, there
is no net change to _reserved... which is only decremented when we
actually dequeue something.
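
The accounting rule above can be sketched as a toy counter. RecoveryAccounting and its method names are hypothetical, and the assumption that the counter is incremented when an item is queued is not stated in this message.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the reserved-ops counter; not the real OSDService code.
struct RecoveryAccounting {
  uint64_t reserved = 0;

  // Assumption: ops are reserved at the time the item is queued.
  void enqueue(uint64_t ops) { reserved += ops; }

  // An item discarded while still queued gives its reservation back.
  void drop_queued(uint64_t ops) {
    assert(reserved >= ops);
    reserved -= ops;
  }

  // Waking a waiting item and putting it back in the queue: no net change.
  void requeue(uint64_t /*ops*/) {}

  // The reservation is released only when the item is actually dequeued.
  void dequeue(uint64_t ops) {
    assert(reserved >= ops);
    reserved -= ops;
  }
};
```

Releasing in requeue() as well would decrement the counter twice for one item: once on requeue and once on the eventual dequeue.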
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:51 -05:00
Sage Weil
987490db3d
osd: debug recovery_ops_reserved
...
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:51 -05:00