Commit Graph

84969 Commits

Author SHA1 Message Date
Sage Weil
5155baf323 osd: handle NOUP flag vs boot race
If we digest maps that show a NOUP flag change *and* we also go active,
there is no need to restart the boot process--we can just go/stay active.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:58 -05:00
Sage Weil
29a885c915 qa/suites/rados/singleton/all/recovery_preemption: make test more reliable
A 30-second run did only 7000 ops, which means ~50 log entries per pg...
not enough to trigger backfill.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
Sage Weil
c3589df320 qa/suites/rados/singleton/all/mon-seesaw: whitelist PG_AVAILABILITY
The seesaw might delay pg creation by more than 60s.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
Sage Weil
494d02c349 osd/PG: ensure an actual transaction gets queued for recovery finish
Otherwise, this context gets leaked and lost.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
Sage Weil
ce699ff870 osd: close split vs query race in consume_map
Consider the race:

- shard 0 consumes epoch E
- shard 1 consumes epoch E
  - shard 1 pg P will split to C
- shard 0 processes query on C, returns DNE
- shard 0 primes slot C

Close the race by priming split children before consuming the map into each
OSDShard.  That way the query will either (1) arrive before E and before
slot C is primed and wait for E, or (2) find the slot present with
waiting_for_split true.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
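
The ordering described in ce699ff870 can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyShard, ToySlot, QueryResult and the helper functions are hypothetical names invented here, not Ceph's real OSDShard code, and the model tracks nothing except a shard's epoch and its primed slots.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy model; none of these names are Ceph's real types.
enum class QueryResult { WaitForMap, WaitForSplit, DoesNotExist };

struct ToySlot {
  bool waiting_for_split = false;
};

struct ToyShard {
  int epoch = 0;                         // last map epoch this shard consumed
  std::map<std::string, ToySlot> slots;  // pg id -> slot (primed slots only)

  // Create the child's slot ahead of time and mark it as awaiting the split.
  void prime_split(const std::string& child) {
    slots[child].waiting_for_split = true;
  }

  void consume_map(int e) { epoch = e; }

  // Handle a query for pg `pgid` that requires map epoch `need`.
  QueryResult query(const std::string& pgid, int need) const {
    if (epoch < need) return QueryResult::WaitForMap;
    auto it = slots.find(pgid);
    if (it != slots.end() && it->second.waiting_for_split)
      return QueryResult::WaitForSplit;
    return QueryResult::DoesNotExist;
  }
};

// Racy order: the shard consumes E first and the child is primed later.
inline QueryResult racy_order(int E) {
  ToyShard s;
  s.consume_map(E);
  QueryResult r = s.query("C", E);  // lands in the window before priming
  s.prime_split("C");
  return r;
}

// Fixed order, query arrives before the shard has consumed E.
inline QueryResult fixed_order_query_before(int E) {
  ToyShard s;
  s.prime_split("C");
  QueryResult r = s.query("C", E);  // shard is still at the old epoch
  s.consume_map(E);
  return r;
}

// Fixed order, query arrives after the shard has consumed E.
inline QueryResult fixed_order_query_after(int E) {
  ToyShard s;
  s.prime_split("C");
  s.consume_map(E);
  return s.query("C", E);  // slot is present and waiting for the split
}
```

In this model the racy order yields DoesNotExist for child C, while priming first leaves only the two waiting outcomes.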
Sage Weil
b4d96be92d osd: improve documentation for event queue ordering and requeueing rules
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
Sage Weil
ff0f798e1b osd/PG: flush sequencer/collection on shutdown
This should catch any in-flight work we have.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
Sage Weil
40a92a1f56 osd/PG: move shutdown into PG
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
Sage Weil
c454184d5e osd/osd_types: fix pg_t::pool() return type (uint64_t -> int64_t)
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
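
The commit above gives no rationale beyond the type change itself. As a generic illustration of why an accessor's return type should match a signed member, here is a minimal sketch; ToyPgId and its members are hypothetical and are not Ceph's real pg_t, and the use of -1 as a sentinel is an assumption made only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy type, not Ceph's real pg_t.
struct ToyPgId {
  int64_t m_pool;

  // Mismatched accessor: a negative value is converted to a huge unsigned one.
  uint64_t pool_unsigned() const { return static_cast<uint64_t>(m_pool); }

  // Matching accessor: the stored value is returned unchanged.
  int64_t pool_signed() const { return m_pool; }
};

// True if a -1 sentinel comes back as the largest unsigned 64-bit value.
inline bool sentinel_wraps_when_unsigned() {
  ToyPgId p{-1};
  return p.pool_unsigned() == UINT64_MAX;
}

// True if a -1 sentinel is still negative through the signed accessor.
inline bool sentinel_preserved_when_signed() {
  ToyPgId p{-1};
  return p.pool_signed() < 0;
}
```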
Sage Weil
38319f8300 mon/OSDMonitor: disallow pg_num changes until after pool is created
The OSD's pg create handling code does not handle races between a mon
create message and a split message.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
334bf7e3dc osd/PG: set send_notify on child
If we are a non-primary, we need to ensure the split children send
notifies.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
0713586300 osd: kill broken _process optimization; simplify null pg flow
- drop fast queue to waiting list optimization: it breaks ordering and is
a useless optimization
- restructure so that we don't drop the lock and revalidate the world if
pg == nullptr

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
f9667a9ef3 osd: fix fast pg create vs limits
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
b4af83d735 osd: (pre)publish map before distributing to shards (and pgs)
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
cf50361066 osd: update numpg_* counters when removing a pg
Usually on a pg create we see an OSDMap update; on PG removal completion
we may not.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
9d6425ab25 osd: decrement deleting pg count in _delete_some
The exit() method for ToDelete state doesn't run on PG destruction.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
3b970e32b0 osd: clear shard osdmaps during shutdown
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
bfeae027aa osd: make safe osdmap accessor for OSDShard
The advance_pg needs to get the shard osdmap without racing against
consume_map().

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
540b1bc9e6 osd: clean up mutex naming for OSDShard
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
183e7d7bc2 common/tracked_int_ptr: fix operator= return value
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
3a0b197cd1 osd: fix pg removal vs _process race
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
7fb35ff961 osd: lookup_*pg must return PGRef
Otherwise it is fundamentally unsafe, as the PG might get destroyed out
from under us without a reference.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
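
The hazard named in 7fb35ff961 is the general one of handing out an object without a counted reference. A minimal sketch follows, assuming std::shared_ptr as a stand-in for the reference type; ToyPG, ToyPGRef and ToyRegistry are hypothetical names and do not reflect how Ceph actually defines PGRef or its lookup functions.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical stand-ins: ToyPG for a PG object, ToyPGRef for a counted
// reference. std::shared_ptr is used here purely for illustration.
struct ToyPG {
  int id;
  explicit ToyPG(int i) : id(i) {}
};
using ToyPGRef = std::shared_ptr<ToyPG>;

struct ToyRegistry {
  std::map<int, ToyPGRef> pgs;

  void add(int id) { pgs[id] = std::make_shared<ToyPG>(id); }
  void remove(int id) { pgs.erase(id); }

  // Returning a counted reference keeps the object alive for the caller
  // even if another path removes it from the registry afterwards.
  ToyPGRef lookup(int id) const {
    auto it = pgs.find(id);
    if (it == pgs.end()) return ToyPGRef();
    return it->second;
  }
};

// Look up an entry, remove it from the registry, then use the reference.
inline int lookup_then_remove() {
  ToyRegistry r;
  r.add(42);
  ToyPGRef ref = r.lookup(42);
  r.remove(42);               // the registry drops its reference
  return ref ? ref->id : -1;  // the caller's reference is still valid
}

// A lookup for an id that was never added returns an empty reference.
inline bool missing_is_null() {
  ToyRegistry r;
  return !r.lookup(1);
}
```

Had lookup returned a raw ToyPG* instead, the pointer would dangle after remove(42).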
Sage Weil
1270b49fb5 osd: kill pass-through _open_pg
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
486faa482a osd: remove old min pg epoch tracking
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
bc9436bcb5 osd/PG: remove RecoveryCtx on_applied and on_commit
These were awkward and unnecessary.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
7a9153c4b3 osd/PG: register delete completion directly on Transaction
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
ed72f30db7 osd: register split completion directly on Transaction
No need to use wonky RecoveryCtx C_Contexts

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
2c2378c49e osd/PG: drop unused context list accessors for RecoveryCtx
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
45e07480df osd/PG: register recovery finish context directly on Transaction
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
643714ff96 osd/PG: drop unused activate() context list arg
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
a5494b815c osd/PG: register flush completions directly on the Transaction
No need for an awkward list passed as an arg; all of these callbacks end up
on the Transaction anyway.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
6c52e5d1c7 osd: wait for pg epochs based on shard tracking
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
9895c9f1a9 osd: index pg (slots) by map epoch within each shard
This will replace the epoch tracking in OSDService shortly.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
e178a6d876 osd/PG: link back to pg slot
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
4c465fbfac osd: OSDShard::pg_slot -> OSDShardPGSlot
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
973836c70d osd: change pg_slots unordered_map to use unique_ptr<>
This avoids moving slots around in memory in the unordered_map... they can
be big!

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
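
The pattern in 973836c70d, where the map holds a pointer and the slot lives in one heap allocation, can be sketched as below. BigSlot, SlotMap and the helpers are hypothetical names chosen for illustration; the real slot type and its contents are not shown in this log.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <type_traits>
#include <unordered_map>

// Hypothetical stand-in for a large per-PG slot; not Ceph's real slot type.
struct BigSlot {
  std::mutex lock;          // makes the type non-copyable and non-movable
  char payload[4096] = {};  // stands in for queues and other bulky state
  int epoch = 0;
};

static_assert(!std::is_move_constructible<BigSlot>::value,
              "BigSlot cannot be moved, so the map stores pointers to it");

using SlotMap = std::unordered_map<int, std::unique_ptr<BigSlot>>;

// Return the slot for `id`, creating it on the heap on first use.
inline BigSlot* get_or_create(SlotMap& m, int id) {
  auto& p = m[id];
  if (!p) p = std::make_unique<BigSlot>();
  return p.get();
}

// Insert many more entries (forcing rehashes) and report whether the
// first slot is still at the same address with its contents intact.
inline bool address_is_stable(std::size_t extra) {
  SlotMap m;
  BigSlot* first = get_or_create(m, 0);
  first->epoch = 7;
  for (std::size_t i = 1; i <= extra; ++i)
    get_or_create(m, static_cast<int>(i));
  return m.at(0).get() == first && first->epoch == 7;
}
```

Only the small unique_ptr is handled by the container; the slot itself is constructed once and never copied or moved.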
Sage Weil
7aeebde3fd osd: remove some unused methods
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
8433941c94 osd: remove created_pgs tracking in RecoveryCtx
Not needed or used!

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
3b2935951f osd: fix PG::ch init
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
2922b3be33 osd: use _attach_pg and _detach_pg helpers; keep PG::osd_shard ptr
Consolidate num_pgs updates (and fix a counting bug along the way).

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
b7dc3d0fab osd: remove old split tracking machinery
This infrastructure is no longer used; simpler split tracking now lives in
the shards' pg_slots directly.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
295dfe0372 osd: restructure consume_map in terms of shards
- new split priming machinery
- new primed split cleanup on pg removal
- cover the pg creation path

The old split tracking is now totally unused; will be removed in the next
patch.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:53 -05:00
Sage Weil
2a9c8d80ce osd: pass sdata into dequeue_peering_evt (and dequeue_delete)
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
e297b1e6ad osd: pass data into OpQueueItem::run()
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
8ee13128fe osd: kill pg_map
Split doesn't work quite right; num_pgs count is probably off.  But, things
mostly work.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
eed90d4a8f osd: rename OSDShard waiting_for_pg_osdmap -> osdmap
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
4fc459829f osd: use _get_pgs() where possible; avoid touching pg_map directly
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
bffa62233e osd: add _get_pgs() and _get_pgids()
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
c4960f03a2 osd: remove get_mapped_pools command
No in-tree users.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00
Sage Weil
0bf6ac893a osd: move ShardedOpWQ::ShardData -> OSDShard
Soon we will destroy pg_map!

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:52 -05:00