Commit Graph

84931 Commits

Author SHA1 Message Date
Sage Weil
a01d2dfc87 osd: accessors for num_pgs
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:27:00 -05:00
Sage Weil
bfbf2044b2 osd: fix old wake_pg_waiters references
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:59 -05:00
Sage Weil
dc66a055ea osd: fix 'stale' message
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:59 -05:00
Sage Weil
08381749f6 osd: constify arg for handle_pg_create_info, maybe_wait_for_max_pg
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:59 -05:00
Sage Weil
db35bbd352 osd: constify arg to prime_splits
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:59 -05:00
Sage Weil
dba7521d92 osd: constify arg to identify_splits
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:59 -05:00
Sage Weil
c9bf02f481 osd: drop unused pushes_to_free variable on _process
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:59 -05:00
Sage Weil
3bd333810c osd: handle pushes_to_free in consume_map
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:59 -05:00
Sage Weil
b57d40991c osd: synchronously remove pgids when pool tombstone is missing or invalid
This is needed for upgraded clusters (e.g., v13.0.2 clusters with a
missing ec_profile or upgraded clusters with partially-deleted pools/pgs).

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:59 -05:00
Sage Weil
26f00dd67c qa/suites: mon warn on pool no app = false for api tests
Among other things, the list.cc tests set pg_num which waits for cluster
healthy.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:58 -05:00
Sage Weil
c2cce3bc88 qa/suites/rados/basic/tasks/rados_api_tests: debug ms = 1
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:58 -05:00
Sage Weil
0e6db5e320 osd: periodically request newer map from mon if waiting peering events
If we have peering events waiting on a newer map than we have, request it
from the mon.  Do this periodically in tick so that we normally wait to get
it from a peer first.

This avoids a deadlock situation where we are, say, waiting for a newer
map to create a pg but never get the map to do it (because the
cluster is idle).

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:58 -05:00
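
The tick-based map request described in 0e6db5e320 can be sketched roughly as follows. This is a minimal illustration, not the actual OSD code: `MapWaiter`, its fields, and the three-tick interval are all hypothetical names and values chosen for the example, and 0 is used as a "no request" sentinel.

```cpp
#include <cassert>
#include <cstdint>

using epoch_t = uint32_t;

// Hypothetical stand-in for the small piece of OSD state involved.
struct MapWaiter {
  epoch_t current_epoch = 0;       // newest map we hold
  epoch_t max_waiting_epoch = 0;   // newest epoch a queued peering event needs (0 = none)
  int ticks_between_requests = 3;  // assumed interval, not taken from the commit
  int ticks_since_request = 0;

  // Called once per tick. Returns the first epoch to request from the mon,
  // or 0 when no request should be sent on this tick.
  epoch_t tick() {
    if (max_waiting_epoch <= current_epoch) {
      ticks_since_request = 0;     // nothing is waiting on a newer map
      return 0;
    }
    if (++ticks_since_request < ticks_between_requests) {
      return 0;                    // give peers a chance to share the map first
    }
    ticks_since_request = 0;
    return current_epoch + 1;      // ask the mon for everything after our map
  }
};

inline void demo_map_waiter() {
  MapWaiter w;
  w.current_epoch = 10;
  w.max_waiting_epoch = 12;
  assert(w.tick() == 0);           // tick 1: still waiting for a peer
  assert(w.tick() == 0);           // tick 2: still waiting for a peer
  assert(w.tick() == 11);          // tick 3: fall back to the mon
}
```

Requesting only every few ticks keeps the mon as a fallback, so an idle cluster cannot leave the events waiting forever.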
Sage Weil
740b7809af osd: use rctx transaction for PG removal
In the normal case, queue up the removal work on the rctx transaction.

For the final cleanup, since we need to block, dispatch it ourselves, and
do not do so in OSD.cc.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:58 -05:00
Sage Weil
11a9fbecf9 osd: some debug output in identify_split_children
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:58 -05:00
Sage Weil
68dac914ed osd/PG: do final pg delete transaction on pg sequencer
Simpler, cleaner.  Also, this way we flush before returning.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:58 -05:00
Sage Weil
1eec5bb6a2 osd: better debug output in identify_splits
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:58 -05:00
Sage Weil
5155baf323 osd: handle NOUP flag vs boot race
If we digest maps that show a NOUP flag change *and* we also go active,
there is no need to restart the boot process--we can just go/stay active.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:58 -05:00
Sage Weil
29a885c915 qa/suites/rados/singleton/all/recovery_preemption: make test more reliable
A 30 second run did only 7000 ops, which means ~50 log entries per pg...
not enough to trigger backfill.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
Sage Weil
c3589df320 qa/suites/rados/singleton/all/mon-seesaw: whitelist PG_AVAILABILITY
The seesaw might delay pg creation by more than 60s.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
Sage Weil
494d02c349 osd/PG: ensure an actual transaction gets queued for recovery finish
Otherwise, this context gets leaked and lost.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
Sage Weil
ce699ff870 osd: close split vs query race in consume_map
Consider the race:

- shard 0 consumes epoch E
- shard 1 consumes epoch E
  - shard 1 pg P will split to C
- shard 0 processes query on C, returns DNE
- shard 0 primes slot C

Close race by priming split children before consuming map into each
OSDShard.  That way the query will either (1) arrive before E and before
slot C is primed and wait for E, or (2) find the slot present with
waiting_for_split true.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
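
The ordering fix in ce699ff870 can be illustrated with a small single-threaded model. It is a sketch under simplifying assumptions: `Shard`, `Slot`, `QueryResult`, and the helper functions are hypothetical stand-ins, pgids are plain strings, and real locking is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

using epoch_t = uint32_t;

struct Slot {
  bool has_pg = false;             // a real PG is attached to the slot
  bool waiting_for_split = false;  // placeholder primed for a split child
};

struct Shard {
  epoch_t epoch = 0;
  std::map<std::string, Slot> slots;
};

enum class QueryResult { WaitForMap, WaitForSplit, Exists, DoesNotExist };

// Decide how a query against `pgid` is handled on this shard.
QueryResult handle_query(const Shard& s, const std::string& pgid, epoch_t query_epoch) {
  if (query_epoch > s.epoch) return QueryResult::WaitForMap;
  auto it = s.slots.find(pgid);
  if (it == s.slots.end()) return QueryResult::DoesNotExist;
  if (it->second.waiting_for_split) return QueryResult::WaitForSplit;
  return it->second.has_pg ? QueryResult::Exists : QueryResult::DoesNotExist;
}

void prime_split_child(Shard& s, const std::string& child) {
  s.slots[child].waiting_for_split = true;
}

void consume_map(Shard& s, epoch_t e) { s.epoch = e; }

inline void demo_prime_before_consume() {
  Shard shard0;
  shard0.epoch = 4;
  // Case (1): the query for epoch 5 arrives before the slot is primed.
  assert(handle_query(shard0, "C", 5) == QueryResult::WaitForMap);
  // Fixed ordering: prime the child slot first, then consume the map.
  prime_split_child(shard0, "C");
  consume_map(shard0, 5);
  // Case (2): the query now finds the primed slot instead of returning DNE.
  assert(handle_query(shard0, "C", 5) == QueryResult::WaitForSplit);
}
```

With the opposite order (consume, then prime), the same query could land between the two steps and get `DoesNotExist`, which is the race the commit describes.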
Sage Weil
b4d96be92d osd: improve documentation for event queue ordering and requeueing rules
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
Sage Weil
ff0f798e1b osd/PG: flush sequencer/collection on shutdown
This should catch any in-flight work we have.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
Sage Weil
40a92a1f56 osd/PG: move shutdown into PG
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
Sage Weil
c454184d5e osd/osd_types: fix pg_t::pool() return type (uint64_t -> int64_t)
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:57 -05:00
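
Commit c454184d5e states only the type change. One plausible reason the signedness matters, assuming callers treat pool ids as signed values where a negative id means "no pool", is sketched below. `PgIdSketch` is a hypothetical type, not the real `pg_t`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in; the field layout of the real pg_t is not assumed here.
struct PgIdSketch {
  int64_t m_pool;

  // Unsigned accessor: a negative id silently becomes a huge positive value.
  uint64_t pool_as_unsigned() const { return static_cast<uint64_t>(m_pool); }

  // Signed accessor: the id keeps its sign for comparisons.
  int64_t pool() const { return m_pool; }
};

inline void demo_pool_sign() {
  PgIdSketch pg{-1};
  assert(pg.pool() < 0);                        // sentinel is still detectable
  assert(pg.pool_as_unsigned() == UINT64_MAX);  // sentinel is lost once unsigned
}
```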
Sage Weil
38319f8300 mon/OSDMonitor: disallow pg_num changes until after pool is created
The pg create handling OSD code does not handle races between a mon create
message and a split message.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
334bf7e3dc osd/PG: set send_notify on child
If we are a non-primary, we need to ensure the split children send
notifies.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
0713586300 osd: kill broken _process optimization; simplify null pg flow
- drop fast queue to waiting list optimization: it breaks ordering and is
a useless optimization
- restructure so that we don't drop the lock and revalidate the world if
pg == nullptr

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
f9667a9ef3 osd: fix fast pg create vs limits
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
b4af83d735 osd: (pre)publish map before distributing to shards (and pgs)
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
cf50361066 osd: update numpg_* counters when removing a pg
Usually on a pg create we see an OSDMap update; on PG removal completion
we may not.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
9d6425ab25 osd: decrement deleting pg count in _delete_some
The exit() method for ToDelete state doesn't run on PG destruction.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
3b970e32b0 osd: clear shard osdmaps during shutdown
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
bfeae027aa osd: make safe osdmap accessor for OSDShard
The advance_pg needs to get the shard osdmap without racing against
consume_map().

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:56 -05:00
Sage Weil
540b1bc9e6 osd: clean up mutex naming for OSDShard
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
183e7d7bc2 common/tracked_int_ptr: fix operator= return value
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
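
Commit 183e7d7bc2 does not say what `operator=` returned before the fix. For reference, the conventional C++ form returns a reference to the assigned-to object, as in this minimal sketch. `Handle` is a hypothetical class, not the real `TrackedIntPtr`.

```cpp
#include <cassert>

// Hypothetical minimal pointer wrapper used only to show the convention.
class Handle {
  int* ptr = nullptr;
public:
  Handle() = default;
  explicit Handle(int* p) : ptr(p) {}
  Handle(const Handle&) = default;

  // Conventional copy assignment: return *this by reference so that
  // chained assignment and (a = b).get() both act on the left-hand object.
  Handle& operator=(const Handle& rhs) {
    ptr = rhs.ptr;
    return *this;
  }

  int* get() const { return ptr; }
};

inline void demo_chained_assignment() {
  int value = 7;
  Handle a, b, c(&value);
  a = b = c;  // relies on operator= returning Handle&
  assert(a.get() == &value && b.get() == &value);
}
```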
Sage Weil
3a0b197cd1 osd: fix pg removal vs _process race
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
7fb35ff961 osd: lookup_*pg must return PGRef
Otherwise it is fundamentally unsafe, as the PG might get destroyed out
from under us without a reference.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
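
The safety argument in 7fb35ff961 can be shown with a small sketch. It assumes `std::shared_ptr` as a stand-in for the reference type (the real `PGRef` is not reproduced here), and `PGRegistry` and its methods are hypothetical. Locking is left out for brevity.

```cpp
#include <cassert>
#include <map>
#include <memory>

struct PG {
  int id;
  explicit PG(int i) : id(i) {}
};

// Stand-in for a counted reference; the real PGRef type is not assumed.
using PGRef = std::shared_ptr<PG>;

class PGRegistry {
  std::map<int, PGRef> pgs;
public:
  void add(int id) { pgs[id] = std::make_shared<PG>(id); }
  void remove(int id) { pgs.erase(id); }

  // Returning a counted reference (rather than a raw PG*) means the caller
  // holds its own reference, so a concurrent remove() cannot free the object.
  PGRef lookup(int id) const {
    auto it = pgs.find(id);
    return it == pgs.end() ? PGRef() : it->second;
  }
};

inline void demo_lookup_ref() {
  PGRegistry registry;
  registry.add(1);
  PGRef pg = registry.lookup(1);
  registry.remove(1);            // the registry drops its reference
  assert(pg && pg->id == 1);     // the caller's reference keeps the PG alive
  assert(!registry.lookup(1));   // later lookups see that it is gone
}
```

Had `lookup` returned a raw pointer, the `remove` call would have destroyed the object while the caller still held the pointer.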
Sage Weil
1270b49fb5 osd: kill pass-through _open_pg
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
486faa482a osd: remove old min pg epoch tracking
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
bc9436bcb5 osd/PG: remove RecoveryCtx on_applied and on_commit
These were awkward and unnecessary.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
7a9153c4b3 osd/PG: register delete completion directly on Transaction
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:55 -05:00
Sage Weil
ed72f30db7 osd: register split completion directly on Transaction
No need to use wonky RecoveryCtx C_Contexts

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
2c2378c49e osd/PG: drop unused context list accessors for RecoveryCtx
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
45e07480df osd/PG: register recovery finish context directly on Transaction
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
643714ff96 osd/PG: drop unused activate() context list arg
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
a5494b815c osd/PG: register flush completions directly on the Transaction
No need for an awkward list passed as an arg; all of these callbacks end up
on the Transaction anyway.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
6c52e5d1c7 osd: wait for pg epochs based on shard tracking
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
9895c9f1a9 osd: index pg (slots) by map epoch within each shard
This will replace the epoch tracking in OSDService shortly.

Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00
Sage Weil
e178a6d876 osd/PG: link back to pg slot
Signed-off-by: Sage Weil <sage@redhat.com>
2018-04-04 08:26:54 -05:00