mirror of
https://github.com/ceph/ceph
synced 2025-01-08 20:21:33 +00:00
00e903160b
Three Reservation priorities from RECOVERY, BACKFILL_HIGH, BACKFILL_LOW fixes: #4273 Signed-off-by: David Zafman <david.zafman@inktank.com>
====================
Backfill Reservation
====================


When a new OSD joins a cluster, all PGs containing it must eventually backfill
to it. If all of these backfills happened simultaneously, they would put
excessive load on the OSD. osd_num_concurrent_backfills limits the number of
outgoing or incoming backfills on a single node.


Each OSDService now has two AsyncReserver instances: one for backfills going
from the OSD (local_reserver) and one for backfills going to the OSD
(remote_reserver). An AsyncReserver (common/AsyncReserver.h) manages a queue
of waiting items, ordered by priority, and a set of current reservation
holders. When a slot frees up, the AsyncReserver takes the next item from the
highest-priority non-empty queue and queues the Context* associated with it in
the finisher provided to the constructor.
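
The queueing behaviour described above can be modelled in a few lines of
Python. This is only a sketch under stated assumptions, not the code in
common/AsyncReserver.h: the class name, the method names, and the use of a
plain callable in place of a finisher and a Context* are all chosen here for
illustration, and larger numbers are assumed to mean higher priority.

```python
from collections import deque


class AsyncReserverSketch:
    """Toy model: a fixed number of slots plus one FIFO queue per priority."""

    def __init__(self, max_allowed, finisher):
        self.max_allowed = max_allowed  # number of concurrent holders
        self.finisher = finisher        # callable that runs a granted callback
        self.queues = {}                # priority -> deque of (item, callback)
        self.in_progress = set()        # items currently holding a slot

    def request_reservation(self, item, callback, priority):
        # Queue the request, then grant it at once if a slot is free.
        self.queues.setdefault(priority, deque()).append((item, callback))
        self._do_queues()

    def cancel_reservation(self, item):
        # Release a held slot, or drop the item from wherever it is waiting.
        if item in self.in_progress:
            self.in_progress.discard(item)
        else:
            for queue in self.queues.values():
                for entry in list(queue):
                    if entry[0] == item:
                        queue.remove(entry)
        self._do_queues()

    def _do_queues(self):
        # Hand free slots to the highest-priority waiters, oldest first.
        while len(self.in_progress) < self.max_allowed:
            waiting = [p for p, queue in self.queues.items() if queue]
            if not waiting:
                return
            item, callback = self.queues[max(waiting)].popleft()
            self.in_progress.add(item)
            self.finisher(callback)


granted = []
reserver = AsyncReserverSketch(max_allowed=1, finisher=lambda cb: cb())
reserver.request_reservation("pg 1.0", lambda: granted.append("pg 1.0"), priority=0)
reserver.request_reservation("pg 1.1", lambda: granted.append("pg 1.1"), priority=0)
reserver.request_reservation("pg 1.2", lambda: granted.append("pg 1.2"), priority=10)
reserver.cancel_reservation("pg 1.0")  # frees the only slot
```

With a single slot, "pg 1.0" is granted immediately. When it is cancelled, the
freed slot goes to "pg 1.2" because it waits at a higher priority, even though
"pg 1.1" was queued earlier.
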


For a primary to initiate a backfill, it must first obtain a reservation from
its own local_reserver. Then it must obtain a reservation from the backfill
target's remote_reserver via an MBackfillReserve message. This process is
managed by substates of Active and ReplicaActive (see the substates of Active
in PG.h). The reservations are dropped either on the Backfilled event (which
is sent on the primary before calling recovery_complete and on the replica on
receipt of the BackfillComplete progress message), or upon leaving Active or
ReplicaActive.
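
The sequence above can be pictured as a small state machine on the primary. The
sketch below is illustrative only: the state names and method names are
hypothetical and do not correspond to the actual substates in PG.h, and
messaging is reduced to direct method calls.

```python
class BackfillReservationSketch:
    """Toy model of the primary's reservation steps for one backfill."""

    def __init__(self):
        self.state = "wait_local_reserved"
        self.held = []  # reservations held, in the order they were obtained

    def on_local_reserved(self):
        # Step 1: our own local_reserver granted a slot.
        assert self.state == "wait_local_reserved"
        self.held.append("local")
        # Only now would the request go to the backfill target.
        self.state = "wait_remote_reserved"

    def on_remote_reserved(self):
        # Step 2: the target's remote_reserver granted a slot.
        assert self.state == "wait_remote_reserved"
        self.held.append("remote")
        self.state = "backfilling"

    def on_backfilled(self):
        # Normal completion: drop every reservation.
        assert self.state == "backfilling"
        self.held.clear()
        self.state = "backfilled"

    def on_leave_active(self):
        # Leaving Active at any point also drops whatever is held.
        self.held.clear()
        self.state = "inactive"


pg = BackfillReservationSketch()
pg.on_local_reserved()
pg.on_remote_reserved()
held_during_backfill = list(pg.held)
pg.on_backfilled()

interrupted = BackfillReservationSketch()
interrupted.on_local_reserved()
interrupted.on_leave_active()
```

In this model the remote reservation can never be requested before the local
one is held, and both exit paths leave the primary holding nothing.
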


It's important that we always grab the local reservation before the remote
reservation in order to prevent a circular dependency.
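
One way to see why a fixed order helps is to look at who waits for whom. The
sketch below is a generic lock-ordering illustration rather than anything
taken from the Ceph source: it assumes two hypothetical PGs whose primary is
on OSD A and whose backfill target is OSD B, with one slot in each reserver,
and checks the resulting wait-for graph for a cycle.

```python
def has_cycle(holds, waits):
    """Return True if the wait-for graph contains a cycle.

    holds: pg -> set of reservations it currently holds
    waits: pg -> the reservation it is waiting for (or None)
    """
    # p waits for q when q holds the reservation that p wants.
    edges = {
        p: [q for q in holds if q != p and waits[p] in holds[q]]
        for p in holds
    }
    visiting, done = set(), set()

    def visit(p):
        if p in done:
            return False
        if p in visiting:
            return True  # came back to a node on the current path
        visiting.add(p)
        found = any(visit(q) for q in edges[p])
        visiting.discard(p)
        done.add(p)
        return found

    return any(visit(p) for p in holds)


# Mixed order: pg2 grabbed the remote slot first and now wants the local one.
mixed_order = has_cycle(
    holds={"pg1": {"A.local"}, "pg2": {"B.remote"}},
    waits={"pg1": "B.remote", "pg2": "A.local"},
)

# Fixed order: pg2 must get A.local first, so it holds nothing while waiting.
fixed_order = has_cycle(
    holds={"pg1": {"A.local"}, "pg2": set()},
    waits={"pg1": "B.remote", "pg2": "A.local"},
)
```

In the mixed-order case each PG holds what the other one needs, so neither can
make progress. With local-before-remote, the second PG waits without holding
anything and the first can finish.
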


We want to minimize the risk of data loss by prioritizing the order in which
PGs are recovered. We use three AsyncReserver priorities to hand out
reservations. The highest priority is log-based recovery (RECOVERY), since this
must always complete before backfill can start. The next priority is backfill
of degraded PGs (BACKFILL_HIGH). The lowest priority is backfill of
non-degraded PGs (BACKFILL_LOW).
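
The ordering of the three priorities can be sketched as follows. The numeric
values are placeholders chosen so that a larger number means a higher
priority; they are not the values used in the Ceph source. The helper
function and its parameters are likewise hypothetical.

```python
from enum import IntEnum


class ReservationPriority(IntEnum):
    # Placeholder values: only their relative order matters in this sketch.
    BACKFILL_LOW = 0   # backfill of non-degraded PGs
    BACKFILL_HIGH = 1  # backfill of degraded PGs
    RECOVERY = 2       # log-based recovery


def reservation_priority(needs_log_recovery, degraded):
    """Pick the priority a PG would use when asking for a reservation."""
    if needs_log_recovery:
        return ReservationPriority.RECOVERY
    if degraded:
        return ReservationPriority.BACKFILL_HIGH
    return ReservationPriority.BACKFILL_LOW


# (name, needs_log_recovery, degraded) for three hypothetical PGs
pending = [
    ("pg 2.a", False, False),
    ("pg 2.b", False, True),
    ("pg 2.c", True, True),
]
ordered = sorted(
    pending,
    key=lambda pg: reservation_priority(pg[1], pg[2]),
    reverse=True,
)
names = [pg[0] for pg in ordered]
```

Sorting by this priority puts the PG that needs log-based recovery first, the
degraded PG second, and the non-degraded PG last.
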