mirror of https://github.com/ceph/ceph
75 lines
2.8 KiB
ReStructuredText
75 lines
2.8 KiB
ReStructuredText
====================
|
|
Recovery Reservation
|
|
====================
|
|
|
|
Recovery reservation extends and subsumes backfill reservation. The
|
|
reservation system from backfill recovery is used for local and remote
|
|
reservations.
|
|
|
|
When a PG goes active, first it determines what type of recovery is
|
|
necessary, if any. It may need log-based recovery, backfill recovery,
|
|
both, or neither.
|
|
|
|
In log-based recovery, the primary first acquires a local reservation
|
|
from the OSDService's local_reserver. Then a MRemoteReservationRequest
|
|
message is sent to each replica in order of OSD number. These requests
|
|
will always be granted (i.e., cannot be rejected), but they may take
|
|
some time to be granted if the remotes have already granted all their
|
|
remote reservation slots.
|
|
|
|
After all reservations are acquired, log-based recovery proceeds as it
|
|
would without the reservation system.
|
|
|
|
After log-based recovery completes, the primary releases all remote
|
|
reservations. The local reservation remains held. The primary then
|
|
determines whether backfill is necessary. If it is not necessary, the
|
|
primary releases its local reservation and waits in the Recovered state
|
|
for all OSDs to indicate that they are clean.
|
|
|
|
If backfill recovery occurs after log-based recovery, the local
|
|
reservation does not need to be reacquired since it is still held from
|
|
before. If it occurs immediately after activation (log-based recovery
|
|
not possible/necessary), the local reservation is acquired according to
|
|
the typical process.
|
|
|
|
Once the primary has its local reservation, it requests a remote
|
|
reservation from the backfill target. This reservation CAN be rejected,
|
|
for instance if the OSD is too full (osd_backfill_full_ratio config
|
|
option). If the reservation is rejected, the primary drops its local
|
|
reservation, waits (osd_backfill_retry_interval), and then retries. It
|
|
will retry indefinitely.
|
|
|
|
Once the primary has the local and remote reservations, backfill
|
|
proceeds as usual. After backfill completes the remote reservation is
|
|
dropped.
|
|
|
|
Finally, after backfill (or log-based recovery if backfill was not
|
|
necessary), the primary drops the local reservation and enters the
|
|
Recovered state. Once all the PGs have reported they are clean, the
|
|
primary enters the Clean state and marks itself active+clean.
|
|
|
|
|
|
--------------
|
|
Things to Note
|
|
--------------
|
|
|
|
We always grab the local reservation first, to prevent a circular
|
|
dependency. We grab remote reservations in order of OSD number for the
|
|
same reason.
|
|
|
|
The recovery reservation state chart controls the PG state as reported
|
|
to the monitor. The state chart can set:
|
|
|
|
- recovery_wait: waiting for local/remote reservations
|
|
- recovering: recoverying
|
|
- wait_backfill: waiting for remote backfill reservations
|
|
- backfilling: backfilling
|
|
- backfill_toofull: backfill reservation rejected, OSD too full
|
|
|
|
|
|
--------
|
|
See Also
|
|
--------
|
|
|
|
The Active substate of the automatically generated OSD state diagram.
|