mirror of
https://github.com/ceph/ceph
synced 2025-02-22 02:27:29 +00:00
Merge pull request #40282 from rzarzynski/wip-crimson-doc-waitstates
doc/crimson: document wait states Reviewed-by: Kefu Chai <kchai@redhat.com>
commit 24d34d7b83
doc/dev/crimson/pipeline.rst (new file, 95 lines)

==============================
The ``ClientRequest`` pipeline
==============================

In crimson, exactly as in the classical OSD, a client request has data and
ordering dependencies which must be satisfied before processing (or, more
precisely, a particular phase of it) can begin. As one of the goals behind
crimson is to preserve compatibility with the existing OSD incarnation, the
same semantics must be assured. An obvious example of such a data dependency is
the fact that an OSD needs to have a version of OSDMap that matches the one
used by the client (``Message::get_min_epoch()``).

If a dependency is not satisfied, the processing stops. It is crucial to note
that the same must happen to all other requests that are sequenced after it
(due to their ordering requirements).

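The blocking rule above can be illustrated with a minimal, single-threaded
sketch. It is only a model under stated assumptions: ``Request`` and
``OrderedQueue`` are hypothetical names rather than crimson types, the
``min_epoch`` field merely plays the role of ``Message::get_min_epoch()``, and
the dependency is assumed to be met once the OSD's epoch is at least
``min_epoch``.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a client request; only the fields needed here.
struct Request {
  std::string name;
  std::uint32_t min_epoch;  // plays the role of Message::get_min_epoch()
};

// Hypothetical in-order queue; it is not part of crimson.
class OrderedQueue {
public:
  // Requests are remembered in the order in which they were submitted.
  void submit(Request r) { pending_.push_back(std::move(r)); }

  // The OSD learned about a newer map. Start requests strictly in order and
  // stop at the first one whose dependency is still unmet, even if requests
  // queued behind it could already run.
  std::vector<std::string> on_new_epoch(std::uint32_t osd_epoch) {
    assert(osd_epoch >= osd_epoch_);  // model assumption: epochs only grow
    osd_epoch_ = osd_epoch;
    std::vector<std::string> started;
    while (!pending_.empty() && pending_.front().min_epoch <= osd_epoch_) {
      started.push_back(pending_.front().name);
      pending_.pop_front();
    }
    return started;
  }

  // Number of requests that are still waiting.
  std::size_t blocked() const { return pending_.size(); }

private:
  std::uint32_t osd_epoch_ = 0;
  std::deque<Request> pending_;
};
```

With requests ``a`` (epoch 10), ``b`` (epoch 12) and ``c`` (epoch 10) submitted
in that order, a call to ``on_new_epoch(11)`` starts only ``a``: ``c`` stays
blocked behind ``b`` although its own dependency is already satisfied.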

There are a few cases in which a client request can block.


``ClientRequest::ConnectionPipeline::await_map``
  wait until a particular OSDMap version is available at the OSD level
``ClientRequest::ConnectionPipeline::get_pg``
  wait until a particular PG becomes available on the OSD
``ClientRequest::PGPipeline::await_map``
  wait until the PG has been advanced to a particular epoch
``ClientRequest::PGPipeline::wait_for_active``
  wait until the PG becomes ``is_active()``
``ClientRequest::PGPipeline::recover_missing``
  wait until an object has been recovered
``ClientRequest::PGPipeline::get_obc``
  wait until an object context has been locked
``ClientRequest::PGPipeline::process``
  wait while any other ``MOSDOp`` message is being handled against this PG

At any moment, a ``ClientRequest`` being served should be in one and only one
of these phases. Similarly, an object denoting a particular phase can host no
more than a single ``ClientRequest`` at the same time. At the low level this is
achieved with a combination of a barrier and an exclusive lock. Together they
implement the semantics of a semaphore with a single slot for these exclusive
phases.

As the execution advances, a request enters the next phase and leaves the
current one, freeing it for another ``ClientRequest`` instance. All these
phases form a pipeline which ensures that the order is preserved.

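The single-slot behaviour and the enter-next-then-leave-current hand-over can be
modelled with a short, synchronous sketch. ``Phase`` and ``Pipeline`` below are
hypothetical names introduced for illustration; the real crimson code is
asynchronous, so this shows only the ordering argument and not the actual
implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical model of one exclusive phase: a single slot plus FIFO waiters.
class Phase {
public:
  // Take the slot if it is free; otherwise queue up in arrival order.
  bool enter(int req) {
    if (!occupied_) {
      occupied_ = true;
      holder_ = req;
      return true;
    }
    waiters_.push_back(req);
    return false;
  }

  // Release the slot and hand it to the oldest waiter, if there is one.
  void leave(int req) {
    assert(occupied_ && holder_ == req);
    occupied_ = false;
    if (!waiters_.empty()) {
      holder_ = waiters_.front();
      waiters_.pop_front();
      occupied_ = true;
    }
  }

  bool occupied() const { return occupied_; }
  int holder() const { return holder_; }

private:
  bool occupied_ = false;
  int holder_ = -1;
  std::deque<int> waiters_;
};

// Hypothetical chain of phases through which requests move one step at a time.
class Pipeline {
public:
  explicit Pipeline(std::size_t phases) : phases_(phases) {
    assert(phases > 0);
  }

  // A new request arrives at the first phase (and may have to wait there).
  void submit(int req) { phases_.front().enter(req); }

  // The holder of phase `i` tries to move on. It enters the next phase first
  // and only then leaves the current one; if the next slot is busy it keeps
  // its current slot, so nothing queued behind it can overtake.
  bool advance(std::size_t i) {
    if (!phases_[i].occupied()) return false;
    const int req = phases_[i].holder();
    if (i + 1 < phases_.size()) {
      if (phases_[i + 1].occupied()) return false;
      phases_[i + 1].enter(req);
    } else {
      completed_.push_back(req);  // the request left the last phase
    }
    phases_[i].leave(req);
    return true;
  }

  // Requests in the order in which they left the last phase.
  const std::vector<int>& completed() const { return completed_; }

private:
  std::vector<Phase> phases_;
  std::vector<int> completed_;
};
```

In a two-phase ``Pipeline`` with requests 1, 2 and 3 submitted in that order,
``advance(0)`` fails for request 2 while request 1 still occupies the second
phase, so the requests can only complete in their submission order.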

These pipeline phases are divided into two ordering domains:
``ConnectionPipeline`` and ``PGPipeline``. The former ensures order across a
client connection while the latter does so across a PG. That is, requests
originating from the same connection are executed in the same order as they
were sent by the client. The same applies to the PG domain: when requests from
multiple connections reach a PG, they are executed in the same order as they
entered the first blocking phase of the ``PGPipeline``.
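
The interplay of the two ordering domains can be sketched as two layers of FIFO
queues. This is a toy model: ``Op``, ``TwoDomainModel`` and ``hand_over`` are
hypothetical names, and the moment at which an op moves from its connection
into a PG is driven by hand here instead of by a scheduler.

```cpp
#include <deque>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of a client op: its name, connection and target PG.
struct Op {
  std::string name;
  int conn;
  int pg;
};

// Hypothetical two-layer model; it is not part of crimson.
class TwoDomainModel {
public:
  // Connection domain: ops are queued per connection in the order sent.
  void send(const Op& op) { per_conn_[op.conn].push_back(op); }

  // The oldest op of `conn` leaves the connection domain and enters the PG
  // domain. Returns false if the connection has nothing queued.
  bool hand_over(int conn) {
    auto it = per_conn_.find(conn);
    if (it == per_conn_.end() || it->second.empty()) return false;
    const Op op = it->second.front();
    it->second.pop_front();
    per_pg_[op.pg].push_back(op.name);
    return true;
  }

  // PG domain: ops run in the order in which they entered the PG.
  std::vector<std::string> pg_order(int pg) const {
    auto it = per_pg_.find(pg);
    if (it == per_pg_.end()) return {};
    return it->second;
  }

private:
  std::map<int, std::deque<Op>> per_conn_;
  std::map<int, std::vector<std::string>> per_pg_;
};
```

If connection 0 sends ``a0`` and ``a1``, connection 1 sends ``b0``, and the ops
are handed over in the sequence 0, 1, 0, the PG executes ``a0``, ``b0``,
``a1``: ops of different connections may interleave, but ``a0`` always precedes
``a1``.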

Comparison with the classical OSD
----------------------------------
As the audience of this document is Ceph developers, it seems reasonable to
match the phases of crimson's ``ClientRequest`` pipeline with the blocking
stages in the classical OSD. The names in the right column are the names of
the containers (lists and maps) used to implement these stages. They are also
already documented in the ``PG.h`` header.

+----------------------------------------+--------------------------------------+
| crimson                                | ceph-osd waiting list                |
+========================================+======================================+
|``ConnectionPipeline::await_map``       | ``OSDShardPGSlot::waiting`` and      |
|``ConnectionPipeline::get_pg``          | ``OSDShardPGSlot::waiting_peering``  |
+----------------------------------------+--------------------------------------+
|``PGPipeline::await_map``               | ``PG::waiting_for_map``              |
+----------------------------------------+--------------------------------------+
|``PGPipeline::wait_for_active``         | ``PG::waiting_for_peered``           |
|                                        +--------------------------------------+
|                                        | ``PG::waiting_for_flush``            |
|                                        +--------------------------------------+
|                                        | ``PG::waiting_for_active``           |
+----------------------------------------+--------------------------------------+
|To be done (``PG_STATE_LAGGY``)         | ``PG::waiting_for_readable``         |
+----------------------------------------+--------------------------------------+
|To be done                              | ``PG::waiting_for_scrub``            |
+----------------------------------------+--------------------------------------+
|``PGPipeline::recover_missing``         | ``PG::waiting_for_unreadable_object``|
|                                        +--------------------------------------+
|                                        | ``PG::waiting_for_degraded_object``  |
+----------------------------------------+--------------------------------------+
|To be done (proxying)                   | ``PG::waiting_for_blocked_object``   |
+----------------------------------------+--------------------------------------+
|``PGPipeline::get_obc``                 | *obc rwlocks*                        |
+----------------------------------------+--------------------------------------+
|``PGPipeline::process``                 | ``PG::lock`` (roughly)               |
+----------------------------------------+--------------------------------------+


As a final word, it is worth emphasizing that the ordering implementations in
both the classical OSD and crimson are stricter than the theoretical minimum
required by the RADOS protocol. For instance, we could parallelize read
operations targeting the same object at the price of extra complexity, but we
don't: simplicity has won.