doc/rados: rewrite mclock-config-ref

This PR rewrites the first 48 percent of the
material provided in PR#40531, improving its
elegance and readability.

Signed-off-by: Zac Dover <zac.dover@gmail.com>
Zac Dover 2021-04-03 06:57:51 +10:00
parent c94f0a50a5
commit e262e85606


@@ -4,19 +4,21 @@
 .. index:: mclock; configuration

-This dmclock related configuration parameters can be applied using mclock
-profiles. The purpose of the mclock profiles is to mask the low level details
-from a user trying to configure mclock for a specific purpose. At a high level,
-for each OSD in the enviroment the steps involve,
+Mclock profiles mask the low-level details from users, making it easier
+for them to configure mclock.

-- providing an input related to the total capacity of OSD(s) and
-- choosing the desired mclock profile to enable
+To use mclock, you must provide the following input parameters:

-Based on the profile specified, the OSD determines and applies the lower level
-mclock and Ceph parameters. The parameters applied by the mclock profile
-enables tuning of the QoS between client, background recovery/backfill ops and
-other internal entities (aka best effort clients) within Ceph that generate IOPS
-(for e.g. scrubs, snap trim, PG deletions etc.).
+* total capacity of each OSD
+* an mclock profile to enable
+
+Using the settings in the specified profile, the OSD determines and applies the
+lower-level mclock and Ceph parameters. The parameters applied by the mclock
+profile make it possible to tune the QoS between client I/O, recovery/backfill
+operations, and other background operations (for example, scrub, snap trim, and
+PG deletion). These background activities are considered best-effort internal
+clients of Ceph.
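+
+For example, these two inputs can be provided with the ``ceph config set``
+command. The option names below are assumed from the Pacific release;
+``osd.0`` and the IOPS figure are placeholders for your own OSD id and
+measured capacity:
+
+.. code-block:: bash
+
+   # Provide the measured capacity (IOPS) of an SSD-backed OSD.
+   ceph config set osd.0 osd_mclock_max_capacity_iops_ssd 21500
+
+   # Enable the desired mclock profile on the same OSD.
+   ceph config set osd.0 osd_mclock_profile high_client_ops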
 .. index:: mclock; profile definition
@@ -29,16 +31,15 @@ Ceph cluster enables the throttling of the operations(IOPS) belonging to
 different client classes (background recovery, scrub, snaptrim, client op,
 osd subop)”*.

-The purpose of the mclock profile is to help with the following,
+The capacity limits and the mclock profile selected by the user are used to
+determine the low-level mclock resource control parameters.

-- Based on the capacity limits provided and the mclock profile selected,
-  determine the low level mclock resource control parameters,
-- Transparently apply the lower level mclock resource control parameters and
-  some Ceph configuration parameters depending on the profile.
+Depending on the profile, lower-level mclock resource-control parameters and
+some Ceph configuration parameters are transparently applied.

-The lower level mclock resource control parameters are the *reservation, weight
-and limit* that provide control of the resource shares as already described in
-the `OSD Config Reference`_ section.
+The low-level mclock resource control parameters are the *reservation*,
+*limit*, and *weight* that provide control of the resource shares, as
+described in the `OSD Config Reference`_.
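+
+For example, once a profile is enabled, the resulting low-level values can be
+read back from a running OSD with ``ceph config show`` (``osd.0`` is a
+placeholder id; the option names are among those listed under `mClock
+Built-in Profiles`_ below):
+
+.. code-block:: bash
+
+   # Effective reservation, weight, and limit for external client ops.
+   ceph config show osd.0 osd_mclock_scheduler_client_res
+   ceph config show osd.0 osd_mclock_scheduler_client_wgt
+   ceph config show osd.0 osd_mclock_scheduler_client_lim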
 .. index:: mclock; profile types
@@ -48,30 +49,30 @@ mClock Profile Types

 mclock profiles can be broadly classified into two types,

-- **Built-in**: A user can choose between the following built-in profiles types,
+- **Built-in**: Users can choose between the following built-in profile types:

   - **high_client_ops** (*default*):
-    This profile allocates more reservation and limit to external clients ops
-    when compared to background recoveries and other internal clients within
+    This profile allocates more reservation and limit to external-client ops
+    as compared to background recoveries and other internal clients within
     Ceph. This profile is enabled by default.
   - **high_recovery_ops**:
-    This profile allocates more reservation to background recoveries when
+    This profile allocates more reservation to background recoveries as
     compared to external clients and other internal clients within Ceph. For
-    e.g. an admin may enable this profile temporarily to speed-up background
+    example, an admin may enable this profile temporarily to speed up background
     recoveries during non-peak hours.
   - **balanced**:
-    This profile allocates equal reservations to client ops and background
+    This profile allocates equal reservation to client ops and background
     recovery ops.

-- **Custom**: A user may enable this profile to have complete control over all the
-  mclock and Ceph configuration parameters. Using this profile is not
-  recommended unless one has a deep understanding of mclock and the related
-  Ceph configuration options.
+- **Custom**: This profile gives users complete control over all mclock and
+  Ceph configuration parameters. Using this profile is not recommended without
+  a deep understanding of mclock and related Ceph configuration options.

-.. note:: Across the built-in profiles, for internal clients of mclock (for e.g.
-          scrub, snap trim etc.) are given slightly lower reservations but
-          higher weight and no limit. This is to ensure that these operations
-          are able to complete quickly if there are no other competing services.
+.. note:: Across the built-in profiles, internal clients of mclock (for example
+          "scrub", "snap trim", and "pg deletion") are given slightly lower
+          reservations, but higher weight and no limit. This ensures that
+          these operations are able to complete quickly if there are no other
+          competing services.
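+
+For example, a profile can be selected cluster-wide or for a single OSD
+(``osd.0`` is a placeholder id, and ``osd_mclock_profile`` is assumed to be
+the governing option, as in recent Ceph releases):
+
+.. code-block:: bash
+
+   # Enable the balanced built-in profile on all OSDs.
+   ceph config set osd osd_mclock_profile balanced
+
+   # Enable the custom profile on one OSD only.
+   ceph config set osd.0 osd_mclock_profile custom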
 .. index:: mclock; built-in profiles
@@ -80,10 +81,10 @@ mClock Built-in Profiles
 ========================

 When a built-in profile is enabled, the mClock scheduler calculates the low
-level mclock parameters [*reservation, weight, limit*] based on the profile
+level mclock parameters [*reservation*, *weight*, *limit*] based on the profile
 enabled for each client type. The mclock parameters are calculated based on
 the max OSD capacity provided beforehand. As a result, the following mclock
-config parameters cannot be modified when using any of the built-in profiles,
+config parameters cannot be modified when using any of the built-in profiles:

 - ``osd_mclock_scheduler_client_res``
 - ``osd_mclock_scheduler_client_wgt``
@@ -95,7 +96,7 @@ config parameters cannot be modified when using any of the built-in profiles,
 - ``osd_mclock_scheduler_background_best_effort_wgt``
 - ``osd_mclock_scheduler_background_best_effort_lim``

-Additionally, the following Ceph options will not be modifiable by the user,
+The following Ceph options will not be modifiable by the user:

 - ``osd_max_backfills``
 - ``osd_recovery_max_active``
@@ -103,15 +104,15 @@ Additionally, the following Ceph options will not be modifiable by the user,

 This is because the above options are internally modified by the mclock
 scheduler in order to maximize the impact of the set profile.
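+
+Because mclock manages these options internally, their effective values should
+be read back from the daemon rather than assumed. For example (``osd.0`` is a
+placeholder id):
+
+.. code-block:: bash
+
+   # Show the value of osd_max_backfills that the mclock scheduler applied.
+   ceph config show osd.0 osd_max_backfills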
-By default the *high_client_ops* profile will be enabled to provide larger chunk
-of the bandwidth allocation to client ops. Background recovery ops are given
-lower allocation and so take a longer time to complete. But there might be
-instances that necessitate giving higher allocations to either client ops or
-recovery ops. In order to satisfy such an eventuality, alternate built-in
-profiles mentioned above may be enabled.
+By default, the *high_client_ops* profile is enabled to ensure that a larger
+chunk of the bandwidth allocation goes to client ops. Background recovery ops
+are given a lower allocation (and therefore take a longer time to complete).
+But there might be instances that necessitate giving higher allocations to
+either client ops or recovery ops. To deal with such a situation, you can
+enable one of the alternate built-in profiles mentioned above, as shown below.
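+
+For example, to temporarily favor background recoveries during non-peak hours
+and then return to the default profile (again assuming the
+``osd_mclock_profile`` option; the commands apply to all OSDs):
+
+.. code-block:: bash
+
+   # Favor background recovery ops.
+   ceph config set osd osd_mclock_profile high_recovery_ops
+
+   # When recovery is done, revert to the default profile.
+   ceph config set osd osd_mclock_profile high_client_ops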
-Additionally, if a built-in profile is active, the following Ceph config sleep
-options will be disabled,
+If a built-in profile is active, the following Ceph config sleep options will
+be disabled:

 - ``osd_recovery_sleep``
 - ``osd_recovery_sleep_hdd``
@@ -127,10 +128,10 @@ options will be disabled,
 - ``osd_snap_trim_sleep_ssd``
 - ``osd_snap_trim_sleep_hybrid``

-The above sleep options are disabled to ensure that mclock scheduler is able
-determine when to pick the next op from its operation queue and transfer
-it to the operation sequencer. This results in the desired QoS to be provided
-across all its clients.
+The above sleep options are disabled to ensure that the mclock scheduler is
+able to determine when to pick the next op from its operation queue and
+transfer it to the operation sequencer. This results in the desired QoS being
+provided across all its clients.
 .. index:: mclock; enable built-in profile
@@ -138,34 +139,34 @@ across all its clients.
 Steps to Enable mClock Profile
 ==============================

-The following sections outline the steps required to enable a mclock profile:
+The following sections outline the steps required to enable a mclock profile.

-Determine OSD Capacity Using Benchmark Tests
---------------------------------------------
+Determining OSD Capacity Using Benchmark Tests
+----------------------------------------------

-To allow mclock to fulfill the QoS goals across its clients, the most important
-criteria is to have a good understanding of each OSD capacity in terms of their
-baseline throughputs (IOPS) across the Ceph nodes. To determine the capacity,
-appropriate benchmarking tests must be manually performed and the steps for
-this is broadly outlined below.
+To allow mclock to fulfill its QoS goals across its clients, it is most
+important to have a good understanding of each OSD's capacity in terms of its
+baseline throughput (IOPS) across the Ceph nodes. To determine this capacity,
+you must perform appropriate benchmarking tests. The steps for performing these
+benchmarking tests are broadly outlined below.

-Any existing benchmarking tool may be employed for this purpose and the
-following steps employs the *Ceph Benchmarking Tool* (aka cbt_). Regardless of
-the tool used, the steps described below remain the same.
+Any existing benchmarking tool can be used for this purpose. The following
+steps use the *Ceph Benchmarking Tool* (cbt_). Regardless of the tool
+used, the steps described below remain the same.
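+
+Whichever tool is used, the goal is a baseline IOPS figure for each OSD. As a
+quick cross-check alongside cbt, Ceph's built-in OSD bench can be run against
+a single OSD (the arguments are total bytes, block size, object size, and
+object count; the values below are only an illustration):
+
+.. code-block:: bash
+
+   # Write ~12 MB in 4 KiB blocks spread over 100 4 MiB objects on osd.0.
+   # Recent releases report an "iops" figure in the output.
+   ceph tell osd.0 bench 12288000 4096 4194304 100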
 As already described in the `OSD Config Reference`_ section, the number of
-shards and the bluestore throttle parameters have an impact on the mclock op
+shards and the bluestore's throttle parameters have an impact on the mclock op
 queues. Therefore, it is critical to set these values carefully in order to
 maximize the impact of the mclock scheduler.

 :Number of Operational Shards:
-  The recommendation is to use the default number of shards as defined by the
+  We recommend using the default number of shards as defined by the
   configuration options ``osd_op_num_shards``, ``osd_op_num_shards_hdd``, and
   ``osd_op_num_shards_ssd``. In general, a lower number of shards will increase
   the impact of the mclock queues.

 :Bluestore Throttle Parameters:
-  The recommendation is to use the default values as defined by
+  We recommend using the default values as defined by
   ``bluestore_throttle_bytes`` and ``bluestore_throttle_deferred_bytes``. But
   these parameters may also be determined during the benchmarking phase as
   described below.
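+
+If you do determine these values during benchmarking, they can be applied to a
+running OSD as follows (the figures shown are placeholders, not
+recommendations):
+
+.. code-block:: bash
+
+   # Apply candidate bluestore throttle values to osd.0 for the next test run.
+   ceph config set osd.0 bluestore_throttle_bytes 262144
+   ceph config set osd.0 bluestore_throttle_deferred_bytes 131072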