e1548ef36a
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
=======================
 Config Settings
=======================

See `Block Device`_ for additional details.

Generic IO Settings
===================

.. confval:: rbd_compression_hint
.. confval:: rbd_read_from_replica_policy
.. confval:: rbd_default_order

Cache Settings
=======================

.. sidebar:: Kernel Caching

   The kernel driver for Ceph block devices can use the Linux page cache to
   improve performance.

The user space implementation of the Ceph block device (i.e., ``librbd``) cannot
take advantage of the Linux page cache, so it includes its own in-memory
caching, called "RBD caching." RBD caching behaves just like well-behaved hard
disk caching. When the OS sends a barrier or a flush request, all dirty data is
written to the OSDs. This means that using write-back caching is just as safe as
using a well-behaved physical hard disk with a VM that properly sends flushes
(i.e., Linux kernel 2.6.32 or later). The cache uses a Least Recently Used (LRU)
algorithm, and in write-back mode it can coalesce contiguous requests for
better throughput.

The librbd cache is enabled by default and supports three cache
policies: write-around, write-back, and write-through. Under both the
write-around and write-back policies, writes return immediately unless more
than ``rbd_cache_max_dirty`` bytes are waiting to be written to the storage
cluster. Unlike the write-back policy, the write-around policy does not attempt
to service read requests from the cache, and is therefore faster for high
performance write workloads. Under the write-through policy, writes return only
when the data is on disk on all replicas, but reads may come from the cache.

Until it receives a flush request, the cache behaves like a write-through
cache. This ensures crash-consistent behavior for older operating systems that
do not send flushes.

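The rules in the preceding paragraphs can be restated as a small model. The
following is an illustrative sketch only, not librbd code: the function names
are hypothetical, the policy strings are assumed to be the values accepted by
``rbd_cache_policy``, and the pre-flush behavior is assumed to be controlled by
``rbd_cache_writethrough_until_flush``.

.. code-block:: python

   # Illustrative model of the cache policy rules described above.
   # Function names are hypothetical; this is not how librbd is implemented.

   ASYNC_POLICIES = ("writeback", "writearound")


   def effective_policy(configured, writethrough_until_flush, flush_seen):
       """Return the policy that is in effect for the next write.

       Assumption: until a flush has been seen, an asynchronous policy is
       treated as write-through when writethrough_until_flush is enabled.
       """
       if (configured in ASYNC_POLICIES
               and writethrough_until_flush
               and not flush_seen):
           return "writethrough"
       return configured


   def write_returns_immediately(policy, dirty_bytes, max_dirty):
       """Return True if a write is acknowledged before reaching the cluster.

       Writes return immediately under write-back and write-around unless
       more than max_dirty bytes are still unwritten.
       """
       return policy in ASYNC_POLICIES and dirty_bytes <= max_dirty


   if __name__ == "__main__":
       policy = effective_policy("writeback", True, flush_seen=False)
       print(policy)
       print(write_returns_immediately(policy, dirty_bytes=0, max_dirty=1024))

In this model, a guest that never sends a flush keeps the write-through
behavior for the lifetime of the image handle, which is the safety property the
paragraph above describes.
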
If the librbd cache is disabled, writes and
reads go directly to the storage cluster, and writes return only when the data
is on disk on all replicas.

.. note::
   The cache is in memory on the client, and each RBD image has
   its own cache. Because the cache is local to the client, there is no
   coherency if other clients access the same image. Running GFS or OCFS on
   top of RBD will not work with caching enabled.

Option settings for RBD should be set in the ``[client]``
section of your configuration file or the central config store. These settings
include:

.. confval:: rbd_cache
.. confval:: rbd_cache_policy
.. confval:: rbd_cache_writethrough_until_flush
.. confval:: rbd_cache_size
.. confval:: rbd_cache_max_dirty
.. confval:: rbd_cache_target_dirty
.. confval:: rbd_cache_max_dirty_age

.. _Block Device: ../../rbd

Read-ahead Settings
=======================

librbd supports read-ahead/prefetching to optimize small, sequential reads.
This should normally be handled by the guest OS in the case of a VM,
but boot loaders may not issue efficient reads. Read-ahead is automatically
disabled if caching is disabled or if the policy is write-around.

.. confval:: rbd_readahead_trigger_requests
.. confval:: rbd_readahead_max_bytes
.. confval:: rbd_readahead_disable_after_bytes

Image Features
==============

RBD supports advanced features. They can be specified on the command line when
creating images, or the default features can be configured via
``rbd_default_features = <sum of feature numeric values>`` or
``rbd_default_features = <comma-delimited list of CLI values>``.

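The two forms of ``rbd_default_features`` are equivalent because each feature
has a distinct power-of-two internal value. The following sketch converts
between them using the internal and CLI values listed in the feature entries of
this section. The helper names are hypothetical, and only features that have a
documented CLI value here are included.

.. code-block:: python

   # Map CLI feature names to their internal (bit) values, as listed in
   # this section. Helper names are illustrative, not part of any Ceph API.
   FEATURE_BITS = {
       "layering": 1,
       "striping": 2,
       "exclusive-lock": 4,
       "object-map": 8,
       "fast-diff": 16,
       "deep-flatten": 32,
       "journaling": 64,
   }


   def features_to_mask(cli_list):
       """Convert a comma-delimited list of CLI values to the numeric sum."""
       mask = 0
       for name in cli_list.split(","):
           name = name.strip()
           if not name:
               continue
           if name not in FEATURE_BITS:
               raise ValueError("unknown feature: " + name)
           mask |= FEATURE_BITS[name]
       return mask


   def mask_to_features(mask):
       """Convert a numeric sum back to the CLI names it contains."""
       return [name for name, bit in FEATURE_BITS.items() if mask & bit]


   if __name__ == "__main__":
       wanted = "layering,exclusive-lock,object-map,fast-diff,deep-flatten"
       print(features_to_mask(wanted))   # 1 + 4 + 8 + 16 + 32
       print(mask_to_features(5))

Because the values are distinct bits, summing them and combining them with a
bitwise OR give the same result, and the original list can always be recovered
from the number.
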
``Layering``

:Description: Layering enables cloning.
:Internal value: 1
:CLI value: layering
:Added in: v0.52 (Bobtail)
:KRBD support: since v3.10
:Default: yes

``Striping v2``

:Description: Striping spreads data across multiple objects. Striping helps with
              parallelism for sequential read/write workloads.
:Internal value: 2
:CLI value: striping
:Added in: v0.55 (Bobtail)
:KRBD support: since v3.10 (default striping only, "fancy" striping added in v4.17)
:Default: yes

``Exclusive locking``

:Description: When enabled, it requires a client to acquire a lock on an object
              before making a write. Exclusive lock should only be enabled when
              a single client is accessing an image at any given time.
:Internal value: 4
:CLI value: exclusive-lock
:Added in: v0.92 (Hammer)
:KRBD support: since v4.9
:Default: yes

``Object map``

:Description: Object map support depends on exclusive lock support. Block
              devices are thin provisioned, which means that they only store
              data that actually has been written, i.e., they are *sparse*.
              Object map support helps track which objects actually exist (have
              data stored on a device). Enabling object map support speeds up
              I/O operations for cloning, importing and exporting a sparsely
              populated image, and deleting.
:Internal value: 8
:CLI value: object-map
:Added in: v0.93 (Hammer)
:KRBD support: since v5.3
:Default: yes


``Fast-diff``

:Description: Fast-diff support depends on object map support and exclusive lock
              support. It adds another property to the object map, which makes
              it much faster to generate diffs between snapshots of an image.
              It is also much faster to calculate the actual data usage of a
              snapshot or volume (``rbd du``).
:Internal value: 16
:CLI value: fast-diff
:Added in: v9.0.1 (Infernalis)
:KRBD support: since v5.3
:Default: yes


``Deep-flatten``

:Description: Deep-flatten enables ``rbd flatten`` to work on all snapshots of
              an image, in addition to the image itself. Without it, snapshots
              of an image will still rely on the parent, so the parent cannot be
              deleted until the snapshots are first deleted. Deep-flatten makes
              a parent independent of its clones, even if they have snapshots,
              at the expense of using additional OSD device space.
:Internal value: 32
:CLI value: deep-flatten
:Added in: v9.0.2 (Infernalis)
:KRBD support: since v5.1
:Default: yes


``Journaling``

:Description: Journaling support depends on exclusive lock support. Journaling
              records all modifications to an image in the order they occur. RBD
              mirroring can utilize the journal to replicate a crash-consistent
              image to a remote cluster. It is best to let ``rbd-mirror``
              manage this feature only as needed, as enabling it long term may
              result in substantial additional OSD space consumption.
:Internal value: 64
:CLI value: journaling
:Added in: v10.0.1 (Jewel)
:KRBD support: no
:Default: no


``Data pool``

:Description: On erasure-coded pools, the image data block objects need to be stored on a separate pool from the image metadata.
:Internal value: 128
:Added in: v11.1.0 (Kraken)
:KRBD support: since v4.11
:Default: no


``Operations``

:Description: Used to restrict older clients from performing certain maintenance operations against an image (e.g. clone, snap create).
:Internal value: 256
:Added in: v13.0.2 (Mimic)
:KRBD support: since v4.16


``Migrating``

:Description: Used to restrict older clients from opening an image when it is in migration state.
:Internal value: 512
:Added in: v14.0.1 (Nautilus)
:KRBD support: no

``Non-primary``

:Description: Used to restrict changes to non-primary images using snapshot-based mirroring.
:Internal value: 1024
:Added in: v15.2.0 (Octopus)
:KRBD support: no


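Several of the feature descriptions above state dependencies: object map and
journaling depend on exclusive lock, and fast-diff depends on both object map
and exclusive lock. A feature list can be checked against those statements
before it is used. The following is a minimal sketch with hypothetical names;
it encodes only the dependencies stated in this section and is not a Ceph API.

.. code-block:: python

   # Dependencies exactly as stated in the feature descriptions above.
   # The names REQUIRES and missing_dependencies are illustrative only.
   REQUIRES = {
       "object-map": {"exclusive-lock"},
       "fast-diff": {"object-map", "exclusive-lock"},
       "journaling": {"exclusive-lock"},
   }


   def missing_dependencies(enabled):
       """Return {feature: [missing prerequisites]} for a set of CLI names."""
       enabled = set(enabled)
       missing = {}
       for feature in sorted(enabled):
           lacking = REQUIRES.get(feature, set()) - enabled
           if lacking:
               missing[feature] = sorted(lacking)
       return missing


   if __name__ == "__main__":
       # fast-diff without its prerequisites is reported as incomplete.
       print(missing_dependencies(["layering", "fast-diff"]))
       # A complete set produces an empty result.
       print(missing_dependencies(["exclusive-lock", "object-map", "fast-diff"]))
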
QoS Settings
============

librbd supports limiting per-image IO in several ways. These all apply
to a given image within a given process: the same image used in
multiple places, e.g. two separate VMs, would have independent limits.

* **IOPS:** number of I/Os per second (any type of I/O)
* **read IOPS:** number of read I/Os per second
* **write IOPS:** number of write I/Os per second
* **bps:** bytes per second (any type of I/O)
* **read bps:** bytes per second read
* **write bps:** bytes per second written

These limits operate independently of one another. They are
all off by default. Every type of limit throttles I/O using a token
bucket algorithm, with the ability to configure the limit (average
speed over time) and the potential for a higher rate (a burst) for a short
period of time (``burst_seconds``). When any of these limits is reached
and there is no burst capacity left, librbd reduces the rate of that
type of I/O to the limit.

For example, if a read bps limit of 100MB was configured, but writes
were not limited, writes could proceed as quickly as possible, while
reads would be throttled to 100MB/s on average. If a read bps burst of
150MB was set, and read burst seconds was set to five seconds, reads
could proceed at 150MB/s for up to five seconds before dropping back
to the 100MB/s limit.

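The read bps example above can be reproduced with a simplified throttle model.
The following sketch is not librbd's token bucket implementation: it advances
in whole one-second ticks, the class name ``BurstThrottle`` is hypothetical,
``MB`` is assumed to mean 10^6 bytes, and the burst budget is modeled as the
headroom above the limit multiplied by the burst seconds.

.. code-block:: python

   from dataclasses import dataclass, field

   MB = 1_000_000  # assumption: decimal megabytes, for illustration only


   @dataclass
   class BurstThrottle:
       """Simplified per-second model of a limit with a short burst allowance."""

       limit: int          # sustained rate, units per second
       burst: int          # peak rate while burst budget remains
       burst_seconds: int  # how long the peak rate can be sustained
       extra: int = field(init=False)

       def __post_init__(self):
           # Budget for traffic above the limit, full at start.
           self.headroom = max(self.burst - self.limit, 0)
           self.extra_cap = self.headroom * self.burst_seconds
           self.extra = self.extra_cap

       def tick(self, demand):
           """Return how many units are admitted during one second."""
           base = min(demand, self.limit)
           over = min(demand - base, self.headroom, self.extra)
           self.extra -= over
           # Unused sustained capacity refills the burst budget.
           self.extra = min(self.extra_cap, self.extra + (self.limit - base))
           return base + over


   if __name__ == "__main__":
       reads = BurstThrottle(limit=100 * MB, burst=150 * MB, burst_seconds=5)
       # A reader that always wants more than the throttle admits.
       print([reads.tick(10**12) // MB for _ in range(7)])

Under these assumptions a saturating reader is admitted at 150MB/s for five
ticks and at 100MB/s afterwards, matching the example in the text. A limit
that is not configured would simply not be modeled by such an object, which
corresponds to the limits being independent and off by default.
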
The following options configure these throttles:

.. confval:: rbd_qos_iops_limit
.. confval:: rbd_qos_iops_burst
.. confval:: rbd_qos_iops_burst_seconds
.. confval:: rbd_qos_read_iops_limit
.. confval:: rbd_qos_read_iops_burst
.. confval:: rbd_qos_read_iops_burst_seconds
.. confval:: rbd_qos_write_iops_limit
.. confval:: rbd_qos_write_iops_burst
.. confval:: rbd_qos_write_iops_burst_seconds
.. confval:: rbd_qos_bps_limit
.. confval:: rbd_qos_bps_burst
.. confval:: rbd_qos_bps_burst_seconds
.. confval:: rbd_qos_read_bps_limit
.. confval:: rbd_qos_read_bps_burst
.. confval:: rbd_qos_read_bps_burst_seconds
.. confval:: rbd_qos_write_bps_limit
.. confval:: rbd_qos_write_bps_burst
.. confval:: rbd_qos_write_bps_burst_seconds
.. confval:: rbd_qos_schedule_tick_min
.. confval:: rbd_qos_exclude_ops