Merge pull request #29135 from dillaman/wip-40486
doc/rbd: re-organize top-level and add live-migration docs

Reviewed-by: Mykola Golub <mgolub@suse.com>
commit 978d185cc8
@ -12,6 +12,11 @@ Synopsis
| **rbd-fuse** [ -p pool ] [-c conffile] *mountpoint* [ *fuse options* ]

Note
====

**rbd-fuse** is not recommended for any production or high performance workloads.
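
For example, a hypothetical test invocation, assuming a pool named ``rbd`` and
an existing empty mount point (both names are illustrative, not defaults)::

    # expose each image in pool "rbd" as a file under /mnt/rbdfuse
    rbd-fuse -p rbd -c /etc/ceph/ceph.conf /mnt/rbdfuse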

Description
===========
@ -39,20 +39,19 @@ to operate the :ref:`Ceph RADOS Gateway <object-gateway>`, the
Ceph cluster.

.. toctree::
   :maxdepth: 1

   Basic Commands <rados-rbd-cmds>

.. toctree::
   :maxdepth: 2

   Operations <rbd-operations>

.. toctree::
   :maxdepth: 2

   Integrations <rbd-integrations>

.. toctree::
   :maxdepth: 2
@ -9,8 +9,8 @@
   rbd-fuse <../../man/8/rbd-fuse>
   rbd-nbd <../../man/8/rbd-nbd>
   rbd-ggate <../../man/8/rbd-ggate>
   ceph-rbdnamer <../../man/8/ceph-rbdnamer>
   rbd-replay-prep <../../man/8/rbd-replay-prep>
   rbd-replay <../../man/8/rbd-replay>
   rbd-replay-many <../../man/8/rbd-replay-many>
   rbd-map <../../man/8/rbdmap>
@ -1,6 +1,6 @@
=============================
 Basic Block Device Commands
=============================

.. index:: Ceph Block Device; image management
@ -1,5 +1,5 @@
=======================
 Config Settings
=======================

See `Block Device`_ for additional details.
@ -9,7 +9,7 @@ Cache Settings
.. sidebar:: Kernel Caching

   The kernel driver for Ceph block devices can use the Linux page cache to
   improve performance.

The user space implementation of the Ceph block device (i.e., ``librbd``) cannot
@ -19,33 +19,36 @@ disk caching. When the OS sends a barrier or a flush request, all dirty data is
written to the OSDs. This means that using write-back caching is just as safe as
using a well-behaved physical hard disk with a VM that properly sends flushes
(i.e. Linux kernel >= 2.6.32). The cache uses a Least Recently Used (LRU)
algorithm, and in write-back mode it can coalesce contiguous requests for
better throughput.

The librbd cache is enabled by default and supports three different cache
policies: write-around, write-back, and write-through. Writes return
immediately under both the write-around and write-back policies, unless there
are more than ``rbd cache max dirty`` unwritten bytes to the storage cluster.
The write-around policy differs from the write-back policy in that it does
not attempt to service read requests from the cache, unlike the write-back
policy, and is therefore faster for high performance write workloads. Under the
write-through policy, writes return only when the data is on disk on all
replicas, but reads may come from the cache.
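
As a minimal sketch, a policy can be selected in the ``[client]`` section of
``ceph.conf`` (``writeback`` here is only an example; the default policy is
``writearound``, as documented below)::

    [client]
        # per-client cache behavior; see the setting reference below
        rbd cache = true
        rbd cache policy = writeback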

Prior to receiving a flush request, the cache behaves like a write-through cache
to ensure safe operation for older operating systems that do not send flushes to
ensure crash consistent behavior.

If the librbd cache is disabled, writes and reads go directly to the storage
cluster, and writes return only when the data is on disk on all replicas.

.. note::
   The cache is in memory on the client, and each RBD image has its own. Since
   the cache is local to the client, there is no coherency if other clients are
   accessing the image. Running GFS or OCFS on top of RBD will not work with
   caching enabled.

The ``ceph.conf`` file settings for RBD should be set in the ``[client]``
section of your configuration file. The settings include:

``rbd cache``
@ -56,12 +59,30 @@ section of your configuration file. The settings include:
:Default: ``true``


``rbd cache policy``

:Description: Select the caching policy for librbd.
:Type: Enum
:Required: No
:Default: ``writearound``
:Values: ``writearound``, ``writeback``, ``writethrough``


``rbd cache writethrough until flush``

:Description: Start out in write-through mode, and switch to write-back after the first flush request is received. Enabling this is a conservative but safe setting in case VMs running on rbd are too old to send flushes, like the virtio driver in Linux before 2.6.32.
:Type: Boolean
:Required: No
:Default: ``true``


``rbd cache size``

:Description: The RBD cache size in bytes.
:Type: 64-bit Integer
:Required: No
:Default: ``32 MiB``
:Policies: write-back and write-through


``rbd cache max dirty``
@ -71,6 +92,7 @@ section of your configuration file. The settings include:
:Required: No
:Constraint: Must be less than ``rbd cache size``.
:Default: ``24 MiB``
:Policies: write-around and write-back


``rbd cache target dirty``
@ -80,6 +102,7 @@ section of your configuration file. The settings include:
:Required: No
:Constraint: Must be less than ``rbd cache max dirty``.
:Default: ``16 MiB``
:Policies: write-back


``rbd cache max dirty age``
@ -88,15 +111,8 @@ section of your configuration file. The settings include:
:Type: Float
:Required: No
:Default: ``1.0``
:Policies: write-back
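
Putting the cache settings together, a hypothetical ``[client]`` section tuned
for write-back caching might look like the following (all values are
illustrative, not recommendations; the defaults are listed above)::

    [client]
        rbd cache = true
        rbd cache policy = writeback
        rbd cache size = 67108864             # 64 MiB total cache
        rbd cache max dirty = 50331648        # 48 MiB; must be < rbd cache size
        rbd cache target dirty = 33554432     # 32 MiB; must be < rbd cache max dirty
        rbd cache max dirty age = 2.0         # seconds dirty data may age before writeback
        rbd cache writethrough until flush = true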

.. _Block Device: ../../rbd

@ -104,12 +120,10 @@ section of your configuration file. The settings include:
Read-ahead Settings
=======================

librbd supports read-ahead/prefetching to optimize small, sequential reads.
This should normally be handled by the guest OS in the case of a VM,
but boot loaders may not issue efficient reads. Read-ahead is automatically
disabled if caching is disabled or if the policy is write-around.
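
As an illustrative sketch (the individual settings are documented below; the
values shown are believed to restate the defaults), read-ahead can be tuned in
the ``[client]`` section::

    [client]
        rbd readahead trigger requests = 10            # sequential reads needed to trigger read-ahead
        rbd readahead max bytes = 524288               # 512 KiB per read-ahead request
        rbd readahead disable after bytes = 52428800   # stop read-ahead after 50 MiB read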


``rbd readahead trigger requests``
@ -136,8 +150,8 @@ Read-ahead is automatically disabled if caching is disabled.
:Default: ``50 MiB``


Image Features
==============

RBD supports advanced features, which can be specified on the command line when
creating images; alternatively, default features can be set in the Ceph config
file via ``rbd_default_features = <sum of feature numeric values>`` or
``rbd_default_features = <comma-delimited list of CLI values>``.
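
For example, features can be requested at image creation time, or a default can
be set in ``ceph.conf`` (the feature choices below are illustrative)::

    $ rbd create --size 1024 --image-feature layering,exclusive-lock rbd/image1

    # equivalent ceph.conf default: layering (1) + exclusive-lock (4) = 5
    # rbd_default_features = 5
    # or: rbd_default_features = layering, exclusive-lock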
@ -233,10 +247,10 @@ RBD supports advanced features which can be specified via the command line when
:KRBD support: no


QOS Settings
============

librbd supports limiting per-image IO, controlled by the following
settings.
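
As a sketch, a per-image override could be applied at runtime with the ``rbd
config image`` commands on releases that support them (the image name and value
are examples)::

    $ rbd config image set rbd/image1 rbd_qos_iops_limit 1000   # cap the image at 1000 IOPS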

``rbd qos iops limit``
doc/rbd/rbd-integrations.rst (new file, 13 lines)
@ -0,0 +1,13 @@
=========================================
 Ceph Block Device 3rd Party Integration
=========================================

.. toctree::
   :maxdepth: 1

   Kernel Modules <rbd-ko>
   QEMU <qemu-rbd>
   libvirt <libvirt>
   OpenStack <rbd-openstack>
   CloudStack <rbd-cloudstack>
   LIO iSCSI Gateway <iscsi-overview>
doc/rbd/rbd-live-migration.rst (new file, 155 lines)
@ -0,0 +1,155 @@
======================
 Image Live-Migration
======================

.. index:: Ceph Block Device; live-migration

RBD images can be live-migrated between different pools within the same cluster
or between different image formats and layouts. When started, the source image
will be deep-copied to the destination image, pulling all snapshot history and
optionally keeping any link to the source image's parent to help preserve
sparseness.

This copy process can safely run in the background while the new target image is
in use. There is currently a requirement to temporarily stop using the source
image before preparing a migration. This helps to ensure that the client using
the image is updated to point to the new target image.

.. note::
   Image live-migration requires the Ceph Nautilus release or later. The krbd
   kernel module does not support live-migration at this time.

.. ditaa:: +-------------+               +-------------+
           | {s} c999    |               | {s}         |
           | Live        | Target refers | Live        |
           | migration   |<-------------*| migration   |
           | source      | to Source     | target      |
           |             |               |             |
           | (read only) |               | (writable)  |
           +-------------+               +-------------+

               Source                        Target

The live-migration process is composed of three steps:

#. **Prepare Migration:** The initial step creates the new target image and
   cross-links the source and target images. Similar to `layered images`_,
   attempts to read uninitialized extents within the target image will
   internally redirect the read to the source image, and writes to
   uninitialized extents within the target will internally deep-copy the
   overlapping source image block to the target image.

#. **Execute Migration:** This is a background operation that deep-copies all
   initialized blocks from the source image to the target. This step can be
   run while clients are actively using the new target image.

#. **Finish Migration:** Once the background migration process has completed,
   the migration can be committed or aborted. Committing the migration will
   remove the cross-links between the source and target images, and will
   remove the source image. Aborting the migration will remove the cross-links,
   and will remove the target image.

Prepare Migration
=================

The live-migration process is initiated by running the `rbd migration prepare`
command, providing the source and target images::

    $ rbd migration prepare migration_source [migration_target]

The `rbd migration prepare` command accepts all the same layout options as the
`rbd create` command, which allows changes to the immutable image on-disk
layout. The `migration_target` can be skipped if the goal is only to change the
on-disk layout, keeping the original image name.
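
For example, a hypothetical prepare step that also changes the on-disk layout by
passing ``rbd create``-style options (the option values below are illustrative)::

    $ rbd migration prepare --object-size 8M --data-pool ecpool \
          migration_source migration_target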

All clients using the source image must be stopped prior to preparing a
live-migration. The prepare step will fail if it finds any running clients with
the image open in read/write mode. Once the prepare step is complete, the
clients can be restarted using the new target image name. Attempting to restart
the clients using the source image name will result in failure.

The `rbd status` command will show the current state of the live-migration::

    $ rbd status migration_target
    Watchers: none
    Migration:
                source: rbd/migration_source (5e2cba2f62e)
                destination: rbd/migration_target (5e2ed95ed806)
                state: prepared

Note that the source image will be moved to the RBD trash to avoid mistaken
usage during the migration process::

    $ rbd info migration_source
    rbd: error opening image migration_source: (2) No such file or directory
    $ rbd trash ls --all
    5e2cba2f62e migration_source

Execute Migration
=================

After preparing the live-migration, the image blocks from the source image
must be copied to the target image. This is accomplished by running the
`rbd migration execute` command::

    $ rbd migration execute migration_target
    Image migration: 100% complete...done.

The `rbd status` command will also provide feedback on the progress of the
migration block deep-copy process::

    $ rbd status migration_target
    Watchers:
        watcher=1.2.3.4:0/3695551461 client.123 cookie=123
    Migration:
                source: rbd/migration_source (5e2cba2f62e)
                destination: rbd/migration_target (5e2ed95ed806)
                state: executing (32% complete)

Commit Migration
================

Once the live-migration has completed deep-copying all data blocks from the
source image to the target, the migration can be committed::

    $ rbd status migration_target
    Watchers: none
    Migration:
                source: rbd/migration_source (5e2cba2f62e)
                destination: rbd/migration_target (5e2ed95ed806)
                state: executed
    $ rbd migration commit migration_target
    Commit image migration: 100% complete...done.

If the `migration_source` image is a parent of one or more clones, the `--force`
option will need to be specified after ensuring all descendant clone images are
not in use.
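
In that case, the commit takes the same form with the flag applied::

    $ rbd migration commit --force migration_target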

Committing the live-migration will remove the cross-links between the source
and target images, and will remove the source image::

    $ rbd trash list --all

Abort Migration
===============

If you wish to revert the prepare or execute step, run the `rbd migration abort`
command to revert the migration process::

    $ rbd migration abort migration_target
    Abort image migration: 100% complete...done.

Aborting the migration will result in the target image being deleted and access
to the original source image being restored::

    $ rbd ls
    migration_source


.. _layered images: ../rbd-snapshot/#layering
doc/rbd/rbd-operations.rst (new file, 13 lines)
@ -0,0 +1,13 @@
==============================
 Ceph Block Device Operations
==============================

.. toctree::
   :maxdepth: 1

   Snapshots <rbd-snapshot>
   Mirroring <rbd-mirroring>
   Live-Migration <rbd-live-migration>
   Persistent Cache <rbd-persistent-cache>
   Config Settings (librbd) <rbd-config-ref/>
   RBD Replay <rbd-replay>