mirror of https://github.com/ceph/ceph (synced 2025-01-02 17:12:31 +00:00)
erasure-code: document pool operations
A short introduction for the first-time user of an erasure coded pool. It includes a reminder of how it relates to cache tiering and links to how new profiles are defined, with an example. There were examples in the developer documentation, but the operator expects to find such a guide in the rados operations chapter.

http://tracker.ceph.com/issues/9970
Fixes: #9970
Signed-off-by: Loic Dachary <ldachary@redhat.com>
parent 725d05f52f
commit c44bdb1dc9
doc/rados/operations/erasure-code.rst (Normal file, 173 lines)
@ -0,0 +1,173 @@
=============
Erasure code
=============

A Ceph pool is associated with a type that determines how it sustains
the loss of an OSD (i.e. a disk, since most of the time there is one
OSD per disk). The default choice when `creating a pool <../pools>`_
is *replicated*, meaning every object is copied on multiple disks. The
`Erasure Code <https://en.wikipedia.org/wiki/Erasure_code>`_ pool type
can be used instead to save space.

Creating a sample erasure coded pool
------------------------------------

The simplest erasure coded pool is equivalent to `RAID5
<https://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_5>`_ and
requires at least three hosts::

    $ ceph osd pool create ecpool 12 12 erasure
    pool 'ecpool' created
    $ echo ABCDEFGHI | rados --pool ecpool put NYAN -
    $ rados --pool ecpool get NYAN -
    ABCDEFGHI

.. note:: the two 12s in *pool create* are `the number of placement
   groups <../pools>`_ (``pg_num``) and the number of placement groups
   for placement (``pgp_num``).
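
The RAID5 analogy can be illustrated with a toy model. This Python sketch is an assumption for illustration only: it uses a single XOR parity chunk (*K* = 2, *M* = 1) rather than the Reed-Solomon coding of the jerasure plugin, and the helper names ``split``, ``encode`` and ``decode`` are hypothetical. It shows that any one of the three chunks can be lost, which is why, with one chunk per host, at least three hosts are needed.

```python
def split(data: bytes, k: int) -> list:
    """Split data into k equal-size chunks, zero-padding the last one."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes) -> list:
    """Toy k=2, m=1 code: two data chunks plus one XOR parity chunk."""
    d1, d2 = split(data, 2)
    return [d1, d2, xor(d1, d2)]


def decode(chunks: list, length: int) -> bytes:
    """Rebuild the object when at most one chunk is missing (None)."""
    if sum(c is None for c in chunks) > 1:
        raise ValueError("a k=2, m=1 code cannot survive two lost chunks")
    d1, d2, parity = chunks
    if d1 is None:
        d1 = xor(d2, parity)
    elif d2 is None:
        d2 = xor(d1, parity)
    return (d1 + d2)[:length]


obj = b"ABCDEFGHI"
shards = encode(obj)
for lost in range(3):  # lose each chunk (i.e. each host) in turn
    damaged = list(shards)
    damaged[lost] = None
    assert decode(damaged, len(obj)) == obj
print("ABCDEFGHI recovered after losing any single chunk")
```

Losing two chunks raises ``ValueError``, mirroring the fact that this layout only sustains the loss of a single OSD.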

Erasure code profiles
---------------------

The default erasure code profile sustains the loss of a single OSD. It
is equivalent to a replicated pool of size two but requires 1.5TB
instead of 2TB to store 1TB of data. The default profile can be
displayed with::

    $ ceph osd erasure-code-profile get default
    directory=.libs
    k=2
    m=1
    plugin=jerasure
    ruleset-failure-domain=host
    technique=reed_sol_van
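
The 1.5TB figure follows from how chunks are sized: each of the *K* + *M* chunks holds 1/*K* of the object, so raw usage is the data size times (*K* + *M*) / *K*, while a replicated pool of size *N* stores *N* full copies. A minimal sketch of that arithmetic, ignoring padding and metadata (the function names are hypothetical):

```python
from fractions import Fraction


def ec_raw(data, k, m):
    """Raw space used by an erasure coded pool: k + m chunks, each 1/k of the data."""
    return data * Fraction(k + m, k)


def replicated_raw(data, size):
    """Raw space used by a replicated pool: `size` full copies."""
    return data * size


# Both layouts survive the loss of one OSD: m=1, or size=2 (one spare copy).
print("erasure k=2 m=1:", float(ec_raw(1, k=2, m=1)), "TB")
print("replicated size=2:", replicated_raw(1, size=2), "TB")
```

For 1TB of data it prints 1.5 TB for the default profile and 2 TB for the replicated pool.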

Choosing the right profile is important because it cannot be modified
after the pool is created: a new pool with a different profile needs
to be created and all objects from the previous pool moved to the new
one.

The most important parameters of the profile are *K*, *M* and
*ruleset-failure-domain* because they define the storage overhead and
the data durability. For instance, if the desired architecture must
sustain the loss of two racks with a storage overhead of 67%
(*M* / *K* = 2/3), the following profile can be defined::

    $ ceph osd erasure-code-profile set myprofile \
       k=3 \
       m=2 \
       ruleset-failure-domain=rack
    $ ceph osd pool create ecpool 12 12 erasure myprofile
    $ echo ABCDEFGHI | rados --pool ecpool put NYAN -
    $ rados --pool ecpool get NYAN -
    ABCDEFGHI
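
The numbers behind *myprofile* can be checked with a few lines of Python. This is a sketch: ``Profile`` is a hypothetical class, not part of any Ceph API, and the rack count assumes that keeping every chunk in its own rack requires at least *K* + *M* distinct racks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical stand-in for the k/m/failure-domain part of a profile."""
    k: int
    m: int
    failure_domain: str

    def overhead(self) -> float:
        """Extra raw storage relative to the stored data: m / k."""
        return self.m / self.k

    def tolerated_failures(self) -> int:
        """Failure domains that can be lost without losing data."""
        return self.m

    def min_failure_domains(self) -> int:
        """One chunk per failure domain, so k + m of them are needed."""
        return self.k + self.m


p = Profile(k=3, m=2, failure_domain="rack")
print(f"overhead: {p.overhead():.0%}")
print(f"{p.failure_domain}s that can fail:", p.tolerated_failures())
print(f"{p.failure_domain}s needed:", p.min_failure_domains())
```

With *K* = 3 and *M* = 2 it reports a 67% overhead, two racks that can fail and five racks needed.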

The *NYAN* object will be divided into three (*K=3*) and two additional
*chunks* will be created (*M=2*). The value of *M* defines how many
OSDs can be lost simultaneously without losing any data. The
*ruleset-failure-domain=rack* will create a CRUSH ruleset that ensures
no two *chunks* are stored in the same rack.

.. ditaa::
                              +-------------------+
                         name |       NYAN        |
                              +-------------------+
                      content |     ABCDEFGHI     |
                              +--------+----------+
                                       |
                                       |
                                       v
                                +------+------+
                 +--------------+ encode(3,2) +--------------+
                 |              +---+--+--+---+              |
                 |                  |  |  |                  |
                 |          +-------+  |  +-------+          |
                 |          |          |          |          |
              +--v---+   +--v---+   +--v---+   +--v---+   +--v---+
         name | NYAN |   | NYAN |   | NYAN |   | NYAN |   | NYAN |
              +------+   +------+   +------+   +------+   +------+
        shard |  1   |   |  2   |   |  3   |   |  4   |   |  5   |
              +------+   +------+   +------+   +------+   +------+
      content | ABC  |   | DEF  |   | GHI  |   | YXY  |   | QGC  |
              +--+---+   +--+---+   +--+---+   +--+---+   +--+---+
                 |          |          |          |          |
                 |          |          v          |          |
                 |          |       +--+---+      |          |
                 |          |       | OSD1 |      |          |
                 |          |       +------+      |          |
                 |          |                     |          |
                 |          |       +------+      |          |
                 |          +------>| OSD2 |      |          |
                 |                  +------+      |          |
                 |                                |          |
                 |                  +------+      |          |
                 |                  | OSD3 |<-----+          |
                 |                  +------+                 |
                 |                                           |
                 |                  +------+                 |
                 |                  | OSD4 |<----------------+
                 |                  +------+
                 |
                 |                  +------+
                 +----------------->| OSD5 |
                                    +------+
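
The rack constraint can be sketched as a placement function. This is not CRUSH: the hash-based ordering is a hypothetical stand-in, and the six-rack ``cluster`` is an assumed example. It only shows the property the ruleset guarantees, namely that each of the *K* + *M* = 5 shards of *NYAN* lands in a different rack, and that placement fails when there are fewer racks than shards.

```python
import hashlib


def place(obj_name, osds_by_rack, n_chunks):
    """Toy placement of n_chunks shards, one per rack.

    NOT CRUSH: it only illustrates the constraint of
    ruleset-failure-domain=rack (no two chunks in the same rack).
    """
    if len(osds_by_rack) < n_chunks:
        raise ValueError(
            f"need {n_chunks} racks, only {len(osds_by_rack)} available")

    def rank(rack):
        # Deterministic, per-object ordering of racks (stand-in for CRUSH hashing).
        return hashlib.sha256(f"{obj_name}/{rack}".encode()).hexdigest()

    chosen = sorted(osds_by_rack, key=rank)[:n_chunks]
    return [osds_by_rack[rack][0] for rack in chosen]  # shard i -> one OSD


# Hypothetical cluster: six racks with two OSDs each.
cluster = {f"rack{r}": [f"osd.{2 * r}", f"osd.{2 * r + 1}"] for r in range(6)}
placement = place("NYAN", cluster, n_chunks=5)  # K=3 data + M=2 coding shards
for shard, osd in enumerate(placement, start=1):
    print(f"shard {shard} -> {osd}")
```

Losing any two racks then removes at most two shards, which *M* = 2 coding chunks can compensate for.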

More information can be found in the `erasure code profiles
<../erasure-code-profile>`_ documentation.

Erasure coded pool and cache tiering
------------------------------------

Erasure coded pools require more resources than replicated pools and
lack some functionality, such as partial writes. To overcome these
limitations, it is recommended to set a `cache tier <../cache-tiering>`_
in front of the erasure coded pool.

For instance, if the pool *hot-storage* is made of fast storage::

    $ ceph osd tier add ecpool hot-storage
    $ ceph osd tier cache-mode hot-storage writeback
    $ ceph osd tier set-overlay ecpool hot-storage

will place the *hot-storage* pool as a tier of *ecpool* in *writeback*
mode, so that every write and read to *ecpool* actually uses
*hot-storage* and benefits from its flexibility and speed.
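
The effect of *writeback* mode can be pictured with a toy model. The ``WritebackTier`` class is hypothetical and greatly simplified (no eviction, no flush thresholds): reads and partial writes are served from the cache, and only whole objects are written back to the erasure coded pool when they are flushed.

```python
class WritebackTier:
    """Toy model of a writeback cache tier sitting in front of a base pool.

    Hypothetical sketch: Ceph's tiering agent (promotion policy,
    flush and eviction thresholds) is far more involved.
    """

    def __init__(self, base):
        self.base = base     # stands in for ecpool: object name -> bytes
        self.hot = {}        # stands in for hot-storage
        self.dirty = set()   # modified in the cache, not yet flushed

    def read(self, name):
        if name not in self.hot:            # miss: promote from the base pool
            self.hot[name] = self.base[name]
        return self.hot[name]

    def write(self, name, offset, data):
        """Partial write, applied to the cached copy only."""
        exists = name in self.hot or name in self.base
        old = self.read(name) if exists else b""
        buf = bytearray(old.ljust(offset + len(data), b"\0"))
        buf[offset:offset + len(data)] = data
        self.hot[name] = bytes(buf)
        self.dirty.add(name)

    def flush(self):
        """Write each dirty object back to the base pool as a whole object."""
        for name in sorted(self.dirty):
            self.base[name] = self.hot[name]
        self.dirty.clear()


ecpool = {"NYAN": b"ABCDEFGHI"}
tier = WritebackTier(ecpool)
tier.write("NYAN", 3, b"xyz")   # the partial overwrite lands in the cache
print(tier.read("NYAN"))        # b'ABCxyzGHI'
print(ecpool["NYAN"])           # b'ABCDEFGHI': not flushed yet
tier.flush()
print(ecpool["NYAN"])           # b'ABCxyzGHI'
```

The design point the sketch captures is that the base pool only ever receives full-object writes.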

It is not possible to create an RBD image on an erasure coded pool
because RBD requires partial writes. It is, however, possible to create
an RBD image on an erasure coded pool when a replicated pool is set as
its cache tier::

    $ rbd --pool ecpool create --size 10 myvolume
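
One way to see why partial writes are awkward for an erasure coded pool: overwriting even a single byte also changes the coding chunks. The sketch assumes a toy single XOR parity chunk and a plain read-modify-write of the whole stripe; the helpers are hypothetical, and real plugins may update coding chunks differently. Modifying one byte of a *K* = 3 object rewrites one data chunk plus the parity.

```python
def xor_parity(chunks):
    """Single XOR parity chunk (a toy m=1 code)."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return bytes(parity)


def partial_write(data_chunks, offset, new_bytes):
    """Overwrite a byte range of a striped object with read-modify-write.

    Returns the new data chunks, the new parity and the indexes of the
    data chunks whose content changed.
    """
    size = len(data_chunks[0])
    stripe = bytearray(b"".join(data_chunks))           # read the data chunks
    if offset + len(new_bytes) > len(stripe):
        raise ValueError("write extends past the end of the stripe")
    stripe[offset:offset + len(new_bytes)] = new_bytes  # modify in memory
    new_chunks = [bytes(stripe[i:i + size]) for i in range(0, len(stripe), size)]
    changed = [i for i, (old, new) in enumerate(zip(data_chunks, new_chunks))
               if old != new]
    return new_chunks, xor_parity(new_chunks), changed  # re-encode, write back


chunks = [b"ABC", b"DEF", b"GHI"]          # K=3, as in the NYAN example
parity = xor_parity(chunks)
new_chunks, new_parity, changed = partial_write(chunks, 4, b"x")
print(new_chunks)                          # [b'ABC', b'DxF', b'GHI']
print(changed, new_parity != parity)       # [1] True
```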

More information can be found in the `cache tiering
<../cache-tiering>`_ documentation.

Glossary
--------

*chunk*
   when the encoding function is called, it returns chunks of the same
   size: data chunks, which can be concatenated to reconstruct the
   original object, and coding chunks, which can be used to rebuild a
   lost chunk.

*K*
   the number of data *chunks*, i.e. the number of *chunks* into which
   the original object is divided. For instance, if *K* = 2, a 10KB
   object will be divided into *K* chunks of 5KB each.

*M*
   the number of coding *chunks*, i.e. the number of additional *chunks*
   computed by the encoding function. If there are 2 coding *chunks*,
   up to 2 OSDs can be out without losing data.
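
The *K* and *M* definitions reduce to simple arithmetic. A minimal sketch, assuming no alignment padding and a code such as the default Reed-Solomon one, where any *M* chunks may be lost (the function names are hypothetical):

```python
def chunk_size(object_size, k):
    """Size of each data chunk; coding chunks have the same size.

    Assumption: ignores any extra alignment padding a plugin may add.
    """
    return -(-object_size // k)  # ceiling division


def recoverable(lost_chunks, m):
    """With m coding chunks, up to m lost chunks can be tolerated."""
    return lost_chunks <= m


print(chunk_size(10 * 1024, k=2))                 # 5120 bytes, i.e. 5KB
print(recoverable(2, m=2), recoverable(3, m=2))   # True False
```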

Table of contents
-----------------

.. toctree::
   :maxdepth: 1

   erasure-code-profile
   erasure-code-jerasure
   erasure-code-isa
   erasure-code-lrc
@ -32,7 +32,7 @@ CRUSH algorithm.
 
 data-placement
 pools
-erasure-code-profile
+erasure-code
 cache-tiering
 placement-groups
 crush-map
@ -9,7 +9,7 @@ pools for storing data. A pool provides you with:
 For replicated pools, it is the desired number of copies/replicas of an object.
 A typical configuration stores an object and one additional copy
 (i.e., ``size = 2``), but you can determine the number of copies/replicas.
-For erasure coded pools, it is the number of coding chunks
+For `erasure coded pools <../erasure-code>`_, it is the number of coding chunks
 (i.e. ``m=2`` in the **erasure code profile**)
 
 - **Placement Groups**: You can set the number of placement groups for the pool.
@ -31,7 +31,6 @@ pools for storing data. A pool provides you with:
 To organize data into pools, you can list, create, and remove pools.
 You can also view the utilization statistics for each pool.
 
-
 List Pools
 ==========
 
@ -98,8 +97,9 @@ Where:
 
 :Description: The pool type which may either be **replicated** to
 recover from lost OSDs by keeping multiple copies of the
-objects or **erasure** to get a kind of generalized
-RAID5 capability. The **replicated** pools require more
+objects or **erasure** to get a kind of
+`generalized RAID5 <../erasure-code>`_ capability.
+The **replicated** pools require more
 raw storage but implement all Ceph operations. The
 **erasure** pools require less raw storage but only
 implement a subset of the available operations.