mirror of
https://github.com/ceph/ceph
synced 2024-12-28 22:43:29 +00:00
doc:dev update how to use deduplication
Add prerequisite for deduplication, example outputs, and chunk-repair operation. Signed-off-by: Sungmin Lee <sung_min.lee@samsung.com>
This commit is contained in:
parent
d01e128db2
commit
9b74a33fa1
@ -157,25 +157,46 @@ How to use deduplication
|
||||
Ceph provides deduplication using RADOS machinery.
|
||||
Below we explain how to perform deduplication.
|
||||
|
||||
Prerequisite
|
||||
------------
|
||||
|
||||
If the Ceph cluster is started from Ceph mainline, users need to check
|
||||
``ceph-test`` package which is including ceph-dedup-tool is installed.
|
||||
|
||||
Deatiled Instructions
|
||||
---------------------
|
||||
|
||||
Users can use ceph-dedup-tool with ``estimate``, ``sample-dedup``,
|
||||
``chunk-scrub``, and ``chunk-repair`` operations. To provide better
|
||||
convenience for users, we have enabled necessary operations through
|
||||
ceph-dedup-tool, and we recommend using the following operations freely
|
||||
by using any types of scripts.
|
||||
|
||||
|
||||
1. Estimate space saving ratio of a target pool using ``ceph-dedup-tool``.
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. code:: bash
|
||||
|
||||
ceph-dedup-tool --op estimate --pool $POOL --chunk-size chunk_size
|
||||
--chunk-algorithm fixed|fastcdc --fingerprint-algorithm sha1|sha256|sha512
|
||||
--max-thread THREAD_COUNT
|
||||
ceph-dedup-tool --op estimate
|
||||
--pool [BASE_POOL]
|
||||
--chunk-size [CHUNK_SIZE]
|
||||
--chunk-algorithm [fixed|fastcdc]
|
||||
--fingerprint-algorithm [sha1|sha256|sha512]
|
||||
--max-thread [THREAD_COUNT]
|
||||
|
||||
This CLI command will show how much storage space can be saved when deduplication
|
||||
is applied on the pool. If the amount of the saved space is higher than user's expectation,
|
||||
the pool probably is worth performing deduplication.
|
||||
Users should specify $POOL where the object---the users want to perform
|
||||
deduplication---is stored. The users also need to run ceph-dedup-tool multiple time
|
||||
Users should specify the ``BASE_POOL``, within which the object targeted for deduplication
|
||||
is stored. The users also need to run ceph-dedup-tool multiple time
|
||||
with varying ``chunk_size`` to find the optimal chunk size. Note that the
|
||||
optimal value probably differs in the content of each object in case of fastcdc
|
||||
chunk algorithm (not fixed). Example output:
|
||||
chunk algorithm (not fixed).
|
||||
|
||||
::
|
||||
Example output:
|
||||
|
||||
.. code:: bash
|
||||
|
||||
{
|
||||
"chunk_algo": "fastcdc",
|
||||
@ -204,54 +225,202 @@ represents the standard deviation of the chunk size.
|
||||
|
||||
|
||||
2. Create chunk pool.
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. code:: bash
|
||||
|
||||
ceph osd pool create CHUNK_POOL
|
||||
ceph osd pool create [CHUNK_POOL]
|
||||
|
||||
|
||||
3. Run dedup command (there are two ways).
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
- **sample-dedup**
|
||||
|
||||
.. code:: bash
|
||||
|
||||
ceph-dedup-tool --op sample-dedup --pool POOL --chunk-pool CHUNK_POOL --chunk-size
|
||||
CHUNK_SIZE --chunk-algorithm fastcdc --fingerprint-algorithm sha1|sha256|sha512
|
||||
--chunk-dedup-threshold THRESHOLD --max-thread THREAD_COUNT ----sampling-ratio SAMPLE_RATIO
|
||||
--wakeup-period WAKEUP_PERIOD --loop --snap
|
||||
ceph-dedup-tool --op sample-dedup
|
||||
--pool [BASE_POOL]
|
||||
--chunk-pool [CHUNK_POOL]
|
||||
--chunk-size [CHUNK_SIZE]
|
||||
--chunk-algorithm [fastcdc]
|
||||
--fingerprint-algorithm [sha1|sha256|sha512]
|
||||
--chunk-dedup-threshold [THRESHOLD]
|
||||
--max-thread [THREAD_COUNT]
|
||||
--sampling-ratio [SAMPLE_RATIO]
|
||||
--wakeup-period [WAKEUP_PERIOD]
|
||||
--loop
|
||||
--snap
|
||||
|
||||
The ``sample-dedup`` comamnd spawns threads specified by ``THREAD_COUNT`` to deduplicate objects on
|
||||
the ``POOL``. According to sampling-ratio---do a full search if ``SAMPLE_RATIO`` is 100, the threads selectively
|
||||
the ``BASE_POOL``. According to sampling-ratio---do a full search if ``SAMPLE_RATIO`` is 100, the threads selectively
|
||||
perform deduplication if the chunk is redundant over ``THRESHOLD`` times during iteration.
|
||||
If --loop is set, the theads will wakeup after ``WAKEUP_PERIOD``. If not, the threads will exit after one iteration.
|
||||
|
||||
Example output:
|
||||
|
||||
.. code:: bash
|
||||
|
||||
ceph-dedup-tool --op object-dedup --pool POOL --object OID --chunk-pool CHUNK_POOL
|
||||
--fingerprint-algorithm sha1|sha256|sha512 --dedup-cdc-chunk-size CHUNK_SIZE
|
||||
$ bin/ceph df
|
||||
--- RAW STORAGE ---
|
||||
CLASS SIZE AVAIL USED RAW USED %RAW USED
|
||||
ssd 303 GiB 294 GiB 9.0 GiB 9.0 GiB 2.99
|
||||
TOTAL 303 GiB 294 GiB 9.0 GiB 9.0 GiB 2.99
|
||||
|
||||
--- POOLS ---
|
||||
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
|
||||
.mgr 1 1 577 KiB 2 1.7 MiB 0 97 GiB
|
||||
base 2 32 2.0 GiB 517 6.0 GiB 2.02 97 GiB
|
||||
chunk 3 32 0 B 0 0 B 0 97 GiB
|
||||
|
||||
$ bin/ceph-dedup-tool --op sample-dedup --pool base --chunk-pool chunk
|
||||
--fingerprint-algorithm sha1 --chunk-algorithm fastcdc --loop --sampling-ratio 100
|
||||
--chunk-dedup-threshold 2 --chunk-size 8192 --max-thread 4 --wakeup-period 60
|
||||
|
||||
$ bin/ceph df
|
||||
--- RAW STORAGE ---
|
||||
CLASS SIZE AVAIL USED RAW USED %RAW USED
|
||||
ssd 303 GiB 298 GiB 5.4 GiB 5.4 GiB 1.80
|
||||
TOTAL 303 GiB 298 GiB 5.4 GiB 5.4 GiB 1.80
|
||||
|
||||
--- POOLS ---
|
||||
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
|
||||
.mgr 1 1 577 KiB 2 1.7 MiB 0 98 GiB
|
||||
base 2 32 452 MiB 262 1.3 GiB 0.50 98 GiB
|
||||
chunk 3 32 258 MiB 25.91k 938 MiB 0.31 98 GiB
|
||||
|
||||
- **object dedup**
|
||||
|
||||
.. code:: bash
|
||||
|
||||
ceph-dedup-tool --op object-dedup
|
||||
--pool [BASE_POOL]
|
||||
--object [OID]
|
||||
--chunk-pool [CHUNK_POOL]
|
||||
--fingerprint-algorithm [sha1|sha256|sha512]
|
||||
--dedup-cdc-chunk-size [CHUNK_SIZE]
|
||||
|
||||
The ``object-dedup`` command triggers deduplication on the RADOS object specified by ``OID``.
|
||||
All parameters shown above must be specified. ``CHUNK_SIZE`` should be taken from
|
||||
the results of step 1 above.
|
||||
Note that when this command is executed, ``fastcdc`` will be set by default and other parameters
|
||||
such as ``FP`` and ``CHUNK_SIZE`` will be set as defaults for the pool.
|
||||
such as ``fingerprint-algorithm`` and ``CHUNK_SIZE`` will be set as defaults for the pool.
|
||||
Deduplicated objects will appear in the chunk pool. If the object is mutated over time, user needs to re-run
|
||||
``object-dedup`` because chunk-boundary should be recalculated based on updated contents.
|
||||
The user needs to specify ``snap`` if the target object is snapshotted. After deduplication is done, the target
|
||||
object size in ``POOL`` is zero (evicted) and chunks objects are genereated---these appear in ``CHUNK_POOL``.
|
||||
|
||||
object size in ``BASE_POOL`` is zero (evicted) and chunks objects are genereated---these appear in ``CHUNK_POOL``.
|
||||
|
||||
4. Read/write I/Os
|
||||
^^^^^^^^^^^^^^^^^^
|
||||
|
||||
After step 3, the users don't need to consider anything about I/Os. Deduplicated objects are
|
||||
completely compatible with existing RAODS operations.
|
||||
|
||||
|
||||
5. Run scrub to fix reference count
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Reference mismatches can on rare occasions occur to false positives when handling reference counts for
|
||||
deduplicated RADOS objects. These mismatches will be fixed by periodically scrubbing the pool:
|
||||
|
||||
.. code:: bash
|
||||
|
||||
ceph-dedup-tool --op chunk-scrub --op chunk-scrub --chunk-pool CHUNK_POOL --pool POOL --max-thread THREAD_COUNT
|
||||
|
||||
ceph-dedup-tool --op chunk-scrub
|
||||
--chunk-pool [CHUNK_POOL]
|
||||
--pool [POOL]
|
||||
--max-thread [THREAD_COUNT]
|
||||
|
||||
The ``chunk-scrub`` command identifies reference mismatches between a
|
||||
metadata object and a chunk object. The ``chunk-pool`` parameter tells
|
||||
where the target chunk objects are located to the ceph-dedup-tool.
|
||||
|
||||
Example output:
|
||||
|
||||
A reference mismatch is intentionally created by inserting a reference (dummy-obj) into a chunk object (2ac67f70d3dd187f8f332bb1391f61d4e5c9baae) by using chunk-get-ref.
|
||||
|
||||
.. code:: bash
|
||||
|
||||
$ bin/ceph-dedup-tool --op dump-chunk-refs --chunk-pool chunk --object 2ac67f70d3dd187f8f332bb1391f61d4e5c9baae
|
||||
{
|
||||
"type": "by_object",
|
||||
"count": 2,
|
||||
"refs": [
|
||||
{
|
||||
"oid": "testfile2",
|
||||
"key": "",
|
||||
"snapid": -2,
|
||||
"hash": 2905889452,
|
||||
"max": 0,
|
||||
"pool": 2,
|
||||
"namespace": ""
|
||||
},
|
||||
{
|
||||
"oid": "dummy-obj",
|
||||
"key": "",
|
||||
"snapid": -2,
|
||||
"hash": 1203585162,
|
||||
"max": 0,
|
||||
"pool": 2,
|
||||
"namespace": ""
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
$ bin/ceph-dedup-tool --op chunk-scrub --chunk-pool chunk --max-thread 10
|
||||
10 seconds is set as report period by default
|
||||
join
|
||||
join
|
||||
2ac67f70d3dd187f8f332bb1391f61d4e5c9baae
|
||||
--done--
|
||||
2ac67f70d3dd187f8f332bb1391f61d4e5c9baae ref 10:5102bde2:::dummy-obj:head: referencing pool does not exist
|
||||
--done--
|
||||
Total object : 1
|
||||
Examined object : 1
|
||||
Damaged object : 1
|
||||
|
||||
6. Repair a mismatched chunk reference
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
If any reference mismatches occur after the ``chunk-scrub``, it is
|
||||
recommended to perform the ``chunk-repair`` operation to fix reference
|
||||
mismatches. The ``chunk-repair`` operation helps in resolving the
|
||||
reference mismatch and restoring consistency.
|
||||
|
||||
.. code:: bash
|
||||
|
||||
ceph-dedup-tool --op chunk-repair
|
||||
--chunk-pool [CHUNK_POOL_NAME]
|
||||
--object [CHUNK_OID]
|
||||
--target-ref [TARGET_OID]
|
||||
--target-ref-pool-id [TARGET_POOL_ID]
|
||||
|
||||
``chunk-repair`` fixes the ``target-ref``, which is a wrong reference of
|
||||
an ``object``. To fix it correctly, the users must enter the correct
|
||||
``TARGET_OID`` and ``TARGET_POOL_ID``.
|
||||
|
||||
.. code:: bash
|
||||
|
||||
$ bin/ceph-dedup-tool --op chunk-repair --chunk-pool chunk --object 2ac67f70d3dd187f8f332bb1391f61d4e5c9baae --target-ref dummy-obj --target-ref-pool-id 10
|
||||
2ac67f70d3dd187f8f332bb1391f61d4e5c9baae has 1 references for dummy-obj
|
||||
dummy-obj has 0 references for 2ac67f70d3dd187f8f332bb1391f61d4e5c9baae
|
||||
fix dangling reference from 1 to 0
|
||||
|
||||
$ bin/ceph-dedup-tool --op dump-chunk-refs --chunk-pool chunk --object 2ac67f70d3dd187f8f332bb1391f61d4e5c9baae
|
||||
{
|
||||
"type": "by_object",
|
||||
"count": 1,
|
||||
"refs": [
|
||||
{
|
||||
"oid": "testfile2",
|
||||
"key": "",
|
||||
"snapid": -2,
|
||||
"hash": 2905889452,
|
||||
"max": 0,
|
||||
"pool": 2,
|
||||
"namespace": ""
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
Loading…
Reference in New Issue
Block a user