btrfs-progs: docs: update some chapters
Signed-off-by: David Sterba <dsterba@suse.com>
parent b28f7bd9bb
commit df91bfd5d5
@ -10,7 +10,7 @@ There are two main deduplication types:

* **in-band** *(sometimes also called on-line)* -- all newly written data are
  considered for deduplication before writing
* **out-of-band** *(sometimes also called offline)* -- data for deduplication
  have to be actively looked for and deduplicated by the user application

Both have their pros and cons. BTRFS implements **only out-of-band** type.

@ -37,8 +37,41 @@ be up-to-date, maintained and widely used.

- No
- Yes

File based deduplication
------------------------

The tool takes a list of files and tries to find duplicates among data only
from those files. This is suitable eg. for files that originated from the same
base image, source of a reflinked file. Optionally the tool could track a
database of hashes and allow deduplicating blocks from more files, or use that
for repeated runs and update the database incrementally.

Block based deduplication
-------------------------

The tool typically scans the filesystem and builds a database of file block
hashes, then finds candidate files and deduplicates the ranges. The hash
database is kept as an ordinary file and can be scaled according to the needs.

As the files change, the hash database may get out of sync and the scan has to
be done repeatedly.

Safety of block comparison
--------------------------

The deduplication inside the filesystem is implemented as an ``ioctl`` that
takes a source file, destination file and the range. The blocks from both
files are compared for an exact match before merging to the same range (ie.
there's no hash-based comparison). Pages representing the extents in memory
are locked prior to deduplication and prevent concurrent modification by
buffered or mmaped writes.

Limitations, compatibility
--------------------------

Files that are subject to deduplication must have the same status regarding
COW, ie. both regular COW files with checksums, or both NOCOW, or files that
are COW but don't have checksums (the NODATASUM attribute is set).

If deduplication is in progress on any file in the filesystem, the *send*
operation cannot be started as it relies on the extent layout being unchanged.

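As an illustration of the out-of-band workflow described above, a run of one
such userspace tool might look like this (``duperemove`` is shown only as an
example; the exact flags and the hashfile location should be checked against
the tool's manual):

```shell
# Scan /mnt/data recursively, keep block hashes in a database file so
# repeated runs can be incremental, and submit the deduplication
# request (-d) for the matching ranges.
$ duperemove -dr --hashfile=/var/tmp/dedupe.hash /mnt/data
```

The kernel still performs the byte-by-byte comparison described under *Safety
of block comparison*, so a stale hash database cannot cause wrong data to be
merged.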
@ -20,3 +20,11 @@ and takes care of synchronization. Once a filesystem sync or flush is started

devices. This however reduces the chances to find an optimal layout as the
writes happen together with other data and the result depends on the remaining
free space layout and fragmentation.

.. warning::
   Defragmentation does not preserve extent sharing, eg. files created by **cp
   --reflink** or existing on multiple snapshots. Due to that the data space
   consumption may increase.

Defragmentation can be started together with compression on the given range,
and takes precedence over the per-file compression property or mount options.

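The defragmentation-with-compression combination mentioned above can be
requested in one command; a sketch (the mount point ``/mnt/data`` is a
placeholder):

```shell
# Recursively defragment and rewrite the extents compressed with zstd;
# -c overrides the per-file compression property and mount options.
$ btrfs filesystem defragment -r -czstd /mnt/data
```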
@ -10,3 +10,18 @@ If the filesystem has been created with different data and metadata profiles,

namely with different level of integrity, this also affects the inlined files.
It can be completely disabled by mounting with ``max_inline=0``. The upper
limit is either the size of b-tree node or the page size of the host.

An inline file can be identified by enumerating the extents, eg. by the tool
``filefrag``:

.. code-block:: bash

   $ filefrag -v inlinefile
   Filesystem type is: 9123683e
   File size of inlinefile is 463 (1 block of 4096 bytes)
    ext:  logical_offset:  physical_offset:  length:  expected:  flags:
      0:        0..  4095:        0..  4095:    4096:            last,not_aligned,inline,eof

In the above example, the file is not compressed, otherwise it would have the
*encoded* flag. The inline files have no limitations and behave like regular
files with respect to copying, renaming, reflink, truncate etc.

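Inlining can be tuned at mount time as described earlier; a sketch
(``/dev/sdx`` is a placeholder device):

```shell
# Disable inline files entirely; newly written small files will get a
# regular (possibly compressed) extent instead.
$ mount -o max_inline=0 /dev/sdx /mnt
```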
@ -1,9 +1,9 @@

Data and metadata are checksummed by default, the checksum is calculated before
write and verified after reading the blocks from devices. There are several
checksum algorithms supported. The default and backward compatible one is
*crc32c*. Since kernel 5.5 there are three more with different characteristics
and trade-offs regarding speed and strength. The following list may help you to
decide which one to select.

CRC32C (32bit digest)
   default, best backward compatibility, very fast, modern CPUs have

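The checksum algorithm is selected at mkfs time; a sketch (the ``xxhash``
choice and the device are placeholders, changing the checksum of an existing
filesystem is a separate and more involved operation):

```shell
# Create a filesystem with the xxhash checksum instead of the default
# crc32c; the resulting filesystem needs kernel 5.5+ to mount.
$ mkfs.btrfs --csum xxhash /dev/sdx
```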
@ -23,9 +23,9 @@ Requirements, limitations

  but this is namely for testing
* the super block is handled in a special way and is at different locations
  than on a non-zoned filesystem:

  * primary: 0B (and the next two zones)
  * secondary: 512GiB (and the next two zones)
  * tertiary: 4TiB (4096GiB, and the next two zones)

Incompatible features
^^^^^^^^^^^^^^^^^^^^^

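The fixed super block copy locations listed above translate to the following
byte offsets (note that 4TiB is 4096GiB), which can be checked with shell
arithmetic:

```shell
# Byte offsets of the secondary and tertiary super block copies.
$ echo $(( 512 * 1024 * 1024 * 1024 ))    # secondary at 512GiB
549755813888
$ echo $(( 4096 * 1024 * 1024 * 1024 ))   # tertiary at 4TiB
4398046511104
```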