From b871bf49f370dda15126a4839a1498d38649f172 Mon Sep 17 00:00:00 2001
From: David Sterba
Date: Thu, 9 Dec 2021 20:46:42 +0100
Subject: [PATCH] btrfs-progs: docs: add more chapters

The feature pages share the contents with the manual page section 5 so
put the contents to separate files.

Signed-off-by: David Sterba
---
 Documentation/Checksumming.rst     |   2 +-
 Documentation/Common-features.rst  |   2 +
 Documentation/Compression.rst      |   2 +-
 Documentation/Inline-files.rst     |  10 +-
 Documentation/Interoperability.rst |   3 +
 Documentation/Seeding-device.rst   |   2 +-
 Documentation/btrfs-man5.rst       | 238 +----------------------------
 7 files changed, 20 insertions(+), 239 deletions(-)

diff --git a/Documentation/Checksumming.rst b/Documentation/Checksumming.rst
index 6ae8c38f..b96f53ed 100644
--- a/Documentation/Checksumming.rst
+++ b/Documentation/Checksumming.rst
@@ -1,4 +1,4 @@
 Checksumming
 ============
 
-...
+.. include:: ch-checksumming.rst
diff --git a/Documentation/Common-features.rst b/Documentation/Common-features.rst
index 6fee7817..81ba3c0e 100644
--- a/Documentation/Common-features.rst
+++ b/Documentation/Common-features.rst
@@ -16,3 +16,5 @@ Anything that's standard and also supported
 - FIEMAP
 
 - O_TMPFILE
+
+- XFLAGS, fileattr
diff --git a/Documentation/Compression.rst b/Documentation/Compression.rst
index 1a1c4578..9eed04af 100644
--- a/Documentation/Compression.rst
+++ b/Documentation/Compression.rst
@@ -1,4 +1,4 @@
 Compression
 ===========
 
-...
+.. include:: ch-compression.rst
diff --git a/Documentation/Inline-files.rst b/Documentation/Inline-files.rst
index 91ee1801..c3857c57 100644
--- a/Documentation/Inline-files.rst
+++ b/Documentation/Inline-files.rst
@@ -1,4 +1,12 @@
 Inline files
 ============
 
-...
+Files up to some size can be stored in the metadata section ("inline" in the
+b-tree nodes), ie. no separate blocks for the extents. The default limit is
+2048 bytes and can be configured by mount option ``max_inline``. The data of
+inlined files can be also compressed as long as they fit into the b-tree nodes.
+
+If the filesystem has been created with different data and metadata profiles,
+namely with different level of integrity, this also affects the inlined files.
+It can be completely disabled by mounting with ``max_inline=0``. The upper
+limit is either the size of b-tree node or the page size of the host.
diff --git a/Documentation/Interoperability.rst b/Documentation/Interoperability.rst
index b0b44dc8..8439a6e8 100644
--- a/Documentation/Interoperability.rst
+++ b/Documentation/Interoperability.rst
@@ -26,3 +26,6 @@ overlayfs
 
 SELinux
 -------
+
+io_uring
+--------
diff --git a/Documentation/Seeding-device.rst b/Documentation/Seeding-device.rst
index 5ebffb8f..b40b0507 100644
--- a/Documentation/Seeding-device.rst
+++ b/Documentation/Seeding-device.rst
@@ -1,4 +1,4 @@
 Seeding device
 ==============
 
-...
+.. include:: ch-seeding-device.rst
diff --git a/Documentation/btrfs-man5.rst b/Documentation/btrfs-man5.rst
index 0fafc84c..ee3364d8 100644
--- a/Documentation/btrfs-man5.rst
+++ b/Documentation/btrfs-man5.rst
@@ -737,169 +737,13 @@ priority, not the btrfs mount options).
 CHECKSUM ALGORITHMS
 -------------------
 
-There are several checksum algorithms supported. The default and backward
-compatible is *crc32c*. Since kernel 5.5 there are three more with different
-characteristics and trade-offs regarding speed and strength. The following
-list may help you to decide which one to select.
-
-CRC32C (32bit digest)
-        default, best backward compatibility, very fast, modern CPUs have
-        instruction-level support, not collision-resistant but still good error
-        detection capabilities
-
-XXHASH* (64bit digest)
-        can be used as CRC32C successor, very fast, optimized for modern CPUs utilizing
-        instruction pipelining, good collision resistance and error detection
-
-SHA256 (256bit digest)::
-        a cryptographic-strength hash, relatively slow but with possible CPU
-        instruction acceleration or specialized hardware cards, FIPS certified and
-        in wide use
-
-BLAKE2b (256bit digest)
-        a cryptographic-strength hash, relatively fast with possible CPU acceleration
-        using SIMD extensions, not standardized but based on BLAKE which was a SHA3
-        finalist, in wide use, the algorithm used is BLAKE2b-256 that's optimized for
-        64bit platforms
-
-The *digest size* affects overall size of data block checksums stored in the
-filesystem. The metadata blocks have a fixed area up to 256 bits (32 bytes), so
-there's no increase. Each data block has a separate checksum stored, with
-additional overhead of the b-tree leaves.
-
-Approximate relative performance of the algorithms, measured against CRC32C
-using reference software implementations on a 3.5GHz intel CPU:
-
-
-======== ============ ======= ================
-Digest   Cycles/4KiB  Ratio   Implementation
-======== ============ ======= ================
-CRC32C   1700         1.00    CPU instruction
-XXHASH   2500         1.44    reference impl.
-SHA256   105000       61      reference impl.
-SHA256   36000        21      libgcrypt/AVX2
-SHA256   63000        37      libsodium/AVX2
-BLAKE2b  22000        13      reference impl.
-BLAKE2b  19000        11      libgcrypt/AVX2
-BLAKE2b  19000        11      libsodium/AVX2
-======== ============ ======= ================
-
-Many kernels are configured with SHA256 as built-in and not as a module.
-The accelerated versions are however provided by the modules and must be loaded
-explicitly (**modprobe sha256**) before mounting the filesystem to make use of
-them. You can check in */sys/fs/btrfs/FSID/checksum* which one is used. If you
-see *sha256-generic*, then you may want to unmount and mount the filesystem
-again, changing that on a mounted filesystem is not possible.
-Check the file */proc/crypto*, when the implementation is built-in, you'd find
-
-.. code-block:: none
-
-        name         : sha256
-        driver       : sha256-generic
-        module       : kernel
-        priority     : 100
-        ...
-
-while accelerated implementation is e.g.
-
-.. code-block:: none
-
-        name         : sha256
-        driver       : sha256-avx2
-        module       : sha256_ssse3
-        priority     : 170
-        ...
+.. include:: ch-checksumming.rst
 
 
 COMPRESSION
 -----------
 
-Btrfs supports transparent file compression. There are three algorithms
-available: ZLIB, LZO and ZSTD (since v4.14). Basically, compression is on a file
-by file basis. You can have a single btrfs mount point that has some files that
-are uncompressed, some that are compressed with LZO, some with ZLIB, for
-instance (though you may not want it that way, it is supported).
-
-To enable compression, mount the filesystem with options *compress* or
-*compress-force*. Please refer to section *MOUNT OPTIONS*. Once compression is
-enabled, all new writes will be subject to compression. Some files may not
-compress very well, and these are typically not recompressed but still written
-uncompressed.
-
-Each compression algorithm has different speed/ratio trade offs. The levels
-can be selected by a mount option and affect only the resulting size (ie.
-no compatibility issues).
-
-Basic characteristics:
-
-ZLIB
-        * slower, higher compression ratio
-        * levels: 1 to 9, mapped directly, default level is 3
-        * good backward compatibility
-LZO
-        * faster compression and decompression than zlib, worse compression ratio, designed to be fast
-        * no levels
-        * good backward compatibility
-ZSTD
-        * compression comparable to zlib with higher compression/decompression speeds and different ratio
-        * levels: 1 to 15
-        * since 4.14, levels since 5.1
-
-The differences depend on the actual data set and cannot be expressed by a
-single number or recommendation. Higher levels consume more CPU time and may
-not bring a significant improvement, lower levels are close to real time.
-
-The algorithms could be mixed in one file as they're stored per extent. The
-compression can be changed on a file by **btrfs filesystem defrag** command,
-using the *-c* option, or by **btrfs property set** using the *compression*
-property. Setting compression by **chattr +c** utility will set it to zlib.
-
-INCOMPRESSIBLE DATA
-^^^^^^^^^^^^^^^^^^^
-
-Files with already compressed data or with data that won't compress well with
-the CPU and memory constraints of the kernel implementations are using a simple
-decision logic. If the first portion of data being compressed is not smaller
-than the original, the compression of the file is disabled -- unless the
-filesystem is mounted with *compress-force*. In that case compression will
-always be attempted on the file only to be later discarded. This is not optimal
-and subject to optimizations and further development.
-
-If a file is identified as incompressible, a flag is set (NOCOMPRESS) and it's
-sticky. On that file compression won't be performed unless forced. The flag
-can be also set by **chattr +m** (since e2fsprogs 1.46.2) or by properties with
-value *no* or *none*. Empty value will reset it to the default that's currently
-applicable on the mounted filesystem.
-
-There are two ways to detect incompressible data:
-
-* actual compression attempt - data are compressed, if the result is not smaller,
-  it's discarded, so this depends on the algorithm and level
-* pre-compression heuristics - a quick statistical evaluation on the data is
-  peformed and based on the result either compression is performed or skipped,
-  the NOCOMPRESS bit is not set just by the heuristic, only if the compression
-  algorithm does not make an improvent
-
-PRE-COMPRESSION HEURISTICS
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-The heuristics aim to do a few quick statistical tests on the compressed data
-in order to avoid probably costly compression that would turn out to be
-inefficient. Compression algorithms could have internal detection of
-incompressible data too but this leads to more overhead as the compression is
-done in another thread and has to write the data anyway. The heuristic is
-read-only and can utilize cached memory.
-
-The tests performed based on the following: data sampling, long repated
-pattern detection, byte frequency, Shannon entropy.
-
-COMPATIBILITY WITH OTHER FEATURES
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Compression is done using the COW mechanism so it's incompatible with
-*nodatacow*. Direct IO works on compressed files but will fall back to buffered
-writes. Currently 'nodatasum' and compression don't work together.
-
+.. include:: ch-compression.rst
 
 FILESYSTEM EXCLUSIVE OPERATIONS
 -------------------------------
@@ -1249,83 +1093,7 @@ that report space usage: **filesystem df**, **device usage**. The command
 SEEDING DEVICE
 --------------
 
-The COW mechanism and multiple devices under one hood enable an interesting
-concept, called a seeding device: extending a read-only filesystem on a single
-device filesystem with another device that captures all writes. For example
-imagine an immutable golden image of an operating system enhanced with another
-device that allows to use the data from the golden image and normal operation.
-This idea originated on CD-ROMs with base OS and allowing to use them for live
-systems, but this became obsolete. There are technologies providing similar
-functionality, like *unionmount*, *overlayfs* or *qcow2* image snapshot.
-
-The seeding device starts as a normal filesystem, once the contents is ready,
-**btrfstune -S 1** is used to flag it as a seeding device. Mounting such device
-will not allow any writes, except adding a new device by **btrfs device add**.
-Then the filesystem can be remounted as read-write.
-
-Given that the filesystem on the seeding device is always recognized as
-read-only, it can be used to seed multiple filesystems, at the same time. The
-UUID that is normally attached to a device is automatically changed to a random
-UUID on each mount.
-
-Once the seeding device is mounted, it needs the writable device. After adding
-it, something like **remount -o remount,rw /path** makes the filesystem at
-*/path* ready for use. The simplest usecase is to throw away all changes by
-unmounting the filesystem when convenient.
-
-Alternatively, deleting the seeding device from the filesystem can turn it into
-a normal filesystem, provided that the writable device can also contain all the
-data from the seeding device.
-
-The seeding device flag can be cleared again by **btrfstune -f -s 0**, eg.
-allowing to update with newer data but please note that this will invalidate
-all existing filesystems that use this particular seeding device. This works
-for some usecases, not for others, and a forcing flag to the command is
-mandatory to avoid accidental mistakes.
-
-Example how to create and use one seeding device:
-
-.. code-block:: bash
-
-        # mkfs.btrfs /dev/sda
-        # mount /dev/sda /mnt/mnt1
-        # ... fill mnt1 with data
-        # umount /mnt/mnt1
-        # btrfstune -S 1 /dev/sda
-        # mount /dev/sda /mnt/mnt1
-        # btrfs device add /dev/sdb /mnt
-        # mount -o remount,rw /mnt/mnt1
-        # ... /mnt/mnt1 is now writable
-
-Now */mnt/mnt1* can be used normally. The device */dev/sda* can be mounted
-again with a another writable device:
-
-.. code-block:: bash
-
-        # mount /dev/sda /mnt/mnt2
-        # btrfs device add /dev/sdc /mnt/mnt2
-        # mount -o remount,rw /mnt/mnt2
-        ... /mnt/mnt2 is now writable
-
-The writable device (*/dev/sdb*) can be decoupled from the seeding device and
-used independently:
-
-.. code-block:: bash
-
-        # btrfs device delete /dev/sda /mnt/mnt1
-
-As the contents originated in the seeding device, it's possible to turn
-*/dev/sdb* to a seeding device again and repeat the whole process.
-
-A few things to note:
-
-* it's recommended to use only single device for the seeding device, it works
-  for multiple devices but the *single* profile must be used in order to make
-  the seeding device deletion work
-* block group profiles *single* and *dup* support the usecases above
-* the label is copied from the seeding device and can be changed by **btrfs filesystem label**
-* each new mount of the seeding device gets a new random UUID
-
+.. include:: ch-seeding-device.rst
 
 RAID56 STATUS AND RECOMMENDED PRACTICES
 ---------------------------------------