From b40943dea4b9df8e2f71e3d48aa7997ce4d64845 Mon Sep 17 00:00:00 2001
From: David Sterba
Date: Wed, 6 Sep 2023 17:03:49 +0200
Subject: [PATCH] btrfs-progs: docs: updates

- group features on status page
- update developer docs
- add cross references

Signed-off-by: David Sterba
---
 Documentation/Status.rst                | 208 ++++++++++++++----------
 Documentation/btrfs-ioctl.rst           |  57 ++++++-
 Documentation/btrfstune.rst             |   2 +
 Documentation/ch-file-attributes.rst    |   2 +-
 Documentation/ch-mount-options.rst      |  16 +-
 Documentation/dev/On-disk-format.rst    |   5 +
 Documentation/dev/dev-internal-apis.rst |  21 ++-
 Documentation/dev/dev-send-stream.rst   |  38 ++++-
 Documentation/mkfs.btrfs.rst            |  23 ++-
 9 files changed, 262 insertions(+), 110 deletions(-)

diff --git a/Documentation/Status.rst b/Documentation/Status.rst
index 91d9345e..b082f4ed 100644
--- a/Documentation/Status.rst
+++ b/Documentation/Status.rst
@@ -13,7 +13,7 @@ in meeting your performance expectations for your specific workload.
 Combination of features can vary in performance, the table does not cover all
 possibilities.
 
-**The table is based on the latest released linux kernel: 6.4**
+**The table is based on the latest released linux kernel: 6.5**
 
 The columns for each feature reflect the status of the implementation
 in following ways:
@@ -43,26 +43,34 @@ in following ways:
      - Stability
      - Performance
      - Notes
-   * - :doc:`discard (synchronous)`
+   * - :doc:`Subvolumes, snapshots`
      - :statusok:`OK`
+     - OK
      -
-     - mounted with `-o discard` (has performance implications), also see `fstrim`
-   * - :doc:`discard (asynchronous)`
-     - :statusok:`OK`
-     -
-     - mounted with `-o discard=async` (improved performance)
-   * - Autodefrag
-     - :statusok:`OK`
-     -
-     -
-   * - :doc:`Defrag`
-     - :statusmok:`mostly OK`
-     -
-     - extents get unshared (see below)
    * - :doc:`Compression`
      - :statusok:`OK`
      -
      -
+   * - :doc:`Checksumming algorithms`
+     - :statusok:`OK`
+     - OK
+     -
+   * - :doc:`Defragmentation`
+     - :statusmok:`mostly OK`
+     -
+     - extents get unshared (see below)
+   * - :ref:`Autodefrag`
+     - :statusok:`OK`
+     -
+     -
+   * - :doc:`Discard (synchronous)`
+     - :statusok:`OK`
+     -
+     - mounted with `-o discard` (has performance implications), also see `fstrim`
+   * - :doc:`Discard (asynchronous)`
+     - :statusok:`OK`
+     -
+     - mounted with `-o discard=async` (improved performance)
    * - :doc:`Out-of-band dedupe`
      - :statusok:`OK`
      - :statusmok:`mostly OK`
@@ -71,10 +79,14 @@ in following ways:
      - :statusok:`OK`
      - :statusmok:`mostly OK`
      - (reflink), heavily referenced extents have a noticeable performance hit (see below)
-   * - :doc:`More checksumming algorithms`
+   * - :doc:`Filesystem resize`
      - :statusok:`OK`
      - OK
-     -
+     - shrink, grow
+   * - :doc:`Device replace`
+     - :statusmok:`mostly OK`
+     - mostly OK
+     - (see below)
    * - :doc:`Auto-repair`
      - :statusok:`OK`
      - OK
@@ -87,18 +99,66 @@ in following ways:
      - :statusmok:`mostly OK`
      - mostly OK
      -
+   * - :ref:`Degraded mount`
+     - :statusok:`OK`
+     - n/a
+     -
+   * - :doc:`Balance`
+     - :statusok:`OK`
+     - OK
+     - balance + qgroups can be slow when there are many snapshots
+   * - :doc:`Send`
+     - :statusok:`OK`
+     - OK
+     -
+   * - :doc:`Receive`
+     - :statusok:`OK`
+     - OK
+     -
+   * - Offline UUID change
+     - :statusok:`OK`
+     - OK
+     -
+   * - Metadata UUID change
+     - :statusok:`OK`
+     - OK
+     -
+   * - :doc:`Seeding`
+     - :statusok:`OK`
+     - OK
+     -
+   * - :doc:`Quotas, qgroups`
+     - :statusmok:`mostly OK`
+     - mostly OK
+     - qgroups with many snapshots slows down balance
+   * - :doc:`Swapfile`
+     - :statusok:`OK`
+     - n/a
+     - with some limitations
    * - nodatacow
      - :statusok:`OK`
      - OK
      -
-   * - :doc:`Device replace`
+   * - :doc:`Subpage block size`
      - :statusmok:`mostly OK`
      - mostly OK
-     - (see below)
-   * - Degraded mount
-     - :statusok:`OK`
-     - n/a
-     -
+     - Also see table below for more detailed compatibility.
+   * - :doc:`Zoned mode`
+     - :statusmok:`mostly OK`
+     - mostly OK
+     - Not yet feature complete but moderately stable, also see table below
+       for more detailed compatibility.
+
+Block group profiles
+^^^^^^^^^^^^^^^^^^^^
+
+.. list-table::
+   :header-rows: 1
+
+   * - Feature
+     - Stability
+     - Performance
+     - Notes
    * - :ref:`Single (block group profile)`
      - :statusok:`OK`
      - OK
@@ -131,50 +191,59 @@ in following ways:
      - :statusunstable:`unstable`
      - n/a
      - (see below)
-   * - Mixed block groups
+   * - :ref:`Mixed block groups`
      - :statusok:`OK`
      - OK
      -
-   * - :doc:`Filesystem resize`
-     - :statusok:`OK`
-     - OK
-     - shrink, grow
-   * - :doc:`Balance`
-     - :statusok:`OK`
-     - OK
-     - balance + qgroups can be slow when there are many snapshots
-   * - Offline UUID change
+
+
+On-disk format
+^^^^^^^^^^^^^^
+
+Features that are typically set at *mkfs* time (sometimes can be changed or
+converted later).
+
+.. list-table::
+   :header-rows: 1
+
+   * - Feature
+     - Stability
+     - Performance
+     - Notes
+   * - :ref:`extended-refs`
      - :statusok:`OK`
      - OK
      -
-   * - Metadata UUID change
+   * - :ref:`skinny-metadata`
      - :statusok:`OK`
      - OK
      -
-   * - :doc:`Subvolumes, snapshots`
+   * - :ref:`no-holes`
      - :statusok:`OK`
      - OK
      -
-   * - :doc:`Send`
+   * - :ref:`Free space tree`
      - :statusok:`OK`
      - OK
      -
-   * - :doc:`Receive`
+   * - :ref:`Block group tree`
      - :statusok:`OK`
      - OK
      -
-   * - :doc:`Seeding`
-     - :statusok:`OK`
-     - OK
-     -
-   * - :doc:`Quotas, qgroups`
-     - :statusmok:`mostly OK`
-     - mostly OK
-     - qgroups with many snapshots slows down balance
-   * - :doc:`Swapfile`
-     - :statusok:`OK`
-     - n/a
-     - with some limitations
+
+Interoperability
+^^^^^^^^^^^^^^^^
+
+Integration with other Linux features or external systems.
+:doc:`See also`.
+
+.. list-table::
+   :header-rows: 1
+
+   * - Feature
+     - Stability
+     - Performance
+     - Notes
    * - :ref:`NFS`
      - :statusok:`OK`
      - OK
@@ -183,10 +252,6 @@ in following ways:
      - :statusok:`OK`
      - OK
      - IO controller
-   * - :ref:`Samba`
-     - :statusok:`OK`
-     - OK
-     - compression, server-side copies, snapshots
    * - :ref:`io_uring`
      - :statusok:`OK`
      - OK
@@ -199,35 +264,10 @@ in following ways:
      - :statusok:`OK`
      - OK
      -
-   * - :ref:`Free space tree`
-     - :statusok:`OK`
-     -
-     -
-   * - Block group tree
-     - :statusok:`OK`
-     -
-     -
-   * - :ref:`no-holes`
+   * - :ref:`Samba`
      - :statusok:`OK`
      - OK
-     -
-   * - :ref:`skinny-metadata`
-     - :statusok:`OK`
-     - OK
-     -
-   * - :ref:`extended-refs`
-     - :statusok:`OK`
-     - OK
-     -
-   * - :doc:`Subpage block size`
-     - :statusmok:`mostly OK`
-     - mostly OK
-     - Also see table below for more detailed compatibility.
-   * - :doc:`Zoned mode`
-     - :statusmok:`mostly OK`
-     - mostly OK
-     - Not yet feature complete but moderately stable, also see table below
-       for more detailed compatibility.
+     - compression, server-side copies, snapshots
 
 Please open an issue if:
 
@@ -256,7 +296,7 @@ with subpage or require another feature to work:
      - The max_inline mount option value is ignored, as if mounted with max_inline=0
    * - Free space cache v1
      - :statusunsupp:`unsupported`
-     - Free space tree is mandatory, v1 has some assumptions about page size
+     - Free space tree is mandatory, v1 makes some assumptions about page size
    * - Compression
      - :statusok:`partial support`
      - Only page-aligned ranges can be compressed
@@ -303,12 +343,6 @@ are unaffected by the zoned device constraints.
    * - Free space tree
      - :statusok:`supported`
      -
-   * - single profile
-     - :statusok:`supported`
-     - Both data and metadata
-   * - DUP profile
-     - :statusok:`partial support`
-     - Only for metadata
    * - Filesystem resize
      - :statusok:`supported`
      -
diff --git a/Documentation/btrfs-ioctl.rst b/Documentation/btrfs-ioctl.rst
index df86d8bc..123dacba 100644
--- a/Documentation/btrfs-ioctl.rst
+++ b/Documentation/btrfs-ioctl.rst
@@ -150,6 +150,33 @@ DATA STRUCTURES AND DEFINITIONS
                 __u64 rsv_excl;
         };
 
+.. _struct_btrfs_ioctl_fs_info_args:
+
+.. code-block:: c
+
+        /* Request information about checksum type and size */
+        #define BTRFS_FS_INFO_FLAG_CSUM_INFO            (1 << 0)
+        /* Request information about filesystem generation */
+        #define BTRFS_FS_INFO_FLAG_GENERATION           (1 << 1)
+        /* Request information about filesystem metadata UUID */
+        #define BTRFS_FS_INFO_FLAG_METADATA_UUID        (1 << 2)
+
+        struct btrfs_ioctl_fs_info_args {
+                __u64 max_id;                           /* out */
+                __u64 num_devices;                      /* out */
+                __u8 fsid[BTRFS_FSID_SIZE];             /* out */
+                __u32 nodesize;                         /* out */
+                __u32 sectorsize;                       /* out */
+                __u32 clone_alignment;                  /* out */
+                /* See BTRFS_FS_INFO_FLAG_* */
+                __u16 csum_type;                        /* out */
+                __u16 csum_size;                        /* out */
+                __u64 flags;                            /* in/out */
+                __u64 generation;                       /* out */
+                __u8 metadata_uuid[BTRFS_FSID_SIZE];    /* out */
+                __u8 reserved[944];                     /* pad to 1k */
+        };
+
 .. list-table::
    :header-rows: 1
 
@@ -157,10 +184,14 @@ DATA STRUCTURES AND DEFINITIONS
      - Value
    * - BTRFS_UUID_SIZE
      - 16
+   * - BTRFS_FSID_SIZE
+     - 16
    * - BTRFS_SUBVOL_NAME_MAX
      - 4039
    * - BTRFS_PATH_NAME_MAX
      - 4087
+   * - BTRFS_VOL_NAME_MAX
+     - 255
 
 OVERVIEW
 --------
@@ -296,9 +327,9 @@ LIST OF IOCTLS
    * - BTRFS_IOC_DEV_INFO
      -
      -
-   * - BTRFS_IOC_FS_INFO
-     -
-     -
+   * - :ref:`BTRFS_IOC_FS_INFO`
+     - get information about filesystem (device count, fsid, ...)
+     - :ref:`struct btrfs_ioctl_fs_info_args`
    * - BTRFS_IOC_BALANCE_V2
      -
      -
@@ -555,6 +586,26 @@ Change the flags of a subvolume.
    * - ioctl args
      - uint64_t, either 0 or `BTRFS_SUBVOL_RDONLY`
 
+.. _BTRFS_IOC_FS_INFO:
+
+BTRFS_IOC_FS_INFO
+~~~~~~~~~~~~~~~~~
+
+Read internal information about the filesystem. The data can be exchanged
+both ways and part of the structure could be optionally filled. The reserved
+bytes can be used to get new kinds of information in the future, always
+depending on the flags set.
+
+.. list-table::
+   :header-rows: 1
+
+   * - Field
+     - Description
+   * - ioctl fd
+     - file descriptor of any file/directory in the filesystem
+   * - ioctl args
+     - :ref:`struct btrfs_ioctl_fs_info_args`
+
 .. _BTRFS_IOC_GET_SUBVOL_INFO:
 
 BTRFS_IOC_GET_SUBVOL_INFO
diff --git a/Documentation/btrfstune.rst b/Documentation/btrfstune.rst
index 306fac67..78125ddd 100644
--- a/Documentation/btrfstune.rst
+++ b/Documentation/btrfstune.rst
@@ -52,6 +52,8 @@ OPTIONS
         change fsid stored as *metadata_uuid* to a randomly generated UUID,
         see also *-U*
 
+.. _btrfstune-feature-metadata-uuid:
+
 -M
         (since kernel: 5.0)
 
diff --git a/Documentation/ch-file-attributes.rst b/Documentation/ch-file-attributes.rst
index 54bd236b..4f2b9eef 100644
--- a/Documentation/ch-file-attributes.rst
+++ b/Documentation/ch-file-attributes.rst
@@ -68,7 +68,7 @@ No other attributes are supported. For the complete list please refer to the
 XFLAGS
 ^^^^^^
 
-There's overlap of letters assigned to the bits with the attributes, this list
+There's an overlap of letters assigned to the bits with the attributes, this list
 refers to what ``xfs_io(8)`` provides:
 
 i
diff --git a/Documentation/ch-mount-options.rst b/Documentation/ch-mount-options.rst
index 97fc1056..42005f75 100644
--- a/Documentation/ch-mount-options.rst
+++ b/Documentation/ch-mount-options.rst
@@ -27,13 +27,15 @@ acl, noacl
         The support for ACL is build-time configurable (BTRFS_FS_POSIX_ACL) and
         mount fails if *acl* is requested but the feature is not compiled in.
 
+.. _mount-option-autodefrag:
+
 autodefrag, noautodefrag
         (since: 3.0, default: off)
 
         Enable automatic file defragmentation.
         When enabled, small random writes into files (in a range of tens of kilobytes,
         currently it's 64KiB) are detected and queued up for the defragmentation process.
-        Not well suited for large database workloads.
+        May not be well suited for large database workloads.
 
         The read latency may increase due to reading the adjacent blocks that make up the
         range for defragmentation, successive write will merge the blocks in the new
@@ -170,10 +172,12 @@ datasum, nodatasum
         The cost of checksumming of the blocks in memory is much lower than the IO,
         modern CPUs feature hardware support of the checksumming algorithm.
 
+.. _mount-option-degraded:
+
 degraded
         (default: off)
 
-        Allow mounts with less devices than the RAID profile constraints
+        Allow mounts with fewer devices than the RAID profile constraints
         require. A read-write mount (or remount) may fail when there are too many devices
         missing, for example if a stripe member is completely missing from RAID0.
 
@@ -261,12 +265,12 @@ flushoncommit, noflushoncommit
         one transaction commit.
 
 fragment=
-        (depends on compile-time option BTRFS_DEBUG, since: 4.4, default: off)
+        (depends on compile-time option CONFIG_BTRFS_DEBUG, since: 4.4, default: off)
 
         A debugging helper to intentionally fragment given *type* of block groups. The
         type can be *data*, *metadata* or *all*. This mount option should not be used
         outside of debugging environments and is not recognized if the kernel config
-        option *BTRFS_DEBUG* is not enabled.
+        option *CONFIG_BTRFS_DEBUG* is not enabled.
 
 nologreplay
         (default: off, even read-only)
@@ -287,8 +291,8 @@ max_inline=
         with a K suffix (case insensitive). In practice, this value
         is limited by the filesystem block size (named *sectorsize* at mkfs time),
         and memory page size of the system. In case of sectorsize limit, there's
-        some space unavailable due to leaf headers. For example, a 4KiB sectorsize,
-        maximum size of inline data is about 3900 bytes.
+        some space unavailable due to b-tree leaf headers. For example, with a 4KiB
+        sectorsize, the maximum size of inline data is about 3900 bytes.
 
         Inlining can be completely turned off by specifying 0. This will increase data
         block slack if file sizes are much smaller than block size but will reduce
diff --git a/Documentation/dev/On-disk-format.rst b/Documentation/dev/On-disk-format.rst
index 9e8a9f7e..6d62d03a 100644
--- a/Documentation/dev/On-disk-format.rst
+++ b/Documentation/dev/On-disk-format.rst
@@ -3,6 +3,11 @@ On-disk Format
 
 This document describes the Btrfs on‐disk format.
 
+.. note::
+
+   This document contains outdated and incomplete information and has been
+   copied from the original btrfs.wiki.kernel.org with little review.
+
 Overview
 ~~~~~~~~
 
diff --git a/Documentation/dev/dev-internal-apis.rst b/Documentation/dev/dev-internal-apis.rst
index 3fa39a8c..ed4257d1 100644
--- a/Documentation/dev/dev-internal-apis.rst
+++ b/Documentation/dev/dev-internal-apis.rst
@@ -5,12 +5,13 @@ There's some common functionality found in many places like help, parsing
 values, sorting, extensible arrays, etc. Not all places are unified and use old
 code implementing it manually. Below is list of usable APIs that should be
 spread and updated where it's still not. A need for new API might emerge from
-cleanups, then it should appear here.
+cleanups, then it should appear here. The text below gives pointers and is not
+extensive, search the definitions and actual use in other code too.
 
 Option parsing
 --------------
 
-Files: common/help.h, common/parse-utils.h
+Files: :file:`common/help.h`, :file:`common/parse-utils.h`
 
 Global options need to be processed and consumed by `clean_args_no_options`,
 argument count by `check_argc_*`, `usage_*` for handling usage.
@@ -18,6 +19,21 @@ argument count by `check_argc_*`, `usage_*` for handling usage.
 Options are parsed by `getopt` or `getopt_long`. Individual values from options
 are recognized by `parse_*`, basic types and custom types are supported.
 
+Size unit pretty printing
+-------------------------
+
+Files: :file:`common/units.h`
+
+Many commands print byte sizes with suffixes and the output format can be
+affected by command line options. In the help text the options are specified by
+either `HELPINFO_UNITS_SHORT_LONG` (both long and short options) or just
+`HELPINFO_UNITS_LONG` in case the short option letters would conflict.
+
+Automatic parsing of the options from *argv* is done by `get_unit_mode_from_arg`.
+Printing options is done by `pretty_size_mode` which takes the value and option
+mode. Default mode is human readable, the macros defining the modes are from
+`UNITS_*` namespace.
+
 TODO
 ----
 
@@ -33,4 +49,3 @@ Undocumented or incomplete APIs:
 * common/string-table.h
 * common/string-table.h
 * common/task-utils.h
-* common/units.h
diff --git a/Documentation/dev/dev-send-stream.rst b/Documentation/dev/dev-send-stream.rst
index 00301c5c..22bcf3ed 100644
--- a/Documentation/dev/dev-send-stream.rst
+++ b/Documentation/dev/dev-send-stream.rst
@@ -19,14 +19,36 @@ Data types
 
 Raw data types. Integer values are stored in little endian byte order.
 
-- unsigned int 8bit (u8)
-- unsigned int 16bit (u16)
-- unsigned int 32bit (u32)
-- unsigned int 64bit (u64)
-- variable length binary data (data)
-- variable length string (string)
-- UUID, 16 bytes (uuid)
-- time specification, 64bit seconds, 32bit nanoseconds (timespec)
+.. list-table::
+   :header-rows: 1
+
+   * - Meaning
+     - Size
+     - Name
+   * - unsigned int
+     - 8 bit
+     - u8
+   * - unsigned int
+     - 16 bit
+     - u16
+   * - unsigned int
+     - 32 bit
+     - u32
+   * - unsigned int
+     - 64 bit
+     - u64
+   * - variable length binary data
+     - variable
+     - data
+   * - variable length string
+     - variable
+     - string
+   * - UUID
+     - 16 bytes
+     - uuid
+   * - time specification
+     - 64bit seconds, 32bit nanoseconds
+     - timespec
 
 Stream structure
 ----------------
diff --git a/Documentation/mkfs.btrfs.rst b/Documentation/mkfs.btrfs.rst
index 53117037..1fca7448 100644
--- a/Documentation/mkfs.btrfs.rst
+++ b/Documentation/mkfs.btrfs.rst
@@ -79,6 +79,8 @@ OPTIONS
 
         On multiple devices the default is *raid1*.
 
+.. _mkfs-feature-mixed-bg:
+
 -M|--mixed
         Normally the data and metadata block groups are isolated. The *mixed* mode
         will remove the isolation and store both types in the same block group type.
@@ -300,12 +302,29 @@ free-space-tree
         (default since btrfs-progs 5.15, kernel support since 4.5)
 
         Enable the free space tree (mount option *space_cache=v2*) for persisting the
-        free space cache.
+        free space cache in a b-tree. This is built on top of the COW mechanism
+        and has better performance than v1.
+
+        Offline conversion from filesystems that don't have this feature
+        enabled at *mkfs* time is possible, see :doc:`btrfstune`.
+
+        Online conversion can be done by mounting with ``space_cache=v2``, doing
+        this once is sufficient.
+
+.. _mkfs-feature-block-group-tree:
 
 block-group-tree
         (kernel support since 6.1)
 
-        Enable the block group tree to greatly reduce mount time for large filesystems.
+        Enable a dedicated b-tree for block group items, this greatly reduces
+        mount time for large filesystems due to better data locality that
+        avoids seeking. On rotational devices the *large* size is considered
+        starting from 2-4TiB. Can be used on other types of devices (SSD,
+        NVMe, ...) as well.
+
+        Offline conversion from filesystems that don't have this feature
+        enabled at *mkfs* time is possible, see :doc:`btrfstune`. Online
+        conversion is not possible.
 
 .. _mkfs-section-profiles:
 
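
As a usage illustration for the BTRFS_IOC_FS_INFO section added to btrfs-ioctl.rst, here is a minimal sketch of calling the ioctl from user space. It assumes UAPI headers recent enough that ``<linux/btrfs.h>`` defines the ``BTRFS_FS_INFO_FLAG_*`` macros shown in the patch. The helpers ``query_fs_info`` and ``fs_info_has_csum`` are hypothetical names made up for this example and are not part of btrfs-progs.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>

/* The documented layout pads the structure to exactly 1KiB */
static_assert(sizeof(struct btrfs_ioctl_fs_info_args) == 1024,
	      "struct btrfs_ioctl_fs_info_args is expected to be 1024 bytes");

/*
 * Hypothetical helper: open any file or directory inside the filesystem and
 * request all optional fields documented for BTRFS_IOC_FS_INFO.
 * Returns 0 on success, -1 on failure with errno set by open() or ioctl().
 */
static int query_fs_info(const char *path, struct btrfs_ioctl_fs_info_args *args)
{
	int fd;
	int ret;
	int saved_errno;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	memset(args, 0, sizeof(*args));
	/* 'flags' is in/out: set the bits for the optional data we want */
	args->flags = BTRFS_FS_INFO_FLAG_CSUM_INFO |
		      BTRFS_FS_INFO_FLAG_GENERATION |
		      BTRFS_FS_INFO_FLAG_METADATA_UUID;

	ret = ioctl(fd, BTRFS_IOC_FS_INFO, args);
	/* close() may change errno, so keep the value from ioctl() */
	saved_errno = errno;
	close(fd);
	errno = saved_errno;

	return ret < 0 ? -1 : 0;
}

/*
 * Hypothetical helper: nonzero if the CSUM_INFO bit is set in 'flags'.
 * Assumption: after the call, a set bit means csum_type and csum_size
 * were filled in.
 */
static int fs_info_has_csum(const struct btrfs_ioctl_fs_info_args *args)
{
	return (args->flags & BTRFS_FS_INFO_FLAG_CSUM_INFO) != 0;
}
```

A caller would pass a path on a mounted btrfs filesystem and read ``num_devices``, ``fsid``, ``nodesize`` and ``sectorsize`` after a successful return. Since the structure marks ``flags`` as in/out, it seems prudent to test the returned bits (as ``fs_info_has_csum`` does) before relying on ``csum_type``, ``csum_size``, ``generation`` or ``metadata_uuid``. On a non-btrfs path the ioctl is expected to fail and the helper returns -1.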