btrfs-progs: docs: updates

- group features on status page
- update developer docs
- add cross references

Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba 2023-09-06 17:03:49 +02:00
parent 2726a83952
commit b40943dea4
9 changed files with 262 additions and 110 deletions


@ -13,7 +13,7 @@ in meeting your performance expectations for your specific workload.
Combination of features can vary in performance, the table does not Combination of features can vary in performance, the table does not
cover all possibilities. cover all possibilities.
**The table is based on the latest released linux kernel: 6.4** **The table is based on the latest released linux kernel: 6.5**
The columns for each feature reflect the status of the implementation The columns for each feature reflect the status of the implementation
in following ways: in following ways:
@ -43,26 +43,34 @@ in following ways:
- Stability - Stability
- Performance - Performance
- Notes - Notes
* - :doc:`discard (synchronous)<Trim>` * - :doc:`Subvolumes, snapshots<Subvolumes>`
- :statusok:`OK` - :statusok:`OK`
- OK
- -
- mounted with `-o discard` (has performance implications), also see `fstrim`
* - :doc:`discard (asynchronous)<Trim>`
- :statusok:`OK`
-
- mounted with `-o discard=async` (improved performance)
* - Autodefrag
- :statusok:`OK`
-
-
* - :doc:`Defrag<Defragmentation>`
- :statusmok:`mostly OK`
-
- extents get unshared (see below)
* - :doc:`Compression<Compression>` * - :doc:`Compression<Compression>`
- :statusok:`OK` - :statusok:`OK`
- -
- -
* - :doc:`Checksumming algorithms<Checksumming>`
- :statusok:`OK`
- OK
-
* - :doc:`Defragmentation<Defragmentation>`
- :statusmok:`mostly OK`
-
- extents get unshared (see below)
* - :ref:`Autodefrag<mount-option-autodefrag>`
- :statusok:`OK`
-
-
* - :doc:`Discard (synchronous)<Trim>`
- :statusok:`OK`
-
- mounted with `-o discard` (has performance implications), also see `fstrim`
* - :doc:`Discard (asynchronous)<Trim>`
- :statusok:`OK`
-
- mounted with `-o discard=async` (improved performance)
* - :doc:`Out-of-band dedupe<Deduplication>` * - :doc:`Out-of-band dedupe<Deduplication>`
- :statusok:`OK` - :statusok:`OK`
- :statusmok:`mostly OK` - :statusmok:`mostly OK`
@ -71,10 +79,14 @@ in following ways:
- :statusok:`OK` - :statusok:`OK`
- :statusmok:`mostly OK` - :statusmok:`mostly OK`
- (reflink), heavily referenced extents have a noticeable performance hit (see below) - (reflink), heavily referenced extents have a noticeable performance hit (see below)
* - :doc:`More checksumming algorithms<Checksumming>` * - :doc:`Filesystem resize<Resize>`
- :statusok:`OK` - :statusok:`OK`
- OK - OK
- - shrink, grow
* - :doc:`Device replace<Volume-management>`
- :statusmok:`mostly OK`
- mostly OK
- (see below)
* - :doc:`Auto-repair<Auto-repair>` * - :doc:`Auto-repair<Auto-repair>`
- :statusok:`OK` - :statusok:`OK`
- OK - OK
@ -87,18 +99,66 @@ in following ways:
- :statusmok:`mostly OK` - :statusmok:`mostly OK`
- mostly OK - mostly OK
- -
* - :ref:`Degraded mount<mount-option-degraded>`
- :statusok:`OK`
- n/a
-
* - :doc:`Balance<Balance>`
- :statusok:`OK`
- OK
- balance + qgroups can be slow when there are many snapshots
* - :doc:`Send<Send-receive>`
- :statusok:`OK`
- OK
-
* - :doc:`Receive<Send-receive>`
- :statusok:`OK`
- OK
-
* - Offline UUID change
- :statusok:`OK`
- OK
-
* - Metadata UUID change
- :statusok:`OK`
- OK
-
* - :doc:`Seeding<Seeding-device>`
- :statusok:`OK`
- OK
-
* - :doc:`Quotas, qgroups<Qgroups>`
- :statusmok:`mostly OK`
- mostly OK
- qgroups with many snapshots slow down balance
* - :doc:`Swapfile<Swapfile>`
- :statusok:`OK`
- n/a
- with some limitations
* - nodatacow * - nodatacow
- :statusok:`OK` - :statusok:`OK`
- OK - OK
- -
* - :doc:`Device replace<Volume-management>` * - :doc:`Subpage block size<Subpage>`
- :statusmok:`mostly OK` - :statusmok:`mostly OK`
- mostly OK - mostly OK
- (see below) - Also see table below for more detailed compatibility.
* - Degraded mount * - :doc:`Zoned mode<Zoned-mode>`
- :statusok:`OK` - :statusmok:`mostly OK`
- n/a - mostly OK
- - Not yet feature complete but moderately stable, also see table below
for more detailed compatibility.
Block group profiles
^^^^^^^^^^^^^^^^^^^^
.. list-table::
:header-rows: 1
* - Feature
- Stability
- Performance
- Notes
* - :ref:`Single (block group profile)<mkfs-section-profiles>` * - :ref:`Single (block group profile)<mkfs-section-profiles>`
- :statusok:`OK` - :statusok:`OK`
- OK - OK
@ -131,50 +191,59 @@ in following ways:
- :statusunstable:`unstable` - :statusunstable:`unstable`
- n/a - n/a
- (see below) - (see below)
* - Mixed block groups * - :ref:`Mixed block groups<mkfs-feature-mixed-bg>`
- :statusok:`OK` - :statusok:`OK`
- OK - OK
- -
* - :doc:`Filesystem resize<Resize>`
- :statusok:`OK`
- OK On-disk format
- shrink, grow ^^^^^^^^^^^^^^
* - :doc:`Balance<Balance>`
- :statusok:`OK` Features that are typically set at *mkfs* time (sometimes can be changed or
- OK converted later).
- balance + qgroups can be slow when there are many snapshots
* - Offline UUID change .. list-table::
:header-rows: 1
* - Feature
- Stability
- Performance
- Notes
* - :ref:`extended-refs<mkfs-feature-extended-refs>`
- :statusok:`OK` - :statusok:`OK`
- OK - OK
- -
* - Metadata UUID change * - :ref:`skinny-metadata<mkfs-feature-skinny-metadata>`
- :statusok:`OK` - :statusok:`OK`
- OK - OK
- -
* - :doc:`Subvolumes, snapshots<Subvolumes>` * - :ref:`no-holes<mkfs-feature-no-holes>`
- :statusok:`OK` - :statusok:`OK`
- OK - OK
- -
* - :doc:`Send<Send-receive>` * - :ref:`Free space tree<mkfs-feature-free-space-tree>`
- :statusok:`OK` - :statusok:`OK`
- OK - OK
- -
* - :doc:`Receive<Send-receive>` * - :ref:`Block group tree<mkfs-feature-block-group-tree>`
- :statusok:`OK` - :statusok:`OK`
- OK - OK
- -
* - :doc:`Seeding<Seeding-device>`
- :statusok:`OK` Interoperability
- OK ^^^^^^^^^^^^^^^^
-
* - :doc:`Quotas, qgroups<Qgroups>` Integration with other Linux features or external systems.
- :statusmok:`mostly OK` :doc:`See also<Interoperability>`.
- mostly OK
- qgroups with many snapshots slows down balance .. list-table::
* - :doc:`Swapfile<Swapfile>` :header-rows: 1
- :statusok:`OK`
- n/a * - Feature
- with some limitations - Stability
- Performance
- Notes
* - :ref:`NFS<interop-nfs>` * - :ref:`NFS<interop-nfs>`
- :statusok:`OK` - :statusok:`OK`
- OK - OK
@ -183,10 +252,6 @@ in following ways:
- :statusok:`OK` - :statusok:`OK`
- OK - OK
- IO controller - IO controller
* - :ref:`Samba<interop-samba>`
- :statusok:`OK`
- OK
- compression, server-side copies, snapshots
* - :ref:`io_uring<interop-io-uring>` * - :ref:`io_uring<interop-io-uring>`
- :statusok:`OK` - :statusok:`OK`
- OK - OK
@ -199,35 +264,10 @@ in following ways:
- :statusok:`OK` - :statusok:`OK`
- OK - OK
- -
* - :ref:`Free space tree<mkfs-feature-free-space-tree>` * - :ref:`Samba<interop-samba>`
- :statusok:`OK`
-
-
* - Block group tree
- :statusok:`OK`
-
-
* - :ref:`no-holes<mkfs-feature-no-holes>`
- :statusok:`OK` - :statusok:`OK`
- OK - OK
- - compression, server-side copies, snapshots
* - :ref:`skinny-metadata<mkfs-feature-skinny-metadata>`
- :statusok:`OK`
- OK
-
* - :ref:`extended-refs<mkfs-feature-extended-refs>`
- :statusok:`OK`
- OK
-
* - :doc:`Subpage block size<Subpage>`
- :statusmok:`mostly OK`
- mostly OK
- Also see table below for more detailed compatibility.
* - :doc:`Zoned mode<Zoned-mode>`
- :statusmok:`mostly OK`
- mostly OK
- Not yet feature complete but moderately stable, also see table below
for more detailed compatibility.
Please open an issue if: Please open an issue if:
@ -256,7 +296,7 @@ with subpage or require another feature to work:
- The max_inline mount option value is ignored, as if mounted with max_inline=0 - The max_inline mount option value is ignored, as if mounted with max_inline=0
* - Free space cache v1 * - Free space cache v1
- :statusunsupp:`unsupported` - :statusunsupp:`unsupported`
- Free space tree is mandatory, v1 has some assumptions about page size - Free space tree is mandatory, v1 makes some assumptions about page size
* - Compression * - Compression
- :statusok:`partial support` - :statusok:`partial support`
- Only page-aligned ranges can be compressed - Only page-aligned ranges can be compressed
@ -303,12 +343,6 @@ are unaffected by the zoned device constraints.
* - Free space tree * - Free space tree
- :statusok:`supported` - :statusok:`supported`
- -
* - single profile
- :statusok:`supported`
- Both data and metadata
* - DUP profile
- :statusok:`partial support`
- Only for metadata
* - Filesystem resize * - Filesystem resize
- :statusok:`supported` - :statusok:`supported`
- -


@ -150,6 +150,33 @@ DATA STRUCTURES AND DEFINITIONS
__u64 rsv_excl; __u64 rsv_excl;
}; };
.. _struct_btrfs_ioctl_fs_info_args:
.. code-block:: c
/* Request information about checksum type and size */
#define BTRFS_FS_INFO_FLAG_CSUM_INFO (1 << 0)
/* Request information about filesystem generation */
#define BTRFS_FS_INFO_FLAG_GENERATION (1 << 1)
/* Request information about filesystem metadata UUID */
#define BTRFS_FS_INFO_FLAG_METADATA_UUID (1 << 2)
struct btrfs_ioctl_fs_info_args {
__u64 max_id; /* out */
__u64 num_devices; /* out */
__u8 fsid[BTRFS_FSID_SIZE]; /* out */
__u32 nodesize; /* out */
__u32 sectorsize; /* out */
__u32 clone_alignment; /* out */
/* See BTRFS_FS_INFO_FLAG_* */
__u16 csum_type; /* out */
__u16 csum_size; /* out */
__u64 flags; /* in/out */
__u64 generation; /* out */
__u8 metadata_uuid[BTRFS_FSID_SIZE]; /* out */
__u8 reserved[944]; /* pad to 1k */
};
.. list-table:: .. list-table::
:header-rows: 1 :header-rows: 1
@ -157,10 +184,14 @@ DATA STRUCTURES AND DEFINITIONS
- Value - Value
* - BTRFS_UUID_SIZE * - BTRFS_UUID_SIZE
- 16 - 16
* - BTRFS_FSID_SIZE
- 16
* - BTRFS_SUBVOL_NAME_MAX * - BTRFS_SUBVOL_NAME_MAX
- 4039 - 4039
* - BTRFS_PATH_NAME_MAX * - BTRFS_PATH_NAME_MAX
- 4087 - 4087
* - BTRFS_VOL_NAME_MAX
- 255
OVERVIEW OVERVIEW
-------- --------
@ -296,9 +327,9 @@ LIST OF IOCTLS
* - BTRFS_IOC_DEV_INFO * - BTRFS_IOC_DEV_INFO
- -
- -
* - BTRFS_IOC_FS_INFO * - :ref:`BTRFS_IOC_FS_INFO<BTRFS_IOC_FS_INFO>`
- - get information about filesystem (device count, fsid, ...)
- - :ref:`struct btrfs_ioctl_fs_info_args<struct_btrfs_ioctl_fs_info_args>`
* - BTRFS_IOC_BALANCE_V2 * - BTRFS_IOC_BALANCE_V2
- -
- -
@ -555,6 +586,26 @@ Change the flags of a subvolume.
* - ioctl args * - ioctl args
- uint64_t, either 0 or `BTRFS_SUBVOL_RDONLY` - uint64_t, either 0 or `BTRFS_SUBVOL_RDONLY`
.. _BTRFS_IOC_FS_INFO:
BTRFS_IOC_FS_INFO
~~~~~~~~~~~~~~~~~
Read internal information about the filesystem. The data can be exchanged
both ways: the caller requests optional parts of the structure by setting
*flags*, and the flags set on return tell which parts were filled. The reserved
bytes can be used to return new kinds of information in the future, always
depending on the flags set.
.. list-table::
:header-rows: 1
* - Field
- Description
* - ioctl fd
- file descriptor of any file/directory in the filesystem
* - ioctl args
- :ref:`struct btrfs_ioctl_fs_info_args<struct_btrfs_ioctl_fs_info_args>`
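A minimal sketch of calling this ioctl from C, assuming the kernel UAPI header ``linux/btrfs.h`` is installed and recent enough to define the *flags* field and ``BTRFS_FS_INFO_FLAG_CSUM_INFO``. The helper names ``demo_query_fs_info`` and ``demo_print_fs_info`` are hypothetical and are not part of btrfs-progs.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/btrfs.h>

/* The structure is padded to 1KiB ("pad to 1k" in the definition). */
static_assert(sizeof(struct btrfs_ioctl_fs_info_args) == 1024,
	      "unexpected size of struct btrfs_ioctl_fs_info_args");

/*
 * Hypothetical helper: query information about the filesystem that
 * contains 'path'. Returns 0 on success, -1 on error with errno set.
 */
int demo_query_fs_info(const char *path, struct btrfs_ioctl_fs_info_args *args)
{
	int fd, ret, saved_errno;

	/* Any file or directory in the filesystem can be used. */
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	memset(args, 0, sizeof(*args));
	/* Input: request the optional checksum information. */
	args->flags = BTRFS_FS_INFO_FLAG_CSUM_INFO;

	ret = ioctl(fd, BTRFS_IOC_FS_INFO, args);
	saved_errno = errno;
	close(fd);
	errno = saved_errno;

	return ret < 0 ? -1 : 0;
}

/* Hypothetical helper: print a few of the returned fields. */
void demo_print_fs_info(const struct btrfs_ioctl_fs_info_args *args)
{
	printf("devices:    %llu\n", (unsigned long long)args->num_devices);
	printf("nodesize:   %u\n", (unsigned int)args->nodesize);
	printf("sectorsize: %u\n", (unsigned int)args->sectorsize);

	/* Output: the flag is set only if the kernel filled the fields. */
	if (args->flags & BTRFS_FS_INFO_FLAG_CSUM_INFO)
		printf("csum type:  %u (size %u)\n",
		       (unsigned int)args->csum_type,
		       (unsigned int)args->csum_size);
}
```

On a path that is not on a btrfs filesystem the ioctl is expected to fail, so callers should check the return value before reading any field.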
.. _BTRFS_IOC_GET_SUBVOL_INFO: .. _BTRFS_IOC_GET_SUBVOL_INFO:
BTRFS_IOC_GET_SUBVOL_INFO BTRFS_IOC_GET_SUBVOL_INFO


@ -52,6 +52,8 @@ OPTIONS
change fsid stored as *metadata_uuid* to a randomly generated UUID, change fsid stored as *metadata_uuid* to a randomly generated UUID,
see also *-U* see also *-U*
.. _btrfstune-feature-metadata-uuid:
-M <UUID> -M <UUID>
(since kernel: 5.0) (since kernel: 5.0)


@ -68,7 +68,7 @@ No other attributes are supported. For the complete list please refer to the
XFLAGS XFLAGS
^^^^^^ ^^^^^^
There's overlap of letters assigned to the bits with the attributes, this list There's an overlap of letters assigned to the bits with the attributes, this list
refers to what ``xfs_io(8)`` provides: refers to what ``xfs_io(8)`` provides:
i i


@ -27,13 +27,15 @@ acl, noacl
The support for ACL is build-time configurable (BTRFS_FS_POSIX_ACL) and The support for ACL is build-time configurable (BTRFS_FS_POSIX_ACL) and
mount fails if *acl* is requested but the feature is not compiled in. mount fails if *acl* is requested but the feature is not compiled in.
.. _mount-option-autodefrag:
autodefrag, noautodefrag autodefrag, noautodefrag
(since: 3.0, default: off) (since: 3.0, default: off)
Enable automatic file defragmentation. Enable automatic file defragmentation.
When enabled, small random writes into files (in a range of tens of kilobytes, When enabled, small random writes into files (in a range of tens of kilobytes,
currently it's 64KiB) are detected and queued up for the defragmentation process. currently it's 64KiB) are detected and queued up for the defragmentation process.
Not well suited for large database workloads. May not be well suited for large database workloads.
The read latency may increase due to reading the adjacent blocks that make up the The read latency may increase due to reading the adjacent blocks that make up the
range for defragmentation, successive write will merge the blocks in the new range for defragmentation, successive write will merge the blocks in the new
@ -170,10 +172,12 @@ datasum, nodatasum
The cost of checksumming of the blocks in memory is much lower than the IO, The cost of checksumming of the blocks in memory is much lower than the IO,
modern CPUs feature hardware support of the checksumming algorithm. modern CPUs feature hardware support of the checksumming algorithm.
.. _mount-option-degraded:
degraded degraded
(default: off) (default: off)
Allow mounts with less devices than the RAID profile constraints Allow mounts with fewer devices than the RAID profile constraints
require. A read-write mount (or remount) may fail when there are too many devices require. A read-write mount (or remount) may fail when there are too many devices
missing, for example if a stripe member is completely missing from RAID0. missing, for example if a stripe member is completely missing from RAID0.
@ -261,12 +265,12 @@ flushoncommit, noflushoncommit
one transaction commit. one transaction commit.
fragment=<type> fragment=<type>
(depends on compile-time option BTRFS_DEBUG, since: 4.4, default: off) (depends on compile-time option CONFIG_BTRFS_DEBUG, since: 4.4, default: off)
A debugging helper to intentionally fragment given *type* of block groups. The A debugging helper to intentionally fragment given *type* of block groups. The
type can be *data*, *metadata* or *all*. This mount option should not be used type can be *data*, *metadata* or *all*. This mount option should not be used
outside of debugging environments and is not recognized if the kernel config outside of debugging environments and is not recognized if the kernel config
option *BTRFS_DEBUG* is not enabled. option *CONFIG_BTRFS_DEBUG* is not enabled.
nologreplay nologreplay
(default: off, even read-only) (default: off, even read-only)
@ -287,8 +291,8 @@ max_inline=<bytes>
with a K suffix (case insensitive). In practice, this value with a K suffix (case insensitive). In practice, this value
is limited by the filesystem block size (named *sectorsize* at mkfs time), is limited by the filesystem block size (named *sectorsize* at mkfs time),
and memory page size of the system. In case of sectorsize limit, there's and memory page size of the system. In case of sectorsize limit, there's
some space unavailable due to leaf headers. For example, a 4KiB sectorsize, some space unavailable due to b-tree leaf headers. For example, a 4KiB
maximum size of inline data is about 3900 bytes. sectorsize, maximum size of inline data is about 3900 bytes.
Inlining can be completely turned off by specifying 0. This will increase data Inlining can be completely turned off by specifying 0. This will increase data
block slack if file sizes are much smaller than block size but will reduce block slack if file sizes are much smaller than block size but will reduce


@ -3,6 +3,11 @@ On-disk Format
This document describes the Btrfs ondisk format. This document describes the Btrfs ondisk format.
.. note::
This document contains outdated and incomplete information and has been
copied from the original btrfs.wiki.kernel.org with little review.
Overview Overview
~~~~~~~~ ~~~~~~~~


@ -5,12 +5,13 @@ There's some common functionality found in many places like help, parsing
values, sorting, extensible arrays, etc. Not all places are unified and use old values, sorting, extensible arrays, etc. Not all places are unified and use old
code implementing it manually. Below is list of usable APIs that should be spread code implementing it manually. Below is list of usable APIs that should be spread
and updated where it's still not. A need for new API might emerge from and updated where it's still not. A need for new API might emerge from
cleanups, then it should appear here. cleanups, then it should appear here. The text below gives pointers and is not
exhaustive; search the definitions and actual use in other code too.
Option parsing Option parsing
-------------- --------------
Files: common/help.h, common/parse-utils.h Files: :file:`common/help.h`, :file:`common/parse-utils.h`
Global options need to be processed and consumed by `clean_args_no_options`, Global options need to be processed and consumed by `clean_args_no_options`,
argument count by `check_argc_*`, `usage_*` for handling usage. argument count by `check_argc_*`, `usage_*` for handling usage.
@ -18,6 +19,21 @@ argument count by `check_argc_*`, `usage_*` for handling usage.
Options are parsed by `getopt` or `getopt_long`. Individual values from options Options are parsed by `getopt` or `getopt_long`. Individual values from options
are recognized by `parse_*`, basic types and custom types are supported. are recognized by `parse_*`, basic types and custom types are supported.
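The general pattern of parsing with `getopt_long` can be sketched as below. This is a generic illustration only: `struct demo_opts`, `parse_demo_opts` and the two options are hypothetical, and the sketch does not use the btrfs-progs helpers named above (`clean_args_no_options`, `check_argc_*`, `parse_*`).

```c
#include <errno.h>
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical result of parsing, for illustration only. */
struct demo_opts {
	bool verbose;
	unsigned long long size;
	int first_arg;		/* index of the first non-option argument */
};

/* Returns 0 on success, -1 on an unknown option or a malformed value. */
int parse_demo_opts(int argc, char **argv, struct demo_opts *opts)
{
	static const struct option long_options[] = {
		{ "verbose", no_argument, NULL, 'v' },
		{ "size", required_argument, NULL, 's' },
		{ NULL, 0, NULL, 0 }
	};
	int c;

	opts->verbose = false;
	opts->size = 0;
	optind = 1;	/* restart scanning so the function can be called again */

	while ((c = getopt_long(argc, argv, "vs:", long_options, NULL)) != -1) {
		char *end;

		switch (c) {
		case 'v':
			opts->verbose = true;
			break;
		case 's':
			/* Parse the individual value, reject trailing garbage. */
			errno = 0;
			opts->size = strtoull(optarg, &end, 10);
			if (errno || end == optarg || *end != '\0')
				return -1;
			break;
		default:
			return -1;
		}
	}
	opts->first_arg = optind;
	return 0;
}
```

After the loop, `optind` points at the first non-option argument, which is where an argument count check would start.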
Size unit pretty printing
-------------------------
Files: :file:`common/units.h`
Many commands print byte sizes with suffixes and the output format can be
affected by command line options. In the help text the options are specified by
either `HELPINFO_UNITS_SHORT_LONG` (both long and short options) or just
`HELPINFO_UNITS_LONG` in case the short option letters would conflict.
Automatic parsing of the options from *argv* is done by `get_unit_mode_from_arg`.
Printing sizes is done by `pretty_size_mode`, which takes the value and the unit
mode. The default mode is human readable; the macros defining the modes are in
the `UNITS_*` namespace.
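To illustrate the idea of a value plus a unit mode, here is a self-contained sketch. It is not the implementation in :file:`common/units.h`: `demo_format_size`, `enum demo_unit_mode` and the output format are assumptions made for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical modes, standing in for the real UNITS_* macros. */
enum demo_unit_mode {
	DEMO_UNITS_RAW,			/* plain byte count */
	DEMO_UNITS_HUMAN_BINARY		/* 1024-based suffixes */
};

/*
 * Format 'size' into 'buf' according to 'mode'.
 * Returns the snprintf result (length of the formatted string).
 */
int demo_format_size(unsigned long long size, enum demo_unit_mode mode,
		     char *buf, size_t len)
{
	static const char * const suffix[] = {
		"B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"
	};
	const size_t last = sizeof(suffix) / sizeof(suffix[0]) - 1;
	double value = (double)size;
	size_t i = 0;

	if (mode == DEMO_UNITS_RAW)
		return snprintf(buf, len, "%llu", size);

	/* Scale down by 1024 until the value fits the next suffix. */
	while (value >= 1024.0 && i < last) {
		value /= 1024.0;
		i++;
	}
	return snprintf(buf, len, "%.2f%s", value, suffix[i]);
}
```

Keeping the mode as a separate parameter lets the command line options decide the output format once, while every printing site passes the same mode through.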
TODO TODO
---- ----
@ -33,4 +49,3 @@ Undocumented or incomplete APIs:
* common/string-table.h * common/string-table.h
* common/string-table.h * common/string-table.h
* common/task-utils.h * common/task-utils.h
* common/units.h


@ -19,14 +19,36 @@ Data types
Raw data types. Integer values are stored in little endian byte order. Raw data types. Integer values are stored in little endian byte order.
- unsigned int 8bit (u8) .. list-table::
- unsigned int 16bit (u16) :header-rows: 1
- unsigned int 32bit (u32)
- unsigned int 64bit (u64) * - Meaning
- variable length binary data (data) - Size
- variable length string (string) - Name
- UUID, 16 bytes (uuid) * - unsigned int
- time specification, 64bit seconds, 32bit nanoseconds (timespec) - 8 bit
- u8
* - unsigned int
- 16 bit
- u16
* - unsigned int
- 32 bit
- u32
* - unsigned int
- 64 bit
- u64
* - variable length binary data
- variable
- data
* - variable length string
- variable
- string
* - UUID
- 16 bytes
- uuid
* - time specification
- 64bit seconds, 32bit nanoseconds
- timespec
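The raw integer types in the table can be decoded from a byte buffer independently of the host byte order. The sketch below assumes the timespec is laid out as the 64bit seconds followed directly by the 32bit nanoseconds (12 bytes in total); all `demo_*` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode little endian integers byte by byte, works on any host. */
uint16_t demo_get_le16(const uint8_t *p)
{
	return (uint16_t)((uint16_t)p[0] | ((uint16_t)p[1] << 8));
}

uint32_t demo_get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

uint64_t demo_get_le64(const uint8_t *p)
{
	uint64_t v = 0;
	size_t i;

	for (i = 0; i < 8; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

/* Host representation of the stream timespec. */
struct demo_timespec {
	uint64_t sec;
	uint32_t nsec;
};

/* Assumed layout: u64 seconds at offset 0, u32 nanoseconds at offset 8. */
struct demo_timespec demo_get_timespec(const uint8_t *p)
{
	struct demo_timespec ts;

	ts.sec = demo_get_le64(p);
	ts.nsec = demo_get_le32(p + 8);
	return ts;
}
```

Reading byte by byte avoids unaligned access and casts of the buffer to wider integer types.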
Stream structure Stream structure
---------------- ----------------


@ -79,6 +79,8 @@ OPTIONS
On multiple devices the default is *raid1*. On multiple devices the default is *raid1*.
.. _mkfs-feature-mixed-bg:
-M|--mixed -M|--mixed
Normally the data and metadata block groups are isolated. The *mixed* mode Normally the data and metadata block groups are isolated. The *mixed* mode
will remove the isolation and store both types in the same block group type. will remove the isolation and store both types in the same block group type.
@ -300,12 +302,29 @@ free-space-tree
(default since btrfs-progs 5.15, kernel support since 4.5) (default since btrfs-progs 5.15, kernel support since 4.5)
Enable the free space tree (mount option *space_cache=v2*) for persisting the Enable the free space tree (mount option *space_cache=v2*) for persisting the
free space cache. free space cache in a b-tree. This is built on top of the COW mechanism
and has better performance than v1.
Offline conversion from filesystems that don't have this feature
enabled at *mkfs* time is possible, see :doc:`btrfstune`.
Online conversion can be done by mounting with ``space_cache=v2``; doing this
once is sufficient.
.. _mkfs-feature-block-group-tree:
block-group-tree block-group-tree
(kernel support since 6.1) (kernel support since 6.1)
Enable the block group tree to greatly reduce mount time for large filesystems. Enable a dedicated b-tree for block group items, this greatly reduces
mount time for large filesystems due to better data locality that
avoids seeking. On rotational devices, a filesystem is considered *large*
starting from 2-4TiB. The feature can be used on other types of devices (SSD,
NVMe, ...) as well.
Offline conversion from filesystems that don't have this feature
enabled at *mkfs* time is possible, see :doc:`btrfstune`. Online
conversion is not possible.
.. _mkfs-section-profiles: .. _mkfs-section-profiles: