mirror of
https://github.com/kdave/btrfs-progs
synced 2025-01-28 16:32:43 +00:00
btrfs-progs: docs: updates, clarifications
Update spelling, add notable kernel/version features or updates.

Signed-off-by: David Sterba <dsterba@suse.com>
parent
719b5a592f
commit
8cec98cb75
@ -2,7 +2,13 @@ Auto-repair on read
|
||||
===================
|
||||
|
||||
Data or metadata that are found to be damaged (e.g. because the checksum does
|
||||
not match) at the time they're read from the device can be salvaged in case the
|
||||
not match) at the time they're read from a device can be salvaged in case the
|
||||
filesystem has another valid copy when using block group profile with redundancy
|
||||
(DUP, RAID1, RAID5/6). The correct data are returned to the user application
|
||||
and the damaged copy is replaced by it.
|
||||
(DUP, RAID1-like, RAID5/6). The correct data are returned to the user application
|
||||
and the damaged copy is replaced by it. When this happens a message is emitted
|
||||
to the system log.
|
||||
|
||||
If there are more copies of data and one of them is damaged but not read by
|
||||
user application then this is not detected. To verify all data and metadata
|
||||
copies there's :doc:`scrub<Scrub>` that needs to be started manually, automatic
|
||||
repairs happen in that case.
|
||||
|
@ -6,14 +6,15 @@ call interface to let user applications access the advanced features. They're
|
||||
low level and the following list gives only an overview of the capabilities or
|
||||
a command if available:
|
||||
|
||||
- reverse lookup, from file offset to inode, ``btrfs inspect-internal
|
||||
logical-resolve``
|
||||
- reverse lookup, from file offset to inode, as command ``btrfs
|
||||
inspect-internal logical-resolve``
|
||||
|
||||
- resolve inode number to list of name, ``btrfs inspect-internal inode-resolve``
|
||||
- resolve inode number to list of names, as command ``btrfs inspect-internal inode-resolve``
|
||||
|
||||
- tree search, given a key range and tree id, lookup and return all b-tree items
|
||||
found in that range, basically all metadata at your hand but you need to know
|
||||
what to do with them
|
||||
what to do with them, the ioctl is privileged as it has full access to all
|
||||
filesystem metadata
|
||||
|
||||
- informative, about devices, space allocation or the whole filesystem, many of
|
||||
which are also exported in ``/sys/fs/btrfs``
|
||||
|
@ -41,8 +41,8 @@ File based deduplication
|
||||
------------------------
|
||||
|
||||
The tool takes a list of files and tries to find duplicates among data only
|
||||
from that files. This is suitable e.g. for files that originated from the same
|
||||
base image, source of a reflinked file. Optionally the tools could track a
|
||||
from these files. This is suitable e.g. for files that originated from the same
|
||||
base image, source of a reflinked file. Optionally the tool could track a
|
||||
database of hashes and allow to deduplicate blocks from more files, or use that
|
||||
for repeated runs and update the database incrementally.
|
||||
|
||||
@ -53,7 +53,7 @@ The tool typically scans the filesystem and builds a database of file block
|
||||
hashes, then finds candidate files and deduplicates the ranges. The hash
|
||||
database is kept as an ordinary file and can be scaled according to the needs.
|
||||
|
||||
As the file changes, the hash database may get out of sync and the scan has to
|
||||
As the files change, the hash database may get out of sync and the scan has to
|
||||
be done repeatedly.
|
||||
|
||||
Safety of block comparison
|
||||
@ -64,12 +64,13 @@ a source file, destination file and the range. The blocks from both files are
|
||||
compared for exact match before merging to the same range (i.e. there's no
|
||||
hash based comparison). Pages representing the extents in memory are locked
|
||||
prior to deduplication and prevent concurrent modification by buffered writes
|
||||
or mmaped writes.
|
||||
or mmaped writes. Blocks are compared byte by byte and not using any hash-based
|
||||
approach, i.e. the existing checksums are not used.
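The byte-by-byte rule can be illustrated in user space. The following is only a sketch of the idea, not the kernel implementation: `ranges_identical` is a hypothetical helper, and the in-memory buffers stand in for a source and a destination file.

```python
import io


def ranges_identical(src, src_off, dst, dst_off, length, chunk=64 * 1024):
    """Compare `length` bytes of two binary file objects, byte for byte.

    No hash or checksum is consulted; only the raw bytes decide.
    """
    src.seek(src_off)
    dst.seek(dst_off)
    remaining = length
    while remaining > 0:
        n = min(chunk, remaining)
        a, b = src.read(n), dst.read(n)
        # A short read means the range runs past end of file: not a match.
        if len(a) != n or len(b) != n or a != b:
            return False
        remaining -= n
    return True


# In-memory stand-ins for two files that share a 4-byte range.
src = io.BytesIO(b"xxDATAyy")
dst = io.BytesIO(b"DATAzz")
same = ranges_identical(src, 2, dst, 0, 4)
```

A real deduplication tool would pass candidate ranges to the kernel, which performs its own comparison under page locks; a user-space check like this one can only serve as a pre-filter.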
|
||||
|
||||
Limitations, compatibility
|
||||
--------------------------
|
||||
|
||||
Files that are subject do deduplication must have the same status regarding
|
||||
Files that are subject to deduplication must have the same status regarding
|
||||
COW, i.e. both regular COW files with checksums, or both NOCOW, or files that
|
||||
are COW but don't have checksums (NODATASUM attribute is set).
|
||||
|
||||
|
@ -384,10 +384,10 @@ features see [[Status]] page.
|
||||
|
||||
5.17 - send and relocation
|
||||
Send and relocation (balance, device remove, shrink, block group
|
||||
reclaim) can now work in parallel
|
||||
reclaim) can now work in parallel.
|
||||
|
||||
5.17 - device add vs balance
|
||||
It is possible to add a device with paused balance
|
||||
It is possible to add a device with paused balance.
|
||||
|
||||
.. note::
|
||||
Since kernel 5.17.7 and btrfs-progs 5.17.1
|
||||
@ -414,11 +414,116 @@ features see [[Status]] page.
|
||||
the VFS limitation to reflink files on separate subvolume mounts of the
|
||||
same filesystem has been removed
|
||||
|
||||
5.18 - syslog error messages with filesystem state
|
||||
Messages are printed with a one letter tag ("state: X") that denotes in
|
||||
which state the filesystem was at this point:
|
||||
|
||||
* A - transaction aborted (permanent)
|
||||
* E - filesystem error (permanent)
|
||||
* M - remount in progress (transient)
|
||||
* R - device replace in progress (transient)
|
||||
* C - checksum checks disabled by mount option (rescue=ignoredatacsums)
|
||||
* L - log tree replay did not complete due to some error
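As an illustration, the tag letters listed above can be turned into readable descriptions with a small lookup table. This is a sketch only: `decode_state` is a hypothetical helper, and extracting the letters from an actual log line is left out because the exact message layout is not specified here.

```python
# Tag letters and meanings copied from the list above.
STATE_TAGS = {
    "A": "transaction aborted (permanent)",
    "E": "filesystem error (permanent)",
    "M": "remount in progress (transient)",
    "R": "device replace in progress (transient)",
    "C": "checksum checks disabled by mount option (rescue=ignoredatacsums)",
    "L": "log tree replay did not complete due to some error",
}


def decode_state(tags):
    """Return one description per tag letter (hypothetical helper)."""
    return [STATE_TAGS.get(t, "unknown tag %r" % t) for t in tags]
```

Calling `decode_state("EA")` would describe a filesystem that hit an error and then aborted a transaction.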
|
||||
|
||||
5.18 - tree-checker verifies transaction id pre-write
|
||||
Metadata buffer to be written gets an extra check if the stored
|
||||
transaction number matches the current state of the filesystem.
|
||||
|
||||
5.19 - subpage support pages > 4KiB
|
||||
Metadata node size is supported regardless of the CPU page size
|
||||
(minimum size is 4KiB), data sectorsize is supported <= page size.
|
||||
(minimum size is 4KiB), data sector size is supported <= page size.
|
||||
Additionally subpage also supports RAID56.
|
||||
|
||||
5.19 - per-type background threshold for reclaim
|
||||
Add sysfs tunable for background reclaim threshold for all block group
|
||||
types (data, metadata, system).
|
||||
|
||||
5.19 - automatically repair device number mismatch
|
||||
Device information is stored in two places, the number in the super
|
||||
block and items in the device tree. When this goes out of sync, e.g.
|
||||
by device removal shortly before unmount, the next mount could fail.
|
||||
The b-tree is the authoritative information and can be used to override
|
||||
the stale value in the superblock.
|
||||
|
||||
5.19 - defrag can convert inline files to regular ones
|
||||
The logic has been changed so that inline files are considered for
|
||||
defragmentation even if the mount option max_inline would prevent that.
|
||||
No defragmentation might happen but the inlined files are not skipped.
|
||||
|
||||
5.19 - explicit minimum zone size is 4MiB
|
||||
Set the minimum zone size on zoned devices to 4MiB. Real devices'
|
||||
zones are much larger, this is for emulated devices.
|
||||
|
||||
5.19 - sysfs tunable for automatic block group reclaim
|
||||
Add possibility to set a threshold to automatically reclaim block groups
|
||||
also in non-zoned mode. By default completely empty block groups are
|
||||
reclaimed automatically but the threshold can be tuned in
|
||||
/sys/fs/btrfs/FSID/allocation/PROFILE/bg_reclaim_threshold .
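A minimal sketch of locating and reading that tunable from a script. The helper names are hypothetical, the path layout is taken from the text above, and the file content is returned as raw text because its format is not described here.

```python
from pathlib import Path, PurePosixPath

SYSFS_BTRFS = PurePosixPath("/sys/fs/btrfs")


def bg_reclaim_threshold_path(fsid, profile):
    """Build the sysfs path for one filesystem and one allocation directory."""
    return SYSFS_BTRFS / fsid / "allocation" / profile / "bg_reclaim_threshold"


def read_bg_reclaim_threshold(fsid, profile):
    """Return the current value as stripped text (needs a mounted filesystem)."""
    return Path(str(bg_reclaim_threshold_path(fsid, profile))).read_text().strip()
```

Here `fsid` is the filesystem UUID as it appears under `/sys/fs/btrfs`, and `profile` is whatever directory name exists under `allocation` on the running system.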
|
||||
|
||||
5.19 - tree-checker verifies metadata block ownership
|
||||
Additional check done by tree-checker to verify relationship between a
|
||||
tree block and its tree root owner.
|
||||
|
||||
6.x
|
||||
---
|
||||
|
||||
6.0 - send protocol v2
|
||||
Send protocol update that adds new commands and extends existing
|
||||
functionality to write large data chunks. Compressed (and encrypted)
|
||||
extents can be optionally emitted and transferred as-is without the need
|
||||
to recompress (or reencrypt) on the receiving side.
|
||||
|
||||
6.0 - sysfs exports commit stats
|
||||
The file /sys/fs/btrfs/FSID/commit_stats shows number of commits and
|
||||
various time related statistics.
|
||||
|
||||
6.0 - sysfs exports chunk sizes
|
||||
Chunk size value can be read from
|
||||
/sys/fs/btrfs/FSID/allocation/PROFILE/chunk_size .
|
||||
|
||||
6.0 - sysfs shows zoned mode among features
|
||||
The zoned mode has been supported since 5.10 and has been gaining functionality.
|
||||
Now it's advertised among features.
|
||||
|
||||
6.0 - checksum implementation is logged at mount time
|
||||
When a filesystem is mounted the implementation backing the checksums
|
||||
is logged. The information is also accessible in
|
||||
/sys/fs/btrfs/FSID/checksum .
|
||||
|
||||
6.1 - sysfs support to temporarily skip exact qgroup accounting
|
||||
Allow user override of qgroup accounting and make it temporarily out
|
||||
of date e.g. in case when there are several subvolumes deleted and the
|
||||
qgroup numbers need to be updated at some cost, an update after that
|
||||
can amortize the costs.
|
||||
|
||||
6.1 - scrub also repairs superblock
|
||||
An improvement to scrub in case the superblock is detected to be
|
||||
corrupted, the repair happens immediately. Previously it was delayed
|
||||
until the next transaction commit for performance reasons that would
|
||||
store an updated and correct copy eventually.
|
||||
|
||||
6.1 - block group tree
|
||||
An incompatible change that has to be enabled at mkfs time. Add a new
|
||||
b-tree item that stores information about block groups in a compact way
|
||||
that significantly improves mount time that's usually long due to
|
||||
fragmentation and scattered b-tree items tracking the individual block
|
||||
groups. Requires and also enables the free-space-tree and no-holes
|
||||
features.
|
||||
|
||||
6.1 - discard stats available in sysfs
|
||||
The directory '/sys/fs/btrfs/FSID/discard' exports statistics and
|
||||
tunables related to discard.
|
||||
|
||||
6.1 - additional qgroup stats in sysfs
|
||||
The overall status of qgroups is exported in
|
||||
/sys/fs/btrfs/FSID/qgroups/ .
|
||||
|
||||
6.1 - check that superblock is unchanged at thaw time
|
||||
Do a full check of the superblock once a filesystem is thawed. This notably
|
||||
happens when system resumes from suspend or hibernation. Accidental
|
||||
change by other operating systems will be detected.
|
||||
|
||||
6.2 - discard=async on by default
|
||||
Devices that support trim/discard will enable the asynchronous discard
|
||||
for the whole filesystem.
|
||||
|
||||
|
@ -19,7 +19,7 @@ balance
|
||||
again. It is primarily intended to rebalance the data in the filesystem
|
||||
across the *devices* when a device is added or removed. A balance
|
||||
will regenerate missing copies for the redundant *RAID* levels, if a
|
||||
device has failed. As of linux kernel 3.3, a balance operation can be
|
||||
device has failed. As of Linux kernel 3.3, a balance operation can be
|
||||
made selective about which parts of the filesystem are rewritten.
|
||||
|
||||
barrier
|
||||
|
@ -163,7 +163,7 @@ fd
|
||||
ignored
|
||||
name
|
||||
name of the subvolume, although the buffer can be almost 4k, the file
|
||||
size is limited by linux VFS to 255 characters and must not contain a slash
|
||||
size is limited by Linux VFS to 255 characters and must not contain a slash
|
||||
('/')
|
||||
|
||||
BTRFS_IOC_SUBVOL_CREATE_V2
|
||||
@ -190,7 +190,7 @@ qgroup_inherit
|
||||
...
|
||||
name
|
||||
name of the subvolume, although the buffer can be almost 4k, the file size
|
||||
is limited by linux VFS to 255 characters and must not contain a slash ('/')
|
||||
is limited by Linux VFS to 255 characters and must not contain a slash ('/')
|
||||
devid
|
||||
...
|
||||
|
||||
|
@ -4,8 +4,11 @@ booting from BTRFS with respect to features.
|
||||
U-boot (https://www.denx.de/wiki/U-Boot/) has decent support for booting but
|
||||
not all BTRFS features are implemented, check the documentation.
|
||||
|
||||
EXTLINUX (from the https://syslinux.org project) can boot but does not support
|
||||
all features. Please check the upstream documentation before you use it.
|
||||
EXTLINUX (from the https://syslinux.org project) has limited support for BTRFS
|
||||
boot and hasn't been updated for a long time, so it is not recommended as a
|
||||
bootloader.
|
||||
|
||||
The first 1MiB on each device is unused with the exception of primary
|
||||
superblock that is on the offset 64KiB and spans 4KiB.
|
||||
In general, the first 1MiB on each device is unused with the exception of
|
||||
primary superblock that is on the offset 64KiB and spans 4KiB. The rest can be
|
||||
freely used by bootloaders or for other system information. Note that booting
|
||||
from a filesystem on :doc:`zoned device<Zoned-mode>` is not supported.
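The layout arithmetic can be written out as follows. This is a sketch using only the numbers stated above (1MiB reserved area, primary superblock at 64KiB spanning 4KiB); the function name is hypothetical.

```python
KIB = 1024

RESERVED_END = 1024 * KIB      # first 1MiB of each device
SUPERBLOCK_OFFSET = 64 * KIB   # primary superblock starts here
SUPERBLOCK_SIZE = 4 * KIB      # and spans 4KiB


def free_ranges_in_first_mib():
    """Return (start, end) byte ranges, end exclusive, not used by BTRFS."""
    return [
        (0, SUPERBLOCK_OFFSET),
        (SUPERBLOCK_OFFSET + SUPERBLOCK_SIZE, RESERVED_END),
    ]
```

Under these numbers a bootloader has 64KiB before the superblock and 956KiB after it.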
|
||||
|
@ -1,6 +1,9 @@
|
||||
maximum file name length
|
||||
255
|
||||
|
||||
This limit is imposed by Linux VFS, the structures of BTRFS could store
|
||||
larger file names.
|
||||
|
||||
maximum symlink target length
|
||||
depends on the *nodesize* value, for 4KiB it's 3949 bytes, for larger nodesize
|
||||
it's 4095 due to the system limit PATH_MAX
|
||||
@ -13,16 +16,29 @@ maximum number of inodes
|
||||
2\ :sup:`64` but depends on the available metadata space as the inodes are created
|
||||
dynamically
|
||||
|
||||
Each subvolume is an independent namespace of inodes and thus their
|
||||
numbers, so the limit is per subvolume, not for the whole filesystem.
|
||||
|
||||
inode numbers
|
||||
minimum number: 256 (for subvolumes), regular files and directories: 257
|
||||
minimum number: 256 (for subvolumes), regular files and directories: 257,
|
||||
maximum number: (2\ :sup:`64` - 256)
|
||||
|
||||
The inode numbers that can be assigned to user created files are from
|
||||
the whole 64bit space except first 256 and last 256 in that range that
|
||||
are reserved for internal b-tree identifiers.
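The range check can be sketched as follows, taking the minimum and maximum exactly as stated above and treating both bounds as inclusive, which is an assumption. The constant and function names are hypothetical.

```python
FIRST_USER_INODE = 256            # minimum number given above
LAST_USER_INODE = 2**64 - 256     # maximum number given above


def is_user_inode_number(ino):
    """True if `ino` lies outside the ranges reserved for internal identifiers."""
    return FIRST_USER_INODE <= ino <= LAST_USER_INODE
```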
|
||||
|
||||
maximum file length
|
||||
inherent limit of btrfs is 2\ :sup:`64` (16 EiB) but the linux VFS limit is 2\ :sup:`63` (8 EiB)
|
||||
inherent limit of BTRFS is 2\ :sup:`64` (16 EiB) but the practical
|
||||
limit of Linux VFS is 2\ :sup:`63` (8 EiB)
|
||||
|
||||
maximum number of subvolumes
|
||||
the subvolume ids can go up to 2\ :sup:`64` but the number of actual subvolumes
|
||||
depends on the available metadata space, the space consumed by all subvolume
|
||||
metadata includes bookkeeping of shared extents can be large (MiB, GiB)
|
||||
the subvolume ids can go up to 2\ :sup:`48` but the number of actual subvolumes
|
||||
depends on the available metadata space
|
||||
|
||||
The space consumed by all subvolume metadata, including bookkeeping of
|
||||
shared extents, can be large (MiB, GiB). The range is not the full 64bit
|
||||
range because of qgroups that use the upper 16 bits for other
|
||||
purposes.
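The 48/16 bit split can be shown with a short sketch. It assumes that the upper 16 bits carry the qgroup level, as in the `level/id` notation used for qgroups; the helper names are hypothetical.

```python
SUBVOL_ID_BITS = 48   # lower bits: subvolume id, hence the 2^48 limit
LEVEL_BITS = 16       # upper bits: assumed to hold the qgroup level


def make_qgroupid(level, subvolid):
    """Pack a level and a subvolume id into one 64-bit qgroup id."""
    if not 0 <= level < (1 << LEVEL_BITS):
        raise ValueError("level does not fit in 16 bits")
    if not 0 <= subvolid < (1 << SUBVOL_ID_BITS):
        raise ValueError("subvolume id does not fit in 48 bits")
    return (level << SUBVOL_ID_BITS) | subvolid


def split_qgroupid(qgroupid):
    """Unpack a 64-bit qgroup id into (level, subvolume id)."""
    return qgroupid >> SUBVOL_ID_BITS, qgroupid & ((1 << SUBVOL_ID_BITS) - 1)
```

With this packing, a level 0 qgroup id is numerically equal to the subvolume id, which is why subvolume ids cannot use the upper bits.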
|
||||
|
||||
maximum number of hardlinks of a file in a directory
|
||||
65536 when the *extref* feature is turned on during mkfs (default), roughly
|
||||
|
@ -1,6 +1,6 @@
|
||||
A swapfile is file-backed memory that the system uses to temporarily offload
|
||||
the RAM. It is supported since kernel 5.0. Use ``swapon(8)`` to activate the
|
||||
swapfile. There are some limitations of the implementation in BTRFS and linux
|
||||
swapfile. There are some limitations of the implementation in BTRFS and Linux
|
||||
swap subsystem:
|
||||
|
||||
* filesystem - must be only single device
|
||||
|