From 94856dd5476772a7fdba20adc1ad01543dc276c3 Mon Sep 17 00:00:00 2001 From: David Sterba Date: Tue, 7 Nov 2017 19:03:44 +0100 Subject: [PATCH] btrfs-progs: docs: update mount options Enhance the text, update for 4.14, sync with existing wiki page. Signed-off-by: David Sterba --- Documentation/btrfs-man5.asciidoc | 117 +++++++++++++++++++++--------- 1 file changed, 84 insertions(+), 33 deletions(-) diff --git a/Documentation/btrfs-man5.asciidoc b/Documentation/btrfs-man5.asciidoc index 3981435e..5199f88b 100644 --- a/Documentation/btrfs-man5.asciidoc +++ b/Documentation/btrfs-man5.asciidoc @@ -83,21 +83,22 @@ supposed to make it to the permanent storage. (since: 3.0, default: off) + These debugging options control the behavior of the integrity checking -module (the BTRFS_FS_CHECK_INTEGRITY config option required). + +module (the BTRFS_FS_CHECK_INTEGRITY config option required). The main goal is +to verify that all blocks from a given transaction period are properly linked. + -`check_int` enables the integrity checker module, which examines all +'check_int' enables the integrity checker module, which examines all block write requests to ensure on-disk consistency, at a large -memory and CPU cost. + +memory and CPU cost. + -`check_int_data` includes extent data in the integrity checks, and -implies the check_int option. + +'check_int_data' includes extent data in the integrity checks, and +implies the 'check_int' option. + -`check_int_print_mask` takes a bitmask of BTRFSIC_PRINT_MASK_* values +'check_int_print_mask' takes a bitmask of BTRFSIC_PRINT_MASK_* values as defined in 'fs/btrfs/check-integrity.c', to control the integrity -checker module behavior. + +checker module behavior. + See comments at the top of 'fs/btrfs/check-integrity.c' -for more info. +for more information. *clear_cache*:: Force clearing and rebuilding of the disk space cache if something @@ -106,10 +107,11 @@ has gone wrong. See also: 'space_cache'. 
*commit='seconds'*:: (since: 3.12, default: 30) + -Set the interval of periodic commit. Higher -values defer data being synced to permanent storage with obvious -consequences when the system crashes. The upper bound is not forced, -but a warning is printed if it's more than 300 seconds (5 minutes). +Set the interval of periodic transaction commit when data are synchronized +to permanent storage. Higher interval values lead to a larger amount of unwritten +data, which has obvious consequences when the system crashes. +The upper bound is not forced, but a warning is printed if it's more than 300 +seconds (5 minutes). Use with care. *compress*:: *compress='type'*:: @@ -141,6 +143,10 @@ Enable data copy-on-write for newly created files. under 'nodatacow' are also set the NOCOW file attribute (see `chattr`(1)). + NOTE: If 'nodatacow' or 'nodatasum' are enabled, compression is disabled. ++ +Updates in-place improve performance for workloads that do frequent overwrites, +at the cost of potential partial writes, in case the write is interrupted +(system crash, device failure). *datasum*:: *nodatasum*:: @@ -152,13 +158,31 @@ under 'nodatasum' inherit the "no checksums" property, however there's no corresponding file attribute (see `chattr`(1)). + NOTE: If 'nodatacow' or 'nodatasum' are enabled, compression is disabled. ++ +There is a slight performance gain when checksums are turned off, because the +corresponding metadata blocks holding the checksums do not need to be updated. +The cost of checksumming of the blocks in memory is much lower than the IO, +modern CPUs feature hardware support of the checksumming algorithm. *degraded*:: (default: off) + -Allow mounts with less devices than the raid profile constraints -require. A read-write mount (or remount) may fail with too many devices +Allow mounts with fewer devices than the RAID profile constraints +require. 
A read-write mount (or remount) may fail when there are too many devices missing, for example if a stripe member is completely missing from RAID0. ++ +Since 4.14, the constraint checks have been improved and are verified on the +chunk level, not at the device level. This allows degraded mounts of +filesystems with mixed RAID profiles for data and metadata, even if the +device number constraints would not be satisfied for some of the profiles. ++ +Example: metadata -- raid1, data -- single, devices -- /dev/sda, /dev/sdb ++ +Suppose the data are completely stored on 'sda', then missing 'sdb' will not +prevent the mount, even if 1 missing device would normally prevent (any) +'single' profile from mounting. In case some of the data chunks are stored on 'sdb', +then the constraint of single/data is not satisfied and the filesystem +cannot be mounted. *device='devicepath'*:: Specify a path to a device that will be scanned for BTRFS filesystem during @@ -174,14 +198,14 @@ system at that point. *nodiscard*:: (default: off) + -Enable discarding of freed file blocks using TRIM operation. This is useful +Enable discarding of freed file blocks using the TRIM operation. This is useful for SSD devices, thinly provisioned LUNs or virtual machine images where the backing device understands the operation. Depending on support of the underlying device, the operation may severely hurt performance in case the TRIM operation is synchronous (eg. with SATA devices up to revision 3.0). + If discarding is not necessary to be done at the block freeing time, there's -`fstrim` tool that lets the filesystem discard all free blocks in a batch, +`fstrim`(8) tool that lets the filesystem discard all free blocks in a batch, possibly not much interfering with other operations. Also, the the device may ignore the TRIM command if the range is too small, so running the batch discard can actually discard the blocks. 
@@ -215,7 +239,7 @@ This option forces any data dirtied by a write in a prior transaction to commit as part of the current commit, effectively a full filesystem sync. + This makes the committed state a fully consistent view of the file system from -the application's perspective (i.e., it includes all completed file system +the application's perspective (i.e. it includes all completed file system operations). This was previously the behavior only when a snapshot was created. + @@ -245,6 +269,14 @@ the option. + NOTE: Defaults to off due to a potential overflow problem when the free space checksums don't fit inside a single page. ++ +Don't use this option unless you really need it. The inode number limit +on 64bit systems is 2^64^, which is practically enough for the whole filesystem +lifetime. Due to the implementation of the Linux VFS layer, the inode numbers on 32bit +systems are only 32 bits wide. This lowers the limit significantly and makes +it possible to reach it. In such a case, this mount option will help. +Alternatively, files with high inode numbers can be copied to a new subvolume +which will effectively start the inode numbers from the beginning again. *logreplay*:: *nologreplay*:: @@ -258,7 +290,7 @@ disable that behaviour, mount also with 'nologreplay'. *max_inline='bytes'*:: (default: min(2048, page size) ) + -Specify the maximum amount of space, in bytes, that can be inlined in +Specify the maximum amount of space that can be inlined in a metadata B-tree leaf. The value is specified in bytes, optionally with a K suffix (case insensitive). In practice, this value is limited by the filesystem block size (named 'sectorsize' at mkfs time), @@ -319,8 +351,8 @@ the space cache consumes some resources, including a small amount of disk space. + There are two implementations of the free space cache. The original -implementation, 'v1', is the safe default. The 'v1' space cache can be disabled -at mount time with 'nospace_cache' without clearing. 
+one, referred to as 'v1', is the safe default. The 'v1' space cache can be +disabled at mount time with 'nospace_cache' without clearing. + On very large filesystems (many terabytes) and certain workloads, the performance of the 'v1' space cache may degrade drastically. The 'v2' @@ -329,12 +361,12 @@ this issue. Once enabled, the 'v2' space cache will always be used and cannot be disabled unless it is cleared. Use 'clear_cache,space_cache=v1' or 'clear_cache,nospace_cache' to do so. If 'v2' is enabled, kernels without 'v2' support will only be able to mount the filesystem in read-only mode. The -`btrfs(8)` command currently only has read-only support for 'v2'. A read-write +`btrfs`(8) command currently only has read-only support for 'v2'. A read-write command may be run on a 'v2' filesystem by clearing the cache, running the command, and then remounting with 'space_cache=v2'. + If a version is not explicitly specified, the default implementation will be -chosen, which is 'v1' as of 4.9. +chosen, which is 'v1'. *ssd*:: *ssd_spread*:: @@ -342,10 +374,22 @@ chosen, which is 'v1' as of 4.9. (default: SSD autodetected) + Options to control SSD allocation schemes. By default, BTRFS will -enable or disable SSD allocation heuristics depending on whether a -rotational or non-rotational device is in use (contents of -'/sys/block/DEV/queue/rotational'). If it is, the 'ssd' option is turned on. -The option 'nossd' will disable the autodetection. +enable or disable SSD optimizations depending on the status of a device with +respect to rotational or non-rotational type. This is determined by the +contents of '/sys/block/DEV/queue/rotational'. If it is 0, the 'ssd' option is +turned on. The option 'nossd' will disable the autodetection. ++ +The optimizations make use of the absence of the seek penalty that's inherent +for the rotational devices. The blocks can be typically written faster and +are not offloaded to separate threads. 
+ +NOTE: Since 4.14, the block layout optimizations have been dropped. This used +to help with the first generations of SSD devices. Their FTL (flash translation +layer) was not effective and the optimization was supposed to improve the wear +by better aligning blocks. This is no longer true with modern SSD devices and +the optimization had no real benefit. Furthermore it caused increased +fragmentation. The layout tuning has been kept intact for the option +'ssd_spread'. + The 'ssd_spread' mount option attempts to allocate into bigger and aligned chunks of unused space, and may perform better on low-end SSDs. 'ssd_spread' @@ -354,25 +398,26 @@ will disable all SSD options. *subvol='path'*:: Mount subvolume from 'path' rather than the toplevel subvolume. The -'path' is absolute (ie. starts at the toplevel subvolume). +'path' is always treated as relative to the toplevel subvolume. This mount option overrides the default subvolume set for the given filesystem. *subvolid='subvolid'*:: Mount subvolume specified by a 'subvolid' number rather than the toplevel -subvolume. You can use *btrfs subvolume list* to see subvolume ID numbers. +subvolume. You can use *btrfs subvolume list* or *btrfs subvolume show* to see +subvolume ID numbers. This mount option overrides the default subvolume set for the given filesystem. + NOTE: if both 'subvolid' and 'subvol' are specified, they must point at the -same subvolume, otherwise mount will fail. +same subvolume, otherwise the mount will fail. *thread_pool='number'*:: (default: min(NRCPUS + 2, 8) ) + -The number of worker threads to allocate. NRCPUS is number of on-line CPUs +The number of worker threads to start. NRCPUS is the number of on-line CPUs detected at the time of mount. Small number leads to less parallelism in processing data and metadata, higher numbers could lead to a performance hit -due to increased locking contention, cache-line bouncing or costly data -transfers between local CPU memories. 
+due to increased locking contention, process scheduling, cache-line bouncing or +costly data transfers between local CPU memories. *treelog*:: *notreelog*:: @@ -384,13 +429,14 @@ are flushed at sync and transaction commit. If the system crashes between two such syncs, the pending tree log operations are replayed during mount. + WARNING: currently, the tree log is replayed even with a read-only mount! To -disable that behaviour, mount also with 'nologreplay'. +disable that behaviour, also mount with 'nologreplay'. + The tree log could contain new files/directories, these would not exist on a mounted filesystem if the log is not replayed. *usebackuproot*:: *nousebackuproot*:: +(since: 4.6, default: off) + Enable autorecovery attempts if a bad tree root is found at mount time. Currently this scans a backup list of several previous tree roots and tries to @@ -403,6 +449,11 @@ NOTE: This option has replaced 'recovery'. + Allow subvolumes to be deleted by their respective owner. Otherwise, only the root user can do that. ++ +NOTE: historically, any user could create a snapshot even if they were not the owner +of the source subvolume. The subvolume deletion has been restricted for that +reason. The subvolume creation has been restricted but this mount option is +still required. This is a usability issue and will be addressed in the future. DEPRECATED MOUNT OPTIONS ~~~~~~~~~~~~~~~~~~~~~~~~