diff --git a/Documentation/btrfs-balance.asciidoc b/Documentation/btrfs-balance.asciidoc
index 7df40b9c..fff8a321 100644
--- a/Documentation/btrfs-balance.asciidoc
+++ b/Documentation/btrfs-balance.asciidoc
@@ -51,17 +51,32 @@ NOTE: A short syntax *btrfs balance <path>* works due to backward compatibility
 but is deprecated and should not be used anymore. Use *btrfs balance start*
 command instead.
 
+PERFORMANCE IMPLICATIONS
+------------------------
+
+The balance operation is intense, namely in terms of IO, but it can also be
+CPU intensive. It affects other actions on the filesystem. There is typically
+a lot of data being copied from one location to another, and a lot of metadata
+gets updated.
+
+Depending on the actual block group layout, it can also be seek-heavy. The
+performance on rotational devices is noticeably worse than on SSDs or fast
+arrays.
+
 SUBCOMMAND
 ----------
 *cancel* <path>::
-cancel running or paused balance
+cancel a running or paused balance, the command will block and wait until the
+currently processed block group is finished
 
 *pause* <path>::
 pause running balance operation, this will store the state of the balance
 progress and used filters to the filesystem
 
 *resume* <path>::
-resume interrupted balance
+resume an interrupted balance, the balance status must be stored on the
+filesystem from a previous run, e.g. after it was forcibly interrupted and the
+filesystem mounted again with 'skip_balance'
 
 *start* [options] <path>::
 start the balance operation according to the specified filters, no filters
@@ -73,6 +88,10 @@ filesystem size. To prevent starting a full balance by accident, the user is
 warned and has a few seconds to cancel the operation before it starts. The
 warning and delay can be skipped with '--full-balance' option.
 +
+Please note that the filters must be written immediately next to the '-d',
+'-m' and '-s' options, with no space (e.g. '-dusage=50'), because the filters
+are optional and a bare '-d' etc. also works and means no filters.
++
 `Options`
 +
 -d[<filters>]::::
@@ -94,7 +113,7 @@ If '-v' option is given, output will be verbose.
 FILTERS
 -------
 From kernel 3.3 onwards, btrfs balance can limit its action to a subset of the
-full filesystem, and can be used to change the replication configuration (e.g.
+whole filesystem, and can be used to change the replication configuration (e.g.
 moving data from single to RAID1). This functionality is accessed through the
 '-d', '-m' or '-s' options to btrfs balance start, which filter on data,
 metadata and system blocks respectively.
@@ -140,6 +159,9 @@ parameters.
 +
 NOTE: starting with kernel 4.5, the 'data' chunks can be converted to/from the
 'DUP' profile on a single device.
++
+NOTE: starting with kernel 4.6, all profiles can be converted to/from 'DUP' on
+multi-device filesystems.
 
 *limit=<number>*::
 *limit=<range>*::
@@ -206,6 +228,128 @@ Conversion to profiles based on striping (RAID0, RAID5/6) require the work
 space on each device. An interrupted balance may leave partially filled block
 groups that might consume the work space.
 
+EXAMPLES
+--------
+
+A more comprehensive example of going from one device to multiple devices, and
+back, can be found in the section 'TYPICAL USECASES' of `btrfs-device`(8).
+
+MAKING BLOCK GROUP LAYOUT MORE COMPACT
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The layout of block groups is not normally visible; most tools report only
+summarized numbers of free or used space, but there are still some hints
+provided.
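+
+A quick way to obtain these summarized numbers (a minimal illustration,
+assuming the filesystem is mounted at '/path') is with the following
+read-only commands:
+
+--------------------
+$ btrfs fi df /path
+$ btrfs fi usage /path
+--------------------
+
+The second command, *btrfs fi usage*, prints a more detailed breakdown
+including per-device numbers; the per-profile summary of *btrfs fi df* is
+enough for the examples below.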
+
+Let's use the following real-life example and start with the output:
+
+--------------------
+$ btrfs fi df /path
+Data, single: total=75.81GiB, used=64.44GiB
+System, RAID1: total=32.00MiB, used=20.00KiB
+Metadata, RAID1: total=15.87GiB, used=8.84GiB
+GlobalReserve, single: total=512.00MiB, used=0.00B
+--------------------
+
+Roughly calculating for data, '75G - 64G = 11G', the used/total ratio is
+about '85%'. How can we interpret that:
+
+* chunks are filled by 85% on average, i.e. the 'usage' filter with anything
+  smaller than 85 will likely not affect anything
+* in a more realistic scenario, the space is distributed unevenly, we can
+  assume there are completely used chunks and the remaining ones are partially
+  filled
+
+Compacting the layout could be used on both. In the former case it would
+spread the data of a given chunk into the other chunks and then remove it.
+Here we can estimate that roughly 850 MiB of data have to be moved (85% of a
+1 GiB chunk).
+
+In the latter case, targeting the partially used chunks will move less data
+and thus will be faster. A typical filter command would look like:
+
+--------------------
+# btrfs balance start -dusage=50 /path
+Done, had to relocate 2 out of 97 chunks
+
+$ btrfs fi df /path
+Data, single: total=74.03GiB, used=64.43GiB
+System, RAID1: total=32.00MiB, used=20.00KiB
+Metadata, RAID1: total=15.87GiB, used=8.84GiB
+GlobalReserve, single: total=512.00MiB, used=0.00B
+--------------------
+
+As you can see, the 'total' amount of data decreased by not quite 2 GiB, which
+is an expected result given that only 2 chunks were relocated. Let's see what
+will happen when we increase the estimated usage filter.
+
+--------------------
+# btrfs balance start -dusage=85 /path
+Done, had to relocate 13 out of 95 chunks
+
+$ btrfs fi df /path
+Data, single: total=68.03GiB, used=64.43GiB
+System, RAID1: total=32.00MiB, used=20.00KiB
+Metadata, RAID1: total=15.87GiB, used=8.85GiB
+GlobalReserve, single: total=512.00MiB, used=0.00B
+--------------------
+
+Now the used/total ratio is about 94% and we moved about '74G - 68G = 6G' of
+data to the remaining block groups, i.e. the 6 GiB are now free of filesystem
+structures and can be reused for new data or metadata block groups.
+
+We can do a similar exercise with the metadata block groups, but this is
+typically not necessary, unless the used/total ratio is really off. Here the
+ratio is roughly 50%, but the difference as an absolute number is "a few
+gigabytes", which can be considered normal for a workload with snapshots or
+reflinks updated frequently.
+
+--------------------
+# btrfs balance start -musage=50 /path
+Done, had to relocate 4 out of 89 chunks
+
+$ btrfs fi df /path
+Data, single: total=68.03GiB, used=64.43GiB
+System, RAID1: total=32.00MiB, used=20.00KiB
+Metadata, RAID1: total=14.87GiB, used=8.85GiB
+GlobalReserve, single: total=512.00MiB, used=0.00B
+--------------------
+
+Just a 1 GiB decrease, which possibly means there are block groups with good
+utilization. Making the metadata layout more compact would in turn require
+updating more metadata structures, i.e. lots of IO. As running out of metadata
+space is a more severe problem, it's not necessary to keep the utilization
+ratio too high. For the purpose of this example, let's see the effects of
+further compaction:
+
+--------------------
+# btrfs balance start -musage=70 /path
+Done, had to relocate 13 out of 88 chunks
+
+$ btrfs fi df /path
+Data, single: total=68.03GiB, used=64.43GiB
+System, RAID1: total=32.00MiB, used=20.00KiB
+Metadata, RAID1: total=11.97GiB, used=8.83GiB
+GlobalReserve, single: total=512.00MiB, used=0.00B
+--------------------
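+
+To recompute the used/total ratio after each pass without doing the arithmetic
+by hand, a small sketch like the following can be used (a hedged example, not
+part of the btrfs tooling, assuming the 'Data' line format shown above with
+both values printed in GiB):
+
+--------------------
+$ btrfs fi df /path | awk '
+    # assumes total= and used= are both reported in GiB, as in this example
+    /^Data/ {
+        gsub(/[^0-9.]/, "", $3); gsub(/[^0-9.]/, "", $4)
+        printf "data used/total: %.0f%%\n", 100 * $4 / $3
+    }'
+--------------------
+
+With the values above, this would print roughly '95%' for the data block
+groups.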
+
+GETTING RID OF COMPLETELY UNUSED BLOCK GROUPS
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Normally the balance operation needs work space to temporarily move the data
+before the old block groups get removed. If there's no work space, it ends
+with 'no space left'.
+
+There's a special case when the block groups are completely unused, possibly
+left after removing lots of files or deleting snapshots. Removing empty block
+groups is automatic since kernel 3.18. The same can be achieved manually, with
+the notable exception that this operation does not require any work space.
+Thus it can be used to reclaim the unused block groups and make the space
+available again.
+
+--------------------
+# btrfs balance start -dusage=0 /path
+--------------------
+
+This should lead to a decrease in the 'total' numbers in the *btrfs fi df*
+output.
+
 EXIT STATUS
 -----------
 *btrfs balance* returns a zero exit status if it succeeds. Non zero is