btrfs-progs: docs: add section about raid56

Used sources:

- wiki
- IRC discussions
- https://lore.kernel.org/linux-btrfs/20200627032414.GX10769@hungrycats.org

Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba 2021-03-04 13:47:26 +01:00
parent 2aeaea41a8
commit 922797e155
2 changed files with 47 additions and 2 deletions


@@ -20,6 +20,7 @@ tools. Currently covers:
. control device
. filesystems with multiple block group profiles
. seeding device
. raid56 status and recommended practices
MOUNT OPTIONS
@@ -1089,6 +1090,51 @@ A few things to note:
* each new mount of the seeding device gets a new random UUID
RAID56 STATUS AND RECOMMENDED PRACTICES
---------------------------------------
The RAID56 feature provides striping and parity over several devices, same as
the traditional RAID5/6. There are some implementation and design deficiencies
that make it unreliable in some corner cases, and the feature **should not be
used in production, only for evaluation or testing**. In particular, the power
failure safety for metadata with RAID56 is not 100%.
Metadata
~~~~~~~~
Do not use 'raid5' or 'raid6' for metadata. Use 'raid1' or 'raid1c3'
respectively.
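For example, a new filesystem with the recommended profile pairing could be
created like this (a sketch, the device names are illustrative):

  # mkfs.btrfs -d raid5 -m raid1 /dev/sda /dev/sdb /dev/sdc
  # mkfs.btrfs -d raid6 -m raid1c3 /dev/sda /dev/sdb /dev/sdc /dev/sdd
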
The substitute profiles provide the same guarantees against the loss of 1 or 2
devices, and in some respects can be an improvement. Recovering from one
missing device only needs to access the remaining 1st or 2nd copy, which in
general may be stored on some other devices due to the way RAID1 works on
btrfs, unlike a striped profile (similar to 'raid0') that would need all
devices all the time.
The space allocation pattern and consumption is different (eg. on N devices):
for 'raid5' as an example, a 1GiB chunk is reserved on each device, while with
'raid1' each 1GiB chunk is stored on 2 devices. The consumption of each 1GiB
of used metadata is then 'N * 1GiB' for 'raid5' vs '2 * 1GiB' for 'raid1'.
Using 'raid1' is also more convenient for balancing/converting to other
profiles due to the lower requirement on the available chunk space.
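On an existing filesystem, metadata can be converted to the recommended
profiles with balance filters, for example (the mount point is illustrative):

  # btrfs balance start -mconvert=raid1 /mnt

The conversion needs enough unallocated space on the devices and may take a
long time on a large filesystem.
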
Missing/incomplete support
~~~~~~~~~~~~~~~~~~~~~~~~~~
When RAID56 is used on the same filesystem together with other raid profiles,
the space reporting is inaccurate, eg. in 'df', 'btrfs filesystem df' or
'btrfs filesystem usage'. When there's only one profile per block group type
(eg. raid5 for data) the reporting is accurate.
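A mixed-profile situation can be recognized from the per-profile listing, for
example (illustrative output, here with an unfinished conversion of data from
'single' to 'raid5'):

  $ btrfs filesystem df /mnt
  Data, RAID5: total=2.00GiB, used=1.50GiB
  Data, single: total=1.00GiB, used=512.00MiB
  Metadata, RAID1: total=1.00GiB, used=384.00KiB
  System, RAID1: total=32.00MiB, used=16.00KiB
  GlobalReserve, single: total=16.00MiB, used=0.00B
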
When scrub is started on a RAID56 filesystem, it's started on all devices at
once, which degrades the performance. The workaround is to start it on each
device separately. Due to that, the device stats may not match the actual
state and some errors might get reported multiple times.
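The per-device workaround could look like this (a sketch, the device names are
illustrative; '-B' waits until the scrub of the given device finishes):

  # btrfs scrub start -B /dev/sda
  # btrfs scrub start -B /dev/sdb
  # btrfs scrub start -B /dev/sdc
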
The 'write hole' problem: a partially written stripe (eg. after a crash or
power loss) can leave the parity inconsistent with the data, so a subsequent
loss of a device can corrupt data in that stripe that was otherwise intact.
SEE ALSO
--------
`acl`(5),


@@ -205,8 +205,7 @@ root partition created with RAID1/10/5/6 profiles. The mount action can happen
before all block devices are discovered. The waiting is usually done on the
initramfs/initrd systems.
As of kernel 4.14, RAID5/6 is still considered experimental and shouldn't be
employed for production use.
RAID5/6 has known problems and should not be used in production.
FILESYSTEM FEATURES
-------------------