mirror of
https://github.com/kdave/btrfs-progs
synced 2025-01-13 09:11:36 +00:00
btrfs-progs: docs: more about hardware considerations
Make it a new chapter with sections. The SSD and firmware parts were
inspired by Zygo's more detailed writeup at
https://github.com/kdave/btrfs-progs/issues/319#issuecomment-739423260

Signed-off-by: David Sterba <dsterba@suse.com>
parent 94f3b75c00
commit 78501931de
@@ -24,6 +24,7 @@ tools. Currently covers:
 . seeding device
 . raid56 status and recommended practices
 . storage model
+. hardware considerations


 MOUNT OPTIONS
@@ -1466,15 +1467,16 @@ such block the data inside would not be consistent with the rest. To rule that
 out there's embedded block number in the metadata block. It's the logical
 block number because this is what the logical structure expects and verifies.

+
 HARDWARE CONSIDERATIONS
-~~~~~~~~~~~~~~~~~~~~~~~
+-----------------------

 The following is based on information publicly available, user feedback,
 community discussions or bug report analyses. It's not complete and further
 research is encouraged when in doubt.

-HARDWARE CONSIDERATIONS - MEMORY
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+MAIN MEMORY
+~~~~~~~~~~~

 The data structures and raw data blocks are temporarily stored in computer
 memory before they get written to the device. It is critical that memory is
@@ -1499,15 +1501,25 @@ have been demonstrated ('rowhammer') achieving specific bits to be flipped.
 While these were targeted, this shows that a series of reads or writes can
 affect unrelated parts of memory.

+Further reading:
+
+- https://en.wikipedia.org/wiki/Row_hammer
+
+DIRECT MEMORY ACCESS (DMA)
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 Another class of errors is related to DMA (direct memory access) performed
 by device drivers. While this could be considered a software error, the
 data transfers that happen without CPU assistance may accidentally corrupt
 other pages. Storage devices utilize DMA for performance reasons, the
 filesystem structures and data pages are passed back and forth, making
-errors possible.
+errors possible in case page lifetime is not properly tracked.

-HARDWARE CONSIDERATIONS - ROTATIONAL DISKS
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+There are lots of quirks (device-specific workarounds) in the Linux kernel
+drivers (regarding not only DMA) that are added when found.
+
+ROTATIONAL DISKS (HDD)
+~~~~~~~~~~~~~~~~~~~~~~

 Rotational HDDs typically fail at the level of individual sectors or small clusters.
 Read failures are caught on the levels below the filesystem and are returned to
@@ -1524,12 +1536,82 @@ unexpected physical conditions or unsupported use cases.
 Disks are connected by cables with two ends, both of which can cause problems
 when not attached properly. Data transfers are protected by checksums and the
 lower layers try hard to transfer the data correctly or not at all. The errors
-from badly-connecting cables
-may manifest as large amount of failed read or write requests, or as short
-error bursts depending on physical conditions.
+from badly-connecting cables may manifest as a large number of failed read or
+write requests, or as short error bursts depending on physical conditions.

-HARDWARE CONSIDERATIONS - SD FLASH CARDS
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+SOLID STATE DRIVES (SSD)
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+The mechanism of information storage is different from HDDs and this affects
+the failure mode as well. The data are stored in cells grouped in large blocks
+with a limited number of resets and other write constraints. The firmware tries
+to avoid unnecessary resets and performs optimizations to maximize the storage
+media lifetime. The known techniques are deduplication (blocks with the same
+fingerprint/hash are mapped to the same physical block), compression or internal
+remapping and garbage collection of used memory cells. Due to the additional
+processing there are measures to verify the data, e.g. by ECC codes.
+
+Observations of failing SSDs show that either the whole electronics fail at
+once or the failure affects a lot of data (e.g. stored on one chip). Recovering
+such data may need specialized equipment, and reading the data repeatedly does
+not help the way it can with HDDs.
+
+There are several technologies of the memory cells with different
+characteristics and price. The lifetime is directly affected by the type and
+frequency of data written. Writing "too much" distinct data (e.g. encrypted)
+may render the internal deduplication ineffective and lead to a lot of rewrites
+and increased wear of the memory cells.
+
+There are several technologies and manufacturers so it's hard to describe them
+but there are some that exhibit similar behaviour:
+
+- expensive SSD will use more durable memory cells and is optimized
+  for reliability and high load
+- cheap SSD is designed for a lower load ("desktop user") and is optimized for
+  cost, it may employ the optimizations and/or extended error reporting partially
+  or not at all
+
+It's not possible to reliably determine the expected lifetime of an SSD due to
+lack of information about how it works or due to lack of reliable stats provided
+by the device.
+
+Metadata writes tend to be the biggest component of lifetime writes to an SSD,
+so there is some value in reducing them. Depending on the device class (high
+end/low end) the features like DUP block group profiles may affect the
+reliability in both ways:
+
+- 'high end' are typically more reliable and using 'single' for data and metadata
+  could be suitable to reduce device wear
+- 'low end' could lack ability to identify errors so an additional
+  redundancy at the filesystem level (checksums, 'DUP') could help
+
+Only users who consume 50 to 100% of the SSD's actual lifetime writes need to be
+concerned about the write amplification of btrfs DUP metadata. Most users will be
+far below 50% of the actual lifetime, or will write the drive to death and
+discover how many writes 100% of the actual lifetime was. SSD firmware often
+adds its own write multipliers that can be arbitrary and unpredictable and
+dependent on application behavior, and these will typically have far greater
+effect on SSD lifespan than DUP metadata. It's more or less impossible to
+predict when an SSD will run out of lifetime writes to within a factor of two, so
+it's hard to justify wear reduction as a benefit.
+
+Further reading:
+
+- https://www.snia.org/educational-library/ssd-and-deduplication-end-spinning-disk-2012
+- https://www.snia.org/educational-library/realities-solid-state-storage-2013-2013
+- https://www.snia.org/educational-library/ssd-performance-primer-2013
+- https://www.snia.org/educational-library/how-controllers-maximize-ssd-life-2013
+
+DRIVE FIRMWARE
+~~~~~~~~~~~~~~
+
+Firmware is technically still software but embedded into the hardware. As all
+software has bugs, so does firmware. Storage devices can update the firmware
+and fix known bugs. In some cases it's possible to avoid certain bugs by
+quirks (device-specific workarounds) in the Linux kernel.
+
+SD FLASH CARDS
+~~~~~~~~~~~~~~

 There are a lot of devices with low power consumption and thus using storage
 media based on low power consumption, typically flash memory stored on
@@ -1537,8 +1619,8 @@ a chip enclosed in a detachable card package. An improperly inserted card may be
 damaged by electrical spikes when the device is turned on or off. The chips
 storing data in turn may be damaged permanently. All types of flash memory
 have a limited number of rewrites, so the data are internally
-translated by FTL (flash translation layer). This is implemented in firmware (software) and
-prone to bugs that manifest as hadrware errors.
+translated by FTL (flash translation layer). This is implemented in firmware
+(software) and prone to bugs that manifest as hardware errors.

 Adding redundancy like using DUP profiles for both data and metadata can help
 in some cases.
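
The remark about redundancy helping on devices that cannot identify errors themselves can be illustrated with a small Python sketch. This is not btrfs code. It assumes a checksum is kept apart from the block and that two copies of the block exist, as with the DUP profile. `zlib.crc32` is a stand-in for crc32c (the btrfs default checksum), and the names `checksum` and `read_with_dup` are hypothetical.

```python
import zlib
from typing import Sequence


def checksum(block: bytes) -> int:
    # Stand-in checksum: btrfs defaults to crc32c, which is not in the
    # Python standard library, so plain CRC-32 is used for illustration.
    return zlib.crc32(block)


def read_with_dup(copies: Sequence[bytes], expected_csum: int) -> bytes:
    """Return the first copy whose checksum matches the stored one."""
    for copy in copies:
        if checksum(copy) == expected_csum:
            return copy
    # No copy verified: the error can be detected but not repaired.
    raise IOError("all copies failed checksum verification")


good = b"metadata block contents"
csum = checksum(good)                    # stored when the block was written
corrupted = b"metadata block c0ntents"   # device silently returned bad data

# With two copies, the intact one is found and returned.
recovered = read_with_dup([corrupted, good], csum)
```

With a single copy the same corruption would still be detected by the checksum, but there would be nothing to recover from, which is the case the sketch reports with `IOError`.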
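
The SSD paragraph on lifetime writes and DUP metadata can be made concrete with a back-of-the-envelope calculation. The Python sketch below is only an illustration under stated assumptions: the function name, the daily write volume, the metadata-to-data ratio and the rated endurance are all made-up inputs, and firmware-level write multipliers (which the text says usually dominate) are ignored entirely.

```python
def lifetime_fraction(data_gib_per_day: float,
                      metadata_ratio: float,
                      years: float,
                      rated_tib_written: float,
                      dup_metadata: bool) -> float:
    """Fraction of the rated lifetime writes consumed (hypothetical model)."""
    metadata_gib_per_day = data_gib_per_day * metadata_ratio
    if dup_metadata:
        # DUP stores two copies of each metadata block on the same device.
        metadata_gib_per_day *= 2
    total_gib = (data_gib_per_day + metadata_gib_per_day) * 365 * years
    return total_gib / (rated_tib_written * 1024)


# Assumed workload: 20 GiB of data per day, metadata writes half as large
# as data writes, 5 years of use, drive rated for 150 TiB written.
single = lifetime_fraction(20, 0.5, 5, 150, dup_metadata=False)
dup = lifetime_fraction(20, 0.5, 5, 150, dup_metadata=True)
print(f"single metadata: {single:.0%} of rated writes")
print(f"DUP metadata:    {dup:.0%} of rated writes")
```

With these invented inputs both results stay below half of the rated figure, which matches the point that DUP metadata only matters for users already close to the drive's actual limit. Different inputs can of course give a different picture.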