Tree checker
============

The tree checker is a feature that verifies metadata blocks before they are
written to, or after they are read from, the devices. The b-tree nodes contain
several items describing the filesystem structures, and to some degree these
can be verified for consistency or validity. This check is done in addition to
the checksums: a checksum only verifies the overall integrity of the block,
while the tree checker tries to validate and cross-reference its logical
structure. The cost is a slight performance hit, comparable to calculating the
checksum, with no noticeable impact in practice, and the checks catch a wide
range of errors.

There are two occasions when the checks are done:

Pre-write checks
----------------

When metadata blocks are in memory and about to be written to permanent
storage, the checks are performed before the checksums are calculated. This
can catch random corruption of the blocks (or pages) caused by bugs, by other
parts of the system, or by hardware errors (namely faulty RAM).

Once a block does not pass the checks, the filesystem refuses to write more data
and switches to read-only mode to prevent further damage. At this point some of
the recent metadata updates are held *only* in memory, so it's best not to
panic: try to remember what files could be affected and copy them elsewhere.
Once the filesystem gets unmounted, the most recent changes are unfortunately
lost. The filesystem that is stored on the device is still consistent and should
mount fine.

A message may look like:

.. code-block:: none

   [ 1716.823895] BTRFS critical (device vdb): corrupt leaf: root=18446744073709551607 block=38092800 slot=0, invalid key objectid: has 1 expect 6 or [256, 18446744073709551360] or 18446744073709551604
   [ 1716.829499] BTRFS info (device vdb): leaf 38092800 gen 19 total ptrs 4 free space 15851 owner 18446744073709551607
   [ 1716.832891] BTRFS info (device vdb): refs 3 lock (w:0 r:0 bw:0 br:0 sw:0 sr:0) lock_owner 0 current 1506
   [ 1716.836054]         item 0 key (1 1 0) itemoff 16123 itemsize 160
   [ 1716.837993]                 inode generation 1 size 0 mode 100600
   [ 1716.839760]         item 1 key (256 1 0) itemoff 15963 itemsize 160
   [ 1716.841742]                 inode generation 4 size 0 mode 40755
   [ 1716.843393]         item 2 key (256 12 256) itemoff 15951 itemsize 12
   [ 1716.845320]         item 3 key (18446744073709551611 48 1) itemoff 15951 itemsize 0
   [ 1716.847505] BTRFS error (device vdb): block=38092800 write time tree block corruption detected

The lines before the *write time tree block corruption detected* message are
specific to the error that was found.

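The large decimal numbers in such a message are 64-bit unsigned values, and
many of them are easier to read as small negative numbers. The following Python
sketch is not part of btrfs; it only reinterprets values copied from the
example message. The helper name ``u64_to_s64`` is made up for illustration,
and the identifiers in the comments are the names these values are believed to
have in the kernel headers, so treat them as a reading aid rather than an
authoritative reference.

```python
def u64_to_s64(value: int) -> int:
    """Reinterpret an unsigned 64-bit number as a signed 64-bit number."""
    if not 0 <= value < 2**64:
        raise ValueError("not a 64-bit unsigned value")
    return value - 2**64 if value >= 2**63 else value


# Values copied from the example message. The names in the comments are
# assumptions based on the kernel headers, not something the message states.
values = {
    "root": 18446744073709551607,            # believed: BTRFS_DATA_RELOC_TREE_OBJECTID
    "range end": 18446744073709551360,       # believed: BTRFS_LAST_FREE_OBJECTID
    "last alternative": 18446744073709551604,  # believed: BTRFS_FREE_INO_OBJECTID
    "item 3 objectid": 18446744073709551611,   # believed: BTRFS_ORPHAN_OBJECTID
}

for name, value in values.items():
    print(f"{name}: {value} = {u64_to_s64(value)}")
```

Read this way, the message says that objectid 1 was found in tree -9, where
only 6, the range from 256 to -256 (as unsigned numbers), or -12 would have
been accepted.
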
Post-read checks
----------------

Metadata blocks get verified right after they're read from the devices and the
checksum is found to be valid. This protects against changes to the metadata
that also update the checksum. Such changes are unlikely to happen accidentally
and are rather the result of intentional corruption or fuzzing.

A message may look like:

.. code-block:: none

   [ 4823.612832] BTRFS critical (device vdb): corrupt leaf: root=7 block=30474240 slot=0, invalid nritems, have 0 should not be 0 for non-root leaf
   [ 4823.616798] BTRFS error (device vdb): block=30474240 read time tree block corruption detected

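The difference between what a checksum detects and what the tree checker
detects can be illustrated by a small Python sketch. Everything in it is a toy
model made up for this example: the block layout, the function names, and the
use of ``zlib.crc32`` as a stand-in for the filesystem checksum. The only
structural rule it checks is the one from the message above, that a non-root
leaf must not be empty.

```python
import struct
import zlib

# Toy block layout (an assumption for this sketch, not the on-disk format):
# 4-byte checksum, 2-byte item count, 1-byte "is root" flag.


def build_block(nritems: int, is_root: bool) -> bytes:
    """Serialize the toy block and prepend a checksum of its payload."""
    payload = struct.pack("<HB", nritems, int(is_root))
    return struct.pack("<I", zlib.crc32(payload)) + payload


def checksum_ok(block: bytes) -> bool:
    """Compare the stored checksum with one computed over the payload."""
    (stored,) = struct.unpack_from("<I", block, 0)
    return stored == zlib.crc32(block[4:])


def structure_ok(block: bytes) -> bool:
    """Structural rule: a leaf that is not a root must contain items."""
    nritems, is_root = struct.unpack_from("<HB", block, 4)
    return nritems > 0 or bool(is_root)


good = build_block(nritems=3, is_root=False)

# A bit flipped after the checksum was stored: the item count becomes 2.
damaged = bytearray(good)
damaged[4] ^= 0x01
damaged = bytes(damaged)

# Invalid content whose checksum was recomputed to match it.
forged = build_block(nritems=0, is_root=False)

for name, block in (("good", good), ("damaged", damaged), ("forged", forged)):
    print(f"{name}: checksum ok={checksum_ok(block)}, "
          f"structure ok={structure_ok(block)}")
```

In this model the damaged block is caught only by the checksum, because an item
count of 2 is still plausible, while the forged block is caught only by the
structural check. This is why the two mechanisms complement each other.
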
The checks
----------

As implemented right now, the metadata consistency checks are limited to a
single b-tree node and the items stored in it, i.e. there's no extensive or
broad check done, e.g. against other data structures in other b-tree nodes.
This still provides enough opportunities to verify the consistency of
individual items, besides verifying the general validity of the items, like
their length or offset. Each b-tree item is also coupled with a key, so proper
key ordering is also part of the check and can reveal random bitflips in the
sequence (this has been the most successful detector of faulty RAM).

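To make the idea concrete, here is a minimal Python sketch of per-leaf checks
of the kind described above: keys must be strictly increasing, and the item
data must be laid out without gaps or overlaps. It is a simplified model and
not the kernel implementation. The names ``Item`` and ``check_leaf`` are
hypothetical, and the leaf data size of 16283 bytes is an assumption (a 16KiB
node minus a 101 byte header). The item values are taken from the pre-write
example message.

```python
from typing import List, NamedTuple, Optional, Tuple


class Item(NamedTuple):
    key: Tuple[int, int, int]  # (objectid, type, offset)
    itemoff: int               # start of the item data in the leaf data area
    itemsize: int              # length of the item data


def check_leaf(items: List[Item], data_size: int) -> Optional[str]:
    """Return a description of the first problem found, or None."""
    prev_key = None
    expected_end = data_size   # the data of slot 0 ends at the end of the leaf
    for slot, item in enumerate(items):
        # Tuples compare element by element: objectid, then type, then offset.
        if prev_key is not None and item.key <= prev_key:
            return f"slot={slot}: bad key order"
        # The data of each item must end where the previous item's data starts.
        if item.itemoff + item.itemsize != expected_end:
            return f"slot={slot}: unexpected item end"
        prev_key = item.key
        expected_end = item.itemoff
    return None


LEAF_DATA_SIZE = 16283  # assumption: 16KiB node minus a 101 byte header

leaf = [
    Item((1, 1, 0), 16123, 160),
    Item((256, 1, 0), 15963, 160),
    Item((256, 12, 256), 15951, 12),
    Item((18446744073709551611, 48, 1), 15951, 0),
]

# Simulate a single bit flip in the objectid of slot 1 (256 becomes 1280).
flipped = list(leaf)
flipped[1] = leaf[1]._replace(key=(256 ^ (1 << 10), 1, 0))

print(check_leaf(leaf, LEAF_DATA_SIZE))
print(check_leaf(flipped, LEAF_DATA_SIZE))
```

The leaf from the message passes both of these checks (it was rejected for an
invalid objectid, which this sketch does not model). The copy with the flipped
bit is reported as a key ordering problem at slot 2, the first slot whose key
is no longer greater than its predecessor.
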
The capabilities of the tree checker have been improved over time, and it's
possible that a filesystem created on an older kernel may trigger warnings or
fail some checks on a newer one.