Btrees
======

Btrees Introduction
-------------------

Btrfs uses a single set of btree manipulation code for all metadata in
the filesystem. For performance or organizational purposes, the trees
are broken up into a few different types, and each type of tree will
hold a few different types of keys. The super block holds pointers to
the roots of two of these trees: the tree of tree roots and the chunk
tree.

Tree of Tree roots
------------------
This tree is used for indexing and finding the root of most of the other
trees in the filesystem. It attaches names to subvolumes and snapshots,
and stores the location of the extent allocation tree root. It also
stores pointers to all of the subvolumes or snapshots that are being
deleted by the transaction code. This allows the deletion to pick up
where it left off after a crash.

Chunk Tree
----------
The chunk tree does all of the logical to physical block address mapping
for the filesystem, and it stores information about all of the devices
in the filesystem. In order to bootstrap lookup in the chunk tree, the super
block also duplicates the chunk items needed to resolve blocks in the
chunk tree. Over time, the chunk tree will be split into multiple roots
to allow access of larger storage pools.

There are back references from the chunk items to the extent tree that
allocated them. Only a single extent tree can allocate extents out of a
given chunk.

Two types of key are stored in the chunk tree:

- DEV_ITEM (where the offset field is the internal devid), which
  contains information on the underlying block devices in the
  filesystem
- CHUNK_ITEM (where the offset field is the start of the chunk as a
  virtual address), which maps a section of the virtual address space
  (a chunk) into physical storage.
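
The logical to physical mapping described above can be sketched roughly as follows. This is a minimal illustration and not the btrfs implementation: the ``Chunk`` record, its field names and the single-stripe layout are assumptions made for the example, and real chunk items carry more information (for instance several stripes per chunk).

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional, Tuple


class Chunk(NamedTuple):
    """Hypothetical, simplified stand-in for a CHUNK_ITEM (one stripe only)."""
    logical_start: int   # the key's offset field: start of the chunk (virtual address)
    length: int          # size of the chunk in bytes
    devid: int           # device that holds the single stripe
    physical_start: int  # byte offset of that stripe on the device


def logical_to_physical(chunks: List[Chunk], logical: int) -> Optional[Tuple[int, int]]:
    """Return (devid, physical offset) for a logical address, or None if unmapped.

    `chunks` must be sorted by logical_start, the way items are sorted by key.
    """
    starts = [c.logical_start for c in chunks]
    i = bisect_right(starts, logical) - 1  # last chunk starting at or before `logical`
    if i < 0:
        return None
    c = chunks[i]
    if logical >= c.logical_start + c.length:
        return None  # address falls in a hole between chunks
    return c.devid, c.physical_start + (logical - c.logical_start)


MiB = 1024 * 1024
chunks = [
    Chunk(logical_start=1 * MiB, length=8 * MiB, devid=1, physical_start=1 * MiB),
    Chunk(logical_start=16 * MiB, length=64 * MiB, devid=2, physical_start=4 * MiB),
]
print(logical_to_physical(chunks, 17 * MiB))
```

Because the key offset is the start of the chunk, finding the chunk that covers an address amounts to a "largest key not greater than" search, which the sorted list and ``bisect_right`` imitate here.
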

Device Allocation Tree
----------------------
The device allocation tree records which parts of each physical device
have been allocated into chunks. This is a relatively small tree that is
only updated as new chunks are allocated. It stores back references to
the chunk tree that allocated each physical extent on the device.
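
As a rough illustration of what this bookkeeping makes possible, the sketch below derives the unallocated ranges of one device from its allocated extents. The ``DevExtent`` record and its field names are hypothetical, chosen for the example only; they are not the on-disk structure.

```python
from typing import List, NamedTuple, Tuple


class DevExtent(NamedTuple):
    """Hypothetical, simplified record of one allocated range on a device."""
    physical_start: int  # byte offset on the device
    length: int          # size of the allocated range in bytes
    chunk_start: int     # back reference: logical start of the owning chunk


def unallocated_ranges(extents: List[DevExtent], device_size: int) -> List[Tuple[int, int]]:
    """Return (start, length) pairs for the parts of the device not in any extent."""
    gaps = []
    cursor = 0  # first byte not yet accounted for
    for ext in sorted(extents, key=lambda e: e.physical_start):
        if ext.physical_start > cursor:
            gaps.append((cursor, ext.physical_start - cursor))
        cursor = max(cursor, ext.physical_start + ext.length)
    if cursor < device_size:
        gaps.append((cursor, device_size - cursor))
    return gaps


MiB = 1024 * 1024
dev_extents = [
    DevExtent(physical_start=16 * MiB, length=64 * MiB, chunk_start=16 * MiB),
    DevExtent(physical_start=1 * MiB, length=8 * MiB, chunk_start=1 * MiB),
]
print(unallocated_ranges(dev_extents, device_size=128 * MiB))
```

A new chunk could then be placed in any gap that is large enough, and recording the chunk start in each extent is what lets the owner of a physical range be found again.
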

Extent Allocation Tree
----------------------
The extent allocation tree records byte ranges that are in use,
maintains reference counts on each extent and records back references to
the tree or file that is using each extent. Logical block groups are
created inside the extent allocation tree, and these reference large
logical extents from the chunk tree.

Each block group can only store a specific type of extent, for example
metadata, mirrored metadata, or striped data blocks.

Currently there is only one extent allocation tree shared by all the
other trees. This will change in order to scale better under load.

Keys for the extent tree use the start of the extent as the objectid. A
BLOCK_GROUP_ITEM key will be followed by the EXTENT_ITEM keys for
extents within that block group.
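
The key ordering described above can be illustrated with a small sketch. It assumes that keys are (objectid, type, offset) triples compared in that order and that the offset field of both item types holds a length in bytes; neither detail is stated in this section. The numeric type codes are believed to match the on-disk format but should be treated as illustrative.

```python
from typing import List, NamedTuple

# Type codes: assumed values, used here only to give the keys a sort order.
EXTENT_ITEM = 168
BLOCK_GROUP_ITEM = 192


class Key(NamedTuple):
    objectid: int  # start of the extent or block group
    type: int      # item type code
    offset: int    # assumed here to be the length in bytes


KiB = 1024
MiB = 1024 * KiB

keys = [
    Key(16 * MiB + 64 * KiB, EXTENT_ITEM, 16 * KiB),
    Key(80 * MiB, BLOCK_GROUP_ITEM, 64 * MiB),
    Key(16 * MiB, BLOCK_GROUP_ITEM, 64 * MiB),
    Key(16 * MiB + 16 * KiB, EXTENT_ITEM, 16 * KiB),
    Key(80 * MiB + 4 * KiB, EXTENT_ITEM, 4 * KiB),
]

# NamedTuple instances compare field by field, like tree keys.
ordered = sorted(keys)


def extents_in_block_group(ordered_keys: List[Key], block_group: Key) -> List[Key]:
    """Return the EXTENT_ITEM keys whose start lies inside the block group."""
    start = block_group.objectid
    end = start + block_group.offset
    return [k for k in ordered_keys
            if k.type == EXTENT_ITEM and start <= k.objectid < end]


for k in ordered:
    name = "BLOCK_GROUP_ITEM" if k.type == BLOCK_GROUP_ITEM else "EXTENT_ITEM"
    print(f"{k.objectid:>10} {name:<16} {k.offset}")
```

Because the objectid is compared first, each block group key is followed by the extents that start after it and before the next block group. The example deliberately avoids an extent that starts at exactly the same address as its block group, since the tie would then be decided by the type code.
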

FS Trees
--------
These store files and directories, and all of the normal metadata you
would expect to find in a filesystem. There is one root for each
subvolume or snapshot, but snapshots will share blocks between roots.

Keys in FS trees always use the inode number of the filesystem object as
the objectid.

Each object will have one or more of:

- Inode.
- Inode ref, indicating what name this object is known as, and in which
  directory.
- For files, a set of extent information, indicating where on the
  filesystem this file's data is.
- For directories, two sequences of dir_items, one indexed by a hash of
  the object name, and one indexed by a unique sequential index number.
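
The two directory sequences can be sketched as follows. This is an illustration under stated assumptions, not the btrfs code: the ``Directory`` class is hypothetical, the type labels ``DIR_ITEM`` and ``DIR_INDEX`` are names assumed for the two sequences, ``zlib.crc32`` is only a stand-in for the real name hash, and the starting index value is arbitrary.

```python
import zlib
from typing import Dict, List, Optional, Tuple


def name_hash(name: bytes) -> int:
    """Stand-in hash for the example; not claimed to be the hash btrfs uses."""
    return zlib.crc32(name)


class Directory:
    """Hypothetical model of the items stored for one directory inode."""

    def __init__(self, inode_number: int) -> None:
        self.inode_number = inode_number
        self.next_index = 0  # arbitrary starting value for the example
        # (objectid, type label, offset) -> list of (name, child inode number)
        self.items: Dict[Tuple[int, str, int], List[Tuple[bytes, int]]] = {}

    def add_entry(self, name: bytes, child_inode: int) -> None:
        entry = (name, child_inode)
        # Sequence 1: keyed by the hash of the name (colliding names share a key).
        by_hash = (self.inode_number, "DIR_ITEM", name_hash(name))
        self.items.setdefault(by_hash, []).append(entry)
        # Sequence 2: keyed by a unique, sequential index number.
        by_index = (self.inode_number, "DIR_INDEX", self.next_index)
        self.items[by_index] = [entry]
        self.next_index += 1

    def lookup(self, name: bytes) -> Optional[int]:
        """Find a child by name using the hash-keyed sequence."""
        key = (self.inode_number, "DIR_ITEM", name_hash(name))
        for entry_name, child in self.items.get(key, []):
            if entry_name == name:
                return child
        return None

    def names_in_index_order(self) -> List[bytes]:
        """List names using the index-keyed sequence."""
        keys = sorted(k for k in self.items if k[1] == "DIR_INDEX")
        return [self.items[k][0][0] for k in keys]


d = Directory(inode_number=256)
d.add_entry(b"zeta", 257)
d.add_entry(b"alpha", 258)
print(d.lookup(b"alpha"), d.names_in_index_order())
```

The hash-keyed sequence serves lookups by name, while the index-keyed sequence gives a stable order for listing the directory. Every key carries the directory's inode number as its objectid, as described above.
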

Checksum Tree
-------------
The checksum tree stores block checksums. Every 4 KiB block of data
stored on disk has a checksum associated with it. The "offset" part of
the keys in the checksum tree indicates the start of the checksummed
data on disk. The value stored with the key is a sequence of (currently
4-byte) checksums, one for each 4 KiB block starting at the offset.
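
The arithmetic implied by this layout can be sketched as below. The function name and the sample bytes are made up for the example, and the block and checksum sizes are simply the values quoted above.

```python
BLOCK_SIZE = 4096  # size of one checksummed data block, per the text
CSUM_SIZE = 4      # size of one checksum in bytes, per the text


def checksum_for(key_offset: int, value: bytes, address: int) -> bytes:
    """Return the checksum covering `address` from one checksum item.

    `key_offset` is the offset part of the item's key (start of the covered
    data) and `value` is the packed sequence of checksums stored with it.
    """
    if address < key_offset:
        raise ValueError("address precedes the range covered by this item")
    index = (address - key_offset) // BLOCK_SIZE  # which block after the start
    start = index * CSUM_SIZE                     # where its checksum sits in value
    if start + CSUM_SIZE > len(value):
        raise ValueError("address is past the range covered by this item")
    return value[start:start + CSUM_SIZE]


# One item covering three consecutive blocks, with placeholder checksum bytes.
item_offset = 1024 * 1024
item_value = bytes.fromhex("aaaaaaaa" "bbbbbbbb" "cccccccc")

address = item_offset + 2 * BLOCK_SIZE + 100  # a byte inside the third block
print(checksum_for(item_offset, item_value, address).hex())
```

Packing the checksums back to back means an item with N checksums covers N consecutive blocks, so a single key can describe a long run of data.
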

Data Relocation Tree
--------------------

Log Root Tree
-------------