112 lines
4.0 KiB
ReStructuredText
112 lines
4.0 KiB
ReStructuredText
Btrees
|
||
======
|
||
|
||
Btrees Introduction
|
||
-------------------
|
||
|
||
Btrfs uses a single set of btree manipulation code for all metadata in
|
||
the filesystem. For performance or organizational purposes, the trees
|
||
are broken up into a few different types, and each type of tree will
|
||
hold a few different types of keys. The super block holds pointers to
|
||
the tree roots of the tree of tree roots and the chunk tree.
|
||
|
||
|
||
Tree of Tree roots
|
||
------------------
|
||
|
||
This tree is used for indexing and finding the root of most of the other
|
||
trees in the filesystem. It attaches names to subvolumes and snapshots,
|
||
and stores the location of the extent allocation tree root. It also
|
||
stores pointers to all of the subvolumes or snapshots that are being
|
||
deleted by the transaction code. This allows the deletion to pick up
|
||
where it left off after a crash.
|
||
|
||
|
||
Chunk Tree
|
||
----------
|
||
|
||
The chunk tree does all of the logical to physical block address mapping
|
||
for the filesystem, and it stores information about all of the devices
|
||
in the FS. In order to bootstrap lookup in the chunk tree, the super
|
||
block also duplicates the chunk items needed to resolve blocks in the
|
||
chunk tree. Over time, the chunk tree will be split into multiple roots
|
||
to allow access of larger storage pools.
|
||
|
||
There are back references from the chunk items to the extent tree that
|
||
allocated them. Only a single extent tree can allocate extents out of a
|
||
given chunk.
|
||
|
||
Two types of key are stored in the chunk tree:
|
||
|
||
- DEV_ITEM (where the offset field is the internal devid), which
|
||
contain information on all of the underlying block devices in the
|
||
filesystem
|
||
- CHUNK_ITEM (where the offset field is the start of the chunk as a
|
||
virtual address), which maps a section of the virtual address space
|
||
(a chunk) into physical storage.
|
||
|
||
|
||
Device Allocation Tree
|
||
----------------------
|
||
|
||
The device allocation tree records which parts of each physical device
|
||
have been allocated into chunks. This is a relatively small tree that is
|
||
only updated as new chunks are allocated. It stores back references to
|
||
the chunk tree that allocated each physical extent on the device.
|
||
|
||
|
||
Extent Allocation Tree
|
||
----------------------
|
||
|
||
The extent allocation tree records byte ranges that are in use,
|
||
maintains reference counts on each extent and records back references to
|
||
the tree or file that is using each extent. Logical block groups are
|
||
created inside the extent allocation tree, and these reference large
|
||
logical extents from the chunk tree.
|
||
|
||
Each block group can only store a specific type of extent. This might
|
||
include metadata, or mirrored metadata, or striped data blocks etc.
|
||
|
||
Currently there is only one extent allocation tree shared by all the
|
||
other trees. This will change in order to scale better under load.
|
||
|
||
Keys for the extent tree use the start of the extent as the objectid. A BLOCK_GROUP_ITEM key will be followed by the EXTENT_ITEM keys for extents within that block group.
|
||
|
||
|
||
FS Trees
|
||
--------
|
||
|
||
These store files and directories, and all of the normal metadata you
|
||
would expect to find in a filesystem. There is one root for each
|
||
subvolume or snapshot, but snapshots will share blocks between roots.
|
||
|
||
Keys in FS trees always use the inode number of the filesystem object as the objectid.
|
||
|
||
Each object will have one or more of:
|
||
|
||
- Inode.
|
||
- Inode ref, indicating what name this object is known as, and in which
|
||
directory.
|
||
- For files, a set of extent information, indicating where on the
|
||
filesystem this file's data is.
|
||
- For directories, two sequences of dir_items, one indexed by a hash of
|
||
the object name, and one indexed by a unique sequential index number.
|
||
|
||
|
||
Checksum Tree
|
||
-------------
|
||
|
||
The checksum tree stores block checksums. Every 4k block of data stored
|
||
on disk has a checksum associated with it. The "offset" part of the keys
|
||
in the checksum tree indicates the start of the checksummed data on
|
||
disk. The value stored with the key is a sequence of (currently 4-byte)
|
||
checksums, for the 4k blocks starting at the offset.
|
||
|
||
|
||
Data Relocation Tree
|
||
--------------------
|
||
|
||
|
||
Log Root Tree
|
||
-------------
|