mirror of
https://github.com/kdave/btrfs-progs
synced 2025-02-11 23:27:04 +00:00
2d039cc815
RST format provides cross reference function that users can navigate manual pages click. This patch is written by macro that replaces old references to doc role in RST format. Issue: #495 Signed-off-by: Sidong Yang <realwakka@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
169 lines
8.1 KiB
ReStructuredText
169 lines
8.1 KiB
ReStructuredText
A BTRFS subvolume is a part of filesystem with its own independent
|
|
file/directory hierarchy and inode number namespace. Subvolumes can share file
|
|
extents. A snapshot is also subvolume, but with a given initial content of the
|
|
original subvolume. A subvolume has always inode number 256.
|
|
|
|
.. note::
|
|
A subvolume in BTRFS is not like an LVM logical volume, which is block-level
|
|
snapshot while BTRFS subvolumes are file extent-based.
|
|
|
|
A subvolume looks like a normal directory, with some additional operations
|
|
described below. Subvolumes can be renamed or moved, nesting subvolumes is not
|
|
restricted but has some implications regarding snapshotting. The numeric id
|
|
(called *subvolid* or *rootid*) of the subvolume is persistent and cannot be
|
|
changed.
|
|
|
|
A subvolume in BTRFS can be accessed in two ways:
|
|
|
|
* like any other directory that is accessible to the user
|
|
* like a separately mounted filesystem (options *subvol* or *subvolid*)
|
|
|
|
In the latter case the parent directory is not visible and accessible. This is
|
|
similar to a bind mount, and in fact the subvolume mount does exactly that.
|
|
|
|
A freshly created filesystem is also a subvolume, called *top-level*,
|
|
internally has an id 5. This subvolume cannot be removed or replaced by another
|
|
subvolume. This is also the subvolume that will be mounted by default, unless
|
|
the default subvolume has been changed (see ``btrfs subvolume set-default``).
|
|
|
|
A snapshot is a subvolume like any other, with given initial content. By
|
|
default, snapshots are created read-write. File modifications in a snapshot
|
|
do not affect the files in the original subvolume.
|
|
|
|
Subvolumes can be given capacity limits, through the qgroups/quota facility, but
|
|
otherwise share the single storage pool of the whole btrfs filesystem. They may
|
|
even share data between themselves (through deduplication or snapshotting).
|
|
|
|
.. note::
|
|
A snapshot is not a backup: snapshots work by use of BTRFS' copy-on-write
|
|
behaviour. A snapshot and the original it was taken from initially share all
|
|
of the same data blocks. If that data is damaged in some way (cosmic rays,
|
|
bad disk sector, accident with dd to the disk), then the snapshot and the
|
|
original will both be damaged. Snapshots are useful to have local online
|
|
"copies" of the filesystem that can be referred back to, or to implement a
|
|
form of deduplication, or to fix the state of a filesystem for making a full
|
|
backup without anything changing underneath it. They do not in themselves
|
|
make your data any safer.
|
|
|
|
Subvolume flags
|
|
---------------
|
|
|
|
The subvolume flag currently implemented is the *ro* property (read-only
|
|
status). Read-write subvolumes have that set to *false*, snapshots as *true*.
|
|
In addition to that, a plain snapshot will also have last change generation and
|
|
creation generation equal.
|
|
|
|
Read-only snapshots are building blocks of incremental send (see
|
|
:doc:`btrfs-send(8)<btrfs-send>`) and the whole use case relies on unmodified snapshots where
|
|
the relative changes are generated from. Thus, changing the subvolume flags
|
|
from read-only to read-write will break the assumptions and may lead to
|
|
unexpected changes in the resulting incremental stream.
|
|
|
|
A snapshot that was created by send/receive will be read-only, with different
|
|
last change generation, read-only and with set *received_uuid* which identifies
|
|
the subvolume on the filesystem that produced the stream. The use case relies
|
|
on matching data on both sides. Changing the subvolume to read-write after it
|
|
has been received requires to reset the *received_uuid*. As this is a notable
|
|
change and could potentially break the incremental send use case, performing
|
|
it by **btrfs property set** requires force if that is really desired by user.
|
|
|
|
.. note::
|
|
The safety checks have been implemented in 5.14.2, any subvolumes previously
|
|
received (with a valid *received_uuid*) and read-write status may exist and
|
|
could still lead to problems with send/receive. You can use **btrfs subvolume
|
|
show** to identify them. Flipping the flags to read-only and back to
|
|
read-write will reset the *received_uuid* manually. There may exist a
|
|
convenience tool in the future.
|
|
|
|
Nested subvolumes
|
|
-----------------
|
|
|
|
There are no restrictions for subvolume creation, so it's up to the user how to
|
|
organize them, whether to have a flat layout (all subvolumes are direct
|
|
descendants of the toplevel one), or nested.
|
|
|
|
What should be mentioned early is that a snapshotting is not recursive, so a
|
|
subvolume or a snapshot is effectively a barrier and no files in the nested
|
|
appear in the snapshot. Instead there's a stub subvolume (also sometimes
|
|
**empty subvolume** with the same name as original subvolume, with inode number
|
|
2). This can be used intentionally but could be confusing in case of nested
|
|
layouts.
|
|
|
|
Case study: system root layouts
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
There are two ways how the system root directory and subvolume layout could be
|
|
organized. The interesting usecase for root is to allow rollbacks to previous
|
|
version, as one atomic step. If the entire filesystem hierarchy starting in "/"
|
|
is in one subvolume, taking snapshot will encompass all files. This is easy for
|
|
the snapshotting part but has undesirable consequences for rollback. For example,
|
|
log files would get rolled back too, or any data that are stored on the root
|
|
filesystem but are not meant to be rolled back either (database files, VM
|
|
images, ...).
|
|
|
|
Here we could utilize the snapshotting barrier mentioned above, each directory
|
|
that stores data to be preserved accross rollbacks is it's own subvolume. This
|
|
could be eg. ``/var``. Further more-fine grained partitioning could be done, eg.
|
|
adding separate subvolumes for ``/var/log``, ``/var/cache`` etc.
|
|
|
|
That there are separate subvolumes requrires separate actions to take the
|
|
snapshots (here it gets disconnected from the system root snapshots). This needs
|
|
to be taken care of by system tools, installers together with selection of which
|
|
directories are highly recommended to be separate subvolumes.
|
|
|
|
Mount options
|
|
-------------
|
|
|
|
Mount options are of two kinds, generic (that are handled by VFS layer) and
|
|
specific, handled by the filesystem. The following list shows which are
|
|
applicable to individual subvolume mounts, while there are more options that
|
|
always affect the whole filesystem:
|
|
|
|
- generic: noatime/relatime/..., nodev, nosuid, ro, rw, dirsync
|
|
- fs-specific: compress, autodefrag, nodatacow, nodatasum
|
|
|
|
An example of whole filesystem options is eg. *space_cache*, *rescue*, *device*,
|
|
*skip_balance*, etc. The exceptional options are *subvol* and *subvolid* that
|
|
are actually used for mounting a given subvolume and can be specified only once
|
|
for the mount.
|
|
|
|
Subvolumes belong to a single filesystem and as implemented now all share the
|
|
same specific mount options, changes done by remount have immediate effect. This
|
|
may change in the future.
|
|
|
|
Mounting a read-write snapshot as read-only is possible and will not change the
|
|
*ro* property and flag of the subvolume.
|
|
|
|
The name of the mounted subvolume is stored in file ``/proc/self/mounts`` in the
|
|
4th column:
|
|
|
|
.. code-block::
|
|
|
|
27 21 0:19 /subv1 /mnt rw,relatime - btrfs /dev/sda rw,space_cache
|
|
^^^^^^
|
|
|
|
Inode numbers
|
|
-------------
|
|
|
|
A proper subvolume has always inode number 256. If a subvolume is nested and
|
|
then a snapshot is taken, then the cloned directory entry representing the
|
|
subvolume becomes empty and the inode has number 2. All other files and
|
|
directories in the target snapshot preserve their original inode numbers.
|
|
|
|
.. note::
|
|
Inode number is not a filesystem-wide unique identifier, some applications
|
|
assume that. Please user pair *subvolumeid:inodenumber* for that purpose.
|
|
|
|
Performance
|
|
-----------
|
|
|
|
Subvolume creation needs to flush dirty data that belong to the subvolume, this
|
|
step may take some time, otherwise once there's nothing else to do, the snapshot
|
|
is instant and in the metadata it only creates a new tree root copy.
|
|
|
|
Snapshot deletion has two phases: first its directory is deleted and the
|
|
subvolume is added to a list, then the list is processed one by one and the
|
|
data related to the subvolume get deleted. This is usually called *cleaning* and
|
|
can take some time depending on the amount of shared blocks (can be a lot of
|
|
metadata updates), and the number of currently queued deleted subvolumes.
|