Since btrfs only supports block size 4K and PAGE_SIZE, on x86_64 it
means we can not test subpage block size easily.
With the recent kernel change to support 2K block size for debug builds,
also add 2K block size support for btrfs-progs, so that we can do proper
subpage block size testing on x86_64, without acquiring an aarch64
machine.
There is a limitation:
- No support for 2K node size
The limitation is from the initial mkfs tree root, which can only have
a single leaf to contain all root items.
But a 2K leaf cannot hold all the root items, thus we have to disable
it.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When compression is null the code always goes through the LZO case,
or prints "lzo support not compiled in".
This bug was added by commit c6d24a363d ("btrfs-progs: mkfs: add lzo
to --compress option").
Pull-request: #967
Signed-off-by: Wang Mingyu <wangmy@fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Fix an option typo in the mkfs help (`mkfs.btrfs -h`) introduced in the
most recent public release: `defalut-ro` instead of `default-ro`.
Pull-request: #958
Signed-off-by: David Sterba <dsterba@suse.com>
ASAN reports memory leak when zlib is used. The missing part is
deflateEnd() that frees structures allocated at deflateInit(). Add it to
all exit paths.
Signed-off-by: David Sterba <dsterba@suse.com>
Follow the kernel by setting the BIG_METADATA incompat flag if nodesize
is greater than the page size.
This flag was introduced with commit 727011e07cbdf8 ("Btrfs: allow
metadata blocks larger than the page size") in 2010, as kernels before
2.6.36 would crash due to a buggy page cache implementation.
The flag has no real meaning anymore but we can at least set it at mkfs
time.
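A minimal sketch of the check described above. The helper name and its signature are hypothetical; only the flag name and its bit value come from the on-disk format definitions.

```c
#include <stdint.h>

/* Bit 5 of the incompat flags in the btrfs on-disk format. */
#define BTRFS_FEATURE_INCOMPAT_BIG_METADATA	(1ULL << 5)

/*
 * Hypothetical helper, not the mkfs code: return @features with
 * BIG_METADATA added when @nodesize is larger than @pagesize.
 */
static uint64_t set_big_metadata(uint64_t features, uint32_t nodesize,
				 long pagesize)
{
	if (pagesize > 0 && nodesize > (uint64_t)pagesize)
		features |= BTRFS_FEATURE_INCOMPAT_BIG_METADATA;
	return features;
}
```

A caller would typically pass sysconf(_SC_PAGESIZE) as the page size.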
Signed-off-by: Mark Harmstone <maharmstone@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The kernel adds a zeroed btrfs_dev_stats_item for each device on the
first mount. Preempt this by doing it at mkfs time.
Signed-off-by: Mark Harmstone <maharmstone@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The kernel commit 08fe4db170b419 ("Btrfs: Fix uninitialized root flags
for subvolumes") from 2011 sets the flag BTRFS_INODE_ROOT_ITEM_INIT on
root items, to work around a bug where flags and byte_limit weren't
being set.
Copy this behaviour in mkfs, to prevent the kernel from having to do it
on the first mount. We memset the btrfs_root_item, so there's no
corruption issue as there once was. We already do this in
btrfs_make_subvolume(), as otherwise the readonly flag of any subvolumes
would get reset.
Signed-off-by: Mark Harmstone <maharmstone@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Print indicators in the summary if the subvolume is read-write,
read-only or default:
$ mkfs.btrfs --subvol ro:subvolro --subvol rw:subvolrw --subvol default-ro:defaultro --rootdir /rootdir/path img
...
Rootdir from: /rootdir/path
Compress: no
Subvolume (rw): subvolrw
Subvolume (ro): subvolro
Subvolume (dro): defaultro
...
The path is relative to the rootdir path and may not be a subvolume in
the source directory so drop the rootdir as this may be confusing.
Signed-off-by: David Sterba <dsterba@suse.com>
The subvolumes created during mkfs are not printed in the summary
because btrfs_mkfs_fill_dir() deletes them from the list as they get
created.
Signed-off-by: David Sterba <dsterba@suse.com>
The preferred error message should have a prefix with problem
description and then the errno description as we use the negative errno
convention almost everywhere.
- drop additional %d in the message if %m is present
- replace %d with %m
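The convention can be sketched as follows. The helper is hypothetical and only illustrates the message shape; %m is a glibc printf extension that expands to the text for the current errno, so errno has to be set from the negative return value first.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format "ERROR: <problem>: <errno text>" for a
 * negative errno return value @ret. Relies on the glibc %m extension.
 */
static int format_error(char *buf, size_t size, const char *problem, int ret)
{
	errno = -ret;
	return snprintf(buf, size, "ERROR: %s: %m", problem);
}
```

With this shape there is no need to print the numeric value with an additional %d.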
Signed-off-by: David Sterba <dsterba@suse.com>
For parity with 'btrfs', also print the builtin features as the
compression is now available.
$ mkfs.btrfs --version
mkfs.btrfs, part of btrfs-progs v6.12
+EXPERIMENTAL -INJECT -STATIC +LZO +ZSTD +UDEV +FSVERITY +ZONED CRYPTO=builtin
Signed-off-by: David Sterba <dsterba@suse.com>
This has been deprecated since 4.0 and mkfs fails since 6.0 with that
option. No need to keep it around anymore.
Signed-off-by: David Sterba <dsterba@suse.com>
Enhance information in the help text where some interesting information
was missing and would require looking up the documentation.
Signed-off-by: David Sterba <dsterba@suse.com>
It does not make sense to pass only the compression option when there
are no files being added by --rootdir.
Signed-off-by: David Sterba <dsterba@suse.com>
Report invalid compression specification while parsing the options. Now
an invalid level won't be silently accepted and capped when processing
the files. Other checks regarding conditional support of LZO and ZSTD
are left in place.
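A rough sketch of validating ALGO[:LEVEL] at option parsing time. All names here are hypothetical, and the level limits in the table are illustrative assumptions, not the values mkfs.btrfs enforces.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

enum compress_algo { COMPRESS_NONE, COMPRESS_ZLIB, COMPRESS_LZO, COMPRESS_ZSTD };

/* Assumed limits for illustration; max_level 0 means "takes no level". */
static const struct compress_desc {
	const char *name;
	enum compress_algo algo;
	long max_level;
} compress_table[] = {
	{ "no",   COMPRESS_NONE, 0 },
	{ "zlib", COMPRESS_ZLIB, 9 },
	{ "lzo",  COMPRESS_LZO,  0 },
	{ "zstd", COMPRESS_ZSTD, 15 },
};

/* Return 0 and fill @algo/@level, or -EINVAL for an invalid specification. */
static int parse_compress(const char *spec, enum compress_algo *algo, long *level)
{
	const size_t count = sizeof(compress_table) / sizeof(compress_table[0]);
	const char *colon = strchr(spec, ':');
	size_t name_len = colon ? (size_t)(colon - spec) : strlen(spec);
	char *end;
	long val;
	size_t i;

	for (i = 0; i < count; i++) {
		const char *name = compress_table[i].name;

		if (strlen(name) == name_len && strncmp(spec, name, name_len) == 0)
			break;
	}
	if (i == count)
		return -EINVAL;

	*algo = compress_table[i].algo;
	*level = 0;		/* 0 stands for "use the default level" here */
	if (!colon)
		return 0;

	errno = 0;
	val = strtol(colon + 1, &end, 10);
	if (errno || end == colon + 1 || *end != '\0')
		return -EINVAL;
	/* Reject out-of-range levels instead of silently capping them. */
	if (val < 1 || val > compress_table[i].max_level)
		return -EINVAL;
	*level = val;
	return 0;
}
```

Checks for whether LZO or ZSTD support is compiled in would stay separate, as the message notes.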
Signed-off-by: David Sterba <dsterba@suse.com>
The compression support is optional, e.g. also in 'btrfs-restore', so
print the support in help text.
usage: mkfs.btrfs [options] <dev> [<dev...>]
...
--compress ALGO[:LEVEL] compress files by algorithm and level, ALGO can be 'no' (default), zstd, lzo, zlib
Built-in:
- ZSTD: yes
- LZO: yes
- ZLIB: yes
...
Signed-off-by: David Sterba <dsterba@suse.com>
Improve readability and add space between sections.
usage: mkfs.btrfs [options] <dev> [<dev...>]
Create a BTRFS filesystem on a device or multiple devices
Allocation profiles:
-d|--data PROFILE data profile, raid0, raid1, raid1c3, raid1c4, raid5, raid6, raid10, dup or single
-m|--metadata PROFILE metadata profile, values like for data profile
-M|--mixed mix metadata and data together
Features:
--csum TYPE
--checksum TYPE checksum algorithm to use, crc32c (default), xxhash, sha256, blake2
-n|--nodesize SIZE size of btree nodes
-s|--sectorsize SIZE data block size (may not be mountable by current kernel)
-O|--features LIST comma separated list of filesystem features (use '-O list-all' to list features)
-L|--label LABEL set the filesystem label
-U|--uuid UUID specify the filesystem UUID (must be unique for a filesystem with multiple devices)
--device-uuid UUID Specify the filesystem device UUID (a.k.a sub-uuid) (for single device filesystem
only)
...
Signed-off-by: David Sterba <dsterba@suse.com>
Allow --compress to work with lzo.
Signed-off-by: Mark Harmstone <maharmstone@fb.com>
[ Add extra handling when LZO support is not compiled in ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
There were two major problems with add_file_items(): it was
writing all files sector-by-sector, making compression impossible, and
it was assuming that pread would never do a short read.
Fix these problems, and create a new helper add_file_item_extent().
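The short read issue can be illustrated with a small retry loop. This is a sketch only: read_full() is a hypothetical name and is not the helper added by this change.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read up to @len bytes at @offset, retrying on short
 * reads and EINTR. Returns the number of bytes read (less than @len only
 * at end of file) or a negative errno.
 */
static ssize_t read_full(int fd, void *buf, size_t len, off_t offset)
{
	size_t done = 0;

	while (done < len) {
		ssize_t ret = pread(fd, (char *)buf + done, len - done,
				    offset + (off_t)done);

		if (ret < 0) {
			if (errno == EINTR)
				continue;
			return -errno;
		}
		if (ret == 0)	/* end of file */
			break;
		done += (size_t)ret;
	}
	return (ssize_t)done;
}
```

Reading a whole extent worth of data into one buffer this way, rather than one sector at a time, is what gives a compressor something to work with.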
Signed-off-by: Mark Harmstone <maharmstone@fb.com>
Just like insert_reserved_file_extent() from the kernel, we can make
btrfs_insert_file_extent() accept an on-stack file extent item
directly.
This makes btrfs_insert_file_extent() more flexible, and it can now handle
the converted file extent where it has a non-zero offset.
And this makes it much easier to expand for future compressed file
extent generation.
Signed-off-by: Qu Wenruo <wqu@suse.com>
The function btrfs_record_file_extent() has extra handling that's
specific to convert, like allowing the range to be split by block group
boundary and image file extent boundary.
All of these splits can only lead to corruption for a non-converted fs,
as the only caller outside of btrfs-convert is rootdir, which expects the
file extent item insertion to respect the reserved data extent and never
to be split.
Thankfully this is not going to cause a huge problem, as
btrfs_record_file_extent() has extra checks if the data extent overlaps
with any existing one, and if it doesn't the handling will be the same
as the kernel.
But to avoid abuse, change btrfs_record_file_extent() by:
- Rename it to btrfs_convert_file_extent()
And add extra comments on that it is specific to btrfs-convert.
- Move it to convert/common.[ch]
- Introduce a helper insert_reserved_file_extent() for rootdir.c
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reworks mkfs.btrfs --subvol so that dir and full_path in struct
rootdir_subvol are stored as arrays rather than pointers.
Signed-off-by: Mark Harmstone <maharmstone@fb.com>
Change mkfs.btrfs --subvol so that instead of being of the form --subvol
DIR:FLAGS, it's instead --subvol MODIFIER:DIR, with MODIFIER being ro,
rw, default, or default-ro.
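A sketch of how the MODIFIER:DIR form could be split. The struct and function names are hypothetical, and treating an argument without a recognized modifier as a plain read-write directory is an assumption of this sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical result type for one parsed --subvol argument. */
struct subvol_arg {
	bool readonly;
	bool is_default;
	const char *dir;	/* points into the argument string */
};

static int parse_subvol_arg(const char *arg, struct subvol_arg *out)
{
	/* Longer prefixes first so "default-ro:" is not mistaken for "default:". */
	static const struct {
		const char *prefix;
		bool readonly;
		bool is_default;
	} modifiers[] = {
		{ "default-ro:", true,  true  },
		{ "default:",    false, true  },
		{ "ro:",         true,  false },
		{ "rw:",         false, false },
	};
	size_t i;

	/* Assumption: no modifier means a read-write, non-default subvolume. */
	out->readonly = false;
	out->is_default = false;
	out->dir = arg;

	for (i = 0; i < sizeof(modifiers) / sizeof(modifiers[0]); i++) {
		size_t len = strlen(modifiers[i].prefix);

		if (strncmp(arg, modifiers[i].prefix, len) == 0) {
			out->readonly = modifiers[i].readonly;
			out->is_default = modifiers[i].is_default;
			out->dir = arg + len;
			break;
		}
	}
	if (out->dir[0] == '\0')
		return -EINVAL;
	return 0;
}
```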
Signed-off-by: Mark Harmstone <maharmstone@fb.com>
This introduces two new cases:
- 3 hardlinks without any subvolume
This should result in 3 hard links inside the btrfs.
- 3 hardlinks, but a subvolume will split 2 of them
Then the 2 inside the same subvolume should still report 2 nlinks,
but the lone one inside the new subvolume can only report 1 nlink.
Signed-off-by: Qu Wenruo <wqu@suse.com>
The new hard link detection and creation support is done by maintaining
an rb tree with the following members:
- st_ino, st_dev
This is to record the stat() report from the host fs.
With these two, we can detect if it's really a hard link (st_dev
determines one filesystem/subvolume, and st_ino determines the inode
number inside the fs).
- root
This is the btrfs root pointer. This is a special requirement for the
recently introduced "--subvol" option.
As we can have the following corner case:
rootdir/
|- foobar_hardlink1
|- foobar_hardlink2
|- subv/ <- To be a subvolume inside btrfs
|- foobar_hardlink3
In above case, on the host fs, `subv/` directory is just a regular
directory, but in the new btrfs it will be a subvolume.
In that case, `foobar_hardlink3` cannot be created as a hard link,
but a new inode.
- st_nlink and found_nlink
Records the original reported number of links, and the nlinks we
created inside btrfs.
This is recorded in case we created all hard links and can remove
the entry early.
- btrfs_ino
This is the inode number inside btrfs.
And since we can handle hard links safely, remove all the related
warnings, and add a new note for `--subvol` option, warning about the
case where we need to split hard links due to subvolume boundary.
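The members listed above suggest an entry and a comparison along these lines. This is only a sketch with hypothetical names; the root pointer is kept opaque, and the rb tree itself is left out.

```c
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical entry layout mirroring the members described above. */
struct hardlink_entry {
	ino_t st_ino;		/* inode number on the host fs */
	dev_t st_dev;		/* device of the host fs */
	const void *root;	/* stands in for the btrfs root pointer */
	nlink_t st_nlink;	/* link count reported by the host fs */
	nlink_t found_nlink;	/* links created inside btrfs so far */
	uint64_t btrfs_ino;	/* inode number inside btrfs */
};

/* Order entries by (st_dev, st_ino, root); 0 means "same hard link group". */
static int hardlink_compare(const struct hardlink_entry *a,
			    const struct hardlink_entry *b)
{
	if (a->st_dev != b->st_dev)
		return a->st_dev < b->st_dev ? -1 : 1;
	if (a->st_ino != b->st_ino)
		return a->st_ino < b->st_ino ? -1 : 1;
	if (a->root != b->root)
		return (uintptr_t)a->root < (uintptr_t)b->root ? -1 : 1;
	return 0;
}

/* Account for one more link; returns 1 once the entry can be removed. */
static int hardlink_found(struct hardlink_entry *entry)
{
	entry->found_nlink++;
	return entry->found_nlink >= entry->st_nlink;
}
```

Because the root is part of the key, `foobar_hardlink3` in the example above never matches the entry for the two links in the parent subvolume and ends up as a new inode.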
Pull-request: #873
Signed-off-by: Qu Wenruo <wqu@suse.com>
Change --subvol so that it can accept flags, and add a "default" flag that
allows you to mark a subvolume as the default.
Signed-off-by: Mark Harmstone <maharmstone@fb.com>
Add a new option --subvol, which tells mkfs.btrfs to create the
specified directories as subvolumes when used with --rootdir.
Given a populated directory dir, the command
$ mkfs.btrfs --rootdir dir --subvol usr --subvol home --subvol home/username img
will create subvolumes 'usr' and 'home' within the toplevel subvolume,
and subvolume 'username' within the 'home' subvolume. It will fail if
any of the directories do not yet exist.
Pull-request: #868
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Mark Harmstone <maharmstone@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is an internal report that, during btrfs-convert to block-group
tree, by accident some systemd events triggered the mount of the target
fs.
This leads to double mount (one by kernel and one by the btrfs-progs),
which seems to cause quite some problems.
To avoid such accidents, open all devices exclusively if btrfs-progs is
doing write operations.
Pull-request: #888
Reported-by: pandada8 <pandada8@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
There's a report that the newly added --rootdir prints too many warnings for
hardlinks, which is maybe not that uncommon. We still want to let the
user know about that so print it just once and count how many were
found:
$ mkfs.btrfs --rootdir ...
WARNING: '/tmp/btrfs-progs-mkfs-rootdir-hardlinks.7RcdfR/rootdir/inside_link' has extra hardlinks, they will be converted into new inodes
WARNING: 12 hardlinks were detected in /tmp/btrfs-progs-mkfs-rootdir-hardlinks.7RcdfR/rootdir, all converted to new inodes
Link: https://github.com/kdave/btrfs-progs/pull/872#issuecomment-2289096125
Signed-off-by: David Sterba <dsterba@suse.com>
Commit 14ac1a6051 ("btrfs-progs: mkfs: add support for squota")
mistakenly added ctree.h from libbtrfs/ but this is not supposed to be
used outside of the library. Moreover the correct ctree.h was already
there.
Signed-off-by: David Sterba <dsterba@suse.com>
The recent rework changes how we detect hard links.
[OLD BEHAVIOR]
We trusted st_nlink and st_ino, reusing them without extra sanity
checks.
That behavior has problems handling cross mount-point or hard links out
of the rootdir cases.
[NEW BEHAVIOR]
The new refactored code will treat every inode, no matter if it's a
hardlink, as a new inode.
This means we will break the hard link detection, and every hard link
will be created as a different inode.
For the most common use case, like populating a rootfs, it's totally
fine.
[EXTRA WARNING]
But for cases where the user has extra hard links inside the rootdir,
output a warning just to inform the end user.
This will not cause any content difference, just breaking the hard links
into new inodes.
Signed-off-by: Qu Wenruo <wqu@suse.com>
[PITFALLS]
There are several hidden pitfalls of the existing traverse_directory():
- Hand written preorder traversal
There is already a better written standard library function, nftw()
doing exactly what we need.
- Over-designed path list
To properly handle the directory change, we have structure
directory_name_entry, to record every inode until rootdir.
But it has two string members, dir_name and path, which is a little
confusing and overkill.
As for preorder traversal, we will never need to read the parent's
filename, just its btrfs inode number.
And it's exported while no one utilizes it out of mkfs/rootdir.c.
- Weird inode numbers
We use the inode number from st->st_ino, with an extra offset.
This by itself is not safe, if the rootdir has child directories in
another filesystem.
And this results in very weird inode numbers, e.g.:
item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
item 6 key (88347519 INODE_ITEM 0) itemoff 15815 itemsize 160
item 16 key (88347520 INODE_ITEM 0) itemoff 15363 itemsize 160
item 20 key (88347521 INODE_ITEM 0) itemoff 15119 itemsize 160
item 24 key (88347522 INODE_ITEM 0) itemoff 14875 itemsize 160
item 26 key (88347523 INODE_ITEM 0) itemoff 14700 itemsize 160
item 28 key (88347524 INODE_ITEM 0) itemoff 14525 itemsize 160
item 30 key (88347557 INODE_ITEM 0) itemoff 14350 itemsize 160
item 32 key (88347566 INODE_ITEM 0) itemoff 14175 itemsize 160
Which is far from a regular fs created by copying the data.
- Weird directory inode size calculation
Unlike the kernel, which updates the directory inode size every time new
child inodes are added, we calculate the directory inode size by
searching all its children first, then later new inodes linked to this
directory won't touch the inode size.
- Bad hard link detection and cross mount point handling
The hard link detection is purely based on the st_ino returned from
the host filesystem, this means we do not have extra checks whether
the inode is even inside the same fs.
And we directly reuse st_nlink from the host filesystem, if there
is a hard link out of rootdir, the st_nlink will be incorrect and
cause a corrupted fs.
Enhance all these points by:
- Use nftw() to do the preorder traversal
It also provides the extra level detection, which is pretty handy.
- Use a simple local inode_entry to record each parent
The only value is a u64 to record the inode number.
And one simple rootdir_path structure to record the list of
inode_entry, along with the current level.
This rootdir_path structure along with two helpers,
rootdir_path_push() and rootdir_path_pop(), along with the
preorder traversal provided by nftw(), are enough for us to record
all the parent directories until the rootdir.
- Grab new inode number properly
Just call btrfs_get_free_objectid() to grab a proper inode number,
rather than using some weird calculated value.
- Treat every inode as a new one
This means we will have no hard link support for now.
But I still believe it's a good trade-off, especially considering the
old handling is buggy for several corner cases.
- Use btrfs_insert_inode() and btrfs_add_link() to update directory
inode automatically
With all the refactoring, the code is shorter and easier to read.
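The parent tracking can be sketched as a small stack driven by the level that nftw() reports for each entry. The struct and helper names follow the description above, but the bodies and the visit_entry() wrapper are assumptions, not the mkfs/rootdir.c code.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct inode_entry {
	uint64_t ino;		/* btrfs inode number of a parent directory */
};

struct rootdir_path {
	struct inode_entry *entries;
	int level;		/* number of directories currently on the stack */
	int capacity;
};

static int rootdir_path_push(struct rootdir_path *path, uint64_t ino)
{
	if (path->level == path->capacity) {
		int new_cap = path->capacity ? path->capacity * 2 : 8;
		struct inode_entry *tmp;

		tmp = realloc(path->entries, (size_t)new_cap * sizeof(*tmp));
		if (!tmp)
			return -ENOMEM;
		path->entries = tmp;
		path->capacity = new_cap;
	}
	path->entries[path->level++].ino = ino;
	return 0;
}

static void rootdir_path_pop(struct rootdir_path *path)
{
	if (path->level > 0)
		path->level--;
}

/*
 * Hypothetical per-entry callback body. @level is the depth as reported by
 * a preorder walk (0 is the rootdir itself). Stores the parent's inode
 * number in @parent (0 for the rootdir) and pushes directories.
 */
static int visit_entry(struct rootdir_path *path, int level, int is_dir,
		       uint64_t ino, uint64_t *parent)
{
	/* We left one or more directories: drop them from the stack. */
	while (path->level > level)
		rootdir_path_pop(path);

	*parent = path->level > 0 ? path->entries[path->level - 1].ino : 0;
	return is_dir ? rootdir_path_push(path, ino) : 0;
}
```

Since a preorder walk visits a directory before its children, an entry at level N always finds exactly N parent directories on the stack once the deeper ones have been popped.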
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
mkfs_main() is a main-like function, meaning that return and exit are
equivalent. Deduplicate our cleanup code by moving the error label.
Signed-off-by: Mark Harmstone <maharmstone@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently mkfs uses its own create_uuid_tree(), but that function is
only handling FS_TREE. This means for btrfs-convert we do not generate
the uuid tree, nor add the UUID of the image subvolume. This can be a
problem if we're going to support multiple subvolumes during mkfs time.
To address this, introduce a new helper, btrfs_rebuild_uuid_tree():
- Create a new uuid tree if there is not one
- Remove all the existing items from uuid tree
- Iterate through all subvolumes
* If the subvolume has no valid UUID, regenerate one
* Add the uuid entry for the subvolume UUID
* If the subvolume has a received UUID, also add it to the UUID tree
By this, this new helper can handle all the uuid tree generation needs for:
- Current mkfs
Only one uuid entry for FS_TREE
- Current btrfs-convert
Only FS_TREE and the image subvolume
- Future multi-subvolume mkfs
As we do the scan for all subvolumes.
- Future "btrfs rescue rebuild-uuid-tree"
Signed-off-by: Qu Wenruo <wqu@suse.com>
The modification is minimal:
- Replace WARN_ON() with UASSERT()
- Remove the @trans parameter for btrfs_extend_item() and
btrfs_mark_buffer_dirty()
As the progs version doesn't need a transaction handle.
- Remove the btrfs_uuid_tree_add() in mkfs/main.c
Signed-off-by: Qu Wenruo <wqu@suse.com>
Currently we already have a kernel-shared/uuid-tree.c, which is mostly
shared with kernel.
Kernel also has a uuid-tree.h, but we are still using ctree.h for the
header.
Move all the uuid-tree related definitions to kernel-shared/uuid-tree.h,
making future code sync easier.
Signed-off-by: Qu Wenruo <wqu@suse.com>