Since btrfs only supports block size 4K and PAGE_SIZE, on x86_64 it
means we can not test subpage block size easily.
With the recent kernel change to support 2K block size for debug builds,
also add 2K block size support for btrfs-progs, so that we can do proper
subpage block size testing on x86_64, without acquiring an aarch64
machine.
There is a limitation:
- No support for 2K node size
The limitation is from the initial mkfs tree root, which can only have
a single leaf to contain all root items.
But 2K leaf cannot handle all the root items, thus we have to disable
it.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce "zone_is_active" member to struct btrfs_block_group and activate it
on loading a block group.
Note that activeness check for the extent allocation is currently not
implemented. The activeness checking requires to activate a non-active block
group on the extent allocation, which also require finishing a zone in the case
of hitting the active zone limit. Since mkfs should not hit the limit,
implementing the zone finishing code would not be necessary at the moment.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Properly load the zone activeness on the userland tool. Also, check if a device
has enough active zone limit to run btrfs.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The userland tools did not load and use the zone capacity. Support it properly.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The kernel adds a zeroed btrfs_dev_stats_item for each device on the
first mount. Preempt this by doing it at mkfs time.
Signed-off-by: Mark Harmstone <maharmstone@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently we already have a kernel-shared/uuid-tree.c, which is mostly
shared with kernel.
Kernel also has a uuid-tree.h, but we are still using ctree.h for the
header.
Move all the uuid-tree related definitions to kernel-shared/uuid-tree.h,
making future code sync easier.
Signed-off-by: Qu Wenruo <wqu@suse.com>
The function btrfs_mksubvol() is very different between btrfs-progs and
kernel, the former version is really just linking a subvolume to another
directory inode, but the kernel version is really to make a completely
new subvolume.
Instead of same-named function, introduce btrfs_link_subvolume() and use
it to replace the old btrfs_mksubvol().
This is done by:
- Introduce btrfs_link_subvolume()
Which does extra checks before doing any modification:
* Make sure the target inode is a directory
* Make sure no filename conflict
Then do the linkage:
* Add the dir_item/dir_index into the parent inode
* Add the forward and backward root refs into tree root
- Introduce link_image_subvolume() helper
Currently btrfs_mksubvol() has a dedicated convert filename retry
behavior, which is unnecessary and should be done by the convert code.
Now move the filename retry behavior into the helper.
- Remove btrfs_mksubvol()
Since there is only one caller utilizing btrfs_mksubvol(), and it's
now gone, we can remove the old btrfs_mksubvol().
Signed-off-by: Qu Wenruo <wqu@suse.com>
[BUG]
There is a bug report that a canceled checksum conversion (still
experimental feature) resulted in unexpected super flags:
csum_type 0 (crc32c)
csum_size 4
csum 0x14973811 [match]
bytenr 65536
flags 0x1000000001
( WRITTEN |
CHANGING_FSID_V2 )
magic _BHRfS_M [match]
While for a filesystem under checksum conversion it should have either
CHANGING_DATA_CSUM or CHANGING_META_CSUM.
[CAUSE]
It turns out that, due to btrfs-progs keeps its own extra flags inside
its own ctree.h headers, not the shared uapi headers, we have
conflicting super flags:
kernel-shared/uapi/btrfs_tree.h:#define BTRFS_SUPER_FLAG_METADUMP_V2 (1ULL << 34)
kernel-shared/uapi/btrfs_tree.h:#define BTRFS_SUPER_FLAG_CHANGING_FSID (1ULL << 35)
kernel-shared/uapi/btrfs_tree.h:#define BTRFS_SUPER_FLAG_CHANGING_FSID_V2 (1ULL << 36)
kernel-shared/ctree.h:#define BTRFS_SUPER_FLAG_CHANGING_DATA_CSUM (1ULL << 36)
kernel-shared/ctree.h:#define BTRFS_SUPER_FLAG_CHANGING_META_CSUM (1ULL << 37)
Note that CHANGING_FSID_V2 is conflicting with CHANGING_DATA_CSUM.
[FIX]
Cross port the proper updated uapi headers into btrfs-progs, and remove
the definition from ctree.h.
This would change the value for CHANGING_DATA_CSUM and
CHANGING_META_CSUM, but considering they are experimental features, and
kernel would reject them anyway, the damage is not that huge and we can
accept such change before exposing it to end users.
Pull-request: #810
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Sync a few more file on the source level with kernel 6.8.
- type cleanups
- defines and enums
- comments
- parameter updates
- error handling
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
Although commit b2a1be83b8 ("btrfs-progs: mkfs: keep file descriptors
open during whole time") is making sure we're only closing the writeable
fds after the fs is properly created, there is still a missing fd not
following the requirement.
And this explains the issue why sometimes after mkfs.btrfs, lsblk still
doesn't give a valid uuid.
Shown by the strace output (the command is "mkfs.btrfs -f
/dev/test/scratch1"):
openat(AT_FDCWD, "/dev/test/scratch1", O_RDWR) = 5 <<< Writeable open
fadvise64(5, 0, 0, POSIX_FADV_DONTNEED) = 0
sysinfo({uptime=2529, loads=[8704, 6272, 2496], totalram=4104548352, freeram=3376611328, sharedram=9211904, bufferram=43016192, totalswap=3221221376, freeswap=3221221376, procs=190, totalhigh=0, freehigh=0, mem_unit=1}) = 0
lseek(5, 0, SEEK_END) = 10737418240
lseek(5, 0, SEEK_SET) = 0
......
close(5) = 0 <<< Closed now
pwrite64(6, "O\250\22\261\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 1163264) = 16384
pwrite64(6, "\201\316\272\342\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 1179648) = 16384
pwrite64(6, "K}S\t\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 1196032) = 16384
pwrite64(6, "\207j$\265\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 1212416) = 16384
pwrite64(6, "q\267;\336\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 5242880) = 16384
fsync(6) <<< But we're still writing into the disk.
[CAUSE]
After more digging, it turns out we have a very obvious escape in
open_ctree_fs_info():
open_ctree_fs_info()
|- fp = open(oca->filename, flags);
|- info = __open_ctree_fd();
|- close(fp);
As later we only do IO using the device fd, this close() seems fine.
But the truth is, for mkfs usage, this fs_info is a temporary one, with
a special magic number for the disk. And since mkfs is doing writeable
operations, this close() would immediately trigger udev scan.
And since at this stage, the fs is not yet fully created, udev can race
with mkfs, and may get the invalid temporary superblock.
[FIX]
Introduce a new btrfs_fs_info member, initial_fd, for
open_ctree_fs_info() to record the fd.
And on close_ctree(), if we find fs_info::initial_fd is a valid fd, then
close it.
By this, we make sure all writeable fds are only closed after we have
written valid super blocks into the disk.
Issue: #734
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The kernel patches for RST and squota are queued for 6.7, we need to be
able to test the features so it's not necessary to hide the mkfs support
under experimental build. The kernel may still need debug build to
enable mount.
Signed-off-by: David Sterba <dsterba@suse.com>
Unlike kernel, in btrfs-progs btrfs_start_transaction() never checks if
there is enough metadata space.
This can lead to very dangerous situation where there is no metadata
space left at all, deadlocking future tree operations.
This patch introduces a very basic version of metadata/system free space
check by:
- Check if there is enough metadata/system space left
If there is enough, go as usual.
- If there is not enough space left, try allocating a new chunk
- Recheck if the new space can meet our demand
If not, return ERR_PTR(-ENOSPC).
Otherwise, allocate a new trans handle to the caller.
This is possible thanks to the simplified transaction model in
btrfs-progs:
- We don't allow joining a transaction
This means we don't need to handle complex cases like data ordered
extents, which need to reserve space first, then join the current
transaction and use the reserved blocks.
- We don't allow multiple transaction handles for one transaction
Since btrfs-progs is single threaded, we always start a transaction
and then commit it.
However there is a feature that must be an exception for the new
metadata/system free space check:
- btrfs check --init-extent-tree
As all the meta/system free space check is based on the space info,
which is loaded from block group items.
Thus when rebuilding extent tree, we can no longer have an accurate
view, thus we have to disable the feature for the whole execution if
we're rebuilding the extent tree.
For now, there is no regression exposed during the self tests, but I
really hope this can be an extra safety net to prevent causing ENOSPC
deadlock in btrfs-progs.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the kernel new_key is const, update the definition in btrfs-progs to
match the in-kernel definition.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is the opposite of what we do in the kernel, however in the kernel
we put the helpers in dir-item.h and inode-item.h respectively. Those
do not exist in btrfs-progs right now, so instead of doing all that work
right now simply inline them in ctree.h to make it easier to sync
ctree.c from the kernel.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the kernel we have const struct extent_buffer instead of struct
extent_buffer, update this to make it more straightforward to sync
ctree.c.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This has const struct btrfs_key instead of just struct btrfs_key, update
this to make it more straightforward to sync ctree.c.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The kernel version of btrfs_del_ptr takes a trans handle as an argument
and returns an error in the case of tree-mod-log, update our version to
match to make syncing ctree.c more straightforward.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the kernel the key for btrfs_insert_empty_item is a const, update the
helper to match the in-kernel version.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the kernel we have a control struct called btrfs_item_batch that
encodes all of the information for bulk inserting a bunch of items.
Update btrfs_insert_empty_times to match the in-kernel implementation to
make sync'ing ctree.c more straightforward.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_cow_block takes the lockdep nesting enum in the kernel. Update
the definition to match the kernel version to make syncing ctree.c into
btrfs-progs more straightforward.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is used in ctree.c around getting the old root, add this to our
btrfs_fs_info to make it more straightforward to sync ctree.c into
btrfs-progs.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This exists in the kernel, and is touched by ctree.c, add it to the
btrfs_fs_info to make syncing ctree.c more straightforward.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the kernel we pass in the parent to btrfs_alloc_tree_block instead of
the blocksize and simply derive the blocksize from the fs_info. Update
the function to match the kernel's convention and update all of the
callers so we can sync ctree.c easily.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This always returns 0, and in the kernel is a void. Update the
definition to match the kernel and then update all of the callers.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function is only used in mkfs, and doesn't exist in the kernel in
ctree.c. Additionally we have a uuid lookup function to see if the uuid
exists in the tree, which for mkfs it won't because we just created the
tree. Move btrfs_uuid_tree_add into mkfs, and remove the lookup
function as it's not needed.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function and it's related functions only exist for the utilities
that populate existing file systems, and do not exist in the upstream
kernel. Move this function and the related function into it's own
common source file and out of the kernel-shared sources, and then update
all of the users to include the new location of this code.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This helper exists for check and for btrfs-corrupt-block. Move the
helper and the btrfs_fixup_low_keys helper into check/repair.[ch] so we
can keep the kernel-shared sources close to the upstream kernel.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This simply zero's out the path, and this is used everywhere we use a
stack path. Drop this usage and simply init the path's to empty instead
of using a function to do the memset.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Similar to btrfs_truncate_item(), this is void in the kernel as the
failure case is simply BUG_ON(). Additionally there is no root
parameter as it's not used in the function at all. Make these changes
and update the callers.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is void in the kernel, and this makes sense in btrfs-progs as it
stands currently as it doesn't actually return an error if there's a
problem, it simply BUG()'s. Update this to be a void and update the
callers to make it easier to sync ctree.c.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the kernel we just pass the btrfs_fs_info, and we const'ify the
new_key. Update the btrfs-progs definition to make syncing ctree.c
easier.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This was updated to include a first_slot argument, update it to match
the kernel definition to make it easier to sync ctree.c.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the kernel this is called btrfs_read_node_slot, and it doesn't take a
btrfs_fs_info. Update the btrfs-progs version to match the kernel and
update all of the callers.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Copy over structs, accessors, and constants for simple quotas
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Allow for RAID levels 0, 1 and 10 on zoned devices if the RAID stripe tree
is used.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add support for the RAID stripe tree to btrfs inspect-internal dump-tree.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When encountering a filesystem formatted with the raid stripe tree
feature, read it from disk.
Also add the incompat declaration to the tree printer.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The existing attempt for changing csum types is as the following:
- Create a new temporary csum root
- Generate new data csums into the temporary csum root
- Drop the old csum tree and make the temporary one as csum root
- Change the checksums for metadata in-place
Unfortunately after some experiments, the csum root switch method has a
big pitfall, the backref items in extent tree.
Those backref items still point back to the old tree, meaning without a
lot of extra tricks, the extent tree would be corrupted.
Thus we have to go a new single tree variant:
- Generate new data csums into the csum root
The new data csums would have a different objectid to distinguish
them.
- Drop the old data csum items
- Change the key objectids of the new csums
- Change the checksums for metadata in-place
This means unfortunately we have to revert most of the old code, and
update the temporary item format.
The new temporary item would only record the target csum type.
At every stage we have a method to determine the progress, thus no need
for an item, but in the future it's still open for change.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This syncs tree-checker.c from the kernel. The main modification was to
add a open ctree flag to skip the deeper leaf checks, and plumbing this
through tree-checker.c. We need this for things like fsck or
btrfs-image that need to work with slightly corrupted file systems, and
these checks simply make us unable to look at the corrupted blocks.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
These helpers are called __btrfs_check_* in the kernel as they return
the special enum to indicate what part of the leaf/node failed. Rename
the uses in btrfs-progs to match the kernel naming convention to make it
easier to sync that code.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When we sync ctree.c into btrfs-progs we're going to need to have a
bunch of flags and definitions that exist in btrfs_path in the kernel
that do not exist in btrfs_progs. Sync these changes into btrfs-progs
to enable us to sync ctree.c into btrfs-progs.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We were using this in cmds/restore.c, however it only does anything if
path->reada is set, and we don't set that in cmds/restore.c. Remove
this usage of reada_for_search and make the function static.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is used to protect the used count for btrfs_root in the kernel,
sync it to btrfs-progs to allow us to sync ctree.c into btrfs-progs.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is sprinkled throughout the kernel code for the in-kernel self
tests. Add the helper to btrfs-progs to make it easier to sync the
kernel code.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>