[BUG]
Although commit b2a1be83b8 ("btrfs-progs: mkfs: keep file descriptors
open during whole time") is making sure we're only closing the writeable
fds after the fs is properly created, there is still a missing fd not
following the requirement.
This explains why lsblk sometimes still doesn't report a valid UUID
right after mkfs.btrfs.
Shown by the strace output (the command is "mkfs.btrfs -f
/dev/test/scratch1"):
openat(AT_FDCWD, "/dev/test/scratch1", O_RDWR) = 5 <<< Writeable open
fadvise64(5, 0, 0, POSIX_FADV_DONTNEED) = 0
sysinfo({uptime=2529, loads=[8704, 6272, 2496], totalram=4104548352, freeram=3376611328, sharedram=9211904, bufferram=43016192, totalswap=3221221376, freeswap=3221221376, procs=190, totalhigh=0, freehigh=0, mem_unit=1}) = 0
lseek(5, 0, SEEK_END) = 10737418240
lseek(5, 0, SEEK_SET) = 0
......
close(5) = 0 <<< Closed now
pwrite64(6, "O\250\22\261\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 1163264) = 16384
pwrite64(6, "\201\316\272\342\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 1179648) = 16384
pwrite64(6, "K}S\t\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 1196032) = 16384
pwrite64(6, "\207j$\265\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 1212416) = 16384
pwrite64(6, "q\267;\336\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 5242880) = 16384
fsync(6) <<< But we're still writing into the disk.
[CAUSE]
After more digging, it turns out we have a very obvious escape in
open_ctree_fs_info():
open_ctree_fs_info()
|- fp = open(oca->filename, flags);
|- info = __open_ctree_fd();
|- close(fp);
As later we only do IO using the device fd, this close() seems fine.
But the truth is, for mkfs usage, this fs_info is a temporary one, with
a special magic number for the disk. And since mkfs is doing writeable
operations, this close() would immediately trigger a udev scan.
And since at this stage, the fs is not yet fully created, udev can race
with mkfs, and may get the invalid temporary superblock.
[FIX]
Introduce a new btrfs_fs_info member, initial_fd, for
open_ctree_fs_info() to record the fd.
And on close_ctree(), if we find fs_info::initial_fd is a valid fd, then
close it.
By this, we make sure all writeable fds are only closed after we have
written valid super blocks into the disk.
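The idea can be sketched as follows. This is only a minimal illustration
and not the actual btrfs-progs code: struct fs_info_sketch,
open_fs_sketch() and close_fs_sketch() are hypothetical stand-ins for
btrfs_fs_info, open_ctree_fs_info() and close_ctree().

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for struct btrfs_fs_info. */
struct fs_info_sketch {
	int initial_fd;		/* -1 when no fd is recorded */
};

/* Open the target and record the fd instead of closing it right away. */
struct fs_info_sketch *open_fs_sketch(const char *path, int flags)
{
	struct fs_info_sketch *info = malloc(sizeof(*info));

	if (!info)
		return NULL;
	info->initial_fd = open(path, flags);
	if (info->initial_fd < 0) {
		free(info);
		return NULL;
	}
	return info;
}

/* Close the recorded fd only at teardown. Returns the close() result. */
int close_fs_sketch(struct fs_info_sketch *info)
{
	int ret = 0;

	assert(info != NULL);
	if (info->initial_fd >= 0) {
		ret = close(info->initial_fd);
		info->initial_fd = -1;
	}
	free(info);
	return ret;
}
```

With this shape, the writeable fd stays open for the whole lifetime of
the fs_info, so nothing is closed before the final super block is written.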
Issue: #734
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch introduces a new parser helper, parse_u64_with_suffix(),
which has better error handling, following all the parse_*()
helpers in returning a non-zero value on error.
This new helper is going to replace parse_size_from_string(), which
would directly call exit(1) to stop the whole program.
Furthermore most callers of parse_size_from_string() are expecting
exit(1) for error, so that they can skip the error handling.
For those call sites, introduce a wrapper, arg_strtou64_with_suffix(),
to do that. The only disadvantage is a little less detailed error
report for why the parse failed, but for most cases the generic error
string should be enough.
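The split between a parser that reports errors and a wrapper that exits
could look roughly like the sketch below. The names parse_size_sketch()
and arg_size_sketch() are hypothetical, the accepted suffix set (k, m,
g, t) is an assumption, and the real helpers may differ in both
signature and behavior.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical parser: returns 0 and stores the result in *value_ret on
 * success, or a negative errno on failure. It never calls exit().
 */
int parse_size_sketch(const char *str, uint64_t *value_ret)
{
	char *end;
	unsigned long long value;
	int shift = 0;

	/* strtoull() silently accepts a leading '-', so require a digit. */
	if (!isdigit((unsigned char)str[0]))
		return -EINVAL;
	errno = 0;
	value = strtoull(str, &end, 10);
	if (errno == ERANGE)
		return -ERANGE;

	switch (tolower((unsigned char)*end)) {
	case 'k': shift = 10; break;
	case 'm': shift = 20; break;
	case 'g': shift = 30; break;
	case 't': shift = 40; break;
	case '\0': break;
	default: return -EINVAL;
	}
	/* Reject trailing characters after the suffix. */
	if (*end != '\0' && end[1] != '\0')
		return -EINVAL;
	/* Reject values that would overflow once the suffix is applied. */
	if (shift && value > (UINT64_MAX >> shift))
		return -ERANGE;

	*value_ret = (uint64_t)value << shift;
	return 0;
}

/* Hypothetical wrapper for callers that prefer to exit on any error. */
uint64_t arg_size_sketch(const char *str)
{
	uint64_t value;

	if (parse_size_sketch(str, &value) < 0) {
		fprintf(stderr, "ERROR: cannot parse size '%s'\n", str);
		exit(1);
	}
	return value;
}
```

Callers that need to recover from bad input use the parser directly,
while option parsing code can keep the shorter exit-on-error form.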
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Unlike kernel where tree-checker would provide enough info so later we
can use "btrfs inspect dump-tree" to catch the offending tree block, in
progs we may not even have a btrfs to start "btrfs inspect dump-tree".
E.g. during btrfs-convert.
To make later debugging easier, let's call btrfs_print_tree() for every
error we hit inside tree-checker.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Without the change `BTRFS_IOC_SCAN_DEV` aliased with `BTRFS_IOC_FORGET_DEV`.
It's a regression introduced in fcd9142b6 "btrfs-progs: docs: formatting,
fixups, updates".
It manifests as a sudden device disappearance when device is scanned:
machine # [ 4.095032] Btrfs loaded, crc32c=crc32c-intel, zoned=no, fsverity=no
machine # ERROR: device scan failed on '/dev/vdb': No such file or directory
machine # ERROR: device scan failed on '/dev/vdc': No such file or directory
(finished: must succeed: mkfs.btrfs -d raid0 /dev/vdb /dev/vdc, in 10.31 seconds)
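A compile-time check is one way to guard against this kind of aliasing.
The sketch below is Linux-specific and uses made-up names: the
SKETCH_* macros and struct sketch_vol_args are hypothetical and only
illustrate how two ioctl requests built with _IOW() can be compared.

```c
#include <assert.h>
#include <sys/ioctl.h>

/* Hypothetical magic, numbers and argument type, for illustration only. */
#define SKETCH_IOCTL_MAGIC 0x94
struct sketch_vol_args {
	long long fd;
	char name[4088];
};

#define SKETCH_IOC_SCAN_DEV \
	_IOW(SKETCH_IOCTL_MAGIC, 4, struct sketch_vol_args)
#define SKETCH_IOC_FORGET_DEV \
	_IOW(SKETCH_IOCTL_MAGIC, 5, struct sketch_vol_args)

/* Fails the build if the two requests ever encode to the same number. */
_Static_assert(SKETCH_IOC_SCAN_DEV != SKETCH_IOC_FORGET_DEV,
	       "scan and forget ioctls must not alias");

/* Returns non-zero when two request numbers alias each other. */
int ioctl_requests_alias(unsigned long a, unsigned long b)
{
	return a == b;
}
```

If both requests shared the same number, a scan would be decoded as a
forget, which matches the symptom of devices disappearing on scan.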
Issue: #704
Pull-request: #706
Reported-by: Atemu <atemu.main@gmail.com>
Bug: https://github.com/NixOS/nixpkgs/issues/265668
Author: Sergei Trofimovich <slyich@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
- update Status page
- new features in 6.7
- more ioctls
- CSS fix to wrap long lines in tables
[ci skip]
Signed-off-by: David Sterba <dsterba@suse.com>
In experimental build, read global '--param zone-size=SIZE' and use it
as emulated zone size. This is for testing only, will be promoted to a
proper option in the future.
Signed-off-by: David Sterba <dsterba@suse.com>
Commit 6cf11f3e38 ("btrfs-progs: check: check order of inline extent
refs") fixes a problem where btrfs check never properly verified the
order of inline references.
It's not obvious because by default the kernel orders EXTENT_DATA_REF_KEY
items using its own hash, resulting in seemingly out-of-order output:
item 0 key (13631488 EXTENT_ITEM 4096) itemoff 16143 itemsize 140
refs 4 gen 7 flags DATA
extent data backref root FS_TREE objectid 258 offset 0 count 1
extent data backref root FS_TREE objectid 257 offset 0 count 1
extent data backref root FS_TREE objectid 260 offset 0 count 1
extent data backref root FS_TREE objectid 259 offset 0 count 1
At a quick glance, no one can tell whether the above inline backref
items are in any order.
To make the sequence more obvious, let dump-tree output a new prefix
that indicates the type and the internal sequence number.
For the above case, the new output would look like this:
item 0 key (13631488 EXTENT_ITEM 4096) itemoff 16143 itemsize 140
refs 4 gen 7 flags DATA
(178 0xdfb591fbbf5f519) extent data backref root FS_TREE objectid 258 offset 0 count 1
(178 0xdfb591fa80d95ea) extent data backref root FS_TREE objectid 257 offset 0 count 1
(178 0xdfb591f9c0534ff) extent data backref root FS_TREE objectid 260 offset 0 count 1
(178 0xdfb591f49f9f8e7) extent data backref root FS_TREE objectid 259 offset 0 count 1
Although still not that obvious, it should show that the inline data
backrefs have descending sequence numbers.
For the type part, the order is counter-intuitively ascending, which is
not that easy to produce.
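The ordering claim can be checked mechanically. The helper below is a
hypothetical illustration, not part of dump-tree or check; it only
tests whether a list of sequence numbers is strictly descending.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if seq[] is strictly descending, 0 otherwise. */
int seq_is_descending(const uint64_t *seq, size_t nr)
{
	for (size_t i = 1; i < nr; i++) {
		if (seq[i - 1] <= seq[i])
			return 0;
	}
	return 1;
}
```

Fed with the four sequence numbers from the new output it reports a
descending order, while the objectids (258, 257, 260, 259) do not pass.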
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently, write_dev_supers() compares the superblock location vs the size
of the device to check if it can write the superblock. This is not correct
for a zoned device, whose superblock location is different than a regular
device.
Introduce check_sb_location() to check if the superblock zone exists for
the zoned case.
Running btrfs check can fail on a certain zoned device setup (e.g.,
zone size = 128MB, device size = 16GB).
From generic/330:
yes | btrfs check --repair --force /dev/nullb1
[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
ERROR: zoned: failed to read zone info of 4096 and 4097: Invalid argument
ERROR: failed to write super block for devid 1: write error: Input/output error
failed to write new super block err -5
failed to repair damaged filesystem, aborting
This happens because write_dev_supers() is comparing the original
superblock location vs the device size to check if it can write out a
superblock copy or not.
For the above example, since the first copy location (64MB) < device size
(16GB), it tries to write out the copy. But, the copy must be written into
zone 4096 (512G / zone size (128M) = 4096), which is out of the device.
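The arithmetic of this example can be reproduced with a small sketch.
The helper names below are hypothetical and do not mirror
check_sb_location(); they only show why zone 4096 cannot exist on a
16GB device with 128MB zones.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MIB (1024ULL * 1024)
#define SKETCH_GIB (1024ULL * SKETCH_MIB)

/* Zone index that contains byte offset @bytenr, for a fixed zone size. */
uint64_t zone_of_offset(uint64_t bytenr, uint64_t zone_size)
{
	assert(zone_size != 0);
	return bytenr / zone_size;
}

/* Returns 1 if zone @zone lies within a device of @dev_size bytes. */
int zone_exists(uint64_t zone, uint64_t dev_size, uint64_t zone_size)
{
	assert(zone_size != 0);
	return zone < dev_size / zone_size;
}
```

A 16GB device with 128MB zones only has zones 0 to 127, so a check
based on the byte offset 64MB passes while the zone-based check fails.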
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce sb_bytenr_to_sb_zone(), which converts the original superblock
location to the zone number of superblock log writing.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The kernel patches for RST and squota are queued for 6.7, we need to be
able to test the features so it's not necessary to hide the mkfs support
under experimental build. The kernel may still need debug build to
enable mount.
Signed-off-by: David Sterba <dsterba@suse.com>
Unlike kernel, in btrfs-progs btrfs_start_transaction() never checks if
there is enough metadata space.
This can lead to a very dangerous situation where there is no metadata
space left at all, deadlocking future tree operations.
This patch introduces a very basic version of metadata/system free space
check by:
- Check if there is enough metadata/system space left
If there is enough, go as usual.
- If there is not enough space left, try allocating a new chunk
- Recheck if the new space can meet our demand
If not, return ERR_PTR(-ENOSPC).
Otherwise, allocate a new trans handle to the caller.
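The three steps above can be sketched as follows. Everything here is a
simplified assumption: struct space_sketch and its fields are
hypothetical and do not correspond to the real space info structures,
and a plain error code stands in for ERR_PTR(-ENOSPC).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified view of one metadata/system space info. */
struct space_sketch {
	uint64_t total;		/* bytes in allocated chunks */
	uint64_t used;		/* bytes already in use */
	uint64_t unallocated;	/* device space not yet in any chunk */
	uint64_t chunk_size;	/* size of a newly allocated chunk */
};

static int has_room(const struct space_sketch *s, uint64_t need)
{
	assert(s->used <= s->total);
	return s->total - s->used >= need;
}

/* Try to turn unallocated device space into one new chunk. */
static int alloc_chunk_sketch(struct space_sketch *s)
{
	if (s->unallocated < s->chunk_size)
		return -ENOSPC;
	s->unallocated -= s->chunk_size;
	s->total += s->chunk_size;
	return 0;
}

/* Returns 0 if a transaction needing @need bytes may start. */
int reserve_for_trans_sketch(struct space_sketch *s, uint64_t need)
{
	if (has_room(s, need))
		return 0;
	/* A failed allocation is not fatal by itself, the recheck decides. */
	alloc_chunk_sketch(s);
	if (!has_room(s, need))
		return -ENOSPC;
	return 0;
}
```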
This is possible thanks to the simplified transaction model in
btrfs-progs:
- We don't allow joining a transaction
This means we don't need to handle complex cases like data ordered
extents, which need to reserve space first, then join the current
transaction and use the reserved blocks.
- We don't allow multiple transaction handles for one transaction
Since btrfs-progs is single threaded, we always start a transaction
and then commit it.
However there is a feature that must be an exception for the new
metadata/system free space check:
- btrfs check --init-extent-tree
All the meta/system free space checks are based on the space info,
which is loaded from block group items.
When rebuilding the extent tree we can no longer have an accurate
view, so we have to disable the check for the whole execution if
we're rebuilding the extent tree.
For now, there is no regression exposed during the self tests, but I
really hope this can be an extra safety net to prevent causing ENOSPC
deadlock in btrfs-progs.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is quite some variable shadowing in btrfs-progs, most of it
just reusing common names like tmp.
Those cases are quite safe, and the shadowed ones are even of a
different type.
But there are some exceptions:
- @end in traverse_tree_blocks()
There is already an @end with the same type, but a different meaning
(the end of the current extent buffer passed in).
Just rename it to @child_end.
- @start in generate_new_data_csums_range()
Just rename it to @csum_start.
- @size of fixup_chunk_tree_block()
This one is particularly bad, we declare a local @size and initialize
it to -1, then before we really utilize the variable @size, we
immediately reset it to 0, then pass it to logical_to_physical().
Then there is a location to check if @size is -1, which will always be
true.
According to the code in logical_to_physical(), @size would be clamped
down by its original value, thus our local @size will always be 0.
This patch would rename the local @size to @found_size, and only set
it to -1.
The call site is only to pass something as logical_to_physical()
requires a non-NULL pointer.
We don't really need to bother the returned value.
- duplicated @ref declaration in run_delayed_tree_ref()
- duplicated @super_flags in change_meta_csums()
Just delete the duplicated one.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The current implementation introduces variable shadowing because
both max() and min() use the same __x and __y.
This may not be a big deal, but since the kernel already handles it
properly using the __UNIQUE_ID() macro, and has more checks, we can
cross-port the kernel version to btrfs-progs.
A few dependencies are needed; they are all small enough to be put
into the helper.
- __PASTE()
- __UNIQUE_ID()
- BUILD_BUG_ON_ZERO()
- __is_constexpr()
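The core trick, giving every macro expansion its own temporary names,
can be illustrated with a reduced sketch. It relies on GNU C extensions
(statement expressions, __typeof__, __COUNTER__), leaves out the type
and constant-expression checks of the kernel version, and all names in
it (min_sketch(), max_sketch(), SKETCH_UNIQUE()) are hypothetical.

```c
#include <assert.h>

#define SKETCH_PASTE_(a, b) a##b
#define SKETCH_PASTE(a, b) SKETCH_PASTE_(a, b)
/* Each use yields a fresh identifier such as ux_0, uy_1, ux_2, ... */
#define SKETCH_UNIQUE(prefix) SKETCH_PASTE(prefix, __COUNTER__)

/* Evaluate x and y once each, into temporaries named by the caller. */
#define SKETCH_CMP_ONCE(x, y, ux, uy, op) ({	\
	__typeof__(x) ux = (x);			\
	__typeof__(y) uy = (y);			\
	ux op uy ? ux : uy; })

#define min_sketch(x, y) \
	SKETCH_CMP_ONCE(x, y, SKETCH_UNIQUE(ux_), SKETCH_UNIQUE(uy_), <)
#define max_sketch(x, y) \
	SKETCH_CMP_ONCE(x, y, SKETCH_UNIQUE(ux_), SKETCH_UNIQUE(uy_), >)

/* Nested use: the inner and outer temporaries no longer share names. */
int clamp_sketch(int v, int lo, int hi)
{
	assert(lo <= hi);
	return min_sketch(max_sketch(v, lo), hi);
}
```

Because the unique names are generated before they are substituted into
the macro body, nested calls no longer trip shadowing warnings.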
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The stride length has been removed from kernel code, remove it here as
well.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The length has been removed from kernel, remove it here as well.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When adding the extent buffer leak detection I started getting failures
on some of the fuzz tests. This is because we don't clean up dirty
buffers for aborted transactions, we just leave them dirty and thus we
leak them. Fix this up by making btrfs_commit_transaction() on an
aborted transaction properly cleanup the dirty buffers that exist in the
system.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Update parts of struct btrfs_delayed_ref_head and update where it is
used, add more prototypes. More still needs to be synced.
Signed-off-by: David Sterba <dsterba@suse.com>
In the kernel new_key is const, update the definition in btrfs-progs to
match the in-kernel definition.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is the opposite of what we do in the kernel, however in the kernel
we put the helpers in dir-item.h and inode-item.h respectively. Those
do not exist in btrfs-progs right now, so instead of doing all that work
right now simply inline them in ctree.h to make it easier to sync
ctree.c from the kernel.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the kernel we've added a control struct to handle the different
checks we want to do on extent buffers when we read them. Update our
copy of read_tree_block to take this as an argument, then update all of
the callers to use the new structure.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the kernel we have a control structure called btrfs_tree_parent_check
to pass around the various sanity checks we have for extent buffers.
Add this to btrfs_tree_parent_check and then update the callers.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the kernel we have const struct extent_buffer instead of struct
extent_buffer, update this to make it more straightforward to sync
ctree.c.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This has const struct btrfs_key instead of just struct btrfs_key, update
this to make it more straightforward to sync ctree.c.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The kernel version of btrfs_del_ptr takes a trans handle as an argument
and returns an error in the case of tree-mod-log, update our version to
match to make syncing ctree.c more straightforward.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the kernel the key for btrfs_insert_empty_item is a const, update the
helper to match the in-kernel version.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the kernel we have a control struct called btrfs_item_batch that
encodes all of the information for bulk inserting a bunch of items.
Update btrfs_insert_empty_items to match the in-kernel implementation to
make sync'ing ctree.c more straightforward.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_cow_block takes the lockdep nesting enum in the kernel. Update
the definition to match the kernel version to make syncing ctree.c into
btrfs-progs more straightforward.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is used in ctree.c around getting the old root, add this to our
btrfs_fs_info to make it more straightforward to sync ctree.c into
btrfs-progs.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This exists in the kernel, and is touched by ctree.c, add it to the
btrfs_fs_info to make syncing ctree.c more straightforward.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is how the kernel initializes blocks, so anybody who uses
btrfs_alloc_tree_block in the kernel expects the blocks to be already
initialized. Put this init code into btrfs-progs so as we sync code
from the kernel we get the correct behavior.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This flag is used by the kernel btrfs_search_slot to make sure that leaf
splitting decision doesn't subtract the size of an item. This is for
inline extent items and csum items where we know we're going to find the
item we want, and we're only going to want to extend it. Currently this
flag doesn't do anything, but when we sync ctree.c we'll stop making the
right decision WRT the leaf space, so add the flag usage in the places
we need it so we can sync ctree.c easily.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the kernel we pass in the parent to btrfs_alloc_tree_block instead of
the blocksize and simply derive the blocksize from the fs_info. Update
the function to match the kernel's convention and update all of the
callers so we can sync ctree.c easily.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This always returns 0, and in the kernel is a void. Update the
definition to match the kernel and then update all of the callers.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function is only used in mkfs, and doesn't exist in the kernel in
ctree.c. Additionally we have a uuid lookup function to see if the uuid
exists in the tree, which for mkfs it won't because we just created the
tree. Move btrfs_uuid_tree_add into mkfs, and remove the lookup
function as it's not needed.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We use this in print-tree to do BFS tree printing, but there are no
other users and it doesn't exist upstream. Copy the current code and
clean it up so it can exist in print-tree.c and use the local copy
there. This will allow us to remove the function call when ctree.c is
synced.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function and its related functions only exist for the utilities
that populate existing file systems, and do not exist in the upstream
kernel. Move this function and the related function into its own
common source file and out of the kernel-shared sources, and then update
all of the users to include the new location of this code.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This helper exists for check and for btrfs-corrupt-block. Move the
helper and the btrfs_fixup_low_keys helper into check/repair.[ch] so we
can keep the kernel-shared sources close to the upstream kernel.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This simply zeros out the path, and this is used everywhere we use a
stack path. Drop this usage and simply init the paths to empty instead
of using a function to do the memset.
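As a rough illustration of the pattern, with struct path_sketch being a
hypothetical, reduced stand-in for the real path structure:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_LEVEL 8

/* Hypothetical, reduced stand-in for struct btrfs_path. */
struct path_sketch {
	void *nodes[SKETCH_MAX_LEVEL];
	int slots[SKETCH_MAX_LEVEL];
};

/* Returns 1 if every node pointer and slot of @p is zero. */
int path_is_empty(const struct path_sketch *p)
{
	assert(p != NULL);
	for (int i = 0; i < SKETCH_MAX_LEVEL; i++) {
		if (p->nodes[i] != NULL || p->slots[i] != 0)
			return 0;
	}
	return 1;
}

/* An empty initializer replaces a helper that only did a memset(). */
int demo_stack_path(void)
{
	struct path_sketch path = { 0 };

	return path_is_empty(&path);
}
```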
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We use this in ctree.c in the kernel, so sync this helper into
btrfs-progs to make sync'ing ctree.c easier.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Similar to btrfs_truncate_item(), this is void in the kernel as the
failure case is simply BUG_ON(). Additionally there is no root
parameter as it's not used in the function at all. Make these changes
and update the callers.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is void in the kernel, and this makes sense in btrfs-progs as it
stands currently as it doesn't actually return an error if there's a
problem, it simply BUG()'s. Update this to be a void and update the
callers to make it easier to sync ctree.c.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the kernel we have btrfs_print_leaf(eb) instead of
btrfs_print_leaf(eb, mode). In fact in all of the kernel-shared sources
we're just using the default mode. Fix this to have a
__btrfs_print_leaf() which handles the mode for the user space utilities
that want the different behavior, and then change btrfs_print_leaf() to
just be the normal default style.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the kernel we just pass the btrfs_fs_info, and we const'ify the
new_key. Update the btrfs-progs definition to make syncing ctree.c
easier.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This was updated to include a first_slot argument, update it to match
the kernel definition to make it easier to sync ctree.c.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the kernel this is called btrfs_read_node_slot, and it doesn't take a
btrfs_fs_info. Update the btrfs-progs version to match the kernel and
update all of the callers.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is the calling convention in the kernel because we track dirty
blocks per transaction instead of globally in the fs_info. Simply
mirror what we do in the kernel to make it easier to sync ctree.c
locally.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>