Closing the fs will try to commit a pending transaction, but may fail to
do so if the filesystem state is not well defined. This will eg. fail
for some fuzz tests. The data structures are freed but no furhter
attempt to commit is made.
Signed-off-by: David Sterba <dsterba@suse.com>
Currently transaction bugs out insided btrfs_start_transaction in case
of error, we want to lift the error handling to the callers. This patch
adds the BUG_ON anywhere it's been missing so far. This is not the best
way of course. Transforming BUG_ON to a proper error handling highly
depends on the caller and should be dealt with case by case.
Signed-off-by: David Sterba <dsterba@suse.com>
The call to btrfs_read_block_groups could loop if the metadata are
damaged (reported eg. for an unaligned block), due to lack of error
handling. We have to check for restored images or currently created
filesystems, that do not contain the blockgroups.
Can be reproduced by fuzzed image bko-155551.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=155551
Reported-by: Lukas Lueg <lukas.lueg@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Tree blocks are always nodesize. As readahead is only an optimization,
exact size is not required and is only advisory.
Signed-off-by: David Sterba <dsterba@suse.com>
Metadata blocks are always nodesize. When reading the
superblock::sys_array, the actual size of data is fixed to 4k and
smaller than nodesize, but otherwise everything works as before.
Signed-off-by: David Sterba <dsterba@suse.com>
The tree blocks are supposed to be always of nodesize. Before the
parameter has been dropped, it was unlikely but possible to pass a
misaligned value.
Signed-off-by: David Sterba <dsterba@suse.com>
Just to keep the 1st paramter the same as kernel.
We can also save a few lines since the parameter is shorter now.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The only reasom read_tree_block() needs a btrfs_root parameter is to get
its node/sector size.
And long ago, I have already introduced a compactible interface,
read_tree_block_fs_info() to pass btrfs_fs_info instead of btrfs_root.
Since we have cleaned up all root->sector/node/stripesize users, we
should be OK to refactor read_tree_block() function.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Since we have cached block sizes in fs_info, there is no need to specify
these sizes in btrfs_setup_root() function.
And refactor all root->sector/node/stripesize users in disk-io.c.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
btrfs_fs_info
Just like what we do in kernel, since we will not support different
leaf/node/stripe size per tree, there is no need to store these block
sizes in btrfs_root.
This patch will introduce these block size members into btrfs_fs_info
structure, allowing us to convert such usage in later patches.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Leafsize is deprecated for a long time, and kernel has already updated
ctree.h to rename sb->leafsize to sb->__unused_leafsize.
This patch will remove normal users of leafsize:
1) Remove leafsize member from btrfs_root structure
Now only root->nodesize and root->sectorisze.
No longer root->leafsize.
2) Remove @leafsize parameter from btrfs_setup_root() function
Since no root->leafsize, no need for @leafsize parameter.
The remaining user of leafsize will be:
1) btrfs inspect-internal dump-super
Reformat the "leafsize" output to "leafsize (deprecated)" and
use le32_to_cpu() to do the cast manually.
2) mkfs
We still need to set sb->__unused_leafsize to nodesize.
Do the manual cast too.
3) convert
Same as mkfs, these two superblock setup should be merged later
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Fuzzed image bko-161821.raw causes btrfs check to get segmentation fault.
The function check_owner_ref attempts to access a non-exist quota tree
when dealing with extent_item [4198400 4096] in the corrupted filesystem.
The function btrfs_new_fs_info always allocates memory for
fs_info->quota_root regardless of whether quota_tree exists or not.
Additionally, the function btrfs_read_fs_root will directly return
fs_info->quota_root if location->objectid == BTRFS_QUOTA_TREE_OBJECTID.
This patch does the following things:
1. Do extra check and return ENOENT if quota tree does not exist in the
function btrfs_read_fs_root.
2. Free useless fs_info->quota_root in the function btrfs_setup_all_roots
to reduce confusion.
3. free_extent_buffer even if check_child_node failed in the function
walk_down_tree.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If the final fsync() on the Btrfs device fails, we just swallow the
error and don't alert the user in any way. This was uncovered by xfstest
generic/405, which checks that mkfs fails when it encounters EIO.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Large numbers like (1024 * 1024 * 1024) may cost reader/reviewer to
waste one second to convert to 1G.
Introduce kernel include/linux/sizes.h to replace any intermediate
number larger than 4096 (not including 4096) to SZ_*.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
root->highest_inode is not accurate at the time of creating a lost+found
and it fails because the highest_inode+1 is already present. This could be
because of fixes after highest_inode is set. Instead, search
for the highest inode in the tree and use it for lost+found.
This makes root->highest_inode unnecessary and hence deleted.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If this flag is passed to open_ctree(), we'll clear the
FREE_SPACE_TREE_VALID compat_ro bit. The kernel will then reconstruct
the free space tree the next time the filesystem is mounted.
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It seems like bad idea to use a library name (lblkid) within generic
function name. The currently used scanning library is implementation
detail and this detail should be hidden for rest of the code.
Signed-off-by: Karel Zak <kzak@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
If blocksize is 0, it passes the IS_ALIGNED check but fails later as the
length of ebs will be zero.
Reported-by: Lukas Lueg <lukas.lueg@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=169311
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_read_dev_super() only returns 0 or -1, which doesn't really help,
caller won't know if it's caused by bad superblock or superblock out of
range.
Return -errno if pread64() return -1, and return -EOF if none or part of
the super is read out, and return what check_super() returned.
So caller can get -EIO to catch real corrupted super blocks.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Although we have enhanced read_tree_block() from a lot of different
aspects, it lacks the early bytenr/blocksize alignment check.
And the lack of such check can lead to strange use-after-free bugs, due
to the fact that alloc_extent_buffer() will free overlapping extent
buffers, and allocate new eb for the usage.
So we should not allow invalid bytenr/blocksize even passed to
btrfs_find_create_tree_block().
This patch will add such check so we won't trigger use-after-free bug
then.
Reported-by: Lukas Lueg <lukas.lueg@gmail.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The filesystem existence on a device is manifested by the signature,
during the mkfs process we write it first and then create other
structures. Such filesystem is not valid and should not be registered
during device scan nor listed among devices from blkid.
This patch will introduce two staged creation. In the first phase, the
signature is wrong, but recognized as a partially created filesystem (by
open or scan helpers). Once we successfully create and write everything,
we fixup the signature. At this point automated scanning should find
a valid filesystem on all devices.
We can also rely on the partially created filesystem to do better error
handling during creation. We can just bail out and do not need to clean
up.
The partial signature is '!BHRfS_M', can be shown by
btrfs inspect-internal dump-super -F image
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
As we're passing a set of flags, the enum type is not appropriate.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This will not cause unaligned access as the checksum is at the beginning
of btrfs_header and thus aligned to a page, but for clarity use the
helper.
Signed-off-by: David Sterba <dsterba@suse.com>