Originally commit 2681e00f00 ("btrfs-progs: check for matchingi
free space in cache") added the account_super_bytes function to prevent
false negative when running btrfs check. Turns out this function is
really copied exclude_super_stripes, excluding the calls to
exclude_super_stripes. Later commit e4797df6a9 ("btrfs-progs: check
the free space tree in btrfsck") introduced proper version of
exclude_super_stripes. Instead of duplicating the function, just remove
account_super_bytes and use exclude_super_stripes instead of the former.
This also has the benefit of bringing the userspace code a bit closer
to the kernel counterpart.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This parameter was introduced with the original implementation of the
function but has never really been used, so just remove it.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Just like kernel cleanup made by David, btrfs_print_leaf() and
btrfs_print_tree() doesn't need btrfs_root parameter at all.
With previous patches as preparation, now we can remove the btrfs_root
parameter.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Although skinny_metadata's type is int, its value just can be 0/1. And
if condition be true only when skinny_metadata equals 1, so in if's
executive part, set skinny_metadata to 1 is redundancy. Remove it.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
@chunk_objectid of btrfs_make_block_group() function is always fixed to
BTRFS_FIRST_FREE_OBJECTID, so there is no need to pass it as parameter
explicitly.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_reserve_extent() uses int @data to determine if we're allocating
data extent, while reuse the parameter later to pass it as profile
(data/meta/sys).
It's a little confusing, this patch will follow kernel parameter to use
bool @is_data to replace it.
And in btrfs_reserve_extent(), use dedicated u64 @profile.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Some parameter of trans is not used indeed.
Let's remove them.
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When passing directory larger than block device using --rootdir
parameter, we get the following backtrace:
------
extent-tree.c:2693: btrfs_reserve_extent: BUG_ON `ret` triggered, value -28
./mkfs.btrfs(+0x1a05d)[0x557939e6b05d]
./mkfs.btrfs(btrfs_reserve_extent+0xb5a)[0x557939e710c8]
./mkfs.btrfs(+0xb0b6)[0x557939e5c0b6]
./mkfs.btrfs(main+0x15d5)[0x557939e5de04]
/usr/lib/libc.so.6(__libc_start_main+0xea)[0x7f83b101af6a]
./mkfs.btrfs(_start+0x2a)[0x557939e5af5a]
------
Nothing special, just BUG_ON() abusing from ancient code.
Fix them by using correct return.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
free_block_group_cache() calls clear_extent_bits() with wrong end, which
is one byte larger than the correct range.
This will cause the next adjacent cache state to be split. And due to
the split, private pointer (which points to block group cache) will be
reset to NULL.
This is very hard to detect as this function only gets called in
cleanup_temp_chunks() which is just before mkfs finishes. This bug only
gets exposed when reworking --rootdir option.
Signed-off-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The warning can pop up frequently on a fuzzed image, the message seems
to be enough. Add a more fitting error code too.
Signed-off-by: David Sterba <dsterba@suse.com>
As btrfs_update_block_group fails when the block group is not found in
cache, we can exit btrfs_free_block_group, not much to rollback. The
caller will also exit in turn.
Signed-off-by: David Sterba <dsterba@suse.com>
Metadata blocks are always nodesize. When reading the
superblock::sys_array, the actual size of data is fixed to 4k and
smaller than nodesize, but otherwise everything works as before.
Signed-off-by: David Sterba <dsterba@suse.com>
If the found %ins is crossing a stripe len, ie. BTRFS_STRIPE_LEN, we'd
search again with a stripe-aligned %search_start. The current code
calculates %search_start by adding a wrong offset, in order to fix it, the
start position of the block group should be taken, otherwise, it'll end up
with looking at the same block group forever.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 functions are involved in this refactor: btrfs_make_block_group()
btrfs_make_block_groups(), btrfs_alloc_chunk, btrfs_alloc_data_chunk().
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Just to keep the 1st paramter the same as kernel.
We can also save a few lines since the parameter is shorter now.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When a 0 sized block group item is found, set_extent_bits() will not
really set any bits.
While set_state_private() still inserts allocated block group cache into
block group extent_io_tree.
So at close_ctree() time, we won't free the private block group cache
stored since we can't find any bit set for the 0 sized block group.
To fix it, at btrfs_read_block_groups() we skip any 0 sized block group,
so such leak won't happen.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Large numbers like (1024 * 1024 * 1024) may cost reader/reviewer to
waste one second to convert to 1G.
Introduce kernel include/linux/sizes.h to replace any intermediate
number larger than 4096 (not including 4096) to SZ_*.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Commit 854437ca(btrfs-progs: extent-tree: avoid allocating tree block
that crosses stripe boundary) introduces check for logical bytenr not
crossing stripe boundary.
However that check is not completely correct.
It only checks if the logical bytenr and length agaist absolute logical
offset.
That's to say, it only check if a tree block lies in 64K logical stripe.
But in fact, it's possible a block group starts at bytenr unaligned with
64K, just like the following case.
Then btrfsck will give false alert.
0 32K 64K 96K 128K 160K ...
|--------------- Block group A ---------------------
|<-----TB 32K------>|
|/Scrub stripe unit/|
| WRONG UNIT |
In that case, TB(tree block) at bytenr 32K in fact fits into the kernel
scrub stripe unit.
But doesn't fit into the pure logical 64K stripe.
Fix check_crossing_stripes() to compare bytenr to block group start, not
to absolute logical bytenr.
Reported-by: Jussi Kansanen <jussi.kansanen@gmail.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If we discover a bad BLOCK_GROUP_ITEM_KEY with offset = 0, we'll end up looping
forever when we read the block groups in. This is due to the search for the
next block group starting at the current object + the offset. If offset is 0,
we'll just get the same key over and over and never advance. This patch
ensures that we'll advance at least one objectid per iteration.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a new function check_data_extent_item() to check if the
corresponding data backref exists in extent tree.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Btrfs_record_file_extent() will split extents using max extent size(128M).
It works well for real file extents, but not that well for large
hole extent, as hole doesn't have extent size limit.
In that case, it will only insert one 128M hole, and skip the rest,
leading to discontinuous extent error for converted btrfs.
Fix it by not splitting hole extents.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Cleanup all the old btrfs-convert facilities, including:
1) btrfs_convert_operations->alloc/free/test_extents*
No need to do non-standard extent allocation.
After init_btrfs() everything can be done by normal routine.
Now only 4 functions are needed in btrfs_convert_operations.
1) open_fs
2) read_used_space
3) copy_inodes
4) close_fs
2) fs_info->extent_ops
Same as above.
3) Old init_btrfs(), create_image(), create_file_image_range()
Replaced with newer and cleaner one.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Before this patch, btrfs-convert only rely on large enough initial
system/metadata chunk size to ensure no newer system/meta chunk will be
created.
But that's not safe enough. So add two new members in fs_info,
avoid_sys/meta_chunk_alloc flags to prevent any newer system or meta
chunks to be created before init_btrfs_v2().
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Btrfs_record_file_extent() has some small problems like:
1) Can't handle overlapping extents
2) May create extent larger than BTRFS_MAX_EXTENT_SIZE
So enhance it using previously added facilites.
This is used for later btrfs-convert, as for new convert, we create
saved image first, then copy inode.
Which will also cause extent overlapping.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a new function, btrfs_search_overlap_extent() to find the first
overlapping extent.
It's useful for later btrfs-convert rework.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nodesize is used in kernel, the values are always equal. We have to keep
leafsize in headers, similarly the tree setting functions still take and
set leafsize, but it's effectively a no-op.
Signed-off-by: David Sterba <dsterba@suse.com>
Since open_ctree_fs_info() now may return a fs_info even without any
roots, modify functions like read_tree_block() to operate with such
fs_info.
This provides the basis for btrfs-find-root to operate on chunk tree
with corrupted fs.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[ coding style adjustments, unified declarations ]
Signed-off-by: David Sterba <dsterba@suse.com>
This reuses the existing code for checking the free space cache, we just
need to load the free space tree. While we do that, we check a couple of
invariants on the free space tree itself. This requires pulling in some
code from the kernel to exclude the super stripes.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Line of
#include "math.h"
in extent-tree.c using quotas is historical reason, (we had custom
math.h before).
Use "<>" instead of quotes in this header file.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Breaking from the while loop makes ret overwritten to zero, goto error
label directly and return -ENOMEM.
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
As convert implement its own alloc extent, avoid such metadata problem
too.
Reported-by: Chris Murphy <lists@colorremedies.com>
Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>