The warning can pop up frequently on a fuzzed image; the message alone seems
to be enough. Also add a more fitting error code.
Signed-off-by: David Sterba <dsterba@suse.com>
As btrfs_update_block_group() fails when the block group is not found in
the cache, we can exit btrfs_free_block_group() early; there is not much
to roll back. The caller will then exit in turn.
Signed-off-by: David Sterba <dsterba@suse.com>
Metadata blocks always have size nodesize. When reading
superblock::sys_array, the actual data size is fixed at 4k and thus
smaller than nodesize, but otherwise everything works as before.
Signed-off-by: David Sterba <dsterba@suse.com>
If the found %ins crosses a stripe boundary, ie. BTRFS_STRIPE_LEN, we
search again with a stripe-aligned %search_start. The current code
calculates %search_start by adding a wrong offset; to fix it, the start
position of the block group should be taken into account, otherwise we
end up looking at the same block group forever.
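Roughly, the idea is the following (hypothetical variable names; the
exact code in find_free_extent() may differ):

	/*
	 * Base the stripe alignment on the start of the block group,
	 * not on a wrong absolute offset, so that the retried search
	 * actually advances instead of revisiting the same block
	 * group forever.
	 */
	search_start = block_group->key.objectid +
		round_up(ins->objectid - block_group->key.objectid + 1,
			 BTRFS_STRIPE_LEN);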
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 functions are involved in this refactor: btrfs_make_block_group(),
btrfs_make_block_groups(), btrfs_alloc_chunk() and btrfs_alloc_data_chunk().
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Just to keep the first parameter the same as in the kernel.
We can also save a few lines since the parameter is shorter now.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When a 0 sized block group item is found, set_extent_bits() will not
actually set any bits, while set_state_private() still inserts the
allocated block group cache into the block group extent_io_tree.
So at close_ctree() time, we won't free the stored private block group
cache, since we can't find any bit set for the 0 sized block group.
To fix it, skip any 0 sized block group in btrfs_read_block_groups(),
so such a leak won't happen.
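A minimal sketch of the skip (hypothetical placement inside the item
loop of btrfs_read_block_groups(); the label is illustrative):

	/*
	 * A 0 sized block group sets no bits via set_extent_bits(),
	 * so the cache inserted via set_state_private() could never
	 * be found and freed at close_ctree() time; skip it.
	 */
	if (found_key.offset == 0)
		goto next;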
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Large numbers like (1024 * 1024 * 1024) cost the reader/reviewer a
moment to convert to 1G.
Introduce the kernel's include/linux/sizes.h and replace any literal
number larger than 4096 (not including 4096) with the corresponding SZ_*.
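For example, a change of this kind (illustrative values):

	-	total_bytes = 1024 * 1024 * 1024;
	+	total_bytes = SZ_1G;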
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Commit 854437ca ("btrfs-progs: extent-tree: avoid allocating tree block
that crosses stripe boundary") introduced a check that a logical bytenr
does not cross a stripe boundary.
However that check is not completely correct.
It only checks the logical bytenr and length against the absolute
logical offset.
That is to say, it only checks whether a tree block lies within a 64K
logical stripe.
But in fact a block group can start at a bytenr unaligned with 64K,
just like the following case, and then btrfsck gives a false alert.
0        32K       64K       96K      128K     160K ...
         |--------------- Block group A ---------------------
         |<-----TB 32K------>|
         |/Scrub stripe unit/|
|----- WRONG UNIT --|
In that case, the TB (tree block) at bytenr 32K in fact fits into the
kernel scrub stripe unit, but doesn't fit into the pure logical 64K
stripe.
Fix check_crossing_stripes() to compare the bytenr to the block group
start, not to the absolute logical bytenr.
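A minimal sketch of the corrected helper (hypothetical signature;
details of the real fix may differ):

	static inline int check_crossing_stripes(struct btrfs_fs_info *fs_info,
						 u64 start, u64 len)
	{
		struct btrfs_block_group_cache *bg;
		u64 bg_offset = start;

		/*
		 * Measure stripes from the block group start, not from
		 * logical offset 0, as block groups may start at a
		 * bytenr unaligned with 64K.
		 */
		bg = btrfs_lookup_block_group(fs_info, start);
		if (bg)
			bg_offset = start - bg->key.objectid;

		return (bg_offset / BTRFS_STRIPE_LEN) !=
		       ((bg_offset + len - 1) / BTRFS_STRIPE_LEN);
	}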
Reported-by: Jussi Kansanen <jussi.kansanen@gmail.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If we discover a bad BLOCK_GROUP_ITEM_KEY with offset == 0, we end up
looping forever when reading the block groups in. This is because the
search for the next block group starts at the current objectid plus the
offset; if the offset is 0, we get the same key over and over and never
advance. This patch ensures that we advance by at least one objectid
per iteration.
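A minimal sketch of the fix (hypothetical variable names from the read
loop; the actual patch may differ):

	/*
	 * A zero offset would make us search from the same key and
	 * find this block group item again forever; always advance
	 * by at least one objectid.
	 */
	if (found_key.offset == 0)
		key.objectid = found_key.objectid + 1;
	else
		key.objectid = found_key.objectid + found_key.offset;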
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a new function, check_data_extent_item(), to check whether the
corresponding data backref exists in the extent tree.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_record_file_extent() splits extents using the max extent size
(128M). That works well for real file extents, but not for large hole
extents, as holes don't have an extent size limit.
In that case, it will only insert one 128M hole and skip the rest,
leading to a discontinuous extent error in the converted btrfs.
Fix it by not splitting hole extents.
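A minimal sketch of the fix (assuming holes are identified by a zero
disk_bytenr and that the unsplit insertion helper is named
__btrfs_record_file_extent(); both are assumptions here):

	/*
	 * Holes have no extent size limit, so record the whole range
	 * at once instead of splitting at BTRFS_MAX_EXTENT_SIZE.
	 */
	if (disk_bytenr == 0)
		return __btrfs_record_file_extent(trans, root, objectid,
						  inode, file_pos, 0,
						  num_bytes);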
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Clean up all the old btrfs-convert facilities, including:
1) btrfs_convert_operations->alloc/free/test_extents*
   No need to do non-standard extent allocation.
   After init_btrfs() everything can be done by the normal routines.
   Now only 4 functions are needed in btrfs_convert_operations:
   1) open_fs
   2) read_used_space
   3) copy_inodes
   4) close_fs
2) fs_info->extent_ops
   Same as above.
3) Old init_btrfs(), create_image(), create_file_image_range()
   Replaced with newer and cleaner ones.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Before this patch, btrfs-convert relied only on a large enough initial
system/metadata chunk size to ensure that no new system/metadata chunk
would be created.
But that's not safe enough. So add two new members to fs_info, the
avoid_sys/meta_chunk_alloc flags, to prevent any new system or metadata
chunks from being created before init_btrfs_v2().
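A minimal sketch of the guard (member names follow the description
above; the exact call site in the chunk allocator is an assumption):

	/* Refuse new system/metadata chunks while convert has not
	 * finished init_btrfs_v2(). */
	if ((type & BTRFS_BLOCK_GROUP_SYSTEM) &&
	    fs_info->avoid_sys_chunk_alloc)
		return -EINVAL;
	if ((type & BTRFS_BLOCK_GROUP_METADATA) &&
	    fs_info->avoid_meta_chunk_alloc)
		return -EINVAL;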
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_record_file_extent() has some small problems:
1) It can't handle overlapping extents.
2) It may create extents larger than BTRFS_MAX_EXTENT_SIZE.
So enhance it using the previously added facilities.
This is needed for the later btrfs-convert rework: the new convert
creates the saved image first and then copies the inodes, which also
causes extent overlapping.
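A minimal sketch of the size-splitting part (hypothetical names; the
overlap and hole handling via btrfs_search_overlap_extent() is omitted
here):

	u64 cur = file_pos;

	while (cur < file_pos + num_bytes) {
		/* Never create an extent item larger than the maximum */
		u64 len = min_t(u64, file_pos + num_bytes - cur,
				(u64)BTRFS_MAX_EXTENT_SIZE);

		ret = __btrfs_record_file_extent(trans, root, objectid,
				inode, cur,
				disk_bytenr + (cur - file_pos), len);
		if (ret < 0)
			break;
		cur += len;
	}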
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a new function, btrfs_search_overlap_extent(), to find the
first overlapping extent.
It's useful for the later btrfs-convert rework.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nodesize is what the kernel uses; the two values are always equal. We
have to keep leafsize in the headers, and similarly the tree setting
functions still take and set leafsize, but it's effectively a no-op.
Signed-off-by: David Sterba <dsterba@suse.com>
Since open_ctree_fs_info() may now return a fs_info even without any
roots, modify functions like read_tree_block() to operate on such a
fs_info.
This provides the basis for btrfs-find-root to operate on the chunk
tree of a corrupted fs.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[ coding style adjustments, unified declarations ]
Signed-off-by: David Sterba <dsterba@suse.com>
This reuses the existing code for checking the free space cache; we just
need to load the free space tree. While we do that, we check a couple of
invariants on the free space tree itself. This requires pulling in some
code from the kernel to exclude the super stripes.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The line

	#include "math.h"

in extent-tree.c uses quotes for historical reasons (we had a custom
math.h before). Use <> instead of quotes for this header.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Breaking out of the while loop gets ret overwritten to zero; jump to
the error label directly so that -ENOMEM is returned.
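A hypothetical illustration of the pattern (the failing allocation
keeps its -ENOMEM only if we leave via the error label):

	 		ret = -ENOMEM;
	-		break;
	+		goto error;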
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
As convert implements its own extent allocation, avoid such metadata
problems there too.
Reported-by: Chris Murphy <lists@colorremedies.com>
Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now the find_free_extent() function won't return a metadata extent that
crosses a stripe boundary.
Reported-by: Chris Murphy <lists@colorremedies.com>
Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function will be used to free an empty chunk.
This provides the basis for the later temporary chunk cleanup.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce two functions, free_space_info and free_block_group_cache.
The former frees the space of an empty block group.
The latter frees the in-memory block group cache along with its space
in the space_info and the device space.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce two functions, free_chunk_item and free_system_chunk_item.
The former frees a chunk item in the chunk tree.
The latter frees a system chunk entry in the super block.
They are used by the later chunk/block group free functions.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce two functions, free_dev_extent_item and
free_chunk_dev_extent_items, to free dev extent items in a chunk.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function is used to free a block group item. It must be called
with all the space in the block group pinned; otherwise tree blocks
could be allocated from that range.
The function is used by the later block group/chunk free functions.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have this check in the kernel but not in userspace, which makes fsck
fail when we wouldn't have a problem in the kernel. This was meant to
catch a case that really isn't good; unfortunately it will require a
design change to fix in the kernel, so in the meantime add the check
here so we can be sure our tests only catch real problems. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
We hold a transaction open for the entirety of fixing extent refs. This
works out ok most of the time, but we can be tight on space and run out
while fixing things. To get around this, just push the transaction
starting dance down into the functions that actually fix things. This
keeps us from ending up with ENOSPC because we pinned everything, and
allows the code to be a bit simpler.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Hitting ENOSPC problems with a really corrupt fs uncovered the fact that
we match any flag in a block group when creating space infos. This is a
problem if we have a raid level set: we'll end up with only one space
info that covers both metadata and data, because they share a raid
level. We don't want this; we want to separate the data and metadata
space infos, so mask off the raid level and only use the main flags.
Thanks,
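A minimal sketch of the masking (BTRFS_BLOCK_GROUP_TYPE_MASK covers the
data/metadata/system type bits; the exact call site is an assumption):

	/*
	 * Match space infos on the type bits only, so data and
	 * metadata don't collapse into one space info just because
	 * they share a raid profile.
	 */
	flags &= BTRFS_BLOCK_GROUP_TYPE_MASK;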
Signed-off-by: Josef Bacik <jbacik@fb.com>