This piece of code has been commented since 2009, given the number of
changes that have happened it's unlikely it could be made to work or is
needed at all. Just delete it.
The code was disabled in commit 95d3f20b51 ("Mixed back reference
(FORWARD ROLLING FORMAT CHANGE)") that changed the format significantly
and we don't need the compatibility code anymore.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
'pin' is always true in __free_extent so there is no point in checking
it. Just remove the if and unindent the code.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a bug report of unexpected ENOSPC from btrfs-convert, issue #123.
After some debugging, even when we have enough unallocated space, we
still hit ENOSPC at btrfs_reserve_extent().
[CAUSE]
Btrfs-progs relies on chunk preallocator to make enough space for
data/metadata.
However after the introduction of delayed-ref, it's no longer reliable
to rely on btrfs_space_info::bytes_used and
btrfs_space_info::bytes_pinned to calculate used metadata space.
For a running transaction with a lot of allocated tree blocks,
btrfs_space_info::bytes_used stays its original value, and will only be
updated when running delayed ref.
This makes btrfs-progs chunk preallocator completely useless. And for
btrfs-convert/mkfs.btrfs --rootdir, if we're going to have enough
metadata to fill a metadata block group in one transaction, we will hit
ENOSPC no matter whether we have enough unallocated space.
[FIX]
This patch will introduce btrfs_space_info::bytes_reserved to track how
many space we have reserved but not yet committed to extent tree.
To support this change, this commit also introduces the following
modification:
- More comment on btrfs_space_info::bytes_*
To make code a little easier to read
- Export update_space_info() to preallocate empty data/metadata space
info for mkfs.
For mkfs, we only have a temporary fs image with SYSTEM chunk only.
Export update_space_info() so that we can preallocate empty
data/metadata space info before we start a transaction.
- Proper btrfs_space_info::bytes_reserved update
The timing is the as kernel (except we don't need to update
bytes_reserved for data extents)
* Increase bytes_reserved when call alloc_reserved_tree_block()
* Decrease bytes_reserved when running delayed refs
With the help of head->must_insert_reserved to determine whether we
need to decrease.
Issue: #123
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In github issues, one user reports unexpected ENOSPC error if enabling
datasum druing convert. After some investigation, it looks like that
during ext2_saved/image creation, we could create large file extent
whose size can be 128M (max data extent size).
In that case, its csum block will be at least 128K. Under certain case
we need to allocate extra metadata chunks to fulfill such space
requirement.
However we only do metadata prealloc if we're reserving extents for fs
trees. (we use btrfs_root::ref_cows to determine whether we should do
metadata prealloc, and that member is only set for fs trees).
There is no explaination on why we only do metadata prealloc for file
trees, but from my educated guess, it could be related to avoid nested
extent/chunk tree modication.
At least extent reservation for csum tree shouldn't be a problem with
metadata block group preallocation.
So adding new condition for metadata preallocate to avoid unexpected
ENOSPC problem.
Issue: #123
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is a indirect recursion which can reach the extent reservation:
btrfs_reserve_extent() <--|
|- do_chunk_alloc() |
|- btrfs_alloc_chunk() |
|- btrfs_insert_item() |
|- btrfs_reserve_extent() <--|
Currently, we're using root->ref_cows to determine whether we should do
chunk prealloc to avoid such loop.
But that's still a hidden trap. Instead of solving it using some hidden
tricks, this patch will make chunk/block group allocation exclusive.
Now if do_chunk_alloc() determines to alloc chunk, it will set a flag in
transaction handle so new call of do_chunk_alloc() will refuse to
allocate new chunk until current chunk allocation finishes.
The chunks get over-allocated by 2M so there's enough space in case the
recursive call asks for a different type of blockgroup.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
BTRFS_COMPAT_EXTENT_TREE_V0 is introduced for a short time in kernel,
and it's over 10 years ago.
Nowadays there should be no user for that feature, and kernel has remove
this support in Jun, 2018. There is no need for btrfs-progs to support
it.
This patch will remove EXTENT_TREE_V0 related code and replace those
BUG_ON() to a more graceful error message.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is a bug report of BUG_ON() which is caused by __free_extent()
failed to lookup a backref extent:
Failed to find [1429288337408, 168, 16384]
btrfs unable to find ref byte nr 1429288583168 parent 0 root 2 owner 0 offset 0
convert/source-ext2.c:834: ext2_copy_inodes: BUG_ON ret triggered, value -5
./btrfs-convert[0x410941]
./btrfs-convert(main+0x1fdc)[0x40d3b8]
/lib64/libc.so.6(__libc_start_main+0xf3)[0x7f93bb7d2f33]
./btrfs-convert(_start+0x2e)[0x40a96e]
It's still unclear how this bug can be triggered, but adding such debug
output will provide more info for us to debug.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When running fuzz-tests/003 and fuzz-tests/009, btrfs-progs will crash
due to BUG_ON().
[CAUSE]
We abused BUG_ON() in btrfs_commit_transaction(), which is one of the
most error prone function for fuzzed images.
Currently to cleanup the aborted transaction, we only need to clean up
the only per-transaction data: delayed refs.
This patch will introduce a new function, btrfs_destroy_delayed_refs()
to cleanup delayed refs when we failed to commit transaction.
With that function, we will gently destroy per-trans delayed ref, and
remove the BUG_ON()s in btrfs_commit_transaction().
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch will refactor btrfs_finish_extent_commit():
- Make it return void
There is no failure pattern for btrfs_finish_extent_commit(), thus it
always return 0. And the caller doesn't care about the return value.
So no need to return int.
- Remove @root and @unpin parameters
@root is only used to extract fs_info, which can be extracted from
transaction handler already.
@unpin is always fs_info->pinned_extents.
All these parameters can be extracted from @trans, no need to pass
them.
The function signature now matches the kernel counterpart.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
cleanup_ref_head() will only return 0 or 1, no way to return a negative
value. So remove the dead branch.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Commit 7a12d8470e ("btrfs-progs: Do metadata preallocation as long as
we're not modifying extent tree") tries to fix#123, however due to the
fact that chunk tree also has root->ref_cows set, we will call
do_chunk_alloc() until call stack explodes.
So revert that offending patch until we have a much better comment on
root->ref_cows and find a better solution to this problem.
Signed-off-by: Qu Wenruo <wqu@suse.com>
In github issues, one user reports unexpected ENOSPC error if enabling
datasum druing convert. After some investigation, it looks like that
during ext2_saved/image creation, we could create large file extent
whose size can be 128M (max data extent size).
In that case, its csum block will be at least 128K. Under certain case
we need to allocate extra metadata chunks to fulfill such space
requirement.
However we only do metadata prealloc if we're reserving extents for fs
trees. (we use btrfs_root::ref_cows to determine whether we should do
metadata prealloc, and that member is only set for fs trees).
There is no explaination on why we only do metadata prealloc for file
trees, but at least from my investigation, it could be related to avoid
nested extent tree modication.
At least extent reservation for csum tree shouldn't be a problem with
metadata block group preallocation.
So change the metadata block group preallocation check from
"root->ref_cow" to "root->root_key.objectid !=
BTRFS_EXTENT_TREE_OBJECTID", and add some comment for it.
Issue: #123
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Found using -Wmissing-prototypes in GCC. This should improve LTO
behavior.
Note that set_free_space_tree_thresholds is an unused function. Adding
inline seems to remove the unused function warning.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For now this doesn't change the functionality since FST code is not yet
enabled via the compat bits. But this will be needed when it's enabled
so that the FST is correctly modified during repair operations that
allocate/deallocate extents.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that delayed refs have been wired let's merge the two function. In
the process also remove one BUG_ON since alloc_reserved_tree_block's
callers can handle errors. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that delayed refs have been all wired up clean up the __free_extent2
adapter function since it's no longer needed. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Given that the new delayed refs infrastructure is implemented and wired
up, there is no point in keeping the old code. So just remove it.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This commit enables the delayed refs infrastructures. This entails doing
the following:
1. Replacing existing calls of btrfs_extent_post_op (which is the
equivalent of delayed refs) with the proper btrfs_run_delayed_refs.
As well as eliminating open-coded calls to finish_current_insert and
del_pending_extents which execute the delayed ops.
2. Wiring up the addition of delayed refs when freeing extents
(btrfs_free_extent) and when adding new extents (alloc_tree_block).
3. Adding calls to btrfs_run_delayed refs in the transaction commit
path alongside comments why every call is needed, since it's not
always obvious (those call sites were derived empirically by running
and debugging existing tests)
4. Correctly flagging the transaction in which we are reinitialising
the extent tree.
5. Moving btrfs_write_dirty_block_groups to
btrfs_write_dirty_block_groups since blockgroups should be written to
disk after the last delayed refs have been run.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The root argument is used only to get a reference to the fs_info, this
can be achieved with the transaction handle being passed so use that.
This is in preparation for moving this function in the main transaction
commit routine. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This commit pulls those portions of the kernel implementation of
delayed refs which are necessary to have them working in user-space.
I've done the following modifications:
1. Replaced all kmem_cache_alloc calls to kmalloc.
2. Removed all locking-related code, since we are single threaded in
userspace.
3. Removed code which deals with data refs - delayed refs in user space
are going to be used only for cowonly trees.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is a simple adapter function to convert the delayed-refs structures
to the current arguments of alloc_reserved_tree_block.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is a simple adapter to convert the arguments delayed ref arguments
to the existing arguments of __free_extent.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When running test fuzz/003, we could hit the following BUG_ON:
====== RUN MAYFAIL btrfs check --init-csum-tree tests//fuzz-tests/images/bko-155621-bad-block-group-offset.raw.restored
Unable to find block group for 0
Unable to find block group for 0
Unable to find block group for 0
extent-tree.c:2657: alloc_tree_block: BUG_ON `ret` triggered, value -28
failed (ignored, ret=134): btrfs check --init-csum-tree tests/fuzz-tests/images/bko-155621-bad-block-group-offset.raw.restored
mayfail: returned code 134 (SIGABRT), not ignored
test failed for case 003-multi-check-unmounted
Just remove that BUG_ON() and allow us to exit gracefully, the caller
handles the errors.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently some instances of btrfs_free_extent are called with the
last parameter ("offset") being set to 1. This makes no sense, since
offset is used for data extents. I suspect this is a left-over from
95d3f20b51 ("Mixed back reference (FORWARD ROLLING FORMAT CHANGE)")
since this commit changed the signature of the function from :
-int btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_root
- *root, u64 bytenr, u64 num_bytes, u64 parent,
- u64 root_objectid, u64 ref_generation,
- u64 owner_objectid, int pin);
to
+int btrfs_free_extent(struct btrfs_trans_handle *trans,
+ struct btrfs_root *root,
+ u64 bytenr, u64 num_bytes, u64 parent,
+ u64 root_objectid, u64 owner, u64 offset);
I.e the last parameter was "pin" and not offset. So these are just
leftovers with no semantic meaning. Fix this by passing 0.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is not really needed, since we can reference the fs_info from the
passed transaction. This is in preparation for delayed-refs support.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This argument is no longer used in this function so remove it.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
They are not really needed, what free_extent_hook wants is really a
pointer to fs_info so give it to it directly. This is in preparation
of delayed refs code.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is in preparation of delayed refs code.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Instead of updating this during update_block_group, move the updating
code at the places where we free/allocate a block. This resembles the
current state of the kernel code. This is in prep for delayed refs.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It's not needed, since we can obtain a reference to fs_info from the
passed transaction handle. This is needed by delayed refs code.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This argument is used to obtain a reference to fs_info, which can
already be done from the passed trans handle, so use that instead.
This is in preparation for delayed refs support.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Presently btrfs-progs haven't pulled the enum defining the symbolic
names of read ahead constants. This commit adds the enum and
simultaneously converts all usages to respective symbolic name.
No functional change, just making the code human readable.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It's not needed since we can acquire a reference to the fs_info from
the transaction handle already passed.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It's used only to get a reference to fs_info, which can be obtained from
the transaction handle.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
That function really wants an fs_info and not a root. Accidentally,
this also makes the kernel/user space signatures to be coherent.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is no longer used by the callees of that function so remove it.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function actually uses only the extent_buffer arg but takes 3
arguments. Furthermore, it's current interface doesn't even mirror
the kernel counterpart. Just remove the extra arguments.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function needs btrfs_fs_info and not a root. So make it directly
take btrfs_fs_info,
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Just reference it directly from trans->fs_info.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function always operates on the extent root which can be
referenced from trans->fs_info. Do that to simplify function's
signature.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It's always set to extent_root and the function already takes a
transaction handle where fs_info could be referenced and in turn
the extent_tree.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Originally commit 2681e00f00 ("btrfs-progs: check for matchingi
free space in cache") added the account_super_bytes function to prevent
false negative when running btrfs check. Turns out this function is
really copied exclude_super_stripes, excluding the calls to
exclude_super_stripes. Later commit e4797df6a9 ("btrfs-progs: check
the free space tree in btrfsck") introduced proper version of
exclude_super_stripes. Instead of duplicating the function, just remove
account_super_bytes and use exclude_super_stripes instead of the former.
This also has the benefit of bringing the userspace code a bit closer
to the kernel counterpart.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This parameter was introduced with the original implementation of the
function but has never really been used, so just remove it.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Just like kernel cleanup made by David, btrfs_print_leaf() and
btrfs_print_tree() doesn't need btrfs_root parameter at all.
With previous patches as preparation, now we can remove the btrfs_root
parameter.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Although skinny_metadata's type is int, its value just can be 0/1. And
if condition be true only when skinny_metadata equals 1, so in if's
executive part, set skinny_metadata to 1 is redundancy. Remove it.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
@chunk_objectid of btrfs_make_block_group() function is always fixed to
BTRFS_FIRST_FREE_OBJECTID, so there is no need to pass it as parameter
explicitly.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_reserve_extent() uses int @data to determine if we're allocating
data extent, while reuse the parameter later to pass it as profile
(data/meta/sys).
It's a little confusing, this patch will follow kernel parameter to use
bool @is_data to replace it.
And in btrfs_reserve_extent(), use dedicated u64 @profile.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Some parameter of trans is not used indeed.
Let's remove them.
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When passing directory larger than block device using --rootdir
parameter, we get the following backtrace:
------
extent-tree.c:2693: btrfs_reserve_extent: BUG_ON `ret` triggered, value -28
./mkfs.btrfs(+0x1a05d)[0x557939e6b05d]
./mkfs.btrfs(btrfs_reserve_extent+0xb5a)[0x557939e710c8]
./mkfs.btrfs(+0xb0b6)[0x557939e5c0b6]
./mkfs.btrfs(main+0x15d5)[0x557939e5de04]
/usr/lib/libc.so.6(__libc_start_main+0xea)[0x7f83b101af6a]
./mkfs.btrfs(_start+0x2a)[0x557939e5af5a]
------
Nothing special, just BUG_ON() abusing from ancient code.
Fix them by using correct return.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
free_block_group_cache() calls clear_extent_bits() with wrong end, which
is one byte larger than the correct range.
This will cause the next adjacent cache state to be split. And due to
the split, private pointer (which points to block group cache) will be
reset to NULL.
This is very hard to detect as this function only gets called in
cleanup_temp_chunks() which is just before mkfs finishes. This bug only
gets exposed when reworking --rootdir option.
Signed-off-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>