Add declarations for global fs_info and task context so they can be
accessed from any .c file once main.c is split. Add the "g_" prefix to
the task context.
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
If we emulate a write error during commit transaction, by setting the
block device read-only, then we can easily have the following crash
using "btrfs check --clear-space-cache v2":
Opening filesystem to check...
Checking filesystem on /dev/test/scratch1
UUID: 5945915b-37f1-4bfa-9f64-684b318b8f73
Clear free space cache v2
Error writing to device 1
kernel-shared/transaction.c:156: __commit_transaction: BUG_ON `ret` triggered, value 1
./btrfs(+0x570c9)[0x562ec894f0c9]
./btrfs(+0x57167)[0x562ec894f167]
./btrfs(__commit_transaction+0x13b)[0x562ec894f7f2]
./btrfs(btrfs_commit_transaction+0x214)[0x562ec894fa64]
./btrfs(btrfs_clear_free_space_tree+0x177)[0x562ec8941ae6]
./btrfs(+0xc8958)[0x562ec89c0958]
./btrfs(+0xc9d53)[0x562ec89c1d53]
./btrfs(+0x17ec7)[0x562ec890fec7]
./btrfs(main+0x12f)[0x562ec8910908]
/usr/lib/libc.so.6(+0x232d0)[0x7ff917ee82d0]
/usr/lib/libc.so.6(__libc_start_main+0x8a)[0x7ff917ee838a]
./btrfs(_start+0x25)[0x562ec890fdc5]
Aborted (core dumped)
[CAUSE]
The call trace has shown it's a BUG_ON(), and it's from
__commit_transaction(), which is writing tree blocks back.
[FIX]
The fix is pretty simple, just return error.
In fact we even have an error value check in btrfs_commit_transaction()
just after __commit_transaction() call (although not catching the return
value from it).
And since we're here, also call btrfs_abort_transaction() to prevent
newer transactions from being started.
Now we won't have a full crash:
Opening filesystem to check...
Checking filesystem on /dev/test/scratch1
UUID: 5945915b-37f1-4bfa-9f64-684b318b8f73
Clear free space cache v2
Error writing to device 1
ERROR: failed to write bytenr 30425088 length 16384: Operation not permitted
ERROR: failed to write tree block 30425088: Operation not permitted
ERROR: failed to clear free space cache v2: -1
extent buffer leak: start 30720000 len 16384
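The change in error handling can be sketched as follows. This is a
minimal, self-contained illustration and not the btrfs-progs code:
struct demo_trans, demo_write_block() and demo_commit() are hypothetical
stand-ins for the transaction handle, the tree block write and
__commit_transaction(), and the -EPERM value is only assumed from the
"Operation not permitted" message above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a transaction handle. */
struct demo_trans {
	bool aborted;	/* once set, no newer transaction may start */
};

/* Hypothetical write helper: fails with -EPERM on a read-only device. */
static int demo_write_block(bool device_read_only)
{
	return device_read_only ? -EPERM : 0;
}

/*
 * Instead of BUG_ON(ret), mark the transaction as aborted and hand the
 * error back to the caller so it can report it and clean up.
 */
static int demo_commit(struct demo_trans *trans, bool device_read_only)
{
	int ret;

	assert(trans != NULL);
	ret = demo_write_block(device_read_only);
	if (ret < 0) {
		trans->aborted = true;
		return ret;
	}
	return 0;
}
```

A caller would check the return value of demo_commit() and refuse to
start another transaction while trans->aborted is set.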
Reported-by: Christoph Anton Mitterer <calestyo@scientia.org>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When a transaction is aborted halfway, extent buffers can be leaked, and
in that case the same leaked extent buffer can be reported multiple
times:
ERROR: failed to clear free space cache v2: -1
extent buffer leak: start 30441472 len 16384
WARNING: dirty eb leak (aborted trans): start 30441472 len 16384
extent buffer leak: start 30720000 len 16384
extent buffer leak: start 30425088 len 16384
extent buffer leak: start 30425088 len 16384 << Duplicated
WARNING: dirty eb leak (aborted trans): start 30425088 len 16384
Note that the 30425088 line is reported twice (not counting the "dirty
eb leak" line).
[CAUSE]
When we detect a leaked eb, we call free_extent_buffer_nocache(), but
free_extent_buffer_nocache() can only remove the eb when its refs drop
to 0 after the decrement.
If the eb has refs 2, it will need two free_extent_buffer_nocache()
calls to remove it from the cache.
[FIX]
Just reset the eb->refs to 1 so that free_extent_buffer_nocache() can
remove it from cache for sure.
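The effect of resetting the reference count can be sketched as follows.
This is a simplified model, not the btrfs-progs code: struct demo_eb,
demo_free_nocache() and demo_release_leaked() are hypothetical names
that only mimic the refcounting behaviour described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an extent buffer with a reference count. */
struct demo_eb {
	int refs;
	bool cached;	/* still present in the cache */
};

/* Drops one reference; the buffer leaves the cache only at zero refs. */
static void demo_free_nocache(struct demo_eb *eb)
{
	assert(eb->refs > 0);
	eb->refs--;
	if (eb->refs == 0)
		eb->cached = false;
}

/*
 * Leak cleanup: force refs to 1 first so that a single call is
 * guaranteed to remove the buffer, whatever count it leaked with.
 */
static void demo_release_leaked(struct demo_eb *eb)
{
	eb->refs = 1;
	demo_free_nocache(eb);
}
```

With refs at 2, one demo_free_nocache() call leaves the buffer cached
(and it would be reported again), while demo_release_leaked() removes it
in one step.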
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The function was introduced by commit a5ce5d2198 ("btrfs-progs:
extent-cache: actually cache extent buffers") but never got utilized.
Thus we can just remove it.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The RST format provides cross references that let users navigate
between manual pages with a click. This patch was generated by a macro
that replaces the old references with the doc role in RST format.
Issue: #495
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The logic at the beginning of this function to handle reserved ranges
was pretty complex and hard to follow. By refactoring it to use the
existing intersect_with_reserved() function, we can remove most of the
comparisons and boolean operators while preserving the exact same logic.
This change is only for readability. It does not change the logic itself
at all.
Author: Thomas Hebb <tommyhebb@gmail.com>
Pull-request: #494
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We currently open code a similar operation in create_image_file_range().
By exposing intersect_with_reserved() outside of source-fs.c and
slightly changing its semantics to return the entire range instead of
just the end address, we can reuse it in create_image_file_range().
Author: Thomas Hebb <tommyhebb@gmail.com>
Pull-request: #494
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When checking if the requested range starts in a valid region but later
hits a reserved range, we require the reserved range to end before the
requested one does.
This is incorrect. Since we're going to truncate the requested range
anyway, we want this check to pass even if the requested range ends
partway through a reserved range.
Fix the issue by checking against the reserved range's start address
instead of its end.
Luckily, I don't believe this bug makes a difference in the current code
path, since the range we pass to this function never ends before the end
of the filesystem.
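The corrected check can be sketched as follows. This is an illustration
under assumptions, not the actual function: struct demo_range and
demo_truncate_len() are hypothetical, and ranges are treated as
half-open intervals [start, start + len).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical half-open range [start, start + len). */
struct demo_range {
	uint64_t start;
	uint64_t len;
};

/*
 * If the request begins before the reserved range and reaches into it,
 * return the length truncated to stop at the reserved range's start.
 * Comparing against the reserved start (not its end) also covers a
 * request that ends partway through the reserved range.
 */
static uint64_t demo_truncate_len(uint64_t start, uint64_t len,
				  const struct demo_range *reserved)
{
	assert(len > 0);
	if (start < reserved->start && start + len > reserved->start)
		return reserved->start - start;
	return len;
}
```

A request that ends inside the reserved range is now truncated the same
way as one that runs past its end.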
Issue: #297
Issue: #349
Author: Thomas Hebb <tommyhebb@gmail.com>
Pull-request: #494
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
intersect_with_reserved() currently succeeds if (bytenr + num_bytes) is
greater than or equal to the first address in the range, assuming that
bytenr is also not past the end of the range.
This is wrong. (bytenr + num_bytes) is one byte past the last address in
the range we're checking, meaning that our range only overlaps the
reserved range if it's strictly greater than the reserved range's start
address.
For example, imagine a range at 0x3000 with length 0x1000 that we're
checking against a reserved range that starts at 0x4000. The addresses
in our range are 0x3000-0x3fff: it doesn't overlap. But the current
check, (0x3000 + 0x1000 >= 0x4000), will erroneously pass.
Fix the issue by changing >= to >.
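The strict comparison can be sketched as a generic overlap test for
half-open ranges. demo_overlaps() is a hypothetical helper written for
this illustration, not the signature of intersect_with_reserved().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when [bytenr, bytenr + num_bytes) shares at least one
 * byte with [res_start, res_start + res_len). The end of a range is one
 * past its last byte, so both comparisons must be strict.
 */
static bool demo_overlaps(uint64_t bytenr, uint64_t num_bytes,
			  uint64_t res_start, uint64_t res_len)
{
	assert(num_bytes > 0 && res_len > 0);
	return bytenr < res_start + res_len &&
	       bytenr + num_bytes > res_start;
}
```

For the example above, demo_overlaps(0x3000, 0x1000, 0x4000, 0x1000)
is false, because the last byte of the request is 0x3fff.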
Issue: #297
Issue: #349
Author: Thomas Hebb <tommyhebb@gmail.com>
Pull-request: #494
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is currently defined in source-fs.h, but main.c uses it far more
than source-fs.c does. Put it in common.h instead, since it's a useful
standalone type.
Author: Thomas Hebb <tommyhebb@gmail.com>
Pull-request: #494
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Create a few emulated zoned devices and run mkfs; the zone reset is
expected to run in parallel. The test uses memory-backed devices, so it
is too fast to measure any difference, and we can't expect slow zoned
devices to be available, so this test is very simplistic.
Signed-off-by: David Sterba <dsterba@suse.com>
I've written a simple shell wrapper for null_blk configuration
(https://github.com/kdave/nullb). Make a local copy of version 0.1 to
avoid external dependency for our tests.
Signed-off-by: David Sterba <dsterba@suse.com>
When devices are formatted as btrfs, btrfs_prepare_device is called
sequentially for each device, which takes too much time.
Put each btrfs_prepare_device call into a thread, wait for the first
thread to complete before running mkfs.btrfs, and wait for the other
threads to complete before adding the other devices to the filesystem.
During the preparation it's either trim/discard or zone reset.
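The threading pattern can be sketched with POSIX threads as follows.
This is a simplified illustration, not the patch itself: struct
demo_prepare_ctx, demo_prepare_device() and demo_prepare_all() are
hypothetical names, the worker does no real I/O, and for brevity all
threads are joined in one place, whereas the patch waits for the first
device before creating the filesystem.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DEMO_MAX_DEVICES 16

/* Hypothetical per-device context passed to each worker thread. */
struct demo_prepare_ctx {
	int devid;
	int ret;	/* result of the preparation, 0 on success */
};

/* Stand-in for the slow per-device work (discard or zone reset). */
static void *demo_prepare_device(void *arg)
{
	struct demo_prepare_ctx *ctx = arg;

	ctx->ret = 0;
	return NULL;
}

/*
 * Start one thread per device, then join them all. Returns 0 when every
 * device was prepared, or the first error encountered.
 */
static int demo_prepare_all(struct demo_prepare_ctx *ctx, int count)
{
	pthread_t threads[DEMO_MAX_DEVICES];
	int started = 0;
	int ret = 0;
	int i;

	assert(count > 0 && count <= DEMO_MAX_DEVICES);
	for (i = 0; i < count; i++) {
		ctx[i].devid = i + 1;
		ctx[i].ret = -1;
		ret = pthread_create(&threads[i], NULL, demo_prepare_device,
				     &ctx[i]);
		if (ret)
			break;
		started++;
	}
	/* pthread_create() returns a positive errno value on failure. */
	ret = ret ? -ret : 0;
	for (i = 0; i < started; i++) {
		pthread_join(threads[i], NULL);
		if (!ret && ctx[i].ret)
			ret = ctx[i].ret;
	}
	return ret;
}
```

Depending on the toolchain, building this sketch may require the
-pthread compiler option.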
This was tested with TCMU emulation with two zoned devices. Each device
is 20000G (about 19.53 TiB) and the zone size is 4MB. Use the following
parameters for targetcli:
create name=zbc0 size=20000G cfgstring=model-HM/zsize-4/conv-100@~/zbc0.raw
Call difftime to calculate the running time of the function
btrfs_prepare_device. Calculate the time from thread creation to
completion of all threads after patching:
$ lsscsi -p
[10:0:1:0] (0x14) LIO-ORG TCMU ZBC device 0002 /dev/sdb - none
[11:0:1:0] (0x14) LIO-ORG TCMU ZBC device 0002 /dev/sdc - none
$ sudo mkfs.btrfs -d single -m single -O zoned /dev/sdc /dev/sdb -f
....
time for prepare devices:4.000000.
....
$ sudo mkfs.btrfs -d single -m single -O zoned /dev/sdc /dev/sdb -f
...
time for prepare devices:2.000000.
...
Issue: #496
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Li Zhang <zhanglikernel@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The egrep command has been deprecated (per the grep manual page) for a
long time and will probably be removed; the replacement is 'grep -E'.
Signed-off-by: David Sterba <dsterba@suse.com>
Process an enable_verity cmd by running the enable verity ioctl on the
file. Since enabling verity denies write access to the file, it is
important that we don't have any open write file descriptors.
This also revs the send stream format to version 3 with no format
changes besides the new commands and attributes. This version is not
finalized and commands may change, also this needs to be synchronized
with any kernel changes.
Note: the build is conditional on the header linux/fsverity.h
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
The block group tree doesn't yet have full bi-directional conversion
support from btrfstune, and it seems we may want one or two release
cycles to rule out some extra bugs before really releasing the progs
support.
This patch hides the block group tree feature behind the experimental
flag for the following tools:
- btrfstune
"-b" option to convert to bg tree.
- mkfs.btrfs
hide "block-group-tree" feature from both -O (the new default position
for all features) and -R (the old, soon to be deprecated one).
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The online manual pages of the btrfs utilities seem to have been moved to
`readthedocs.io`; update references in the README accordingly.
Author: Guillaume Legrand
Pull-request: #500
Signed-off-by: David Sterba <dsterba@suse.com>
swapon fails with an unclear error message; add some hints on where to
look for more information.
Author: Torstein Eide
Pull-request: #491
Signed-off-by: David Sterba <dsterba@suse.com>
Mention that cross-mount support is available since version 5.18.
Author: AtticFinder65536
Pull-request: #480
Signed-off-by: David Sterba <dsterba@suse.com>
The radix-tree is not used in userspace code. In kernel it's for
tracking unpersisted and in-memory structures and has been replaced by
the xarray.
Signed-off-by: David Sterba <dsterba@suse.com>
The random-test exercises the b-tree operations but hasn't been in use
for a long time and we probably won't resurrect it. It's also the only
user of the radix_tree structures, which are otherwise used only in the
kernel code, and it needs the kernel-lib radix-tree implementation.
Let's remove it as it's basically dead code.
Signed-off-by: David Sterba <dsterba@suse.com>
The tool IWYU (include what you use) suggests removing and adding some
includes. Update the includes of implementation files only.
Signed-off-by: David Sterba <dsterba@suse.com>
Lots of code still uses fprintf(stderr, "...") where the error() helper
should be used instead. The kernel-shared code is left out of the
conversion for now.
Signed-off-by: David Sterba <dsterba@suse.com>
The tool IWYU (include what you use) suggests removing and adding some
includes. This update is only partial to avoid accidental build
breakage; the includes are entangled and will have to be cleaned up
again in the future.
Signed-off-by: David Sterba <dsterba@suse.com>
The features are split between -O and -R, but that does not make much
sense from the user's POV: there are different levels of compatibility,
but they do not need to be selected that way. Merge the tables into one
but hide it behind the experimental build until the conversion is
complete.
Signed-off-by: David Sterba <dsterba@suse.com>
Some tests don't use /tmp for temporary files and store them locally in
the test directory. To support NFS, such files need to be created by a
few commands. To avoid accidental breakage, add a convenience helper.
Signed-off-by: David Sterba <dsterba@suse.com>