[BUG]
Command `btrfs scrub start -B` and `btrfs scrub status` are reporting
very different results for "Total to scrub":
$ sudo btrfs scrub start -B /mnt/btrfs/
scrub done for c107ef62-0a5d-4fd7-a119-b88f38b8e084
Scrub started: Mon Jun 5 07:54:07 2023
Status: finished
Duration: 0:00:00
Total to scrub: 1.52GiB
Rate: 0.00B/s
Error summary: no errors found
$ sudo btrfs scrub status /mnt/btrfs/
UUID: c107ef62-0a5d-4fd7-a119-b88f38b8e084
Scrub started: Mon Jun 5 07:54:07 2023
Status: finished
Duration: 0:00:00
Total to scrub: 12.00MiB
Rate: 0.00B/s
Error summary: no errors found
This can be very confusing for end users.
[CAUSE]
It's the function print_fs_stat() handling the "Total to scrub" output.
For `btrfs scrub start` command, we use the used bytes (aka, the total
used dev extents of a device) for output.
This is not really accurate, as the chunks may be mostly empty just like
the following:
$ btrfs fi df /mnt/btrfs/
Data, single: total=1.01GiB, used=9.06MiB
System, DUP: total=40.00MiB, used=64.00KiB
Metadata, DUP: total=256.00MiB, used=1.38MiB
GlobalReserve, single: total=22.00MiB, used=0.00B
Thus we're reporting 1.5GiB to scrub (1.01GiB + 40MiB * 2 + 256MiB * 2).
But in reality, we only scrubbed 12MiB
(9.06MiB + 64KiB * 2 + 1.38MiB * 2).
[FIX]
Instead of using the used dev-extent bytes of a device, go with proper
scrubbed bytes for each device.
This involves print_fs_stat() and print_scrub_dev() called inside
scrub_start().
Now the output should match each other.
Issue: #636
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are reports that json output of 'qgroup show' crashes due to
internal error when printing the limit values:
INTERNAL ERROR: unknown unit base, mode 2304
btrfs(internal_error+0x10a)[0x5605c37ce48a]
btrfs(pretty_size_snprintf+0x5c)[0x5605c37d105c]
btrfs(fmt_print+0x44e)[0x5605c37d178e]
btrfs(+0x7ed1d)[0x5605c3800d1d]
btrfs(main+0x8f)[0x5605c379beff]
/lib64/libc.so.6(+0x27bb0)[0x7f83924ddbb0]
/lib64/libc.so.6(__libc_start_main+0x8b)[0x7f83924ddc79]
btrfs(_start+0x25)[0x5605c379d405]
common/units.c:82: pretty_size_snprintf: Assertion `0` failed, value 0
btrfs(+0x1d4b1)[0x5605c379f4b1]
btrfs(pretty_size_snprintf+0x7b)[0x5605c37d107b]
btrfs(fmt_print+0x44e)[0x5605c37d178e]
btrfs(+0x7ed1d)[0x5605c3800d1d]
btrfs(main+0x8f)[0x5605c379beff]
/lib64/libc.so.6(+0x27bb0)[0x7f83924ddbb0]
/lib64/libc.so.6(__libc_start_main+0x8b)[0x7f83924ddc79]
btrfs(_start+0x25)[0x5605c379d405]
This is caused by "size" format that requires the unit mode, but it was not
specified and some stack value used. As json prints the raw values, use
the plain %llu format.
Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1206960
Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1209136#c15
Signed-off-by: David Sterba <dsterba@suse.com>
Since commit c8593f65cbf3 ("btrfs-progs: sync tree-checker.[ch] from
kernel"), btrfs-progs can do the kernel level tree block checks, which
is not really sutiable for dump-tree.
Under a lot of cases, we're using dump-tree tool to debug to collect the
details from end users.
If it's a bitflip causing a rejection, we would be unable to determine
the cause.
So this patch would add OPEN_CTREE_SKIP_LEAF_ITEM_CHECKS for dump-tree.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It's a known problem that a received subvolume would lose its UUID after
switching to RW. Thus it can lead to later receive problems for
snapshotting and cloning.
In that case, we just output a simple error message like:
ERROR: cannot find parent subvolume
Or
ERROR: clone: did not find source subvol
Normally we need to use "btrfs receive --dump" to know what the missing
subvolume UUID is, which would take extra work.
This patch would:
- Add extra subvolume UUID to the output
- Unify the error messages to the same format
Now the error messages would look like:
ERROR: snapshot: cannot find parent subvolume 1b4e28ba-2fa1-11d2-883f-b9a761bde3fb
ERROR: clone: cannot find source subvolume 1b4e28ba-2fa1-11d2-883f-b9a761bde3fb
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When compiling on a system with gcc 12.2.1, the following warning is
generated. It can be fixed by adding a static storage class specifier.
cmds/inspect.c:733:5: warning: no previous prototype for ‘cmp_cse_devid_start’ [-Wmissing-prototypes]
733 | int cmp_cse_devid_start(const void *va, const void *vb)
| ^~~~~~~~~~~~~~~~~~~
cmds/inspect.c:754:5: warning: no previous prototype for ‘cmp_cse_devid_lstart’ [-Wmissing-prototypes]
754 | int cmp_cse_devid_lstart(const void *va, const void *vb)
| ^~~~~~~~~~~~~~~~~~~~
cmds/inspect.c:775:5: warning: no previous prototype for ‘print_list_chunks’ [-Wmissing-prototypes]
775 | int print_list_chunks(struct list_chunks_ctx *ctx, unsigned sort_mode,
| ^~~~~~~~~~~~~~~~~
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The fixes involve the following changes:
- Unexport functions which are not utilized out of the file
* print_path_column()
* parse_reflink_range()
* btrfs_list_setup_print_column()
* device_get_partition_size_sysfs()
* max_zone_append_size()
- Include related headers before implementing the function
* change-uuid.c
* convert-bgt.c
* seed.h
- Add missing headers caused by the above header changes
* include <uuid/uuid.h> for tune/tune.h.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We were using this in cmds/restore.c, however it only does anything if
path->reada is set, and we don't set that in cmds/restore.c. Remove
this usage of reada_for_search and make the function static.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The in-kernel version of read_tree_block adds some extra sanity checks
to make sure we don't return blocks that don't match what we expect.
This includes the owning root, the level, and the expected first key.
We don't actually do these checks in btrfs-progs, however kernel code
we're going to sync will expect this calling convention, so update it to
match the in-kernel code and then update all the callers.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the kernel we pass in the root_id for btrfs_free_tree_block instead
of the root itself. Update the btrfs-progs version of the helper to
match what we do in the kernel.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is a mirror of the change I've done in the kernel, but in progs
it's even more simply because clean_tree_block was just a wrapper around
clear_extent_buffer_dirty. Change this to btrfs_clear_buffer_dirty, and
then update all the callers to use this helper instead of
clean_tree_block and clear_extent_buffer_dirty.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is how btrfs_alloc_tree_block is defined in the kernel, so when we
go to sync this code in it'll be easier if we're already setup to accept
this argument. Since we're in progs we don't care about nesting so just
use BTRFS_NORMAL_NESTING everywhere, as we sync in the kernel code it'll
get updated to whatever is appropriate.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is in keeping with what the function actually does, and is named
this way in the kernel.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is a bit larger than the previous syncs, because we use
extent_io_tree's everywhere. There's a lot of stuff added to
kerncompat.h, and then I went through and cleaned up all the API
changes, which were
- extent_io_tree_init takes an fs_info and an owner now.
- extent_io_tree_cleanup is now extent_io_tree_release.
- set_extent_dirty takes a gfpmask.
- clear_extent_dirty takes a cached_state.
- find_first_extent_bit takes a cached_state.
The diffstat looks insane for this, but keep in mind extent-io-tree.c
and extent-io-tree.h are ~2000 loc just by themselves.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch syncs file-item.h into btrfs-progs. This carries with it an
API change for btrfs_del_csums, which takes a root argument in the
kernel, so all callsites have been updated accordingly.
I didn't sync file-item.c because it carries with it a bunch of bio
related helpers which are difficult to adapt to the kernel.
Additionally there's a few helpers in the local copy of file-item.c that
aren't in the kernel that are required for different tools.
This requires more cleanups in both the kernel and progs in order to
sync file-item.c, so for now just do file-item.h in order to pull things
out of ctree.h.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This syncs accessors.[ch] from the kernel. For the most part
accessors.h will remain the same, there's just some helpers that need to
be adjusted for eb->data instead of eb->pages. Additionally accessors.c
needed to be completely updated to deal with this as well.
This is a set of files where we will likely only sync the header going
forward, and leave the C file in place as it needs to be specific to
btrfs-progs.
This forced a few "unrelated" changes
- Using btrfs_dir_item_ftype() instead of btrfs_dir_item_type(). This
is due to the encryption changes, and was simpler to just do in this
patch.
- Adjusting some of the print tree code to use the actual helpers and
not the btrfs-progs ones.
A local definition of static_assert is used to avoid compilation
failures on older gcc (< 9) where the 2nd parameter is mandatory.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We want to keep this file locally as we want to be uptodate with
upstream, so we can build btrfs-progs regardless of which kernel is
currently installed. Sync this with the upstream version and put it in
kernel-shared/uapi to maintain some semblance of where this file comes
from.
There are some changes that need to be synced back to kernel. A local
definition of static_assert is used to avoid compilation problems on gcc
(< 9) due to mandatory 2nd parameter.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that the libbtrfs stuff has it's own local copy of ctree.h and
ioctl.h, let's rename these qgroup struct members to match the kernel
names, this way it'll make it easier to sync the kernel code into
btrfs-progs.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While syncing messages.[ch] I had to back out the ASSERT() code in
kerncompat.h, which means we now rely on the kernel code for ASSERT().
In order to maintain some semblance of separation introduce UASSERT()
and use that in all the purely userspace code.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Running 'btrfs inspect tree-stats' on a mounted filesystem works though
it reads directly from block devices. This can lead to inconsistent
data, warnings, errors or or a crash. More checks could be added but at
least explain things in more detail.
Issue: #520
Signed-off-by: David Sterba <dsterba@suse.com>
Attempting to create a snapshot of subvolume with an active swapfile
prints the errno message corresponding to ETXTBSY but this is confusing
so change it to be more descriptove to:
ERROR: cannot snapshot '/hibernate': source subvolume contains an active swapfile (Text file busy)
Issue: #607
Signed-off-by: David Sterba <dsterba@suse.com>
The *64 interfaces, such as fstat64, off64_t, etc, are legacy interfaces
created at a time when 64-bit file support was still new. They are
generally exposed when defining a macro named _LARGEFILE64_SOURCE, as
e.g. the glibc docs[0] say.
The modern way to utilise largefile support, is to continue to use the
regular interfaces (off_t, fstat, ..), and define _FILE_OFFSET_BITS=64.
We already use the autoconf macro AC_SYS_LARGEFILE[1] which arranges this
and sets this macro for us. Therefore, we can utilise the non-64 names
without fear of breaking on 32-bit systems.
This fixes the build against musl libc, ever since musl dropped the
*64 compat from interfaces by default[2] just for _GNU_SOURCE, unless
_LARGEFILE64_SOURCE is defined. However, there are plans for a future
removal of the whole *64 header API, and that workaround (adding another
define) might cease to exist.
So, rename all *64 API use to the regular non-suffixed names. For
consistency, rename the internal functions that were *64 named
(lstat64_path, ..) too.
This should have no regressions on any platform.
[0]: https://www.gnu.org/software/libc/manual/html_node/Feature-Test-Macros.html#index-_005fLARGEFILE64_005fSOURCE
[1]: https://www.gnu.org/software/autoconf/manual/autoconf-2.67/html_node/System-Services.html
[2]: 25e6fee27f
Pull-request: #615
Signed-off-by: psykose <alice@ayaya.dev>
Signed-off-by: David Sterba <dsterba@suse.com>
The commit 6f7151f499 extended the set of recognized valid subcommands
for the old path syntax but wrongly checks for more than 2 parameters.
That way a shortened and valid new syntax is not recognized (here 'can'
is short for 'cancel' and the short form is not in the list):
btrfs-progs-6.1.3
btrfs bal can /
ERROR: balance cancel on '/' failed: Not in progress
btrfs-progs-6.2.2
btrfs bal can /
WARNING: deprecated syntax, please use 'btrfs balance start'
ERROR: cannot access 'can': No such file or directory
Issue: #612
Fixes: 6f7151f499 ("btrfs-progs: balance: fix some cases wrongly parsed as old syntax")
Signed-off-by: David Sterba <dsterba@suse.com>
For older kernels, the sysfs interface providing the fsid may not be present yet.
Since 32c2e57c65 ("btrfs-progs: read fsid from the sysfs in
device_is_seed"), the fallback to the previous approach to determine
the fsid was not used anymore.
Ensure negative return values of sysfs_open_fsid_file are handled by
falling back to the dev_to_fsid in this case.
Pull-request: #599
Signed-off-by: David Sterba <dsterba@suse.com>
Deprecate old 'btrfs balance' syntax since new syntax has been
introduced in 2012. We will remove the old syntax completely in a few
releases.
Signed-off-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Some cases of 'btrfs balance' are wrongly parsed as old syntax.
$ btrfs balance status
ERROR: cannot access 'status': No such file or directory
Currently, only 'start' is successfully excluded in the check of old
syntax. Fix it by adding others in the check of old syntax.
Signed-off-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In Dry run the following error appeared and aborted execution:
ERROR: failed to access 'XYZ' to restore metadata/xattrs: No such file or directory
Skip the metadata and xattrs handling when dry run is enabled.
Signed-off-by: Holger Jakob <jakob@dsi.uni-stuttgart.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Restore was only setting xattrs on files but ignored directories.
Signed-off-by: Holger Jakob <jakob@dsi.uni-stuttgart.de>
Signed-off-by: David Sterba <dsterba@suse.com>
On a 32bit host the split qgroupid is wrong due to the way the numbers
are passed to the formatter as variable length arguments. The level is
u16, promoted to int and then parsed as u64. This means that the values
are shifted and some stack data are printed instead.
Example error messages from yast2-bootloader:
SystemCmd.cc(addLine):569 Adding Line 7 " "qgroupid": "21474836480/23885859321282560","
The value 21474836480 = 0x5000000 is 0x5 shifted by 32 bits,
23885859321282560 is 0x54dc1000000000 and shifting by 32 does not
lead to a valid value which should be 0 in this case.
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1209136
Signed-off-by: David Sterba <dsterba@suse.com>
The tabular output prints the same value for all columns:
# btrfs device stats /srv/btrfs-data
[/dev/sdc1].write_io_errs 0
[/dev/sdc1].read_io_errs 0
[/dev/sdc1].flush_io_errs 0
[/dev/sdc1].corruption_errs 0
[/dev/sdc1].generation_errs 0
[/dev/sdb1].write_io_errs 7489899
[/dev/sdb1].read_io_errs 3751023
[/dev/sdb1].flush_io_errs 117
[/dev/sdb1].corruption_errs 68
[/dev/sdb1].generation_errs 25
# btrfs device stats -T /srv/btrfs-data
Id Path Write errors Read errors Flush errors Corruption errors Generation errors
-- --------- ------------ ----------- ------------ ----------------- -----------------
1 /dev/sdc1 0 0 0 0 0
2 /dev/sdb1 25 25 25 25 25
The table_printf has a fixed list of columns and should not iterate over
them. Only check if some of the value is set and return error.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=217045
Issue: #585
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
Currently cli/009 test case failed with different exit number:
====== RUN CHECK /home/adam/btrfs-progs/btrfstune --help
usage: btrfstune [options] device
[...]
failed: /home/adam/btrfs-progs/btrfstune --help
test failed for case 009-btrfstune
[CAUSE]
In tune/main.c, we have the following call on usage():
static void print_usage(int ret)
{
usage(&tune_cmd);
exit(ret);
}
However usage() itself would always call exit(1):
void usage(const struct cmd_struct *cmd)
{
usage_command_usagestr(cmd->usagestr, NULL, 0, true, true);
exit(1);
}
This makes prevents any caller of usage() to modify its exit number.
[FIX]
Add a new argument @error for print_usage(), so we can properly return 0
for -h/--help usage.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The kernel commit a26d60dedf9a ("btrfs: sysfs: add devinfo/fsid to
retrieve actual fsid from the device") introduced a sysfs interface
to access the device's fsid from the userspace. This is a more
reliable method to obtain the fsid compared to reading the
superblock, and it even works if the device is not present.
Additionally, this sysfs interface can be read by non-root users.
Therefore, it is recommended to utilize this new sysfs interface to
retrieve the fsid.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
load_device_info() checks if the device is a seed device by reading
superblock::fsid and comparing it with the mount fsid, and it fails
to work if the device is missing (a RAID1 seed fs). Move this part
of the code into a new helper function device_is_seed() in
preparation to make device_is_seed() work with the new sysfs
devinfo/<devid>/fsid interface.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add option --uuid with same semantics that is provided by command
'mkswap'. By default a random UUID is generated, to not set any use
'btrfs filesystem mkswapfile -U clear swapfile'.
Issue: #581
Signed-off-by: David Sterba <dsterba@suse.com>
The CI build on musl warns:
[CC] cmds/balance.o
In file included from cmds/inspect.c:19:
/usr/include/sys/fcntl.h:1:2: warning: #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h> [-Wcpp]
1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h>
| ^~~~~~~
On glibc the header directly includes fcntl.h.
Signed-off-by: David Sterba <dsterba@suse.com>
[WARNING]
Clang 15.0.7 warns about several unused variables:
kernel-shared/zoned.c:829:6: warning: variable 'num_sequential' set but not used [-Wunused-but-set-variable]
u32 num_sequential = 0, num_conventional = 0;
^
cmds/scrub.c:1174:6: warning: variable 'n_skip' set but not used [-Wunused-but-set-variable]
int n_skip = 0;
^
mkfs/main.c:493:6: warning: variable 'total_block_count' set but not used [-Wunused-but-set-variable]
u64 total_block_count = 0;
^
image/main.c:2246:6: warning: variable 'bytenr' set but not used [-Wunused-but-set-variable]
u64 bytenr = 0;
^
[CAUSE]
Most of them are just straightforward set but not used variables.
The only exception is total_block_count, which has commented out code
relying on it.
[FIX]
Just remove those variables, and for @total_block_count, also remove the
comments.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[FALSE ALERT]
Unlike gcc, clang doesn't really understand the comments, thus it's
reportings tons of fall through related errors:
cmds/reflink.c:124:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
case 'r':
^
cmds/reflink.c:124:3: note: insert '__attribute__((fallthrough));' to silence this warning
case 'r':
^
__attribute__((fallthrough));
cmds/reflink.c:124:3: note: insert 'break;' to avoid fall-through
case 'r':
^
break;
[CAUSE]
Although gcc is fine with /* fallthrough */ comments, clang is not.
[FIX]
So just introduce a fallthrough macro to handle the situation properly,
and use that macro instead.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Before decompressing, we zero out the content of the entire output buffer,
so that we don't get any garbage after the last byte of data. We do this
for all compression algorithms. However zstd, at least with libzstd 1.5.2
on Debian (version 1.5.2+dfsg-1), the decompression routine can end up
touching the content of the output buffer beyond the last valid byte of
decompressed data, resulting in a corruption.
Example reproducer:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdj
MNT=/mnt/sdj
rm -f /tmp/send.stream
umount $DEV &> /dev/null
mkfs.btrfs -f $DEV &> /dev/null || echo "MKFS failed!"
mount -o compress=zstd $DEV $MNT
# File foo is not sector size aligned, 127K.
xfs_io -f -c "pwrite -S 0xab 0 3" \
-c "pwrite -S 0xcd 3 130042" \
-c "pwrite -S 0xef 130045 3" $MNT/foo
# Now do an fallocate that increases the size of foo from 127K to 128K.
xfs_io -c "falloc 0 128K " $MNT/foo
btrfs subvolume snapshot -r $MNT $MNT/snap
btrfs send --compressed-data -f /tmp/send.stream $MNT/snap
echo -e "\nFile data in the original filesystem:\n"
od -A d -t x1 $MNT/snap/foo
umount $MNT
mkfs.btrfs -f $DEV &> /dev/null || echo "MKFS failed!"
mount $DEV $MNT
btrfs receive --force-decompress -f /tmp/send.stream $MNT
echo -e "\nFile data in the new filesystem:\n"
od -A d -t x1 $MNT/snap/foo
umount $MNT
Running the reproducer gives:
$ ./test.sh
(...)
File data in the original filesystem:
0000000 ab ab ab cd cd cd cd cd cd cd cd cd cd cd cd cd
0000016 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd
*
0130032 cd cd cd cd cd cd cd cd cd cd cd cd cd ef ef ef
0130048 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0131072
At subvol snap
File data in the new filesystem:
0000000 ab ab ab cd cd cd cd cd cd cd cd cd cd cd cd cd
0000016 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd
*
0130032 cd cd cd cd cd cd cd cd cd cd cd cd cd ef ef ef
0130048 cd cd cd cd 00 00 00 00 00 00 00 00 00 00 00 00
0130064 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0131072
The are 4 bytes with a value of 0xcd instead of 0x00, at file offset
127K (130048).
Fix this by explicitly zeroing out the part of the output buffer that was
not used after decompressing with zstd.
The decompression of compressed extents, sent when using the send v2
stream, happens in the following cases:
1) By explicitly passing --force-decompress to the receive command, as in
the reproducer above;
2) Calling the BTRFS_IOC_ENCODED_WRITE ioctl failed with -ENOTTY, meaning
the kernel on the receiving side is old and does not implement that
ioctl;
3) Calling the BTRFS_IOC_ENCODED_WRITE ioctl failed with -ENOSPC;
4) Calling the BTRFS_IOC_ENCODED_WRITE ioctl failed with -EINVAL.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
By default print how many subvolumes are considered for checks, either
found or specified on the command line. Once a subvolume is removed from
the list print the progress from the total count.
$ btrfs subvolume sync /path
Waiting for 130 subvolumes
Subvolume id 256 is gone (1/130)
Subvolume id 257 is gone (2/130)
...
Subvolume id 384 is gone (129/130)
Subvolume id 385 is gone (130/130)
Signed-off-by: David Sterba <dsterba@suse.com>
Per user report on https://old.reddit.com/r/btrfs/comments/107fnw1/btrfs_filesystem_mkswapfile_results_in_an/
the swapfile header does not contain the correct number of pages that
matches the file size and the activated swapfile is only 1GiB:
# btrfs filesystem mkswapfile -s 10g swapfile
# swapon swapfile
# cat /proc/swaps
Filename Type Size Used Priority
/swap/swapfile file 1048572 0 -2
A workaround is to run 'mkswap swapfile' before activation. Proper fix
is to calculate the number of (fixed size) 4K pages available for the
swap.
Issue: #568
Signed-off-by: David Sterba <dsterba@suse.com>
In default build there's a warning (reported by CI) that the
experimental list-chunk command and related functions are not used, so
add the condition there as well.
In file included from cmds/inspect.c:45:
./cmds/commands.h:67:26: warning: 'cmd_struct_inspect_list_chunks' defined but not used [-Wunused-const-variable=]
#define __CMD_NAME(name) cmd_struct_ ##name
^~~~~~~~~~~
./cmds/commands.h:73:26: note: in expansion of macro '__CMD_NAME'
const struct cmd_struct __CMD_NAME(name) = \
^~~~~~~~~~
./cmds/commands.h:88:2: note: in expansion of macro 'DEFINE_COMMAND'
DEFINE_COMMAND(name, token, cmd_ ##name, \
^~~~~~~~~~~~~~
cmds/inspect.c:1115:8: note: in expansion of macro 'DEFINE_SIMPLE_COMMAND'
static DEFINE_SIMPLE_COMMAND(inspect_list_chunks, "list-chunks");
^~~~~~~~~~~~~~~~~~~~~
Signed-off-by: David Sterba <dsterba@suse.com>
This reverts commit 03451430de.
(It's not 1:1, there are some additional trivial fixups in cmds/qgroup.c)
This breaks a lot of 3rd party tools that depend on it as Neal reports:
* btrfs-assistant
* buildah
* cri-o
* podman
* skopeo
* containerd
* moby/docker
* snapper
* source-to-image
Link: https://lore.kernel.org/linux-btrfs/CAEg-Je8L7jieKdoWoZBuBZ6RdXwvwrx04AB0fOZF1fr5Pb-o1g@mail.gmail.com/
Reported-by: Neal Gompa <ngompa@fedoraproject.org>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
Since commit d729048be6 ("btrfs-progs: stop using
btrfs_root_item_v0"), "btrfs subvolume list -u" not longer correctly
reports UUID nor timestamp, while older (btrfs-progs v6.0.2) still works
correctly:
v6.0.2:
# btrfs subv list -u /mnt/btrfs/
ID 256 gen 12 top level 5 uuid ed4af580-d512-2644-b392-2a71aaeeb99e path subv1
ID 257 gen 13 top level 5 uuid a22ccba7-0a0a-a94f-af4b-5116ab58bb61 path subv2
v6.1:
# ./btrfs subv list -u /mnt/btrfs/
ID 256 gen 12 top level 5 uuid - path subv1
ID 257 gen 13 top level 5 uuid - path subv2
[CAUSE]
Commit d729048be6 ("btrfs-progs: stop using btrfs_root_item_v0")
removed old btrfs_root_item_v0, but incorrectly changed the check for
v0 root item.
Now we will treat v0 root items as latest root items, causing possible
out-of-bound access, while treating current root items as older v0 root
items, ignoring the UUID nor timestamp.
[FIX]
Fix the bug by using correct checks, and add extra comments on the
branches.
Issue: #562
Fixes: d729048be6 ("btrfs-progs: stop using btrfs_root_item_v0")
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Neal Gompa <neal@gompa.dev>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add initial reflink group with example command 'clone' to test the
interface. Work in progress, experimental build needed.
Issue: #396
Signed-off-by: David Sterba <dsterba@suse.com>
Verify if a given file is suitable for a swapfile and print the physical
offset (ie. the ultimate on-device physical offset), and the resume
offset value (physical / page size).
This can be the kernel parameter or written to /sys/power/resume_offset
before hibernation. Option -r or --resume-offset prints just the value.
Copied and simplified from Omar Sandoval's tool to print extents:
https://github.com/osandov/osandov-linux/blob/master/scripts/btrfs_map_physical.c
Issue: #544
Issue: #533
Signed-off-by: David Sterba <dsterba@suse.com>
New command 'btrfs inspect-internal list-chunks' will list layout of
chunks as stored on the devices. This corresponds to the physical
layout, sorted by the physical offset. The block group usage can be
shown as well, but the search is too slow so it's off by default.
If the physical offset sorting is selected, the empty space between
chunks is also shown.
Signed-off-by: David Sterba <dsterba@suse.com>
This is a useless helper, the csum is always the first thing in the
header, simply read from offset 0.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch copies in compression.h from the kernel. This is relatively
straightforward, we just have to drop the compression types definition
from ctree.h, and update the image to use BTRFS_NR_COMPRESS_TYPES
instead of BTRFS_COMPRESS_LAST, and add a few things to kerncompat.h to
make everything build smoothly.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This isn't defined in the kernel, we simply check if the root item size
is less than btrfs_root_item, so adjust the user of btrfs_root_item_v0
to make a similar check.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
BTRFS_IOCTL_DEV_REPLACE_RESULT_NO_RESULT is defined to make sure we
differentiate internal errors from actual error codes that come back
from the device replace ioctl. Take this out of ioctl.c and move it
into replace.c.
Though it's in public header ioctl.h, there are no users for that so it
can be moved.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We return __u16 in the kernel, as this is actually the size of
btrfs_qgroup_level. Adjust the existing helpers and update all the
callers to deal with the new size appropriately. This will make syncing
the kernel code cleaner.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We're going to sync the kernel source into btrfs-progs, and in the
kernel we have all these qgroup fields named with short names instead of
the full name, so rename
referenced -> rfer
compressed -> cmpr
exclusive -> excl
to match the kernel and update all the users, this will make the sync
cleaner.
ioctl.h is a public header but there are no users of the
btrfs_qgroup_limit structure.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When attempting an encoded write, if it fails for some specific reason
like -EINVAL (when an offset is not sector size aligned) or -ENOSPC, we
then fallback into decompressing the data and writing it using regular
buffered IO. This logic however is not correct, one of the reasons is
that it assumes the encoded offset is smaller than the unencoded file
length and that they can be compared, but one is an offset and the other
is a length, not an end offset, so they can't be compared to get correct
results. This bad logic will often result in not copying all data, or even
no data at all, resulting in a silent data loss. This is easily seen in
with the following reproducer:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdj
MNT=/mnt/sdj
umount $DEV &> /dev/null
mkfs.btrfs -f $DEV > /dev/null
mount -o compress $DEV $MNT
# File foo has a size of 33K, not aligned to the sector size.
xfs_io -f -c "pwrite -S 0xab 0 33K" $MNT/foo
xfs_io -f -c "pwrite -S 0xcd 0 64K" $MNT/bar
# Now clone the first 32K of file bar into foo at offset 0.
xfs_io -c "reflink $MNT/bar 0 0 32K" $MNT/foo
# Snapshot the default subvolume and create a full send stream (v2).
btrfs subvolume snapshot -r $MNT $MNT/snap
btrfs send --compressed-data -f /tmp/test.send $MNT/snap
echo -e "\nFile bar in the original filesystem:"
od -A d -t x1 $MNT/snap/bar
umount $MNT
mkfs.btrfs -f $DEV > /dev/null
mount $DEV $MNT
echo -e "\nReceiving stream in a new filesystem..."
btrfs receive -f /tmp/test.send $MNT
echo -e "\nFile bar in the new filesystem:"
od -A d -t x1 $MNT/snap/bar
umount $MNT
Running the test without this patch:
$ ./test.sh
(...)
File bar in the original filesystem:
0000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd
*
0065536
Receiving stream in a new filesystem...
At subvol snap
File bar in the new filesystem:
0000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd
*
0033792
We end up with file bar having less data, and a smaller size, than in the
original filesystem.
This happens because when processing file bar, send issues the following
commands:
clone bar - source=foo source offset=0 offset=0 length=32768
write bar - offset=32768 length=1024
encoded_write bar - offset=33792, len=4096, unencoded_offset=33792, unencoded_file_len=31744, unencoded_len=65536, compression=1, encryption=0
The first 32K are cloned from file foo, as that file ranged is shared
between the files.
Then there's a regular write operation for the file range [32K, 33K),
since file foo has different data from bar for that file range.
Finally for the remainder of file bar, the send side issues an encoded
write since the extent is compressed in the source filesystem, for the
file offset 33792 (33K), remaining 31K of data. The receiver will try the
encoded write, but that fails with -EINVAL since the offset 33K is not
sector size aligned, so it will fallback to decompressing the data and
writing it using regular buffered writes. However that results in doing
no writes at decompress_and_write() because 'pos' is initialized to the
value of 33K (unencoded_offset) and unencoded_file_len is 31K, so the
while loop has no iterations.
Another case where we can fallback to decompression plus regular buffered
writes is when the destination filesystem has a sector size larger then
the sector size of the source filesystem (for example when the source
filesystem is on x86_64 with a 4K sector size and the destination
filesystem is PowerPC with a 64K sector size). In that scenario encoded
write attempts will fail with -EINVAL due to offsets not being aligned
with the sector size of the destination filesystem, and the receive will
attempt the fallback of decompressing the buffer and writing the
decompressed using regular buffered IO.
Fix this by tracking the number of written bytes instead, and increment
it, and the unencoded offset, after each write.
Fixes: d20e759fc9 ("btrfs-progs: receive: encoded_write fallback to explicit decode and write")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Unlike for commands from the v1 stream, we have no debug messages logged
when processing fallocate commands, which makes it harder to debug issues.
So add log messages, when the log verbosity level is >= 3, for fallocate
commands, mentioning the value of all fields.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Unlike for commands from the v1 stream, we have no debug messages logged
when processing encoded write commands, which makes it harder to debug
issues.
So add log messages, when the log verbosity level is >= 3, for encoded
write commands, mentioning the value of all fields and also log when we
fallback from an encoded write to the decompress and write approach.
The log messages look like this:
encoded_write f3 - offset=33792, len=4096, unencoded_offset=33792, unencoded_file_len=31744, unencoded_len=65536, compression=1, encryption=0
encoded_write f3 - falling back to decompress and write due to errno 22 ("Invalid argument")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Defrag should not print the filenames by default, this got accidentally
changed in v6.0. Do a workaround that restores the original behaviour,
ie. no filenames and print them with -v, either as global or local
option. Proper fix is not to initialize with BTRFS_BCONF_UNSET and only
adjust the levels by -v/-q options.
Issue: #540
Signed-off-by: David Sterba <dsterba@suse.com>
Currently fileattr commands, introduced in the send stream v2, always
fail, since we have commented the FS_IOC_SETFLAGS ioctl() call and set
'ret' to -EOPNOTSUPP, which is then overwritten to -errno, which may
have a random value since it was not initialized before. This results
in a failure like this:
ERROR: fileattr: set file attributes on p0/f1 failed: Invalid argument
The error reason may be something else, since errno is undefined at
this point.
Unfortunately we don't have a way yet to apply attributes, since the
attributes value we get from the kernel is what we store in flags field
of the inode item. This means that for example we can not just call
FS_IOC_SETFLAGS with the values we got, since they need to be converted
from BTRFS_INODE_* flags to FS_* flags
Besides that we'll have to reorder how we apply certain attributes like
FS_NOCOW_FL for example, which must happen always on an empty file and
right now we run write commands before attempting to change attributes,
as that's the order the kernel sends the operations.
So for now comment all the code, so that anyone using the v2 stream will
not have a receive failure but will get a behaviour like the v1 stream:
file attributes are ignored. This will have to be fixed later, but right
now people can't use a send stream v2 for the purpose of getting better
performance by avoid decompressing extents at the source and compression
of the data at the destination.
Link: https://lore.kernel.org/linux-btrfs/6cb11fa5-c60d-e65b-0295-301a694e66ad@inbox.ru/
Fixes: 8356c423e619 ("btrfs-progs: receive: implement FILEATTR command")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This reverts commit 55438f3930b23fdebeeda7b93ec00414c64a1361.
The patch breaks resize cancel.
Reproducer:
#!/bin/bash
fallocate -l 7g /var/tmp/7g1 && fallocate -l 7g /var/tmp/7g2
thing1=$(sudo losetup --show -f /var/tmp/7g1)
thing2=$(sudo losetup --show -f /var/tmp/7g2)
echo Make the fs
mkfs.btrfs -L test539 $thing1
mkdir test539
mount -L test539 test539
echo Get rid of devid:1 by adding a new device and removing the original
btrfs dev add $thing2 test539
btrfs dev del $thing1 test539
echo Creating wiggleroom
fallocate -l 3g test539/3g1 && fallocate -l 3g test539/3g2
rm test539/3g1
echo Start a resize operation and wait 3s to run a cancel
echo Under 6.0 cancel, under 6.0.1 no cancel and runs out of space
btrfs fi re 2:-4g test539 &
sleep 3s && btrfs fi re cancel test539
wait
echo Cleanup
umount test539
losetup -d $thing1 && losetup -d $thing2
rm /var/tmp/7g{1,2}
rmdir test539
Issue: #539
Signed-off-by: David Sterba <dsterba@suse.com>
Add a command to create a new swapfile. The same can be achieved by
seandalone tools but they're just wrappers around the syscalls. The swap
format is simple enough to be created directly without mkswap command so
the swapfile can be created in one go.
The file must not exist before, this is to avoid problems with file
attributes or any other effects of existing extents. This also means the
command can't be used on block devices.
Default size is 2G, minimum size is 40KiB.
Signed-off-by: David Sterba <dsterba@suse.com>
Kernel function name is btrfs_qgroup_subvolid so rename it in progs. The
libbtrfs can't API be changed without versioning so at least add the new
helper.
Signed-off-by: David Sterba <dsterba@suse.com>
A stale qgroup is level 0 and without a corresponding subvolume. There's
no convenient command for removing them and kernel does not remove them
automatically. Add a command so users don't have to parse and script the
output and/or delete them manually.
Signed-off-by: David Sterba <dsterba@suse.com>
Use more human readable column description and adjust the width. Use a
single "-" for an empty value as is done elsewhere too.
Sample output:
Qgroupid Referenced Exclusive Path
-------- ---------- --------- ----
0/5 16.00KiB 16.00KiB <toplevel>
0/256 16.00KiB 16.00KiB subv1
0/257 16.00KiB 16.00KiB <stale>
0/258 16.00KiB 16.00KiB dir1/subv3
0/259 16.00KiB 16.00KiB snap1
1/1 16.00KiB 16.00KiB <0 member qgroups>
Signed-off-by: David Sterba <dsterba@suse.com>
There are two column name definitions, one for sorting and one for more
human readable format but it was not used for some reason.
Signed-off-by: David Sterba <dsterba@suse.com>
Convert fputs and printf to message helpers that respect the verbosity
levels.
- print <stale> instead of <missing> for qgroups without a corresponding
subvolume after it was deleted
- print <toplevel> for toplevel
- for higher level qgroups print the number of member groups, 0 if empty
and not a special string
- drop the <FS_ROOT>
- print paths relative to toplevel path, like subvolume list does by
default
Signed-off-by: David Sterba <dsterba@suse.com>
Previous patch optionally printed the path but it would be better to
print it by default, so drop the option and verbosity. This is a
separate change as the original change was from an old pull request and
it was ported without significant changes first.
Signed-off-by: David Sterba <dsterba@suse.com>
The 'btrfs qgroup show' command currently only prints qgroup IDs,
forcing the user to resolve which subvolume each corresponds to.
Adds subvolume path resolution to 'qgroup show' so that when
the -P option is used, the last column contains the pathname of
the root of the subvolume it describes. In the case of nested
qgroups, it will show the number of member qgroups or the paths
of the members if the -v option is used.
Path can also be used as a sort parameter.
Sample output:
qgroupid rfer excl path
-------- ---- ---- ----
0/5 16.00KiB 16.00KiB <FS_ROOT>
0/256 16.00KiB 16.00KiB <FS_ROOT>/subv1
0/257 16.00KiB 16.00KiB <missing>
0/258 16.00KiB 16.00KiB <FS_ROOT>/subv3
0/259 16.00KiB 16.00KiB <FS_ROOT>/snap1
Pull-request: #139
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Adds a new options -W and --wait-norescan to wait for a rescan without
starting a new operation. This is useful for things like fstests where
we want do to do a "btrfs quota enable" and not continue until the
subsequent rescan has finished.
In addition to documenting the new option in the man page, clean up the
rescan entry to document the -w option a bit better.
Pull-request: #139
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The message could be confusing in case there's no send in progress and
the real reason is lack of permissions when deleting a subvolume.
Mention the permissions as first reason. Also update documentation.
Signed-off-by: David Sterba <dsterba@suse.com>
check_resize_args() function checks user argument amount but does not
return the correct value in case it's not valid.
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When 'btrfs send --proto 2', the max buffer in kernel is changed from
BTRFS_SEND_BUF_SIZE_V1(SZ_64K) to (SZ_16K + BTRFS_MAX_COMPRESSED).
The performance is improved when we use the same buffer size in
btrfs-progs:
without this patch: 57.96s
with this patch: 48.44s
Bigger buffer size 512K was tested too, but it did not improve protocol
2 over 1 significantly.
Signed-off-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Replace fprintf(stderr, ...) by the level-aware helper instead of the
explicit verbosity level checks. No change for commands that don't have
the global -q/-v options, otherwise the output can be quieted.
Signed-off-by: David Sterba <dsterba@suse.com>
Replace fprintf(stderr, ...) by the level-aware helper. No change for
commands that don't have the global -q/-v options, otherwise the output
can be quieted.
Signed-off-by: David Sterba <dsterba@suse.com>
Replace fprintf(stderr, ...) by the level-aware helper. No change for
commands that don't have the global -q/-v options, otherwise the output
can be quieted.
Signed-off-by: David Sterba <dsterba@suse.com>
Replace printing to stderr and stdout by the level-aware helper. No
change for commands that don't have the global -q/-v options, otherwise
the output can be quieted.
Signed-off-by: David Sterba <dsterba@suse.com>
Replace fprintf(stderr, ...) by the level-aware helper. No change for
commands that don't have the global -q/-v options, otherwise the output
can be quieted.
Signed-off-by: David Sterba <dsterba@suse.com>