When an encoded write fails for certain reasons, such as -EINVAL (when an
offset is not sector size aligned) or -ENOSPC, we fall back to
decompressing the data and writing it using regular buffered IO. This
logic however is not correct. One of the reasons is that it assumes the
encoded offset is smaller than the unencoded file length and that they
can be compared, but one is an offset and the other is a length, not an
end offset, so comparing them does not give correct results. This bad
logic will often result in not copying all data, or even no data at all,
resulting in silent data loss. This is easily seen with the following
reproducer:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdj
MNT=/mnt/sdj
umount $DEV &> /dev/null
mkfs.btrfs -f $DEV > /dev/null
mount -o compress $DEV $MNT
# File foo has a size of 33K, not aligned to the sector size.
xfs_io -f -c "pwrite -S 0xab 0 33K" $MNT/foo
xfs_io -f -c "pwrite -S 0xcd 0 64K" $MNT/bar
# Now clone the first 32K of file bar into foo at offset 0.
xfs_io -c "reflink $MNT/bar 0 0 32K" $MNT/foo
# Snapshot the default subvolume and create a full send stream (v2).
btrfs subvolume snapshot -r $MNT $MNT/snap
btrfs send --compressed-data -f /tmp/test.send $MNT/snap
echo -e "\nFile bar in the original filesystem:"
od -A d -t x1 $MNT/snap/bar
umount $MNT
mkfs.btrfs -f $DEV > /dev/null
mount $DEV $MNT
echo -e "\nReceiving stream in a new filesystem..."
btrfs receive -f /tmp/test.send $MNT
echo -e "\nFile bar in the new filesystem:"
od -A d -t x1 $MNT/snap/bar
umount $MNT
Running the test without this patch:
$ ./test.sh
(...)
File bar in the original filesystem:
0000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd
*
0065536
Receiving stream in a new filesystem...
At subvol snap
File bar in the new filesystem:
0000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd
*
0033792
We end up with file bar having less data, and a smaller size, than in the
original filesystem.
This happens because when processing file bar, send issues the following
commands:
clone bar - source=foo source offset=0 offset=0 length=32768
write bar - offset=32768 length=1024
encoded_write bar - offset=33792, len=4096, unencoded_offset=33792, unencoded_file_len=31744, unencoded_len=65536, compression=1, encryption=0
The first 32K are cloned from file foo, as that file range is shared
between the files.
Then there's a regular write operation for the file range [32K, 33K),
since file foo has different data from bar for that file range.
Finally, for the remainder of file bar, the send side issues an encoded
write since the extent is compressed in the source filesystem, covering
the remaining 31K of data starting at file offset 33792 (33K). The
receiver will try the encoded write, but that fails with -EINVAL since
the offset 33K is not sector size aligned, so it will fall back to
decompressing the data and writing it using regular buffered writes.
However that results in doing no writes in decompress_and_write()
because 'pos' is initialized to the value of 33K (unencoded_offset) and
unencoded_file_len is 31K, so the while loop has no iterations.
Another case where we can fall back to decompression plus regular
buffered writes is when the destination filesystem has a sector size
larger than the sector size of the source filesystem (for example when
the source filesystem is on x86_64 with a 4K sector size and the
destination filesystem is on PowerPC with a 64K sector size). In that
scenario encoded write attempts will fail with -EINVAL due to offsets not
being aligned with the sector size of the destination filesystem, and the
receive will attempt the fallback of decompressing the buffer and writing
the decompressed data using regular buffered IO.
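
The alignment failure in both scenarios comes down to a modulo check. A
minimal Python sketch of that arithmetic, where is_sector_aligned() is a
hypothetical helper written for illustration (not a btrfs-progs or kernel
function) and the sector sizes are the ones mentioned above:

```python
SZ_4K = 4 * 1024
SZ_64K = 64 * 1024


def is_sector_aligned(offset: int, sector_size: int) -> bool:
    """Return True if offset is a multiple of sector_size (hypothetical helper)."""
    return offset % sector_size == 0


# Offset 33K from the reproducer: not aligned even with a 4K sector size.
reproducer_ok = is_sector_aligned(33 * 1024, SZ_4K)

# Offset 4K: aligned for a 4K sector size, not for a 64K sector size.
small_sector_ok = is_sector_aligned(SZ_4K, SZ_4K)
large_sector_ok = is_sector_aligned(SZ_4K, SZ_64K)

print(reproducer_ok, small_sector_ok, large_sector_ok)
```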
Fix this by tracking the number of bytes written instead, and
incrementing it, along with the unencoded offset, after each write.
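
The difference between the two loops can be modelled in a few lines of
Python. This is only a sketch under assumptions: the real code is C, and
pwrite(), fallback_old() and fallback_fixed() below are illustrative
stand-ins that operate on a bytearray instead of a file descriptor. The
numbers are the ones from the encoded_write command shown above.

```python
def pwrite(dst: bytearray, buf: bytes, offset: int, max_write: int = 4096) -> int:
    """Model of pwrite(2): may do a short write of at most max_write bytes."""
    n = min(len(buf), max_write)
    if len(dst) < offset:
        dst.extend(bytes(offset - len(dst)))  # zero-fill a hole, like a sparse file
    dst[offset:offset + n] = buf[:n]
    return n


def fallback_old(dst, data, offset, unencoded_offset, unencoded_file_len):
    """Old logic: compares a buffer position against a length."""
    written = 0
    pos = unencoded_offset
    while pos < unencoded_file_len:  # 33792 < 31744 is false: no iterations
        w = pwrite(dst, data[pos:unencoded_file_len], offset)
        pos += w
        offset += w
        written += w
    return written


def fallback_fixed(dst, data, offset, unencoded_offset, unencoded_file_len):
    """Fixed logic: loop on the number of bytes written so far."""
    written = 0
    while written < unencoded_file_len:
        remaining = unencoded_file_len - written
        chunk = data[unencoded_offset:unencoded_offset + remaining]
        w = pwrite(dst, chunk, offset)
        if w == 0:  # guard for the model only: nothing left to write
            break
        written += w
        offset += w
        unencoded_offset += w
    return written


decompressed = b"\xcd" * 65536      # unencoded_len=65536
bar_old = bytearray(b"\xcd" * 33792)  # state after the clone and the 1K write
bar_new = bytearray(bar_old)

old_written = fallback_old(bar_old, decompressed, 33792, 33792, 31744)
new_written = fallback_fixed(bar_new, decompressed, 33792, 33792, 31744)
print(old_written, len(bar_old), new_written, len(bar_new))
```

In this model the old loop writes nothing and leaves the file at 33792
bytes, matching the truncated od output above, while the fixed loop
writes the remaining 31744 bytes.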
Fixes: d20e759fc9 ("btrfs-progs: receive: encoded_write fallback to explicit decode and write")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Unlike for commands from the v1 stream, we have no debug messages logged
when processing fallocate commands, which makes it harder to debug issues.
So add log messages, when the log verbosity level is >= 3, for fallocate
commands, mentioning the value of all fields.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Unlike for commands from the v1 stream, we have no debug messages logged
when processing encoded write commands, which makes it harder to debug
issues.
So add log messages, when the log verbosity level is >= 3, for encoded
write commands, mentioning the value of all fields, and also log when we
fall back from an encoded write to the decompress and write approach.
The log messages look like this:
encoded_write f3 - offset=33792, len=4096, unencoded_offset=33792, unencoded_file_len=31744, unencoded_len=65536, compression=1, encryption=0
encoded_write f3 - falling back to decompress and write due to errno 22 ("Invalid argument")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Defrag should not print the filenames by default; this got accidentally
changed in v6.0. Do a workaround that restores the original behaviour,
i.e. no filenames by default, and print them with -v, either as a global
or a local option. The proper fix is not to initialize with
BTRFS_BCONF_UNSET and only adjust the levels by the -v/-q options.
Issue: #540
Signed-off-by: David Sterba <dsterba@suse.com>
Currently fileattr commands, introduced in the send stream v2, always
fail, since we have commented out the FS_IOC_SETFLAGS ioctl() call and
set 'ret' to -EOPNOTSUPP, which is then overwritten with -errno, which
may have a random value since it was not initialized before. This results
in a failure like this:
ERROR: fileattr: set file attributes on p0/f1 failed: Invalid argument
The error reason may be something else, since errno is undefined at
this point.
Unfortunately we don't have a way yet to apply attributes, since the
attributes value we get from the kernel is what we store in flags field
of the inode item. This means that for example we can not just call
FS_IOC_SETFLAGS with the values we got, since they need to be converted
from BTRFS_INODE_* flags to FS_* flags.
Besides that we'll have to reorder how we apply certain attributes like
FS_NOCOW_FL for example, which must happen always on an empty file and
right now we run write commands before attempting to change attributes,
as that's the order the kernel sends the operations.
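
To illustrate the kind of conversion and reordering that would be needed,
here is a minimal Python sketch. It is not what receive does today. The
bit values are my recollection of the kernel's btrfs_tree.h and
linux/fs.h headers and should be verified against them, only a subset of
flags is covered, and inode_flags_to_fsflags() and needs_empty_file() are
hypothetical helper names.

```python
# BTRFS_INODE_* bits as stored in the inode item (assumed values).
BTRFS_INODE_NODATACOW = 1 << 1
BTRFS_INODE_SYNC = 1 << 5
BTRFS_INODE_IMMUTABLE = 1 << 6
BTRFS_INODE_APPEND = 1 << 7
BTRFS_INODE_NODUMP = 1 << 8
BTRFS_INODE_NOATIME = 1 << 9
BTRFS_INODE_DIRSYNC = 1 << 10

# FS_*_FL bits as accepted by FS_IOC_SETFLAGS (assumed values).
FS_SYNC_FL = 0x00000008
FS_IMMUTABLE_FL = 0x00000010
FS_APPEND_FL = 0x00000020
FS_NODUMP_FL = 0x00000040
FS_NOATIME_FL = 0x00000080
FS_DIRSYNC_FL = 0x00010000
FS_NOCOW_FL = 0x00800000

# Partial mapping table, for illustration only.
FLAG_MAP = {
    BTRFS_INODE_NODATACOW: FS_NOCOW_FL,
    BTRFS_INODE_SYNC: FS_SYNC_FL,
    BTRFS_INODE_IMMUTABLE: FS_IMMUTABLE_FL,
    BTRFS_INODE_APPEND: FS_APPEND_FL,
    BTRFS_INODE_NODUMP: FS_NODUMP_FL,
    BTRFS_INODE_NOATIME: FS_NOATIME_FL,
    BTRFS_INODE_DIRSYNC: FS_DIRSYNC_FL,
}


def inode_flags_to_fsflags(inode_flags: int) -> int:
    """Translate on-disk inode flags to FS_*_FL flags (hypothetical helper)."""
    fsflags = 0
    for btrfs_bit, fs_bit in FLAG_MAP.items():
        if inode_flags & btrfs_bit:
            fsflags |= fs_bit
    return fsflags


def needs_empty_file(fsflags: int) -> bool:
    """FS_NOCOW_FL has to be applied before any data is written."""
    return bool(fsflags & FS_NOCOW_FL)


converted = inode_flags_to_fsflags(BTRFS_INODE_NODATACOW | BTRFS_INODE_NOATIME)
print(hex(converted), needs_empty_file(converted))
```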
So for now comment out all the code, so that anyone using the v2 stream
will not have a receive failure but will get a behaviour like the v1
stream: file attributes are ignored. This will have to be fixed later,
but right now people can't use a send stream v2 for the purpose of
getting better performance by avoiding decompression of extents at the
source and compression of the data at the destination.
Link: https://lore.kernel.org/linux-btrfs/6cb11fa5-c60d-e65b-0295-301a694e66ad@inbox.ru/
Fixes: 8356c423e6 ("btrfs-progs: receive: implement FILEATTR command")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This reverts commit 55438f3930.
The patch breaks resize cancel.
Reproducer:
#!/bin/bash
fallocate -l 7g /var/tmp/7g1 && fallocate -l 7g /var/tmp/7g2
thing1=$(sudo losetup --show -f /var/tmp/7g1)
thing2=$(sudo losetup --show -f /var/tmp/7g2)
echo Make the fs
mkfs.btrfs -L test539 $thing1
mkdir test539
mount -L test539 test539
echo Get rid of devid:1 by adding a new device and removing the original
btrfs dev add $thing2 test539
btrfs dev del $thing1 test539
echo Creating wiggleroom
fallocate -l 3g test539/3g1 && fallocate -l 3g test539/3g2
rm test539/3g1
echo Start a resize operation and wait 3s to run a cancel
echo Under 6.0 cancel, under 6.0.1 no cancel and runs out of space
btrfs fi re 2:-4g test539 &
sleep 3s && btrfs fi re cancel test539
wait
echo Cleanup
umount test539
losetup -d $thing1 && losetup -d $thing2
rm /var/tmp/7g{1,2}
rmdir test539
Issue: #539
Signed-off-by: David Sterba <dsterba@suse.com>
Add a command to create a new swapfile. The same can be achieved by
standalone tools but they're just wrappers around the syscalls. The swap
format is simple enough to be created directly without the mkswap command
so the swapfile can be created in one go.
The file must not already exist; this is to avoid problems with file
attributes or any other effects of existing extents. This also means the
command can't be used on block devices.
Default size is 2G, minimum size is 40KiB.
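
As a rough illustration of how simple the swap format is, the following
Python sketch builds a version 1 swap header in memory. It is not the
code added by this patch. The layout (version, last_page and nr_badpages
at byte 1024, "SWAPSPACE2" magic in the last 10 bytes of the first page)
is my understanding of the Linux swap header, the 4KiB page size is an
assumption, and build_swap_header() is a hypothetical name. Only the
default and minimum sizes come from the text above.

```python
import struct

PAGE_SIZE = 4096               # assumption: 4KiB pages
DEFAULT_SIZE = 2 * 1024 ** 3   # 2G default from the text above
MIN_SIZE = 40 * 1024           # 40KiB minimum from the text above


def build_swap_header(size: int = DEFAULT_SIZE, page_size: int = PAGE_SIZE) -> bytes:
    """Return the first page of a swapfile of the given size (sketch)."""
    if size < MIN_SIZE:
        raise ValueError("swapfile too small, minimum is 40KiB")
    pages = size // page_size
    header = bytearray(page_size)
    # version = 1, last_page = index of the last usable page, nr_badpages = 0,
    # stored in native byte order right after the 1024 reserved bytes.
    struct.pack_into("=III", header, 1024, 1, pages - 1, 0)
    # Magic string occupies the last 10 bytes of the first page.
    header[page_size - 10:] = b"SWAPSPACE2"
    return bytes(header)


header = build_swap_header()
version, last_page, nr_badpages = struct.unpack_from("=III", header, 1024)
print(version, last_page, nr_badpages, header[-10:])
```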
Signed-off-by: David Sterba <dsterba@suse.com>
The kernel function name is btrfs_qgroup_subvolid so rename it in progs.
The libbtrfs API can't be changed without versioning so at least add the
new helper.
Signed-off-by: David Sterba <dsterba@suse.com>
A stale qgroup is level 0 and without a corresponding subvolume. There's
no convenient command for removing them and kernel does not remove them
automatically. Add a command so users don't have to parse and script the
output and/or delete them manually.
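
The definition of a stale qgroup can be expressed as a small filter. The
Python sketch below is an illustration only, not the implementation of
the new command. It assumes the usual qgroupid encoding with the level in
the bits above bit 48 and the subvolume id in the low 48 bits, and
make_qgroupid(), qgroup_level(), qgroup_subvolid() and
find_stale_qgroups() are hypothetical helper names.

```python
QGROUP_LEVEL_SHIFT = 48  # assumption: level is stored above the low 48 bits


def make_qgroupid(level: int, subvolid: int) -> int:
    """Build a qgroupid from its "level/id" notation."""
    return (level << QGROUP_LEVEL_SHIFT) | subvolid


def qgroup_level(qgroupid: int) -> int:
    return qgroupid >> QGROUP_LEVEL_SHIFT


def qgroup_subvolid(qgroupid: int) -> int:
    return qgroupid & ((1 << QGROUP_LEVEL_SHIFT) - 1)


def find_stale_qgroups(qgroupids, existing_subvolids):
    """Level 0 qgroups whose subvolume no longer exists."""
    return [
        q for q in qgroupids
        if qgroup_level(q) == 0 and qgroup_subvolid(q) not in existing_subvolids
    ]


# Example: subvolume 257 was deleted but its qgroup 0/257 is still around.
qgroups = [make_qgroupid(0, i) for i in (5, 256, 257, 258, 259)]
qgroups.append(make_qgroupid(1, 1))
stale = find_stale_qgroups(qgroups, existing_subvolids={5, 256, 258, 259})
print([f"{qgroup_level(q)}/{qgroup_subvolid(q)}" for q in stale])
```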
Signed-off-by: David Sterba <dsterba@suse.com>
Use more human readable column description and adjust the width. Use a
single "-" for an empty value as is done elsewhere too.
Sample output:
Qgroupid Referenced Exclusive Path
-------- ---------- --------- ----
0/5 16.00KiB 16.00KiB <toplevel>
0/256 16.00KiB 16.00KiB subv1
0/257 16.00KiB 16.00KiB <stale>
0/258 16.00KiB 16.00KiB dir1/subv3
0/259 16.00KiB 16.00KiB snap1
1/1 16.00KiB 16.00KiB <0 member qgroups>
Signed-off-by: David Sterba <dsterba@suse.com>
There are two column name definitions, one for sorting and one in a more
human readable format, but the latter was not used for some reason.
Signed-off-by: David Sterba <dsterba@suse.com>
Convert fputs and printf to message helpers that respect the verbosity
levels.
- print <stale> instead of <missing> for qgroups without a corresponding
subvolume after it was deleted
- print <toplevel> for toplevel
- for higher level qgroups print the number of member groups, 0 if empty
and not a special string
- drop the <FS_ROOT>
- print paths relative to toplevel path, like subvolume list does by
default
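
The rules in the list above for the path column can be summarized with a
short Python sketch. It is a model written for illustration, not the C
code of this patch. path_column() is a hypothetical helper, the subvolume
paths are made-up sample data, and the "<N member qgroups>" wording is
assumed from the sample output of a later change.

```python
TOPLEVEL_SUBVOLID = 5  # id of the toplevel subvolume


def path_column(level: int, subvolid: int, subvol_paths: dict, members: int = 0) -> str:
    """Return the text for the path column of one qgroup (sketch)."""
    if level > 0:
        # Higher level qgroups: number of member qgroups, 0 if empty.
        return f"<{members} member qgroups>"
    if subvolid == TOPLEVEL_SUBVOLID:
        return "<toplevel>"
    # Paths are relative to the toplevel; no path means the subvolume is gone.
    return subvol_paths.get(subvolid, "<stale>")


paths = {256: "subv1", 258: "dir1/subv3", 259: "snap1"}
rows = [
    path_column(0, 5, paths),
    path_column(0, 256, paths),
    path_column(0, 257, paths),
    path_column(0, 258, paths),
    path_column(1, 1, paths, members=0),
]
print(rows)
```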
Signed-off-by: David Sterba <dsterba@suse.com>
Previous patch optionally printed the path but it would be better to
print it by default, so drop the option and verbosity. This is a
separate change as the original change was from an old pull request and
it was ported without significant changes first.
Signed-off-by: David Sterba <dsterba@suse.com>
The 'btrfs qgroup show' command currently only prints qgroup IDs,
forcing the user to resolve which subvolume each corresponds to.
Adds subvolume path resolution to 'qgroup show' so that when
the -P option is used, the last column contains the pathname of
the root of the subvolume it describes. In the case of nested
qgroups, it will show the number of member qgroups or the paths
of the members if the -v option is used.
Path can also be used as a sort parameter.
Sample output:
qgroupid rfer excl path
-------- ---- ---- ----
0/5 16.00KiB 16.00KiB <FS_ROOT>
0/256 16.00KiB 16.00KiB <FS_ROOT>/subv1
0/257 16.00KiB 16.00KiB <missing>
0/258 16.00KiB 16.00KiB <FS_ROOT>/subv3
0/259 16.00KiB 16.00KiB <FS_ROOT>/snap1
Pull-request: #139
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add new options -W and --wait-norescan to wait for a rescan without
starting a new operation. This is useful for things like fstests where
we want to do a "btrfs quota enable" and not continue until the
subsequent rescan has finished.
In addition to documenting the new option in the man page, clean up the
rescan entry to document the -w option a bit better.
Pull-request: #139
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The message could be confusing in case there's no send in progress and
the real reason is lack of permissions when deleting a subvolume.
Mention the permissions as first reason. Also update documentation.
Signed-off-by: David Sterba <dsterba@suse.com>
check_resize_args() function checks user argument amount but does not
return the correct value in case it's not valid.
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When 'btrfs send --proto 2' is used, the max buffer size in the kernel is
changed from BTRFS_SEND_BUF_SIZE_V1 (SZ_64K) to
(SZ_16K + BTRFS_MAX_COMPRESSED).
The performance is improved when we use the same buffer size in
btrfs-progs:
without this patch: 57.96s
with this patch: 48.44s
Bigger buffer size 512K was tested too, but it did not improve protocol
2 over 1 significantly.
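
For reference, the sizes and the measured difference work out as in the
following Python sketch. BTRFS_MAX_COMPRESSED being 128K is my assumption
about the kernel constant and should be checked, and send_buf_size() is a
hypothetical helper, not a btrfs-progs function. The timings are the ones
quoted above.

```python
SZ_16K = 16 * 1024
SZ_64K = 64 * 1024
SZ_128K = 128 * 1024

BTRFS_SEND_BUF_SIZE_V1 = SZ_64K
BTRFS_MAX_COMPRESSED = SZ_128K  # assumption: value of the kernel constant


def send_buf_size(proto: int) -> int:
    """Buffer size matching what the kernel uses for a protocol version (sketch)."""
    if proto >= 2:
        return SZ_16K + BTRFS_MAX_COMPRESSED
    return BTRFS_SEND_BUF_SIZE_V1


v1_size = send_buf_size(1)
v2_size = send_buf_size(2)

# Relative reduction in run time from the measurements quoted above.
before, after = 57.96, 48.44
reduction = (before - after) / before

print(v1_size, v2_size, f"{reduction:.1%}")
```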
Signed-off-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Replace fprintf(stderr, ...) by the level-aware helper instead of the
explicit verbosity level checks. No change for commands that don't have
the global -q/-v options, otherwise the output can be quieted.
Signed-off-by: David Sterba <dsterba@suse.com>
Replace fprintf(stderr, ...) by the level-aware helper. No change for
commands that don't have the global -q/-v options, otherwise the output
can be quieted.
Signed-off-by: David Sterba <dsterba@suse.com>
Replace fprintf(stderr, ...) by the level-aware helper. No change for
commands that don't have the global -q/-v options, otherwise the output
can be quieted.
Signed-off-by: David Sterba <dsterba@suse.com>
Replace printing to stderr and stdout by the level-aware helper. No
change for commands that don't have the global -q/-v options, otherwise
the output can be quieted.
Signed-off-by: David Sterba <dsterba@suse.com>
Replace fprintf(stderr, ...) by the level-aware helper. No change for
commands that don't have the global -q/-v options, otherwise the output
can be quieted.
Signed-off-by: David Sterba <dsterba@suse.com>
The message about inaccessible file is printed on stderr but it may be
missed in the output so use the helper for proper warning.
Signed-off-by: David Sterba <dsterba@suse.com>
The (unsigned long long) type casts can be dropped, printf understands
%llu and u64 and does not warn. In cases where the type is not u64 keep
the cast.
Signed-off-by: David Sterba <dsterba@suse.com>
Replace printf by the level-aware helper. No change for commands that
don't have the global -q/-v options, otherwise the output can be
quieted.
Signed-off-by: David Sterba <dsterba@suse.com>
Replace printf by the level-aware helper. No change for commands that
don't have the global -q/-v options, otherwise the output can be
quieted.
Signed-off-by: David Sterba <dsterba@suse.com>
Replace printf by the level-aware helper. No change for commands that
don't have the global -q/-v options, otherwise the output can be
quieted.
Signed-off-by: David Sterba <dsterba@suse.com>
Replace printf by the level-aware helper. No change for commands that
don't have the global -q/-v options, otherwise the output can be
quieted.
Signed-off-by: David Sterba <dsterba@suse.com>
Replace printf by the level-aware helper. No change for commands that
don't have the global -q/-v options, otherwise the output can be
quieted.
There's no change in qgroup.c yet as the output relies on return value
of the formatter and pr_verbose does not do that.
Signed-off-by: David Sterba <dsterba@suse.com>
Replace printf by the level-aware helper. No change for commands that
don't have the global -q/-v options, otherwise the output can be
quieted.
Signed-off-by: David Sterba <dsterba@suse.com>
Replace printf by the level-aware helper. No change for commands that
don't have the global -q/-v options, otherwise the output can be
quieted.
Signed-off-by: David Sterba <dsterba@suse.com>
Replace printf by the level-aware helper. No change for commands that
don't have the global -q/-v options, otherwise the output can be
quieted.
Signed-off-by: David Sterba <dsterba@suse.com>
Replace printf by the level-aware helper. No change for commands that
don't have the global -q/-v options, otherwise the output can be
quieted.
Signed-off-by: David Sterba <dsterba@suse.com>
Replace printf by the level-aware helper. No change for commands that
don't have the global -q/-v options, otherwise the output can be
quieted.
Signed-off-by: David Sterba <dsterba@suse.com>
Replace printf by the level-aware helper. No change for commands that
don't have the global -q/-v options, otherwise the output can be
quieted.
Signed-off-by: David Sterba <dsterba@suse.com>
To make the levels more understandable, use the LOG_ levels instead of
the hardcoded values. Previously the semantics would assume level 0 as
default and 1 and up for increased verbosity, so the LOG_ levels are
typically larger by one.
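
The relation between the old hardcoded values and the named levels can be
shown with a small Python model. This is only a sketch: the numeric
values of LOG_ALWAYS and LOG_DEFAULT are assumptions, and
old_level_to_log_level() and should_print() are hypothetical helpers that
model the level check, not the pr_verbose implementation.

```python
LOG_ALWAYS = -1   # assumption: printed regardless of the verbosity
LOG_DEFAULT = 1   # assumption: old level 0 shifted by one


def old_level_to_log_level(old_level: int) -> int:
    """Old semantics used 0 as default and 1+ for more verbosity."""
    return old_level + 1


def should_print(level: int, verbosity: int = LOG_DEFAULT) -> bool:
    """Model of the level check done by a level-aware message helper."""
    return level == LOG_ALWAYS or level <= verbosity


# Old default level 0 becomes LOG_DEFAULT.
default_level = old_level_to_log_level(0)

# With -q (verbosity 0 in this model) default messages are quieted,
# messages at LOG_ALWAYS are still printed.
quiet_default = should_print(LOG_DEFAULT, verbosity=0)
quiet_always = should_print(LOG_ALWAYS, verbosity=0)

# A message at level 3 needs the verbosity raised to at least 3.
debug_at_default = should_print(3)
debug_at_three = should_print(3, verbosity=3)

print(default_level, quiet_default, quiet_always, debug_at_default, debug_at_three)
```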
Signed-off-by: David Sterba <dsterba@suse.com>
Use LOG_DEFAULT message level for all commands where it currently uses
the LOG_ALWAYS level. There are now hardcoded values in many other calls
to pr_verbose and this will be updated in following patches.
Signed-off-by: David Sterba <dsterba@suse.com>
Switch the remaining uses of assert() to ASSERT, as assert() lacks the
verbose output that we have for ASSERT (but is otherwise equivalent).
Signed-off-by: David Sterba <dsterba@suse.com>
Rename MUST_LOG: use a prefix LOG_ so we can add more levels, and use it
where it was hardcoded as an argument to pr_verbose.
Signed-off-by: David Sterba <dsterba@suse.com>
Process an enable_verity cmd by running the enable verity ioctl on the
file. Since enabling verity denies write access to the file, it is
important that we don't have any open write file descriptors.
This also revs the send stream format to version 3 with no format
changes besides the new commands and attributes. This version is not
finalized and commands may change, also this needs to be synchronized
with any kernel changes.
Note: the build is conditional on the header linux/fsverity.h
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Lots of code still uses fprintf(stderr, "...") where it should use the
error() helper. The kernel-shared code is left out of the conversion for
now.
Signed-off-by: David Sterba <dsterba@suse.com>