We all know there are some dark and scary corners with RAID5/6, but users
may not know. Add a warning message in mkfs so anybody trying to use
this will know things can go very wrong.
Issue: #265
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ reword message ]
Signed-off-by: David Sterba <dsterba@suse.com>
Some tests report that decompressing the image failed, which did not
fail the test but could lead to wrong errors in case the image is not
overwritten and leaves some old state. Use --force parameter.
[TEST] fuzz-tests.sh
[TEST/fuzz] 001-simple-check-unmounted
xz: btrfs-progs/tests/fuzz-tests/images/bko-97021-invalid-chunk-sectorsize.raw: File exists
failed to decompress image btrfs-progs/tests/fuzz-tests/images/bko-97021-invalid-chunk-sectorsize.raw.xz
[TEST/fuzz] 002-simple-image
xz: btrfs-progs/tests/fuzz-tests/images/bko-97021-invalid-chunk-sectorsize.raw: File exists
failed to decompress image btrfs-progs/tests/fuzz-tests/images/bko-97021-invalid-chunk-sectorsize.raw.xz
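A minimal sketch of the decompression step with --force, assuming a
helper along these lines. The function name extract_image is
hypothetical and not necessarily what the testsuite uses. With --force,
xz overwrites a stale .raw file instead of failing with "File exists".

```shell
# Hypothetical helper: decompress IMAGE.xz next to itself and print the
# resulting path. --keep leaves the .xz in place, --force overwrites any
# stale output left over from a previous run.
extract_image() {
	image=$1
	if ! xz --decompress --keep --force "$image"; then
		echo "failed to decompress image $image" >&2
		return 1
	fi
	printf '%s\n' "${image%.xz}"
}
```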
Signed-off-by: David Sterba <dsterba@suse.com>
Add scripts that can be used to build docker images and executed from
inside docker containers to verify build or run the testsuite.
Some tweaks are needed at each step to make things work.
- docker-build - build the image
- docker-run - run the default command (test-build)
- run-tests - run the testsuite
Signed-off-by: David Sterba <dsterba@suse.com>
The ci/test-build script unconditionally downloads the latest devel
snapshot. This is not practical for local development. Add a conditional
check for a file named devel.tar.gz and download the snapshot only if the
file is missing or empty.
An empty file is treated like a missing one because this allows using a
docker image that does not support conditional contents, so a stub file
is a fallback.
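The check described above could look roughly like this. It is a sketch
only: the function names are hypothetical and the URL is left to the
caller, it is not the actual ci/test-build code.

```shell
# Return success when FILE has to be downloaded: it is missing or empty.
# 'test -s' is true only for an existing file with size greater than zero.
need_download() {
	[ ! -s "$1" ]
}

# Hypothetical wrapper; the snapshot URL is passed in by the caller.
fetch_snapshot() {
	file=$1
	url=$2
	if need_download "$file"; then
		wget -O "$file" "$url"
	else
		echo "using existing $file"
	fi
}
```

A zero-sized stub file and a missing file take the same path, so both
trigger the download.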
Signed-off-by: David Sterba <dsterba@suse.com>
In 5.10 the convert gained support for extended inode time precision,
but this is not available on older distros and breaks build. Add a
configure-time check for the EXT4_EPOCH_MASK macro and add a stub in
case it's not detected.
This means that the 64bit timestamps will not be transferred from the
original filesystem in such environment, at least a warning is printed.
Issue: #344
Signed-off-by: David Sterba <dsterba@suse.com>
The json output needs line continuations and not a simple "\n"; this got
inherited by the plain text output, where it is not necessary.
This also caused problems in fstests btrfs/006 where the extra newline
does not match the golden output and the test fails, when printing
device stats that now use the output formatter.
Change the plain text formatting to always expect that a fmt_print or a
manual line print (as is done for the device stats) will append the
newline, and remove it from the end of formatting.
Link: https://lore.kernel.org/linux-btrfs/CAL3q7H4b7QhL02aSOpN0-k_9P2EAbj1t+NkA6VwidKEg4S996w@mail.gmail.com
Reported-by: Filipe Manana <fdmanana@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This aligns the man page with the usage output of the tool, the notation
in the help text could be confusing as it reads like -r and -i are
mutually exclusive.
Signed-off-by: Christian Amsüss <chrysn@fsfe.org>
Signed-off-by: David Sterba <dsterba@suse.com>
When replace starts with no-background and fails for the reason that
a BTRFS_FS_EXCL_OP is in progress, we still return the value 0 and also
leak the target device open, because in cmd_replace_start() we missed
the goto leave_with_error for this error.
So the test case btrfs/064 in its seqres.full output reports...
Replacing /dev/sdf with /dev/sdc
ERROR: /dev/sdc is mounted
instead of...
Replacing /dev/sdc with /dev/sdf
ERROR: ioctl(DEV_REPLACE_START) '/mnt/scratch': add/delete/balance/replace/resize operation in progress
for the failed replace attempts in the test case
Fix it by jumping to the error label which also fixes the leaked open
device.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The current mount detection code in btrfs receive is not quite perfect.
For example, suppose /tmp is mounted as a tmpfs. In that case,
btrfs receive /tmp2 will find /tmp as the longest mount that matches a
prefix of /tmp2 and blow up because it is not a btrfs filesystem, even
if /tmp2 is just a directory in / mounted as btrfs.
Fix this by replacing the substring check with a dirname recursion to
only check the directories in the path of the dir, rather than every
substring.
Add a new test for this case.
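The idea of the dirname recursion can be sketched in shell. This is an
illustration only, not the C code in receive; the function name and the
way mount points are passed as arguments are assumptions.

```shell
# Print the closest mount point that is PATH itself or one of its parent
# directories. $1 is the path, the remaining arguments are mount points.
# Only real path components are visited, so /tmp2 never matches /tmp.
find_mount() {
	path=$1
	shift
	while :; do
		for m in "$@"; do
			if [ "$m" = "$path" ]; then
				printf '%s\n' "$path"
				return 0
			fi
		done
		[ "$path" = "/" ] && return 1
		path=$(dirname "$path")
	done
}
```

With mount points / and /tmp, the path /tmp2 walks straight to / and
/tmp is never considered, while /tmp/a/b resolves to /tmp.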
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a new subcommand 'btrfs rescue create-control-device' that creates
/dev/btrfs-control. This is helpful on systems that may not have `mknod`
installed and the device node is missing for some reason.
Issue: #223
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
[ update docs ]
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a new function, check_test_results(), for
misc/fsck/convert/mkfs test cases.
This function currently catches warning messages for subpage support,
but can be later expanded for other usages.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For the incoming subpage support, there is a new requirement for tree
blocks. Tree blocks should not cross a 64K page boundary.
For current btrfs-progs and kernel, there shouldn't be any cause to
create such tree blocks. But still, we want to detect such tree blocks
in the wild before subpage support fully lands in upstream.
This patch adds such a check for both lowmem and original mode.
Currently it's just a warning, since there aren't many users using 64K
page size yet.
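The boundary condition can be written as a small arithmetic check. This
is a sketch of the rule as stated above, with a hypothetical function
name; the check in btrfs check itself is done in C.

```shell
# Succeed if the tree block [start, start + len) crosses a 64K boundary,
# i.e. its first and last byte fall into different 64K-aligned ranges.
crosses_64k() {
	start=$1
	len=$2
	[ $(( start / 65536 )) -ne $(( (start + len - 1) / 65536 )) ]
}
```

For example a 16K block at bytenr 49152 ends exactly at the boundary and
is fine, while the same block at 57344 spans two 64K ranges.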
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When the btrfs_read_fs_root() function is searching a ROOT_ITEM with
location key offset other than -1, it currently fails via BUG_ON.
The offset can have a value other than -1, though. This can happen for
example if a subvolume is renamed:
$ btrfs subvolume create X && sync
Create subvolume './X'
$ btrfs inspect-internal dump-tree /dev/root | grep -B 2 'name: X$'
location key (270 ROOT_ITEM 18446744073709551615) type DIR
transid 283 data_len 0 name_len 1
name: X
$ mv X Y && sync
$ btrfs inspect-internal dump-tree /dev/root | grep -B 2 'name: Y$'
location key (270 ROOT_ITEM 0) type DIR
transid 285 data_len 0 name_len 1
name: Y
As can be seen the offset changed from -1ULL to 0.
Do not fail in this case.
Signed-off-by: Marek Behún <marek.behun@nic.cz>
CC: Qu Wenruo <wqu@suse.com>
CC: Tom Rini <trini@konsulko.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_open_dir already checks whether the passed path is a directory
and returns a specific error code (-3) when it is not. Use this instead
of open-coding the directory check. To
avoid regression in cli/003 test also move directory checks before fs
type in btrfs_open.
Output before this check:
ERROR: resize works on mounted filesystems and accepts only
directories as argument. Passing file containing a btrfs image
would resize the underlying filesystem instead of the image.
After:
ERROR: not a directory: /root/btrfs-progs/tests/test.img
ERROR: resize works on mounted filesystems and accepts only
directories as argument. Passing file containing a btrfs image
would resize the underlying filesystem instead of the image.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a test case which ensures that when resize is tried on an image
instead of a directory an appropriate warning is produced and the command
fails.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This reverts commit 61ecaff036.
The libmount functionality is not used anymore, we can remove it
entirely.
Signed-off-by: David Sterba <dsterba@suse.com>
Partial revert of 922eaa7b54 ("btrfs-progs: build: fix linking with
static libmount"), remove the necessary workarounds like the weak
symbols and link time warnings. Symbols renamed not to clash with
libmount (parse_size, canonicalize_path) haven't been reverted because
the new names are acceptable.
Signed-off-by: David Sterba <dsterba@suse.com>
This reverts commit 98a88aec64.
The libmount dependency will be dropped so remove it from the
documentation as well.
Signed-off-by: David Sterba <dsterba@suse.com>
In commit 57cfe29e69 ("btrfs-progs: utils: introduce
find_mount_fsroot") the entries in /proc/self/mountinfo are parsed by a
convenience library libmount, because getmntent does not provide the
information we need to distinguish bind mounts.
Using libmount turned out to be problematic in several ways:
- static build got broken due to clashing symbols, eg. for parsing size
or path canonicalization (#333)
- long-term distros do not have libmount new enough (2.24+) to provide
some functions (mnt_table_is_empty, #334)
- libmount internally uses getgrnam_r/mnt_get_uid/... that are not
static-build friendly, a warning is printed during link time for each
binary; we don't use any of the functions
- libmount has further library dependencies that we don't need:
$ ldd /usr/lib64/libmount.so.1
linux-vdso.so.1 (0x00007fff4f175000)
libc.so.6 => /lib64/libc.so.6 (0x00007f44a1763000)
libblkid.so.1 => /usr/lib64/libblkid.so.1 (0x00007f44a1730000)
libselinux.so.1 => /usr/lib64/libselinux.so.1 (0x00007f44a1704000)
/lib64/ld-linux-x86-64.so.2 (0x00007f44a1998000)
libpcre.so.1 => /usr/lib64/libpcre.so.1 (0x00007f44a166c000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f44a1666000)
namely selinux, pcre and dl.
Summing it up, libmount causes more trouble than a convenience library
is worth; we want to keep the dependencies minimal, so the custom
mountinfo parser was inevitable.
Issue: #333
Issue: #334
Issue: #336
Signed-off-by: David Sterba <dsterba@suse.com>
We had a few bugs on the kernel side of send/receive where capabilities
ended up being lost after receiving a send stream. They all stem from the
fact that the kernel used to send all xattrs before issuing the chown
command, and the latter clears any existing capabilities in a file or
directory.
Initially a workaround was added to btrfs-progs' receive command, in commit
123a2a0850 ("btrfs-progs: receive: restore capabilities after chown"),
and that fixed some instances of the problem. More recently, other instances
of the problem were found, a proper fix for the kernel was made, which fixes
the root problem by making send always emit the setxattr command for setting
capabilities after issuing a chown command. This was done in kernel commit
89efda52e6b693 ("btrfs: send: emit file capabilities after chown"), which
landed in kernel 5.8.
However, the workaround on the receive command now causes us to incorrectly
set a capability on a file that should not have it, because it assumes all
setxattr commands for a file always come before a chown.
Example reproducer:
$ cat send-caps.sh
#!/bin/bash
DEV1=/dev/sdh
DEV2=/dev/sdi
MNT1=/mnt/sdh
MNT2=/mnt/sdi
mkfs.btrfs -f $DEV1 > /dev/null
mkfs.btrfs -f $DEV2 > /dev/null
mount $DEV1 $MNT1
mount $DEV2 $MNT2
touch $MNT1/foo
touch $MNT1/bar
setcap cap_net_raw=p $MNT1/foo
btrfs subvolume snapshot -r $MNT1 $MNT1/snap1
btrfs send $MNT1/snap1 | btrfs receive $MNT2
echo
echo "capabilities on destination filesystem:"
echo
getcap $MNT2/snap1/foo
getcap $MNT2/snap1/bar
umount $MNT1
umount $MNT2
When running the test script, we can see that both files foo and bar get
the capability set, when only file foo should have it:
$ ./send-caps.sh
Create a readonly snapshot of '/mnt/sdh' in '/mnt/sdh/snap1'
At subvol /mnt/sdh/snap1
At subvol snap1
capabilities on destination filesystem:
/mnt/sdi/snap1/foo cap_net_raw=p
/mnt/sdi/snap1/bar cap_net_raw=p
Since the kernel fix was backported to all currently supported stable
releases (5.10.x, 5.4.x, 4.19.x, 4.14.x, 4.9.x and 4.4.x), remove the
workaround from receive. Having such a workaround relying on the order
of commands in a send stream is always troublesome and doomed to break
one day.
A test case for fstests will come soon.
Issue: #85
Issue: #202
Issue: #292
Reported-by: Richard Brown <rbrown@suse.de>
Reviewed-by: Su Yue <l@damenly.su>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When the test environment is in 'docker', there's some delay before the
device mapper nodes appear in /dev/mapper. Add a delay before the test
continues, usually 1 second was enough but give it more just in case.
Add a fallback to skip the test if the device node does not show up,
this is a problem in the running environment and the testsuite should
continue.
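A polling loop with a skip fallback could look like the following
sketch. The helper name, the number of tries and the delay are
assumptions for illustration, not the values used by the test.

```shell
# Wait until NODE exists, polling up to TRIES times with DELAY seconds
# between attempts. Returns 0 once the node shows up, 1 otherwise.
wait_for_node() {
	node=$1
	tries=${2:-10}
	delay=${3:-1}
	while [ "$tries" -gt 0 ]; do
		[ -e "$node" ] && return 0
		sleep "$delay"
		tries=$(( tries - 1 ))
	done
	[ -e "$node" ]
}

# Possible use in a test (the device path is a placeholder):
#   wait_for_node "/dev/mapper/$name" || { echo "node missing, skipping"; exit 0; }
```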
Signed-off-by: David Sterba <dsterba@suse.com>
Add some randomization to the long device name in case the testsuite
runs multiple times on the same host.
Signed-off-by: David Sterba <dsterba@suse.com>
In some environments the which utility might not be available and the
shell builtin 'type -p' is readily available.
Signed-off-by: David Sterba <dsterba@suse.com>
Add the specific libmount version that has the required functions,
though it still fails to build on CentOS 7 and similar. The libmount
dependency could be dropped in the future but for now at least document
it.
Issue: #334
Signed-off-by: David Sterba <dsterba@suse.com>
That scrub works only on a mounted filesystem is not clear in connection
with the possibility to start scrub on a given device. Update the manual
page and mention the mount requirement where appropriate.
Issue: #335
Signed-off-by: David Sterba <dsterba@suse.com>
Testing the statically built binaries is not straightforward, add a
convenient way to do that:
$ make TEST_FLAVOR=static
There should be no difference in the test results.
Signed-off-by: David Sterba <dsterba@suse.com>
The libmount dependency has been added in commit 61ecaff036
("btrfs-progs: build: add libmount dependency"), and static build got
broken. There are functions that do basically the same thing and also
share the name, which in turn fails at link time.
ld: /../lib64/libmount.a(libcommon_la-canonicalize.o): in function `canonicalize_dm_name':
util-linux-2.34/lib/canonicalize.c:58: multiple definition of `canonicalize_dm_name';
common/path-utils.static.o:btrfs-progs/common/path-utils.c:286: first defined here
In case the collision can be resolved by renaming, it's done
(canonicalize_path and parse_size). There are 2 symbols from selinux
that are substituted by weak aliases during the static build.
There's one new warning due to use of getgrnam_r in libmount that
depends on dynamic linking and may not work properly with static build.
We're not using the related functions directly or indirectly, so it
should be safe to ignore the warnings.
ld: ../lib64/libmount.a(la-utils.o): in function `mnt_get_gid':
util-linux-2.34/libmount/src/utils.c:625: warning: Using 'getgrnam_r' in statically linked applications
+requires at runtime the shared libraries from the glibc version used for linking
Issue: #333
Signed-off-by: David Sterba <dsterba@suse.com>
The jobs have been failing for some time due to the 1h time limit:
+ qemu-system-x86_64 -m 512 -nographic -kernel /repo/bzImage -drive
file=/repo/qemu-image.img,index=0,media=disk,format=raw -fsdev
local,id=btrfs-progs,path=/repo,security_model=mapped -device
virtio-9p-pci,fsdev=btrfs-progs,mount_tag=btrfs-progs -append
'console=tty1 root=/dev/sda rw'
main-loop: WARNING: I/O thread spun for 1000 iterations
ERROR: Job failed: execution took longer than 1h0m0s seconds
We'd still like to use the qemu test as it could pull the recent
development kernel that the base image does not provide. However the
overall performance is too bad and it does not make sense to waste
GitLab resources.
Also remove the build status badge from README as it changed at some
point and does not render as a SVG image anymore.
Issue: #171
Signed-off-by: David Sterba <dsterba@suse.com>
The id 0 of the default subvolume is an internal alias for the toplevel
fs tree, kernel does that conversion. Until 2116398b1d ("btrfs-progs:
use libbtrfsutil for set-default") there was no manual conversion and
the value was passed to kernel as-is. With the switch to the
libbtrfsutil API this got broken (4.19).
$ btrfs subvol set-default 0 /path
In this case the default subvolume would be the subvolume containing
/path instead of the toplevel one.
Fix it by manually switching the 0 to 5 in case user specifies that to
avoid the difference in the API, that we can't change.
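The conversion amounts to the following, shown here as a shell sketch
with a hypothetical function name; the actual fix is in the C code of
the set-default command.

```shell
# Map the alias 0 to 5, the objectid of the toplevel fs tree
# (BTRFS_FS_TREE_OBJECTID); any other id is passed through unchanged.
normalize_subvolid() {
	if [ "$1" -eq 0 ]; then
		echo 5
	else
		echo "$1"
	fi
}
```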
Issue: #327
Reported-by: Chris Murphy
Signed-off-by: David Sterba <dsterba@suse.com>
There are cases where v1 free space cache is still left while user has
already enabled v2 cache. In that case, we still want to force v1 space
cache cleanup in btrfs-check.
This patch will only warn and not exit if v2 is detected while the user
asked to clear v1.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
To use optimized CRC implementation, the input buffer must be
unsigned long aligned. btrfs receive calculates checksum based on
read_buf, including btrfs_cmd_header (with zeroed CRC field)
and command content.
Reorder the buffer to the beginning of the structure and force the
alignment to 64, this should be cacheline friendly and could speed up
the data transfers.
Interesting parts from the report:
Sending host:
Fedora 33
AMD ThreadRipper 1920X - 128GB RAM
2x10GBit Ethernet, bonded
MegaRaid 9270
6x16TB Seagate Exos in RAID5
Receiving host:
Fedora 33
Intel i3-7300 - HT enabled - 32GB RAM
10GBit Ethernet, single connection
MegaRaid 9260
12x8TB WD NAS drives in RAID5
The 2 hosts are connected to the same 10G switch. The sender could definitely
saturate a 10GBit link. The practically achievable writes on the backup host
would be lower, but still at least 400MB/s. The file system contains mostly
large files of 1GB+, so there is little meta-data.
With btrfs send/receive I'm getting a steady transfer rate of 60MB/s. The copy
has been running for a little over 5 days now, having only transferred some
25TB. This is way too slow for this setup.
Analyzing resource usage, the sender side is fine, both the btrfs send and the
corresponding ssh process only use about 10-10% CPU, which on a 24 threaded
machine is virtually nothing. However, the receiver is running with a load of
~2.6, with the sshd using 30-50% CPU and the btrfs receive a further 60-70%.
The rest of the load comes from IO wait. So the bottleneck is the btrfs receive
clearly.
Issue: #324
Signed-off-by: Sheng Mao <shngmao@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
By using find_mount_fsroot we ensure that we return a valid path to the
final user, by ensuring that even if we return a bind mount, the
pathname of btrfs used was the same from the original mount.
This is for the case when bind mounts and normal mount -o subvol=/path
are mixed.
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This new function checks for filesystem path name that was mounted, thus
being different from find_mount_root. By using libmount we can easily
parse /proc/self/mountinfo file and check for the pathname field.
The function is useful to filter bind mounts with content different from
the original mount, thus making it safe to assume that the reported path
can be accessed by the user, with the right content.
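To illustrate which mountinfo fields matter, here is a rough shell
sketch. It uses awk instead of libmount and is not the implementation of
find_mount_fsroot; the function name is hypothetical. In
/proc/self/mountinfo the 4th field is the root of the mount within the
filesystem and the 5th field is the mount point; the filesystem type
follows the "-" separator.

```shell
# Read mountinfo-formatted lines on stdin and print the mount points of
# btrfs mounts whose root inside the filesystem equals $1. A bind mount
# of a subdirectory has a different root and is filtered out.
mounts_with_root() {
	awk -v root="$1" '{
		# Skip the optional fields up to the "-" separator.
		for (i = 7; i <= NF; i++)
			if ($i == "-")
				break
		if ($(i + 1) == "btrfs" && $4 == root)
			print $5
	}'
}

# Possible use (the subvolume path is a placeholder):
#   mounts_with_root /@home < /proc/self/mountinfo
```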
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>