Add some randomization to the long device name in case the testsuite
runs multiple times on the same host.
Signed-off-by: David Sterba <dsterba@suse.com>
In some environments the 'which' utility might not be available, while
the shell builtin 'type -p' is readily available.
Signed-off-by: David Sterba <dsterba@suse.com>
Add the specific libmount version that has the required functions,
though it still fails to build on CentOS 7 and similar. The libmount
dependency could be dropped in the future but for now at least document
it.
Issue: #334
Signed-off-by: David Sterba <dsterba@suse.com>
It is not clear that scrub works only on a mounted filesystem, given
the possibility to start scrub on a given device. Update the manual
page and mention the mount requirement where appropriate.
Issue: #335
Signed-off-by: David Sterba <dsterba@suse.com>
Testing the statically built binaries is not straightforward, add a
convenient way to do that:
$ make TEST_FLAVOR=static
There should be no difference in the test results.
Signed-off-by: David Sterba <dsterba@suse.com>
The libmount dependency has been added in commit 61ecaff036
("btrfs-progs: build: add libmount dependency"), and static build got
broken. There are functions that do basically the same thing and also
share the name, which in turn fails at link time.
ld: /../lib64/libmount.a(libcommon_la-canonicalize.o): in function `canonicalize_dm_name':
util-linux-2.34/lib/canonicalize.c:58: multiple definition of `canonicalize_dm_name';
common/path-utils.static.o:btrfs-progs/common/path-utils.c:286: first defined here
Where the collision can be resolved by renaming, it's done
(canonicalize_path and parse_size). There are 2 symbols from selinux
that are substituted by weak aliases during the static build.
There's one new warning due to use of getgrnam_r in libmount that
depends on dynamic linking and may not work properly with static build.
We're not using the related functions directly or indirectly, so it
should be safe to ignore the warnings.
ld: ../lib64/libmount.a(la-utils.o): in function `mnt_get_gid':
util-linux-2.34/libmount/src/utils.c:625: warning: Using 'getgrnam_r' in statically linked applications
+requires at runtime the shared libraries from the glibc version used for linking
Issue: #333
Signed-off-by: David Sterba <dsterba@suse.com>
The jobs have been failing for some time due to the 1h time limit:
+ qemu-system-x86_64 -m 512 -nographic -kernel /repo/bzImage -drive
file=/repo/qemu-image.img,index=0,media=disk,format=raw -fsdev
local,id=btrfs-progs,path=/repo,security_model=mapped -device
virtio-9p-pci,fsdev=btrfs-progs,mount_tag=btrfs-progs -append
'console=tty1 root=/dev/sda rw'
main-loop: WARNING: I/O thread spun for 1000 iterations
ERROR: Job failed: execution took longer than 1h0m0s seconds
We'd still like to use the qemu test as it could pull the recent
development kernel that the base image does not provide. However the
overall performance is too bad and it does not make sense to waste
GitLab resources.
Also remove the build status badge from README as it changed at some
point and does not render as an SVG image anymore.
Issue: #171
Signed-off-by: David Sterba <dsterba@suse.com>
The id 0 of the default subvolume is an internal alias for the toplevel
fs tree, kernel does that conversion. Until 2116398b1d ("btrfs-progs:
use libbtrfsutil for set-default") there was no manual conversion and
the value was passed to kernel as-is. With the switch to the
libbtrfsutil API this got broken (4.19).
$ btrfs subvol set-default 0 /path
In this case the default subvolume would be the subvolume containing
/path instead of the toplevel one.
Fix it by manually converting 0 to 5 when the user specifies it, to
hide the difference in the API, which we can't change.
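The conversion can be sketched as below. This is only an illustration:
the helper name and the constant name are made up for the example, the
value 5 for the toplevel fs tree is the one mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed name for the example; 5 is the id of the toplevel fs tree. */
#define TOPLEVEL_FS_TREE_ID 5ULL

/*
 * Hypothetical helper: map the alias 0 to the toplevel fs tree id so
 * the value can be passed to an API that does not know about the alias.
 * Any other id is returned unchanged.
 */
static uint64_t normalize_default_subvol_id(uint64_t id)
{
	return id == 0 ? TOPLEVEL_FS_TREE_ID : id;
}
```

Calling the helper before handing the id to the library keeps the
command line behaviour the same as before the API switch.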
Issue: #327
Reported-by: Chris Murphy
Signed-off-by: David Sterba <dsterba@suse.com>
There are cases where the v1 free space cache is still left while the
user has already enabled the v2 cache. In that case, we still want to
force the v1 space cache cleanup in btrfs-check.
This patch will only warn and not exit if v2 is detected while the user
asked to clear v1.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
To use optimized CRC implementation, the input buffer must be
unsigned long aligned. btrfs receive calculates checksum based on
read_buf, including btrfs_cmd_header (with zeroed CRC field)
and command content.
Reorder the buffer to the beginning of the structure and force the
alignment to 64, this should be cacheline friendly and could speed up
the data transfers.
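A minimal sketch of the layout idea, assuming C11. The structure name,
the other members and the buffer size are invented for the example and
do not mirror the real receive context.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BUF_SIZE 4096	/* hypothetical size, for illustration */

/* Hypothetical context: the read buffer goes first, aligned to 64. */
struct receive_ctx_sketch {
	alignas(64) char read_buf[SKETCH_BUF_SIZE];
	int fd;
	uint32_t version;
};

/* The buffer must start the structure for the alignment to be free. */
static_assert(offsetof(struct receive_ctx_sketch, read_buf) == 0,
	      "read_buf must be the first member");

/* Returns 1 if the buffer address is a multiple of 64, 0 otherwise. */
static int read_buf_is_aligned(const struct receive_ctx_sketch *ctx)
{
	return ((uintptr_t)ctx->read_buf % 64) == 0;
}
```

An alignment of 64 also satisfies the unsigned long alignment needed by
the optimized CRC code.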
Interesting parts from the report:
Sending host:
Fedora 33
AMD ThreadRipper 1920X - 128GB RAM
2x10GBit Ethernet, bonded
MegaRaid 9270
6x16TB Seagate Exos in RAID5
Receiving host:
Fedora 33
Intel i3-7300 - HT enabled - 32GB RAM
10GBit Ethernet, single connection
MegaRaid 9260
12x8TB WD NAS drives in RAID5
The 2 hosts are connected to the same 10G switch. The sender could definitely
saturate a 10GBit link. The practically achievable writes on the backup host
would be lower, but still at least 400MB/s. The file system contains mostly
large files of 1GB+, so there is little meta-data.
With btrfs send/receive I'm getting a steady transfer rate of 60MB/s. The copy
has been running for a little over 5 days now, having only transferred some
25TB. This is way too slow for this setup.
Analyzing resource usage, the sender side is fine, both the btrfs send and the
corresponding ssh process only use about 10-10% CPU, which on a 24 threaded
machine is virtually nothing. However, the receiver is running with a load of
~2.6, with the sshd using 30-50% CPU and the btrfs receive a further 60-70%.
The rest of the load comes from IO wait. So the bottleneck is the btrfs receive
clearly.
Issue: #324
Signed-off-by: Sheng Mao <shngmao@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
By using find_mount_fsroot we ensure that the path returned to the
final user is valid: even if we return a bind mount, its btrfs pathname
is the same as that of the original mount.
This is for the case when bind mounts and normal mount -o subvol=/path
are mixed.
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This new function checks the filesystem path name that was mounted,
which makes it different from find_mount_root. By using libmount we can
easily parse the /proc/self/mountinfo file and check the pathname field.
The function is useful to filter bind mounts with content different from
the original mount, thus making it safe to assume that the reported path
can be accessed by the user, with the right content.
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The API provided by libmount allows reading various information from
/proc files about mount paths.
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The lowmem mode excludes all referenced blocks from the allocator in
order to avoid accidentally overwriting blocks while fixing the file
system. However for leaves it wouldn't exclude anything, it would just
pin them down, which gets cleaned up on transaction commit. We're safe
for the first modification, but subsequent modifications could blow up
in our face. Fix this by properly excluding leaves as well as all of
the nodes.
Reviewed-by: Su Yue <l@damenly.su>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently btrfs-convert only copies the ext2 inode timestamps
i_[cma]time from ext4, filling the nsec and crtime fields with 0.
This change copies nsec and crtime by parsing the i_[cma]time_extra fields.
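For illustration, decoding one such pair of fields could look like the
sketch below. It assumes the ext4 on-disk layout where the low 2 bits of
the extra field extend the seconds and the upper 30 bits hold the
nanoseconds, and that the values are already in CPU byte order. The
structure and function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_EPOCH_BITS 2
#define SKETCH_EPOCH_MASK ((1U << SKETCH_EPOCH_BITS) - 1)

struct decoded_time_sketch {
	int64_t sec;
	uint32_t nsec;
};

/*
 * Hypothetical helper: combine the signed 32-bit seconds field with
 * its extra field. Low 2 bits of 'extra' extend the seconds beyond
 * 32 bits, the remaining 30 bits are the nanoseconds.
 */
static struct decoded_time_sketch decode_ext4_time(int32_t base_sec,
						   uint32_t extra)
{
	struct decoded_time_sketch t;

	t.sec = (int64_t)base_sec +
		((int64_t)(extra & SKETCH_EPOCH_MASK) << 32);
	t.nsec = extra >> SKETCH_EPOCH_BITS;
	return t;
}
```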
Author: Jiachen YANG <farseerfc@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a simple framework to exercise the json formatter and add testing
target that validates the output.
Run 'make test-json' to execute all available tests, requires 'jq'
utility for validation (https://github.com/stedolan/jq).
Signed-off-by: David Sterba <dsterba@suse.com>
In cases where the compiler does not initialize the formatter context to
all zeros, there could be garbage values left on the depth 0 that is not
explicitly initialized. This could lead to mistakenly printing a ","
separator before the last closing "}", like
{
"__header": {
"version": "1"
},
}
Signed-off-by: David Sterba <dsterba@suse.com>
The long options array for send is missing the zero terminator, so
unknown options result in a crash:
# btrfs send --foo
Segmentation fault (core dumped)
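A minimal sketch of a correctly terminated table. The option names are
placeholders, not the real options of send; the point is the all-zero
last entry that getopt_long() needs to find the end of the array.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Placeholder options for the example. */
static const struct option sketch_long_options[] = {
	{ "verbose", no_argument, NULL, 'v' },
	{ "quiet", no_argument, NULL, 'q' },
	{ NULL, 0, NULL, 0 }	/* terminator, must be last */
};

/* Returns 0 if all options are known, -1 on an unknown option. */
static int parse_options_sketch(int argc, char **argv)
{
	int c;

	optind = 1;	/* restart scanning from the first argument */
	opterr = 0;	/* keep getopt quiet, the caller reports errors */
	while ((c = getopt_long(argc, argv, "vq", sketch_long_options,
				NULL)) != -1) {
		if (c == '?')
			return -1;
	}
	return 0;
}
```

With the terminator in place an unknown option like --foo is reported
as '?' instead of walking past the end of the array.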
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a Makefile so tests can be run the same way from the standalone
testsuite as they are from git. The build dependencies are not checked
and the default path is for the system binaries.
Because of the path auto detection, running 'make' from the tests/
directory now works the same way as from the toplevel git directory.
Signed-off-by: David Sterba <dsterba@suse.com>
Pre-created image contains a subvolume and a snapshot so that cleaning
of multiple roots is also tested. The mount option 'inode_cache' will be
removed so we need the crafted image.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The inode cache feature is going to be removed in kernel 5.11. After this
kernel version, items left on disk by this feature will take some extra
space. Testing showed that the size is actually negligible, but for
completeness' sake give users the ability to remove such leftovers.
This is achieved by iterating every fs root and removing respective
items as well as relevant csum extents since the ino cache used the csum
tree for csums.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add support for json formatting. Switch the hard coded printing code to
formatted printing with the output formatter. Json output is useful for
other programs that parse the output of the command.
The plain text format is not changed for backward compatibility, but this
requires another switch on the output type.
Example text format:
device: /dev/vdb
devid 1
write_io_errs: 0
read_io_errs: 0
flush_io_errs: 0
corruption_errs: 0
generation_errs: 0
Example json format:
{
"__header": {
"version": "1"
},
"device-stats": [
{
"device": "/dev/vdb",
"devid": "1",
"write_io_errs": "0",
"read_io_errs": "0",
"flush_io_errs": "0",
"corruption_errs": "0",
"generation_errs": "0"
}
]
}
Issue: #291
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Extend fmt_print_start_group() so it can handle a NULL name argument.
This is useful for printing an unnamed array or map.
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While trying to run down a corruption problem I needed to use
btrfs-image to generate known good states in between tests. At some
point this started failing with
either extent tree is corrupted or deprecated extent ref format
This is because the fs had an extent item that was large enough that it
no longer had inline extent references, they were all keyed extent
references. The check is bogus, we can have extent items that are >=
the extent item size, not just > the extent item size. Fix the
check so that we can generate metadata dumps properly.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While debugging a corruption problem I realized we don't spit out the
flags for nodes, which is needed when debugging relocation problems so
we know which nodes are the RELOC root items and which are the actual fs
tree's items. Fix this by unifying the header printing helper so both
leaves and nodes get the same information printed out.
node 41070940160 level 1 items 34 free space 87 generation 7709536 owner ROOT_TREE
node 41070940160 flags 0x1(WRITTEN) backref revision 1
Same for leaves:
leaf 41070944256 items 12 free space 515 generation 7709536 owner ROOT_TREE
leaf 41070944256 flags 0x1(WRITTEN) backref revision 1
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While debugging some corruption, I got confused because it appeared as
if we had an invalid parent set on an extent reference, because of this
message:
tree backref 67014213632 parent 5 root 5 not found in extent tree
But it turns out that parent and the root are a union, and we were just
printing it out regardless of the type of backref it was. Fix the error
message to be consistent with the other mismatch messages, simply print
parent or root, depending on the ref type.
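The idea can be sketched as follows. The type names, the enum and the
exact message wording are assumptions for the example; only the
principle of printing the one union member that matches the ref type
comes from the description above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical ref types: keyed by owning root or by parent block. */
enum ref_type_sketch { REF_BY_ROOT, REF_BY_PARENT };

struct tree_backref_sketch {
	enum ref_type_sketch type;
	union {			/* both members share the same storage */
		uint64_t parent;
		uint64_t root;
	};
};

/* Format the message with only the member valid for this ref type. */
static int format_missing_backref(char *buf, size_t len, uint64_t bytenr,
				  const struct tree_backref_sketch *ref)
{
	if (ref->type == REF_BY_PARENT)
		return snprintf(buf, len,
			"tree backref %llu parent %llu not found in extent tree",
			(unsigned long long)bytenr,
			(unsigned long long)ref->parent);
	return snprintf(buf, len,
		"tree backref %llu root %llu not found in extent tree",
		(unsigned long long)bytenr,
		(unsigned long long)ref->root);
}
```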
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The two variants of unit options are not suitable for all commands, the
short options could interfere with existing options or limit future
extensions.
In 'filesystem du' the short options are documented neither in the help
text nor in the documentation, so fix the code.
In 'scrub status' it's the same, but the documentation needs to be fixed
as well.
Signed-off-by: David Sterba <dsterba@suse.com>
The help text and documentation of the --rootid and --uuid parameters
is wrong as it does not say there's a required parameter. Add it and
enhance the docs to clarify what the options do.
Issue: #317
Signed-off-by: David Sterba <dsterba@suse.com>
A user reported that 'btrfs subvolume show -u -- /mnt' causes a double free.
Pointer subvol_path was freed in each iteration but still kept the old
value. In the last iteration, the error BTRFS_UTIL_ERROR_STOP_ITERATION
is returned, then the double free of subvol_path happens at the out
label.
Set subvol_path to NULL after each free() in the loop to fix the issue.
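The pattern, reduced to a standalone sketch. The iterator is a made-up
stand-in that allocates a path per item and returns an error at the end
without touching the output pointer, which is the situation described
above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical iterator: returns 0 and a heap-allocated path for each
 * of 'count' items, then a negative value, leaving *path_ret untouched.
 */
static int next_path_sketch(int i, int count, char **path_ret)
{
	char *p;

	if (i >= count)
		return -1;	/* stop iteration */
	p = malloc(sizeof("subvol"));
	if (!p)
		return -2;
	strcpy(p, "subvol");
	*path_ret = p;
	return 0;
}

/* Walk all items, returns the number of paths visited. */
static int walk_paths_sketch(int count)
{
	char *path = NULL;
	int visited = 0;
	int i;

	for (i = 0; ; i++) {
		if (next_path_sketch(i, count, &path) < 0)
			goto out;
		visited++;
		free(path);
		path = NULL;	/* otherwise 'out' frees it a second time */
	}
out:
	free(path);		/* free(NULL) is a no-op */
	return visited;
}
```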
Issue: #317
Signed-off-by: Su Yue <l@damenly.su>
Signed-off-by: David Sterba <dsterba@suse.com>
The exclusive ops will not start if there's one already running. Now
that we have the sysfs export (since kernel 5.10) to check if there's
one already running, use it to allow enqueueing of the operations as a
convenience.
Supported enqueuing:
btrfs balance start --enqueue
btrfs filesystem resize --enqueue
btrfs device add --enqueue
btrfs device delete --enqueue
btrfs replace start --enqueue
This patch implements the functionality based on Goldwyn's patch
https://lore.kernel.org/linux-btrfs/?q=20200825150338.32610-4-rgoldwyn%40suse.de
but on top of previous preparatory patches.
Note that 'filesystem resize' options could confuse getopt as the
negative size change looks like a series of short options and there's no
way to make getopt ignore the short options, so there's a custom option
parser.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Add available space information from statfs(). This can be different from
'Free (estimated)' in some cases. This patch provides more information
about filesystem usage, like below.
Overall:
Device size: 5.00GiB
Device allocated: 1.02GiB
Device unallocated: 3.98GiB
Device missing: 0.00B
Used: 88.00KiB
Free (estimated): 4.48GiB (min: 2.49GiB)
Free (statfs, df) 4.48GiB
Data ratio: 1.00
Metadata ratio: 2.00
Global reserve: 832.00KiB (used: 0.00B)
Multiple profiles: no
Issue: #306
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The limitation has been there since the first commit implementing swapfile support.
Pull-request: #315
Signed-off-by: Tomasz Torcz <tomek@pipebreaker.pl>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch will:
- Add a new test image for fsck/044
This new image has a corrupted extent item generation for tree block.
This image can expose a bug in original mode, which can't detect the
problem.
This image also utilizes the tree block generation detection code,
which the existing image doesn't.
- Rename the existing image
To reflect the fact that the existing one is only for data extent.
- Remove the test.sh
So that the generic path will test both detection and repair.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is pretty much the same as for lowmem mode, it will try to reset
the extent item generation using either the tree block generation or
current transid.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In check_block(), we unconditionally reset extent_record::generation.
This is in fact correct, but this makes original mode fail to detect bad
extent item generation.
So change the behavior to set the generation if and only if the tree
block generation is higher.
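Roughly, the update becomes a conditional one, as in the sketch below.
The structure is a reduced stand-in with a single field, not the real
extent_record.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced stand-in for the record, only the field of interest. */
struct extent_record_sketch {
	uint64_t generation;
};

/*
 * Keep the recorded generation unless the tree block generation is
 * higher, so a too-high extent item generation is not overwritten
 * and can still be detected later.
 */
static void update_generation_sketch(struct extent_record_sketch *rec,
				     uint64_t block_generation)
{
	if (block_generation > rec->generation)
		rec->generation = block_generation;
}
```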
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is an internal report about bad extent item generation triggering
tree-checker.
This patch adds the repair ability to btrfs check --mode=lowmem, by
resetting the generation field of the extent item.
Currently the correct generation for tree block is fetched from its
header, while for data extent it uses transid as fallback.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are several problems with the current sectorsize check:
- No check at all for sectorsize
This means you can even specify "-s 62k".
- No way to specify sectorsize smaller than page size
Fix all these problems by:
- Introduce btrfs_check_sectorsize()
To do:
* power of 2 check for sectorsize
* lower and upper boundary check for sectorsize
* warn about sectorsize mismatch with page size
- Remove the max() between page size and sectorsize
This allows us to override the sectorsize for 64K page systems.
- Make nodesize calculation based on sectorsize
No need to use page size any more.
Users who specify sectorsize manually really know what they are doing,
and we have warned them already.
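The checks listed above can be sketched like this. The function name,
the 4K and 64K bounds and the message texts are assumptions for the
example, not the actual btrfs_check_sectorsize() implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed bounds for the example. */
#define SKETCH_MIN_SECTORSIZE 4096U
#define SKETCH_MAX_SECTORSIZE 65536U

static int is_power_of_2_sketch(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Returns 0 if the sectorsize is acceptable, -1 otherwise. */
static int check_sectorsize_sketch(uint32_t sectorsize, uint32_t page_size)
{
	if (!is_power_of_2_sketch(sectorsize)) {
		fprintf(stderr, "ERROR: sectorsize %u is not a power of 2\n",
			sectorsize);
		return -1;
	}
	if (sectorsize < SKETCH_MIN_SECTORSIZE ||
	    sectorsize > SKETCH_MAX_SECTORSIZE) {
		fprintf(stderr, "ERROR: sectorsize %u out of range [%u, %u]\n",
			sectorsize, SKETCH_MIN_SECTORSIZE,
			SKETCH_MAX_SECTORSIZE);
		return -1;
	}
	/* A mismatch is allowed, the user is only warned about it. */
	if (sectorsize != page_size)
		fprintf(stderr,
			"WARNING: sectorsize %u differs from page size %u\n",
			sectorsize, page_size);
	return 0;
}
```

With such a check "-s 62k" is rejected by the power of 2 test, while a
4K sectorsize on a 64K page system passes with a warning.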
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>