Work in progress, experimental build. Listing command that utilizes
unprivileged ioctls to list subvolumes accessible and visible to the
user. If run as root it's using the search tree ioctl and has complete
information.
Reduced tabular output by default.
Issue: #515
Signed-off-by: David Sterba <dsterba@suse.com>
Add initial reflink group with example command 'clone' to test the
interface. Work in progress, experimental build needed.
Issue: #396
Signed-off-by: David Sterba <dsterba@suse.com>
Verify if a given file is suitable for a swapfile and print the physical
offset (ie. the ultimate on-device physical offset), and the resume
offset value (physical / page size).
This can be the kernel parameter or written to /sys/power/resume_offset
before hibernation. Option -r or --resume-offset prints just the value.
Copied and simplified from Omar Sandoval's tool to print extents:
https://github.com/osandov/osandov-linux/blob/master/scripts/btrfs_map_physical.c
Issue: #544
Issue: #533
Signed-off-by: David Sterba <dsterba@suse.com>
Track how many rows belong to a header (names and separators) so they
can be printed or cleared separately. The rest is body that could be
cleared and filled repeatedly. The ranged API allows to print any number
of rows if the table is filled partially.
Signed-off-by: David Sterba <dsterba@suse.com>
Cleanups are for integer types, prototypes and comments.
New functionality: spacing can be set after table allocation as
->spacing, now able to print 1 or 2 spaces between columns.
Signed-off-by: David Sterba <dsterba@suse.com>
New command 'btrfs inspect-internal list-chunks' will list layout of
chunks as stored on the devices. This corresponds to the physical
layout, sorted by the physical offset. The block group usage can be
shown as well, but the search is too slow so it's off by default.
If the physical offset sorting is selected, the empty space between
chunks is also shown.
Signed-off-by: David Sterba <dsterba@suse.com>
This is what we do in the kernel, and while we're syncing individual
files we're going to have state where some callers are using a const,
but progs isn't. So adjust write_extent_buffer to take a const eb in
order to make this less painful.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is a useless helper, the csum is always the first thing in the
header, simply read from offset 0.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We're using btrfs_item_nr_offset(leaf, 0) to get the start of the leaf
data in the kernel, we don't have btrfs_leaf_data. Replace all
occurrences of btrfs_leaf_data() with btrfs_item_nr_offset(leaf, 0) in
order to make syncing accessors.[ch] easier. ctree.c will be synced
later, so this is simply an intermediate step.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch copies in compression.h from the kernel. This is relatively
straightforward, we just have to drop the compression types definition
from ctree.h, and update the image to use BTRFS_NR_COMPRESS_TYPES
instead of BTRFS_COMPRESS_LAST, and add a few things to kerncompat.h to
make everything build smoothly.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have been overloading the extent_state flags for use on the extent
buffers as well. When we sync extent-io-tree.[ch] this will become
impossible, so rename these flags to EXTENT_BUFFER_* and use those
definitions instead of the extent_state definitions.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We used to store random private things into extent_states, but we
haven't done this for a while and there are no users of this code,
simply delete it.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have some extra features in the btrfs-progs copy of the
extent_io_tree that don't exist in the kernel. In order to make syncing
easier simply move this functionality into btrfs_fs_info, that way we
can sync in the new extent_io_tree code and not have to worry about
breaking anything.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We do not use the io_tree, don't bother passing it into
verify_parent_transid.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs-progs has a cache tree embedded in the extent_io_tree in order to
track extent buffers. We use the extent_io_tree part to track dirty,
and the cache tree to keep the extent buffers in. When we sync
extent-io-tree.[ch] we'll lose this ability, so separate out the dirty
tracking into its own extent_io_tree. Subsequent patches will adjust
the extent buffer lookup so it doesn't use the custom extent_io_tree
thing.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is a cleanup patch to make syncing the btrfs kernel code into
btrfs-progs easier. In btrfs-progs we have an extra cache in the
extent_io_tree that's exclusively used for the extent buffer tracking.
In order to untangle this dependency start passing around the fs_info to
search for extent_buffers, and then have the helpers use the appropriate
structure to find the extent buffer.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This isn't defined in the kernel, we simply check if the root item size
is less than btrfs_root_item, so adjust the user of btrfs_root_item_v0
to make a similar check.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We're going to sync btrfs.h into btrfs-progs from the kernel, however
libbtrfs still needs ioctl.h. To deal with this copy ioctl.h into
libbtrfs, and update that code to use the local copy and update the
libbtrfs headers list to use this copy.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This matches what we did in the kernel, btrfs_item_data_end is more
inline with what the helper does, which is give us the offset of the end
of the data portion of the item, not the offset of the end of the item
itself.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When we sync the kernel we're going to pull in the fs.h dependency,
which defines BLOCK_SIZE/BLOCK_MASK. Avoid this conflict by renaming
the image definitions with the IMAGE_ prefix.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
BTRFS_IOCTL_DEV_REPLACE_RESULT_NO_RESULT is defined to make sure we
differentiate internal errors from actual error codes that come back
from the device replace ioctl. Take this out of ioctl.c and move it
into replace.c.
Though it's in public header ioctl.h, there are no users for that so it
can be moved.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We return __u16 in the kernel, as this is actually the size of
btrfs_qgroup_level. Adjust the existing helpers and update all the
callers to deal with the new size appropriately. This will make syncing
the kernel code cleaner.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We're going to sync the kernel source into btrfs-progs, and in the
kernel we have all these qgroup fields named with short names instead of
the full name, so rename
referenced -> rfer
compressed -> cmpr
exclusive -> excl
to match the kernel and update all the users, this will make the sync
cleaner.
ioctl.h is a public header but there are no users of the
btrfs_qgroup_limit structure.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This doesn't really belong with the ioctl definitions, and when we sync
the ioctl definitions with the kernel this helper will go away, so
adjust this now.
The ioctl.h is a public API header but the helper is not used in any 3rd
party tool so we can safely move it.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The kernel switched to this recently, switch btrfs-progs to this as well
to avoid issues with syncing the kernel code.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We were not clearing the .o files for btrfs-convert as we had the wrong
directory, which meant I missed a compile error that happened when I was
messing with kernel-shared. Fix this by making sure we clear the .o
files for convert properly.
Fixes: 753baf2443 ("btrfs-progs: build: redirect dependency files files to .deps")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In converting some of our helpers to take new args I would miss some
locations because we don't stop on any warning, and I would miss the
warning in the scrollback.
Note: the Centos7 build is not yet warning-free so we can't enable
-Werror by default. It can be used as:
make EXTRA_CFLAGS=-Werror
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ add note ]
Signed-off-by: David Sterba <dsterba@suse.com>
When attempting an encoded write, if it fails for some specific reason
like -EINVAL (when an offset is not sector size aligned) or -ENOSPC, we
then fallback into decompressing the data and writing it using regular
buffered IO. This logic however is not correct, one of the reasons is
that it assumes the encoded offset is smaller than the unencoded file
length and that they can be compared, but one is an offset and the other
is a length, not an end offset, so they can't be compared to get correct
results. This bad logic will often result in not copying all data, or even
no data at all, resulting in a silent data loss. This is easily seen in
with the following reproducer:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdj
MNT=/mnt/sdj
umount $DEV &> /dev/null
mkfs.btrfs -f $DEV > /dev/null
mount -o compress $DEV $MNT
# File foo has a size of 33K, not aligned to the sector size.
xfs_io -f -c "pwrite -S 0xab 0 33K" $MNT/foo
xfs_io -f -c "pwrite -S 0xcd 0 64K" $MNT/bar
# Now clone the first 32K of file bar into foo at offset 0.
xfs_io -c "reflink $MNT/bar 0 0 32K" $MNT/foo
# Snapshot the default subvolume and create a full send stream (v2).
btrfs subvolume snapshot -r $MNT $MNT/snap
btrfs send --compressed-data -f /tmp/test.send $MNT/snap
echo -e "\nFile bar in the original filesystem:"
od -A d -t x1 $MNT/snap/bar
umount $MNT
mkfs.btrfs -f $DEV > /dev/null
mount $DEV $MNT
echo -e "\nReceiving stream in a new filesystem..."
btrfs receive -f /tmp/test.send $MNT
echo -e "\nFile bar in the new filesystem:"
od -A d -t x1 $MNT/snap/bar
umount $MNT
Running the test without this patch:
$ ./test.sh
(...)
File bar in the original filesystem:
0000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd
*
0065536
Receiving stream in a new filesystem...
At subvol snap
File bar in the new filesystem:
0000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd
*
0033792
We end up with file bar having less data, and a smaller size, than in the
original filesystem.
This happens because when processing file bar, send issues the following
commands:
clone bar - source=foo source offset=0 offset=0 length=32768
write bar - offset=32768 length=1024
encoded_write bar - offset=33792, len=4096, unencoded_offset=33792, unencoded_file_len=31744, unencoded_len=65536, compression=1, encryption=0
The first 32K are cloned from file foo, as that file ranged is shared
between the files.
Then there's a regular write operation for the file range [32K, 33K),
since file foo has different data from bar for that file range.
Finally for the remainder of file bar, the send side issues an encoded
write since the extent is compressed in the source filesystem, for the
file offset 33792 (33K), remaining 31K of data. The receiver will try the
encoded write, but that fails with -EINVAL since the offset 33K is not
sector size aligned, so it will fallback to decompressing the data and
writing it using regular buffered writes. However that results in doing
no writes at decompress_and_write() because 'pos' is initialized to the
value of 33K (unencoded_offset) and unencoded_file_len is 31K, so the
while loop has no iterations.
Another case where we can fallback to decompression plus regular buffered
writes is when the destination filesystem has a sector size larger then
the sector size of the source filesystem (for example when the source
filesystem is on x86_64 with a 4K sector size and the destination
filesystem is PowerPC with a 64K sector size). In that scenario encoded
write attempts will fail with -EINVAL due to offsets not being aligned
with the sector size of the destination filesystem, and the receive will
attempt the fallback of decompressing the buffer and writing the
decompressed using regular buffered IO.
Fix this by tracking the number of written bytes instead, and increment
it, and the unencoded offset, after each write.
Fixes: d20e759fc9 ("btrfs-progs: receive: encoded_write fallback to explicit decode and write")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Unlike for commands from the v1 stream, we have no debug messages logged
when processing fallocate commands, which makes it harder to debug issues.
So add log messages, when the log verbosity level is >= 3, for fallocate
commands, mentioning the value of all fields.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Unlike for commands from the v1 stream, we have no debug messages logged
when processing encoded write commands, which makes it harder to debug
issues.
So add log messages, when the log verbosity level is >= 3, for encoded
write commands, mentioning the value of all fields and also log when we
fallback from an encoded write to the decompress and write approach.
The log messages look like this:
encoded_write f3 - offset=33792, len=4096, unencoded_offset=33792, unencoded_file_len=31744, unencoded_len=65536, compression=1, encryption=0
encoded_write f3 - falling back to decompress and write due to errno 22 ("Invalid argument")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Defrag should not print the filenames by default, this got accidentally
changed in v6.0. Do a workaround that restores the original behaviour,
ie. no filenames and print them with -v, either as global or local
option. Proper fix is not to initialize with BTRFS_BCONF_UNSET and only
adjust the levels by -v/-q options.
Issue: #540
Signed-off-by: David Sterba <dsterba@suse.com>
The new test case will make sure btrfs check is fine checking a degraded
raid5 filesystem.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
For a degraded RAID5, btrfs check will fail to even read the chunk root:
# mkfs.btrfs -f -m raid5 -d raid5 $dev1 $dev2 $dev3
# wipefs -fa $dev1
# btrfs check $dev2
Opening filesystem to check...
warning, device 1 is missing
bad tree block 22036480, bytenr mismatch, want=22036480, have=0
ERROR: cannot read chunk root
ERROR: cannot open file system
[CAUSE]
Although read_tree_block() function from btrfs-progs is properly
iterating the mirrors (mirror 1 is reading from the disk directly,
mirror 2 will be rebuild from parity), the raid56 recovery path is not
handling the read error correctly.
The existing code will try to read the full stripe, but any read failure
(including missing device) will immediately cause an error:
for (i = 0; i < num_stripes; i++) {
ret = btrfs_pread(multi->stripes[i].dev->fd, pointers[i],
BTRFS_STRIPE_LEN, multi->stripes[i].physical,
fs_info->zoned);
if (ret < BTRFS_STRIPE_LEN) {
ret = -EIO;
goto out;
}
}
[FIX]
To make failed_a/failed_b calculation much easier, and properly handle
too many missing devices, here this patch will introduce a new bitmap
based solution.
The new @failed_stripe_bitmap will represent all the failed stripes.
So the initial read will mark all the missing devices in the
@failed_stripe_bitmap, and later operations will all operate on that
bitmap.
Only before we call raid56_recov(), we convert the bitmap to the old
failed_a/failed_b interface and continue.
Now btrfs check can handle above case properly:
# btrfs check $dev2
Opening filesystem to check...
warning, device 1 is missing
Checking filesystem on /dev/test/scratch2
UUID: 8b2e1cb4-f35b-4856-9b11-262d39d8458b
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 147456 bytes used, no error found
total csum bytes: 0
total tree bytes: 147456
total fs tree bytes: 32768
total extent tree bytes: 16384
btree space waste bytes: 139871
file data blocks allocated: 0
referenced 0
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently fileattr commands, introduced in the send stream v2, always
fail, since we have commented the FS_IOC_SETFLAGS ioctl() call and set
'ret' to -EOPNOTSUPP, which is then overwritten to -errno, which may
have a random value since it was not initialized before. This results
in a failure like this:
ERROR: fileattr: set file attributes on p0/f1 failed: Invalid argument
The error reason may be something else, since errno is undefined at
this point.
Unfortunately we don't have a way yet to apply attributes, since the
attributes value we get from the kernel is what we store in flags field
of the inode item. This means that for example we can not just call
FS_IOC_SETFLAGS with the values we got, since they need to be converted
from BTRFS_INODE_* flags to FS_* flags
Besides that we'll have to reorder how we apply certain attributes like
FS_NOCOW_FL for example, which must happen always on an empty file and
right now we run write commands before attempting to change attributes,
as that's the order the kernel sends the operations.
So for now comment all the code, so that anyone using the v2 stream will
not have a receive failure but will get a behaviour like the v1 stream:
file attributes are ignored. This will have to be fixed later, but right
now people can't use a send stream v2 for the purpose of getting better
performance by avoid decompressing extents at the source and compression
of the data at the destination.
Link: https://lore.kernel.org/linux-btrfs/6cb11fa5-c60d-e65b-0295-301a694e66ad@inbox.ru/
Fixes: 8356c423e6 ("btrfs-progs: receive: implement FILEATTR command")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We're trying to get a U32 for the attributes, but the kernel sends a U64
(which is correct as we store attributes in a u64 flags field of the
inode). This makes anyone trying to receive a v2 send stream to fail with:
ERROR: invalid size for attribute, expected = 4, got = 8
We actually recently got such a report of someone using send stream v2 and
getting such failure. See the Link tag below.
Link: https://lore.kernel.org/linux-btrfs/6cb11fa5-c60d-e65b-0295-301a694e66ad@inbox.ru/
Fixes: 7a6fb356dc ("btrfs-progs: receive: process setflags ioctl commands")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This reverts commit 55438f3930.
The patch breaks resize cancel.
Reproducer:
#!/bin/bash
fallocate -l 7g /var/tmp/7g1 && fallocate -l 7g /var/tmp/7g2
thing1=$(sudo losetup --show -f /var/tmp/7g1)
thing2=$(sudo losetup --show -f /var/tmp/7g2)
echo Make the fs
mkfs.btrfs -L test539 $thing1
mkdir test539
mount -L test539 test539
echo Get rid of devid:1 by adding a new device and removing the original
btrfs dev add $thing2 test539
btrfs dev del $thing1 test539
echo Creating wiggleroom
fallocate -l 3g test539/3g1 && fallocate -l 3g test539/3g2
rm test539/3g1
echo Start a resize operation and wait 3s to run a cancel
echo Under 6.0 cancel, under 6.0.1 no cancel and runs out of space
btrfs fi re 2:-4g test539 &
sleep 3s && btrfs fi re cancel test539
wait
echo Cleanup
umount test539
losetup -d $thing1 && losetup -d $thing2
rm /var/tmp/7g{1,2}
rmdir test539
Issue: #539
Signed-off-by: David Sterba <dsterba@suse.com>
Add a command to create a new swapfile. The same can be achieved by
seandalone tools but they're just wrappers around the syscalls. The swap
format is simple enough to be created directly without mkswap command so
the swapfile can be created in one go.
The file must not exist before, this is to avoid problems with file
attributes or any other effects of existing extents. This also means the
command can't be used on block devices.
Default size is 2G, minimum size is 40KiB.
Signed-off-by: David Sterba <dsterba@suse.com>
The send stream v2 is supported if the file exists and does not contain
"1". The previous fix reversed the condition but this does not work on
kernel with v1 only support.
Signed-off-by: David Sterba <dsterba@suse.com>
We want to notrun if this test fails, not if it succeeds. Additionally
we want -s, as -q will still print an error if it gets ENOENT from the
file we're trying to grep.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>