Now that LZO and ZSTD are optional for not just restore, rename the
build variables to a more generic name and update configure summary.
Signed-off-by: David Sterba <dsterba@suse.com>
There are build-time options for LZO and ZSTD support, and stream v2+
supports compression. The help text lists what has been compiled in,
similar to what 'restore' does, with a similar limitation: a stream
with compressed data cannot be processed if any of the extents is
compressed with an algorithm that has not been compiled in.
Signed-off-by: David Sterba <dsterba@suse.com>
First, add a --proto option to allow specifying the desired send
protocol version. It defaults to one, the original version. In a couple
of releases once people are aware that protocol revisions are happening,
we can change it to default to zero, which means the latest version
supported by the kernel. This is based on Dave Sterba's patch.
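A minimal sketch of how the requested version could be resolved, assuming
the semantics described above (0 means "latest supported by the kernel").
The helper name and the handling of a too-new request are hypothetical and
not taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick the protocol version to ask the kernel for.
 * requested == 0 means "use the latest version the kernel supports".
 * Returns 0 if the request cannot be satisfied.
 */
uint32_t resolve_send_proto(uint32_t requested, uint32_t kernel_max)
{
	assert(kernel_max >= 1);

	if (requested == 0)
		return kernel_max;
	if (requested > kernel_max)
		return 0;	/* assumption: an unsupported version is an error */
	return requested;
}
```

With the default of 1, the result stays at 1 regardless of what the kernel
supports; only an explicit 0 follows the kernel.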
Also add a --compressed-data flag to instruct the kernel to use
encoded_write commands for compressed extents. This requires an explicit
opt in separate from the protocol version because:
1. The user may not want compression on the receiving side, or may want
a different compression algorithm/level on the receiving side.
2. It has a soft requirement for kernel support on the receiving side
(btrfs-progs can fall back to decompressing and writing if the kernel
doesn't support BTRFS_IOC_ENCODED_WRITE, but the user may not be
prepared to pay that CPU cost). Going forward, since it's easier to
update progs than the kernel, I think we'll want to make new send
features that require kernel support opt-in, whereas anything that
only requires a progs update can happen automatically.
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In send stream v2, send can emit a command for setting inode flags via
the setflags ioctl. Pass the flags attribute through to the ioctl call
in receive.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Send stream v2 can emit fallocate commands, so receive must support them
as well. The implementation simply passes along the arguments to the
syscall. Note that mode is encoded as a u32 in send stream but fallocate
takes an int, so there is an unsigned->signed conversion there.
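The conversion can be made explicit as in the following sketch. The helper
name and the range check are assumptions for illustration; the patch itself
only states that the arguments are passed along:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

static_assert(INT_MAX >= 0x7fffffff, "int must be at least 32 bits wide");

/*
 * Hypothetical helper: convert the u32 mode carried in the stream to the
 * int that fallocate() expects. Returns -1 if the value does not fit.
 */
int stream_mode_to_int(uint32_t mode)
{
	if (mode > (uint32_t)INT_MAX)
		return -1;
	return (int)mode;
}
```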
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
An encoded_write can fail if the file system it is being applied to does
not support encoded writes or if it can't find enough contiguous space
to accommodate the encoded extent. In those cases, we can likely still
process an encoded_write by explicitly decoding the data and doing a
normal write.
Add the necessary fallback path for decoding data compressed with zlib,
lzo, or zstd. zlib and zstd have reusable decoding context data
structures which we cache in the receive context so that we don't have
to recreate them on every encoded_write.
Finally, add a command line flag for force-decompress which causes
receive to always use the fallback path rather than first attempting the
encoded write.
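The control flow described above can be sketched as follows. The callback
type, the function names and the demonstration stubs are hypothetical, and
the sketch assumes that any failure of the encoded write is eligible for
the fallback:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callbacks standing in for the two write paths. */
typedef int (*write_fn)(const void *buf, size_t len);

/*
 * Try the encoded write first unless force_decompress is set; if it fails,
 * fall back to decompressing and doing a normal write.
 */
int process_encoded_write(const void *buf, size_t len, bool force_decompress,
			  write_fn encoded_write, write_fn decompress_and_write)
{
	assert(encoded_write != NULL && decompress_and_write != NULL);

	if (!force_decompress) {
		if (encoded_write(buf, len) == 0)
			return 0;
		/* Assumption: every failure is retried via the fallback. */
	}
	return decompress_and_write(buf, len);
}

/* Demonstration stubs: the encoded path always fails, the fallback works. */
int fallback_calls;

int failing_encoded_write(const void *buf, size_t len)
{
	(void)buf;
	(void)len;
	return -1;
}

int counting_fallback(const void *buf, size_t len)
{
	(void)buf;
	(void)len;
	fallback_calls++;
	return 0;
}
```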
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a new btrfs_send_op and support for both dumping and proper receive
processing which does actual encoded writes.
Encoded writes are only allowed on a file descriptor opened with an
extra flag that allows encoded writes, so we also add support for this
flag when opening or reusing a file for writing.
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Show the list of supported compression algorithms in the help string as
we now have optional LZO and ZSTD.
Signed-off-by: David Sterba <dsterba@suse.com>
LZO as a compression format is pretty archaic these days: there are
better algorithms in all metrics for compression and decompression, and
lzo hasn't had a new release since 2017.
Add an option to disable LZO (defaulting to enabled), and respect it in
cmds/restore.c.
NOTE: disabling support for LZO will make it impossible to restore
data from filesystems where the compression has ever been used. It's not
recommended to build without the support in general.
Signed-off-by: Ross Burton <ross.burton@arm.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The function read_extent_from_disk() is only a wrapper to read a tree
block.
And read_extent_data() is just a while loop to eliminate short reads
caused by stripe boundaries.
In fact, a lot of call sites of read_extent_data() are either reading
metadata (thus no possible short read) or doing an extra loop by
themselves.
This patch replaces those two functions with read_data_from_disk(),
making it the only entry point for data/metadata reads.
It also updates read_data_from_disk() to return the number of bytes
read, so callers can do a simple while loop.
For the few callers of read_extent_data(), open-code a small while loop
for them.
This will make later RAID56 read repair using P/Q much easier.
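A sketch of the caller-side loop, under the assumption that the reader
returns the number of bytes read or a negative value on error. The reader
below is a hypothetical in-memory stand-in, not read_data_from_disk()
itself; it truncates every read at an artificial stripe boundary:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STRIPE_LEN 4u	/* artificially small, for illustration only */

static const char disk[] = "0123456789abcdef";

/* Hypothetical reader: may return fewer bytes than requested. */
long read_some(char *dst, uint64_t offset, size_t len)
{
	size_t to_boundary = STRIPE_LEN - (offset % STRIPE_LEN);
	size_t disk_len = sizeof(disk) - 1;

	if (offset >= disk_len)
		return -1;
	if (len > to_boundary)
		len = to_boundary;
	if (len > disk_len - offset)
		len = disk_len - offset;
	memcpy(dst, disk + offset, len);
	return (long)len;
}

/* Open-coded loop: keep reading until the whole range is covered. */
int read_full(char *dst, uint64_t offset, size_t len)
{
	size_t done = 0;

	assert(dst != NULL);
	while (done < len) {
		long ret = read_some(dst + done, offset + done, len - done);

		if (ret < 0)
			return -1;
		done += (size_t)ret;
	}
	return 0;
}
```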
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently, if the user specifies the value 'no' or 'none' on the command
line, it gets translated to an empty value that is passed to the kernel.
There was a change in kernel 5.14 done by commit 5548c8c6f55b ("btrfs:
props: change how empty value is interpreted") that changes the
behaviour in that case.
The empty value is supposed to mean 'the default value' for any
property. For compression there is a need to distinguish resetting the
value and also setting the NOCOMPRESS property. The translation to empty
value makes that impossible.
The explanation and behaviour copied from the kernel patch:
Old behaviour:
$ lsattr file
---------------------- file
# the NOCOMPRESS bit is set
$ btrfs prop set file compression ''
$ lsattr file
---------------------m file
This is equivalent to 'btrfs prop set file compression no' in current
btrfs-progs as the 'no' or 'none' values are translated to an empty
string.
This is where the new behaviour is different: empty string drops the
compression flag (-c) and nocompress (-m):
$ lsattr file
---------------------- file
# No change
$ btrfs prop set file compression ''
$ lsattr file
---------------------- file
$ btrfs prop set file compression lzo
$ lsattr file
--------c------------- file
$ btrfs prop get file compression
compression=lzo
$ btrfs prop set file compression ''
# Reset to the initial state
$ lsattr file
---------------------- file
# Set NOCOMPRESS bit
$ btrfs prop set file compression no
$ lsattr file
---------------------m file
This obviously brings problems with backward compatibility, so this
patch should not be backported without making sure the updated
btrfs-progs are also used and that scripts have been updated to use the
new semantics.
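The difference between the old and new interpretation of the values can be
modeled as in the sketch below. The enum and function names are
hypothetical, and new_kernel stands for a kernel that contains the commit
mentioned above:

```c
#include <assert.h>
#include <string.h>

enum comp_action {
	COMP_SET_NOCOMPRESS,	/* set the NOCOMPRESS bit */
	COMP_DROP_ALL,		/* drop both COMPRESS and NOCOMPRESS */
	COMP_SET_ALGORITHM,	/* any other value, e.g. "lzo" */
};

/* Hypothetical model of what a given property value ends up doing. */
enum comp_action classify_compression_value(const char *value, int new_kernel)
{
	assert(value != NULL);

	if (strcmp(value, "no") == 0 || strcmp(value, "none") == 0)
		return COMP_SET_NOCOMPRESS;
	if (value[0] == '\0')
		return new_kernel ? COMP_DROP_ALL : COMP_SET_NOCOMPRESS;
	return COMP_SET_ALGORITHM;
}
```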
Summary:
- old kernel:
no, none, "" - set NOCOMPRESS bit
- new kernel:
no, none - set NOCOMPRESS bit
"" - drop all compression flags, ie. COMPRESS and NOCOMPRESS
Signed-off-by: Li Zhang <zhanglikernel@gmail.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
Add the appropriate support to the print tree and dump tree code to spit
out the block group tree.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that all callers are using the _nr variations we can simply rename
these helpers to btrfs_item_##member/btrfs_set_item_##member and change
the actual item SETGET funcs to raw_item_##member/set_raw_item_##member
and then change all callers to drop the _nr part.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This matches how the kernel does it, simply pass in the slot and fix up
btrfs_file_extent_inline_item_len to use the btrfs_item_nr() helper and
the correct define. Fixup all the callers to use the slot now instead
of passing in the btrfs_item.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
I started hitting a segfault on fuzz test 006 because we couldn't find
the extent root. This is because the global root search stuff expects
the actual key to be set up properly, not just an objectid. Fix this by
initializing the key properly so we can find the extent root and other
trees properly.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
A snapshot could be created in an existing directory; explain the
difference in the command line help options.
Pull-request: #117
Author: Howard <hwj@BridgeportContractor.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The kernel uses 'unsigned long' for u64 specifically for ppc64 and
mips64.
Remove asm/types.h include as it will get included properly later.
Fixes -Wformat warnings.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The search for the default subvolume could fail for two reasons: the
lack of CAP_SYS_ADMIN for the TREE_SEARCH ioctl is one, but the default
subvolume could be unset as well, in which case there are no
restrictions for deletion.
Signed-off-by: David Sterba <dsterba@suse.com>
Checking the default subvolume uses TREE_SEARCH which is a CAP_SYS_ADMIN
only operation, and thus will fail when unprivileged, even if we have
permissions to actually delete the subvolume.
This produces a warning even if all is ok. Let's hide it if we're not
root (root but !CAP is odd enough to warn).
Fixes: 87804a3f06 ("btrfs-progs: subvolume: check deleting default subvolume")
Link: https://bugs.debian.org/998840
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: David Sterba <dsterba@suse.com>
Pointer returned from get_parent needs additional handling otherwise
we could return an error and then try to free it. Reset the pointer when
the error occurs so the cleanup is always done on a valid pointer.
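The pattern can be sketched as follows. The error pointer helpers, the
lookup function and the structure layout are minimal stand-ins written for
this example and are not the actual btrfs-progs code:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct subvol_info {
	char *path;
};

/* Minimal stand-ins for kernel-style error pointer helpers. */
void *err_ptr(long err) { return (void *)(intptr_t)err; }
int is_err(const void *p) { return (uintptr_t)p >= (uintptr_t)-4095; }
long ptr_err(const void *p) { return (long)(intptr_t)p; }

/* Hypothetical lookup: returns an error pointer instead of NULL. */
struct subvol_info *find_parent(int fail)
{
	struct subvol_info *si;

	if (fail)
		return err_ptr(-2);	/* -ENOENT */
	si = calloc(1, sizeof(*si));
	return si ? si : err_ptr(-12);	/* -ENOMEM */
}

int process(int fail)
{
	struct subvol_info *parent = find_parent(fail);
	int ret = 0;

	assert(parent != NULL);
	if (is_err(parent)) {
		ret = (int)ptr_err(parent);
		parent = NULL;	/* reset so cleanup never sees an error value */
	}

	/* Common cleanup: safe because parent is either valid or NULL. */
	if (parent) {
		free(parent->path);
		free(parent);
	}
	return ret;
}
```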
Issue: #423
Signed-off-by: David Sterba <dsterba@suse.com>
The function autodetect_object_types() tries to detect the type of
btrfs object passed. If it is an "inode" type (e.g. file) this function
returns the type as "inode". If it is a block device, it returns it as
"block device".
However it doesn't handle the case where the object passed is a symbolic
link to a block device (which could be a valid btrfs device). For example
LVM/DM creates links to block devices. In this case it should return
the type as "block device".
This patch replaces the lstat() call with stat().
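A simplified sketch of the detection step, assuming only two result types.
The function name and the string return values are made up for
illustration. stat() follows symbolic links, so a link that points at a
block device is reported as the device, whereas lstat() would describe the
link itself:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Hypothetical, reduced version of the type detection.
 * Returns NULL if the path cannot be examined.
 */
const char *detect_object_type(const char *path)
{
	struct stat st;

	assert(path != NULL);
	if (stat(path, &st) < 0)
		return NULL;
	if (S_ISBLK(st.st_mode))
		return "block device";
	return "inode";
}
```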
Reported-by: Boris Burkov <boris@bur.io>
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
Signed-off-by: David Sterba <dsterba@suse.com>
When an error happens while searching for the parent subvolume,
parent_subvol will contain an error value (errno), so don't try to free it.
The crash backtrace would look like:
0 process_snapshot at cmds/receive.c:358
358 free(parent_subvol->path);
1 0x00005646898aaa67 in read_and_process_cmd at common/send-stream.c:348
2 btrfs_read_and_process_send_stream at common/send-stream.c:525
3 0x00005646898c9b8b in do_receive at cmds/receive.c:1113
4 cmd_receive at cmds/receive.c:1316
5 0x00005646898750b1 in cmd_execute at cmds/commands.h:125
6 main at btrfs.c:405
(gdb) p parent_subvol
$1 = (struct subvol_info *) 0xfffffffffffffffe
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Dāvis Mosāns <davispuh@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add the on disk definitions for the block group tree. This will be part
of the super block so we need to add the appropriate helpers to the
super block, as well as adding it to the backup roots.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When we switch to multiple global trees we'll need to access the
appropriate extent root depending on the block group or possibly root.
To handle this, use a helper in most places and then the actual root in
places where it is required. We will whittle down the direct accessors
with future patches, but this does the bulk of the preparatory work.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The filesystem du command fails and exits when it accesses a file for
which permission is denied, although it could continue with the
remaining files.
This patch prints an error message just like /bin/du does and continues
if it can.
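The intended behaviour can be sketched like this. The callback type, the
function names and the message format are hypothetical, and the sketch
assumes that only EACCES is treated as non-fatal:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-file callback: returns 0 or a negative errno. */
typedef int (*measure_fn)(const char *path);

/*
 * Walk the list, report files that cannot be accessed and keep going.
 * Returns the number of files measured, or a negative errno on any
 * other error.
 */
int du_walk(const char *const *paths, int count, measure_fn measure)
{
	int measured = 0;

	assert(paths != NULL && measure != NULL);
	for (int i = 0; i < count; i++) {
		int ret = measure(paths[i]);

		if (ret == -EACCES) {
			fprintf(stderr, "cannot access '%s': %s\n",
				paths[i], strerror(EACCES));
			continue;
		}
		if (ret < 0)
			return ret;
		measured++;
	}
	return measured;
}

/* Demonstration stub: paths starting with '!' count as unreadable. */
int demo_measure(const char *path)
{
	return path[0] == '!' ? -EACCES : 0;
}
```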
Issue: #421
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
With extent tree v2 we will have per-block group checksums, so add a
helper to access the csum root and rename the fs_info csum_root to
_csum_root to catch all the places that are accessing it directly.
Convert everybody to use the helper except for internal things.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Running with ASAN we won't pass the self tests because we leak the whole
fs_info with btrfs filesystem show. Fix this by making sure we close
out the fs_info and clean up all of the memory and such.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a bug report of a corrupted key type (expected
UUID_KEY_SUBVOL, has EXTENT_ITEM) causing newer kernels to reject a
mount.
Although the root cause is not determined yet, with the rollout of the
v5.11 kernel to various distros, such a problem should be prevented by
the tree-checker, no matter whether it's a hardware problem or not.
An older kernel with the "-o uuid_rescan" mount option won't help
either, as uuid_rescan will only delete items with
UUID_KEY_SUBVOL/UUID_KEY_RECEIVED_SUBVOL key types and will not delete
such a corrupted key.
[FIX]
To fix such a problem we have to rely on an offline tool, thus here we
introduce a new rescue tool, clear-uuid-tree, to empty and then remove
the uuid tree.
The kernel will regenerate the correct uuid tree at the next mount.
Reported-by: S. <sb56637@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The current formula calculates the stripe size, however that's not what
we want in the case of RAID1/DUP profiles. In those cases, since chunks
are mirrored across devices, we want the full size of the chunk. Without
this patch the 'btrfs fi usage' output from an fs which is using RAID1 is:
Data,RAID1: Size:2.00GiB, Used:1.00GiB (50.03%)
/dev/vdc 1.00GiB
/dev/vdf 1.00GiB
Metadata,RAID1: Size:256.00MiB, Used:1.34MiB (0.52%)
/dev/vdc 128.00MiB
/dev/vdf 128.00MiB
System,RAID1: Size:8.00MiB, Used:16.00KiB (0.20%)
/dev/vdc 4.00MiB
/dev/vdf 4.00MiB
Unallocated:
/dev/vdc 8.87GiB
/dev/vdf 8.87GiB
So a 2 gigabyte RAID1 chunk will actually take up 4 gigabytes on the
actual disks, 2 on each. In this case it is being miscalculated as
taking up 1GiB on each device.
This also leads to erroneously calculated unallocated space. The correct
output in this case is:
Data,RAID1: Size:2.00GiB, Used:1.00GiB (50.03%)
/dev/vdc 2.00GiB
/dev/vdf 2.00GiB
Metadata,RAID1: Size:256.00MiB, Used:1.34MiB (0.52%)
/dev/vdc 256.00MiB
/dev/vdf 256.00MiB
System,RAID1: Size:8.00MiB, Used:16.00KiB (0.20%)
/dev/vdc 8.00MiB
/dev/vdf 8.00MiB
Unallocated:
/dev/vdc 7.74GiB
/dev/vdf 7.74GiB
Fix it by only utilising the chunk formula for profiles which are not
RAID1/DUP.
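A sketch of the corrected calculation, using the numbers from the example
above. The helper name and its parameters are hypothetical, and the
non-mirrored branch is a simplified stand-in for the existing formula:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: space that one chunk occupies on a single device.
 * mirrored is true for RAID1/DUP, where every copy is a full chunk.
 */
uint64_t per_device_size(uint64_t chunk_size, uint32_t data_stripes,
			 bool mirrored)
{
	assert(data_stripes > 0);

	if (mirrored)
		return chunk_size;
	/* Simplified stand-in for the stripe size formula. */
	return chunk_size / data_stripes;
}
```

For the 2GiB RAID1 data chunk above this yields 2GiB per device, while
treating it as striped across two devices would give the incorrect 1GiB.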
Issue: #422
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Commit 80714610f3 ("btrfs-progs: use raid table for ncopies")
slightly broke how the raid ratio is calculated, since the resulting
code would always reset ratio to 1 in case we didn't have a RAID56
profile. The correct behavior is to simply set it to 0 if we have RAID56,
as the calculation is different in this case, and leave it intact
otherwise.
This bug manifests as all size-related calculations for the 'btrfs
filesystem usage' command being done as if all block groups were of type
SINGLE. Fix this by only resetting ratio to 0 in case of RAID56.
Issue: #422
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Just like kernel commit 22b6331d9617 ("btrfs: store precalculated
csum_size in fs_info"), we can cache csum_size and csum_type in
btrfs_fs_info.
Furthermore, there is already a 32-bit hole in btrfs_fs_info, and we
can fit csum_type and csum_size into the hole without increasing the
size of btrfs_fs_info.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are a lot of call sites where we use the following code snippet:
u8 super_block_data[BTRFS_SUPER_INFO_SIZE];
struct btrfs_super_block *sb;
u64 ret;
sb = (struct btrfs_super_block *)super_block_data;
The reason for this is that struct btrfs_super_block used to be smaller
than BTRFS_SUPER_INFO_SIZE.
Thus for anything with csum involved, we had to use a proper 4K buffer.
Since the recent unification of sizeof(struct btrfs_super_block), we no
longer need such a workaround, and can use struct btrfs_super_block
directly to do any operation.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There's a report that a read-only subvolume with a received_uuid set
emits the warning in command 'btrfs subvolume show', which is obviously
wrong.
The reason is that there are different types of root item flags,
depending on how we read them. The check in cmd_subvol_show uses the
ioctl GET_SUBVOL_INFO and the appropriate flag is raw
BTRFS_ROOT_SUBVOL_RDONLY (0x1), while there's another SUBVOL_GETFLAGS that
maps the flags and the raw value is different (BTRFS_SUBVOL_RDONLY, 0x2).
Due to this the warning was issued. Fix that by using the right flag
constant. The test has been extended to check for all combinations of
read-write and received_uuid.
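The mismatch can be illustrated with the two values quoted above. The
constants are redefined locally so that the example is self-contained, and
the helper name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as quoted in the changelog text. */
#define BTRFS_ROOT_SUBVOL_RDONLY	(1ULL << 0)	/* raw root item flag */
#define BTRFS_SUBVOL_RDONLY		(1ULL << 1)	/* SUBVOL_GETFLAGS value */

static_assert(BTRFS_ROOT_SUBVOL_RDONLY != BTRFS_SUBVOL_RDONLY,
	      "the two read-only constants are not interchangeable");

/* Check raw root item flags, as returned by GET_SUBVOL_INFO. */
bool raw_flags_readonly(uint64_t raw_flags)
{
	return (raw_flags & BTRFS_ROOT_SUBVOL_RDONLY) != 0;
}
```

Testing the raw flags against BTRFS_SUBVOL_RDONLY instead would miss a
read-only subvolume whose raw flags are 0x1.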
Issue: #419
Signed-off-by: David Sterba <dsterba@suse.com>
The profile descriptions allow us to use a single formula to calculate
chunk size. Right now there are no profiles with both parity (raid5-like)
and sub_stripes (raid10-like), which makes it easier.
- parity stripes are subtracted from the total count
- then divided by number of sub stripes
Practically speaking, 1:1 copy profiles do not have any adjustments.
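The two steps can be written down as a small helper. The function name is
hypothetical, and how the result is then applied to the chunk size is not
shown here:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper following the two steps above: subtract the parity
 * stripes from the total count, then divide by the number of sub stripes.
 */
uint32_t adjusted_stripe_count(uint32_t num_stripes, uint32_t nparity,
			       uint32_t sub_stripes)
{
	assert(sub_stripes > 0);
	assert(num_stripes > nparity);

	return (num_stripes - nparity) / sub_stripes;
}
```

For example, a raid5-like profile over 3 devices (one parity stripe) gives
2, and a raid10-like profile over 4 devices (2 sub stripes) also gives 2.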
Signed-off-by: David Sterba <dsterba@suse.com>
The striped profiles covering arbitrary number of devices are often
hardcoded so use the new helper btrfs_bg_type_is_stripey for that.
Signed-off-by: David Sterba <dsterba@suse.com>
There's an open-coded value of the raid table ncopies in
print_filesystem_usage_overall; add a helper and use it.
Signed-off-by: David Sterba <dsterba@suse.com>
After removing uuid search fallback code the structure has become
trivial and copies the fd that all callers have in their context.
Signed-off-by: David Sterba <dsterba@suse.com>
After the uuid search fallback code has been removed, the finit helper
has become empty and can be removed.
Signed-off-by: David Sterba <dsterba@suse.com>
All the comparators switch the result based on is_descending, but that
can be factored to the caller to simplify the comparators.
Signed-off-by: David Sterba <dsterba@suse.com>