Now that LZO and ZSTD are optional for not just restore, rename the
build variables to a more generic name and update configure summary.
Signed-off-by: David Sterba <dsterba@suse.com>
There are build-time options for LZO and ZSTD support, the stream v2+
supports compression. The help text lists what has been compiled in,
similar to what 'restore' does, with a similar limitation that a stream
with compressed data cannot be processed if any of the extents is
compressed.
Signed-off-by: David Sterba <dsterba@suse.com>
Copy contents from https://btrfs.wiki.kernel.org/index.php/Changelog#By_feature
The formatting is done by a definition and list, instead of a table.
Unfortunatelly RST does not wrap long text in table cells so the width
exceeds visible area.
Signed-off-by: David Sterba <dsterba@suse.com>
Fix the following warnings:
[SPHINX] man
../CHANGES:37: ERROR: Unexpected indentation.
../CHANGES:1183: WARNING: Block quote ends without a blank line; unexpected unindent.
./Documentation/DocConventions.rst:31: ERROR: Unexpected indentation.
./Documentation/Source-repositories.rst:2: WARNING: Duplicate explicit target name: "web access".
./Documentation/DocConventions.rst: WARNING: document isn't included in any toctree
The free format of CHANGES sometimes does not align with RST so fix it
so it's visually similar in both formats. Remove RST references to 'web
access', URLs are parsed and rendered clickable and we don't have full
reference list yet. Doc conventions have been updated to RST but not
finalized so put it to TODO section.
Signed-off-by: David Sterba <dsterba@suse.com>
Adapt the existing send/receive tests by passing '-o compress-force' to
the mount commands in a new test. After writing a few files in the
various compression formats, send/receive them with and without
--force-decompress to test both the encoded_write path and the fallback
to decode+write.
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
First, add a --proto option to allow specifying the desired send
protocol version. It defaults to one, the original version. In a couple
of releases once people are aware that protocol revisions are happening,
we can change it to default to zero, which means the latest version
supported by the kernel. This is based on Dave Sterba's patch.
Also add a --compressed-data flag to instruct the kernel to use
encoded_write commands for compressed extents. This requires an explicit
opt in separate from the protocol version because:
1. The user may not want compression on the receiving side, or may want
a different compression algorithm/level on the receiving side.
2. It has a soft requirement for kernel support on the receiving side
(btrfs-progs can fall back to decompressing and writing if the kernel
doesn't support BTRFS_IOC_ENCODED_WRITE, but the user may not be
prepared to pay that CPU cost). Going forward, since it's easier to
update progs than the kernel, I think we'll want to make new send
features that require kernel support opt-in, whereas anything that
only requires a progs update can happen automatically.
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In send stream v2, send can emit a command for setting inode flags via
the setflags ioctl. Pass the flags attribute through to the ioctl call
in receive.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Send stream v2 can emit fallocate commands, so receive must support them
as well. The implementation simply passes along the arguments to the
syscall. Note that mode is encoded as a u32 in send stream but fallocate
takes an int, so there is a unsigned->signed conversion there.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
An encoded_write can fail if the file system it is being applied to does
not support encoded writes or if it can't find enough contiguous space
to accommodate the encoded extent. In those cases, we can likely still
process an encoded_write by explicitly decoding the data and doing a
normal write.
Add the necessary fallback path for decoding data compressed with zlib,
lzo, or zstd. zlib and zstd have reusable decoding context data
structures which we cache in the receive context so that we don't have
to recreate them on every encoded_write.
Finally, add a command line flag for force-decompress which causes
receive to always use the fallback path rather than first attempting the
encoded write.
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a new btrfs_send_op and support for both dumping and proper receive
processing which does actual encoded writes.
Encoded writes are only allowed on a file descriptor opened with an
extra flag that allows encoded writes, so we also add support for this
flag when opening or reusing a file for writing.
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Update our copy of send.h from the kernel. This adds the new commands
and attributes for v2 as well as explicit enum numbering.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The new format privileges the BTRFS_SEND_A_DATA attribute by
guaranteeing it will always be the last attribute in any command that
needs it, and by implicitly encoding the data length as the difference
between the total command length in the command header and the sizes of
the rest of the attributes (and of course the tlv_type identifying the
DATA attribute). To parse the new stream, we must read the tlv_type and
if it is not DATA, we proceed normally, but if it is DATA, we don't
parse a tlv_len but simply compute the length.
In addition, we add some bounds checking when parsing each chunk of
data, as well as for the tlv_len itself.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
In send stream v2, write commands can now be an arbitrary size. For that
reason, we can no longer allocate a fixed array in sctx for read_cmd.
Instead, read_cmd dynamically allocates sctx->read_buf. To avoid
needless reallocations, we reuse read_buf between read_cmd calls by also
keeping track of the size of the allocated buffer in sctx->read_buf_sz.
We do the first allocation of the old default size at the start of
processing the stream, and we only reallocate if we encounter a command
that needs a larger buffer.
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
An encoded extent can be up to 128K in length, which exceeds the largest
value expressible by the current send stream format's 16 bit tlv_len
field. Since encoded writes cannot be split into multiple writes by
btrfs send, the send stream format must change to accommodate encoded
writes.
Supporting this changed format requires retooling how we store the
commands we have processed. We currently store pointers to the struct
btrfs_tlv_headers in the command buffer. This is not sufficient to
represent the new BTRFS_SEND_A_DATA format. Instead, parse the attribute
headers and store them in a new struct btrfs_send_attribute which has a
32bit length field. This is transparent to users of the various TLV_GET
macros.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
The header extent-cache.h is internal and provides only structures defined
in ctree.h, we don't need anything else besides the structures.
Signed-off-by: David Sterba <dsterba@suse.com>
The header extent_io.h is internal and provides only structures defined
in ctree.h, we don't need anything else besides the structures and
declarations used for inline functions in other headers.
Signed-off-by: David Sterba <dsterba@suse.com>
Headers that are only exported and not used for build do not need the
BTRFS_FLAT_INCLUDES switch (between local and installed headers). Now
that there are local copies of the shared headers drop the respective
part from local headers.
Signed-off-by: David Sterba <dsterba@suse.com>
Let libbtrfs use own copy of the exported header files to avoid
potential breakage when syncing with kernel headers and also to remove
declarations that are not used by userspace. The send.h is frozen to
support protocol v1.
Signed-off-by: David Sterba <dsterba@suse.com>
Creating a simple directory structure leads to the following error:
$ btrfs check
Checking filesystem on test.img
UUID: 8f2292ad-c80e-4ab4-8a72-29aa3a83002c
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
unresolved ref dir 260 index 0 namelen 2 name .. filetype 0 errors 3, no dir item, no dir index
ERROR: errors found in fs roots
found 101085184 bytes used, error(s) found
total csum bytes: 98460
total tree bytes: 262144
total fs tree bytes: 49152
total extent tree bytes: 16384
btree space waste bytes: 151864
file data blocks allocated: 167931904
referenced 167931904
The self-reference should exist for the toplevel directory, where the
parent directory points to itself.
Issue: #453
Author: tyan0
Signed-off-by: David Sterba <dsterba@suse.com>
The snapper build fails due to updates to kernel-lib files, the type
casts do not work the same way in C++. Simplify READ_ONCE/WRITE_ONCE
even more, drop use of 'new' as identifier.
Issue: https://github.com/openSUSE/snapper/issues/725
Signed-off-by: David Sterba <dsterba@suse.com>
As we're not supporting arbitrarily big or small zone sizes in the kernel,
reject devices that don't fit in progs as well.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The ext3 has been superseded by ext4, we don't need to test it
separately so this reduces the convert tests run time.
Signed-off-by: David Sterba <dsterba@suse.com>
Show the list of supported compression algorithms in the help string as
we now have optional LZO and ZSTD.
Signed-off-by: David Sterba <dsterba@suse.com>
LZO as a compression format is pretty archaic these days, there are
better algorithms in all metrics for compression and decompression, and
lzo hasn't had a new release since 2017.
Add an option to disable LZO (defaulting to enabled), and respect it in
cmds/restore.c.
NOTE: disabling support for LZO will make make it impossible to restore
data from filesystems where the compression has ever been used. It's not
recommended to build without the support in general.
Signed-off-by: Ross Burton <ross.burton@arm.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It is not possible to use mixed profile together with other profiles.
The current wording is not clear about this, so let's add a
clarification note.
Author: Forza
Signed-off-by: David Sterba <dsterba@suse.com>
Add test case is to make sure on a relative large empty fs, we won't
create bitmaps to unnecessarily increase the size of free space tree.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When creating btrfs with new v2 cache (the default behavior), mkfs.btrfs
always create the free space tree using bitmap.
It's fine for small fs, but will be a disaster if the device is large
and the data profile is something like RAID0:
$ mkfs.btrfs -f -m raid1 -d raid0 /dev/test/scratch[1234]
btrfs-progs v5.17
[...]
Block group profiles:
Data: RAID0 4.00GiB
Metadata: RAID1 256.00MiB
System: RAID1 8.00MiB
[..]
$ btrfs ins dump-tree -t free-space /dev/test/scratch1
btrfs-progs v5.17
free space tree key (FREE_SPACE_TREE ROOT_ITEM 0)
node 30441472 level 1 items 10 free space 483 generation 6 owner FREE_SPACE_TREE
node 30441472 flags 0x1(WRITTEN) backref revision 1
fs uuid deddccae-afd0-4160-9a12-48fe7b526fb1
chunk uuid 68f6cf98-afe3-4f47-9797-37fd9c610219
key (1048576 FREE_SPACE_INFO 4194304) block 30457856 gen 6
key (475004928 FREE_SPACE_BITMAP 8388608) block 30703616 gen 5
key (953155584 FREE_SPACE_BITMAP 8388608) block 30720000 gen 5
key (1431306240 FREE_SPACE_BITMAP 8388608) block 30736384 gen 5
key (1909456896 FREE_SPACE_BITMAP 8388608) block 30752768 gen 5
key (2387607552 FREE_SPACE_BITMAP 8388608) block 30769152 gen 5
key (2865758208 FREE_SPACE_BITMAP 8388608) block 30785536 gen 5
key (3343908864 FREE_SPACE_BITMAP 8388608) block 30801920 gen 5
key (3822059520 FREE_SPACE_BITMAP 8388608) block 30818304 gen 5
key (4300210176 FREE_SPACE_BITMAP 8388608) block 30834688 gen 5
[...]
^^^ So many bitmaps that an empty fs will have two levels for free
space tree already
[CAUSE]
Member btrfs_block_group::bitmap_high_thresh is never properly set to
any value other than 0, thus in function
update_free_space_extent_count(), the following check is always true:
if (!(flags & BTRFS_FREE_SPACE_USING_BITMAPS) &&
extent_count > block_group->bitmap_high_thresh) {
ret = convert_free_space_to_bitmaps(trans, block_group, path);
Thus we always got converted to bitmaps.
[FIX]
Cross-port the function set_free_space_tree_thresholds() from kernel,
and call that function in btrfs_make_block_group() and
read_one_block_group() so that every block group has bitmap_high_thresh
properly set.
Now even for that 4GiB large data chunk, we still only have one free extent:
btrfs-progs v5.17
free space tree key (FREE_SPACE_TREE ROOT_ITEM 0)
leaf 30572544 items 15 free space 15860 generation 6 owner FREE_SPACE_TREE
leaf 30572544 flags 0x1(WRITTEN) backref revision 1
fs uuid b24e52ea-6580-4a88-aa70-cb173090bfe3
chunk uuid d85f3905-fc61-4084-b335-2b6b97814b8e
[...]
item 13 key (298844160 FREE_SPACE_INFO 4294967296) itemoff 16235 itemsize 8
free space info extent count 1 flags 0
item 14 key (298844160 FREE_SPACE_EXTENT 4294967296) itemoff 16235 itemsize 0
free space extent
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are several call sites utilizing btrfs_commit_transaction() just
to update members in super blocks, without any metadata update.
This can be problematic for some simple call sites, like zero_log_tree()
or check_and_repair_super_num_devs().
If we have big problems preventing the fs to be mounted in the first
place, and need to clear the log or super block size, but by some other
problems in extent tree, we're unable to allocate new blocks.
Then we fall into a deadlock that, we need to mount (even
ro,rescue=all) to collect extra info, but btrfs-progs can not do any
super block updates.
Fix the problem by allowing the following super blocks only operations
to be done without using btrfs_commit_transaction():
- btrfs_fix_super_size()
- check_and_repair_super_num_devs()
- zero_log_tree().
There are some exceptions in btrfstune.c, related to the csum type
conversion and seed flags.
In those btrfstune cases, we in fact wants to proper error report in
btrfs_commit_transaction(), as those operations are not mount critical,
and any early error can be helpful to expose any problems in the fs.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The current subpage support in v5.18 has several limits, the most
obvious ones are:
- Only support 64KiB page size
- No RAID56 support
The supports are already queued for v5.19.
And some minor ones:
- No inline extent write support
Read is always supported.
Subpage mount will always just act as "max_inline=0".
- Compression write is only for page aligned range.
Read is always supported, no matter the alignment.
- Extra memory usage for scrub
Patchset is hanging there for a while though.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>