Add a new btrfs_send_op and support for both dumping and proper receive
processing which does actual encoded writes.
Encoded writes are only allowed on a file descriptor opened with an
extra flag that allows encoded writes, so we also add support for this
flag when opening or reusing a file for writing.
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Update our copy of send.h from the kernel. This adds the new commands
and attributes for v2 as well as explicit enum numbering.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The new format privileges the BTRFS_SEND_A_DATA attribute by
guaranteeing it will always be the last attribute in any command that
needs it, and by implicitly encoding the data length as the difference
between the total command length in the command header and the sizes of
the rest of the attributes (and of course the tlv_type identifying the
DATA attribute). To parse the new stream, we must read the tlv_type and
if it is not DATA, we proceed normally, but if it is DATA, we don't
parse a tlv_len but simply compute the length.
In addition, we add some bounds checking when parsing each chunk of
data, as well as for the tlv_len itself.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
In send stream v2, write commands can now be an arbitrary size. For that
reason, we can no longer allocate a fixed array in sctx for read_cmd.
Instead, read_cmd dynamically allocates sctx->read_buf. To avoid
needless reallocations, we reuse read_buf between read_cmd calls by also
keeping track of the size of the allocated buffer in sctx->read_buf_sz.
We do the first allocation of the old default size at the start of
processing the stream, and we only reallocate if we encounter a command
that needs a larger buffer.
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
An encoded extent can be up to 128K in length, which exceeds the largest
value expressible by the current send stream format's 16 bit tlv_len
field. Since encoded writes cannot be split into multiple writes by
btrfs send, the send stream format must change to accommodate encoded
writes.
Supporting this changed format requires retooling how we store the
commands we have processed. We currently store pointers to the struct
btrfs_tlv_headers in the command buffer. This is not sufficient to
represent the new BTRFS_SEND_A_DATA format. Instead, parse the attribute
headers and store them in a new struct btrfs_send_attribute which has a
32bit length field. This is transparent to users of the various TLV_GET
macros.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
The header extent-cache.h is internal and provides only structures defined
in ctree.h, we don't need anything else besides the structures.
Signed-off-by: David Sterba <dsterba@suse.com>
The header extent_io.h is internal and provides only structures defined
in ctree.h, we don't need anything else besides the structures and
declarations used for inline functions in other headers.
Signed-off-by: David Sterba <dsterba@suse.com>
Headers that are only exported and not used for build do not need the
BTRFS_FLAT_INCLUDES switch (between local and installed headers). Now
that there are local copies of the shared headers drop the respective
part from local headers.
Signed-off-by: David Sterba <dsterba@suse.com>
Let libbtrfs use own copy of the exported header files to avoid
potential breakage when syncing with kernel headers and also to remove
declarations that are not used by userspace. The send.h is frozen to
support protocol v1.
Signed-off-by: David Sterba <dsterba@suse.com>
Creating a simple directory structure leads to the following error:
$ btrfs check
Checking filesystem on test.img
UUID: 8f2292ad-c80e-4ab4-8a72-29aa3a83002c
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
unresolved ref dir 260 index 0 namelen 2 name .. filetype 0 errors 3, no dir item, no dir index
ERROR: errors found in fs roots
found 101085184 bytes used, error(s) found
total csum bytes: 98460
total tree bytes: 262144
total fs tree bytes: 49152
total extent tree bytes: 16384
btree space waste bytes: 151864
file data blocks allocated: 167931904
referenced 167931904
The self-reference should exist for the toplevel directory, where the
parent directory points to itself.
Issue: #453
Author: tyan0
Signed-off-by: David Sterba <dsterba@suse.com>
The snapper build fails due to updates to kernel-lib files, the type
casts do not work the same way in C++. Simplify READ_ONCE/WRITE_ONCE
even more, drop use of 'new' as identifier.
Issue: https://github.com/openSUSE/snapper/issues/725
Signed-off-by: David Sterba <dsterba@suse.com>
As we're not supporting arbitrarily big or small zone sizes in the kernel,
reject devices that don't fit in progs as well.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The ext3 has been superseded by ext4, we don't need to test it
separately so this reduces the convert tests run time.
Signed-off-by: David Sterba <dsterba@suse.com>
Show the list of supported compression algorithms in the help string as
we now have optional LZO and ZSTD.
Signed-off-by: David Sterba <dsterba@suse.com>
LZO as a compression format is pretty archaic these days, there are
better algorithms in all metrics for compression and decompression, and
lzo hasn't had a new release since 2017.
Add an option to disable LZO (defaulting to enabled), and respect it in
cmds/restore.c.
NOTE: disabling support for LZO will make make it impossible to restore
data from filesystems where the compression has ever been used. It's not
recommended to build without the support in general.
Signed-off-by: Ross Burton <ross.burton@arm.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It is not possible to use mixed profile together with other profiles.
The current wording is not clear about this, so let's add a
clarification note.
Author: Forza
Signed-off-by: David Sterba <dsterba@suse.com>
Add test case is to make sure on a relative large empty fs, we won't
create bitmaps to unnecessarily increase the size of free space tree.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When creating btrfs with new v2 cache (the default behavior), mkfs.btrfs
always create the free space tree using bitmap.
It's fine for small fs, but will be a disaster if the device is large
and the data profile is something like RAID0:
$ mkfs.btrfs -f -m raid1 -d raid0 /dev/test/scratch[1234]
btrfs-progs v5.17
[...]
Block group profiles:
Data: RAID0 4.00GiB
Metadata: RAID1 256.00MiB
System: RAID1 8.00MiB
[..]
$ btrfs ins dump-tree -t free-space /dev/test/scratch1
btrfs-progs v5.17
free space tree key (FREE_SPACE_TREE ROOT_ITEM 0)
node 30441472 level 1 items 10 free space 483 generation 6 owner FREE_SPACE_TREE
node 30441472 flags 0x1(WRITTEN) backref revision 1
fs uuid deddccae-afd0-4160-9a12-48fe7b526fb1
chunk uuid 68f6cf98-afe3-4f47-9797-37fd9c610219
key (1048576 FREE_SPACE_INFO 4194304) block 30457856 gen 6
key (475004928 FREE_SPACE_BITMAP 8388608) block 30703616 gen 5
key (953155584 FREE_SPACE_BITMAP 8388608) block 30720000 gen 5
key (1431306240 FREE_SPACE_BITMAP 8388608) block 30736384 gen 5
key (1909456896 FREE_SPACE_BITMAP 8388608) block 30752768 gen 5
key (2387607552 FREE_SPACE_BITMAP 8388608) block 30769152 gen 5
key (2865758208 FREE_SPACE_BITMAP 8388608) block 30785536 gen 5
key (3343908864 FREE_SPACE_BITMAP 8388608) block 30801920 gen 5
key (3822059520 FREE_SPACE_BITMAP 8388608) block 30818304 gen 5
key (4300210176 FREE_SPACE_BITMAP 8388608) block 30834688 gen 5
[...]
^^^ So many bitmaps that an empty fs will have two levels for free
space tree already
[CAUSE]
Member btrfs_block_group::bitmap_high_thresh is never properly set to
any value other than 0, thus in function
update_free_space_extent_count(), the following check is always true:
if (!(flags & BTRFS_FREE_SPACE_USING_BITMAPS) &&
extent_count > block_group->bitmap_high_thresh) {
ret = convert_free_space_to_bitmaps(trans, block_group, path);
Thus we always got converted to bitmaps.
[FIX]
Cross-port the function set_free_space_tree_thresholds() from kernel,
and call that function in btrfs_make_block_group() and
read_one_block_group() so that every block group has bitmap_high_thresh
properly set.
Now even for that 4GiB large data chunk, we still only have one free extent:
btrfs-progs v5.17
free space tree key (FREE_SPACE_TREE ROOT_ITEM 0)
leaf 30572544 items 15 free space 15860 generation 6 owner FREE_SPACE_TREE
leaf 30572544 flags 0x1(WRITTEN) backref revision 1
fs uuid b24e52ea-6580-4a88-aa70-cb173090bfe3
chunk uuid d85f3905-fc61-4084-b335-2b6b97814b8e
[...]
item 13 key (298844160 FREE_SPACE_INFO 4294967296) itemoff 16235 itemsize 8
free space info extent count 1 flags 0
item 14 key (298844160 FREE_SPACE_EXTENT 4294967296) itemoff 16235 itemsize 0
free space extent
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are several call sites utilizing btrfs_commit_transaction() just
to update members in super blocks, without any metadata update.
This can be problematic for some simple call sites, like zero_log_tree()
or check_and_repair_super_num_devs().
If we have big problems preventing the fs to be mounted in the first
place, and need to clear the log or super block size, but by some other
problems in extent tree, we're unable to allocate new blocks.
Then we fall into a deadlock that, we need to mount (even
ro,rescue=all) to collect extra info, but btrfs-progs can not do any
super block updates.
Fix the problem by allowing the following super blocks only operations
to be done without using btrfs_commit_transaction():
- btrfs_fix_super_size()
- check_and_repair_super_num_devs()
- zero_log_tree().
There are some exceptions in btrfstune.c, related to the csum type
conversion and seed flags.
In those btrfstune cases, we in fact wants to proper error report in
btrfs_commit_transaction(), as those operations are not mount critical,
and any early error can be helpful to expose any problems in the fs.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The current subpage support in v5.18 has several limits, the most
obvious ones are:
- Only support 64KiB page size
- No RAID56 support
The supports are already queued for v5.19.
And some minor ones:
- No inline extent write support
Read is always supported.
Subpage mount will always just act as "max_inline=0".
- Compression write is only for page aligned range.
Read is always supported, no matter the alignment.
- Extra memory usage for scrub
Patchset is hanging there for a while though.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It should have been deleted, as CHANGES mentioned this in v5.14, but
obvious it's not.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When running some tests, I notice that my debug build of btrfs-convert
is throwing out garbage for target fs label:
$ ./btrfs-convert ~/test.img
btrfs-convert from btrfs-progs v5.17
Source filesystem:
Type: ext2
Label:
Blocksize: 4096
UUID: 29d159a8-cb46-41d3-8089-3c5c65e4afae
Target filesystem:
Label: @pcwU <<< Garbage here
Blocksize: 4096
Nodesize: 16384
UUID: 682bf5f2-8cb1-4390-b9ac-6883cd87ed39
Checksum: crc32c
...
[CAUSE]
The fslabel[] array is just not initialized, thus it can contain
garbage.
[FIX]
Initialize fslabel[] array to all zero.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When testing my new RAID56J code, there is a bug causing dev extents
overlapping.
Although both modes can detect the problem, lowmem has leaked some
extent buffers:
$ btrfs check --mode=lowmem /dev/test/scratch1
Opening filesystem to check...
Checking filesystem on /dev/test/scratch1
UUID: 65775ce9-bb9d-4f61-a210-beea52eef090
[1/7] checking root items
[2/7] checking extents
ERROR: dev extent devid 1 offset 1095761920 len 1073741824 overlap with previous dev extent end 1096810496
ERROR: dev extent devid 2 offset 1351614464 len 1073741824 overlap with previous dev extent end 1352663040
ERROR: dev extent devid 3 offset 1351614464 len 1073741824 overlap with previous dev extent end 1352663040
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs done with fs roots in lowmem mode, skipping
[7/7] checking quota groups skipped (not enabled on this FS)
found 3221372928 bytes used, error(s) found
total csum bytes: 0
total tree bytes: 147456
total fs tree bytes: 32768
total extent tree bytes: 16384
btree space waste bytes: 136231
file data blocks allocated: 3221225472
referenced 3221225472
extent buffer leak: start 30752768 len 16384
extent buffer leak: start 30752768 len 16384
extent buffer leak: start 30752768 len 16384
[CAUSE]
In the function check_dev_item(), we iterate through all the dev
extents, but when we found overlapping extents, we exit without
releasing the path, causing extent buffer leakage.
[FIX]
Just release the path before we exit the function.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Keep only existing exports, the commented out functions have been hidden
in 56e9963474 ("btrfs-progs: libbtrfs: hide unused symbols, same
version") in 5.14 and no problems have been reported.
Signed-off-by: David Sterba <dsterba@suse.com>