The ext3 has been superseded by ext4, we don't need to test it
separately so this reduces the convert tests run time.
Signed-off-by: David Sterba <dsterba@suse.com>
Show the list of supported compression algorithms in the help string as
we now have optional LZO and ZSTD.
Signed-off-by: David Sterba <dsterba@suse.com>
LZO as a compression format is pretty archaic these days, there are
better algorithms in all metrics for compression and decompression, and
lzo hasn't had a new release since 2017.
Add an option to disable LZO (defaulting to enabled), and respect it in
cmds/restore.c.
NOTE: disabling support for LZO will make make it impossible to restore
data from filesystems where the compression has ever been used. It's not
recommended to build without the support in general.
Signed-off-by: Ross Burton <ross.burton@arm.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It is not possible to use mixed profile together with other profiles.
The current wording is not clear about this, so let's add a
clarification note.
Author: Forza
Signed-off-by: David Sterba <dsterba@suse.com>
Add test case is to make sure on a relative large empty fs, we won't
create bitmaps to unnecessarily increase the size of free space tree.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When creating btrfs with new v2 cache (the default behavior), mkfs.btrfs
always create the free space tree using bitmap.
It's fine for small fs, but will be a disaster if the device is large
and the data profile is something like RAID0:
$ mkfs.btrfs -f -m raid1 -d raid0 /dev/test/scratch[1234]
btrfs-progs v5.17
[...]
Block group profiles:
Data: RAID0 4.00GiB
Metadata: RAID1 256.00MiB
System: RAID1 8.00MiB
[..]
$ btrfs ins dump-tree -t free-space /dev/test/scratch1
btrfs-progs v5.17
free space tree key (FREE_SPACE_TREE ROOT_ITEM 0)
node 30441472 level 1 items 10 free space 483 generation 6 owner FREE_SPACE_TREE
node 30441472 flags 0x1(WRITTEN) backref revision 1
fs uuid deddccae-afd0-4160-9a12-48fe7b526fb1
chunk uuid 68f6cf98-afe3-4f47-9797-37fd9c610219
key (1048576 FREE_SPACE_INFO 4194304) block 30457856 gen 6
key (475004928 FREE_SPACE_BITMAP 8388608) block 30703616 gen 5
key (953155584 FREE_SPACE_BITMAP 8388608) block 30720000 gen 5
key (1431306240 FREE_SPACE_BITMAP 8388608) block 30736384 gen 5
key (1909456896 FREE_SPACE_BITMAP 8388608) block 30752768 gen 5
key (2387607552 FREE_SPACE_BITMAP 8388608) block 30769152 gen 5
key (2865758208 FREE_SPACE_BITMAP 8388608) block 30785536 gen 5
key (3343908864 FREE_SPACE_BITMAP 8388608) block 30801920 gen 5
key (3822059520 FREE_SPACE_BITMAP 8388608) block 30818304 gen 5
key (4300210176 FREE_SPACE_BITMAP 8388608) block 30834688 gen 5
[...]
^^^ So many bitmaps that an empty fs will have two levels for free
space tree already
[CAUSE]
Member btrfs_block_group::bitmap_high_thresh is never properly set to
any value other than 0, thus in function
update_free_space_extent_count(), the following check is always true:
if (!(flags & BTRFS_FREE_SPACE_USING_BITMAPS) &&
extent_count > block_group->bitmap_high_thresh) {
ret = convert_free_space_to_bitmaps(trans, block_group, path);
Thus we always got converted to bitmaps.
[FIX]
Cross-port the function set_free_space_tree_thresholds() from kernel,
and call that function in btrfs_make_block_group() and
read_one_block_group() so that every block group has bitmap_high_thresh
properly set.
Now even for that 4GiB large data chunk, we still only have one free extent:
btrfs-progs v5.17
free space tree key (FREE_SPACE_TREE ROOT_ITEM 0)
leaf 30572544 items 15 free space 15860 generation 6 owner FREE_SPACE_TREE
leaf 30572544 flags 0x1(WRITTEN) backref revision 1
fs uuid b24e52ea-6580-4a88-aa70-cb173090bfe3
chunk uuid d85f3905-fc61-4084-b335-2b6b97814b8e
[...]
item 13 key (298844160 FREE_SPACE_INFO 4294967296) itemoff 16235 itemsize 8
free space info extent count 1 flags 0
item 14 key (298844160 FREE_SPACE_EXTENT 4294967296) itemoff 16235 itemsize 0
free space extent
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are several call sites utilizing btrfs_commit_transaction() just
to update members in super blocks, without any metadata update.
This can be problematic for some simple call sites, like zero_log_tree()
or check_and_repair_super_num_devs().
If we have big problems preventing the fs to be mounted in the first
place, and need to clear the log or super block size, but by some other
problems in extent tree, we're unable to allocate new blocks.
Then we fall into a deadlock that, we need to mount (even
ro,rescue=all) to collect extra info, but btrfs-progs can not do any
super block updates.
Fix the problem by allowing the following super blocks only operations
to be done without using btrfs_commit_transaction():
- btrfs_fix_super_size()
- check_and_repair_super_num_devs()
- zero_log_tree().
There are some exceptions in btrfstune.c, related to the csum type
conversion and seed flags.
In those btrfstune cases, we in fact wants to proper error report in
btrfs_commit_transaction(), as those operations are not mount critical,
and any early error can be helpful to expose any problems in the fs.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The current subpage support in v5.18 has several limits, the most
obvious ones are:
- Only support 64KiB page size
- No RAID56 support
The supports are already queued for v5.19.
And some minor ones:
- No inline extent write support
Read is always supported.
Subpage mount will always just act as "max_inline=0".
- Compression write is only for page aligned range.
Read is always supported, no matter the alignment.
- Extra memory usage for scrub
Patchset is hanging there for a while though.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It should have been deleted, as CHANGES mentioned this in v5.14, but
obvious it's not.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When running some tests, I notice that my debug build of btrfs-convert
is throwing out garbage for target fs label:
$ ./btrfs-convert ~/test.img
btrfs-convert from btrfs-progs v5.17
Source filesystem:
Type: ext2
Label:
Blocksize: 4096
UUID: 29d159a8-cb46-41d3-8089-3c5c65e4afae
Target filesystem:
Label: @pcwU <<< Garbage here
Blocksize: 4096
Nodesize: 16384
UUID: 682bf5f2-8cb1-4390-b9ac-6883cd87ed39
Checksum: crc32c
...
[CAUSE]
The fslabel[] array is just not initialized, thus it can contain
garbage.
[FIX]
Initialize fslabel[] array to all zero.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When testing my new RAID56J code, there is a bug causing dev extents
overlapping.
Although both modes can detect the problem, lowmem has leaked some
extent buffers:
$ btrfs check --mode=lowmem /dev/test/scratch1
Opening filesystem to check...
Checking filesystem on /dev/test/scratch1
UUID: 65775ce9-bb9d-4f61-a210-beea52eef090
[1/7] checking root items
[2/7] checking extents
ERROR: dev extent devid 1 offset 1095761920 len 1073741824 overlap with previous dev extent end 1096810496
ERROR: dev extent devid 2 offset 1351614464 len 1073741824 overlap with previous dev extent end 1352663040
ERROR: dev extent devid 3 offset 1351614464 len 1073741824 overlap with previous dev extent end 1352663040
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs done with fs roots in lowmem mode, skipping
[7/7] checking quota groups skipped (not enabled on this FS)
found 3221372928 bytes used, error(s) found
total csum bytes: 0
total tree bytes: 147456
total fs tree bytes: 32768
total extent tree bytes: 16384
btree space waste bytes: 136231
file data blocks allocated: 3221225472
referenced 3221225472
extent buffer leak: start 30752768 len 16384
extent buffer leak: start 30752768 len 16384
extent buffer leak: start 30752768 len 16384
[CAUSE]
In the function check_dev_item(), we iterate through all the dev
extents, but when we found overlapping extents, we exit without
releasing the path, causing extent buffer leakage.
[FIX]
Just release the path before we exit the function.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Keep only existing exports, the commented out functions have been hidden
in 56e9963474 ("btrfs-progs: libbtrfs: hide unused symbols, same
version") in 5.14 and no problems have been reported.
Signed-off-by: David Sterba <dsterba@suse.com>
Copy inline helpers for the cached variant of the rbtree, not used yet.
Rename 'new' for C++ compatibility.
Signed-off-by: David Sterba <dsterba@suse.com>
For easier source synchronization with kernel, add the _ONCE wrappers,
but only the simplified version as we don't do any lock-less
algorithms or use the semantics in userspace.
Signed-off-by: David Sterba <dsterba@suse.com>
In order to use rb_root_cached we need to sync with kernel sources. Copy
the file from linux.git/include/linux/rbtree_types.h and update so it's
C++ protected for inclusion to libbtrfs and remove duplicate
definitions.
Signed-off-by: David Sterba <dsterba@suse.com>
Qu noticed that the full checksums are still printed even if the
experimental build is not enabled. This is caused by wrong use of #ifdef
(as the macro is always defined), this must be "#if".
Fixes: 1bb6fb896d ("btrfs-progs: btrfstune: experimental, new option to switch csums")
Reported-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For the default CRC32C checksum, print-tree now prints tons of
unnecessary padding zeros:
btrfs-progs v5.17
chunk tree
leaf 22036480 items 7 free space 15430 generation 6 owner CHUNK_TREE
leaf 22036480 flags 0x1(WRITTEN) backref revision 1
checksum stored 0ac1b9fa00000000000000000000000000000000000000000000000000000000
checksum calced 0ac1b9fa00000000000000000000000000000000000000000000000000000000
fs uuid 3d95b7e3-3ab6-4927-af56-c58aa634342e
This is caused by commit 1bb6fb896d ("btrfs-progs: btrfstune:
experimental, new option to switch csums"), and it looks like most
distros just enable EXPERIMENTAL features by default.
(Which is a good thing to provide much better coverage).
So here we just limit the csum print to the utilized csum size.
Now the output looks like:
btrfs-progs v5.17
chunk tree
leaf 22036480 items 4 free space 15781 generation 6 owner CHUNK_TREE
leaf 22036480 flags 0x1(WRITTEN) backref revision 1
checksum stored 676b812f
checksum calced 676b812f
fs uuid d11f8799-b6dc-415d-b1ed-cebe6da5f0b7
Fixes: 1bb6fb896d ("btrfs-progs: btrfstune: experimental, new option to switch csums")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Kernel commit efc0e69c2fea ("btrfs: introduce exclusive operation
BALANCE_PAUSED state") allows to start a device add when there's a
paused balance, eg. to let the balance finish when there's not enough
chunk space. Add the support for that, though this needs an updated
kernel to export the 'balance paused' in sysfs.
Signed-off-by: David Sterba <dsterba@suse.com>
This member is not used by anyone, just remove it.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add headings to versions, reorder so the minor releases are under the
major so it's properly nested, keep the last version expanded.
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When running btrfs/011 with subpage case, even with RAID56 support, it
still fails with the following error:
QA output created by 011
*** test btrfs replace
mkfs failed
(see /home/adam/xfstests-dev/results//btrfs/011.full for details)
The full log shows:
---------workout "-m single -d single -M" 1 no 64-----------
ERROR: illegal nodesize 65536 (not equal to 4096 for mixed block group)
mkfs failed
This is a critical error, making test case to be aborted, without
checking the rest profiles.
[CAUSE]
Mkfs.btrfs always uses the maximum value between sectorsize and page
size for its mixed profile nodesize.
For subpage case, it means we always go PAGE_SIZE, no matter whatever
the sectorsize is passed in.
[FIX]
Just get rid of the direct PAGE_SIZE usage when determining nodesize for
mixed profiles.
And use sectorsize directly (either passed in by the user, or
determined from page size).
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Previously 'btrfs check --check-data-csum' will report tons of false
alerts for RAID56.
Add a test case to make sure with the new RAID56 rebuild ability, there
should be no false alerts.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This new ability is added by:
- Allow btrfs_map_block() to return the chunk type
This makes later work much easier
- Only reset stripe offset inside btrfs_map_block() when needed
Currently if @raid_map is not NULL, btrfs_map_block() will consider
this call is for WRITE and will reset stripe offset.
This is no longer the case, as for RAID56 read with mirror_num 1/0,
we will still call btrfs_map_block() with non-NULL raid_map.
Add a small check to make sure we won't reset stripe offset for
mirror 1/0 read.
- Add new helper read_raid56() to handle rebuild
We will read the full stripe (including all data and P/Q stripes)
do the rebuild, then only copy the refered part to the caller.
There is a catch for RAID6, we have no way to exhaust all combination,
so the current repair will assume the mirror = 0 data is corrupted,
then try to find a missing device.
But if no missing device can be found, it will assume P is corrupted.
This is just a guess, and can to totally wrong, but we have no better
idea.
Now btrfs-progs have full read ability for RAID56.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Those two members are a shortcut for non-RAID56 profiles.
But we should not use such shortcut, and move all our logical address
read/write to the unified read_data_from_disk()/write_data_to_disk().
With previous refactors, now we're safe to remove them.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The function read_extent_from_disk() is only a wrapper to read tree
block.
And read_extent_data() is just a while loop to eliminate short read
caused by stripe boundary.
In fact, a lot of call sites of read_extent_data() are either reading
metadata (thus no possible short read) or doing extra loop by
themselves.
This patch will replace those two functions with read_data_from_disk(),
making it the only entrance for data/metadata read.
And update read_data_from_disk() to return the read bytes, so caller can
do a simple while loop.
For the few callers of read_extent_data(), open-code a small while loop
for them.
This will allow later RAID56 read repair using P/Q much easier.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>