[BUG]
If restoring dumped image to a new file, under most cases kernel will
reject it since version 5.11:
# mkfs.btrfs -f /dev/test/test
# btrfs-image /dev/test/test /tmp/dump
# btrfs-image -r /tmp/dump ~/test.img
# mount ~/test.img /mnt/btrfs
mount: /mnt/btrfs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.
# dmesg -t | tail -n 7
loop0: detected capacity change from 10592 to 0
BTRFS info (device loop0): disk space caching is enabled
BTRFS info (device loop0): has skinny extents
BTRFS info (device loop0): flagging fs with big metadata feature
BTRFS error (device loop0): device total_bytes should be at most 5423104 but found 10737418240
BTRFS error (device loop0): failed to read chunk tree: -22
BTRFS error (device loop0): open_ctree failed
[CAUSE]
When btrfs-image restores an image into a file, and the source image
contains only single device, then we don't need to modify the
chunk/device tree, as we can reuse the existing chunk/dev tree without
any problem.
This also means, for such restore, we also won't do any target file
enlarge. This behavior itself is fine, as at that time, kernel won't
check if the device is smaller than the device size recorded in device
tree.
But later kernel commit 3a160a933111 ("btrfs: drop never met disk total
bytes check in verify_one_dev_extent") introduces new check on device
size at mount time, rejecting any loop file which is smaller than the
original device size.
[FIX]
Do extra file enlarge for single device restore if the restored file is
smaller than the device size.
Reported-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Su Yue <l@damenly.su>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In restore_metadump(), we call stat() but never use the result. This
call site is left by some code refactoring, as the stat() call is now
moved into fixup_device_size(). We can safely remove the call.
Reviewed-by: Su Yue <l@damenly.su>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There's a group of functions that are related to opening filesystem in
various modes, this can be moved to a separate file.
Signed-off-by: David Sterba <dsterba@suse.com>
Extending open_ctree with more parameters would be difficult, we'll need
to add more so factor out the parameters to a structure for easier
extension.
Signed-off-by: David Sterba <dsterba@suse.com>
While trying to run down a corruption problem I needed to use
btrfs-image to generate known good states in between tests. At some
point this started failing with
either extent tree is corrupted or deprecated extent ref format
This is because the fs had an extent item that was large enough that it
no longer had inline extent references, they were all keyed extent
references. The check is bogus, we can have extent items that are >=
the extent item size, not just > than the extent item size. Fix the
check so that we can generate metadata dumps properly.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Although btrfs-image will dump log tree, we will modify the restored
image if it's not a multi-device restore.
In that case, since log tree blocks are not recorded in the extent tree,
extent allocator will try to re-use the tree blocks belonging to log
trees, this would lead to transid mismatch if btrfs-restore chooses to
do device/chunk fixup.
This patch will fix such problem by pinning down all the log trees
blocks before fixing up the device and chunk trees.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Btrfs-image -r doesn't always create exactly the same fs of the original
fs, this is because btrfs-image can create dump from multi-device, and
restore it to single device.
Thus we need some special modification to chunk and device tree. This
behavior is mostly to handle the old behavior where we always restore
the metadata into SINGLE profile no matter what.
However this is not needed if the source fs only has one device, in that
case, we can use all the metadata as is, no need to modify the fs at
all, resulting an exact copy of metadata.
This patch will do extra check when doing restore, to avoid modify the
restored fs if the source is also single device.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This commit organises block groups cache in
btrfs_fs_info::block_group_cache_tree. And any dirty block groups are
linked in transaction_handle::dirty_bgs.
To keep coherence of bisect, it does almost replace in place:
1. Replace the old btrfs group lookup functions with new functions
introduced in former commits.
2. set_extent_bits(..., BLOCK_GROUP_DIRYT) things are replaced by linking
the block group cache into trans::dirty_bgs. Checking and clearing bits
are transformed too.
3. set_extent_bits(..., bit | EXTENT_LOCKED) things are replaced by
new the btrfs_add_block_group_cache() which inserts caches into
btrfs_fs_info::block_group_cache_tree directly. Other operations are
converted to tree operations.
Signed-off-by: Su Yue <Damenly_Su@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We are going to touch dirty_bgs in transaction directly, so every call
chain should pass @trans to the leaf functions.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <Damenly_Su@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We access btrfs_block_group_cache::item mostly for @used and @flags.
@flags is already a dedicated member in btrfs_block_group_cache, only
@used doesn't have a dedicated member.
This patch will remove btrfs_block_group_cache::item and add
btrfs_block_group_cache::used.
It's the btrfs-progs equivalent of the following kernel patches:
btrfs: move block_group_item::used to block group
btrfs: move block_group_item::flags to block group
btrfs: remove embedded block_group_cache::item
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Before this patch, we were using a very inefficient way to search
chunks:
We iterate through all clusters to find the chunk root tree block first,
then re-iterate all clusters again to find every child tree block.
Each time we need to iterate all clusters just to find a chunk tree
block. This is obviously inefficient, especially when chunk tree gets
larger. So the original author leaves a comment on it:
/* If you have to ask you aren't worthy */
static int search_for_chunk_blocks()
This patch will change the behavior so that we will only iterate all
clusters once.
The idea behind the optimization is, since we have the superblock
restored first, we could use the CHUNK_ITEMs in
super_block::sys_chunk_array to build a SYSTEM chunk mapping.
Then, when we start to iterate through all items, we can easily skip
unrelated items at different level:
- At cluster level
If a cluster starts beyond last system chunk map, it must not contain
any chunk tree blocks (as chunk tree blocks only lives inside system
chunks)
- At item level
If one item has no intersection with any system chunk map, then it
must not contain any tree blocks.
By this, we can iterate through all clusters just once, and find out all
CHUNK_ITEMs.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a new helper function, is_in_sys_chunks(), to determine if an
item is in the range of system chunks.
Since btrfs-image will merge adjacent same type extents into one item,
this function is designed to return true for any bytes in system chunk
range.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently we are doing a pretty slow search for system chunks before
restoring real data.
The current behavior is to search all clusters for chunk tree root
first, then search all clusters again and again for every chunk tree
block.
This causes recursive calls and pretty slow start up, the only good news
is since chunk tree are normally small, we don't need to iterate too
many times, thus overall it's acceptable.
To address such bad behavior, we could take usage of system chunk array
in the super block.
By recording all system chunks ranges, we could easily determine if an
extent belongs to chunk tree, thus do one loop simple linear search for
chunk tree leaves.
This patch only introduces the code base for later patches.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is no need to allocate 2 * max_pending_size (which can be 256M) if
we're just extracting super block.
We only need to prepare BTRFS_SUPER_INFO_SIZE as buffer size.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We can easily get confusing error message like:
ERROR: restore failed: Success
This is caused by wrong "%m" usage, as we normally use ret to indicate
error, without populating errno.
This patch will fix it by output the return value directly as normally
we have extra error message to show more meaning message than the return
value.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
With the introduction of xxhash64 to btrfs-progs we created a crypto/
directory for all the hashes used in btrfs (although no
cryptographically secure hash is there yet).
Move the crc32c implementation from kernel-lib/ to crypto/ as well so we
have all hashes consolidated.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Adding this table will make extending btrfs-progs with new checksum types
easier.
Also add accessor functions to access the table fields.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
In preparation to supporting new checksum algorithm pass the checksum type
to btrfs_csum_data/btrfs_csum_final, this allows us to encapsulate any
differences in processing into the respective functions
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Build several standalone tools into one binary and switch the function
by name (symlink or hardlink).
* btrfs
* mkfs.btrfs
* btrfs-image
* btrfs-convert
* btrfstune
The static target is also supported. The name of resulting boxed
binaries is btrfs.box and btrfs.box.static . All the binaries can be
built at the same time without prior configuration.
text data bss dec hex filename
822454 27000 19724 869178 d433a btrfs
927314 28816 20812 976942 ee82e btrfs.box
2067745 58004 44736 2170485 211e75 btrfs.static
2627198 61724 83800 2772722 2a4ef2 btrfs.box.static
File sizes:
857496 btrfs
968536 btrfs.box
2141400 btrfs.static
2704472 btrfs.box.static
Standalone utilities:
512504 btrfs-convert
495960 btrfs-image
471224 btrfstune
491864 mkfs.btrfs
1747720 btrfs-convert.static
1411416 btrfs-image.static
1304256 btrfstune.static
1361696 mkfs.btrfs.static
So the shared 900K binary saves ~2M, or ~5.7M for static build.
Signed-off-by: David Sterba <dsterba@suse.cz>
Create directory for all sources that can be used by anything that's not
rellated to a relevant kernel part, all common functions, helpers,
utilities that do not fit any other specific category.
The traditional location would be probably lib/ with all things that are
statically linked to the main binaries, but we have libbtrfs and
libbtrfsutil so this would be confusing.
Signed-off-by: David Sterba <dsterba@suse.com>
This patch will export disk-io.c::check_super() as btrfs_check_super()
and use it in btrfs-image for extra verification.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When there are over 32 (in my example, 35) online CPUs, btrfs-image -c9
will just hang.
[CAUSE]
Btrfs-image has a hard coded limit (32) on how many threads we can use.
For the "-t" option we do the up limit check.
But when we don't specify "-t" option and speicified "-c" option, then
btrfs-image will try to auto detect the number of online CPUs, and use
it without checking if it's over the up limit.
And for num_threads larger than the up limit, we will over write the
adjust members of metadump_struct/mdrestore_struct, corrupting
pthread_mutex_t and pthread_cond_t, causing synchronising problem.
Nowadays, with SMT/HT and higher cpu core counts, it's not hard to go
beyond 32 threads, and hit the bug.
[FIX]
Just do extra num_threads check before using the number from sysconf().
Reviewed-by: Su Yue <Damenly_Su@gmx.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
BTRFS_COMPAT_EXTENT_TREE_V0 is introduced for a short time in kernel,
and it's over 10 years ago.
Nowadays there should be no user for that feature, and kernel has remove
this support in Jun, 2018. There is no need for btrfs-progs to support
it.
This patch will remove EXTENT_TREE_V0 related code and replace those
BUG_ON() to a more graceful error message.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
misc-test/021 will fail with the following error message:
====== RUN CHECK root_helper btrfs-progs/btrfs-image -r btrfs-progs/tests//test.img /dev/loop2
ERROR: failed to enlarge result image: Invalid argument
ERROR: failed to fix chunks and devices mapping, the fs may not be mountable: Invalid argument
extent buffer leak: start 30638080 len 16384
extent buffer leak: start 30408704 len 16384
WARNING: dirty eb leak (aborted trans): start 30408704 len 16384
ERROR: restore failed: Invalid argument
failed: root_helper btrfs-progs/btrfs-image -r btrfs-progs/tests//test.img /dev/loop2
test failed for case 021-image-multi-devices
[CAUSE]
For misc-test/021, we're using loopback device, which doesn't support
ftruncate. That's why it returns EINVAL.
[FIX]
Only call ftruncate64() if we're operating on a regluar file.
Fixes: a7a44164c84e ("btrfs-progs: image: Use correct device size when restoring")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When restoring btrfs image, the total_bytes of device item is not
updated correctly. In fact total_bytes can be left 0 for restored image.
It doesn't trigger any error because btrfs check never checks
total_bytes of dev item. However this is going to change.
Fix it by populating total_bytes of device item with the end position of
last dev extent to make later btrfs check happy.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add support for a new metadata_uuid field. This is just a preparatory
commit which switches all users of the fsid field for metdata comparison
purposes to utilize the new field. This more or less mirrors the
kernel patch, additionally:
* Update 'btrfs inspect-internal dump-super' to account for the new
field. This involes introducing the 'metadata_uuid' line to the
output and updating the logic for comparing the fs uuid to the
dev_item uuid.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since btrfs-image is just restoring tree blocks without really checking
if that tree block contents makes sense, for multi-device image, block
group items will keep the incorrect block group flags.
For example, for a metadata RAID1 data RAID0 btrfs recovered to a single
disk, its chunk tree will look like:
item 1 key (FIRST_CHUNK_TREE CHUNK_ITEM 22020096)
length 8388608 owner 2 stripe_len 65536 type SYSTEM
item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 30408704)
length 1073741824 owner 2 stripe_len 65536 type METADATA
item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 1104150528)
length 1073741824 owner 2 stripe_len 65536 type DATA
All chunks have correct type (SINGLE).
While its block group items will look like:
item 1 key (22020096 BLOCK_GROUP_ITEM 8388608)
block group used 16384 chunk_objectid 256 flags SYSTEM|RAID1
item 3 key (30408704 BLOCK_GROUP_ITEM 1073741824)
block group used 114688 chunk_objectid 256 flags METADATA|RAID1
item 11 key (1104150528 BLOCK_GROUP_ITEM 1073741824)
block group used 1572864 chunk_objectid 256 flags DATA|RAID0
All block group items still have the wrong profiles.
And btrfs check (lowmem mode for better output) will report error for
such image:
ERROR: chunk[22020096 30408704) related block group item flags mismatch, wanted: 2, have: 18
ERROR: chunk[30408704 1104150528) related block group item flags mismatch, wanted: 4, have: 20
ERROR: chunk[1104150528 2177892352) related block group item flags mismatch, wanted: 1, have: 9
This patch will do an extra repair for block group items to fix the
profile of them.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Current fixup_devices() will only remove DEV_ITEMs and reset DEV_ITEM
size. Later we need to do more fixup works, so change the name to
fixup_chunks_and_devices() and refactor the original device size fixup
to fixup_device_size().
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Similar to the changes where strerror(errno) was converted, continue
with the remaining cases where the argument was stored in another
variable.
The savings in object size are about 4500 bytes:
$ size btrfs.old btrfs.new
text data bss dec hex filename
805055 24248 19748 849051 cf49b btrfs.old
804527 24248 19748 848523 cf28b btrfs.new
Signed-off-by: David Sterba <dsterba@suse.com>
At restore time, btrfs-image will try to fixup dev extents, this will
trigger a transaction.
It's normally OK to start a new transaction, but for an image with log
tree, a new transaction will increment generation, and log tree generation
will not match super block generation + 1.
There is no good way to solve it yet. Just output a warning for now.
Signed-off-by: Qu Wenruo <wqu@suse.com>
[ update error message ]
Signed-off-by: David Sterba <dsterba@suse.com>
For read_data_extent() in convert/main.c it's using mirror number in a
incorrect way, which will not get correct copy for RAID1:
for (cur_mirror = 0; cur_mirror < num_copies; cur_mirror++) {
In such case, for RAID1 @cur_mirror will only be 0 and 1.
However for 0 and 1 case, btrfs_map_block() will only return the first
copy. To reach the 2nd copy, it correct @cur_mirror range should be 1
and 2.
So with this off-by-one error, btrfs-image will never be able to read
out data extent if the first stripe of the chunk is the missing one.
Fix it by starting @cur_mirror from 1 and to @num_copies (including).
Fixes: 2d46558b30 ("btrfs-progs: Use existing facility to replace read_data_extent function")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The kernel code no longer has BTRFS_CRC32_SIZE and only uses
btrfs_csum_sizes[]. So, update the progs code as well.
Suggested-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
As btrfs is specific to Linux, %m can be used instead of strerror(errno)
in format strings. This has some size reduction benefits for embedded
systems.
glibc, musl, and uclibc-ng all support %m as a modifier to printf.
A quick glance at the BIONIC libc source indicates that it has
support for %m as well. BSDs and Windows do not but I do believe
them to be beyond the scope of btrfs-progs.
Compiled sizes on Ubuntu 16.04:
Before:
3916512 btrfs
233688 libbtrfs.so.0.1
4899 bcp
2367672 btrfs-convert
2208488 btrfs-corrupt-block
13302 btrfs-debugfs
2152160 btrfs-debug-tree
2136024 btrfs-find-root
2287592 btrfs-image
2144600 btrfs-map-logical
2130760 btrfs-select-super
2152608 btrfstune
2131760 btrfs-zero-log
2277752 mkfs.btrfs
9166 show-blocks
After:
3908744 btrfs
233256 libbtrfs.so.0.1
4899 bcp
2366560 btrfs-convert
2207432 btrfs-corrupt-block
13302 btrfs-debugfs
2151104 btrfs-debug-tree
2134968 btrfs-find-root
2281864 btrfs-image
2143536 btrfs-map-logical
2129704 btrfs-select-super
2151552 btrfstune
2130696 btrfs-zero-log
2276272 mkfs.btrfs
9166 show-blocks
Total savings: 23928 (24 kilo)bytes
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The function uses the reverse CRC32C table to quickly calculate a
4-byte suffix, that when added to the original data will make it
match desired checksum.
Author: Piotr Pawlow <pp@siedziba.pl>
[ minor adjustments ]
Signed-off-by: David Sterba <dsterba@suse.com>
The table will be used to speed up calculations of CRC32C collisions.
Author: Piotr Pawlow <pp@siedziba.pl>
Signed-off-by: David Sterba <dsterba@suse.com>
Function find_collision sometimes generated file names with non-printable
DEL characters (code 127), for example file name "|5gp!" would be changed
to "U'2<DEL>y" when using "crc-collisions" sanitize mode.
Author: Piotr Pawlow <pp@siedziba.pl>
Signed-off-by: David Sterba <dsterba@suse.com>
Tree blocks are always nodesize. As readahead is only an optimization,
exact size is not required and is only advisory.
Signed-off-by: David Sterba <dsterba@suse.com>
Making the code data-race safe requires that reads *and* writes
happen under a mutex lock, if any of the access are writes. See
Dmitri Vyukov, "Benign data races: what could possibly go wrong?"
for more details.
The fix here was to put most of the main loop of restore_worker
under a mutex lock.
This race was detected using fsck-tests/012-leaf-corruption.
==================
WARNING: ThreadSanitizer: data race
Write of size 4 by main thread:
#0 add_cluster btrfs-progs/image/main.c:1931
#1 restore_metadump btrfs-progs/image/main.c:2566
#2 main btrfs-progs/image/main.c:2859
Previous read of size 4 by thread T6:
#0 restore_worker btrfs-progs/image/main.c:1720
Location is stack of main thread.
Thread T6 (running) created by main thread at:
#0 pthread_create <null>
#1 mdrestore_init btrfs-progs/image/main.c:1868
#2 restore_metadump btrfs-progs/image/main.c:2534
#3 main btrfs-progs/image/main.c:2859
SUMMARY: ThreadSanitizer: data race btrfs-progs/image/main.c:1931 in
add_cluster
Signed-off-by: Adam Buchbinder <abuchbinder@google.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The only reasom read_tree_block() needs a btrfs_root parameter is to get
its node/sector size.
And long ago, I have already introduced a compactible interface,
read_tree_block_fs_info() to pass btrfs_fs_info instead of btrfs_root.
Since we have cleaned up all root->sector/node/stripesize users, we
should be OK to refactor read_tree_block() function.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
We correctly build an image from a multiple devices filesystem but when
restoring the image into a single device we were missing updating the
number of devices in the superblock to the value 1 (we already took care
of setting the number of stripes to 1 for each chunk item and setting
the device id for each chunk item to match the device id from the super
block).
This missing update of the number of devices makes it impossible to mount
the restored filesystem on recent kernels, more specifically since the
linux kernel commit 99e3ecfcb9f4ca35192d20a5bea158b81f600062
("Btrfs: add more validation checks for superblock"), that produce the
following message in the dmesg/syslog:
[21097.542047] BTRFS error (device sdi): super_num_devices 2 mismatch with num_devices 1 found here
[21097.543972] BTRFS error (device sdi): failed to read chunk tree: -22
[21097.720360] BTRFS error (device sdi): open_ctree failed
So fix this by updating the number of devices to 1 in the superblock.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While performing a memcpy, we are copying from uninitialized dst
as opposed to src->data. Though using eb->len is correct, I used
src->len to make it more readable.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>