This patch would modify btrfs_csum_file_block() to handle csum type
other than the one used in the current fs.
The new data checksum would use a different objectid (-13) to
distinguish with the existing one (-10).
This needs to change tree-checker to skip the item size checks,
since new csum can be larger than the original csum.
After this stage, the resulted csum tree would look like this:
item 0 key (CSUM_CHANGE EXTENT_CSUM 13631488) itemoff 8091 itemsize 8192
range start 13631488 end 22020096 length 8388608
item 1 key (EXTENT_CSUM EXTENT_CSUM 13631488) itemoff 7067 itemsize 1024
range start 13631488 end 14680064 length 1048576
Note the itemsize is 8 times the original one, as the original csum is
CRC32, while target csum is SHA256, which is 8 times the size.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch introduces a new helper function,
read_verify_one_data_sector(), to do the data read and checksum
verification (against the old csum).
This data would be later re-used to generate a new csum.
And since we're introduce the helper function, we also build the
skeleton to iterate the data extents using the old csum tree.
This method is much better compared to iterating using extent tree,
which has no directly indicator on whether the data extent has csum or
not (nodatasum or preallocated).
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The overall idea is to make sure no running operations (balance,
dev-replace, dirty log) for the fs before csum change.
And also reject half converted csums for now.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The existing attempt for changing csum types is as the following:
- Create a new temporary csum root
- Generate new data csums into the temporary csum root
- Drop the old csum tree and make the temporary one as csum root
- Change the checksums for metadata in-place
Unfortunately after some experiments, the csum root switch method has a
big pitfall, the backref items in extent tree.
Those backref items still point back to the old tree, meaning without a
lot of extra tricks, the extent tree would be corrupted.
Thus we have to go a new single tree variant:
- Generate new data csums into the csum root
The new data csums would have a different objectid to distinguish
them.
- Drop the old data csum items
- Change the key objectids of the new csums
- Change the checksums for metadata in-place
This means unfortunately we have to revert most of the old code, and
update the temporary item format.
The new temporary item would only record the target csum type.
At every stage we have a method to determine the progress, thus no need
for an item, but in the future it's still open for change.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add structures and prototypes. Leave out undefined types (time64_t,
block_group_rsv) for now. Kernel 6.4-rc1.
Signed-off-by: David Sterba <dsterba@suse.com>
The send.h for libbtrfs has been separated some time ago so we're now
free to keep up with kernel, 6.4-rc1.
Signed-off-by: David Sterba <dsterba@suse.com>
It's a known problem that a received subvolume would lose its UUID after
switching to RW. Thus it can lead to later receive problems for
snapshotting and cloning.
In that case, we just output a simple error message like:
ERROR: cannot find parent subvolume
Or
ERROR: clone: did not find source subvol
Normally we need to use "btrfs receive --dump" to know what the missing
subvolume UUID is, which would take extra work.
This patch would:
- Add extra subvolume UUID to the output
- Unify the error messages to the same format
Now the error messages would look like:
ERROR: snapshot: cannot find parent subvolume 1b4e28ba-2fa1-11d2-883f-b9a761bde3fb
ERROR: clone: cannot find source subvolume 1b4e28ba-2fa1-11d2-883f-b9a761bde3fb
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The new test case would create an empty ext4 with 64K block size, which
can lead to a new data chunk which is no longer 1:1 mapped.
Then convert the fs and verify it with --check-data-csum to make sure
the image file is fine.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a report that btrfs-convert leads to bad csum for the image
file.
The reproducer looks like this:
(note the 64K block size, it's used to force a certain chunk layout)
# touch test.img
# truncate -s 10G test.img
# mkfs.ext4 -b 64K test.img
# btrfs-convert -N 64K test.img
# btrfs check --check-data-csum test.img
Opening filesystem to check...
Checking filesystem on /home/adam/test.img
UUID: 39d49537-a9f5-47f1-b6ab-7857707b9133
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking csums against data
mirror 1 bytenr 4563140608 csum 0x3f1fa0ef expected csum 0xa4c4c072
mirror 1 bytenr 4563206144 csum 0x55dcf0d3 expected csum 0xa4c4c072
mirror 1 bytenr 4563271680 csum 0x4491b00a expected csum 0xa4c4c072
mirror 1 bytenr 4563337216 csum 0x655d1f61 expected csum 0xa4c4c072
mirror 1 bytenr 4563402752 csum 0xd37114d3 expected csum 0xa4c4c072
mirror 1 bytenr 4563468288 csum 0x4c2dab30 expected csum 0xa4c4c072
mirror 1 bytenr 4563533824 csum 0xa80fceed expected csum 0xa4c4c072
mirror 1 bytenr 4563599360 csum 0xaf610db8 expected csum 0xa4c4c072
mirror 1 bytenr 4563795968 csum 0x67b3c8a0 expected csum 0xa4c4c072
ERROR: errors found in csum tree
[6/7] checking root refs
...
[CAUSE]
Above initial failure is for logical bytenr of 4563140608, which is
inside the relocated range of the image file offset [0, 1M).
During convert, we migrate the original image file ranges which would
later be covered by super and other reserved ranges.
The migration happens as:
- Read out the original data
- Reserve a new file extent
- Write the data back to the file extent
Note that, the new file extent can be inside some new data chunks,
thus it's no longer 1:1 mapped.
- Generate the new csum for the new file extent
The problem happens at the last stage. We should read out the data from
the new file extent, but we call read_disk_extent() using the logical
bytenr, however read_disk_extent() is not doing logical -> physical
mapping.
Thus we will read some garbage, not the newly written data, and use
those garbage to generate csum. And caused the above problem.
[FIX]
Instead of read_disk_extent(), call read_data_from_disk(), which would
do the proper logical -> physical mapping, thus would fix the bug.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The function write_and_map_eb() is quite abused as a way to write any
generic buffer back to disk.
But we have a more suitable function already, write_data_to_disk().
This patch would remove the abused write_data_to_disk() calls, and
convert the only three valid call sites to write_data_to_disk() instead.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The following functions accept a buffer for write, which can be marked
as const:
- btrfs_pwrite()
- write_data_to_disk()
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It's not a common practice to use the same io function for both read and
write (we have pread() and pwrite(), not pio()).
Furthermore the original function has the following problems:
- Not returning proper error number
If we had ioctl/stat errors we just return 0 with errno set.
Thus caller would treat it as a short read, not a proper error.
- Unnecessary @ret_rw
This is not that obvious if we have different handling for read and
write, but if we split them it's super obvious we can reuse @ret.
- No proper copy back for short read
- Unable to constify the @buf pointer for write operation
All those problems would be addressed in this patch.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When compiling on a system with gcc 12.2.1, the following warning is
generated. It can be fixed by adding a static storage class specifier.
cmds/inspect.c:733:5: warning: no previous prototype for ‘cmp_cse_devid_start’ [-Wmissing-prototypes]
733 | int cmp_cse_devid_start(const void *va, const void *vb)
| ^~~~~~~~~~~~~~~~~~~
cmds/inspect.c:754:5: warning: no previous prototype for ‘cmp_cse_devid_lstart’ [-Wmissing-prototypes]
754 | int cmp_cse_devid_lstart(const void *va, const void *vb)
| ^~~~~~~~~~~~~~~~~~~~
cmds/inspect.c:775:5: warning: no previous prototype for ‘print_list_chunks’ [-Wmissing-prototypes]
775 | int print_list_chunks(struct list_chunks_ctx *ctx, unsigned sort_mode,
| ^~~~~~~~~~~~~~~~~
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Compiler is throwing out this false positive warning in the following
function flow. Apparently, %parent and %p are initialized in tree_search_for_insert().
__set_extent_bit()
state = tree_search_for_insert(tree, start, &p, &parent);
insert_state_fast(tree, prealloc, p, parent, bits, changeset);
rb_link_node(&state->rb_node, parent, node);
Compile warnings:
In file included from ./common/extent-cache.h:23,
from kernel-shared/ctree.h:26,
from kernel-shared/extent-io-tree.c:4:
kernel-shared/extent-io-tree.c: In function ‘__set_extent_bit’:
./kernel-lib/rbtree.h:80:28: warning: ‘parent’ may be used uninitialized in this function [-Wmaybe-uninitialized]
node->__rb_parent_color = (unsigned long)parent;
^~~~~~~~~~~~~~~~~~~~~
kernel-shared/extent-io-tree.c:996:18: note: ‘parent’ was declared here
struct rb_node *parent;
^~~~~~
In file included from ./common/extent-cache.h:23,
from kernel-shared/ctree.h:26,
from kernel-shared/extent-io-tree.c:4:
./kernel-lib/rbtree.h:83:11: warning: ‘p’ may be used uninitialized in this function [-Wmaybe-uninitialized]
*rb_link = node;
~~~~~~~~~^~~~~~
kernel-shared/extent-io-tree.c:995:19: note: ‘p’ was declared here
struct rb_node **p;
Fix:
Initialize to NULL, as in the kernel.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
With all the known warnings fixed, we can enable -Wmissing-prototypes
and prevent such warnings from happening.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The fixes involve the following changes:
- Unexport functions which are not utilized out of the file
* print_path_column()
* parse_reflink_range()
* btrfs_list_setup_print_column()
* device_get_partition_size_sysfs()
* max_zone_append_size()
- Include related headers before implementing the function
* change-uuid.c
* convert-bgt.c
* seed.h
- Add missing headers caused by the above header changes
* include <uuid/uuid.h> for tune/tune.h.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The optimized implementation sha256_process_x86() is not declared
anywhere, this can be caught by -Wmissing-prototypes option.
Just declare it properly in sha.h.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since kernel 3.12, any btrfs mounted by a kernel would have an UUID
tree created, to record all the UUID of its subvolumes.
Without UUID tree, libbtrfs send functionality has to go through all the
subvolumes seen so far, and record those subvolumes' UUID internally so
that libbtrfs send can locate a desired subvolume.
Since commit 194b90aa2c ("btrfs-progs: libbtrfs: remove declarations
without exports in send-utils") we're deprecating this old interface
already, meaning deprecated users won't be able to build its own
subvolume list already.
And we received no error report on this so far. So let's finish the
cleanup by removing the support for fs without an UUID tree.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function is introduced by commit b031fe84fd ("btrfs-progs: zoned:
implement zoned chunk allocator") but it never got called since then.
Furthermore in the kernel zoned code, there is no such function from the
very beginning, and everything is handled by
btrfs_find_allocatable_zones().
Thus we can safely remove the function.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This syncs tree-checker.c from the kernel. The main modification was to
add a open ctree flag to skip the deeper leaf checks, and plumbing this
through tree-checker.c. We need this for things like fsck or
btrfs-image that need to work with slightly corrupted file systems, and
these checks simply make us unable to look at the corrupted blocks.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In btrfs-progs we check the actual leaf pointers as well as the chunk
itself in btrfs_check_chunk_valid. However in the kernel the leaf stuff
is handled separately as part of the read, and then we have the chunk
checker itself. Change the btrfs-progs version to match the in-kernel
version temporarily so it makes syncing the in-kernel code easier.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
These helpers are called __btrfs_check_* in the kernel as they return
the special enum to indicate what part of the leaf/node failed. Rename
the uses in btrfs-progs to match the kernel naming convention to make it
easier to sync that code.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is used by tree-checker.c, so sync this into volumes.h to make it
easier to sync tree-checker.c.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This exists in the kernel to do the read on an extent buffer we may have
already looked up and initialized. Simply create this helper by
extracting out the existing code from read_tree_block and make
read_tree_block call this helper. This gives us the helper we need to
sync ctree.c into btrfs-progs, and keeps the code the same in
btrfs-progs.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have this extra parameter in the kernel to indicate if we are atomic
and thus can't lock the io_tree when checking the transid for an extent
buffer. This isn't necessary in btrfs-progs, but to allow for easier
syncing of ctree.c add this argument to our copy of btrfs_buffer_uptodate.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This exists in the kernel as a wrapper for readahead_tree_block, and is
used extensively in ctree.c in the kernel. Sync this helper so that we
can easily sync ctree.c
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the kernel we only take a bytenr for this as the extent buffer cache
is indexed on bytenr. Since we're passing in the btrfs_fs_info we can
simply use the ->nodesize for the blocksize, and drop the blocksize
argument completely. This brings us into parity with the kernel, which
will allow the syncing of ctree.c.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When we sync ctree.c into btrfs-progs we're going to need to have a
bunch of flags and definitions that exist in btrfs_path in the kernel
that do not exist in btrfs_progs. Sync these changes into btrfs-progs
to enable us to sync ctree.c into btrfs-progs.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We were using this in cmds/restore.c, however it only does anything if
path->reada is set, and we don't set that in cmds/restore.c. Remove
this usage of reada_for_search and make the function static.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The in-kernel version of read_tree_block adds some extra sanity checks
to make sure we don't return blocks that don't match what we expect.
This includes the owning root, the level, and the expected first key.
We don't actually do these checks in btrfs-progs, however kernel code
we're going to sync will expect this calling convention, so update it to
match the in-kernel code and then update all the callers.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is used to protect the used count for btrfs_root in the kernel,
sync it to btrfs-progs to allow us to sync ctree.c into btrfs-progs.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is sprinkled throughout the kernel code for the in-kernel self
tests. Add the helper to btrfs-progs to make it easier to sync the
kernel code.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This does exactly what free_extent_buffer_nocache does, but we call
btrfs_free_extent_buffer_stale in the kernel code, so add this extra
helper. Once the kernel code is synced we can get rid of the old
helper.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the kernel we pass in the root_id for btrfs_free_tree_block instead
of the root itself. Update the btrfs-progs version of the helper to
match what we do in the kernel.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Neither of these actually need the root argument, we provide all the
information for the ref through the arguments we pass through. Remove
the root argument from both of them. These needed to be done in the
same patch because of the __btrfs_mod_ref helper which will pick one or
the other function for processing reference updates.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This exists in the kernel and is used throughout ctree.c, sync this
helper to make sync'ing ctree.c easier.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In order to sync ctree.c we're going to have to have definitions for the
tree-mod-log stuff. However we don't need any of the code, we don't do
live backref lookups in btrfs-progs, so simply sync the header file and
stub all the helpers out.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This isn't used anywhere other than ctree.c, make it static.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is a mirror of the change I've done in the kernel, but in progs
it's even more simply because clean_tree_block was just a wrapper around
clear_extent_buffer_dirty. Change this to btrfs_clear_buffer_dirty, and
then update all the callers to use this helper instead of
clean_tree_block and clear_extent_buffer_dirty.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the kernel we just pass in the extent_buffer and get the fields we
need from that to update the extent ref flags. Update this helper to
match the kernel to make syncing ctree.c easier.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This helper in the kernel is named btrfs_set_disk_extent_flags, which is
a more accurate description than btrfs_set_block_flags. Rename to the
in kernel name to make syncing ctree.c easier.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The following are some extent buffer helpers we have in the kernel but
not in btrfs-progs. Sync these in to make syncing ctree.c easier.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is how btrfs_alloc_tree_block is defined in the kernel, so when we
go to sync this code in it'll be easier if we're already setup to accept
this argument. Since we're in progs we don't care about nesting so just
use BTRFS_NORMAL_NESTING everywhere, as we sync in the kernel code it'll
get updated to whatever is appropriate.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>