[BUG]
'btrfstune --csum' would always fail for a newly created btrfs:
# truncate -s 1G test.img
# ./mkfs.btrfs -f test.img
# ./btrsftune --csum xxhash test.img
WARNING: Experimental build with unstable or unfinished features
WARNING: Switching checksums is experimental, do not use for valuable data!
Proceed to switch checksums
ERROR: failed to start transaction: Unknown error -28
ERROR: failed to start transaction: No space left on device
ERROR: failed to generate new data csums: No space left on device
ERROR: btrfstune failed
[CAUSE]
After commit e79f18a4a7 ("btrfs-progs: introduce a basic metadata free
space reservation check"), btrfs_start_transaction() would check the
metadata space.
But at the time of introduction of csum conversion, the parameter for
btrfs_start_transaction() was incorrect.
The 2nd parameter is the *number* of items to be added (if we're
deleting items, just pass 1).
However commit 08a3bd7694 ("btrfs-progs: tune: add the ability to
generate new data checksums") is using the item size, not the number of
items to be added.
This means we're passing a number 8 * nodesize times larger than the
original size, no wonder we would error out with -ENOSPC.
[FIX]
Use proper calcuation to convert the new csum item size to number of
leaves needed, and double it just in case.
Pull-request: #820
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
If a btrfs filesystem had dev-replace ran in the past, even it's already
finished, btrfstune would refuse to change its csum:
WARNING: Experimental build with unstable or unfinished features
WARNING: Switching checksums is experimental, do not use for valuable data!
Proceed to switch checksums
ERROR: running dev-replace detected, please finish or cancel it.
ERROR: btrfstune failed
[CAUSE]
The current dev-replace detection is only checking if we have
DEV_REPLACE item in device tree.
However DEV_REPLACE item will also exist even if a dev-replace finished,
so the existing check can not handle such case at all.
[FIX]
If an dev-replace item is found, further check the state of the item to
prevent false alerts.
Issue: #798
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are quite some variable shadowing in btrfs-progs, most of them are
just reusing some common names like tmp.
And those are quite safe and the shadowed one are even different type.
But there are some exceptions:
- @end in traverse_tree_blocks()
There is already an @end with the same type, but a different meaning
(the end of the current extent buffer passed in).
Just rename it to @child_end.
- @start in generate_new_data_csums_range()
Just rename it to @csum_start.
- @size of fixup_chunk_tree_block()
This one is particularly bad, we declare a local @size and initialize
it to -1, then before we really utilize the variable @size, we
immediately reset it to 0, then pass it to logical_to_physical().
Then there is a location to check if @size is -1, which will always be
true.
According to the code in logical_to_physical(), @size would be clamped
down by its original value, thus our local @size will always be 0.
This patch would rename the local @size to @found_size, and only set
it to -1.
The call site is only to pass something as logical_to_physical()
requires a non-NULL pointer.
We don't really need to bother the returned value.
- duplicated @ref declaration in run_delayed_tree_ref()
- duplicated @super_flags in change_meta_csums()
Just delete the duplicated one.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the kernel we have a control structure call btrfs_tree_parent_check
to pass around the various sanity checks we have for extent buffers.
Add this to btrfs_tree_parent_check and then update the callers.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function and it's related functions only exist for the utilities
that populate existing file systems, and do not exist in the upstream
kernel. Move this function and the related function into it's own
common source file and out of the kernel-shared sources, and then update
all of the users to include the new location of this code.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the kernel we just pass the btrfs_fs_info, and we const'ify the
new_key. Update the btrfs-progs definition to make syncing ctree.c
easier.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently "btrfstune --csum" allows us to change the csum to the same
one, this is good for testing but not good for end users, as if the end
user interrupts it, they have to resume the change (even it's to the
same csum type) until it finished, or kernel would reject such fs.
Furthermore, we never change the super block csum type until we
completely changed the csum type, thus for resume cases, the fs would
still show up as using the old csum type thus won't cause any problem
resuming.
So here we just reject the csum conversion if the target csum type is
the same as the existing csum type.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When the csum conversion is interrupted when changing data csum
objectid, we should just resume the objectid conversion.
This situation can be detected by comparing the old and new csum items.
They should both exist but doesn't intersect (interrupted halfway), or
only new csum items exist (interrupted after we have deleted old csums).
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If the csum conversion is interrupted when old csums are being deleted,
we should resume by continue deleting the old csums.
The function delete_old_data_csums() can handle half deleted cases
already.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have a very rare chance to hit a fs with empty csum tree but still
has CHANGING_DATA_CSUM flag.
The window is very small, but it's still possible, so handle it by
jumping directly to metadata csum change.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are two possible situations where there is no new csum items at
all:
- We just inserted csum change item
This means all csums are really old csum type, and we can start
the conversion from the beginning, only need to skip the csum change
item insert.
- We finished data csum conversion but not yet started metadata
conversion
This means all csums are already new csum type, and we can resume
by starting changing metadata csums.
To distinguish the two cases, we need to read the first sector, and
verify the data content against both csum types.
If the csum matches with old csum type, we resume from generating new
data csum.
If the csum matches with new csum type, we resume from rewriting
metadata csum.
If the csum doesn't match either csum type, we have some big problems
then.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If a csum change is interrupted at new data csum generation stage, we
can detect such situation by checking the old and new csum items.
At the new data csum generation stage, old csums are untouched, and only
new csums items (with different objectid) are inserted into the csum
tree.
Thus the old csum items should cover a larger range, while the new csum
items should be a subset of the old csums.
The resume part would start by re-generating the remaining part, then go
through the conversion stages.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For interrupted metadata checksum change, we only need to call the same
change_meta_csums().
Since we don't have any record on the last converted metadata, thus we
have to go through all metadata anyway.
And the existing change_meta_csums() has already implemented the needed
checks to skip converted metadata.
Since we're here, also implement all the surrounding checks, like making
sure the new target csum type matches the interrupted one.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
Doing the following csum change in a row, it would fail:
# mkfs.btrfs -f --csum crc32c $dev
# btrfstune --csum sha256 $dev
# btrfstune --csum crc32c $dev
# btrfstune --csum sha256 $dev
WARNING: Experimental build with unstable or unfinished features
WARNING: Switching checksums is experimental, do not use for valuable data!
Proceed to switch checksums
ERROR: failed to insert csum change item: File exists
ERROR: failed to generate new data csums: File exists
WARNING: reserved space leaked, flag=0x4 bytes_reserved=16384
extent buffer leak: start 30572544 len 16384
extent buffer leak: start 30441472 len 16384
WARNING: dirty eb leak (aborted trans): start 30441472 len 16384
[CAUSE]
During every csum change operation, btrfstune would insert an temporaray
csum change item into root tree.
But unfortunately after the conversion btrfstune doesn't properly delete
the csum change item, result the following items in the root tree:
item 10 key (CSUM_CHANGE TEMPORARY_ITEM 0) itemoff 13423 itemsize 0
temporary item objectid CSUM_CHANGE offset 0
target csum type crc32c (0)
item 11 key (CSUM_CHANGE TEMPORARY_ITEM 2) itemoff 13423 itemsize 0
temporary item objectid CSUM_CHANGE offset 2
target csum type sha256 (2)
Thus at the last conversion try to go back to SHA256, we failed to
insert the same item, and caused the above error.
[FIX]
After finishing the metadata csum conversion, do a proper removal of the
csum item.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The csum change for metadata is like uuid-change, we go with in-place
csum update without any COW.
During the rewrite, we will manually check the csum (both old and new)
for each tree block.
And only rewrite the csum if the tree block matches its old csum.
(For tree block matches its new csum, we need to do nothing).
And when everything is done, just update the superblock to reflect the
csum type change.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
At this stage, the csum tree should only contain the temporary csum
items (CSUM_CHANGE, EXTENT_CSUM, logical), and no more old csum items.
Now we can convert those temporary csum items back to regular csum items
by changing their key objectids back to EXTENT_CSUM.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The new helper function, delete_old_data_csums(), would delete the old
data csums while keep the new one untouched.
Since the new data csums have a key objectid (-13) smaller than the
old data csums (-10), we can safely delete from the tail of the btree.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch would modify btrfs_csum_file_block() to handle csum type
other than the one used in the current fs.
The new data checksum would use a different objectid (-13) to
distinguish with the existing one (-10).
This needs to change tree-checker to skip the item size checks,
since new csum can be larger than the original csum.
After this stage, the resulted csum tree would look like this:
item 0 key (CSUM_CHANGE EXTENT_CSUM 13631488) itemoff 8091 itemsize 8192
range start 13631488 end 22020096 length 8388608
item 1 key (EXTENT_CSUM EXTENT_CSUM 13631488) itemoff 7067 itemsize 1024
range start 13631488 end 14680064 length 1048576
Note the itemsize is 8 times the original one, as the original csum is
CRC32, while target csum is SHA256, which is 8 times the size.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch introduces a new helper function,
read_verify_one_data_sector(), to do the data read and checksum
verification (against the old csum).
This data would be later re-used to generate a new csum.
And since we're introduce the helper function, we also build the
skeleton to iterate the data extents using the old csum tree.
This method is much better compared to iterating using extent tree,
which has no directly indicator on whether the data extent has csum or
not (nodatasum or preallocated).
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The overall idea is to make sure no running operations (balance,
dev-replace, dirty log) for the fs before csum change.
And also reject half converted csums for now.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The existing attempt for changing csum types is as the following:
- Create a new temporary csum root
- Generate new data csums into the temporary csum root
- Drop the old csum tree and make the temporary one as csum root
- Change the checksums for metadata in-place
Unfortunately after some experiments, the csum root switch method has a
big pitfall, the backref items in extent tree.
Those backref items still point back to the old tree, meaning without a
lot of extra tricks, the extent tree would be corrupted.
Thus we have to go a new single tree variant:
- Generate new data csums into the csum root
The new data csums would have a different objectid to distinguish
them.
- Drop the old data csum items
- Change the key objectids of the new csums
- Change the checksums for metadata in-place
This means unfortunately we have to revert most of the old code, and
update the temporary item format.
The new temporary item would only record the target csum type.
At every stage we have a method to determine the progress, thus no need
for an item, but in the future it's still open for change.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The fixes involve the following changes:
- Unexport functions which are not utilized out of the file
* print_path_column()
* parse_reflink_range()
* btrfs_list_setup_print_column()
* device_get_partition_size_sysfs()
* max_zone_append_size()
- Include related headers before implementing the function
* change-uuid.c
* convert-bgt.c
* seed.h
- Add missing headers caused by the above header changes
* include <uuid/uuid.h> for tune/tune.h.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The checksum conversion is still experimental and still does not convert
all filesystems correctly. Do not use on valuable data.
Previous implementation copied the UUID conversion which was not a good
base for the checksum conversion so it left out basically all trees
except extent and checksum.
This update adds the base for the required safety features:
- let the old csum tree intact until the full conversion is done (ie.
all data are still verifiable)
- add on-disk status tracking item, this should keep the from/to
checksum conversion, last generation to catch potential updates of the
underlying filesystem if conversion is interrupted and the filesystem
mounted
- convert most of the fundamental trees, the subvolumes, tree log and
relocation trees are not converted
- trees are converted in-place to avoid potentially running out of space
but this might be better done by transaction protection with a
temporary tree
Known issues:
- not all trees are converted
- not all checksums are correctly inserted into the new tree and reading
the files leads to EIO
Issue: #438
Signed-off-by: David Sterba <dsterba@suse.com>
The conversion have been copy&pasted from one code but not all messages
reflect that and mistakenly say fsid instead of csum, etc. Also rename
functions converting the trees to more descriptive names.
Signed-off-by: David Sterba <dsterba@suse.com>