Commit Graph

36 Commits

Author SHA1 Message Date
Qu Wenruo
4d5711f4dd btrfs-progs: tune: reject csum change if the fs is already using the target csum type
Currently "btrfstune --csum" allows us to change the csum to the same
one, this is good for testing but not good for end users, as if the end
user interrupts it, they have to resume the change (even it's to the
same csum type) until it finished, or kernel would reject such fs.

Furthermore, we never change the super block csum type until we
completely changed the csum type, thus for resume cases, the fs would
still show up as using the old csum type thus won't cause any problem
resuming.

So here we just reject the csum conversion if the target csum type is
the same as the existing csum type.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:33 +02:00
Qu Wenruo
03b4952971 btrfs-progs: tune: implement resume support for data csum objectid change
When the csum conversion is interrupted when changing data csum
objectid, we should just resume the objectid conversion.

This situation can be detected by comparing the old and new csum items.
They should both exist but doesn't intersect (interrupted halfway), or
only new csum items exist (interrupted after we have deleted old csums).

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo
1aa4a9c609 btrfs-progs: tune: implement resume support for half deleted old csums
If the csum conversion is interrupted when old csums are being deleted,
we should resume by continue deleting the old csums.

The function delete_old_data_csums() can handle half deleted cases
already.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo
7f077e3173 btrfs-progs: tune: implement resume support for empty csum tree
We have a very rare chance to hit a fs with empty csum tree but still
has CHANGING_DATA_CSUM flag.

The window is very small, but it's still possible, so handle it by
jumping directly to metadata csum change.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo
8f23ca0dd1 btrfs-progs: tune: implement resume support for csum tree without any new csum item
There are two possible situations where there is no new csum items at
all:

- We just inserted csum change item
  This means all csums are really old csum type, and we can start
  the conversion from the beginning, only need to skip the csum change
  item insert.

- We finished data csum conversion but not yet started metadata
  conversion
  This means all csums are already new csum type, and we can resume
  by starting changing metadata csums.

To distinguish the two cases, we need to read the first sector, and
verify the data content against both csum types.

If the csum matches with old csum type, we resume from generating new
data csum.
If the csum matches with new csum type, we resume from rewriting
metadata csum.
If the csum doesn't match either csum type, we have some big problems
then.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo
ba965712b6 btrfs-progs: tune: implement resume support for generating new data csum
If a csum change is interrupted at new data csum generation stage, we
can detect such situation by checking the old and new csum items.

At the new data csum generation stage, old csums are untouched, and only
new csums items (with different objectid) are inserted into the csum
tree.

Thus the old csum items should cover a larger range, while the new csum
items should be a subset of the old csums.

The resume part would start by re-generating the remaining part, then go
through the conversion stages.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo
9ff505e738 btrfs-progs: tune: implement resume support for metadata checksum
For interrupted metadata checksum change, we only need to call the same
change_meta_csums().

Since we don't have any record on the last converted metadata, thus we
have to go through all metadata anyway.
And the existing change_meta_csums() has already implemented the needed
checks to skip converted metadata.

Since we're here, also implement all the surrounding checks, like making
sure the new target csum type matches the interrupted one.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo
cd877c2d23 btrfs-progs: tune: delete the csum change item after converting the fs
[BUG]
Doing the following csum change in a row, it would fail:

 # mkfs.btrfs -f --csum crc32c $dev
 # btrfstune --csum sha256 $dev
 # btrfstune --csum crc32c $dev
 # btrfstune --csum sha256 $dev
 WARNING: Experimental build with unstable or unfinished features
 WARNING: Switching checksums is experimental, do not use for valuable data!

 Proceed to switch checksums
 ERROR: failed to insert csum change item: File exists
 ERROR: failed to generate new data csums: File exists
 WARNING: reserved space leaked, flag=0x4 bytes_reserved=16384
 extent buffer leak: start 30572544 len 16384
 extent buffer leak: start 30441472 len 16384
 WARNING: dirty eb leak (aborted trans): start 30441472 len 16384

[CAUSE]
During every csum change operation, btrfstune would insert an temporaray
csum change item into root tree.

But unfortunately after the conversion btrfstune doesn't properly delete
the csum change item, result the following items in the root tree:

	item 10 key (CSUM_CHANGE TEMPORARY_ITEM 0) itemoff 13423 itemsize 0
		temporary item objectid CSUM_CHANGE offset 0
		target csum type crc32c (0)
	item 11 key (CSUM_CHANGE TEMPORARY_ITEM 2) itemoff 13423 itemsize 0
		temporary item objectid CSUM_CHANGE offset 2
		target csum type sha256 (2)

Thus at the last conversion try to go back to SHA256, we failed to
insert the same item, and caused the above error.

[FIX]
After finishing the metadata csum conversion, do a proper removal of the
csum item.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo
e7b79adf5a btrfs-progs: tune: add the ability to change metadata csums
The csum change for metadata is like uuid-change, we go with in-place
csum update without any COW.

During the rewrite, we will manually check the csum (both old and new)
for each tree block.
And only rewrite the csum if the tree block matches its old csum.
(For tree block matches its new csum, we need to do nothing).

And when everything is done, just update the superblock to reflect the
csum type change.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo
87016bffbb btrfs-progs: tune: add the ability to migrate the temporary csum items to regular csum items
At this stage, the csum tree should only contain the temporary csum
items (CSUM_CHANGE, EXTENT_CSUM, logical), and no more old csum items.

Now we can convert those temporary csum items back to regular csum items
by changing their key objectids back to EXTENT_CSUM.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo
4cf02fbc84 btrfs-progs: tune: add the ability to delete old data csums
The new helper function, delete_old_data_csums(), would delete the old
data csums while keep the new one untouched.

Since the new data csums have a key objectid (-13) smaller than the
old data csums (-10), we can safely delete from the tail of the btree.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo
08a3bd7694 btrfs-progs: tune: add the ability to generate new data checksums
This patch would modify btrfs_csum_file_block() to handle csum type
other than the one used in the current fs.

The new data checksum would use a different objectid (-13) to
distinguish with the existing one (-10).
This needs to change tree-checker to skip the item size checks,
since new csum can be larger than the original csum.

After this stage, the resulted csum tree would look like this:

	item 0 key (CSUM_CHANGE EXTENT_CSUM 13631488) itemoff 8091 itemsize 8192
		range start 13631488 end 22020096 length 8388608
	item 1 key (EXTENT_CSUM EXTENT_CSUM 13631488) itemoff 7067 itemsize 1024
		range start 13631488 end 14680064 length 1048576

Note the itemsize is 8 times the original one, as the original csum is
CRC32, while target csum is SHA256, which is 8 times the size.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo
2c02b799b1 btrfs-progs: tune: add the ability to read and verify the data before generating new checksum
This patch introduces a new helper function,
read_verify_one_data_sector(), to do the data read and checksum
verification (against the old csum).

This data would be later re-used to generate a new csum.

And since we're introduce the helper function, we also build the
skeleton to iterate the data extents using the old csum tree.

This method is much better compared to iterating using extent tree,
which has no directly indicator on whether the data extent has csum or
not (nodatasum or preallocated).

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo
7dd3cda377 btrfs-progs: tune: implement the prerequisite checks for csum change
The overall idea is to make sure no running operations (balance,
dev-replace, dirty log) for the fs before csum change.

And also reject half converted csums for now.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo
33a21e7578 btrfs-progs: tune: rework the main idea of csum change
The existing attempt for changing csum types is as the following:

- Create a new temporary csum root
- Generate new data csums into the temporary csum root
- Drop the old csum tree and make the temporary one as csum root
- Change the checksums for metadata in-place

Unfortunately after some experiments, the csum root switch method has a
big pitfall, the backref items in extent tree.

Those backref items still point back to the old tree, meaning without a
lot of extra tricks, the extent tree would be corrupted.

Thus we have to go a new single tree variant:

- Generate new data csums into the csum root
  The new data csums would have a different objectid to distinguish
  them.
- Drop the old data csum items
- Change the key objectids of the new csums
- Change the checksums for metadata in-place

This means unfortunately we have to revert most of the old code, and
update the temporary item format.

The new temporary item would only record the target csum type.
At every stage we have a method to determine the progress, thus no need
for an item, but in the future it's still open for change.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo
b3327119ec btrfs-progs: fix -Wmissing-prototypes warnings
The fixes involve the following changes:

- Unexport functions which are not utilized out of the file
  * print_path_column()
  * parse_reflink_range()
  * btrfs_list_setup_print_column()
  * device_get_partition_size_sysfs()
  * max_zone_append_size()

- Include related headers before implementing the function
  * change-uuid.c
  * convert-bgt.c
  * seed.h

- Add missing headers caused by the above header changes
  * include <uuid/uuid.h> for tune/tune.h.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:31 +02:00
Josef Bacik
b3477244f9 btrfs-progs: update read_tree_block to match the kernel definition
The in-kernel version of read_tree_block adds some extra sanity checks
to make sure we don't return blocks that don't match what we expect.
This includes the owning root, the level, and the expected first key.
We don't actually do these checks in btrfs-progs, however kernel code
we're going to sync will expect this calling convention, so update it to
match the in-kernel code and then update all the callers.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:30 +02:00
Josef Bacik
a754fe29d9 btrfs-progs: sync uapi/btrfs.h into btrfs-progs
We want to keep this file locally as we want to be uptodate with
upstream, so we can build btrfs-progs regardless of which kernel is
currently installed.  Sync this with the upstream version and put it in
kernel-shared/uapi to maintain some semblance of where this file comes
from.

There are some changes that need to be synced back to kernel. A local
definition of static_assert is used to avoid compilation problems on gcc
(< 9) due to mandatory 2nd parameter.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:28 +02:00
Josef Bacik
bf0f3db765 btrfs-progs: introduce UASSERT() for purely userspace code
While syncing messages.[ch] I had to back out the ASSERT() code in
kerncompat.h, which means we now rely on the kernel code for ASSERT().
In order to maintain some semblance of separation introduce UASSERT()
and use that in all the purely userspace code.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:28 +02:00
Qu Wenruo
d4f4d7b76e btrfs-progs: tune: add --convert-to-free-space-tree option
From the very beginning of free-space-tree feature, we allow mount
option "space_cache=v2" to convert the filesystem to the new feature.

But this is not the proper practice for new features (no matter if it's
incompat or compat_ro).

This is already making the clear_cache/space_cache mount option more
complex.

Thus this patch introduces the proper way to enable free-space-tree, and
I hope one day we can deprecate the "space_cache=" mount option.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:27 +02:00
David Sterba
455b1cf094 btrfs-progs: tune: rename bgt conversion options
Rename the options so they more accurately reflect what the command is
actually doing. The feature is enabled/disabled in the end but it's not
a simple on/off like for others, the conversion takes time.

Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-27 14:23:52 +02:00
Qu Wenruo
68a04bc710 btrfs-progs: tune: add new option to convert back to extent tree
With previous btrfstune support to convert to block-group-tree, it has
implemented most of the infrastructure for bi-directional conversion.

This patch will implement the remaining conversion support to go back to
extent tree.

The modification includes:

- New convert_to_extent_tree() function in btrfstune.c
  It's almost the same as convert_to_bg_tree(), but with small changes:
  * No need to set extra features like NO_HOLES/FST.
  * Need to delete the block group tree when everything finished.

- Update btrfs_delete_and_free_root() to handle non-global roots
  Currently the function can only accepts global roots (extent/csum/free
  space trees)

  If we pass a non-global root into the function, we will screw up
  global_roots_tree and crash.

  Since we're going to use btrfs_delete_and_free_root() to free block
  group tree which is not a global tree, this is needed.

- New handling for half converted fs in get_last_converted_bg()
  There are two cases need to be handled:

  * The bg tree is already empty
    We need to grab the first bg in extent tree.
    Or at conversion function we will fail at grabbing the first bg.

  * The bg tree is not empty
    Then we need to grab the last bg in extent tree.

- Extra root switching in involved functions. This involves:

  * read_converting_block_groups()
  * insert_block_group_item()
  * update_block_group_item()

  We just need to update our target root according to the current
  compat_ro and super flags.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-19 01:10:24 +02:00
Qu Wenruo
716c3be363 btrfs-progs: move block-group-tree out of experimental features
The feedback from the community on block group tree is very positive,
the only complain is, end users need to recompile btrfs-progs with
experimental features to enjoy the new feature.

So let's move it out of experimental features and let more people enjoy
faster mount speed.

Also change the option of btrfstune, from `-b` to
`--enable-block-group-tree` to avoid short option.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 19:28:05 +02:00
David Sterba
a7fa81f296 btrfs-progs: open code print_usage where applicable
After previous change to usage() that now has the return code, there's
no purpose of the print_usage() wrapper so it can be removed.

Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-28 20:11:23 +01:00
Qu Wenruo
f61b90aff9 btrfs-progs: make usage call properly return an exit value
[BUG]
Currently cli/009 test case failed with different exit number:

  ====== RUN CHECK /home/adam/btrfs-progs/btrfstune --help
  usage: btrfstune [options] device
  [...]
  failed: /home/adam/btrfs-progs/btrfstune --help
  test failed for case 009-btrfstune

[CAUSE]
In tune/main.c, we have the following call on usage():

  static void print_usage(int ret)
  {
	usage(&tune_cmd);
	exit(ret);
  }

However usage() itself would always call exit(1):

  void usage(const struct cmd_struct *cmd)
  {
	usage_command_usagestr(cmd->usagestr, NULL, 0, true, true);
	exit(1);
  }

This makes prevents any caller of usage() to modify its exit number.

[FIX]
Add a new argument @error for print_usage(), so we can properly return 0
for -h/--help usage.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-28 20:11:23 +01:00
David Sterba
9be33f558c btrfs-progs: tune: update checksum conversion
The checksum conversion is still experimental and still does not convert
all filesystems correctly. Do not use on valuable data.

Previous implementation copied the UUID conversion which was not a good
base for the checksum conversion so it left out basically all trees
except extent and checksum.

This update adds the base for the required safety features:

- let the old csum tree intact until the full conversion is done (ie.
  all data are still verifiable)
- add on-disk status tracking item, this should keep the from/to
  checksum conversion, last generation to catch potential updates of the
  underlying filesystem if conversion is interrupted and the filesystem
  mounted
- convert most of the fundamental trees, the subvolumes, tree log and
  relocation trees are not converted
- trees are converted in-place to avoid potentially running out of space
  but this might be better done by transaction protection with a
  temporary tree

Known issues:

- not all trees are converted
- not all checksums are correctly inserted into the new tree and reading
  the files leads to EIO

Issue: #438
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-28 20:11:22 +01:00
David Sterba
79b3ed9ab8 btrfs-progs: tune: fix typos and mistakes in messages
The conversion have been copy&pasted from one code but not all messages
reflect that and mistakenly say fsid instead of csum, etc. Also rename
functions converting the trees to more descriptive names.

Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-28 19:49:31 +01:00
David Sterba
85f49ce2cd btrfs-progs: tune: convert printf to message helpers
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-28 19:49:31 +01:00
David Sterba
512467c17e btrfs-progs: tune: use help and cmd_struct for printing help text
Unify the btrfstune help text so it uses the help framework. The cmd
struct is set up only partially.

Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-18 17:44:02 +01:00
David Sterba
4de40f3a5e btrfs-progs: tune: remove unused includes
Remove includes reported by include-what-you-use after the refactoring.

Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-18 17:44:02 +01:00
David Sterba
3cf00a0a38 btrfs-progs: tune: factor out checksum change to own file
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-18 17:44:01 +01:00
David Sterba
44fecc5fd2 btrfs-progs: tune: factor out conversion to block group tree to own file
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-18 17:43:47 +01:00
David Sterba
8962705987 btrfs-progs: tune: factor out metadata UUID change to own file
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-18 17:43:32 +01:00
David Sterba
b2616464d3 btrfs-progs: tune: factor out UUID change to own file
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-18 17:43:19 +01:00
David Sterba
b380d972e2 btrfs-progs: tune: factor out seeding to own file
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-18 17:43:04 +01:00
David Sterba
f66478b961 btrfs-progs: move btrfstune to own directory
Move the source file to own directory so it can be further split and
refactored. File needs to be renamed to main.c so the build magic works.

Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-18 17:42:04 +01:00