Commit Graph

72 Commits

Author SHA1 Message Date
Qu Wenruo 6a3e646139 btrfs-progs: change-csum: handle finished dev-replace correctly
[BUG]
If a btrfs filesystem had dev-replace ran in the past, even it's already
finished, btrfstune would refuse to change its csum:

  WARNING: Experimental build with unstable or unfinished features
  WARNING: Switching checksums is experimental, do not use for valuable data!

  Proceed to switch checksums
  ERROR: running dev-replace detected, please finish or cancel it.
  ERROR: btrfstune failed

[CAUSE]
The current dev-replace detection is only checking if we have
DEV_REPLACE item in device tree.
However DEV_REPLACE item will also exist even if a dev-replace finished,
so the existing check can not handle such case at all.

[FIX]
If an dev-replace item is found, further check the state of the item to
prevent false alerts.

Issue: #798
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-06-03 21:59:43 +02:00
Aidan Gibson 7b48204560 btrfs-progs: tune: fix minor spelling error in message in checksum change
Pull-request: #799
Author: Aidan Gibson <tronicdude@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-06-03 21:09:51 +02:00
Anand Jain e6a466d44d btrfs-progs: tune: fix btrfstune --help for -m -M option
The -m | -M option for btrfstune, sounds like metadata_uuid is being
changed which is wrong. The fsid is being changed the original fsid is
being copied into the metadata_uuid. So update the help text.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-05-17 17:36:59 +02:00
Qu Wenruo 738bb7f0f0 btrfs-progs: tune: properly open zoned devices for RW
[BUG]
There is a report that, for zoned devices btrfstune is unable to convert
it to block group tree.

 # btrfstune /dev/nullb0 --convert-to-block-group-tree
 Error reading 1342193664, -1
 Error reading 1342193664, -1
 ERROR: cannot read chunk root
 ERROR: open ctree failed

[CAUSE]
For read-write opened zoned devices, all the read/write has to be
aligned to its sector size.

However btrfs stores its metadata by extent_buffer::data[], which has
all the structures before it, thus never aligned to zoned device sector
size.

Normally we would require btrfs_pread() and btrfs_pwrite() to do the
extra alignment, but during open_ctree(), we are not aware if a device
is zoned or not.

Thus we rely on if the fd is opened with O_DIRECT flag, if the fd has
O_DIRECT, then we would temporarily set fs_info->zoned for chunk tree
read.

Unforunately not all open_ctree_fd() callers have the flags set
properly, and btrfstune is one of the missing call site.

This makes all the read not properly aligned and cause read failure.

[FIX]
Just manually check if the target device is a zoned one, and set
O_DIRECT accordingly.

Issue: #765
Pull-request: #767
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-04-18 19:16:00 +02:00
Qu Wenruo b310b231f1 btrfs-progs: tune: add the missing newline for --convert-from-block-group-tree
There is a missing newline for a successful
--convert-from-block-group-tree run, meanwhile
--convert-to-block-group-tree has the correct newline.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-04-02 20:49:55 +02:00
Qu Wenruo ace7d241cc btrfs-progs: tune: fix the missing close() of filesystem fd
[BUG]
In "btrfs tune" subcommand, we're using open_ctree_fd(), which
requires passing a valid fd, and the caller is responsible to properly
handling the lifespan of the fd.

But we just call close_ctree() and btrfs_close_all_devices(), without
closing the fd.

[FIX]
Just manually close the fd.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-02-08 08:30:37 +01:00
David Sterba d71ece203a btrfs-progs: tune: use long option to enable simple quota
The initial patch used -q for enabling simple quota but this is not
right, -q is reserved for --quiet and we want the long options first.

Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-21 15:51:07 +02:00
Qu Wenruo 146cca7e16 btrfs-progs: move clear-cache.[ch] from check/ to common/ directory
The clear-cache functionality is shared by several commands:

- btrfs check
  For --clear-cache and --clear-ino-cache.

- btrfstune
  Mostly for block-group-tree feature conversion.

- btrfs-convert
  To enable the now default v2 space cache.

Thus it's no longer proper to keep clear-cache.[ch] under check/
directory, move them to common/ directory.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-13 18:13:12 +02:00
Qu Wenruo 930c6362d1 btrfs-progs: fix all variable shadowing
There are quite some variable shadowing in btrfs-progs, most of them are
just reusing some common names like tmp.
And those are quite safe and the shadowed one are even different type.

But there are some exceptions:

- @end in traverse_tree_blocks()
  There is already an @end with the same type, but a different meaning
  (the end of the current extent buffer passed in).
  Just rename it to @child_end.

- @start in generate_new_data_csums_range()
  Just rename it to @csum_start.

- @size of fixup_chunk_tree_block()
  This one is particularly bad, we declare a local @size and initialize
  it to -1, then before we really utilize the variable @size, we
  immediately reset it to 0, then pass it to logical_to_physical().
  Then there is a location to check if @size is -1, which will always be
  true.

  According to the code in logical_to_physical(), @size would be clamped
  down by its original value, thus our local @size will always be 0.

  This patch would rename the local @size to @found_size, and only set
  it to -1.
  The call site is only to pass something as logical_to_physical()
  requires a non-NULL pointer.
  We don't really need to bother the returned value.

- duplicated @ref declaration in run_delayed_tree_ref()
- duplicated @super_flags in change_meta_csums()
  Just delete the duplicated one.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-10 19:16:29 +02:00
Qu Wenruo b874fe6dda btrfs-progs: properly reserve metadata space for bgt conversion
There is a bug report [1] that btrfstune failed to convert to block
group tree.  The bug report shows some very weird call trace, mostly
fail at free space tree code.

One of the concern is the failed conversion may be caused by exhausted
metadata space.  In that case, we're doing quite some metadata
operations (one transaction to convert 64 block groups in one go).

But for each transaction we have only reserved the metadata for one
block group conversion (1 block for extent tree and 1 block for block
group tree, this excludes free space and extent tree operations, as they
should go global reserve).

Although we're not 100% sure about the failed conversion, at least we
should handle the metadata reservation correctly, by properly reserve
the needed space for the batch, and reduce the batch size just in case
there isn't much metadata space left.

[1] https://lore.kernel.org/linux-btrfs/3c93d0b5-a8cb-ebe3-f8d6-76ea6340f23e@gmail.com/

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-10 19:15:35 +02:00
David Sterba 21aa6777b2 btrfs-progs: clean up includes, using include-what-you-use
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-03 01:11:57 +02:00
Josef Bacik f4e16e0238 btrfs-progs: update read_tree_block to take a btrfs_parent_tree_check
In the kernel we've added a control struct to handle the different
checks we want to do on extent buffers when we read them.  Update our
copy of read_tree_block to take this as an argument, then update all of
the callers to use the new structure.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-03 01:11:57 +02:00
Josef Bacik a38570f9d6 btrfs-progs: use btrfs_tree_parent_check for btrfs_read_extent_buffer
In the kernel we have a control structure call btrfs_tree_parent_check
to pass around the various sanity checks we have for extent buffers.
Add this to btrfs_tree_parent_check and then update the callers.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-03 01:11:57 +02:00
Josef Bacik 3808db2b3e btrfs-progs: move btrfs_record_file_extent and code into a new file
This function and it's related functions only exist for the utilities
that populate existing file systems, and do not exist in the upstream
kernel.  Move this function and the related function into it's own
common source file and out of the kernel-shared sources, and then update
all of the users to include the new location of this code.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-03 01:11:56 +02:00
Josef Bacik 8069b8b8cd btrfs-progs: drop btrfs_init_path
This simply zero's out the path, and this is used everywhere we use a
stack path.  Drop this usage and simply init the path's to empty instead
of using a function to do the memset.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-03 01:11:56 +02:00
Josef Bacik 8ea15251e7 btrfs-progs: update btrfs_set_item_key_safe to match kernel definition
In the kernel we just pass the btrfs_fs_info, and we const'ify the
new_key.  Update the btrfs-progs definition to make syncing ctree.c
easier.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-03 01:11:55 +02:00
Anand Jain cabf70c7b8 btrfs-progs: tune: recover from failed btrfstune -m|M
Currently, to fix device following the write failure of one or more devices
during btrfstune -m|M, we rely on the kernel's ability to reassemble devices,
even when they possess distinct fsids.

Kernel hinges combinations of metadata_uuid and generation number, with
additional cues taken from the fsid and the BTRFS_SUPER_FLAG_CHANGING_FSID_V2
flag. This patch adds this logic to btrfs-progs.

In complex scenarios (such as multiple fsids with the same metadata_uuid and
matching generation), user intervention becomes necessary to resolve the
situations which btrfs-progs can do better.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-03 01:11:55 +02:00
Anand Jain dd718e9f6f btrfs-progs: tune: use the latest bdev in fs_devices for super_copy
Currently, btrfstune relies on the superblock of the device specified
in the btrfstune argument for fs_info::super_copy. However, it should
use fs_devices::latest_bdev, as it points to the device with the highest
fs_devices::generation number. This will contain the superblock updates
that other devices may have missed and we can now support reuniting
devices following failures of btrfstune -m|M|u|U as in the patches:

   btrfs-progs: add support to fix superblock with CHANGING_FSID_V2 flag
   btrfs-progs: recover from the failed btrfstune -m|M

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-03 01:11:55 +02:00
Boris Burkov 4a0e715d78 btrfs-progs: tune: add support for squota
Add the ability to enable simple quotas on an existing file system at
rest with btrfstune.

This is similar to the functionality in mkfs, except it must also find
all the roots for which it must create qgroups. Note that this *does
not* retroactively compute usage for existing extents as that is
impossible for data. This is consistent with the behavior of the live
enable ioctl.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-03 01:11:55 +02:00
Qu Wenruo 7a0da24f52 btrfs-progs: tune: do not allow multiple incompatible features to be set
[PROBLEM]
Btrfstune allows multiple different options to be executed in one go,
some options are completely fine, like no-holes along with extref, but
with more and more options, we need more exclusive checks.

In fact a lot of new options are already not following the old
success/total checks.

[ENHANCEMENT]
There is really no need to allow multiple features to be set in one go.

So this patch introduces an array which groups all the compatible
options into following categories:

- Extent tree
  This includes converting to/from extent and block group tree.

- Space cache
  This includes converting to v2 space cache.

- Metadata UUID
  This includes changing metadata uuid.

- FSID change
  This includes the slower full fs fsid rewrites.

- Csum change
  This includes the csum rewrites.

- Seed devices
  This includes changing the device seed flag.

- Legacy options
  This includes no-holes/extref/skinny-metadata features, which are
  already default mkfs features.

Now we only allow options inside the same group to be specified.
E.g. "btrfstune -r -S 1" would fail as it includes both legacy and seed
groups.
Meanwhile "btrfstune -r -n" would still be allowed.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-01 13:58:46 +02:00
Anand Jain 9f6e0ab2a4 btrfs-progs: tune: fix return without flag reset commit
In the function set_metadata_uuid(), we set the flag
BTRFS_SUPER_FLAG_CHANGING_FSID_V2 in step 1 at line 71 as shown below:

   71         super_flags |= BTRFS_SUPER_FLAG_CHANGING_FSID_V2;
   72         btrfs_set_super_flags(disk_super, super_flags);
   73         ret = btrfs_commit_transaction(trans, root);

However, we fail to reset this flag if there is no change in the fsid on
the incoming disks, as we return too early.

  105       } else {
  106               /* Setting the same fsid as current, do nothing */
  107               return 0;

Fix this by allowing the thread to pass through the step 2, where we
reset the flag.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-28 17:24:58 +02:00
Anand Jain fd3c4c4df1 btrfs-progs: tune: pass metadata_uuid in check_unfinished_fsid_change
In preparation to use check_unfinished_fsid_change() to support the
ability to reunite devices after a failed 'btrfstune -m|M' command,
rename %unused2 to %metadata_uuid as the function
check_unfinished_fsid_change() write the metadata_uuid from the ctree to
it.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-28 17:24:58 +02:00
Anand Jain aff1f16deb btrfs-progs: tune: pass fsid in check_unfinished_fsid_change arg2
In preparation to use check_unfinished_fsid_change() to support the
ability to reunite devices after a failed 'btrfstune -m|M' command,
delete unused1 argument instead reuse %fsid as the function
check_unfinished_fsid_change() returns the fsid.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-28 17:24:58 +02:00
Anand Jain 70bedfa68a btrfs-progs: tune: rename uuid_changed to fsid_changed in set_metadata_uuid
We never change the metadata_uuid; we only change the fsid.  So
'%fsid_changed' flows more appropriately than '%uuid_changed'.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-28 17:24:58 +02:00
Anand Jain c6ecbe54bf btrfs-progs: tune: rename new_uuid to new_fsid in set_metadata_uuid
%new_uuid is being used to say there is a new fsid. So why not just call
it %new_fsid.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-28 17:24:58 +02:00
Anand Jain 21440a64f7 btrfs-progs: tune: rename new_fsid to fsid in set_metadata_uuid
The %new_fsid is not only new it can be the fsid from the passed disk
so just rename it to %fsid. Also, in the next patch the %new_fsid will
be a bool variable to indicate if the %fsid is new from the fsid in the
disk.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-28 17:24:58 +02:00
Anand Jain 9e9ac97902 btrfs-progs: tune: rename arg to new_fsid_str in set_metadata_uuid
In preparation to use check_unfinished_fsid_change() to support the
ability to reunite devices after a failed 'btrfstune -m|M' command,
%uuid_string arg is actually carries new fsid to be used. So just name
it to %new_fsid_str.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-28 17:24:58 +02:00
Anand Jain 3aa3342959 btrfs-progs: tune: cache local variable of fs_info
Since the root pointer dereferences for the fs_info several times,
it is rational to save the fs_info.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-28 17:24:56 +02:00
David Sterba 2b35741a39 btrfs-progs: fix more typos found by codespell
Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-28 17:24:24 +02:00
Anand Jain 59fe1f46fe btrfs-progs: btrfstune: consolidate error handling in main()
The upcoming "--device" option requires memory to parse devices, which
should be freed before returning from the main() function. As a
preparation for adding the "--device" option to the "btrfstune" command,
provide a consolidated error return exit from the main function with a
"goto" labeled "free_out". The label "free_out" may not make sense
currently, but it will when the "--device" option is added.

There are several return statements within the main function, and
changing all of them in the main "--device" feature patch would deviate
from the actual for the feature changes. Hence, it made sense to create
a preparatory patch.

The return value for any failure remains the same as in the original code.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-28 17:24:23 +02:00
Anand Jain 9ce233965f btrfs-progs: track active metadata_uuid per fs_devices
When the fsid does not match the metadata_uuid, the METADATA_UUID flag
is set in the superblock.

Changing the fsid using the btrfstune -U|-u option is not possible on a
filesystem with the METADATA_UUID flag set. But we are checking the
METADATA_UUID only from the super_copy, and not from the other scanned
device.

To fix this bug, track the metadata_uuid at the fs_devices level instead
of checking it only on the specified device in the argument, and use it.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-28 17:24:22 +02:00
Anand Jain cb53874f6d btrfs-progs: track changing_fsid flag in fs_devices
To prepare for reuniting separated devices due to an incomplete fsid
change task, consolidate and monitor the per device's changing_fsid flag
in the struct btrfs_fs_devices.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-23 19:36:31 +02:00
Anand Jain 19c337ad9a btrfs-progs: btrfstune: check for missing devices
If btrfstune is executed on a filesystem that contains a missing device,
the command will now fail.

It is ok to fail when any of the options supported by btrfstune are
used, the filesystem devices should be all available.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-23 19:36:31 +02:00
Anand Jain 485fceec8d btrfs-progs: register device after successfully changing the fsid
Testing with the fstests config option POST_MKFS_CMD="btrfstune -m"
reported failure, as shown below:

  ./check btrfs/003

  [111.635618] BTRFS: device fsid a6599a65-8b6d-4156-bb55-0a3a2f0eae9d devid 1 transid 6 /dev/sdb2 scanned by systemd-udevd (1117)
  [111.642199] BTRFS: device fsid a6599a65-8b6d-4156-bb55-0a3a2f0eae9d devid 2 transid 6 /dev/sdb3 scanned by systemd-udevd (1114)
  [111.660882] BTRFS: device fsid a6599a65-8b6d-4156-bb55-0a3a2f0eae9d devid 3 transid 6 /dev/sdb5 scanned by systemd-udevd (1116)
  [111.672623] BTRFS: device fsid a6599a65-8b6d-4156-bb55-0a3a2f0eae9d devid 4 transid 6 /dev/sdb6 scanned by systemd-udevd (993)
  [111.701301] BTRFS: device fsid a6599a65-8b6d-4156-bb55-0a3a2f0eae9d devid 6 transid 6 /dev/sdb8 scanned by systemd-udevd (1080)
  [111.706513] BTRFS: device fsid a6599a65-8b6d-4156-bb55-0a3a2f0eae9d devid 5 transid 6 /dev/sdb7 scanned by systemd-udevd (1117)
  [111.716532] BTRFS: device fsid a6599a65-8b6d-4156-bb55-0a3a2f0eae9d devid 7 transid 6 /dev/sdb9 scanned by systemd-udevd (1114)
  [111.721253] BTRFS: device fsid a6599a65-8b6d-4156-bb55-0a3a2f0eae9d devid 8 transid 6 /dev/sdb10 scanned by mkfs.btrfs (1504)
  [112.405186] BTRFS: device fsid 1b3bacbf-14db-49c9-a3ef-547998aacc4e devid 4 transid 8 /dev/sdb6 scanned by systemd-udevd (1117)
  [112.422104] BTRFS: device fsid 1b3bacbf-14db-49c9-a3ef-547998aacc4e devid 6 transid 8 /dev/sdb8 scanned by systemd-udevd (1523)
  [112.448355] BTRFS: device fsid 1b3bacbf-14db-49c9-a3ef-547998aacc4e devid 1 transid 8 /dev/sdb2 scanned by systemd-udevd (1115)
  [112.456126] BTRFS error: device /dev/sdb3 belongs to fsid 1b3bacbf-14db-49c9-a3ef-547998aacc4e, and the fs is already mounted
  [112.461299] BTRFS error: device /dev/sdb7 belongs to fsid 1b3bacbf-14db-49c9-a3ef-547998aacc4e, and the fs is already mounted
  [112.465690] BTRFS info (device sdb2): using crc32c (crc32c-generic) checksum algorithm
  [112.468758] BTRFS info (device sdb2): using free space tree
  [112.471318] BTRFS error: device /dev/sdb9 belongs to fsid 1b3bacbf-14db-49c9-a3ef-547998aacc4e, and the fs is already mounted
  [112.475962] BTRFS error: device /dev/sdb10 belongs to fsid 1b3bacbf-14db-49c9-a3ef-547998aacc4e, and the fs is already mounted
  [112.481934] BTRFS error: device /dev/sdb5 belongs to fsid 1b3bacbf-14db-49c9-a3ef-547998aacc4e, and the fs is already mounted
  [112.494614] BTRFS error (device sdb2): devid 2 uuid 99a57db7-2ef6-4240-a700-07ee7e64fb36 is missing
  [112.497834] BTRFS error (device sdb2): failed to read chunk tree: -2
  [112.507705] BTRFS error (device sdb2): open_ctree failed

The original fsid created by mkfs was a6599a65-8b6d-4156-bb55-0a3a2f0eae9d,
and the fsid created by the btrfstune -m option was
1b3bacbf-14db-49c9-a3ef-547998aacc4e.

During mount (after btrfstune -m), only 3 out of 8 devices were scanned
by systemd, while the rest were still being discovered. Consequently, the
mount command raced to find the missing devices. Since the mount command
in the kernel sets the flag fsdevices::opened, any further new alloc_device()
were blocked, resulting in the error "the fs is already mounted."

It is a good idea to register all devices after changing the fsid.
The previous registrations are already stale after changing the fsid.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-11 19:59:17 +02:00
David Sterba 316f221908 btrfs-progs: btrfstune: add injection points to btrfsune --csum
Add points after each commit when changing the checksum type.

Signed-off-by: David Sterba <dsterba@suse.com>
2023-07-27 14:45:29 +02:00
Anand Jain 77f366c9da btrfs-progs: add noscan parameter to check_where_mounted
The function check_where_mounted() scans the system for all other btrfs
devices, which is necessary for its operation.  However, in certain
cases, devices remaining in the scanned state is undesirable.  Introduce
the 'noscan' argument to make devices unscanned before return.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-07-26 15:00:48 +02:00
Qu Wenruo 4d5711f4dd btrfs-progs: tune: reject csum change if the fs is already using the target csum type
Currently "btrfstune --csum" allows us to change the csum to the same
one, this is good for testing but not good for end users, as if the end
user interrupts it, they have to resume the change (even it's to the
same csum type) until it finished, or kernel would reject such fs.

Furthermore, we never change the super block csum type until we
completely changed the csum type, thus for resume cases, the fs would
still show up as using the old csum type thus won't cause any problem
resuming.

So here we just reject the csum conversion if the target csum type is
the same as the existing csum type.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:33 +02:00
Qu Wenruo 03b4952971 btrfs-progs: tune: implement resume support for data csum objectid change
When the csum conversion is interrupted when changing data csum
objectid, we should just resume the objectid conversion.

This situation can be detected by comparing the old and new csum items.
They should both exist but doesn't intersect (interrupted halfway), or
only new csum items exist (interrupted after we have deleted old csums).

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo 1aa4a9c609 btrfs-progs: tune: implement resume support for half deleted old csums
If the csum conversion is interrupted when old csums are being deleted,
we should resume by continue deleting the old csums.

The function delete_old_data_csums() can handle half deleted cases
already.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo 7f077e3173 btrfs-progs: tune: implement resume support for empty csum tree
We have a very rare chance to hit a fs with empty csum tree but still
has CHANGING_DATA_CSUM flag.

The window is very small, but it's still possible, so handle it by
jumping directly to metadata csum change.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo 8f23ca0dd1 btrfs-progs: tune: implement resume support for csum tree without any new csum item
There are two possible situations where there is no new csum items at
all:

- We just inserted csum change item
  This means all csums are really old csum type, and we can start
  the conversion from the beginning, only need to skip the csum change
  item insert.

- We finished data csum conversion but not yet started metadata
  conversion
  This means all csums are already new csum type, and we can resume
  by starting changing metadata csums.

To distinguish the two cases, we need to read the first sector, and
verify the data content against both csum types.

If the csum matches with old csum type, we resume from generating new
data csum.
If the csum matches with new csum type, we resume from rewriting
metadata csum.
If the csum doesn't match either csum type, we have some big problems
then.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo ba965712b6 btrfs-progs: tune: implement resume support for generating new data csum
If a csum change is interrupted at new data csum generation stage, we
can detect such situation by checking the old and new csum items.

At the new data csum generation stage, old csums are untouched, and only
new csums items (with different objectid) are inserted into the csum
tree.

Thus the old csum items should cover a larger range, while the new csum
items should be a subset of the old csums.

The resume part would start by re-generating the remaining part, then go
through the conversion stages.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo 9ff505e738 btrfs-progs: tune: implement resume support for metadata checksum
For interrupted metadata checksum change, we only need to call the same
change_meta_csums().

Since we don't have any record on the last converted metadata, thus we
have to go through all metadata anyway.
And the existing change_meta_csums() has already implemented the needed
checks to skip converted metadata.

Since we're here, also implement all the surrounding checks, like making
sure the new target csum type matches the interrupted one.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo cd877c2d23 btrfs-progs: tune: delete the csum change item after converting the fs
[BUG]
Doing the following csum change in a row, it would fail:

 # mkfs.btrfs -f --csum crc32c $dev
 # btrfstune --csum sha256 $dev
 # btrfstune --csum crc32c $dev
 # btrfstune --csum sha256 $dev
 WARNING: Experimental build with unstable or unfinished features
 WARNING: Switching checksums is experimental, do not use for valuable data!

 Proceed to switch checksums
 ERROR: failed to insert csum change item: File exists
 ERROR: failed to generate new data csums: File exists
 WARNING: reserved space leaked, flag=0x4 bytes_reserved=16384
 extent buffer leak: start 30572544 len 16384
 extent buffer leak: start 30441472 len 16384
 WARNING: dirty eb leak (aborted trans): start 30441472 len 16384

[CAUSE]
During every csum change operation, btrfstune would insert an temporaray
csum change item into root tree.

But unfortunately after the conversion btrfstune doesn't properly delete
the csum change item, result the following items in the root tree:

	item 10 key (CSUM_CHANGE TEMPORARY_ITEM 0) itemoff 13423 itemsize 0
		temporary item objectid CSUM_CHANGE offset 0
		target csum type crc32c (0)
	item 11 key (CSUM_CHANGE TEMPORARY_ITEM 2) itemoff 13423 itemsize 0
		temporary item objectid CSUM_CHANGE offset 2
		target csum type sha256 (2)

Thus at the last conversion try to go back to SHA256, we failed to
insert the same item, and caused the above error.

[FIX]
After finishing the metadata csum conversion, do a proper removal of the
csum item.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo e7b79adf5a btrfs-progs: tune: add the ability to change metadata csums
The csum change for metadata is like uuid-change, we go with in-place
csum update without any COW.

During the rewrite, we will manually check the csum (both old and new)
for each tree block.
And only rewrite the csum if the tree block matches its old csum.
(For tree block matches its new csum, we need to do nothing).

And when everything is done, just update the superblock to reflect the
csum type change.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo 87016bffbb btrfs-progs: tune: add the ability to migrate the temporary csum items to regular csum items
At this stage, the csum tree should only contain the temporary csum
items (CSUM_CHANGE, EXTENT_CSUM, logical), and no more old csum items.

Now we can convert those temporary csum items back to regular csum items
by changing their key objectids back to EXTENT_CSUM.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo 4cf02fbc84 btrfs-progs: tune: add the ability to delete old data csums
The new helper function, delete_old_data_csums(), would delete the old
data csums while keep the new one untouched.

Since the new data csums have a key objectid (-13) smaller than the
old data csums (-10), we can safely delete from the tail of the btree.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo 08a3bd7694 btrfs-progs: tune: add the ability to generate new data checksums
This patch would modify btrfs_csum_file_block() to handle csum type
other than the one used in the current fs.

The new data checksum would use a different objectid (-13) to
distinguish with the existing one (-10).
This needs to change tree-checker to skip the item size checks,
since new csum can be larger than the original csum.

After this stage, the resulted csum tree would look like this:

	item 0 key (CSUM_CHANGE EXTENT_CSUM 13631488) itemoff 8091 itemsize 8192
		range start 13631488 end 22020096 length 8388608
	item 1 key (EXTENT_CSUM EXTENT_CSUM 13631488) itemoff 7067 itemsize 1024
		range start 13631488 end 14680064 length 1048576

Note the itemsize is 8 times the original one, as the original csum is
CRC32, while target csum is SHA256, which is 8 times the size.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo 2c02b799b1 btrfs-progs: tune: add the ability to read and verify the data before generating new checksum
This patch introduces a new helper function,
read_verify_one_data_sector(), to do the data read and checksum
verification (against the old csum).

This data would be later re-used to generate a new csum.

And since we're introduce the helper function, we also build the
skeleton to iterate the data extents using the old csum tree.

This method is much better compared to iterating using extent tree,
which has no directly indicator on whether the data extent has csum or
not (nodatasum or preallocated).

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
Qu Wenruo 7dd3cda377 btrfs-progs: tune: implement the prerequisite checks for csum change
The overall idea is to make sure no running operations (balance,
dev-replace, dirty log) for the fs before csum change.

And also reject half converted csums for now.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00