Commit Graph

256 Commits

Author SHA1 Message Date
David Sterba
9be33f558c btrfs-progs: tune: update checksum conversion
The checksum conversion is still experimental and still does not convert
all filesystems correctly. Do not use on valuable data.

Previous implementation copied the UUID conversion which was not a good
base for the checksum conversion so it left out basically all trees
except extent and checksum.

This update adds the base for the required safety features:

- let the old csum tree intact until the full conversion is done (ie.
  all data are still verifiable)
- add on-disk status tracking item, this should keep the from/to
  checksum conversion, last generation to catch potential updates of the
  underlying filesystem if conversion is interrupted and the filesystem
  mounted
- convert most of the fundamental trees, the subvolumes, tree log and
  relocation trees are not converted
- trees are converted in-place to avoid potentially running out of space
  but this might be better done by transaction protection with a
  temporary tree

Known issues:

- not all trees are converted
- not all checksums are correctly inserted into the new tree and reading
  the files leads to EIO

Issue: #438
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-28 20:11:22 +01:00
Qu Wenruo
f914949b1a btrfs-progs: fix set but not used variables
[WARNING]
Clang 15.0.7 warns about several unused variables:

  kernel-shared/zoned.c:829:6: warning: variable 'num_sequential' set but not used [-Wunused-but-set-variable]
          u32 num_sequential = 0, num_conventional = 0;
              ^
  cmds/scrub.c:1174:6: warning: variable 'n_skip' set but not used [-Wunused-but-set-variable]
          int n_skip = 0;
              ^
  mkfs/main.c:493:6: warning: variable 'total_block_count' set but not used [-Wunused-but-set-variable]
          u64 total_block_count = 0;
              ^
  image/main.c:2246:6: warning: variable 'bytenr' set but not used [-Wunused-but-set-variable]
          u64 bytenr = 0;
              ^

[CAUSE]
Most of them are just straightforward set but not used variables.

The only exception is total_block_count, which has commented out code
relying on it.

[FIX]
Just remove those variables, and for @total_block_count, also remove the
comments.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-18 17:44:03 +01:00
Qu Wenruo
3a1d4aa089 btrfs-progs: fix fallthrough cases with proper attributes
[FALSE ALERT]
Unlike gcc, clang doesn't really understand the comments, thus it's
reportings tons of fall through related errors:

  cmds/reflink.c:124:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
                  case 'r':
                  ^
  cmds/reflink.c:124:3: note: insert '__attribute__((fallthrough));' to silence this warning
                  case 'r':
                  ^
                  __attribute__((fallthrough));
  cmds/reflink.c:124:3: note: insert 'break;' to avoid fall-through
                  case 'r':
                  ^
                  break;

[CAUSE]
Although gcc is fine with /* fallthrough */ comments, clang is not.

[FIX]
So just introduce a fallthrough macro to handle the situation properly,
and use that macro instead.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-18 17:44:02 +01:00
Qu Wenruo
66dad3a8f5 btrfs-progs: fix a false alert on an uninitialized variable when BUG_ON() is involved
[FALSE ALERT]
Clang 15.0.7 gives the following false alert in get_dev_extent_len():

  kernel-shared/extent-tree.c:3328:2: warning: variable 'div' is used uninitialized whenever switch default is taken [-Wsometimes-uninitialized]
          default:
          ^~~~~~~
  kernel-shared/extent-tree.c:3332:24: note: uninitialized use occurs here
          return map->ce.size / div;
                                ^~~
  kernel-shared/extent-tree.c:3311:9: note: initialize the variable 'div' to silence this warning
          int div;
                 ^
                  = 0

And one in btrfs_stripe_length() too:

  kernel-shared/volumes.c:2781:2: warning: variable 'stripe_len' is used uninitialized whenever switch default is taken [-Wsometimes-uninitialized]
          default:
          ^~~~~~~
  kernel-shared/volumes.c:2785:9: note: uninitialized use occurs here
          return stripe_len;
                 ^~~~~~~~~~
  kernel-shared/volumes.c:2754:16: note: initialize the variable 'stripe_len' to silence this warning
          u64 stripe_len;
                        ^
                         = 0

[CAUSE]
Clang doesn't really understand what BUG_ON() means, thus in that
default case, we won't get uninitialized value but crash directly.

[FIX]
Silent the errors by assigning the default value properly using the
value of SINGLE profile.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-18 17:44:02 +01:00
Qu Wenruo
e2806fd624 btrfs-progs: remove an unnecessary branch to silent the clang warning
[FALSE ALERT]
With clang 15.0.7, there is a false alert on uninitialized value in
ctree.c:

  kernel-shared/ctree.c:3418:13: warning: variable 'offset' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
          } else if (ret < 0) {
                     ^~~~~~~
  kernel-shared/ctree.c:3428:41: note: uninitialized use occurs here
          write_extent_buffer(eb, &subvol_id_le, offset, sizeof(subvol_id_le));
                                                 ^~~~~~
  kernel-shared/ctree.c:3418:9: note: remove the 'if' if its condition is always true
          } else if (ret < 0) {
                 ^~~~~~~~~~~~~
  kernel-shared/ctree.c:3380:22: note: initialize the variable 'offset' to silence this warning
          unsigned long offset;
                              ^
                               = 0
  kernel-shared/ctree.c:3418:13: warning: variable 'eb' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
          } else if (ret < 0) {
                     ^~~~~~~
  kernel-shared/ctree.c:3429:26: note: uninitialized use occurs here
          btrfs_mark_buffer_dirty(eb);
                                  ^~
  kernel-shared/ctree.c:3418:9: note: remove the 'if' if its condition is always true
          } else if (ret < 0) {
                 ^~~~~~~~~~~~~
  kernel-shared/ctree.c:3378:26: note: initialize the variable 'eb' to silence this warning
          struct extent_buffer *eb;
                                  ^
                                   = NULL

[CAUSE]
The original code is handling the return value from
btrfs_insert_empty_item() like this:

	ret = btrfs_insert_empty_item();
	if (ret >= 0) {
		/* Do something for it. */
	} else if (ret == -EEXIST) {
		/* Do something else. */
	} else if (ret < 0) {
		/* Error handling. */
	}

But the problem is, the last one check is always true if we can reach
there.

Thus clang is providing the hint to remove the if () check.

[FIX]
Normally we prefer to do error handling first, so move the error
handling first so we don't need the if () else if () chain.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-18 17:44:02 +01:00
David Sterba
73efc566d5 btrfs-progs: kerncompat: hide definition of __init
As kerncompat.h is included from all libbtrfs headers we must be careful
about generic names like __init, in this case it breaks build of
snapper.

Signed-off-by: David Sterba <dsterba@suse.com>
2023-01-03 17:24:36 +01:00
David Sterba
e78fe2d92e Revert "btrfs-progs: rename qgroup items to match the kernel naming scheme"
This reverts commit 03451430de.
(It's not 1:1, there are some additional trivial fixups in cmds/qgroup.c)

This breaks a lot of 3rd party tools that depend on it as Neal reports:

* btrfs-assistant
* buildah
* cri-o
* podman
* skopeo
* containerd
* moby/docker
* snapper
* source-to-image

Link: https://lore.kernel.org/linux-btrfs/CAEg-Je8L7jieKdoWoZBuBZ6RdXwvwrx04AB0fOZF1fr5Pb-o1g@mail.gmail.com/
Reported-by: Neal Gompa <ngompa@fedoraproject.org>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-01-03 13:10:54 +01:00
Josef Bacik
e380421ff2 btrfs-progs: make write_extent_buffer take a const eb
This is what we do in the kernel, and while we're syncing individual
files we're going to have state where some callers are using a const,
but progs isn't.  So adjust write_extent_buffer to take a const eb in
order to make this less painful.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-30 19:14:29 +01:00
Josef Bacik
405b5cadd3 btrfs-progs: replace btrfs_leaf_data with btrfs_item_nr_offset
We're using btrfs_item_nr_offset(leaf, 0) to get the start of the leaf
data in the kernel, we don't have btrfs_leaf_data.  Replace all
occurrences of btrfs_leaf_data() with btrfs_item_nr_offset(leaf, 0) in
order to make syncing accessors.[ch] easier.  ctree.c will be synced
later, so this is simply an intermediate step.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-30 19:14:29 +01:00
Josef Bacik
788a71c16a btrfs-progs: sync compression.h from the kernel
This patch copies in compression.h from the kernel.  This is relatively
straightforward, we just have to drop the compression types definition
from ctree.h, and update the image to use BTRFS_NR_COMPRESS_TYPES
instead of BTRFS_COMPRESS_LAST, and add a few things to kerncompat.h to
make everything build smoothly.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-30 19:14:29 +01:00
Josef Bacik
fac1fae3ef btrfs-progs: rename extent buffer flags to EXTENT_BUFFER_*
We have been overloading the extent_state flags for use on the extent
buffers as well.  When we sync extent-io-tree.[ch] this will become
impossible, so rename these flags to EXTENT_BUFFER_* and use those
definitions instead of the extent_state definitions.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-28 18:57:44 +01:00
Josef Bacik
83cc5a5489 btrfs-progs: delete state_private code
We used to store random private things into extent_states, but we
haven't done this for a while and there are no users of this code,
simply delete it.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-28 18:57:44 +01:00
Josef Bacik
20d88c17e7 btrfs-progs: move extent cache code directly into btrfs_fs_info
We have some extra features in the btrfs-progs copy of the
extent_io_tree that don't exist in the kernel.  In order to make syncing
easier simply move this functionality into btrfs_fs_info, that way we
can sync in the new extent_io_tree code and not have to worry about
breaking anything.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-28 18:57:44 +01:00
Josef Bacik
412eea9e97 btrfs-progs: do not pass io_tree into verify_parent_transid
We do not use the io_tree, don't bother passing it into
verify_parent_transid.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-28 18:57:44 +01:00
Josef Bacik
ccee633f3a btrfs-progs: move dirty eb tracking to it's own io_tree
btrfs-progs has a cache tree embedded in the extent_io_tree in order to
track extent buffers.  We use the extent_io_tree part to track dirty,
and the cache tree to keep the extent buffers in.  When we sync
extent-io-tree.[ch] we'll lose this ability, so separate out the dirty
tracking into its own extent_io_tree.  Subsequent patches will adjust
the extent buffer lookup so it doesn't use the custom extent_io_tree
thing.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-28 18:57:43 +01:00
Josef Bacik
af30cf2e3e btrfs-progs: make the find extent buffer helpers take fs_info
This is a cleanup patch to make syncing the btrfs kernel code into
btrfs-progs easier.  In btrfs-progs we have an extra cache in the
extent_io_tree that's exclusively used for the extent buffer tracking.
In order to untangle this dependency start passing around the fs_info to
search for extent_buffers, and then have the helpers use the appropriate
structure to find the extent buffer.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-28 18:57:43 +01:00
Josef Bacik
2b9fba7974 btrfs-progs: rename btrfs_item_end to btrfs_item_data_end
This matches what we did in the kernel, btrfs_item_data_end is more
inline with what the helper does, which is give us the offset of the end
of the data portion of the item, not the offset of the end of the item
itself.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-28 18:57:43 +01:00
Josef Bacik
53f8b8fd01 btrfs-progs: make btrfs_qgroup_level helper match the kernel
We return __u16 in the kernel, as this is actually the size of
btrfs_qgroup_level.  Adjust the existing helpers and update all the
callers to deal with the new size appropriately.  This will make syncing
the kernel code cleaner.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-28 18:57:43 +01:00
Josef Bacik
03451430de btrfs-progs: rename qgroup items to match the kernel naming scheme
We're going to sync the kernel source into btrfs-progs, and in the
kernel we have all these qgroup fields named with short names instead of
the full name, so rename

referenced -> rfer
compressed -> cmpr
exclusive -> excl

to match the kernel and update all the users, this will make the sync
cleaner.

ioctl.h is a public header but there are no users of the
btrfs_qgroup_limit structure.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-28 18:57:43 +01:00
Qu Wenruo
2aa4085bf7 btrfs-progs: properly handle degraded raid56 reads
[BUG]
For a degraded RAID5, btrfs check will fail to even read the chunk root:

  # mkfs.btrfs -f -m raid5 -d raid5 $dev1 $dev2 $dev3
  # wipefs -fa $dev1
  # btrfs check $dev2
  Opening filesystem to check...
  warning, device 1 is missing
  bad tree block 22036480, bytenr mismatch, want=22036480, have=0
  ERROR: cannot read chunk root
  ERROR: cannot open file system

[CAUSE]
Although read_tree_block() function from btrfs-progs is properly
iterating the mirrors (mirror 1 is reading from the disk directly,
mirror 2 will be rebuild from parity), the raid56 recovery path is not
handling the read error correctly.

The existing code will try to read the full stripe, but any read failure
(including missing device) will immediately cause an error:

	for (i = 0; i < num_stripes; i++) {
		ret = btrfs_pread(multi->stripes[i].dev->fd, pointers[i],
				  BTRFS_STRIPE_LEN, multi->stripes[i].physical,
				  fs_info->zoned);
		if (ret < BTRFS_STRIPE_LEN) {
			ret = -EIO;
			goto out;
		}
	}

[FIX]
To make failed_a/failed_b calculation much easier, and properly handle
too many missing devices, here this patch will introduce a new bitmap
based solution.

The new @failed_stripe_bitmap will represent all the failed stripes.

So the initial read will mark all the missing devices in the
@failed_stripe_bitmap, and later operations will all operate on that
bitmap.

Only before we call raid56_recov(), we convert the bitmap to the old
failed_a/failed_b interface and continue.

Now btrfs check can handle above case properly:

  # btrfs check $dev2
  Opening filesystem to check...
  warning, device 1 is missing
  Checking filesystem on /dev/test/scratch2
  UUID: 8b2e1cb4-f35b-4856-9b11-262d39d8458b
  [1/7] checking root items
  [2/7] checking extents
  [3/7] checking free space tree
  [4/7] checking fs roots
  [5/7] checking only csums items (without verifying data)
  [6/7] checking root refs
  [7/7] checking quota groups skipped (not enabled on this FS)
  found 147456 bytes used, no error found
  total csum bytes: 0
  total tree bytes: 147456
  total fs tree bytes: 32768
  total extent tree bytes: 16384
  btree space waste bytes: 139871
  file data blocks allocated: 0
   referenced 0

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-24 17:29:12 +01:00
David Sterba
1414ecbb6f btrfs-progs: replace strerror(errno) with %m in printf formats
We're using the %m format where possible.

Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-26 09:46:22 +02:00
David Sterba
44efde7fa0 btrfs-progs: unify naming of qgroup subvolid helpers
Kernel function name is btrfs_qgroup_subvolid so rename it in progs. The
libbtrfs can't API be changed without versioning so at least add the new
helper.

Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-26 09:36:44 +02:00
Qu Wenruo
720819745c btrfs-progs: print-tree: follow the supported flags when printing flags
Currently we put EXTENT_TREE_V2 incompat flag entry under EXPERIMENTAL
features, thus at compile time, incompat_flags_array[] is determined at
compile time.

But the truth is, we have @supported_flags for __print_readable_flag(),
which is already defined based on EXPERIMENTAL flag.

Thus for __print_readable_flag(), we can always include the entry for
EXTENT_TREE_V2, and only print the flag if it's in the @supported_flags

By this, we can remove one EXPERIMENTAL ifdef.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-18 16:03:43 +02:00
Anand Jain
c802b02988 btrfs-progs: dump-super: add extent-tree-v2
The extent-tree-v2 is still experimental but it should be printed among
the other incompat flags if enabled by the build.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:12 +02:00
Qu Wenruo
2cdc8dddbf btrfs-progs: mkfs: offset inode numbers of the source filesystem
[BUG]
When running mkfs tests on a newly rebooted minimal system, it can cause
mkfs/009 to fail.

The reproduce steps requires /tmp to has minimal files in the first
place.

  # mkdir /tmp/rootdir
  # xfs_io -f -c "pwrite 0 16k" /tmp/rootdir
  # mkfs.btrfs --rootdir /tmp/rootdir -f $dev
  # btrfs check $dev
  Opening filesystem to check...
  Checking filesystem on /dev/test/scratch1
  UUID: 6821b3db-f056-4c18-b797-32679dcd4272
  [1/7] checking root items
  [2/7] checking extents
  data backref 13631488 root 5 owner 170 offset 0 num_refs 0 not found in extent tree
  incorrect local backref count on 13631488 root 5 owner 170 offset 0 found 1 wanted 0 back 0x55ff6cd72260
  backref 13631488 root 5 not referenced back 0x55ff6cd4c1f0
  incorrect global backref count on 13631488 found 2 wanted 1
  backpointer mismatch on [13631488 16384]
  ERROR: errors found in extent allocation tree or chunk allocation

[CAUSE]
The extent tree has the following weird item:

	item 0 key (13631488 EXTENT_ITEM 16384) itemoff 16250 itemsize 33
		refs 1 gen 0 flags DATA
		tree block backref root FS_TREE

This is an extent item for data, thus it should not have an inline tree
backref.

Then checking the fs tree:

	item 0 key (170 INODE_ITEM 0) itemoff 16123 itemsize 160
		generation 7 transid 0 size 16384 nbytes 16384
		block group 0 mode 100600 links 1 uid 1000 gid 1000 rdev 0
		sequence 0 flags 0x0(none)
		atime 1664866393.0 (2022-10-04 14:53:13)
		ctime 1664863510.0 (2022-10-04 14:05:10)
		mtime 1664863455.0 (2022-10-04 14:04:15)
		otime 0.0 (1970-01-01 08:00:00)

There is an inode item before the root dir inode.

And that inode number 170 is causing the problem.

In traverse_directory(), we use the inode number reported from stat()
directly as btrfs inode number, and pass it to
btrfs_record_file_extent(), which finally calls btrfs_inc_extent_ref(),
with above 170 passed as @owner parameter.

But inside btrfs_inc_extent_ref() we use that @owner value to determine
if it's a data backref.
Since we got a smaller than BTRFS_FIRST_FREE_OBJECTID, btrfs treats it
as tree block, and cause the above problem.

[FIX]
As a quick fix, always add BTRFS_FIRST_FREE_OBJECTID to all inode number
directly grabbed from stat().

And add an ASSERT() in __btrfs_record_file_extent() to catch unexpected
objectid.

This is not a perfect solution, as the resulted fs will has a huge gap
in its inodes:

	item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
	item 4 key (426 INODE_ITEM 0) itemoff 15883 itemsize 160

For a proper fix, we should allocate new btrfs inode numbers in a
sequential order, but that would be another series of patches.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:10 +02:00
Qu Wenruo
dad9db45bb btrfs-progs: properly initialize extent generation in __btrfs_record_file_extent()
[BUG]
When using mkfs.btrfs --rootdir option, the data extents generated will
have 0 as their generation in extent tree:

  # mkdir /tmp/rootdir
  # xfs_io -f -c "pwrite 0 16k" /tmp/rootdir/foobar
  # mkfs.btrfs -f --rootdir /tmp/rootdir $dev
  # btrfs ins dump-tree -t extent $dev
  btrfs-progs v5.19.1
  extent tree key (EXTENT_TREE ROOT_ITEM 0)
  leaf 30474240 items 13 free space 15536 generation 7 owner EXTENT_TREE
  leaf 30474240 flags 0x1(WRITTEN) backref revision 1
  fs uuid c1f05988-49f9-4dd4-8489-b90d60f522ee
  chunk uuid 40f81603-fe75-4f58-aa9e-e74e28df8523
	item 0 key (13631488 EXTENT_ITEM 16384) itemoff 16230 itemsize 53
		refs 1 gen 0 flags DATA <<< Generation is 0
  ...

[CAUSE]
In __btrfs_record_file_extent() we just set the extent generation to 0.

[FIX]
Use trans->transid to properly fill extent generation.

Now after mkfs, the first data extent backref looks like this:

	item 0 key (13631488 EXTENT_ITEM 16384) itemoff 16230 itemsize 53
		refs 1 gen 7 flags DATA
        ...

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:10 +02:00
David Sterba
ccb2d4aa45 btrfs-progs: device-utils: rename btrfs_device_size
There's a group of helpers to read device size, the btrfs_device_size
should be one of them. Rename it and so minor cleanup.

Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:10 +02:00
David Sterba
a827bb2db8 btrfs-progs: use template for transaction commit error messages
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:10 +02:00
David Sterba
8fcafae04a btrfs-progs: use template for transaction start error messages
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:10 +02:00
David Sterba
c2be0e2ce0 btrfs-progs: use template for out of memory error messages
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:09 +02:00
David Sterba
2267708bfe btrfs-progs: move repair.c from common/ to check/
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:09 +02:00
Qu Wenruo
08bb354a1c btrfs-progs: properly handle write error when writing back tree blocks
[BUG]
If we emulate a write error during commit transaction, by setting the
block device read-only, then we can easily have the following crash
using "btrfs check --clear-space-cache v2":

  Opening filesystem to check...
  Checking filesystem on /dev/test/scratch1
  UUID: 5945915b-37f1-4bfa-9f64-684b318b8f73
  Clear free space cache v2
  Error writing to device 1
  kernel-shared/transaction.c:156: __commit_transaction: BUG_ON `ret` triggered, value 1
  ./btrfs(+0x570c9)[0x562ec894f0c9]
  ./btrfs(+0x57167)[0x562ec894f167]
  ./btrfs(__commit_transaction+0x13b)[0x562ec894f7f2]
  ./btrfs(btrfs_commit_transaction+0x214)[0x562ec894fa64]
  ./btrfs(btrfs_clear_free_space_tree+0x177)[0x562ec8941ae6]
  ./btrfs(+0xc8958)[0x562ec89c0958]
  ./btrfs(+0xc9d53)[0x562ec89c1d53]
  ./btrfs(+0x17ec7)[0x562ec890fec7]
  ./btrfs(main+0x12f)[0x562ec8910908]
  /usr/lib/libc.so.6(+0x232d0)[0x7ff917ee82d0]
  /usr/lib/libc.so.6(__libc_start_main+0x8a)[0x7ff917ee838a]
  ./btrfs(_start+0x25)[0x562ec890fdc5]
  Aborted (core dumped)

[CAUSE]
The call trace has shown it's a BUG_ON(), and it's from
__commit_transaction(), which is writing tree blocks back.

[FIX]
The fix is pretty simple, just return error.

In fact we even have an error value check in btrfs_commit_transaction()
just after __commit_transaction() call (although not catching the return
value from it).

And since we're here, also call btrfs_abort_transaction() to prevent
newer transactions from being started.

Now we won't have a full crash:

  Opening filesystem to check...
  Checking filesystem on /dev/test/scratch1
  UUID: 5945915b-37f1-4bfa-9f64-684b318b8f73
  Clear free space cache v2
  Error writing to device 1
  ERROR: failed to write bytenr 30425088 length 16384: Operation not permitted
  ERROR: failed to write tree block 30425088: Operation not permitted
  ERROR: failed to clear free space cache v2: -1
  extent buffer leak: start 30720000 len 16384

Reported-by: Christoph Anton Mitterer <calestyo@scientia.org>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:08 +02:00
Qu Wenruo
75800c2fee btrfs-progs: remove duplicated leaked extent buffer report
[BUG]
When transaction is aborted halfway, we can have extent buffer leaked,
and in that case, the same leaked extent buffer can be reported for
multiple times:

  ERROR: failed to clear free space cache v2: -1
  extent buffer leak: start 30441472 len 16384
  WARNING: dirty eb leak (aborted trans): start 30441472 len 16384
  extent buffer leak: start 30720000 len 16384
  extent buffer leak: start 30425088 len 16384
  extent buffer leak: start 30425088 len 16384 << Duplicated
  WARNING: dirty eb leak (aborted trans): start 30425088 len 16384

Note that 30425088 line is reported twice (not accounting the "dirty eb
leak" line).

[CAUSE]
When we detected a leaked eb, we call free_extent_buffer_nocache(), but
free_extent_buffer_nocache() can only remove the eb when its reduced
refs is 0.

If the eb has refs 2, it will need two free_extent_buffer_nocache()
calls to remove it from the cache.

[FIX]
Just reset the eb->refs to 1 so that free_extent_buffer_nocache() can
remove it from cache for sure.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:08 +02:00
Qu Wenruo
811ae819e3 btrfs-progs: remove unused function extent_io_tree_init_cache_max()
The function was introduced by commit a5ce5d2198 ("btrfs-progs:
extent-cache: actually cache extent buffers") but never got utilized.
Thus we can just remove it.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:08 +02:00
Boris Burkov
980ba4e842 btrfs-progs: receive: add support for fs-verity
Process an enable_verity cmd by running the enable verity ioctl on the
file. Since enabling verity denies write access to the file, it is
important that we don't have any open write file descriptors.

This also revs the send stream format to version 3 with no format
changes besides the new commands and attributes. This version is not
finalized and commands may change, also this needs to be synchronized
with any kernel changes.

Note: the build is conditional on the header linux/fsverity.h

Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:08 +02:00
David Sterba
feef6aaaf6 btrfs-progs: kernel-lib: remove radix-tree
The radix-tree is not used in userspace code. In kernel it's for
tracking unpersisted and in-memory structures and has been replaced by
the xarray.

Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:07 +02:00
Qu Wenruo
d8f3355734 btrfs-progs: unexport csum_tree_block()
The function csum_tree_block() is not really utilized by anyone, all
current callers just use csum_tree_block_size().

Furthermore there is a stale definition in common/utils.h which is using
the old "struct btrfs_root" as the first argument, while we have already
migrated to "struct btrfs_fs_info".

So just unexport csum_tree_block() and remove the stale definition.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:06:11 +02:00
David Sterba
d5e15ba825 btrfs-progs: fix may be unused warning in load_free_space_extents
Some compilers warn about potentially unused variable, however the value
validity is guarded by have_prev so this can't happen and it's probably
insufficient analysis on the compiler side. Let's initialize the
prev_key to zeros that would also work as the condition.

  In file included from /usr/include/stdio.h:894,
		   from ./kerncompat.h:27,
		   from ./kernel-lib/list.h:23,
		   from ./kernel-shared/ctree.h:24,
		   from kernel-shared/free-space-tree.c:19:
  In function ‘fprintf’,
      inlined from ‘load_free_space_extents’ at kernel-shared/free-space-tree.c:1446:5,
      inlined from ‘load_free_space_tree’ at kernel-shared/free-space-tree.c:1577:9:
  /usr/include/bits/stdio2.h:105:10: warning: ‘prev_key.objectid’ may be used uninitialized [-Wmaybe-uninitialized]
    105 |   return __fprintf_chk (__stream, __USE_FORTIFY_LEVEL - 1, __fmt,
	|          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    106 |                         __va_arg_pack ());
	|                         ~~~~~~~~~~~~~~~~~
  kernel-shared/free-space-tree.c: In function ‘load_free_space_tree’:
  kernel-shared/free-space-tree.c:1398:31: note: ‘prev_key.objectid’ was declared here
   1398 |         struct btrfs_key key, prev_key;

Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:06:11 +02:00
Qu Wenruo
2f2f6bfe17 btrfs-progs: btrfstune: add the ability to convert to block group tree feature
The new '-b' option will be responsible for converting to block group
tree compat ro feature.

The workflow looks like this for new convert:

- Setting CHANGING_BG_TREE flag
  And initialize fs_info->last_converted_bg_bytenr value to (u64)-1.

  Any bg with bytenr >= last_converted_bg_bytenr will have its bg item
  update go to the new root (bg tree).

- Iterate each block group by their bytenr in descending order
  This involves:
  * Delete the old bg item from the old tree (extent tree)
  * Update last_converted_bg_bytenr to the bytenr of the bg
  * Add the new bg item into the new tree (bg tree)
  * If we have converted a bunch of bgs, commit current transaction

- Clear CHANGING_BG_TREE flag
  And set the new BLOCK_GROUP_TREE compat ro flag and commit.

And since we're doing the convert in multiple transactions, we also need
to resume from last interrupted convert.

In that case, we just grab the last unconverted bg, and start from it.

And to co-operate with the new kernel requirement for both no-holes and
free-space-tree features, the convert tool will check for
free-space-tree feature. If not enabled, will error out with an error
message to how to continue (by mounting with "-o space_cache=v2").

For missing no-holes feature, we just need to set the flag during
convert.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-12 18:25:32 +02:00
Qu Wenruo
1430b41427 btrfs-progs: separate block group tree from extent tree v2
Block group tree feature is completely a standalone feature, and it has
been over 5 years before the initial introduction to solve the long
mount time.

I don't really want to waste another 5 years waiting for a feature which
may or may not work, but definitely not properly reviewed for its
preparation patches.

So this patch will separate the block group tree feature into a
standalone compat RO feature.

There is a catch, in mkfs create_block_group_tree(), current
tree-checker only accepts block group item with valid chunk_objectid,
but the existing code from extent-tree-v2 didn't properly initialize it.

This patch will also fix above mentioned problem so kernel can mount it
correctly.

Now mkfs/fsck should be able to handle the fs with block group tree.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-12 18:25:32 +02:00
Qu Wenruo
c5a21a7814 btrfs-progs: don't save block group root into super block
The extent tree v2 (thankfully not yet fully materialized) needs a
new root for storing all block group items.

My initial proposal years ago just added a new tree rootid, and load it
from tree root, just like what we did for quota/free space tree/uuid/extent
roots.

But the extent tree v2 patches introduced a completely new (and to me,
wasteful) way to store block group tree root into super block.

Currently there are only 3 trees stored in super blocks, and they all
have their valid reasons:

- Chunk root
  Needed for bootstrap.

- Tree root
  Really the entrance of all trees.

- Log root
  This is special as log root has to be updated out of existing
  transaction mechanism.

There is not even any reason to put block group root into super blocks,
the block group tree is updated at the same timing as old extent tree,
no need for extra bootstrap/out-of-transaction update.

So just move block group root from super block into tree root.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-12 15:31:27 +02:00
Qu Wenruo
e47c34821f btrfs-progs: rescue: allow fix-device-size to shrink device item
If we found that the underlying block device size is smaller than
total_bytes in dev item, kernel will reject the mount, and there is no
progs tool to fix it.

Under most case it's just a small mismatch, and there is no dev extent
in the shrunk range.

In that case, we can let "btrfs rescue fix-device-size" to reset the
total_bytes in dev items to fix.

We add some extra checks, like to make sure there is no dev extent in
the shrunk device range, to make sure we won't lose data during the
device item shrink.

And also update the test case to verify the repaired fs can pass the
check.

Issue: #504
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-12 15:31:21 +02:00
Qu Wenruo
75fea7496c btrfs-progs: use write_data_to_disk() to handle RAID56 in write_and_map_eb()
Function write_data_to_disk() can handle RAID56 writes without any
problem.

So just call write_data_to_disk() inside write_and_map_eb() instead of
manually doing the RAID56 write.

Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-08-16 15:18:12 +02:00
Qu Wenruo
2060120201 btrfs-progs: fix a BUG_ON() condition for write_data_to_disk()
The BUG_ON() condition in write_data_to_disk() is no longer correct.

Now write_raid56_with_parity() will return the bytes written of last
stripe.

Thus a success writeback can trigger the BUG_ON(ret).

Fix the condition to (ret < 0).

Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-08-16 15:18:12 +02:00
Qu Wenruo
fc6925bfd3 btrfs-progs: avoid repeated data write for metadata
[BUG]
Shinichiro reported that "mkfs.btrfs -m DUP" is doing repeated write
into the device.
For non-zoned device this is not a big deal, but for zoned device this
is critical, as zoned device doesn't support overwrite at all.

[CAUSE]
The problem is related to write_and_map_eb() call, since commit
2a93728391 ("btrfs-progs: use write_data_to_disk() to replace
write_extent_to_disk()"), we call write_data_to_disk() for metadata
write back.

But the problem is, write_data_to_disk() will call btrfs_map_block()
with rw = WRITE.

By that btrfs_map_block() will always return all stripes, while in
write_data_to_disk() we also iterate through each mirror of the range.

This results above repeated writeback.

[FIX]
Fix this problem by completely remove @mirror argument
from write_data_to_disk().
With extra comments to explicitly show that function will write to
all mirrors.

Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: 2a93728391 ("btrfs-progs: use write_data_to_disk() to replace write_extent_to_disk()")
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-08-16 15:18:12 +02:00
Boris Burkov
ba7b281049 btrfs-progs: add VERITY ro compat flag
This compat flag is missing, but is being checked by mount, and could
well be present legitimately.

Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-08-16 15:18:11 +02:00
Su Yue
d0a99313e5 btrfs-progs: save item data end in u64 to avoid overflow in btrfs_check_leaf()
Similar to kernel check_leaf(), calling btrfs_item_end_nr() may get a
reasonable value even an item has invalid offset/size due to u32
overflow.

Fix it by use u64 variable to store item data end in btrfs_check_leaf()
to avoid u32 overflow.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215299
Reported-by: Wenqing Liu <wenqingliu0120@gmail.com>
Signed-off-by: Su Yue <l@damenly.su>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-08-16 15:18:11 +02:00
Qu Wenruo
963188943f btrfs-progs: make btrfs_super_block::log_root_transid deprecated
This is the same on-disk format update synchronized from the kernel
code.

Unlike kernel, there are two callers reading this member:

- btrfs inspect dump-super
  It's just printing the value, add a notice about deprecation.

- btrfs-find-root
  In that case, since we always got 0, the root search for log root
  should never find a perfect match.

  Use btrfs_super_geneartion() + 1 to provide a better result.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-08-16 15:18:11 +02:00
David Sterba
8356c423e6 btrfs-progs: receive: implement FILEATTR command
The initial proposal for file attributes was built on simply doing
SETFLAGS but this builds on an old and non-extensible interface that has
no direct mapping for all inode flags. There's a unified interface
fileattr that covers file attributes and xflags, it should be possible
to add new bits.

On the protocol level the value is copied as-is in the original inode
but this does not provide enough information how to apply the bits on
the receiving side. Eg. IMMUTABLE flag prevents any changes to the file
and has to be handled manually.

The receiving side does not apply the bits yet, only parses it from the
stream.

Signed-off-by: David Sterba <dsterba@suse.com>
2022-08-16 15:18:11 +02:00
Omar Sandoval
0ee5b22345 btrfs-progs: send: stream v2 ioctl flags
First, add a --proto option to allow specifying the desired send
protocol version. It defaults to one, the original version. In a couple
of releases once people are aware that protocol revisions are happening,
we can change it to default to zero, which means the latest version
supported by the kernel. This is based on Dave Sterba's patch.

Also add a --compressed-data flag to instruct the kernel to use
encoded_write commands for compressed extents. This requires an explicit
opt in separate from the protocol version because:

1. The user may not want compression on the receiving side, or may want
   a different compression algorithm/level on the receiving side.
2. It has a soft requirement for kernel support on the receiving side
   (btrfs-progs can fall back to decompressing and writing if the kernel
   doesn't support BTRFS_IOC_ENCODED_WRITE, but the user may not be
   prepared to pay that CPU cost). Going forward, since it's easier to
   update progs than the kernel, I think we'll want to make new send
   features that require kernel support opt-in, whereas anything that
   only requires a progs update can happen automatically.

Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-06-07 13:59:33 +02:00