Commit Graph

218 Commits

Author SHA1 Message Date
Qu Wenruo
2f2f6bfe17 btrfs-progs: btrfstune: add the ability to convert to block group tree feature
The new '-b' option will be responsible for converting to block group
tree compat ro feature.

The workflow looks like this for new convert:

- Setting CHANGING_BG_TREE flag
  And initialize fs_info->last_converted_bg_bytenr value to (u64)-1.

  Any bg with bytenr >= last_converted_bg_bytenr will have its bg item
  update go to the new root (bg tree).

- Iterate each block group by their bytenr in descending order
  This involves:
  * Delete the old bg item from the old tree (extent tree)
  * Update last_converted_bg_bytenr to the bytenr of the bg
  * Add the new bg item into the new tree (bg tree)
  * If we have converted a bunch of bgs, commit current transaction

- Clear CHANGING_BG_TREE flag
  And set the new BLOCK_GROUP_TREE compat ro flag and commit.

And since we're doing the convert in multiple transactions, we also need
to resume from last interrupted convert.

In that case, we just grab the last unconverted bg, and start from it.

And to co-operate with the new kernel requirement for both no-holes and
free-space-tree features, the convert tool will check for
free-space-tree feature. If not enabled, will error out with an error
message to how to continue (by mounting with "-o space_cache=v2").

For missing no-holes feature, we just need to set the flag during
convert.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-12 18:25:32 +02:00
Qu Wenruo
1430b41427 btrfs-progs: separate block group tree from extent tree v2
Block group tree feature is completely a standalone feature, and it has
been over 5 years before the initial introduction to solve the long
mount time.

I don't really want to waste another 5 years waiting for a feature which
may or may not work, but definitely not properly reviewed for its
preparation patches.

So this patch will separate the block group tree feature into a
standalone compat RO feature.

There is a catch, in mkfs create_block_group_tree(), current
tree-checker only accepts block group item with valid chunk_objectid,
but the existing code from extent-tree-v2 didn't properly initialize it.

This patch will also fix above mentioned problem so kernel can mount it
correctly.

Now mkfs/fsck should be able to handle the fs with block group tree.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-12 18:25:32 +02:00
Qu Wenruo
c5a21a7814 btrfs-progs: don't save block group root into super block
The extent tree v2 (thankfully not yet fully materialized) needs a
new root for storing all block group items.

My initial proposal years ago just added a new tree rootid, and load it
from tree root, just like what we did for quota/free space tree/uuid/extent
roots.

But the extent tree v2 patches introduced a completely new (and to me,
wasteful) way to store block group tree root into super block.

Currently there are only 3 trees stored in super blocks, and they all
have their valid reasons:

- Chunk root
  Needed for bootstrap.

- Tree root
  Really the entrance of all trees.

- Log root
  This is special as log root has to be updated out of existing
  transaction mechanism.

There is not even any reason to put block group root into super blocks,
the block group tree is updated at the same timing as old extent tree,
no need for extra bootstrap/out-of-transaction update.

So just move block group root from super block into tree root.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-12 15:31:27 +02:00
Qu Wenruo
e47c34821f btrfs-progs: rescue: allow fix-device-size to shrink device item
If we found that the underlying block device size is smaller than
total_bytes in dev item, kernel will reject the mount, and there is no
progs tool to fix it.

Under most case it's just a small mismatch, and there is no dev extent
in the shrunk range.

In that case, we can let "btrfs rescue fix-device-size" to reset the
total_bytes in dev items to fix.

We add some extra checks, like to make sure there is no dev extent in
the shrunk device range, to make sure we won't lose data during the
device item shrink.

And also update the test case to verify the repaired fs can pass the
check.

Issue: #504
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-12 15:31:21 +02:00
Qu Wenruo
75fea7496c btrfs-progs: use write_data_to_disk() to handle RAID56 in write_and_map_eb()
Function write_data_to_disk() can handle RAID56 writes without any
problem.

So just call write_data_to_disk() inside write_and_map_eb() instead of
manually doing the RAID56 write.

Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-08-16 15:18:12 +02:00
Qu Wenruo
2060120201 btrfs-progs: fix a BUG_ON() condition for write_data_to_disk()
The BUG_ON() condition in write_data_to_disk() is no longer correct.

Now write_raid56_with_parity() will return the bytes written of last
stripe.

Thus a success writeback can trigger the BUG_ON(ret).

Fix the condition to (ret < 0).

Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-08-16 15:18:12 +02:00
Qu Wenruo
fc6925bfd3 btrfs-progs: avoid repeated data write for metadata
[BUG]
Shinichiro reported that "mkfs.btrfs -m DUP" is doing repeated write
into the device.
For non-zoned device this is not a big deal, but for zoned device this
is critical, as zoned device doesn't support overwrite at all.

[CAUSE]
The problem is related to write_and_map_eb() call, since commit
2a93728391 ("btrfs-progs: use write_data_to_disk() to replace
write_extent_to_disk()"), we call write_data_to_disk() for metadata
write back.

But the problem is, write_data_to_disk() will call btrfs_map_block()
with rw = WRITE.

By that btrfs_map_block() will always return all stripes, while in
write_data_to_disk() we also iterate through each mirror of the range.

This results above repeated writeback.

[FIX]
Fix this problem by completely remove @mirror argument
from write_data_to_disk().
With extra comments to explicitly show that function will write to
all mirrors.

Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: 2a93728391 ("btrfs-progs: use write_data_to_disk() to replace write_extent_to_disk()")
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-08-16 15:18:12 +02:00
Boris Burkov
ba7b281049 btrfs-progs: add VERITY ro compat flag
This compat flag is missing, but is being checked by mount, and could
well be present legitimately.

Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-08-16 15:18:11 +02:00
Su Yue
d0a99313e5 btrfs-progs: save item data end in u64 to avoid overflow in btrfs_check_leaf()
Similar to kernel check_leaf(), calling btrfs_item_end_nr() may get a
reasonable value even an item has invalid offset/size due to u32
overflow.

Fix it by use u64 variable to store item data end in btrfs_check_leaf()
to avoid u32 overflow.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215299
Reported-by: Wenqing Liu <wenqingliu0120@gmail.com>
Signed-off-by: Su Yue <l@damenly.su>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-08-16 15:18:11 +02:00
Qu Wenruo
963188943f btrfs-progs: make btrfs_super_block::log_root_transid deprecated
This is the same on-disk format update synchronized from the kernel
code.

Unlike kernel, there are two callers reading this member:

- btrfs inspect dump-super
  It's just printing the value, add a notice about deprecation.

- btrfs-find-root
  In that case, since we always got 0, the root search for log root
  should never find a perfect match.

  Use btrfs_super_geneartion() + 1 to provide a better result.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-08-16 15:18:11 +02:00
David Sterba
8356c423e6 btrfs-progs: receive: implement FILEATTR command
The initial proposal for file attributes was built on simply doing
SETFLAGS but this builds on an old and non-extensible interface that has
no direct mapping for all inode flags. There's a unified interface
fileattr that covers file attributes and xflags, it should be possible
to add new bits.

On the protocol level the value is copied as-is in the original inode
but this does not provide enough information how to apply the bits on
the receiving side. Eg. IMMUTABLE flag prevents any changes to the file
and has to be handled manually.

The receiving side does not apply the bits yet, only parses it from the
stream.

Signed-off-by: David Sterba <dsterba@suse.com>
2022-08-16 15:18:11 +02:00
Omar Sandoval
0ee5b22345 btrfs-progs: send: stream v2 ioctl flags
First, add a --proto option to allow specifying the desired send
protocol version. It defaults to one, the original version. In a couple
of releases once people are aware that protocol revisions are happening,
we can change it to default to zero, which means the latest version
supported by the kernel. This is based on Dave Sterba's patch.

Also add a --compressed-data flag to instruct the kernel to use
encoded_write commands for compressed extents. This requires an explicit
opt in separate from the protocol version because:

1. The user may not want compression on the receiving side, or may want
   a different compression algorithm/level on the receiving side.
2. It has a soft requirement for kernel support on the receiving side
   (btrfs-progs can fall back to decompressing and writing if the kernel
   doesn't support BTRFS_IOC_ENCODED_WRITE, but the user may not be
   prepared to pay that CPU cost). Going forward, since it's easier to
   update progs than the kernel, I think we'll want to make new send
   features that require kernel support opt-in, whereas anything that
   only requires a progs update can happen automatically.

Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-06-07 13:59:33 +02:00
Omar Sandoval
1c05b10008 btrfs-progs: receive: add send stream v2 commands and attributes
Update our copy of send.h from the kernel. This adds the new commands
and attributes for v2 as well as explicit enum numbering.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-06-07 13:59:32 +02:00
Boris Burkov
a82996e1b6 btrfs-progs: receive: dynamically allocate sctx->read_buf
In send stream v2, write commands can now be an arbitrary size. For that
reason, we can no longer allocate a fixed array in sctx for read_cmd.
Instead, read_cmd dynamically allocates sctx->read_buf. To avoid
needless reallocations, we reuse read_buf between read_cmd calls by also
keeping track of the size of the allocated buffer in sctx->read_buf_sz.

We do the first allocation of the old default size at the start of
processing the stream, and we only reallocate if we encounter a command
that needs a larger buffer.

Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-06-07 13:59:31 +02:00
David Sterba
0f65bf66be btrfs-progs: libbtrfs: drop ifdef BTRFS_FLAT_INCLUDES where not necessary
Headers that are only exported and not used for build do not need the
BTRFS_FLAT_INCLUDES switch (between local and installed headers). Now
that there are local copies of the shared headers drop the respective
part from local headers.

Signed-off-by: David Sterba <dsterba@suse.com>
2022-06-06 15:48:52 +02:00
Johannes Thumshirn
a7ae6d5948 btrfs-progs: zoned: add upper and lower zone size boundaries
As we're not supporting arbitrarily big or small zone sizes in the kernel,
reject devices that don't fit in progs as well.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-06-06 15:47:50 +02:00
Qu Wenruo
38f90e906e btrfs-progs: properly initialize block group thresholds
[BUG]
When creating btrfs with new v2 cache (the default behavior), mkfs.btrfs
always create the free space tree using bitmap.

It's fine for small fs, but will be a disaster if the device is large
and the data profile is something like RAID0:

  $ mkfs.btrfs  -f -m raid1 -d raid0 /dev/test/scratch[1234]
  btrfs-progs v5.17
  [...]
  Block group profiles:
    Data:             RAID0             4.00GiB
    Metadata:         RAID1           256.00MiB
    System:           RAID1             8.00MiB
  [..]

  $ btrfs ins dump-tree -t free-space /dev/test/scratch1
  btrfs-progs v5.17
  free space tree key (FREE_SPACE_TREE ROOT_ITEM 0)
  node 30441472 level 1 items 10 free space 483 generation 6 owner FREE_SPACE_TREE
  node 30441472 flags 0x1(WRITTEN) backref revision 1
  fs uuid deddccae-afd0-4160-9a12-48fe7b526fb1
  chunk uuid 68f6cf98-afe3-4f47-9797-37fd9c610219
          key (1048576 FREE_SPACE_INFO 4194304) block 30457856 gen 6
          key (475004928 FREE_SPACE_BITMAP 8388608) block 30703616 gen 5
          key (953155584 FREE_SPACE_BITMAP 8388608) block 30720000 gen 5
          key (1431306240 FREE_SPACE_BITMAP 8388608) block 30736384 gen 5
          key (1909456896 FREE_SPACE_BITMAP 8388608) block 30752768 gen 5
          key (2387607552 FREE_SPACE_BITMAP 8388608) block 30769152 gen 5
          key (2865758208 FREE_SPACE_BITMAP 8388608) block 30785536 gen 5
          key (3343908864 FREE_SPACE_BITMAP 8388608) block 30801920 gen 5
          key (3822059520 FREE_SPACE_BITMAP 8388608) block 30818304 gen 5
          key (4300210176 FREE_SPACE_BITMAP 8388608) block 30834688 gen 5
  [...]
  ^^^ So many bitmaps that an empty fs will have two levels for free
      space tree already

[CAUSE]
Member btrfs_block_group::bitmap_high_thresh is never properly set to
any value other than 0, thus in function
update_free_space_extent_count(), the following check is always true:

	if (!(flags & BTRFS_FREE_SPACE_USING_BITMAPS) &&
	    extent_count > block_group->bitmap_high_thresh) {
		ret = convert_free_space_to_bitmaps(trans, block_group, path);

Thus we always got converted to bitmaps.

[FIX]
Cross-port the function set_free_space_tree_thresholds() from kernel,
and call that function in btrfs_make_block_group() and
read_one_block_group() so that every block group has bitmap_high_thresh
properly set.

Now even for that 4GiB large data chunk, we still only have one free extent:

  btrfs-progs v5.17
  free space tree key (FREE_SPACE_TREE ROOT_ITEM 0)
  leaf 30572544 items 15 free space 15860 generation 6 owner FREE_SPACE_TREE
  leaf 30572544 flags 0x1(WRITTEN) backref revision 1
  fs uuid b24e52ea-6580-4a88-aa70-cb173090bfe3
  chunk uuid d85f3905-fc61-4084-b335-2b6b97814b8e
  [...]
          item 13 key (298844160 FREE_SPACE_INFO 4294967296) itemoff 16235 itemsize 8
                  free space info extent count 1 flags 0
          item 14 key (298844160 FREE_SPACE_EXTENT 4294967296) itemoff 16235 itemsize 0
                  free space extent

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-20 15:54:20 +02:00
Qu Wenruo
9bded24a46 btrfs-progs: do not use btrfs_commit_transaction() just to update super blocks
There are several call sites utilizing btrfs_commit_transaction() just
to update members in super blocks, without any metadata update.

This can be problematic for some simple call sites, like zero_log_tree()
or check_and_repair_super_num_devs().

If we have big problems preventing the fs to be mounted in the first
place, and need to clear the log or super block size, but by some other
problems in extent tree, we're unable to allocate new blocks.

Then we fall into a deadlock that, we need to mount (even
ro,rescue=all) to collect extra info, but btrfs-progs can not do any
super block updates.

Fix the problem by allowing the following super blocks only operations
to be done without using btrfs_commit_transaction():

- btrfs_fix_super_size()
- check_and_repair_super_num_devs()
- zero_log_tree().

There are some exceptions in btrfstune.c, related to the csum type
conversion and seed flags.

In those btrfstune cases, we in fact wants to proper error report in
btrfs_commit_transaction(), as those operations are not mount critical,
and any early error can be helpful to expose any problems in the fs.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-20 15:54:16 +02:00
David Sterba
f1178950d3 btrfs-progs: btrfstune: fix build-time detection of experimental features
Qu noticed that the full checksums are still printed even if the
experimental build is not enabled. This is caused by wrong use of #ifdef
(as the macro is always defined), this must be "#if".

Fixes: 1bb6fb896d ("btrfs-progs: btrfstune: experimental, new option to switch csums")
Reported-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-10 15:42:13 +02:00
Qu Wenruo
50a5dfde6d btrfs-progs: print-tree: print the checksum of header without tailing zeros
For the default CRC32C checksum, print-tree now prints tons of
unnecessary padding zeros:

  btrfs-progs v5.17
  chunk tree
  leaf 22036480 items 7 free space 15430 generation 6 owner CHUNK_TREE
  leaf 22036480 flags 0x1(WRITTEN) backref revision 1
  checksum stored 0ac1b9fa00000000000000000000000000000000000000000000000000000000
  checksum calced 0ac1b9fa00000000000000000000000000000000000000000000000000000000
  fs uuid 3d95b7e3-3ab6-4927-af56-c58aa634342e

This is caused by commit 1bb6fb896d ("btrfs-progs: btrfstune:
experimental, new option to switch csums"), and it looks like most
distros just enable EXPERIMENTAL features by default.
(Which is a good thing to provide much better coverage).

So here we just limit the csum print to the utilized csum size.

Now the output looks like:

  btrfs-progs v5.17
  chunk tree
  leaf 22036480 items 4 free space 15781 generation 6 owner CHUNK_TREE
  leaf 22036480 flags 0x1(WRITTEN) backref revision 1
  checksum stored 676b812f
  checksum calced 676b812f
  fs uuid d11f8799-b6dc-415d-b1ed-cebe6da5f0b7

Fixes: 1bb6fb896d ("btrfs-progs: btrfstune: experimental, new option to switch csums")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-10 13:44:37 +02:00
Qu Wenruo
851ef59b2c btrfs-progs: remove the unused btrfs_fs_info::seeding member
This member is not used by anyone, just remove it.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-29 22:13:22 +02:00
Qu Wenruo
4e9e978783 btrfs-progs: allow read_data_from_disk() to rebuild RAID56 using P/Q
This new ability is added by:

- Allow btrfs_map_block() to return the chunk type
  This makes later work much easier

- Only reset stripe offset inside btrfs_map_block() when needed
  Currently if @raid_map is not NULL, btrfs_map_block() will consider
  this call is for WRITE and will reset stripe offset.

  This is no longer the case, as for RAID56 read with mirror_num 1/0,
  we will still call btrfs_map_block() with non-NULL raid_map.

  Add a small check to make sure we won't reset stripe offset for
  mirror 1/0 read.

- Add new helper read_raid56() to handle rebuild
  We will read the full stripe (including all data and P/Q stripes)
  do the rebuild, then only copy the refered part to the caller.

  There is a catch for RAID6, we have no way to exhaust all combination,
  so the current repair will assume the mirror = 0 data is corrupted,
  then try to find a missing device.

  But if no missing device can be found, it will assume P is corrupted.
  This is just a guess, and can to totally wrong, but we have no better
  idea.

Now btrfs-progs have full read ability for RAID56.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-25 19:08:30 +02:00
Qu Wenruo
a99bece1cd btrfs-progs: remove extent_buffer::fd and extent_buffer::dev_bytes
Those two members are a shortcut for non-RAID56 profiles.

But we should not use such shortcut, and move all our logical address
read/write to the unified read_data_from_disk()/write_data_to_disk().

With previous refactors, now we're safe to remove them.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-25 19:08:30 +02:00
Qu Wenruo
3ff9d35257 btrfs-progs: use read_data_from_disk() to replace read_extent_from_disk() and replace read_extent_data()
The function read_extent_from_disk() is only a wrapper to read tree
block.

And read_extent_data() is just a while loop to eliminate short read
caused by stripe boundary.

In fact, a lot of call sites of read_extent_data() are either reading
metadata (thus no possible short read) or doing extra loop by
themselves.

This patch will replace those two functions with read_data_from_disk(),
making it the only entrance for data/metadata read.
And update read_data_from_disk() to return the read bytes, so caller can
do a simple while loop.

For the few callers of read_extent_data(), open-code a small while loop
for them.

This will allow later RAID56 read repair using P/Q much easier.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-25 19:08:30 +02:00
Qu Wenruo
2a93728391 btrfs-progs: use write_data_to_disk() to replace write_extent_to_disk()
Function write_extent_to_disk() is just writing the content of a tree
block to disk.

It can not handle RAID56, and its work is the same as
write_data_to_disk().

Thus we can replace write_extent_to_disk() with write_data_to_disk()
easily.

There is only one special call site in write_raid56_with_parity(), which
can easily be replace with btrfs_pwrite() directly.

This reduce the write entrance, and make later eb::fd removal easier.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-25 19:08:29 +02:00
Qu Wenruo
01c25d73f1 btrfs-progs: extract metadata restore read code into its own helper
For metadata restore, our logical address is mapped to a single device
with logical address 1:1 mapped to device physical address.

Move this part of code into a helper, this will make later extent buffer
read path refactoer much easier.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-25 19:07:09 +02:00
Qu Wenruo
7a0c4b5dc1 btrfs-progs: remove the unnecessary BTRFS_SUPER_INFO_OFFSET path for tree block read
We used to use read_whole_eb() to read super block, but it's no longer
the case (so long that I can not even find out which commit did the
conversion).

Thus there is no need for read_whole_eb() to handle super block read
anymore.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-25 19:07:08 +02:00
Qu Wenruo
f9659c7235 btrfs-progs: fix an error path which can lead to empty device list
[BUG]
With the incoming delayed chunk item insertion feature, there is a super
weird failure at mkfs/022:

  ====== RUN CHECK ./mkfs.btrfs -f --rootdir tmp.KnKpP5 -d dup -b 350M tests/test.img
  ...
  Checksum:           crc32c
  Number of devices:  0
  Devices:
     ID        SIZE  PATH

Note the "Number of devices: 0" line, this means our
fs_info->fs_devices->devices list is empty.

And since our rw device list is empty, we won't finish the mkfs with
proper superblock magic, and cause later btrfs check to fail.

[CAUSE]
Although the failure is only triggered by the incoming delayed chunk
item insertion feature, the bug itself is here for a while.

In btrfs_alloc_chunk(), we move rw devices to our @private_devs list
first, then in create_chunk(), we move it back to our rw devices list.

This dance is pretty dangerous, especially if btrfs_alloc_dev_extent()
failed inside create_chunk(), and current profile have multiple stripes
(including DUP), we will exit create_chunk() directly, without moving the
remaining devices in @private_devs list back to @dev_list.

Furthermore, btrfs_alloc_chunk() is expected to return -ENOSPC, as we
call btrfs_alloc_chunk() to pre-allocate chunks, and ignore the -ENOSPC
error if it's just a pre-allocation failure.

This existing error path can lead to the empty rw list seen above.

[FIX]
After create_chunk(), unconditionally move all devices in @private_devs
back to rw device list.

And add extra check to make sure our rw device list is never empty after
a chunk allocation attempt.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-25 18:33:29 +02:00
Qu Wenruo
4a940ab2c0 btrfs-progs: fix a memory leak when starting a transaction on fs with error
Function btrfs_start_transaction() will allocate the memory
unconditionally, but if the fs has an aborted transaction we don't free
the allocated memory but return error directly.

Fix it by only allocate the new memory after all the checks.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-25 18:32:17 +02:00
Naohiro Aota
fd4bab06a4 btrfs-progs: zoned: fix and simplify dev_extent_hole_check_zoned()
The previous patch revealed a bug in dev_extent_hole_check_zoned(). If the
given hole is OK to use as is, it should have just returned the hole. But
on the contrary, it shifts the hole start position by one zone. That
results in refusing any hole.

We don't use btrfs_ensure_empty_zones() in the btrfs-progs version of
dev_extent_hole_check_zoned() unlike the kernel side, because
btrfs_find_allocatable_zones() itself is doing the necessary checks. So, we
can just "return changed" if the "pos" is unchanged. That also makes the
loop and "changed" variable unnecessary.

So, fix and simplify the code in one shot.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-08 23:17:35 +02:00
Naohiro Aota
38670212dd btrfs-progs: fix ordering of hole_size setting and dev_extent_hole_check()
The hole_size is used by dev_extent_hole_check() to check the hole is OK as
a device extent. However, commit b031fe84fd ("btrfs-progs: zoned:
implement zoned chunk allocator") mis-ported the kernel code and placed
dev_extent_hole_check() before setting hole_check. That made the
dev_extent_hole_check() call here essentially pass through as we have
hole_size == 0 on mkfs time.

As a result, mkfs.btrfs creates data BG at 64 MB where the regular
superblock exists, when zone size is 16 MB.

Fix the ordering of hole_size setting and calling dev_extent_hole_check().

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-08 23:17:35 +02:00
Naohiro Aota
32c43d0c68 btrfs-progs: zoned: export sb_zone_number() and related constants
Move sb_zone_number() and related constants from zoned.c to the
corresponding header for later use.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-08 23:17:35 +02:00
Sweet Tea Dorminy
c494724858 btrfs-progs: dump-tree: add print support for verity items
'btrfs inspect-internals dump-tree' doesn't currently know about the two
types of verity items and prints them as 'UNKNOWN.36' or 'UNKNOWN.37'.
So add them to the known item types.

Suggested-by: Boris Burkov <boris@bur.io>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-24 00:49:19 +01:00
Josef Bacik
02fb308bdc btrfs-progs: make btrfs_create_tree take a key for the root key
We're going to start create global roots from mkfs, and we need to have
a offset set for the root key.  Make the btrfs_create_tree() take a key
for the root_key instead of just the objectid so we can setup these new
style roots properly.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-09 18:07:22 +01:00
Josef Bacik
5fb27deaf1 btrfs-progs: make btrfs_clear_free_space_tree extent tree v2 aware
With extent tree v2 we'll have multiple free space trees, and we can't
just unset the feature flags for the free space tree.  Fix this to loop
through all of the free space trees and clear them out properly.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-09 18:07:21 +01:00
Josef Bacik
c4164edeb5 btrfs-progs: add a btrfs_delete_and_free_root helper
The free space tree code already does this, but we need it for cleaning
up per block group roots.  Abstract this code out into a helper so that
we can use it in multiple places in the future.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-09 18:07:19 +01:00
Josef Bacik
e33738306c btrfs-progs: handle the per-block group global root id
We will now be using block_group->chunk_objectid to point at the global
root id for this particular block group.  For now we'll assign this
based on mod'ing the offset of the block group against the number of
global root id's and handle the block_group_item updating appropriately.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-09 18:07:17 +01:00
Josef Bacik
27eaa3b514 btrfs-progs: set the number of global roots in the super block
In order to make sure the file system is consistent we need to record
the number of global roots we should have in the super block.  We could
infer this from the number of global roots we find, however this could
lead to interesting fuzzing problems, so add a source of truth to the
super block in order to make it easier to verify the file system is
consistent.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-09 18:07:14 +01:00
Josef Bacik
c7a8363276 btrfs-progs: handle no bg item in extent tree for free space tree
We have an ASSERT(ret == 0) when populating the free space tree as we
should at least find the block group item with extent tree v1.  However
with v2 we no longer have the block group item in the extent tree, so
fix the population logic to handle an empty block group (which occurs
during mkfs) and only assert if ret != 0 and we don't have extent tree
v2 turned on.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-09 18:07:00 +01:00
Josef Bacik
cab7570ccc btrfs-progs: add print support for the block group tree
Add the appropriate support to the print tree and dump tree code to spit
out the block group tree.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-09 18:06:54 +01:00
Josef Bacik
9ee6cc78a8 btrfs-progs: add support for loading the block group root
This adds the ability to load the block group root, as well as make sure
the various backup super block and super block updates are made
appropriately.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-09 18:06:51 +01:00
Josef Bacik
4f184cc911 btrfs-progs: make all of the item/key_ptr offset helpers take an eb
When we change the size of the btrfs_header we're going to need to
change how these helpers calculate where to find the start of items or
block ptrs.  To prepare for that make these helpers take the
extent_buffer as an argument so we can do the appropriate math based on
the version type.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-09 15:13:14 +01:00
Josef Bacik
aba83381a5 btrfs-progs: rework the btrfs_node accessors to match the item accessors
We are duplicating the offsetof(btrfs_node, key_ptr) logic everywhere,
instead use the helper to do this work for us, and make all the node
accessors use the helper.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-09 15:13:14 +01:00
Josef Bacik
53654db311 btrfs-progs: replace btrfs_item_nr_offset(0)
btrfs_item_nr_offset(0) is simply offsetof(struct btrfs_leaf, items),
which is the same thing as btrfs_leaf_data(), so replace all calls of
btrfs_item_nr_offset(0) with btrfs_leaf_data().

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-09 15:13:14 +01:00
Josef Bacik
5dc3964aaa btrfs-progs: remove the _nr from the item helpers
Now that all callers are using the _nr variations we can simply rename
these helpers to btrfs_item_##member/btrfs_set_item_##member and change
the actual item SETGET funcs to raw_item_##member/set_raw_item_##member
and then change all callers to drop the _nr part.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-09 15:13:13 +01:00
Josef Bacik
f3be0ff01a btrfs-progs: rename btrfs_item_end_nr to btrfs_item_end
All callers use the btrfs_item_end_nr() variation, simply drop
btrfs_item_end() and make btrfs_item_end_nr() use the _nr() variations
of the item get helpers.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-09 15:13:13 +01:00
Josef Bacik
49539423fa btrfs-progs: change btrfs_file_extent_inline_item_len to take a slot
This matches how the kernel does it, simply pass in the slot and fix up
btrfs_file_extent_inline_item_len to use the btrfs_item_nr() helper and
the correct define.  Fixup all the callers to use the slot now instead
of passing in the btrfs_item.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-09 15:13:13 +01:00
Josef Bacik
04ffea07e4 btrfs-progs: add btrfs_set_item_*_nr() helpers
We have a lot of the following patterns

	item = btrfs_item_nr(nr);
	btrfs_set_item_*(eb, item, val);

	btrfs_set_item_*(eb, btrfs_item_nr(nr), val);

in a lot of places in our code.  Instead add _nr variations of these
helpers and convert all of the users to this new helper.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-09 15:13:13 +01:00
Josef Bacik
80ff3f2c8e btrfs-progs: btrfs_item_size_nr/btrfs_item_offset_nr everywhere
We have this pattern in a lot of places

	item = btrfs_item_nr(slot);
	btrfs_item_size(leaf, item);
	btrfs_item_offset(leaf, item);

when we could simply use

	btrfs_item_size_nr(leaf, slot);
	btrfs_item_offset_nr(leaf, slot);

Fix all callers of btrfs_item_size() and btrfs_item_offset() to use the
_nr variation of the helpers.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-09 15:13:13 +01:00
Josef Bacik
ed098523dc btrfs-progs: reduce usage of __BTRFS_LEAF_DATA_SIZE
This helper only takes the nodesize, but in the future it'll take a bool
to indicate if we're extent tree v2.  The remaining users are all where
we only have extent_buffer, but we should always have a valid
eb->fs_info in these cases, so add BUG_ON()'s for the !eb->fs_info case
and then convert these callers to use BTRFS_LEAF_DATA_SIZE which takes
the fs_info.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-09 15:13:13 +01:00