Commit Graph

29 Commits

Author SHA1 Message Date
Qu Wenruo
2cdc8dddbf btrfs-progs: mkfs: offset inode numbers of the source filesystem
[BUG]
When running mkfs tests on a newly rebooted minimal system, it can cause
mkfs/009 to fail.

The reproduce steps requires /tmp to has minimal files in the first
place.

  # mkdir /tmp/rootdir
  # xfs_io -f -c "pwrite 0 16k" /tmp/rootdir
  # mkfs.btrfs --rootdir /tmp/rootdir -f $dev
  # btrfs check $dev
  Opening filesystem to check...
  Checking filesystem on /dev/test/scratch1
  UUID: 6821b3db-f056-4c18-b797-32679dcd4272
  [1/7] checking root items
  [2/7] checking extents
  data backref 13631488 root 5 owner 170 offset 0 num_refs 0 not found in extent tree
  incorrect local backref count on 13631488 root 5 owner 170 offset 0 found 1 wanted 0 back 0x55ff6cd72260
  backref 13631488 root 5 not referenced back 0x55ff6cd4c1f0
  incorrect global backref count on 13631488 found 2 wanted 1
  backpointer mismatch on [13631488 16384]
  ERROR: errors found in extent allocation tree or chunk allocation

[CAUSE]
The extent tree has the following weird item:

	item 0 key (13631488 EXTENT_ITEM 16384) itemoff 16250 itemsize 33
		refs 1 gen 0 flags DATA
		tree block backref root FS_TREE

This is an extent item for data, thus it should not have an inline tree
backref.

Then checking the fs tree:

	item 0 key (170 INODE_ITEM 0) itemoff 16123 itemsize 160
		generation 7 transid 0 size 16384 nbytes 16384
		block group 0 mode 100600 links 1 uid 1000 gid 1000 rdev 0
		sequence 0 flags 0x0(none)
		atime 1664866393.0 (2022-10-04 14:53:13)
		ctime 1664863510.0 (2022-10-04 14:05:10)
		mtime 1664863455.0 (2022-10-04 14:04:15)
		otime 0.0 (1970-01-01 08:00:00)

There is an inode item before the root dir inode.

And that inode number 170 is causing the problem.

In traverse_directory(), we use the inode number reported from stat()
directly as btrfs inode number, and pass it to
btrfs_record_file_extent(), which finally calls btrfs_inc_extent_ref(),
with above 170 passed as @owner parameter.

But inside btrfs_inc_extent_ref() we use that @owner value to determine
if it's a data backref.
Since we got a smaller than BTRFS_FIRST_FREE_OBJECTID, btrfs treats it
as tree block, and cause the above problem.

[FIX]
As a quick fix, always add BTRFS_FIRST_FREE_OBJECTID to all inode number
directly grabbed from stat().

And add an ASSERT() in __btrfs_record_file_extent() to catch unexpected
objectid.

This is not a perfect solution, as the resulted fs will has a huge gap
in its inodes:

	item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
	item 4 key (426 INODE_ITEM 0) itemoff 15883 itemsize 160

For a proper fix, we should allocate new btrfs inode numbers in a
sequential order, but that would be another series of patches.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:10 +02:00
Qu Wenruo
dad9db45bb btrfs-progs: properly initialize extent generation in __btrfs_record_file_extent()
[BUG]
When using mkfs.btrfs --rootdir option, the data extents generated will
have 0 as their generation in extent tree:

  # mkdir /tmp/rootdir
  # xfs_io -f -c "pwrite 0 16k" /tmp/rootdir/foobar
  # mkfs.btrfs -f --rootdir /tmp/rootdir $dev
  # btrfs ins dump-tree -t extent $dev
  btrfs-progs v5.19.1
  extent tree key (EXTENT_TREE ROOT_ITEM 0)
  leaf 30474240 items 13 free space 15536 generation 7 owner EXTENT_TREE
  leaf 30474240 flags 0x1(WRITTEN) backref revision 1
  fs uuid c1f05988-49f9-4dd4-8489-b90d60f522ee
  chunk uuid 40f81603-fe75-4f58-aa9e-e74e28df8523
	item 0 key (13631488 EXTENT_ITEM 16384) itemoff 16230 itemsize 53
		refs 1 gen 0 flags DATA <<< Generation is 0
  ...

[CAUSE]
In __btrfs_record_file_extent() we just set the extent generation to 0.

[FIX]
Use trans->transid to properly fill extent generation.

Now after mkfs, the first data extent backref looks like this:

	item 0 key (13631488 EXTENT_ITEM 16384) itemoff 16230 itemsize 53
		refs 1 gen 7 flags DATA
        ...

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:10 +02:00
David Sterba
feef6aaaf6 btrfs-progs: kernel-lib: remove radix-tree
The radix-tree is not used in userspace code. In kernel it's for
tracking unpersisted and in-memory structures and has been replaced by
the xarray.

Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:07 +02:00
Qu Wenruo
2f2f6bfe17 btrfs-progs: btrfstune: add the ability to convert to block group tree feature
The new '-b' option will be responsible for converting to block group
tree compat ro feature.

The workflow looks like this for new convert:

- Setting CHANGING_BG_TREE flag
  And initialize fs_info->last_converted_bg_bytenr value to (u64)-1.

  Any bg with bytenr >= last_converted_bg_bytenr will have its bg item
  update go to the new root (bg tree).

- Iterate each block group by their bytenr in descending order
  This involves:
  * Delete the old bg item from the old tree (extent tree)
  * Update last_converted_bg_bytenr to the bytenr of the bg
  * Add the new bg item into the new tree (bg tree)
  * If we have converted a bunch of bgs, commit current transaction

- Clear CHANGING_BG_TREE flag
  And set the new BLOCK_GROUP_TREE compat ro flag and commit.

And since we're doing the convert in multiple transactions, we also need
to resume from last interrupted convert.

In that case, we just grab the last unconverted bg, and start from it.

And to co-operate with the new kernel requirement for both no-holes and
free-space-tree features, the convert tool will check for
free-space-tree feature. If not enabled, will error out with an error
message to how to continue (by mounting with "-o space_cache=v2").

For missing no-holes feature, we just need to set the flag during
convert.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-12 18:25:32 +02:00
Qu Wenruo
38f90e906e btrfs-progs: properly initialize block group thresholds
[BUG]
When creating btrfs with new v2 cache (the default behavior), mkfs.btrfs
always create the free space tree using bitmap.

It's fine for small fs, but will be a disaster if the device is large
and the data profile is something like RAID0:

  $ mkfs.btrfs  -f -m raid1 -d raid0 /dev/test/scratch[1234]
  btrfs-progs v5.17
  [...]
  Block group profiles:
    Data:             RAID0             4.00GiB
    Metadata:         RAID1           256.00MiB
    System:           RAID1             8.00MiB
  [..]

  $ btrfs ins dump-tree -t free-space /dev/test/scratch1
  btrfs-progs v5.17
  free space tree key (FREE_SPACE_TREE ROOT_ITEM 0)
  node 30441472 level 1 items 10 free space 483 generation 6 owner FREE_SPACE_TREE
  node 30441472 flags 0x1(WRITTEN) backref revision 1
  fs uuid deddccae-afd0-4160-9a12-48fe7b526fb1
  chunk uuid 68f6cf98-afe3-4f47-9797-37fd9c610219
          key (1048576 FREE_SPACE_INFO 4194304) block 30457856 gen 6
          key (475004928 FREE_SPACE_BITMAP 8388608) block 30703616 gen 5
          key (953155584 FREE_SPACE_BITMAP 8388608) block 30720000 gen 5
          key (1431306240 FREE_SPACE_BITMAP 8388608) block 30736384 gen 5
          key (1909456896 FREE_SPACE_BITMAP 8388608) block 30752768 gen 5
          key (2387607552 FREE_SPACE_BITMAP 8388608) block 30769152 gen 5
          key (2865758208 FREE_SPACE_BITMAP 8388608) block 30785536 gen 5
          key (3343908864 FREE_SPACE_BITMAP 8388608) block 30801920 gen 5
          key (3822059520 FREE_SPACE_BITMAP 8388608) block 30818304 gen 5
          key (4300210176 FREE_SPACE_BITMAP 8388608) block 30834688 gen 5
  [...]
  ^^^ So many bitmaps that an empty fs will have two levels for free
      space tree already

[CAUSE]
Member btrfs_block_group::bitmap_high_thresh is never properly set to
any value other than 0, thus in function
update_free_space_extent_count(), the following check is always true:

	if (!(flags & BTRFS_FREE_SPACE_USING_BITMAPS) &&
	    extent_count > block_group->bitmap_high_thresh) {
		ret = convert_free_space_to_bitmaps(trans, block_group, path);

Thus we always got converted to bitmaps.

[FIX]
Cross-port the function set_free_space_tree_thresholds() from kernel,
and call that function in btrfs_make_block_group() and
read_one_block_group() so that every block group has bitmap_high_thresh
properly set.

Now even for that 4GiB large data chunk, we still only have one free extent:

  btrfs-progs v5.17
  free space tree key (FREE_SPACE_TREE ROOT_ITEM 0)
  leaf 30572544 items 15 free space 15860 generation 6 owner FREE_SPACE_TREE
  leaf 30572544 flags 0x1(WRITTEN) backref revision 1
  fs uuid b24e52ea-6580-4a88-aa70-cb173090bfe3
  chunk uuid d85f3905-fc61-4084-b335-2b6b97814b8e
  [...]
          item 13 key (298844160 FREE_SPACE_INFO 4294967296) itemoff 16235 itemsize 8
                  free space info extent count 1 flags 0
          item 14 key (298844160 FREE_SPACE_EXTENT 4294967296) itemoff 16235 itemsize 0
                  free space extent

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-20 15:54:20 +02:00
Josef Bacik
e33738306c btrfs-progs: handle the per-block group global root id
We will now be using block_group->chunk_objectid to point at the global
root id for this particular block group.  For now we'll assign this
based on mod'ing the offset of the block group against the number of
global root id's and handle the block_group_item updating appropriately.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-09 18:07:17 +01:00
Josef Bacik
9ee6cc78a8 btrfs-progs: add support for loading the block group root
This adds the ability to load the block group root, as well as make sure
the various backup super block and super block updates are made
appropriately.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-09 18:06:51 +01:00
Josef Bacik
5dc3964aaa btrfs-progs: remove the _nr from the item helpers
Now that all callers are using the _nr variations we can simply rename
these helpers to btrfs_item_##member/btrfs_set_item_##member and change
the actual item SETGET funcs to raw_item_##member/set_raw_item_##member
and then change all callers to drop the _nr part.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-09 15:13:13 +01:00
Josef Bacik
db2ab47823 btrfs-progs: stop accessing ->extent_root directly
When we switch to multiple global trees we'll need to access the
appropriate extent root depending on the block group or possibly root.
To handle this, use a helper in most places and then the actual root in
places where it is required.  We will whittle down the direct accessors
with future patches, but this does the bulk of the preparatory work.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-11-30 18:56:54 +01:00
Josef Bacik
550fd48136 btrfs-progs: move btrfs_fix_block_accounting to repair.c
We have this helper sitting in extent-tree.c, but it's a repair
function.  I'm going to need to make changes to this for extent-tree-v2
and would rather this live outside of the code we need to share with the
kernel.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-11-22 21:45:37 +01:00
Josef Bacik
7119dc3d79 btrfs-progs: simplify btrfs_make_block_group
This is doing the same work as insert_block_group_item, rework it to
call the helper instead.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-11-22 21:45:37 +01:00
Qu Wenruo
f4c712e024 btrfs-progs: rename data parameter to profile in extent allocation path
In function btrfs_reserve_extent(), we call find_free_extent() passing
"u64 profile" into "int data".

This is definitely a width reduction, but when looking further into the
code, it's more serious than that, in fact the "int data" parameter is
not really to indicate whether it's data extent, but really a block
group profile (with block group type).

This is not only width reduction, but also confusing.

Thankfully so for we don't have any BLOCK_GROUP bits beyond 32 bits, so
the width reduction is not causing a big problem.

This patch will rename the "int data" parameter to a more proper one,
"u64 profile" in all involved call paths.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-20 18:59:24 +02:00
David Sterba
359710d4dd btrfs-progs: use btrfs_bg_type_to_nparity in get_dev_extent_len
Stripe calculation with hard coded parity, use the helper.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-20 18:59:23 +02:00
David Sterba
732d73dc1f btrfs-progs: remove btrfs_crc32c alias
There's an ancient macro btrfs_crc32c which is just wrapping crc32c and
not doing anything else, so we can use the crc helper directly.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-08 20:46:35 +02:00
Nikolay Borisov
97640a5b81 btrfs-progs: remove root argument from btrfs_truncate_item
This function lies in the kernel-shared directory and is supposed to be
close to 1:1 copy with its kernel counterpart, yet it takes one extra
argument - root. But this is now unused to simply remove it.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-08 20:46:34 +02:00
Josef Bacik
dd8e7477f7 btrfs-progs: remove data extents from the free space tree
Dave reported a failure of mkfs-test 009 with the free space tree
enabled by default.  This is because 009 pre-populates the file system
with a given directory, and for some reason our data allocation path
isn't the same as in the kernel.  Fix this by making sure when we
allocate a data extent we remove the space from the free space tree, and
with this our mkfs tests now pass.

Issue: #410
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-06 16:49:52 +02:00
David Sterba
7572839a74 btrfs-progs: add and use bit masks for RAID1 and RAID56 profiles
Many test conditions can be simplified in case they check all the
related profiles.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-06 16:36:18 +02:00
Josef Bacik
826e466028 btrfs-progs: add add_block_group_free_space helper
This exists in the kernel free-space-tree.c but not in progs.  We need
it to generate the free space items for new block groups, which is
needed when we start creating the free space tree in make_btrfs().

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-06 16:36:17 +02:00
David Sterba
b1f374dd1d btrfs-progs: switch %Lu to %llu format
The %Lu format is not standard and we use %llu everywhere else, so
switch the remaining cases.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-06-19 22:07:49 +02:00
David Sterba
9f6c055e38 btrfs-progs: dump-tree: add options to dump checksums
Add new options to dumps checksums in node headers and in the checksum
items:

  $ btrfs inspect dump-tree --csum-headers image
  root tree
  leaf 471515136 items 19 free space 12186 generation 15 owner ROOT_TREE
  leaf 471515136 flags 0x1(WRITTEN) backref revision 1 csum 0x756b2d54
  fs uuid df0348df-5773-47dd-81e9-a18221461239

For nodes/leaves it's appended on the 2nd line of the header.

Checksum items are stored in leaves as EXTENT_CSUM key type, with offset
value as the logical offset starting. As the array would be hard to
parse or match, each offset value is printed with the checksum. For
crc32c it's 4 values on a line, for xxhash it's 2 and for the long
256bit checksums it's one checksum per line.

  $ btrfs inspect dump-tree --csum-items image
  leaf 5423104 items 1 free space 30 generation 6 owner CSUM_TREE
  leaf 5423104 flags 0x1(WRITTEN) backref revision 1
  fs uuid bd7c981e-16ff-4081-a734-3ef5d50cafc1
  chunk uuid 13f4c76c-7845-4984-88ed-f01b52e05cf8
	  item 0 key (EXTENT_CSUM EXTENT_CSUM 22020096) itemoff 55 itemsize 16228
		  range start 22020096 end 38637568 length 16617472
		  [22020096] 0x8941f998 [22024192] 0x8941f998 [22028288] 0x8941f998 [22032384] 0x8941f998
		  [22036480] 0x8941f998 [22040576] 0x8941f998 [22044672] 0x8941f998 [22048768] 0x8941f998
		  ...

  $ btrfs inspect dump-tree --csum-items image
  leaf 5718016 items 1 free space 7746 generation 6 owner CSUM_TREE
  leaf 5718016 flags 0x1(WRITTEN) backref revision 1
  fs uuid f453a5b4-8b4a-4fbf-90a2-2925e4fe2335
  chunk uuid eb1da63b-248b-44c2-82da-71b2564bf50e
	  item 0 key (EXTENT_CSUM EXTENT_CSUM 52387840) itemoff 7771 itemsize 8512
		  range start 52387840 end 53477376 length 1089536
		  [52387840] 0x686ede9288c391e7e05026e56f2f91bfd879987a040ea98445dabc76f55b8e5f
		  [52391936] 0x686ede9288c391e7e05026e56f2f91bfd879987a040ea98445dabc76f55b8e5f
		  ...

The options are not on by default, the header checksum is not important
for the structures. Data checksums can be quite big so that would make
the dump long and without any actual data to match against.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-06-19 22:07:49 +02:00
Naohiro Aota
bfdb3ae237 btrfs-progs: zoned: reset zone of freed block group
When freeing a chunk, we can/should reset the underlying device zones
for the chunk. Introduce btrfs_reset_chunk_zones() and reset the zones.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-06 16:41:45 +02:00
Naohiro Aota
50ae9f62c7 btrfs-progs: zoned: implement sequential extent allocation
Implement a sequential extent allocator for zoned filesystems. This
allocator only needs to check if there is enough space in the block group
after the allocation pointer to satisfy the extent allocation request.

Since the allocator is really simple, we implement it directly in
find_search_start().

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-06 16:41:45 +02:00
Naohiro Aota
f08410f078 btrfs-progs: zoned: load zone's allocation offset
A zoned filesystem must allocate blocks at the zones' write pointer. The
device's write pointer position can be mapped to a logical address
within a block group. To facilitate this, add an "alloc_offset" to the
block group to track the logical addresses of the write pointer.

This logical address is populated in btrfs_load_block_group_zone_info()
from the write pointers of corresponding zones.

For now, zoned filesystems the single profile. Supporting non-single
profile with zone append writing is not trivial. For example, in the DUP
profile, we send a zone append writing IO to two zones on a device. The
device reply with written LBAs for the IOs. If the offsets of the
returned addresses from the beginning of the zone are different, then it
results in different logical addresses.

We need fine-grained logical to physical mapping to support such
separated physical address issue. Since it should require additional
metadata type, disable non-single profiles for now.

This commit supports the case all the zones in a block group are
sequential. The next patch will handle the case having a conventional
zone.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-06 16:41:45 +02:00
David Sterba
0144bcb713 btrfs-progs: move volumes.c to kernel-shared/
Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-31 17:01:06 +02:00
David Sterba
6069bc52a9 btrfs-progs: move transaction.c to kernel-shared/
Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-31 17:01:06 +02:00
David Sterba
abb670f883 btrfs-progs: move ctree.c to kernel-shared/
Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-31 17:01:05 +02:00
David Sterba
772f0da6df btrfs-progs: move disk-io.c to kernel-shared/
Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-31 17:01:05 +02:00
David Sterba
cf529f36ad btrfs-progs: move print-tree.c to kernel-shared/
Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-31 17:01:05 +02:00
David Sterba
7dd4abc3c5 btrfs-progs: move extent-tree.c to kernel-shared/
Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-31 17:01:04 +02:00