btrfs-progs

mirror of https://github.com/kdave/btrfs-progs synced 2025-04-07 09:41:29 +00:00

Author	SHA1	Message	Date
Qu Wenruo	2cdc8dddbf	btrfs-progs: mkfs: offset inode numbers of the source filesystem [BUG] When running mkfs tests on a newly rebooted minimal system, it can cause mkfs/009 to fail. The reproduction steps require /tmp to have minimal files in the first place. # mkdir /tmp/rootdir # xfs_io -f -c "pwrite 0 16k" /tmp/rootdir # mkfs.btrfs --rootdir /tmp/rootdir -f $dev # btrfs check $dev Opening filesystem to check... Checking filesystem on /dev/test/scratch1 UUID: 6821b3db-f056-4c18-b797-32679dcd4272 [1/7] checking root items [2/7] checking extents data backref 13631488 root 5 owner 170 offset 0 num_refs 0 not found in extent tree incorrect local backref count on 13631488 root 5 owner 170 offset 0 found 1 wanted 0 back 0x55ff6cd72260 backref 13631488 root 5 not referenced back 0x55ff6cd4c1f0 incorrect global backref count on 13631488 found 2 wanted 1 backpointer mismatch on [13631488 16384] ERROR: errors found in extent allocation tree or chunk allocation [CAUSE] The extent tree has the following weird item: item 0 key (13631488 EXTENT_ITEM 16384) itemoff 16250 itemsize 33 refs 1 gen 0 flags DATA tree block backref root FS_TREE This is an extent item for data, thus it should not have an inline tree backref. Then checking the fs tree: item 0 key (170 INODE_ITEM 0) itemoff 16123 itemsize 160 generation 7 transid 0 size 16384 nbytes 16384 block group 0 mode 100600 links 1 uid 1000 gid 1000 rdev 0 sequence 0 flags 0x0(none) atime 1664866393.0 (2022-10-04 14:53:13) ctime 1664863510.0 (2022-10-04 14:05:10) mtime 1664863455.0 (2022-10-04 14:04:15) otime 0.0 (1970-01-01 08:00:00) There is an inode item before the root dir inode, and that inode number 170 is causing the problem. In traverse_directory(), we use the inode number reported from stat() directly as the btrfs inode number, and pass it to btrfs_record_file_extent(), which finally calls btrfs_inc_extent_ref() with the above 170 passed as the @owner parameter. But inside btrfs_inc_extent_ref() we use that @owner value to determine if it's a data backref. Since we got a value smaller than BTRFS_FIRST_FREE_OBJECTID, btrfs treats it as a tree block, causing the above problem. [FIX] As a quick fix, always add BTRFS_FIRST_FREE_OBJECTID to all inode numbers directly grabbed from stat(). And add an ASSERT() in __btrfs_record_file_extent() to catch unexpected objectids. This is not a perfect solution, as the resulting fs will have a huge gap in its inodes: item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160 item 4 key (426 INODE_ITEM 0) itemoff 15883 itemsize 160 For a proper fix, we should allocate new btrfs inode numbers in sequential order, but that would be another series of patches. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-10-11 09:08:10 +02:00
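A minimal C sketch of the owner check and the quick fix described above. BTRFS_FIRST_FREE_OBJECTID (256) is the real on-disk constant. `owner_is_data()` and `source_ino_to_objectid()` are hypothetical helpers that only model the decision made in btrfs_inc_extent_ref() and the offset applied to stat() inode numbers; they are not the actual progs code.

```c
#include <assert.h>
#include <stdint.h>

/* On-disk constant: first objectid usable for regular inodes. */
#define BTRFS_FIRST_FREE_OBJECTID 256ULL

/* Model of the owner check: owners below the first free objectid are
 * tree roots, so the backref would be treated as a tree block ref. */
static int owner_is_data(uint64_t owner)
{
	return owner >= BTRFS_FIRST_FREE_OBJECTID;
}

/* Hypothetical helper: map an inode number from stat() on the source
 * filesystem to a btrfs objectid, as the quick fix does. */
static uint64_t source_ino_to_objectid(uint64_t st_ino)
{
	uint64_t objectid = st_ino + BTRFS_FIRST_FREE_OBJECTID;

	/* Same intent as the ASSERT() added to __btrfs_record_file_extent(). */
	assert(owner_is_data(objectid));
	return objectid;
}
```

With this mapping, the source inode 170 from the report becomes objectid 426, which is the gap visible in the dump-tree output after the fix.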
Qu Wenruo	dad9db45bb	btrfs-progs: properly initialize extent generation in __btrfs_record_file_extent() [BUG] When using mkfs.btrfs --rootdir option, the data extents generated will have 0 as their generation in extent tree: # mkdir /tmp/rootdir # xfs_io -f -c "pwrite 0 16k" /tmp/rootdir/foobar # mkfs.btrfs -f --rootdir /tmp/rootdir $dev # btrfs ins dump-tree -t extent $dev btrfs-progs v5.19.1 extent tree key (EXTENT_TREE ROOT_ITEM 0) leaf 30474240 items 13 free space 15536 generation 7 owner EXTENT_TREE leaf 30474240 flags 0x1(WRITTEN) backref revision 1 fs uuid c1f05988-49f9-4dd4-8489-b90d60f522ee chunk uuid 40f81603-fe75-4f58-aa9e-e74e28df8523 item 0 key (13631488 EXTENT_ITEM 16384) itemoff 16230 itemsize 53 refs 1 gen 0 flags DATA <<< Generation is 0 ... [CAUSE] In __btrfs_record_file_extent() we just set the extent generation to 0. [FIX] Use trans->transid to properly fill extent generation. Now after mkfs, the first data extent backref looks like this: item 0 key (13631488 EXTENT_ITEM 16384) itemoff 16230 itemsize 53 refs 1 gen 7 flags DATA ... Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-10-11 09:08:10 +02:00
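The fix amounts to filling the generation field from the transaction handle. The sketch below uses hypothetical trimmed-down structs (`trans_handle`, `extent_item_sketch`) and a hypothetical `init_data_extent_item()` instead of the real btrfs types; only the value of the DATA flag (bit 0, BTRFS_EXTENT_FLAG_DATA) comes from the on-disk format.

```c
#include <assert.h>
#include <stdint.h>

#define EXTENT_FLAG_DATA (1ULL << 0) /* BTRFS_EXTENT_FLAG_DATA */

/* Hypothetical stand-ins for the real transaction and extent item. */
struct trans_handle {
	uint64_t transid;
};

struct extent_item_sketch {
	uint64_t refs;
	uint64_t generation;
	uint64_t flags;
};

static void init_data_extent_item(struct extent_item_sketch *ei,
				  const struct trans_handle *trans)
{
	assert(ei && trans);
	ei->refs = 1;
	/* The fix: use the running transaction id instead of 0. */
	ei->generation = trans->transid;
	ei->flags = EXTENT_FLAG_DATA;
}
```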
David Sterba	feef6aaaf6	btrfs-progs: kernel-lib: remove radix-tree The radix-tree is not used in userspace code. In kernel it's for tracking unpersisted and in-memory structures and has been replaced by the xarray. Signed-off-by: David Sterba <dsterba@suse.com>	2022-10-11 09:08:07 +02:00
Qu Wenruo	2f2f6bfe17	btrfs-progs: btrfstune: add the ability to convert to block group tree feature The new '-b' option will be responsible for converting to the block group tree compat ro feature. The workflow looks like this for a new convert: - Set the CHANGING_BG_TREE flag And initialize the fs_info->last_converted_bg_bytenr value to (u64)-1. Any bg with bytenr >= last_converted_bg_bytenr will have its bg item update go to the new root (bg tree). - Iterate over each block group by bytenr in descending order This involves: * Delete the old bg item from the old tree (extent tree) * Update last_converted_bg_bytenr to the bytenr of the bg * Add the new bg item into the new tree (bg tree) * If we have converted a bunch of bgs, commit the current transaction - Clear the CHANGING_BG_TREE flag And set the new BLOCK_GROUP_TREE compat ro flag and commit. And since we're doing the convert in multiple transactions, we also need to resume from the last interrupted convert. In that case, we just grab the last unconverted bg, and start from it. And to co-operate with the new kernel requirement for both no-holes and free-space-tree features, the convert tool will check for the free-space-tree feature. If it is not enabled, it will error out with a message explaining how to continue (by mounting with "-o space_cache=v2"). For a missing no-holes feature, we just need to set the flag during convert. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-09-12 18:25:32 +02:00
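The routing rule and the descending walk can be modeled in a few lines of C. This is a sketch under stated assumptions: `struct convert_state`, `route_bg_update()` and `convert_some()` are hypothetical, block group items are modeled as a per-bg location flag instead of tree items, and `budget` stands in for the number of block groups converted before a transaction commit (or before an interruption).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum bg_tree { OLD_EXTENT_TREE, NEW_BG_TREE };

/* Hypothetical conversion state; names are illustrative only. */
struct convert_state {
	uint64_t last_converted_bg_bytenr; /* (u64)-1 before any bg moved */
	enum bg_tree where[16];           /* current location of each bg item */
};

/* Routing rule: bgs at or above the cursor have already moved, so their
 * item updates must go to the new bg tree. */
static enum bg_tree route_bg_update(const struct convert_state *st,
				    uint64_t bytenr)
{
	return bytenr >= st->last_converted_bg_bytenr ? NEW_BG_TREE
						      : OLD_EXTENT_TREE;
}

/* Convert at most @budget bgs from @bytenrs (sorted ascending), walking
 * from the highest address down. Already converted bgs are skipped, so a
 * second call resumes from the highest unconverted one. */
static size_t convert_some(struct convert_state *st, const uint64_t *bytenrs,
			   size_t nr, size_t budget)
{
	size_t done = 0;

	assert(nr <= sizeof(st->where) / sizeof(st->where[0]));
	for (size_t i = nr; i-- > 0 && done < budget;) {
		if (st->where[i] == NEW_BG_TREE)
			continue;
		/* Delete from old tree, move cursor, insert into new tree. */
		st->where[i] = NEW_BG_TREE;
		st->last_converted_bg_bytenr = bytenrs[i];
		done++;
	}
	return done;
}
```

Walking in descending order keeps the converted set contiguous at the top of the address space, so a single cursor value is enough to route every update correctly.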
Qu Wenruo	38f90e906e	btrfs-progs: properly initialize block group thresholds [BUG] When creating btrfs with the new v2 cache (the default behavior), mkfs.btrfs always creates the free space tree using bitmaps. It's fine for a small fs, but will be a disaster if the device is large and the data profile is something like RAID0: $ mkfs.btrfs -f -m raid1 -d raid0 /dev/test/scratch[1234] btrfs-progs v5.17 [...] Block group profiles: Data: RAID0 4.00GiB Metadata: RAID1 256.00MiB System: RAID1 8.00MiB [..] $ btrfs ins dump-tree -t free-space /dev/test/scratch1 btrfs-progs v5.17 free space tree key (FREE_SPACE_TREE ROOT_ITEM 0) node 30441472 level 1 items 10 free space 483 generation 6 owner FREE_SPACE_TREE node 30441472 flags 0x1(WRITTEN) backref revision 1 fs uuid deddccae-afd0-4160-9a12-48fe7b526fb1 chunk uuid 68f6cf98-afe3-4f47-9797-37fd9c610219 key (1048576 FREE_SPACE_INFO 4194304) block 30457856 gen 6 key (475004928 FREE_SPACE_BITMAP 8388608) block 30703616 gen 5 key (953155584 FREE_SPACE_BITMAP 8388608) block 30720000 gen 5 key (1431306240 FREE_SPACE_BITMAP 8388608) block 30736384 gen 5 key (1909456896 FREE_SPACE_BITMAP 8388608) block 30752768 gen 5 key (2387607552 FREE_SPACE_BITMAP 8388608) block 30769152 gen 5 key (2865758208 FREE_SPACE_BITMAP 8388608) block 30785536 gen 5 key (3343908864 FREE_SPACE_BITMAP 8388608) block 30801920 gen 5 key (3822059520 FREE_SPACE_BITMAP 8388608) block 30818304 gen 5 key (4300210176 FREE_SPACE_BITMAP 8388608) block 30834688 gen 5 [...] ^^^ So many bitmaps that an empty fs already has two levels in the free space tree [CAUSE] Member btrfs_block_group::bitmap_high_thresh is never properly set to any value other than 0, thus in function update_free_space_extent_count(), the following check is always true: if (!(flags & BTRFS_FREE_SPACE_USING_BITMAPS) && extent_count > block_group->bitmap_high_thresh) { ret = convert_free_space_to_bitmaps(trans, block_group, path); Thus we always get converted to bitmaps. [FIX] Cross-port the function set_free_space_tree_thresholds() from the kernel, and call that function in btrfs_make_block_group() and read_one_block_group() so that every block group has bitmap_high_thresh properly set. Now even for that 4GiB data chunk, we still only have one free extent: btrfs-progs v5.17 free space tree key (FREE_SPACE_TREE ROOT_ITEM 0) leaf 30572544 items 15 free space 15860 generation 6 owner FREE_SPACE_TREE leaf 30572544 flags 0x1(WRITTEN) backref revision 1 fs uuid b24e52ea-6580-4a88-aa70-cb173090bfe3 chunk uuid d85f3905-fc61-4084-b335-2b6b97814b8e [...] item 13 key (298844160 FREE_SPACE_INFO 4294967296) itemoff 16235 itemsize 8 free space info extent count 1 flags 0 item 14 key (298844160 FREE_SPACE_EXTENT 4294967296) itemoff 16235 itemsize 0 free space extent Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-05-20 15:54:20 +02:00
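The threshold computation can be sketched as follows, modeled on the kernel's free space tree threshold logic as I understand it. The constants (256-byte bitmaps and a 25-byte on-disk item header) are stated assumptions, and `calc_thresholds()` and `should_convert_to_bitmaps()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants: a bitmap item carries 256 bytes (2048 bits, one bit
 * per sector) and each leaf item costs a 25-byte struct btrfs_item. */
#define FREE_SPACE_BITMAP_SIZE 256ULL
#define FREE_SPACE_BITMAP_BITS (FREE_SPACE_BITMAP_SIZE * 8)
#define BTRFS_ITEM_SIZE 25ULL

struct bg_thresholds {
	uint64_t high;
	uint64_t low;
};

/* Switch to bitmaps once extent items would use more leaf space than the
 * bitmap items needed to cover the whole block group. */
static struct bg_thresholds calc_thresholds(uint64_t bg_length,
					    uint32_t sectorsize)
{
	struct bg_thresholds t;
	uint64_t bitmap_range, num_bitmaps, total_bitmap_size;

	assert(bg_length > 0 && sectorsize > 0);
	bitmap_range = (uint64_t)sectorsize * FREE_SPACE_BITMAP_BITS;
	num_bitmaps = (bg_length + bitmap_range - 1) / bitmap_range;
	total_bitmap_size = num_bitmaps *
			    (BTRFS_ITEM_SIZE + FREE_SPACE_BITMAP_SIZE);
	t.high = total_bitmap_size / BTRFS_ITEM_SIZE;
	/* A gap between the thresholds avoids flipping between formats. */
	t.low = t.high > 100 ? t.high - 100 : 0;
	return t;
}

/* The check quoted in the commit: with high == 0 it fires for any extent. */
static int should_convert_to_bitmaps(int using_bitmaps, uint64_t extent_count,
				     const struct bg_thresholds *t)
{
	return !using_bitmaps && extent_count > t->high;
}
```

For a 4GiB block group with 4KiB sectors, each bitmap covers 8MiB (the 8388608 in the FREE_SPACE_BITMAP keys above), giving 512 bitmaps and a high threshold of 5754 extents. With the threshold left at 0, the check is true as soon as there is a single free extent, which is the reported bug.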
Josef Bacik	e33738306c	btrfs-progs: handle the per-block group global root id We will now be using block_group->chunk_objectid to point at the global root id for this particular block group. For now we'll assign this by taking the offset of the block group modulo the number of global root ids and handle the block_group_item updating appropriately. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-03-09 18:07:17 +01:00
Josef Bacik	9ee6cc78a8	btrfs-progs: add support for loading the block group root This adds the ability to load the block group root, as well as make sure the various backup super block and super block updates are made appropriately. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-03-09 18:06:51 +01:00
Josef Bacik	5dc3964aaa	btrfs-progs: remove the _nr from the item helpers Now that all callers are using the _nr variations we can simply rename these helpers to btrfs_item_##member/btrfs_set_item_##member and change the actual item SETGET funcs to raw_item_##member/set_raw_item_##member and then change all callers to drop the _nr part. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-03-09 15:13:13 +01:00
Josef Bacik	db2ab47823	btrfs-progs: stop accessing ->extent_root directly When we switch to multiple global trees we'll need to access the appropriate extent root depending on the block group or possibly root. To handle this, use a helper in most places and then the actual root in places where it is required. We will whittle down the direct accessors with future patches, but this does the bulk of the preparatory work. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-11-30 18:56:54 +01:00
Josef Bacik	550fd48136	btrfs-progs: move btrfs_fix_block_accounting to repair.c We have this helper sitting in extent-tree.c, but it's a repair function. I'm going to need to make changes to this for extent-tree-v2 and would rather this live outside of the code we need to share with the kernel. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-11-22 21:45:37 +01:00
Josef Bacik	7119dc3d79	btrfs-progs: simplify btrfs_make_block_group This is doing the same work as insert_block_group_item, so rework it to call the helper instead. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-11-22 21:45:37 +01:00
Qu Wenruo	f4c712e024	btrfs-progs: rename data parameter to profile in extent allocation path In function btrfs_reserve_extent(), we call find_free_extent() passing "u64 profile" into "int data". This is definitely a width reduction, but when looking further into the code, it's more serious than that: the "int data" parameter does not really indicate whether it's a data extent, but is really a block group profile (with block group type). This is not only a width reduction, but also confusing. Thankfully, so far we don't have any BLOCK_GROUP bits beyond 32 bits, so the width reduction is not causing a big problem. This patch renames the "int data" parameter to a more proper one, "u64 profile", in all involved call paths. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-20 18:59:24 +02:00
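The width reduction can be demonstrated with two stand-in functions. `old_style_param()` takes a `uint32_t` in place of the original `int` so the truncation is well-defined in C; both function names are hypothetical, and the bit above 32 used in the checks is hypothetical too since, as the message says, no such BLOCK_GROUP bit exists today.

```c
#include <assert.h>
#include <stdint.h>

/* Block group type/profile bits from the on-disk format. */
#define BTRFS_BLOCK_GROUP_DATA  (1ULL << 0)
#define BTRFS_BLOCK_GROUP_RAID0 (1ULL << 3)

/* Stand-in for the old prototype: the 64-bit profile is narrowed
 * to 32 bits at the call site, silently dropping any high bits. */
static uint32_t old_style_param(uint32_t data)
{
	return data;
}

/* Stand-in for the new prototype: the full 64-bit value survives. */
static uint64_t new_style_param(uint64_t profile)
{
	return profile;
}
```

Today's profile bits all sit in the low 32 bits, so the old signature happened to work; a flag above bit 31 would have been lost without any warning at runtime.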
David Sterba	359710d4dd	btrfs-progs: use btrfs_bg_type_to_nparity in get_dev_extent_len The stripe calculation used hard-coded parity; use the helper instead. Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-20 18:59:23 +02:00
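A sketch of the per-device extent length calculation with the parity count coming from a helper instead of being hard coded. The profile bits are the on-disk values; `bg_type_to_nparity()` and `dev_extent_len()` are simplified, hypothetical versions of btrfs_bg_type_to_nparity() and get_dev_extent_len(), and RAID10 is assumed to use 2 sub-stripes.

```c
#include <assert.h>
#include <stdint.h>

#define BTRFS_BLOCK_GROUP_RAID0  (1ULL << 3)
#define BTRFS_BLOCK_GROUP_RAID1  (1ULL << 4)
#define BTRFS_BLOCK_GROUP_RAID10 (1ULL << 6)
#define BTRFS_BLOCK_GROUP_RAID5  (1ULL << 7)
#define BTRFS_BLOCK_GROUP_RAID6  (1ULL << 8)

/* Number of parity stripes per profile. */
static int bg_type_to_nparity(uint64_t type)
{
	if (type & BTRFS_BLOCK_GROUP_RAID5)
		return 1;
	if (type & BTRFS_BLOCK_GROUP_RAID6)
		return 2;
	return 0;
}

/* Length of the device extent backing one stripe of a chunk. */
static uint64_t dev_extent_len(uint64_t type, uint64_t chunk_len,
			       int num_stripes)
{
	int nparity = bg_type_to_nparity(type);

	assert(num_stripes > nparity);
	if (type & BTRFS_BLOCK_GROUP_RAID0)
		return chunk_len / num_stripes;
	if (type & BTRFS_BLOCK_GROUP_RAID10)
		return chunk_len * 2 / num_stripes; /* 2 copies of each stripe */
	if (nparity)
		return chunk_len / (num_stripes - nparity);
	/* SINGLE, DUP and the RAID1 variants store the full length per copy. */
	return chunk_len;
}
```

For example, a 4GiB RAID0 data chunk on four devices uses 1GiB on each device.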
David Sterba	732d73dc1f	btrfs-progs: remove btrfs_crc32c alias There's an ancient macro btrfs_crc32c which is just wrapping crc32c and not doing anything else, so we can use the crc helper directly. Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-08 20:46:35 +02:00
Nikolay Borisov	97640a5b81	btrfs-progs: remove root argument from btrfs_truncate_item This function lies in the kernel-shared directory and is supposed to be close to a 1:1 copy of its kernel counterpart, yet it takes one extra argument - root. But it is now unused, so simply remove it. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-08 20:46:34 +02:00
Josef Bacik	dd8e7477f7	btrfs-progs: remove data extents from the free space tree Dave reported a failure of mkfs-test 009 with the free space tree enabled by default. This is because 009 pre-populates the file system with a given directory, and for some reason our data allocation path isn't the same as in the kernel. Fix this by making sure when we allocate a data extent we remove the space from the free space tree, and with this our mkfs tests now pass. Issue: #410 Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-06 16:49:52 +02:00
David Sterba	7572839a74	btrfs-progs: add and use bit masks for RAID1 and RAID56 profiles Many test conditions can be simplified in case they check all the related profiles. Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-06 16:36:18 +02:00
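What the masks buy can be shown with before/after predicates. The flag bit values are from the on-disk format; the mask names follow the kernel's BTRFS_BLOCK_GROUP_RAID1_MASK and BTRFS_BLOCK_GROUP_RAID56_MASK, assumed here to be the ones this commit adds, and the predicate functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BTRFS_BLOCK_GROUP_RAID1   (1ULL << 4)
#define BTRFS_BLOCK_GROUP_RAID5   (1ULL << 7)
#define BTRFS_BLOCK_GROUP_RAID6   (1ULL << 8)
#define BTRFS_BLOCK_GROUP_RAID1C3 (1ULL << 9)
#define BTRFS_BLOCK_GROUP_RAID1C4 (1ULL << 10)

#define BTRFS_BLOCK_GROUP_RAID1_MASK \
	(BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_RAID1C3 | \
	 BTRFS_BLOCK_GROUP_RAID1C4)
#define BTRFS_BLOCK_GROUP_RAID56_MASK \
	(BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6)

/* Before: every related profile spelled out at each call site. */
static int is_raid1_verbose(uint64_t flags)
{
	return (flags & BTRFS_BLOCK_GROUP_RAID1) ||
	       (flags & BTRFS_BLOCK_GROUP_RAID1C3) ||
	       (flags & BTRFS_BLOCK_GROUP_RAID1C4);
}

/* After: a single mask test per family. */
static int is_raid1(uint64_t flags)
{
	return (flags & BTRFS_BLOCK_GROUP_RAID1_MASK) != 0;
}

static int is_raid56(uint64_t flags)
{
	return (flags & BTRFS_BLOCK_GROUP_RAID56_MASK) != 0;
}
```

A further benefit is that adding another member to a family only touches the mask definition, not every condition that tests for it.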
Josef Bacik	826e466028	btrfs-progs: add add_block_group_free_space helper This exists in the kernel free-space-tree.c but not in progs. We need it to generate the free space items for new block groups, which is needed when we start creating the free space tree in make_btrfs(). Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-06 16:36:17 +02:00
David Sterba	b1f374dd1d	btrfs-progs: switch %Lu to %llu format The %Lu format is not standard and we use %llu everywhere else, so switch the remaining cases. Signed-off-by: David Sterba <dsterba@suse.com>	2021-06-19 22:07:49 +02:00
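A minimal example of the standard form. It assumes u64 is `unsigned long long`, as with the kernel-style typedef on Linux; `format_bytenr()` is a hypothetical helper. The C standard does not define the `L` length modifier for integer conversions (glibc accepts `%Lu` as an extension), while `%llu` is the portable spelling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef unsigned long long u64; /* assumption: matches the progs typedef */

/* Format a byte number with the standard length modifier for
 * unsigned long long. */
static int format_bytenr(char *buf, size_t len, u64 bytenr)
{
	return snprintf(buf, len, "bytenr %llu", bytenr);
}
```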
David Sterba	9f6c055e38	btrfs-progs: dump-tree: add options to dump checksums Add new options to dump checksums in node headers and in the checksum items: $ btrfs inspect dump-tree --csum-headers image root tree leaf 471515136 items 19 free space 12186 generation 15 owner ROOT_TREE leaf 471515136 flags 0x1(WRITTEN) backref revision 1 csum 0x756b2d54 fs uuid df0348df-5773-47dd-81e9-a18221461239 For nodes/leaves it's appended on the 2nd line of the header. Checksum items are stored in leaves as EXTENT_CSUM key type, with the offset value as the starting logical offset. As the array would be hard to parse or match, each offset value is printed with the checksum. For crc32c it's 4 values on a line, for xxhash it's 2 and for the long 256bit checksums it's one checksum per line. $ btrfs inspect dump-tree --csum-items image leaf 5423104 items 1 free space 30 generation 6 owner CSUM_TREE leaf 5423104 flags 0x1(WRITTEN) backref revision 1 fs uuid bd7c981e-16ff-4081-a734-3ef5d50cafc1 chunk uuid 13f4c76c-7845-4984-88ed-f01b52e05cf8 item 0 key (EXTENT_CSUM EXTENT_CSUM 22020096) itemoff 55 itemsize 16228 range start 22020096 end 38637568 length 16617472 [22020096] 0x8941f998 [22024192] 0x8941f998 [22028288] 0x8941f998 [22032384] 0x8941f998 [22036480] 0x8941f998 [22040576] 0x8941f998 [22044672] 0x8941f998 [22048768] 0x8941f998 ... $ btrfs inspect dump-tree --csum-items image leaf 5718016 items 1 free space 7746 generation 6 owner CSUM_TREE leaf 5718016 flags 0x1(WRITTEN) backref revision 1 fs uuid f453a5b4-8b4a-4fbf-90a2-2925e4fe2335 chunk uuid eb1da63b-248b-44c2-82da-71b2564bf50e item 0 key (EXTENT_CSUM EXTENT_CSUM 52387840) itemoff 7771 itemsize 8512 range start 52387840 end 53477376 length 1089536 [52387840] 0x686ede9288c391e7e05026e56f2f91bfd879987a040ea98445dabc76f55b8e5f [52391936] 0x686ede9288c391e7e05026e56f2f91bfd879987a040ea98445dabc76f55b8e5f ... The options are not on by default, as the header checksum is not important for the structures. Data checksums can be quite big, so they would make the dump long without any actual data to match against. Signed-off-by: David Sterba <dsterba@suse.com>	2021-06-19 22:07:49 +02:00
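The range line in the dumps above can be cross-checked: an EXTENT_CSUM item holds one checksum per sector, starting at the logical address in the key offset. The sketch assumes a 4KiB sector size (the printed offsets step by 4096) and uses hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Logical range covered by one EXTENT_CSUM item. */
struct csum_range {
	uint64_t start;
	uint64_t end;     /* exclusive */
	uint64_t length;
	uint32_t count;   /* number of checksums in the item */
};

static struct csum_range csum_item_range(uint64_t key_offset,
					 uint32_t item_size,
					 uint32_t csum_size,
					 uint32_t sectorsize)
{
	struct csum_range r;

	assert(csum_size > 0 && item_size % csum_size == 0);
	r.count = item_size / csum_size;
	r.start = key_offset;
	r.length = (uint64_t)r.count * sectorsize;
	r.end = r.start + r.length;
	return r;
}

/* Logical address covered by the i-th checksum of the item. */
static uint64_t csum_logical(const struct csum_range *r, uint32_t i,
			     uint32_t sectorsize)
{
	assert(i < r->count);
	return r->start + (uint64_t)i * sectorsize;
}

/* Checksums per output line: 4-byte crc32c, 8-byte xxhash, 32-byte sums. */
static int csums_per_line(uint32_t csum_size)
{
	switch (csum_size) {
	case 4:
		return 4;
	case 8:
		return 2;
	default:
		return 1;
	}
}
```

Both dumps are consistent with this: 16228 bytes of crc32c sums give 4057 sectors, or 16617472 bytes ending at 38637568, and 8512 bytes of 32-byte sums give 266 sectors, or 1089536 bytes ending at 53477376.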
Naohiro Aota	bfdb3ae237	btrfs-progs: zoned: reset zone of freed block group When freeing a chunk, we can/should reset the underlying device zones for the chunk. Introduce btrfs_reset_chunk_zones() and reset the zones. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	50ae9f62c7	btrfs-progs: zoned: implement sequential extent allocation Implement a sequential extent allocator for zoned filesystems. This allocator only needs to check if there is enough space in the block group after the allocation pointer to satisfy the extent allocation request. Since the allocator is really simple, we implement it directly in find_search_start(). Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
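A sketch of the sequential allocator's core check. `struct zoned_bg` and `zoned_alloc()` are hypothetical; the commit places the real check in find_search_start(), working from the block group's allocation pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical minimal block group for a zoned filesystem. */
struct zoned_bg {
	uint64_t start;        /* logical start */
	uint64_t length;
	uint64_t alloc_offset; /* allocation pointer, relative to start */
};

/* Only the space after the allocation pointer is usable, so the whole
 * allocator reduces to "does the request fit in the remaining tail". */
static int zoned_alloc(struct zoned_bg *bg, uint64_t num_bytes,
		       uint64_t *ret_bytenr)
{
	assert(bg->alloc_offset <= bg->length);
	if (num_bytes > bg->length - bg->alloc_offset)
		return -ENOSPC;
	*ret_bytenr = bg->start + bg->alloc_offset;
	bg->alloc_offset += num_bytes;
	return 0;
}
```

Space freed behind the pointer is not reused by this allocator; it only becomes writable again once the underlying zones are reset.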
Naohiro Aota	f08410f078	btrfs-progs: zoned: load zone's allocation offset A zoned filesystem must allocate blocks at the zones' write pointer. The device's write pointer position can be mapped to a logical address within a block group. To facilitate this, add an "alloc_offset" to the block group to track the logical addresses of the write pointer. This logical address is populated in btrfs_load_block_group_zone_info() from the write pointers of corresponding zones. For now, zoned filesystems support only the single profile. Supporting non-single profiles with zone append writing is not trivial. For example, in the DUP profile, we send a zone append writing IO to two zones on a device. The device replies with written LBAs for the IOs. If the offsets of the returned addresses from the beginning of the zone are different, then it results in different logical addresses. We need fine-grained logical to physical mapping to support such separated physical addresses. Since it would require an additional metadata type, disable non-single profiles for now. This commit supports the case where all the zones in a block group are sequential. The next patch will handle the case of having a conventional zone. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
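For the single profile, mapping the write pointer to a logical address is a subtraction followed by an addition. The sketch assumes one sequential zone per block group, zone positions in 512-byte sectors as the Linux zone report returns them, and hypothetical function names. Conventional zones are not handled, matching the scope of this commit.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9 /* zone report positions are in 512-byte sectors */

/* Allocation offset: how far the write pointer has advanced in the zone. */
static uint64_t wp_to_alloc_offset(uint64_t zone_start_sector,
				   uint64_t wp_sector)
{
	assert(wp_sector >= zone_start_sector);
	return (wp_sector - zone_start_sector) << SECTOR_SHIFT;
}

/* Logical address of the write pointer within the block group. */
static uint64_t wp_logical(uint64_t bg_start, uint64_t zone_start_sector,
			   uint64_t wp_sector)
{
	return bg_start + wp_to_alloc_offset(zone_start_sector, wp_sector);
}
```

With DUP, each copy would have its own write pointer, and nothing guarantees the two offsets match; that is the mismatch the message gives as the reason for disabling non-single profiles.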
David Sterba	0144bcb713	btrfs-progs: move volumes.c to kernel-shared/ Signed-off-by: David Sterba <dsterba@suse.com>	2020-08-31 17:01:06 +02:00
David Sterba	6069bc52a9	btrfs-progs: move transaction.c to kernel-shared/ Signed-off-by: David Sterba <dsterba@suse.com>	2020-08-31 17:01:06 +02:00
David Sterba	abb670f883	btrfs-progs: move ctree.c to kernel-shared/ Signed-off-by: David Sterba <dsterba@suse.com>	2020-08-31 17:01:05 +02:00
David Sterba	772f0da6df	btrfs-progs: move disk-io.c to kernel-shared/ Signed-off-by: David Sterba <dsterba@suse.com>	2020-08-31 17:01:05 +02:00
David Sterba	cf529f36ad	btrfs-progs: move print-tree.c to kernel-shared/ Signed-off-by: David Sterba <dsterba@suse.com>	2020-08-31 17:01:05 +02:00
David Sterba	7dd4abc3c5	btrfs-progs: move extent-tree.c to kernel-shared/ Signed-off-by: David Sterba <dsterba@suse.com>	2020-08-31 17:01:04 +02:00

29 Commits