btrfs-progs

mirror of https://github.com/kdave/btrfs-progs synced 2025-05-03 08:27:56 +00:00

Author	SHA1	Message	Date
Josef Bacik	7119dc3d79	btrfs-progs: simplify btrfs_make_block_group This is doing the same work as insert_block_group_item, rework it to call the helper instead. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-11-22 21:45:37 +01:00
Qu Wenruo	c4ff87c3d1	btrfs-progs: cache csum_size and csum_type in btrfs_fs_info Just like kernel commit 22b6331d9617 ("btrfs: store precalculated csum_size in fs_info"), we can cache csum_size and csum_type in btrfs_fs_info. Furthermore, there is already a 32 bits hole in btrfs_fs_info, and we can fit csum_type and csum_size into the hole without increase the size of btrfs_fs_info. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-11-05 12:50:03 +01:00
Qu Wenruo	636b2e6027	btrfs-progs: remove temporary buffer for super block There are a lot of call sites where we use the following code snippet: u8 super_block_data[BTRFS_SUPER_INFO_SIZE]; struct btrfs_super_block *sb; u64 ret; sb = (struct btrfs_super_block *)super_block_data; The reason for this is, structure btrfs_super_block was smaller than BTRFS_SUPER_INFO_SIZE. Thus for anything with csum involved, we have to use a proper 4K buffer. Since the recent unification of sizeof(struct btrfs_super_block), we no longer need such workaround, and can use struct btrfs_super_block directly to do any operation. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-11-05 12:50:03 +01:00
Qu Wenruo	76f1a2ed57	btrfs-progs: unify size of btrfs_super_block and BTRFS_SUPER_INFO_SIZE Just like kernel change, pad struct btrfs_super_block to 4096 bytes. As ctree.h is part of public headers, use raw number for the superblock offset. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-11-05 12:50:03 +01:00
Wang Yugui	e9a7efb1b6	btrfs-progs: fix XX_flags_to_str() to always end with '\0' [BUG] We noticed 'btrfs check' outputs something like leaf 30408704 flags 0x0(P1逅?) backref revision 1 but we expected: leaf 30408704 flags 0x0() backref revision 1 [CAUSE] Some XX_flags_to_str() failed to make sure the result string always ends with '\0' in some cases. [FIX] Reset the buffer at the beginning. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Wang Yugui (wangyugui@e16-tech.com) Signed-off-by: David Sterba <dsterba@suse.com>	2021-11-05 12:50:03 +01:00
David Sterba	4882a4f5fa	btrfs-porgs: add exception for upper case single profile name For consistency with older versions switch the case of 'single' to be lower case again even if it's inconsistent. This could be revisited in the future. Signed-off-by: David Sterba <dsterba@suse.com>	2021-11-05 12:50:03 +01:00
Qu Wenruo	b1c944657c	btrfs-progs: make "btrfs filesystem df" command show upper case profile [BUG] Since commit `dad03fac3b` ("btrfs-progs: switch btrfs_group_profile_str to use raid table"), fstests/btrfs/023 and btrfs/151 will always fail. The failure of btrfs/151 explains the reason pretty well: btrfs/151 1s ... - output mismatch --- tests/btrfs/151.out 2019-10-22 15:18:14.068965341 +0800 +++ ~/xfstests-dev/results//btrfs/151.out.bad 2021-11-02 17:13:43.879999994 +0800 @@ -1,2 +1,2 @@ QA output created by 151 -Data, RAID1 +Data, raid1 ... (Run 'diff -u ~/xfstests-dev/tests/btrfs/151.out ~/xfstests-dev/results//btrfs/151.out.bad' to see the entire diff) [CAUSE] Commit `dad03fac3b` ("btrfs-progs: switch btrfs_group_profile_str to use raid table") will use btrfs_raid_array[index].raid_name, which is all lower case. [FIX] There is no need to bring such output format change. So here we split the btrfs_raid_attr::raid_name[] into upper_name[] and lower_name[], and make upper and lower case helpers for callers to use. Now there are several types of callers referring to lower_name and upper_name: - parse_bg_profile() It uses strcasecmp(), either case would be fine. - btrfs_group_profile_str() Originally it uses upper case for all profiles except "single". Now unified to upper case. - sprint_profiles() It uses lower case. - bg_flags_to_str() It uses upper case. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-11-05 12:50:03 +01:00
Qu Wenruo	5bee5c99bf	btrfs-progs: fix printf formats on 32bit x86 When compiling btrfs-progs on 32bit x86 using GCC 11.1.0, there are several warnings: In file included from ./common/utils.h:30, from check/main.c:36: check/main.c: In function 'run_next_block': ./common/messages.h:42:31: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'u32' {aka 'unsigned int'} [-Wformat=] 42 | __btrfs_error((fmt), ##__VA_ARGS__); \ | ^~~~~ check/main.c:6496:33: note: in expansion of macro 'error' 6496 | error( \ | ^~~~~ In file included from ./common/utils.h:30, from kernel-shared/volumes.c:32: kernel-shared/volumes.c: In function 'btrfs_check_chunk_valid': ./common/messages.h:42:31: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'u32' {aka 'unsigned int'} [-Wformat=] 42 | __btrfs_error((fmt), ##__VA_ARGS__); \ | ^~~~~ kernel-shared/volumes.c:2052:17: note: in expansion of macro 'error' 2052 | error("invalid chunk item size, have %u expect [%zu, %lu)", \ | ^~~~~ image/main.c: In function 'search_for_chunk_blocks': ./common/messages.h:42:31: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Wformat=] 42 | __btrfs_error((fmt), ##__VA_ARGS__); \ | ^~~~~ image/main.c:2122:33: note: in expansion of macro 'error' 2122 | error( \ | ^~~~~ There are two types of problems: - __BTRFS_LEAF_DATA_SIZE() This macro has no type definition, making it behaves differently on different arches. Fix this by following kernel to use inline function to make its return value fixed to u32. - size_t related output For x86_64 %lu is OK but not for x86. Fix this by using %zu. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-11-05 12:50:03 +01:00
Wang Yugui	b1d8f945c9	btrfs-progs: mask out all unwanted profiles in btrfs_group_profile_str Commit ("btrfs-progs: switch btrfs_group_profile_str to use raid table") introduced a regression that raid profile of GlobalReserve will be printed as 'unknown'. $ btrfs filesystem df /mnt/test Data, single: total=5.02TiB, used=4.98TiB System, single: total=4.00MiB, used=624.00KiB Metadata, single: total=11.01GiB, used=6.94GiB GlobalReserve, unknown: total=512.00MiB, used=0.00B Fix it by: - take BTRFS_BLOCK_GROUP_RESERVED into account when masking the block group flags - update the define of BTRFS_BLOCK_GROUP_RESERVED too so it's same as in kernel Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Wang Yugui <wangyugui@e16-tech.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-11-05 12:50:03 +01:00
Qu Wenruo	8f81113021	btrfs-progs: check: fix a lowmem mode crash where fatal error is not properly handled [BUG] When a special image (diverted from fsck/012) has its unused slots (slot number >= nritems) with garbage, lowmem mode btrfs check can crash: (gdb) run check --mode=lowmem ~/downloads/good.img.restored Starting program: /home/adam/btrfs/btrfs-progs/btrfs check --mode=lowmem ~/downloads/good.img.restored ... ERROR: root 5 INODE[5044031582654955520] nlink(257228800) not equal to inode_refs(0) ERROR: root 5 INODE[5044031582654955520] nbytes 474624 not equal to extent_size 0 Program received signal SIGSEGV, Segmentation fault. 0x0000555555639b11 in btrfs_inode_size (eb=0x5555558a7540, s=0x642e6cd1) at ./kernel-shared/ctree.h:1703 1703 BTRFS_SETGET_FUNCS(inode_size, struct btrfs_inode_item, size, 64); (gdb) bt #0 0x0000555555639b11 in btrfs_inode_size (eb=0x5555558a7540, s=0x642e6cd1) at ./kernel-shared/ctree.h:1703 #1 0x0000555555641544 in check_inode_item (root=0x5555556c2290, path=0x7fffffffd960) at check/mode-lowmem.c:2628 [CAUSE] At check_inode_item() we have path->slot[0] at 29, while the tree block only has 26 items. This happens because two reasons: - btrfs_next_item() never reverts its slots Even if we failed to read next leaf. - check_inode_item() doesn't inform the caller that a fatal error happened In check_inode_item(), if btrfs_next_item() failed, it goes to out label, which doesn't really set @err properly. This means, when check_inode_item() fails at btrfs_next_item(), it will increase path->slots[0], while it's already beyond current tree block nritems. When the slot increases furthermore, and if the unused item slots have some garbage, we will get invalid btrfs_item_ptr() result, and causing above segfault. [FIX] Fix the problems by two ways: - Make btrfs_next_item() to revert its path->slots[0] on failure - Properly detect fatal error from check_inode_item() By this, we will no longer crash on the crafted image. 
Reported-by: Wang Yugui <wangyugui@e16-tech.com> Issue: #412 Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-11-04 20:56:42 +01:00
Qu Wenruo	eacdd1606c	btrfs-progs: print-tree: fix chunk/block group flags output [BUG] Commit ("btrfs-progs: use raid table for profile names in print-tree.c") introduced one bug in block group and chunk flags output and changed the behavior: item 1 key (FIRST_CHUNK_TREE CHUNK_ITEM 13631488) itemoff 16105 itemsize 80 length 8388608 owner 2 stripe_len 65536 type SINGLE ... item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 22020096) itemoff 15993 itemsize 112 length 8388608 owner 2 stripe_len 65536 type DUP ... item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 30408704) itemoff 15881 itemsize 112 length 268435456 owner 2 stripe_len 65536 type DUP ... Note that, the flag string only contains the profile (SINGLE/DUP/etc...) no type (DATA/METADATA/SYSTEM). And we have new "SINGLE" string, even that profile has no extra bit to indicate that. [CAUSE] The "SINGLE" part is caused by the raid array which has a name for SINGLE profile, even it doesn't have the corresponding bit. The missing type string is caused by a code bug: strcpy(buf, name); while (*tmp) { *tmp = toupper(*tmp); tmp++; } strcpy(ret, buf); The last strcpy() call overrides the existing string in @ret. [FIX] - Enhance string handling using strn*()/snprintf() - Add extra "UKNOWN.0x%llx" output for unknown profiles - Call proper strncat() to merge type and profile - Add extra handling for "SINGLE" to keep the old output Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-20 18:59:24 +02:00
Qu Wenruo	f4c712e024	btrfs-progs: rename data parameter to profile in extent allocation path In function btrfs_reserve_extent(), we call find_free_extent() passing "u64 profile" into "int data". This is definitely a width reduction, but when looking further into the code, it's more serious than that, in fact the "int data" parameter is not really to indicate whether it's data extent, but really a block group profile (with block group type). This is not only width reduction, but also confusing. Thankfully so far we don't have any BLOCK_GROUP bits beyond 32 bits, so the width reduction is not causing a big problem. This patch will rename the "int data" parameter to a more proper one, "u64 profile" in all involved call paths. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-20 18:59:24 +02:00
David Sterba	bc28dc6bea	btrfs-progs: introduce helper for striped profiles There are several profiles like raid0, raid10, raid5 and raid6 that can span as many devices as possible and need special handling for the stripe calculations. Provide a helper to identify the profiles in a simple way. Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-20 18:59:24 +02:00
David Sterba	a25a5cc2c0	btrfs-progs: use btrfs_bg_type_to_nparity in btrfs_stripe_length Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-20 18:59:24 +02:00
David Sterba	11fcdbc35e	btrfs-progs: introduce helper to get allowed profiles for a given device number Use the raid table helper to avoid hard coding profiles for the given number of devices in test_num_disk_vs_raid. Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-20 18:59:24 +02:00
David Sterba	cb0b63cd90	btrfs-progs: use raid table value for sub_stripes in btrfs_check_chunk_valid Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-20 18:59:24 +02:00
David Sterba	4f662d74fd	btrfs-progs: export raid table helper for sub_stripes Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-20 18:59:24 +02:00
David Sterba	833ce53872	btrfs-progs: use btrfs_bg_type_to_nparity in calc_stripe_length Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-20 18:59:24 +02:00
David Sterba	e3355b43b4	btrfs-progs: use raid table for min devs in btrfs_check_chunk_valid Replace the hard coded values with the raid table reference. Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-20 18:59:24 +02:00
David Sterba	3d05b20435	btrfs-progs: use btrfs_bg_type_to_nparity in chunk_bytes_by_type Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-20 18:59:23 +02:00
David Sterba	359710d4dd	btrfs-progs: use btrfs_bg_type_to_nparity in get_dev_extent_len Stripe calculation with hard coded parity, use the helper. Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-20 18:59:23 +02:00
David Sterba	f759e9bbad	btrfs-progs: introduce a public helper for raid parity There's a private helper for parity and there are many open coded calculations of parity for the RAID56 profiles. The helper will be used to remove that and use the raid table values. Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-20 18:59:23 +02:00
David Sterba	447bf2fb37	btrfs-progs: zoned: factor out supported profiles to a helper The enumeration could get out of date, like fixed in previous commit. Create a helper that will hide the implementation details. Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-20 18:59:23 +02:00
David Sterba	15eb03ca1d	btrfs-progs: use raid table for profile names in print-tree.c Pick the names from the raid table and do the uppercase conversion. Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-20 18:59:23 +02:00
David Sterba	80714610f3	btrfs-progs: use raid table for ncopies There's opencoded value of raid table ncopies in print_filesystem_usage_overall, add a helper and use it. Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-20 18:59:23 +02:00
David Sterba	b29f1603b0	btrfs-progs: use raid table for devs_min and replace local helper Another duplication of the raid table, in this case missing the changes to raid10 and raid0 minimum devices changed in `a177ef7dd4` ("btrfs-progs: mkfs: allow degenerate raid0/raid10"). Define and use a helper using the table value. Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-20 18:59:23 +02:00
Naohiro Aota	e9696b06f0	btrfs-progs: use direct-io for zoned device We need to use direct-IO for zoned devices to preserve the write ordering. Instead of detecting if the device is zoned or not, we simply use direct-IO for any kind of device (even if emulated zoned mode on a regular device). Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-20 18:59:23 +02:00
Naohiro Aota	4a8d85f730	btrfs-progs: temporarily set zoned flag for initial tree reading Functions to read data/metadata e.g. read_extent_from_disk() now depend on the fs_info->zoned flag to determine if they do direct-IO or not. The flag (and zone_size) is not known before reading the chunk tree and is set to 0 while in the initial chunk tree setup process. That will cause btrfs_pread() to fail because it does not align the buffer. Use fcntl() to find out whether the file descriptor is opened with O_DIRECT, and if it is, set the zoned flag to 1 temporarily for this initial process. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-20 18:59:23 +02:00
Naohiro Aota	ae0dfb246d	btrfs-progs: introduce btrfs_pread wrapper for pread Wrap pread with btrfs_pread as well. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-20 18:59:23 +02:00
Naohiro Aota	c821e5545f	btrfs-progs: introduce btrfs_pwrite wrapper for pwrite Wrap pwrite with btrfs_pwrite(). It simply calls pwrite() on non-zoned btrfs (opened without O_DIRECT). On zoned mode (opened with O_DIRECT), it allocates an aligned bounce buffer, copies the contents and uses it for direct-IO writing. Writes in device_zero_blocks() and btrfs_wipe_existing_sb() are a little tricky. We don't have fs_info on our hands, so use zinfo to determine it is a zoned device or not. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-20 18:59:23 +02:00
Naohiro Aota	40ab7530df	btrfs-progs: set eb::fs_info properly everywhere Several extent_buffer initializations miss fs_info initialization. This is OK before the following patch ("btrfs-progs: use direct-io for zoned device") as eb->fs_info is not always necessary. But, after that patch, we will use fs_info to determine it is zoned or not and that causes segfault in such cases. Properly set fs_info when initializing extent_buffers to fix the issue. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-08 20:47:04 +02:00
David Sterba	8bb13015bd	btrfs-progs: don't include btrfs-list.h unless necessary We don't need to include this besides btrfs-list.c itself and subvolume.c that does use the btrfs_list_* API. Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-08 20:47:03 +02:00
Naohiro Aota	585ac14d1a	btrfs-progs: use btrfs_device_size() instead of device_get_partition_size_fd() device_get_partition_size_fd() fails if we pass a regular file. This can happen when trying to create an emulated zoned filesystem on a regular file. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-08 20:46:35 +02:00
David Sterba	785218efb1	btrfs-progs: remove direct calls to crc32c from ctree.h Make the helpers using crc32c not inline so the crc32c.h can be removed from the public headers exported by libbtrfs. Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-08 20:46:35 +02:00
David Sterba	732d73dc1f	btrfs-progs: remove btrfs_crc32c alias There's an ancient macro btrfs_crc32c which is just wrapping crc32c and not doing anything else, so we can use the crc helper directly. Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-08 20:46:35 +02:00
David Sterba	979bda6fb5	btrfs-progs: libbtrfs: replace SZ_ constants and drop sizes.h To drop sizes.h from exported headers, replace the few SZ_ constants from the existing exported headers (ctree.h, send.h). It would be nice to use them in the long run but right now it would prevent unexporting the sizes.h file. Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-08 20:46:35 +02:00
David Sterba	38356d456b	btrfs-progs: libbtrfs: drop radix-tree.h from exported headers The header is only included from ctree.h but not actually used, we can drop it from the exported files. Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-08 20:46:35 +02:00
Nikolay Borisov	39c6e0b79c	btrfs-progs: add btrfs_uuid_tree_remove It will be used to clear received data on RW snapshots that were received. The function is copied from kernel sources. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-08 20:46:34 +02:00
Nikolay Borisov	97640a5b81	btrfs-progs: remove root argument from btrfs_truncate_item This function lies in the kernel-shared directory and is supposed to be close to 1:1 copy with its kernel counterpart, yet it takes one extra argument - root. But this is now unused, so simply remove it. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-08 20:46:34 +02:00
Nikolay Borisov	c3584b4fc0	btrfs-progs: remove fs_info argument from leaf_data_end The function already takes an extent_buffer which has a reference to the owning filesystem's fs_info. This also brings the function in line with the kernel's signature. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-08 20:46:34 +02:00
Nikolay Borisov	7c58b09548	btrfs-progs: remove root argument from btrfs_fixup_low_keys It's not used, so just remove it. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-08 20:46:34 +02:00
David Sterba	d0ea2b2af4	btrfs-progs: zoned: also exclude raid1c3 and raid1c4 from supported profiles The enumeration of profiles not available for zoned mode in btrfs_load_block_group_zone_info was lacking the 3 and 4 copy raid1, add them. Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-07 18:39:58 +02:00
David Sterba	27207d651a	btrfs-progs: dump-tree: print complete root_item The output of root_item in the 'inspect dump-tree' command lacks some items and some of them are printed conditionally. As the dump utility is for debugging, it's better to print all the items, with names matching the structure members and order. Some values will inevitably be all zeros like uuids or various timestamps, but that's a minor issue and affecting only a few trees. Example: item 0 key (EXTENT_TREE ROOT_ITEM 0) itemoff 15844 itemsize 439 generation 5 root_dirid 0 bytenr 30523392 byte_limit 0 bytes_used 16384 last_snapshot 0 flags 0x0(none) refs 1 drop_progress key (0 UNKNOWN.0 0) drop_level 0 level 0 generation_v2 5 uuid 00000000-0000-0000-0000-000000000000 parent_uuid 00000000-0000-0000-0000-000000000000 received_uuid 00000000-0000-0000-0000-000000000000 ctransid 0 otransid 0 stransid 0 rtransid 0 ctime 0.0 (1970-01-01 01:00:00) otime 0.0 (1970-01-01 01:00:00) stime 0.0 (1970-01-01 01:00:00) rtime 0.0 (1970-01-01 01:00:00) item 3 key (FS_TREE ROOT_ITEM 0) itemoff 14949 itemsize 439 generation 4 root_dirid 256 bytenr 30408704 byte_limit 0 bytes_used 16384 last_snapshot 0 flags 0x0(none) refs 1 drop_progress key (0 UNKNOWN.0 0) drop_level 0 level 0 generation_v2 4 uuid ec4669b6-6d21-46ab-857e-d60cafde45b3 parent_uuid 00000000-0000-0000-0000-000000000000 received_uuid 00000000-0000-0000-0000-000000000000 ctransid 0 otransid 0 stransid 0 rtransid 0 ctime 1633021823.0 (2021-09-30 19:10:23) otime 1633021823.0 (2021-09-30 19:10:23) stime 0.0 (1970-01-01 01:00:00) rtime 0.0 (1970-01-01 01:00:00) Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-07 18:39:44 +02:00
Josef Bacik	dd8e7477f7	btrfs-progs: remove data extents from the free space tree Dave reported a failure of mkfs-test 009 with the free space tree enabled by default. This is because 009 pre-populates the file system with a given directory, and for some reason our data allocation path isn't the same as in the kernel. Fix this by making sure when we allocate a data extent we remove the space from the free space tree, and with this our mkfs tests now pass. Issue: #410 Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-06 16:49:52 +02:00
Naohiro Aota	85e102f212	btrfs-progs: properly format btrfs_header in btrfs_create_root() Enabling quota in zoned mode hits the following assertion: $ mkfs.btrfs -f -d single -m single -R quota /dev/nullb0 btrfs-progs v5.11 See http://btrfs.wiki.kernel.org for more information. Zoned: /dev/nullb0: host-managed device detected, setting zoned feature Resetting device zones /dev/nullb0 (1600 zones) ... bad tree block 25395200, bytenr mismatch, want=25395200, have=0 kernel-shared/disk-io.c:549: write_tree_block: BUG_ON `1` triggered, value 1 ./mkfs.btrfs(+0x26aaa)[0x564d1a7ccaaa] ./mkfs.btrfs(write_tree_block+0xb8)[0x564d1a7cee29] ./mkfs.btrfs(__commit_transaction+0x91)[0x564d1a7e3740] ./mkfs.btrfs(btrfs_commit_transaction+0x135)[0x564d1a7e39aa] ./mkfs.btrfs(main+0x1fe9)[0x564d1a7b442a] /lib64/libc.so.6(__libc_start_main+0xcd)[0x7f36377d37fd] ./mkfs.btrfs(_start+0x2a)[0x564d1a7b1fda] zsh: IOT instruction sudo ./mkfs.btrfs -f -d single -m single -R quota /dev/nullb0 The issue occurs because btrfs_create_root() is not formatting the root node properly. This is fine in regular mode, because it's fortunately reusing a once freed buffer. As the previous tree node allocation kindly formatted the header, it will see the proper bytenr and pass the checks. However, we never reuse a once freed buffer on zoned filesystem. As a result, we have zero-filled bytenr, FSID, and chunk-tree UUID, hitting the asserts in check_tree_block(). Reported-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-06 16:49:11 +02:00
Johannes Thumshirn	c22e9487a7	btrfs-progs: remove max_zone_append_size logic max_zone_append_size is unused and can as well be removed just like we did on the kernel side. Keep one sanity check though, so we're not adding devices to a zoned FS that aren't supporting zone append. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-06 16:49:07 +02:00
Naohiro Aota	53ec59ead0	btrfs-progs: do not zone reset on emulated zoned mode We cannot zone reset a regular file with emulated zones. So, mkfs.btrfs on such a file causes the following error. ERROR: zoned: failed to reset device '/home/naota/tmp/btrfs.img' zones: Inappropriate ioctl for device Introduce btrfs_zoned_device_info->emulated to distinguish the zones are emulated or not. And, use it to decide it needs zone reset or not. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-06 16:48:56 +02:00
Qu Wenruo	60651ad9da	btrfs-progs: introduce OPEN_CTREE_ALLOW_TRANSID_MISMATCH flag [BUG] There is a report that, btrfstune can even work while the fs has transid mismatch problems. $ btrfstune -f -u /dev/sdb1 Current fsid: b2b5ae8d-4c49-45f0-b42e-46fe7dcfcb07 New fsid: b2b5ae8d-4c49-45f0-b42e-46fe7dcfcb07 Set superblock flag CHANGING_FSID Change fsid in extents parent transid verify failed on 792854528 wanted 20103 found 20091 parent transid verify failed on 792854528 wanted 20103 found 20091 parent transid verify failed on 792854528 wanted 20103 found 20091 Ignoring transid failure parent transid verify failed on 792870912 wanted 20103 found 20091 parent transid verify failed on 792870912 wanted 20103 found 20091 parent transid verify failed on 792870912 wanted 20103 found 20091 Ignoring transid failure parent transid verify failed on 792887296 wanted 20103 found 20091 parent transid verify failed on 792887296 wanted 20103 found 20091 parent transid verify failed on 792887296 wanted 20103 found 20091 Ignoring transid failure ERROR: child eb corrupted: parent bytenr=38010880 item=69 parent level=1 child level=1 ERROR: failed to change UUID of metadata: -5 ERROR: btrfstune failed This leaves a corrupted fs even more corrupted, and due to the extra CHANGING_FSID flag, btrfs check will not even try to run on it: Opening filesystem to check... ERROR: Filesystem UUID change in progress ERROR: cannot open file system [CAUSE] Unlike kernel, btrfs-progs has a less strict check on transid mismatch. In read_tree_block() we will fall back to use the tree block even its transid mismatch if we can't find any better copy. However not all commands in btrfs-progs needs this feature, only btrfs-check (which may fix the problem) and btrfs-restore (it just tries to ignore any problems) really utilize this feature. [FIX] Introduce a new open ctree flag, OPEN_CTREE_ALLOW_TRANSID_MISMATCH, to be explicit about whether we really want to ignore transid error. 
Currently only btrfs-check and btrfs-restore will utilize this new flag. Also add btrfs-image to allow opening such fs with transid error. Link: https://www.reddit.com/r/btrfs/comments/pivpqk/failure_during_btrfstune_u/ Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-20 12:17:29 +02:00
David Sterba	96a5cf0719	btrfs-progs: handle EINVAL when reading zone size on older kernels A combination of new progs and old kernel may lead to problems with detecting zone size by ioctl. Fixed by #376 but still incomplete because old kernels may return EINVAL for unsupported ioctl. This should be ENOTTY but hasn't been like that until kernel 5.11. As we always pass valid arguments to the ioctl we can't conflate the two and can handle EINVAL the same way as ENOTTY. Issue: #399 Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-20 11:31:09 +02:00
David Sterba	ee17bcec33	btrfs-progs: remove stale declaration from send.h We don't use this header for kernel compilation so the guarded declaration is pointless. Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-07 19:27:59 +02:00
David Sterba	e86425242f	btrfs-progs: move send.h to kernel-shared/ The header contains the protocol definitions and is almost exactly the same as the kernel version, move it to the proper directory. Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-07 19:26:46 +02:00
David Sterba	76ab1fa364	btrfs-progs: rename and move group_profile_max_safe_loss The helper belongs to the others that translate bg flags to the raid attr table member. Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-07 16:38:56 +02:00
Qu Wenruo	9a11b1b792	btrfs-progs: backport btrfs_check_node() from kernel The btrfs_check_node() has far less meaningful error message compared to kernel counterpart, and it even lacks certain checks like level check. Backport btrfs_check_node() to btrfs-progs to not only unify the code but greatly improve the readability of the error messages. Extra modification includes: - No fs_info needed As we don't need to output fsid. - Remove unlikely() macro - Extra BTRFS_TREE_BLOCK_* error type - Btrfs-progs specific error handling To record the corrupted tree blocks. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-07 14:20:41 +02:00
Qu Wenruo	8f8cafa2ce	btrfs-progs: backport btrfs_check_leaf() from kernel Currently btrfs_check_leaf() provides almost meaningless messages for things like invalid item offset: incorrect offsets 8492 3707786077 While kernel tree-checker is doing a way better job, so it's wise to backport btrfs_check_leaf() from kernel. There are some modification needed: - New generic_err() helper - Remove unlikely() macro - Remove empty essential tree check Mkfs still needs to create empty essential trees. - Using BTRFS_TREE_BLOCK_* return value Original mode check still relies on them to do certain repair. - No need for btrfs_fs_info We no longer need fsid output, thus no need for btrfs_fs_info. - No item contents check - Still using the fail: label for btrfs-progs specific error handling The new output looks like: corrupt leaf: root=2 block=72164753408 slot=109, unexpected item end, have 3707786077 expect 8492 Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-07 14:19:54 +02:00
Qu Wenruo	1f8dfe681f	btrfs-progs: use btrfs_key for btrfs_check_node() and btrfs_check_leaf() In kernel space we hardly use btrfs_disk_key, unless for very lowlevel code. There is no need to intentionally use btrfs_disk_key in btrfs-progs either. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-07 13:58:44 +02:00
David Sterba	c3ee6a8a09	btrfs-progs: unify GPL header comments Add the GPL v2 header to files where it was missing and is not from an external source, update to the most recent version with the address. Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-07 13:58:44 +02:00
David Sterba	7572839a74	btrfs-progs: add and use bit masks for RAID1 and RAID56 profiles Many test conditions can be simplified in case they check all the related profiles. Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-06 16:36:18 +02:00
David Sterba	7fe4396467	btrfs-progs: copy some raid_attr helpers from kernel There are convenience helpers for the raid attr table, copy them from kernel for further cleanups. Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-06 16:36:17 +02:00
Josef Bacik	79e534def9	btrfs-progs: add the incompat flag for extent tree v2 I will have a lot of preparatory patches to reduce the review pain of this large feature. In order to enable that work define the incompat flag. Once all of the work lands to support the feature there will be a patch to actually enable us to select it and manipulate file systems with that incompat flag set. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-06 16:36:17 +02:00
Josef Bacik	826e466028	btrfs-progs: add add_block_group_free_space helper This exists in the kernel free-space-tree.c but not in progs. We need it to generate the free space items for new block groups, which is needed when we start creating the free space tree in make_btrfs(). Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-06 16:36:17 +02:00
Josef Bacik	3d870a491f	btrfs-progs: make sure track_dirty and ref_cows is set properly Adding support for the per-block group roots means we will be reading the roots directly in different places. Make sure we set ->track_dirty and ->ref_cows properly in the helper so we don't have to do this everywhere. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-03 15:33:53 +02:00
David Sterba	a177ef7dd4	btrfs-progs: mkfs: allow degenerate raid0/raid10 Kernel patch b2f78e88052bc0bee ("btrfs: allow degenerate raid0/raid10") in 5.15 will allow mounting and converting to single device raid0 or two device raid10. Let mkfs create such filesystem. "The motivation is to allow to preserve the profile type as long as it possible for some intermediate state (device removal, conversion), or when there are disks of different size, with raid0 the otherwise unusable space of the last device will be used too. Similarly for raid10, though the two largest devices would need to be the same." Signed-off-by: David Sterba <dsterba@suse.com>	2021-08-27 15:40:53 +02:00
Qu Wenruo	991a598f53	btrfs-progs: move btrfs_format_csum() to common/utils.[ch] Function btrfs_format_csum() is a special helper only used in btrfs-progs. Move it to common/utils.[ch] other than leaving it in kernel-shared/disk-io.c. Since we're moving the code, also introduce a macro, BTRFS_CSUM_STRING_LEN, to replace open-coded string length calculation. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-08-26 14:26:13 +02:00
Josef Bacik	8c3c13bb45	btrfs-progs: check blocks in btrfs_next_sibling_block By enabling the lowmem checks properly I uncovered the case where test fsck/007 will infinite loop at the detection stage. This is because when checking the inode item we will just btrfs_next_item(), and because we ignore check tree block failures at read time we don't get an -EIO from btrfs_next_leaf. This occurs because we allow fsck to raw-read blocks even if they fail basic sanity checks, because we want the opportunity to repair the blocks. However this means corrupt blocks are sitting in cache marked as uptodate. btrfs_search_slot() handles this by doing a check_block() on every block we add to the path, so that anything that is doing a search gets a proper -EIO. btrfs_next_sibling_block() needs a similar check. With this fix we now return -EIO on btrfs_next_leaf() properly and we no longer infinite loop on fsck/007 with lowmem. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-08-25 15:38:54 +02:00
Qu Wenruo	a138daac17	btrfs-progs: mkfs: set super_cache_generation to 0 if we're using free space tree [HICCUP] There is a bug report that mkfs.btrfs -R free-space-tree still makes kernel to try to cleanup the v1 space cache: # mkfs.btrfs -R free-space-tree -f /dev/test/scratch1 # mount /dev/test/scratch1 /mnt/btrfs # dmesg | grep cleaning BTRFS info (device dm-6): cleaning free space cache v1 [CAUSE] By default, mkfs.btrfs will set super cache generation to (u64)-1, which will inform kernel that the v1 space cache is invalid, needs to regenerate it. But for free space cache tree, kernel will set super cache generation to 0, to indicate v1 space cache is not in use. This means, even we enabled free space tree with all the RO compatible bits and new tree, as long as super cache generation is not 0, kernel still consider the fs has some invalid v1 space cache, and will try to remove them. [FIX] This is not a big deal, but to make the "-R free-space-tree" to really work as kernel, we also need to set super cache generation to 0. Reported-by: Chris Murphy <lists@colorremedies.com> Link: https://lore.kernel.org/linux-btrfs/CAJCQCtSvgzyOnxtrqQZZirSycEHp+g0eDH5c+Kw9mW=PgxuXmw@mail.gmail.com/ Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-08-20 14:24:55 +02:00
David Sterba	6527771668	btrfs-progs: add nparity for raid1c34 definitions The values of .ncopies was not explicitly set. Signed-off-by: David Sterba <dsterba@suse.com>	2021-07-23 00:59:27 +02:00
Qu Wenruo	07ecf878c1	btrfs-progs: check: batch v1 space cache inodes when clearing Currently v1 space cache clearing will delete one cache inode just in one transaction, and then start a new transaction to delete the next inode. This is far from efficient and can make the already slow v1 space cache deleting even slower, as large fs has tons of cache inodes to delete. This patch will speed up the process by batching up to 16 inode deletion into one transaction. A quick benchmark of deleting 702 v1 space cache inodes would look like this: Unpatched: 4.898s Patched: 0.087s Which is obviously a big win. Reported-by: Joshua <joshua@mailmag.net> Link: https://lore.kernel.org/linux-btrfs/0b4cf70fc883e28c97d893a3b2f81b11@mailmag.net/ Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-07-22 16:26:05 +02:00
Sidong Yang	94f3b75c00	btrfs-progs: zoned: fix memory leak in btrfs_sb_io() In btrfs_sb_io(), blk_zone_report is used for getting information about zones. But it is not freed if code goes in usual path. This patch frees the variable just after it used. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Sidong Yang <realwakka@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-07-02 17:27:53 +02:00
David Sterba	1dc6f33c28	btrfs-progs: zoned: use fixed width type when reading zone size The ioctl BLKGETZONESZ expects 32bit integer, declare the target variable as such. Signed-off-by: David Sterba <dsterba@suse.com>	2021-07-02 17:27:53 +02:00
David Sterba	b1f374dd1d	btrfs-progs: switch %Lu to %llu format The %Lu format is not standard and we use %llu everywhere else, so switch the remaining cases. Signed-off-by: David Sterba <dsterba@suse.com>	2021-06-19 22:07:49 +02:00
David Sterba	9f6c055e38	btrfs-progs: dump-tree: add options to dump checksums Add new options to dumps checksums in node headers and in the checksum items: $ btrfs inspect dump-tree --csum-headers image root tree leaf 471515136 items 19 free space 12186 generation 15 owner ROOT_TREE leaf 471515136 flags 0x1(WRITTEN) backref revision 1 csum 0x756b2d54 fs uuid df0348df-5773-47dd-81e9-a18221461239 For nodes/leaves it's appended on the 2nd line of the header. Checksum items are stored in leaves as EXTENT_CSUM key type, with offset value as the logical offset starting. As the array would be hard to parse or match, each offset value is printed with the checksum. For crc32c it's 4 values on a line, for xxhash it's 2 and for the long 256bit checksums it's one checksum per line. $ btrfs inspect dump-tree --csum-items image leaf 5423104 items 1 free space 30 generation 6 owner CSUM_TREE leaf 5423104 flags 0x1(WRITTEN) backref revision 1 fs uuid bd7c981e-16ff-4081-a734-3ef5d50cafc1 chunk uuid 13f4c76c-7845-4984-88ed-f01b52e05cf8 item 0 key (EXTENT_CSUM EXTENT_CSUM 22020096) itemoff 55 itemsize 16228 range start 22020096 end 38637568 length 16617472 [22020096] 0x8941f998 [22024192] 0x8941f998 [22028288] 0x8941f998 [22032384] 0x8941f998 [22036480] 0x8941f998 [22040576] 0x8941f998 [22044672] 0x8941f998 [22048768] 0x8941f998 ... $ btrfs inspect dump-tree --csum-items image leaf 5718016 items 1 free space 7746 generation 6 owner CSUM_TREE leaf 5718016 flags 0x1(WRITTEN) backref revision 1 fs uuid f453a5b4-8b4a-4fbf-90a2-2925e4fe2335 chunk uuid eb1da63b-248b-44c2-82da-71b2564bf50e item 0 key (EXTENT_CSUM EXTENT_CSUM 52387840) itemoff 7771 itemsize 8512 range start 52387840 end 53477376 length 1089536 [52387840] 0x686ede9288c391e7e05026e56f2f91bfd879987a040ea98445dabc76f55b8e5f [52391936] 0x686ede9288c391e7e05026e56f2f91bfd879987a040ea98445dabc76f55b8e5f ... The options are not on by default, the header checksum is not important for the structures. 
Data checksums can be quite big so that would make the dump long and without any actual data to match against. Signed-off-by: David Sterba <dsterba@suse.com>	2021-06-19 22:07:49 +02:00
David Sterba	72d710637c	btrfs-progs: print-tree: convert mode to bitmask Replace follow and traverse by one parameter that takes bits to affect the behaviour. This allows to extend btrfs_print_tree output with more modes from one place. Signed-off-by: David Sterba <dsterba@suse.com>	2021-06-09 20:31:49 +02:00
David Sterba	6134973527	btrfs-progs: zoned: make it work without kernel support There's a report that a system with 4.19 kernel fails boot because device scan exits with error. This is because zoned support is compiled in btrfs-progs but not in kernel. To make new progs and old kernels work, do a fallback when the zoned ioctl is not available, as if it were a non-zoned device. There is no other option, but this is safe at least for the device scan that would not error out. Any unaligned writes to a zoned device will fail as expected. Issue: #376 Signed-off-by: David Sterba <dsterba@suse.com>	2021-06-07 17:38:46 +02:00
Su Yue	80a86f1b47	btrfs-progs: do not BUG_ON if btrfs_add_to_fsid succeeded to write superblock Commit `8ef9313cf2` ("btrfs-progs: zoned: implement log-structured superblock") changed to write BTRFS_SUPER_INFO_SIZE bytes to device. The before num of bytes to be written is sectorsize. It causes mkfs.btrfs failed on my 16k pagesize kvm: $ /usr/bin/mkfs.btrfs -s 16k -f -mraid0 /dev/vdb2 /dev/vdb3 btrfs-progs v5.12 See http://btrfs.wiki.kernel.org for more information. ERROR: superblock magic doesn't match ERROR: superblock magic doesn't match common/device-scan.c:195: btrfs_add_to_fsid: BUG_ON `ret != sectorsize` triggered, value 1 /usr/bin/mkfs.btrfs(btrfs_add_to_fsid+0x274)[0xaaab4fe8a5fc] /usr/bin/mkfs.btrfs(main+0x1188)[0xaaab4fe4dc8c] /usr/lib/libc.so.6(__libc_start_main+0xe8)[0xffff7223c538] /usr/bin/mkfs.btrfs(+0xc558)[0xaaab4fe4c558] [1] 225842 abort (core dumped) /usr/bin/mkfs.btrfs -s 16k -f -mraid0 /dev/vdb2 /dev/vdb3 btrfs_add_to_fsid() now always calls sbwrite() to write BTRFS_SUPER_INFO_SIZE bytes to device, so change condition of the BUG_ON(). Also add comments for sbread() and sbwrite(). Signed-off-by: Su Yue <l@damenly.su> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-12 16:00:14 +02:00
David Sterba	6c53222add	btrfs-progs: delete bogus zero checksum check The check condition (csum_result == 0) does not make sense anymore as it's not the buffer and not the crc32c result as it used to be. The message does not bring any value and looks like it's some debugging aid from the old times (added in 2008 as `bb7055ec21` ("Add some extra debugging around file data checksum failures")). Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-08 00:58:51 +02:00
David Sterba	c19ac510a7	btrfs-progs: move repair.[ch] to common/ Move the file to common as it's used by several parts, while still keeping the name 'repair' although the only thing it does is adding a corrupted extent. Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:47 +02:00
David Sterba	b19a603d62	btrfs-progs: remove unnecessary linux/*.h includes Decrease dependency on system headers, remove where they're not needed or became stale after code moved. The path-utils.h encapsulate path operations so include linux/limits.h here, that's where PATH_MAX is defined. Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:47 +02:00
David Sterba	aa56bf3a31	btrfs-progs: zoned: replace raw ioctl with a helper for device size Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
David Sterba	c7b5f884e0	btrfs-progs: add prefix to zero_blocks This is a public helper for devices, add the prefix to make it clear. Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
David Sterba	2b5d4f2e6f	btrfs-progs: add prefix to discard_blocks This is a helper for devices, make it clear in the function name. Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
David Sterba	bc6864967b	btrfs-progs: add prefix to exported queue_param As this is a public helper, add a prefix that makes it clear what is the queue related to. Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
David Sterba	38254c4934	btrfs-progs: kerncompat: add const_ilog2 The newly added zoned mode constants can utilize the const ilog2 version. Copy it from kernel include/linux/log2.h. Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
Naohiro Aota	8c2dfa6387	btrfs-progs: zoned: wipe temporary superblocks in superblock log zone mkfs.btrfs uses a temporary superblock during the initialization process. The temporary superblock uses BTRFS_MAGIC_TEMPORARY as its magic which is different from a regular superblock. As a result, libblkid, which only supports the usual magic, cannot recognize the volume as btrfs. So, let's wipe the temporary magic before writing out the usual superblock. Technically, we can add the temporary magic to the libblkid's table. But, it will result in recognizing a half-baked filesystem as btrfs, which is not ideal. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
Naohiro Aota	8bbb0c5744	btrfs-progs: zoned: support zero out on zoned block device If we zero out a region in a sequential write required zone, we cannot write to the region until we reset the zone. Thus, we must prohibit zeroing out to a sequential write required zone. zero_dev_clamped() is modified to take the zone information and it calls zero_zone_blocks() if the device is host managed to avoid writing to sequential write required zones. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
Naohiro Aota	58ec593892	btrfs-progs: zoned: support resetting zoned device All zones of zoned block devices should be reset before writing. Support this by introducing PREP_DEVICE_ZONED. btrfs_reset_all_zones() walk all the zones on a device, and reset a zone if it is sequential required zone, or discard the zone range otherwise. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
Naohiro Aota	bfdb3ae237	btrfs-progs: zoned: reset zone of freed block group When freeing a chunk, we can/should reset the underlying device zones for the chunk. Introduce btrfs_reset_chunk_zones() and reset the zones. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	bfd34b7876	btrfs-progs: zoned: redirty clean extent buffers Tree manipulating operations like merging nodes often release once-allocated tree nodes. Btrfs cleans such nodes so that pages in the node are not uselessly written out. On ZONED drives, however, such optimization blocks the following IOs as the cancellation of the write out of the freed blocks breaks the sequential write sequence expected by the device. Check if next dirty extent buffer is continuous to a previously written one. If not, it redirty extent buffers between the previous one and the next one, so that all dirty buffers are written sequentially. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	feff533e34	btrfs-progs: zoned: calculate allocation offset for conventional zones Conventional zones do not have a write pointer, so we cannot use it to determine the allocation offset for sequential allocation if a block group contains a conventional zone. But instead, we can consider the end of the highest addressed extent in the block group for the allocation offset. For new block group, we cannot calculate the allocation offset by consulting the extent tree, because it can cause deadlock by taking extent buffer lock after chunk mutex, which is already taken in btrfs_make_block_group(). Since it is a new block group anyways, we can simply set the allocation offset to 0. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	50ae9f62c7	btrfs-progs: zoned: implement sequential extent allocation Implement a sequential extent allocator for zoned filesystems. This allocator only needs to check if there is enough space in the block group after the allocation pointer to satisfy the extent allocation request. Since the allocator is really simple, we implement it directly in find_search_start(). Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	f08410f078	btrfs-progs: zoned: load zone's allocation offset A zoned filesystem must allocate blocks at the zones' write pointer. The device's write pointer position can be mapped to a logical address within a block group. To facilitate this, add an "alloc_offset" to the block group to track the logical addresses of the write pointer. This logical address is populated in btrfs_load_block_group_zone_info() from the write pointers of corresponding zones. For now, zoned filesystems the single profile. Supporting non-single profile with zone append writing is not trivial. For example, in the DUP profile, we send a zone append writing IO to two zones on a device. The device reply with written LBAs for the IOs. If the offsets of the returned addresses from the beginning of the zone are different, then it results in different logical addresses. We need fine-grained logical to physical mapping to support such separated physical address issue. Since it should require additional metadata type, disable non-single profiles for now. This commit supports the case all the zones in a block group are sequential. The next patch will handle the case having a conventional zone. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	b031fe84fd	btrfs-progs: zoned: implement zoned chunk allocator Implement a zoned chunk and device extent allocator. One device zone becomes a device extent so that a zone reset affects only this device extent and does not change the state of blocks in the neighbor device extents. To implement the allocator, we need to extend the following functions for a zoned filesystem: - init_alloc_chunk_ctl - dev_extent_search_start - dev_extent_hole_check - decide_stripe_size Here, dev_extent_hole_check() is newly introduced to check the validity of a hole found. init_alloc_chunk_ctl_zoned() is mostly the same as regular one. It always set the stripe_size to the zone size and aligns the parameters to the zone size. dev_extent_search_start() only aligns the start offset to zone boundaries. We don't care about the first 1MB like in regular filesystem because we anyway reserve the first two zones for superblock logging. dev_extent_hole_check_zoned() checks if zones in given hole are either conventional or empty sequential zones. Also, it skips zones reserved for superblock logging. With the change to the hole, the new hole may now contain pending extents. So, in this case, loop again to check that. Finally, decide_stripe_size_zoned() should shrink the number of devices instead of stripe size because we need to honor stripe_size == zone_size. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	8ef9313cf2	btrfs-progs: zoned: implement log-structured superblock Superblock (and its copies) is the only data structure in btrfs which has a fixed location on a device. Since we cannot overwrite in a sequential write required zone, we cannot place superblock in the zone. One easy solution is limiting superblock and copies to be placed only in conventional zones. However, this method has two downsides: one is reduced number of superblock copies. The location of the second copy of superblock is 256GB, which is in a sequential write required zone on typical devices in the market today. So, the number of superblock and copies is limited to be two. Second downside is that we cannot support devices which have no conventional zones at all. To solve these two problems, we employ superblock log writing. It uses two adjacent zones as a circular buffer to write updated superblocks. Once the first zone is filled up, start writing into the second one. Then, when both zones are filled up and before starting to write to the first zone again, reset the first zone. We can determine the position of the latest superblock by reading write pointer information from a device. One corner case is when both zones are full. For this situation, we read out the last superblock of each zone, and compare them to determine which zone is older. The following zones are reserved as the circular buffer on ZONED btrfs. - primary superblock: offset 0B (and the following zone) - first copy: offset 512G (and the following zone) - Second copy: offset 4T (4096G, and the following zone) If these reserved zones are conventional, superblock is written fixed at the start of the zone without logging. Currently, superblock reading/writing is done by pread/pwrite. This commit replace the call sites with sbread/sbwrite to wrap the functions. 
For zoned btrfs, btrfs_sb_io which is called from sbread/sbwrite reverses the IO position back to a mirror number, maps the mirror number into the superblock logging position, and do the IO. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	49d5ce4d0f	btrfs-progs: zoned: allow zoned filesystems on non-zoned block devices Run a zoned filesystem on non-zoned devices. This is done by "slicing up" the block device into fixed-sized chunks and emulate a conventional zone on each of them. The emulated zone size is determined from the size of device extent. This is mainly aimed at testing of zoned filesystems, i.e. the zoned chunk allocator, on regular block devices. Currently, we always use EMULATED_ZONE_SIZE (256MiB) for the emulated zone size. In the future, this will be customized by mkfs option. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	707f0716e0	btrfs-progs: zoned: disallow mixed-bg in ZONED mode Placing both data and metadata in a block group is impossible in ZONED mode. For data, we can allocate a space for it and write it immediately after the allocation. For metadata, however, we cannot do that, because the logical addresses are recorded in other metadata buffers to build up the trees. As a result, a data buffer can be placed after a metadata buffer, which is not written yet. Writing out the data buffer will break the sequential write rule. Check and disallow MIXED_BG with ZONED mode. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	3c0f83e541	btrfs-progs: zoned: introduce max_zone_append_size The zone append write command has a maximum IO size restriction it accepts. This is because a zone append write command cannot be split, as we ask the device to place the data into a specific target zone and the device responds with the actual written location of the data. Introduce max_zone_append_size to zone_info and fs_info to track the value, so we can limit all I/O to a zoned block device that we want to write using the zone append command to the device's limits. Zone append command is mandatory for zoned btrfs. So, reject a device with max_zone_append_size == 0. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	7e520022ff	btrfs-progs: zoned: check and enable ZONED mode Introduce function btrfs_check_zoned_mode() to check if ZONED flag is enabled on the file system and if the file system consists of zoned devices with equal zone size. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	384840b9c0	btrfs-progs: zoned: get zone information of zoned block devices Get the zone information (number of zones and zone size) from all the devices, if the volume contains a zoned block device. To avoid costly run-time zone report commands to test the device zones type during block allocation, it also records all the zone status (zone type, write pointer position, etc.). Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	242c8328bc	btrfs-progs: zoned: add new ZONED feature flag With the zoned feature enabled, a zoned block device-aware btrfs allocates block groups aligned to the device zones and always written in sequential zones at the zone write pointer position. It also supports "emulated" zoned mode on a non-zoned device. In the emulated mode, btrfs emulates conventional zones by slicing the device into fixed-size zones. We don't support conversion from the ext4 volume with the zoned feature because we can't be sure all the converted block groups are aligned to zone boundaries. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	acdd22ab68	btrfs-progs: provide fs_info from btrfs_device Likewise in the kernel code, provide fs_info access from struct btrfs_device. This will help to unify the code between the kernel and the userland. Since fs_info can be NULL at the time of btrfs_add_to_fsid(), let's use btrfs_open_devices() to set fs_info to the devices. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	cf67267d33	btrfs-progs: rename calc_size to stripe_size alloc_chunk_ctl::calc_size is actually the stripe_size in the kernel side code. Let's rename it to clarify what the "calc" is. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00


196 Commits