btrfs-progs

mirror of https://github.com/kdave/btrfs-progs synced 2025-04-11 03:31:17 +00:00

Author	SHA1	Message	Date
David Sterba	d0ea2b2af4	btrfs-progs: zoned: also exclude raid1c3 and raid1c4 from supported profiles The enumeration of profiles not available for zoned mode in btrfs_load_block_group_zone_info was lacking the 3 and 4 copy raid1, add them. Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-07 18:39:58 +02:00
David Sterba	27207d651a	btrfs-progs: dump-tree: print complete root_item The output of root_item in the 'inspect dump-tree' command lacks some items and some of them are printed conditionally. As the dump utility is for debugging, it's better to print all the items, with names matching the structure members and order. Some values will inevitably be all zeros like uuids or various timestamps, but that's a minor issue and affecting only a few trees. Example: item 0 key (EXTENT_TREE ROOT_ITEM 0) itemoff 15844 itemsize 439 generation 5 root_dirid 0 bytenr 30523392 byte_limit 0 bytes_used 16384 last_snapshot 0 flags 0x0(none) refs 1 drop_progress key (0 UNKNOWN.0 0) drop_level 0 level 0 generation_v2 5 uuid 00000000-0000-0000-0000-000000000000 parent_uuid 00000000-0000-0000-0000-000000000000 received_uuid 00000000-0000-0000-0000-000000000000 ctransid 0 otransid 0 stransid 0 rtransid 0 ctime 0.0 (1970-01-01 01:00:00) otime 0.0 (1970-01-01 01:00:00) stime 0.0 (1970-01-01 01:00:00) rtime 0.0 (1970-01-01 01:00:00) item 3 key (FS_TREE ROOT_ITEM 0) itemoff 14949 itemsize 439 generation 4 root_dirid 256 bytenr 30408704 byte_limit 0 bytes_used 16384 last_snapshot 0 flags 0x0(none) refs 1 drop_progress key (0 UNKNOWN.0 0) drop_level 0 level 0 generation_v2 4 uuid ec4669b6-6d21-46ab-857e-d60cafde45b3 parent_uuid 00000000-0000-0000-0000-000000000000 received_uuid 00000000-0000-0000-0000-000000000000 ctransid 0 otransid 0 stransid 0 rtransid 0 ctime 1633021823.0 (2021-09-30 19:10:23) otime 1633021823.0 (2021-09-30 19:10:23) stime 0.0 (1970-01-01 01:00:00) rtime 0.0 (1970-01-01 01:00:00) Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-07 18:39:44 +02:00
Josef Bacik	dd8e7477f7	btrfs-progs: remove data extents from the free space tree Dave reported a failure of mkfs-test 009 with the free space tree enabled by default. This is because 009 pre-populates the file system with a given directory, and for some reason our data allocation path isn't the same as in the kernel. Fix this by making sure when we allocate a data extent we remove the space from the free space tree, and with this our mkfs tests now pass. Issue: #410 Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-06 16:49:52 +02:00
Naohiro Aota	85e102f212	btrfs-progs: properly format btrfs_header in btrfs_create_root() Enabling quota in zoned mode hits the following assertion: $ mkfs.btrfs -f -d single -m single -R quota /dev/nullb0 btrfs-progs v5.11 See http://btrfs.wiki.kernel.org for more information. Zoned: /dev/nullb0: host-managed device detected, setting zoned feature Resetting device zones /dev/nullb0 (1600 zones) ... bad tree block 25395200, bytenr mismatch, want=25395200, have=0 kernel-shared/disk-io.c:549: write_tree_block: BUG_ON `1` triggered, value 1 ./mkfs.btrfs(+0x26aaa)[0x564d1a7ccaaa] ./mkfs.btrfs(write_tree_block+0xb8)[0x564d1a7cee29] ./mkfs.btrfs(__commit_transaction+0x91)[0x564d1a7e3740] ./mkfs.btrfs(btrfs_commit_transaction+0x135)[0x564d1a7e39aa] ./mkfs.btrfs(main+0x1fe9)[0x564d1a7b442a] /lib64/libc.so.6(__libc_start_main+0xcd)[0x7f36377d37fd] ./mkfs.btrfs(_start+0x2a)[0x564d1a7b1fda] zsh: IOT instruction sudo ./mkfs.btrfs -f -d single -m single -R quota /dev/nullb0 The issue occurs because btrfs_create_root() is not formatting the root node properly. This is fine in regular mode, because it's fortunately reusing a once-freed buffer. As the previous tree node allocation kindly formatted the header, it will see the proper bytenr and pass the checks. However, we never reuse a once-freed buffer on a zoned filesystem. As a result, we have zero-filled bytenr, FSID, and chunk-tree UUID, hitting the asserts in check_tree_block(). Reported-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-06 16:49:11 +02:00
Johannes Thumshirn	c22e9487a7	btrfs-progs: remove max_zone_append_size logic max_zone_append_size is unused and can as well be removed just like we did on the kernel side. Keep one sanity check though, so we're not adding devices to a zoned FS that aren't supporting zone append. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-06 16:49:07 +02:00
Naohiro Aota	53ec59ead0	btrfs-progs: do not zone reset on emulated zoned mode We cannot zone reset a regular file with emulated zones. So, mkfs.btrfs on such a file causes the following error. ERROR: zoned: failed to reset device '/home/naota/tmp/btrfs.img' zones: Inappropriate ioctl for device Introduce btrfs_zoned_device_info->emulated to distinguish whether the zones are emulated or not, and use it to decide whether a zone reset is needed. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-06 16:48:56 +02:00
Qu Wenruo	60651ad9da	btrfs-progs: introduce OPEN_CTREE_ALLOW_TRANSID_MISMATCH flag [BUG] There is a report that, btrfstune can even work while the fs has transid mismatch problems. $ btrfstune -f -u /dev/sdb1 Current fsid: b2b5ae8d-4c49-45f0-b42e-46fe7dcfcb07 New fsid: b2b5ae8d-4c49-45f0-b42e-46fe7dcfcb07 Set superblock flag CHANGING_FSID Change fsid in extents parent transid verify failed on 792854528 wanted 20103 found 20091 parent transid verify failed on 792854528 wanted 20103 found 20091 parent transid verify failed on 792854528 wanted 20103 found 20091 Ignoring transid failure parent transid verify failed on 792870912 wanted 20103 found 20091 parent transid verify failed on 792870912 wanted 20103 found 20091 parent transid verify failed on 792870912 wanted 20103 found 20091 Ignoring transid failure parent transid verify failed on 792887296 wanted 20103 found 20091 parent transid verify failed on 792887296 wanted 20103 found 20091 parent transid verify failed on 792887296 wanted 20103 found 20091 Ignoring transid failure ERROR: child eb corrupted: parent bytenr=38010880 item=69 parent level=1 child level=1 ERROR: failed to change UUID of metadata: -5 ERROR: btrfstune failed This leaves a corrupted fs even more corrupted, and due to the extra CHANGING_FSID flag, btrfs check will not even try to run on it: Opening filesystem to check... ERROR: Filesystem UUID change in progress ERROR: cannot open file system [CAUSE] Unlike kernel, btrfs-progs has a less strict check on transid mismatch. In read_tree_block() we will fall back to use the tree block even its transid mismatch if we can't find any better copy. However not all commands in btrfs-progs needs this feature, only btrfs-check (which may fix the problem) and btrfs-restore (it just tries to ignore any problems) really utilize this feature. [FIX] Introduce a new open ctree flag, OPEN_CTREE_ALLOW_TRANSID_MISMATCH, to be explicit about whether we really want to ignore transid error. 
Currently only btrfs-check and btrfs-restore will utilize this new flag. Also add btrfs-image to allow opening such fs with transid error. Link: https://www.reddit.com/r/btrfs/comments/pivpqk/failure_during_btrfstune_u/ Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-20 12:17:29 +02:00
David Sterba	96a5cf0719	btrfs-progs: handle EINVAL when reading zone size on older kernels A combination of new progs and old kernel may lead to problems with detecting zone size by ioctl. Fixed by #376 but still incomplete because old kernels may return EINVAL for unsupported ioctl. This should be ENOTTY but hasn't been like that until kernel 5.11. As we always pass valid arguments to the ioctl we can't conflate the two and can handle EINVAL the same way as ENOTTY. Issue: #399 Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-20 11:31:09 +02:00
David Sterba	ee17bcec33	btrfs-progs: remove stale declaration from send.h We don't use this header for kernel compilation so the guarded declaration is pointless. Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-07 19:27:59 +02:00
David Sterba	e86425242f	btrfs-progs: move send.h to kernel-shared/ The header contains the protocol definitions and is almost exactly the same as the kernel version, move it to the proper directory. Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-07 19:26:46 +02:00
David Sterba	76ab1fa364	btrfs-progs: rename and move group_profile_max_safe_loss The helper belongs to the others that translate bg flags to the raid attr table member. Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-07 16:38:56 +02:00
Qu Wenruo	9a11b1b792	btrfs-progs: backport btrfs_check_node() from kernel The btrfs_check_node() has far less meaningful error message compared to kernel counterpart, and it even lacks certain checks like level check. Backport btrfs_check_node() to btrfs-progs to not only unify the code but greatly improve the readability of the error messages. Extra modification includes: - No fs_info needed As we don't need to output fsid. - Remove unlikely() macro - Extra BTRFS_TREE_BLOCK_* error type - Btrfs-progs specific error handling To record the corrupted tree blocks. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-07 14:20:41 +02:00
Qu Wenruo	8f8cafa2ce	btrfs-progs: backport btrfs_check_leaf() from kernel Currently btrfs_check_leaf() provides almost meaningless messages for things like invalid item offset: incorrect offsets 8492 3707786077 While kernel tree-checker is doing a way better job, so it's wise to backport btrfs_check_leaf() from kernel. There are some modification needed: - New generic_err() helper - Remove unlikely() macro - Remove empty essential tree check Mkfs still needs to create empty essential trees. - Using BTRFS_TREE_BLOCK_* return value Original mode check still relies on them to do certain repair. - No need for btrfs_fs_info We no longer need fsid output, thus no need for btrfs_fs_info. - No item contents check - Still using the fail: label for btrfs-progs specific error handling The new output looks like: corrupt leaf: root=2 block=72164753408 slot=109, unexpected item end, have 3707786077 expect 8492 Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-07 14:19:54 +02:00
Qu Wenruo	1f8dfe681f	btrfs-progs: use btrfs_key for btrfs_check_node() and btrfs_check_leaf() In kernel space we hardly use btrfs_disk_key, unless for very lowlevel code. There is no need to intentionally use btrfs_disk_key in btrfs-progs either. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-07 13:58:44 +02:00
David Sterba	c3ee6a8a09	btrfs-progs: unify GPL header comments Add the GPL v2 header to files where it was missing and is not from an external source, update to the most recent version with the address. Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-07 13:58:44 +02:00
David Sterba	7572839a74	btrfs-progs: add and use bit masks for RAID1 and RAID56 profiles Many test conditions can be simplified in case they check all the related profiles. Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-06 16:36:18 +02:00
David Sterba	7fe4396467	btrfs-progs: copy some raid_attr helpers from kernel There are convenience helpers for the raid attr table, copy them from kernel for further cleanups. Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-06 16:36:17 +02:00
Josef Bacik	79e534def9	btrfs-progs: add the incompat flag for extent tree v2 I will have a lot of preparatory patches to reduce the review pain of this large feature. In order to enable that work define the incompat flag. Once all of the work lands to support the feature there will be a patch to actually enable us to select it and manipulate file systems with that incompat flag set. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-06 16:36:17 +02:00
Josef Bacik	826e466028	btrfs-progs: add add_block_group_free_space helper This exists in the kernel free-space-tree.c but not in progs. We need it to generate the free space items for new block groups, which is needed when we start creating the free space tree in make_btrfs(). Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-06 16:36:17 +02:00
Josef Bacik	3d870a491f	btrfs-progs: make sure track_dirty and ref_cows is set properly Adding support for the per-block group roots means we will be reading the roots directly in different places. Make sure we set ->track_dirty and ->ref_cows properly in the helper so we don't have to do this everywhere. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-03 15:33:53 +02:00
David Sterba	a177ef7dd4	btrfs-progs: mkfs: allow degenerate raid0/raid10 Kernel patch b2f78e88052bc0bee ("btrfs: allow degenerate raid0/raid10") in 5.15 will allow mounting and converting to single device raid0 or two device raid10. Let mkfs create such filesystem. "The motivation is to allow to preserve the profile type as long as it possible for some intermediate state (device removal, conversion), or when there are disks of different size, with raid0 the otherwise unusable space of the last device will be used too. Similarly for raid10, though the two largest devices would need to be the same." Signed-off-by: David Sterba <dsterba@suse.com>	2021-08-27 15:40:53 +02:00
Qu Wenruo	991a598f53	btrfs-progs: move btrfs_format_csum() to common/utils.[ch] Function btrfs_format_csum() is a special helper only used in btrfs-progs. Move it to common/utils.[ch] other than leaving it in kernel-shared/disk-io.c. Since we're moving the code, also introduce a macro, BTRFS_CSUM_STRING_LEN, to replace open-coded string length calculation. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-08-26 14:26:13 +02:00
Josef Bacik	8c3c13bb45	btrfs-progs: check blocks in btrfs_next_sibling_block By enabling the lowmem checks properly I uncovered the case where test fsck/007 will infinite loop at the detection stage. This is because when checking the inode item we will just btrfs_next_item(), and because we ignore check tree block failures at read time we don't get an -EIO from btrfs_next_leaf. This occurs because we allow fsck to raw-read blocks even if they fail basic sanity checks, because we want the opportunity to repair the blocks. However this means corrupt blocks are sitting in cache marked as uptodate. btrfs_search_slot() handles this by doing a check_block() on every block we add to the path, so that anything that is doing a search gets a proper -EIO. btrfs_next_sibling_block() needs a similar check. With this fix we now return -EIO on btrfs_next_leaf() properly and we no longer infinite loop on fsck/007 with lowmem. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-08-25 15:38:54 +02:00
Qu Wenruo	a138daac17	btrfs-progs: mkfs: set super_cache_generation to 0 if we're using free space tree [HICCUP] There is a bug report that mkfs.btrfs -R free-space-tree still makes the kernel try to clean up the v1 space cache: # mkfs.btrfs -R free-space-tree -f /dev/test/scratch1 # mount /dev/test/scratch1 /mnt/btrfs # dmesg | grep cleaning BTRFS info (device dm-6): cleaning free space cache v1 [CAUSE] By default, mkfs.btrfs will set super cache generation to (u64)-1, which will inform the kernel that the v1 space cache is invalid and needs to be regenerated. But for the free space tree, the kernel will set super cache generation to 0, to indicate the v1 space cache is not in use. This means, even if we enabled the free space tree with all the RO compatible bits and the new tree, as long as super cache generation is not 0, the kernel still considers the fs to have some invalid v1 space cache, and will try to remove it. [FIX] This is not a big deal, but to make "-R free-space-tree" really work the same as the kernel, we also need to set super cache generation to 0. Reported-by: Chris Murphy <lists@colorremedies.com> Link: https://lore.kernel.org/linux-btrfs/CAJCQCtSvgzyOnxtrqQZZirSycEHp+g0eDH5c+Kw9mW=PgxuXmw@mail.gmail.com/ Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-08-20 14:24:55 +02:00
David Sterba	6527771668	btrfs-progs: add nparity for raid1c34 definitions The values of .ncopies was not explicitly set. Signed-off-by: David Sterba <dsterba@suse.com>	2021-07-23 00:59:27 +02:00
Qu Wenruo	07ecf878c1	btrfs-progs: check: batch v1 space cache inodes when clearing Currently v1 space cache clearing will delete one cache inode just in one transaction, and then start a new transaction to delete the next inode. This is far from efficient and can make the already slow v1 space cache deleting even slower, as large fs has tons of cache inodes to delete. This patch will speed up the process by batching up to 16 inode deletion into one transaction. A quick benchmark of deleting 702 v1 space cache inodes would look like this: Unpatched: 4.898s Patched: 0.087s Which is obviously a big win. Reported-by: Joshua <joshua@mailmag.net> Link: https://lore.kernel.org/linux-btrfs/0b4cf70fc883e28c97d893a3b2f81b11@mailmag.net/ Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-07-22 16:26:05 +02:00
Sidong Yang	94f3b75c00	btrfs-progs: zoned: fix memory leak in btrfs_sb_io() In btrfs_sb_io(), blk_zone_report is used for getting information about zones. But it is not freed if code goes in usual path. This patch frees the variable just after it used. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Sidong Yang <realwakka@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-07-02 17:27:53 +02:00
David Sterba	1dc6f33c28	btrfs-progs: zoned: use fixed width type when reading zone size The ioctl BLKGETZONESZ expects 32bit integer, declare the target variable as such. Signed-off-by: David Sterba <dsterba@suse.com>	2021-07-02 17:27:53 +02:00
David Sterba	b1f374dd1d	btrfs-progs: switch %Lu to %llu format The %Lu format is not standard and we use %llu everywhere else, so switch the remaining cases. Signed-off-by: David Sterba <dsterba@suse.com>	2021-06-19 22:07:49 +02:00
David Sterba	9f6c055e38	btrfs-progs: dump-tree: add options to dump checksums Add new options to dumps checksums in node headers and in the checksum items: $ btrfs inspect dump-tree --csum-headers image root tree leaf 471515136 items 19 free space 12186 generation 15 owner ROOT_TREE leaf 471515136 flags 0x1(WRITTEN) backref revision 1 csum 0x756b2d54 fs uuid df0348df-5773-47dd-81e9-a18221461239 For nodes/leaves it's appended on the 2nd line of the header. Checksum items are stored in leaves as EXTENT_CSUM key type, with offset value as the logical offset starting. As the array would be hard to parse or match, each offset value is printed with the checksum. For crc32c it's 4 values on a line, for xxhash it's 2 and for the long 256bit checksums it's one checksum per line. $ btrfs inspect dump-tree --csum-items image leaf 5423104 items 1 free space 30 generation 6 owner CSUM_TREE leaf 5423104 flags 0x1(WRITTEN) backref revision 1 fs uuid bd7c981e-16ff-4081-a734-3ef5d50cafc1 chunk uuid 13f4c76c-7845-4984-88ed-f01b52e05cf8 item 0 key (EXTENT_CSUM EXTENT_CSUM 22020096) itemoff 55 itemsize 16228 range start 22020096 end 38637568 length 16617472 [22020096] 0x8941f998 [22024192] 0x8941f998 [22028288] 0x8941f998 [22032384] 0x8941f998 [22036480] 0x8941f998 [22040576] 0x8941f998 [22044672] 0x8941f998 [22048768] 0x8941f998 ... $ btrfs inspect dump-tree --csum-items image leaf 5718016 items 1 free space 7746 generation 6 owner CSUM_TREE leaf 5718016 flags 0x1(WRITTEN) backref revision 1 fs uuid f453a5b4-8b4a-4fbf-90a2-2925e4fe2335 chunk uuid eb1da63b-248b-44c2-82da-71b2564bf50e item 0 key (EXTENT_CSUM EXTENT_CSUM 52387840) itemoff 7771 itemsize 8512 range start 52387840 end 53477376 length 1089536 [52387840] 0x686ede9288c391e7e05026e56f2f91bfd879987a040ea98445dabc76f55b8e5f [52391936] 0x686ede9288c391e7e05026e56f2f91bfd879987a040ea98445dabc76f55b8e5f ... The options are not on by default, the header checksum is not important for the structures. 
Data checksums can be quite big so that would make the dump long and without any actual data to match against. Signed-off-by: David Sterba <dsterba@suse.com>	2021-06-19 22:07:49 +02:00
David Sterba	72d710637c	btrfs-progs: print-tree: convert mode to bitmask Replace follow and traverse by one parameter that takes bits to affect the behaviour. This allows to extend btrfs_print_tree output with more modes from one place. Signed-off-by: David Sterba <dsterba@suse.com>	2021-06-09 20:31:49 +02:00
David Sterba	6134973527	btrfs-progs: zoned: make it work without kernel support There's a report that a system with 4.19 kernel fails boot because device scan exits with error. This is because zoned support is compiled in btrfs-progs but not in kernel. To make new progs and old kernels work, do a fallback when the zoned ioctl is not available, as if it were a non-zoned device. There is no other option, but this is safe at least for the device scan that would not error out. Any unaligned writes to a zoned device will fail as expected. Issue: #376 Signed-off-by: David Sterba <dsterba@suse.com>	2021-06-07 17:38:46 +02:00
Su Yue	80a86f1b47	btrfs-progs: do not BUG_ON if btrfs_add_to_fsid succeeded to write superblock Commit `8ef9313cf2` ("btrfs-progs: zoned: implement log-structured superblock") changed the write to BTRFS_SUPER_INFO_SIZE bytes. Previously the number of bytes written was sectorsize. This causes mkfs.btrfs to fail on my 16k pagesize kvm: $ /usr/bin/mkfs.btrfs -s 16k -f -mraid0 /dev/vdb2 /dev/vdb3 btrfs-progs v5.12 See http://btrfs.wiki.kernel.org for more information. ERROR: superblock magic doesn't match ERROR: superblock magic doesn't match common/device-scan.c:195: btrfs_add_to_fsid: BUG_ON `ret != sectorsize` triggered, value 1 /usr/bin/mkfs.btrfs(btrfs_add_to_fsid+0x274)[0xaaab4fe8a5fc] /usr/bin/mkfs.btrfs(main+0x1188)[0xaaab4fe4dc8c] /usr/lib/libc.so.6(__libc_start_main+0xe8)[0xffff7223c538] /usr/bin/mkfs.btrfs(+0xc558)[0xaaab4fe4c558] [1] 225842 abort (core dumped) /usr/bin/mkfs.btrfs -s 16k -f -mraid0 /dev/vdb2 /dev/vdb3 btrfs_add_to_fsid() now always calls sbwrite() to write BTRFS_SUPER_INFO_SIZE bytes to the device, so change the condition of the BUG_ON(). Also add comments for sbread() and sbwrite(). Signed-off-by: Su Yue <l@damenly.su> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-12 16:00:14 +02:00
David Sterba	6c53222add	btrfs-progs: delete bogus zero checksum check The check condition (csum_result == 0) does not make sense anymore as it's not the buffer and not the crc32c result as it used to be. The message does not bring any value and looks like it's some debugging aid from the old times (added in 2008 as `bb7055ec21` ("Add some extra debugging around file data checksum failures")). Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-08 00:58:51 +02:00
David Sterba	c19ac510a7	btrfs-progs: move repair.[ch] to common/ Move the file to common as it's used by several parts, while still keeping the name 'repair' although the only thing it does is adding a corrupted extent. Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:47 +02:00
David Sterba	b19a603d62	btrfs-progs: remove unnecessary linux/*.h includes Decrease dependency on system headers, remove where they're not needed or became stale after code moved. The path-utils.h encapsulate path operations so include linux/limits.h here, that's where PATH_MAX is defined. Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:47 +02:00
David Sterba	aa56bf3a31	btrfs-progs: zoned: replace raw ioctl with a helper for device size Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
David Sterba	c7b5f884e0	btrfs-progs: add prefix to zero_blocks This is a public helper for devices, add the prefix to make it clear. Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
David Sterba	2b5d4f2e6f	btrfs-progs: add prefix to discard_blocks This is a helper for devices, make it clear in the function name. Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
David Sterba	bc6864967b	btrfs-progs: add prefix to exported queue_param As this is a public helper, add a prefix that makes it clear what is the queue related to. Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
David Sterba	38254c4934	btrfs-progs: kerncompat: add const_ilog2 The newly added zoned mode constants can utilize the const ilog2 version. Copy it from kernel include/linux/log2.h. Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
Naohiro Aota	8c2dfa6387	btrfs-progs: zoned: wipe temporary superblocks in superblock log zone mkfs.btrfs uses a temporary superblock during the initialization process. The temporary superblock uses BTRFS_MAGIC_TEMPORARY as its magic which is different from a regular superblock. As a result, libblkid, which only supports the usual magic, cannot recognize the volume as btrfs. So, let's wipe the temporary magic before writing out the usual superblock. Technically, we can add the temporary magic to the libblkid's table. But, it will result in recognizing a half-baked filesystem as btrfs, which is not ideal. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
Naohiro Aota	8bbb0c5744	btrfs-progs: zoned: support zero out on zoned block device If we zero out a region in a sequential write required zone, we cannot write to the region until we reset the zone. Thus, we must prohibit zeroing out to a sequential write required zone. zero_dev_clamped() is modified to take the zone information and it calls zero_zone_blocks() if the device is host managed to avoid writing to sequential write required zones. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
Naohiro Aota	58ec593892	btrfs-progs: zoned: support resetting zoned device All zones of zoned block devices should be reset before writing. Support this by introducing PREP_DEVICE_ZONED. btrfs_reset_all_zones() walks all the zones on a device and resets a zone if it is a sequential write required zone, or discards the zone range otherwise. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
Naohiro Aota	bfdb3ae237	btrfs-progs: zoned: reset zone of freed block group When freeing a chunk, we can/should reset the underlying device zones for the chunk. Introduce btrfs_reset_chunk_zones() and reset the zones. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	bfd34b7876	btrfs-progs: zoned: redirty clean extent buffers Tree manipulating operations like merging nodes often release once-allocated tree nodes. Btrfs cleans such nodes so that pages in the node are not uselessly written out. On ZONED drives, however, such optimization blocks the following IOs as the cancellation of the write out of the freed blocks breaks the sequential write sequence expected by the device. Check if the next dirty extent buffer is contiguous with the previously written one. If not, redirty the extent buffers between the previous one and the next one, so that all dirty buffers are written sequentially. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	feff533e34	btrfs-progs: zoned: calculate allocation offset for conventional zones Conventional zones do not have a write pointer, so we cannot use it to determine the allocation offset for sequential allocation if a block group contains a conventional zone. But instead, we can consider the end of the highest addressed extent in the block group for the allocation offset. For new block group, we cannot calculate the allocation offset by consulting the extent tree, because it can cause deadlock by taking extent buffer lock after chunk mutex, which is already taken in btrfs_make_block_group(). Since it is a new block group anyways, we can simply set the allocation offset to 0. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	50ae9f62c7	btrfs-progs: zoned: implement sequential extent allocation Implement a sequential extent allocator for zoned filesystems. This allocator only needs to check if there is enough space in the block group after the allocation pointer to satisfy the extent allocation request. Since the allocator is really simple, we implement it directly in find_search_start(). Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	f08410f078	btrfs-progs: zoned: load zone's allocation offset A zoned filesystem must allocate blocks at the zones' write pointer. The device's write pointer position can be mapped to a logical address within a block group. To facilitate this, add an "alloc_offset" to the block group to track the logical addresses of the write pointer. This logical address is populated in btrfs_load_block_group_zone_info() from the write pointers of corresponding zones. For now, zoned filesystems only support the single profile. Supporting non-single profile with zone append writing is not trivial. For example, in the DUP profile, we send a zone append writing IO to two zones on a device. The device replies with written LBAs for the IOs. If the offsets of the returned addresses from the beginning of the zone are different, then it results in different logical addresses. We need fine-grained logical to physical mapping to support such separated physical address issue. Since it should require additional metadata type, disable non-single profiles for now. This commit supports the case where all the zones in a block group are sequential. The next patch will handle the case having a conventional zone. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	b031fe84fd	btrfs-progs: zoned: implement zoned chunk allocator Implement a zoned chunk and device extent allocator. One device zone becomes a device extent so that a zone reset affects only this device extent and does not change the state of blocks in the neighbor device extents. To implement the allocator, we need to extend the following functions for a zoned filesystem: - init_alloc_chunk_ctl - dev_extent_search_start - dev_extent_hole_check - decide_stripe_size Here, dev_extent_hole_check() is newly introduced to check the validity of a hole found. init_alloc_chunk_ctl_zoned() is mostly the same as regular one. It always set the stripe_size to the zone size and aligns the parameters to the zone size. dev_extent_search_start() only aligns the start offset to zone boundaries. We don't care about the first 1MB like in regular filesystem because we anyway reserve the first two zones for superblock logging. dev_extent_hole_check_zoned() checks if zones in given hole are either conventional or empty sequential zones. Also, it skips zones reserved for superblock logging. With the change to the hole, the new hole may now contain pending extents. So, in this case, loop again to check that. Finally, decide_stripe_size_zoned() should shrink the number of devices instead of stripe size because we need to honor stripe_size == zone_size. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
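
The sequential extent allocation described in the zoned commits listed above (f08410f078 adds an "alloc_offset" to the block group, 50ae9f62c7 checks whether enough space remains after the allocation pointer) can be illustrated with a minimal sketch. This is not the btrfs-progs code: the struct `sketch_block_group` and the function `sketch_alloc_sequential()` are hypothetical names invented for illustration, and the real logic lives in find_search_start() with many more conditions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a zoned block group (assumption). */
struct sketch_block_group {
	uint64_t start;        /* logical start address of the block group */
	uint64_t length;       /* size of the block group in bytes */
	uint64_t alloc_offset; /* bytes already allocated; follows the write pointer */
};

/*
 * Try to allocate num_bytes at the allocation pointer. On success, store the
 * logical start of the new extent in *ret_start, advance alloc_offset and
 * return true. Return false if the request is empty or does not fit.
 */
static bool sketch_alloc_sequential(struct sketch_block_group *bg,
				    uint64_t num_bytes, uint64_t *ret_start)
{
	assert(bg != NULL && ret_start != NULL);

	/* Reject empty requests and inconsistent state. */
	if (num_bytes == 0 || bg->alloc_offset > bg->length)
		return false;

	/* Only the space after the allocation pointer is usable. */
	if (num_bytes > bg->length - bg->alloc_offset)
		return false;

	*ret_start = bg->start + bg->alloc_offset;
	bg->alloc_offset += num_bytes;
	return true;
}
```

Because space is only ever handed out at the allocation pointer, there is no free-space search: a request either fits after `alloc_offset` or fails, which matches the commit's remark that the allocator is simple enough to implement directly.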


105 Commits