btrfs-progs

Commit Graph

Author	SHA1	Message	Date
Naohiro Aota	8816a65fec	btrfs-progs: zoned: check SB zone existence properly Currently, write_dev_supers() compares the superblock location vs the size of the device to check if it can write the superblock. This is not correct for a zoned device, whose superblock location is different than a regular device. Introduce check_sb_location() to check if the superblock zone exists for the zoned case. Running btrfs check can fail on a certain zoned device setup (e.g, zone size = 128MB, device size = 16GB). From generic/330: yes | btrfs check --repair --force /dev/nullb1 [1/7] checking root items Fixed 0 roots. [2/7] checking extents ERROR: zoned: failed to read zone info of 4096 and 4097: Invalid argument ERROR: failed to write super block for devid 1: write error: Input/output error failed to write new super block err -5 failed to repair damaged filesystem, aborting This happens because write_dev_supers() is comparing the original superblock location vs the device size to check if it can write out a superblock copy or not. For the above example, since the first copy location (64MB) < device size (16GB), it tries to write out the copy. But, the copy must be written into zone 4096 (512G / zone size (128M) = 4096), which is out of the device. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-10-21 15:51:06 +02:00
David Sterba	21aa6777b2	btrfs-progs: clean up includes, using include-what-you-use Signed-off-by: David Sterba <dsterba@suse.com>	2023-10-03 01:11:57 +02:00
Johannes Thumshirn	b4ab282686	btrfs-progs: allow zoned RAID Allow for RAID levels 0, 1 and 10 on zoned devices if the RAID stripe tree is used. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-10-02 18:41:08 +02:00
Josef Bacik	228aa34f10	btrfs-progs: sync messages.[ch] from the kernel These are the printk helpers from the kernel. There were a few modifications, the hi-lights are - We do not have fs_info::fs_state, so that needed to be removed. - We do not have discard.h sync'ed yet, so that dependency was dropped. - Anything related to struct super_block was commented out. - The transaction abort had to be modified to fit with the current btrfs-progs code. - Added a btrfs_no_printk() helper to common/messages.* so that the print statements still worked. - The 32bit limit checkers are not needed so are behind __KERNEL__ Additionally there were kerncompat.h changes that needed to be made to handle the dependencies properly. Those are easier to spot. Any function that needed to be modified has a MODIFIED tag in the comment section with a list of things that were changed. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-05-26 18:02:28 +02:00
psykose	c9abbf6264	btrfs-progs: stop using legacy 64 interfaces The 64 interfaces, such as fstat64, off64_t, etc, are legacy interfaces created at a time when 64-bit file support was still new. They are generally exposed when defining a macro named _LARGEFILE64_SOURCE, as e.g. the glibc docs[0] say. The modern way to utilise largefile support, is to continue to use the regular interfaces (off_t, fstat, ..), and define _FILE_OFFSET_BITS=64. We already use the autoconf macro AC_SYS_LARGEFILE[1] which arranges this and sets this macro for us. Therefore, we can utilise the non-64 names without fear of breaking on 32-bit systems. This fixes the build against musl libc, ever since musl dropped the 64 compat from interfaces by default[2] just for _GNU_SOURCE, unless _LARGEFILE64_SOURCE is defined. However, there are plans for a future removal of the whole 64 header API, and that workaround (adding another define) might cease to exist. So, rename all 64 API use to the regular non-suffixed names. For consistency, rename the internal functions that were 64 named (lstat64_path, ..) too. This should have no regressions on any platform. [0]: https://www.gnu.org/software/libc/manual/html_node/Feature-Test-Macros.html#index-_005fLARGEFILE64_005fSOURCE [1]: https://www.gnu.org/software/autoconf/manual/autoconf-2.67/html_node/System-Services.html [2]: `25e6fee27f` Pull-request: #615 Signed-off-by: psykose <alice@ayaya.dev> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-25 16:59:42 +02:00
Naohiro Aota	32c43d0c68	btrfs-progs: zoned: export sb_zone_number() and related constants Move sb_zone_number() and related constants from zoned.c to the corresponding header for later use. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-04-08 23:17:35 +02:00
David Sterba	471ca4a580	btrfs-progs: build: add stub definition for non-zoned build In commit `88895a920f` ("btrfs-progs: use profile_supported in mkfs as well") there's a wrapper but not available on non-zoned builds. Add it. Issue: #445 Signed-off-by: David Sterba <dsterba@suse.com>	2022-02-16 22:48:01 +01:00
Johannes Thumshirn	89191f8c12	btrfs-progs: pass in block-group type to zoned_profile_supported Pass BTRFS_BLOCK_GROUP_DATA and BTRFS_BLOCK_GROUP_METADATA to zoned_profile_supported(), so we can actually distinguish if it is a data or a meta-data block group. Fixes: 8f914d518a46 ("btrfs-progs: zoned support DUP on metadata block groups") Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-02-16 22:48:01 +01:00
Johannes Thumshirn	88895a920f	btrfs-progs: use profile_supported in mkfs as well Currently we have two places checking if a block-group profile is supported on a zoned device, one in mkfs/main.c and one in kernel-shared/zoned.c. Use the one from kernel-shared/zoned.c in mkfs as well, unifying all checks. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-02-01 18:41:51 +01:00
Johannes Thumshirn	c22e9487a7	btrfs-progs: remove max_zone_append_size logic max_zone_append_size is unused and can as well be removed just like we did on the kernel side. Keep one sanity check though, so we're not adding devices to a zoned FS that aren't supporting zone append. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-06 16:49:07 +02:00
Naohiro Aota	53ec59ead0	btrfs-progs: do not zone reset on emulated zoned mode We cannot zone reset a regular file with emulated zones. So, mkfs.btrfs on such a file causes the following error. ERROR: zoned: failed to reset device '/home/naota/tmp/btrfs.img' zones: Inappropriate ioctl for device Introduce btrfs_zoned_device_info->emulated to distinguish the zones are emulated or not. And, use it to decide it needs zone reset or not. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-06 16:48:56 +02:00
David Sterba	c3ee6a8a09	btrfs-progs: unify GPL header comments Add the GPL v2 header to files where it was missing and is not from an external source, update to the most recent version with the address. Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-07 13:58:44 +02:00
Su Yue	80a86f1b47	btrfs-progs: do not BUG_ON if btrfs_add_to_fsid succeeded to write superblock Commit `8ef9313cf2` ("btrfs-progs: zoned: implement log-structured superblock") changed to write BTRFS_SUPER_INFO_SIZE bytes to device. The before num of bytes to be written is sectorsize. It causes mkfs.btrfs failed on my 16k pagesize kvm: $ /usr/bin/mkfs.btrfs -s 16k -f -mraid0 /dev/vdb2 /dev/vdb3 btrfs-progs v5.12 See http://btrfs.wiki.kernel.org for more information. ERROR: superblock magic doesn't match ERROR: superblock magic doesn't match common/device-scan.c:195: btrfs_add_to_fsid: BUG_ON `ret != sectorsize` triggered, value 1 /usr/bin/mkfs.btrfs(btrfs_add_to_fsid+0x274)[0xaaab4fe8a5fc] /usr/bin/mkfs.btrfs(main+0x1188)[0xaaab4fe4dc8c] /usr/lib/libc.so.6(__libc_start_main+0xe8)[0xffff7223c538] /usr/bin/mkfs.btrfs(+0xc558)[0xaaab4fe4c558] [1] 225842 abort (core dumped) /usr/bin/mkfs.btrfs -s 16k -f -mraid0 /dev/vdb2 /dev/vdb3 btrfs_add_to_fsid() now always calls sbwrite() to write BTRFS_SUPER_INFO_SIZE bytes to device, so change condition of the BUG_ON(). Also add comments for sbread() and sbwrite(). Signed-off-by: Su Yue <l@damenly.su> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-12 16:00:14 +02:00
Naohiro Aota	8c2dfa6387	btrfs-progs: zoned: wipe temporary superblocks in superblock log zone mkfs.btrfs uses a temporary superblock during the initialization process. The temporary superblock uses BTRFS_MAGIC_TEMPORARY as its magic which is different from a regular superblock. As a result, libblkid, which only supports the usual magic, cannot recognize the volume as btrfs. So, let's wipe the temporary magic before writing out the usual superblock. Technically, we can add the temporary magic to the libblkid's table. But, it will result in recognizing a half-baked filesystem as btrfs, which is not ideal. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
Naohiro Aota	8bbb0c5744	btrfs-progs: zoned: support zero out on zoned block device If we zero out a region in a sequential write required zone, we cannot write to the region until we reset the zone. Thus, we must prohibit zeroing out to a sequential write required zone. zero_dev_clamped() is modified to take the zone information and it calls zero_zone_blocks() if the device is host managed to avoid writing to sequential write required zones. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
Naohiro Aota	58ec593892	btrfs-progs: zoned: support resetting zoned device All zones of zoned block devices should be reset before writing. Support this by introducing PREP_DEVICE_ZONED. btrfs_reset_all_zones() walk all the zones on a device, and reset a zone if it is sequential required zone, or discard the zone range otherwise. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
Naohiro Aota	bfdb3ae237	btrfs-progs: zoned: reset zone of freed block group When freeing a chunk, we can/should reset the underlying device zones for the chunk. Introduce btrfs_reset_chunk_zones() and reset the zones. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	bfd34b7876	btrfs-progs: zoned: redirty clean extent buffers Tree manipulating operations like merging nodes often release once-allocated tree nodes. Btrfs cleans such nodes so that pages in the node are not uselessly written out. On ZONED drives, however, such optimization blocks the following IOs as the cancellation of the write out of the freed blocks breaks the sequential write sequence expected by the device. Check if next dirty extent buffer is continuous to a previously written one. If not, it redirty extent buffers between the previous one and the next one, so that all dirty buffers are written sequentially. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	f08410f078	btrfs-progs: zoned: load zone's allocation offset A zoned filesystem must allocate blocks at the zones' write pointer. The device's write pointer position can be mapped to a logical address within a block group. To facilitate this, add an "alloc_offset" to the block group to track the logical addresses of the write pointer. This logical address is populated in btrfs_load_block_group_zone_info() from the write pointers of corresponding zones. For now, zoned filesystems the single profile. Supporting non-single profile with zone append writing is not trivial. For example, in the DUP profile, we send a zone append writing IO to two zones on a device. The device reply with written LBAs for the IOs. If the offsets of the returned addresses from the beginning of the zone are different, then it results in different logical addresses. We need fine-grained logical to physical mapping to support such separated physical address issue. Since it should require additional metadata type, disable non-single profiles for now. This commit supports the case all the zones in a block group are sequential. The next patch will handle the case having a conventional zone. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	b031fe84fd	btrfs-progs: zoned: implement zoned chunk allocator Implement a zoned chunk and device extent allocator. One device zone becomes a device extent so that a zone reset affects only this device extent and does not change the state of blocks in the neighbor device extents. To implement the allocator, we need to extend the following functions for a zoned filesystem: - init_alloc_chunk_ctl - dev_extent_search_start - dev_extent_hole_check - decide_stripe_size Here, dev_extent_hole_check() is newly introduced to check the validity of a hole found. init_alloc_chunk_ctl_zoned() is mostly the same as regular one. It always set the stripe_size to the zone size and aligns the parameters to the zone size. dev_extent_search_start() only aligns the start offset to zone boundaries. We don't care about the first 1MB like in regular filesystem because we anyway reserve the first two zones for superblock logging. dev_extent_hole_check_zoned() checks if zones in given hole are either conventional or empty sequential zones. Also, it skips zones reserved for superblock logging. With the change to the hole, the new hole may now contain pending extents. So, in this case, loop again to check that. Finally, decide_stripe_size_zoned() should shrink the number of devices instead of stripe size because we need to honor stripe_size == zone_size. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	8ef9313cf2	btrfs-progs: zoned: implement log-structured superblock Superblock (and its copies) is the only data structure in btrfs which has a fixed location on a device. Since we cannot overwrite in a sequential write required zone, we cannot place superblock in the zone. One easy solution is limiting superblock and copies to be placed only in conventional zones. However, this method has two downsides: one is reduced number of superblock copies. The location of the second copy of superblock is 256GB, which is in a sequential write required zone on typical devices in the market today. So, the number of superblock and copies is limited to be two. Second downside is that we cannot support devices which have no conventional zones at all. To solve these two problems, we employ superblock log writing. It uses two adjacent zones as a circular buffer to write updated superblocks. Once the first zone is filled up, start writing into the second one. Then, when both zones are filled up and before starting to write to the first zone again, reset the first zone. We can determine the position of the latest superblock by reading write pointer information from a device. One corner case is when both zones are full. For this situation, we read out the last superblock of each zone, and compare them to determine which zone is older. The following zones are reserved as the circular buffer on ZONED btrfs. - primary superblock: offset 0B (and the following zone) - first copy: offset 512G (and the following zone) - Second copy: offset 4T (4096G, and the following zone) If these reserved zones are conventional, superblock is written fixed at the start of the zone without logging. Currently, superblock reading/writing is done by pread/pwrite. This commit replace the call sites with sbread/sbwrite to wrap the functions. For zoned btrfs, btrfs_sb_io which is called from sbread/sbwrite reverses the IO position back to a mirror number, maps the mirror number into the superblock logging position, and do the IO. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	3c0f83e541	btrfs-progs: zoned: introduce max_zone_append_size The zone append write command has a maximum IO size restriction it accepts. This is because a zone append write command cannot be split, as we ask the device to place the data into a specific target zone and the device responds with the actual written location of the data. Introduce max_zone_append_size to zone_info and fs_info to track the value, so we can limit all I/O to a zoned block device that we want to write using the zone append command to the device's limits. Zone append command is mandatory for zoned btrfs. So, reject a device with max_zone_append_size == 0. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	7e520022ff	btrfs-progs: zoned: check and enable ZONED mode Introduce function btrfs_check_zoned_mode() to check if ZONED flag is enabled on the file system and if the file system consists of zoned devices with equal zone size. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	384840b9c0	btrfs-progs: zoned: get zone information of zoned block devices Get the zone information (number of zones and zone size) from all the devices, if the volume contains a zoned block device. To avoid costly run-time zone report commands to test the device zones type during block allocation, it also records all the zone status (zone type, write pointer position, etc.). Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00

24 Commits
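
The arithmetic behind commits 8ef9313cf2 and 8816a65fec can be worked through in a few lines. The following is a minimal Python sketch, not the btrfs-progs C code: the function names mirror sb_zone_number() and check_sb_location() from the commit messages, but their signatures and bodies here are assumptions made for illustration. The log offsets (0, 512G, 4T) and the example device (128MB zones, 16GB device, 64MB regular first-copy location) are taken from the commit messages above.

```python
MIB = 1 << 20
GIB = 1 << 30

# Byte offsets of the superblock log zones on zoned btrfs, as listed in
# the message of 8ef9313cf2: primary, first copy, second copy.
SB_LOG_OFFSETS = (0, 512 * GIB, 4096 * GIB)

# Location of the first superblock copy on a regular device, as quoted
# in the message of 8816a65fec.
REGULAR_FIRST_COPY = 64 * MIB


def sb_zone_number(mirror, zone_size):
    """First of the two log zones for a mirror (illustrative signature)."""
    return SB_LOG_OFFSETS[mirror] // zone_size


def check_sb_location(mirror, zone_size, device_size):
    """True if both log zones of the mirror lie inside the device.

    Hypothetical stand-in for the check the commit introduces: each
    mirror uses its zone and the following one, so both must exist.
    """
    nr_zones = device_size // zone_size
    return sb_zone_number(mirror, zone_size) + 1 < nr_zones


if __name__ == "__main__":
    zone_size = 128 * MIB
    device_size = 16 * GIB

    # The old comparison used the regular location, so it passed.
    print("regular check passes:", REGULAR_FIRST_COPY < device_size)

    # The zoned check looks at the zone the copy would really land in.
    for mirror in range(3):
        print(mirror,
              sb_zone_number(mirror, zone_size),
              check_sb_location(mirror, zone_size, device_size))
```

With these inputs the device has 128 zones, while the first copy maps to zones 4096 and 4097, the same pair reported in the "failed to read zone info of 4096 and 4097" error. Comparing the regular 64MB location against the device size therefore accepts a copy that the zoned layout cannot hold.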