btrfs-progs

Commit Graph

Author	SHA1	Message	Date
Johannes Thumshirn	a7ae6d5948	btrfs-progs: zoned: add upper and lower zone size boundaries As we're not supporting arbitrarily big or small zone sizes in the kernel, reject devices that don't fit in progs as well. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-06-06 15:47:50 +02:00
Naohiro Aota	32c43d0c68	btrfs-progs: zoned: export sb_zone_number() and related constants Move sb_zone_number() and related constants from zoned.c to the corresponding header for later use. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-04-08 23:17:35 +02:00
Johannes Thumshirn	f77da2b173	btrfs-progs: zoned support DUP on metadata block groups Support using BTRFS_BLOCK_GROUP_DUP on metadata (and system) block groups. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-02-01 18:41:53 +01:00
Johannes Thumshirn	88895a920f	btrfs-progs: use profile_supported in mkfs as well Currently we have two places checking if a block-group profile is supported on a zoned device, one in mkfs/main.c and one in kernel-shared/zoned.c. Use the one from kernel-shared/zoned.c in mkfs as well, unifying all checks. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-02-01 18:41:51 +01:00
Josef Bacik	db2ab47823	btrfs-progs: stop accessing ->extent_root directly When we switch to multiple global trees we'll need to access the appropriate extent root depending on the block group or possibly root. To handle this, use a helper in most places and then the actual root in places where it is required. We will whittle down the direct accessors with future patches, but this does the bulk of the preparatory work. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-11-30 18:56:54 +01:00
David Sterba	447bf2fb37	btrfs-progs: zoned: factor out supported profiles to a helper The enumeration could get out of date, like fixed in previous commit. Create a helper that will hide the implementation details. Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-20 18:59:23 +02:00
Naohiro Aota	ae0dfb246d	btrfs-progs: introduce btrfs_pread wrapper for pread Wrap pread with btrfs_pread as well. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-20 18:59:23 +02:00
Naohiro Aota	c821e5545f	btrfs-progs: introduce btrfs_pwrite wrapper for pwrite Wrap pwrite with btrfs_pwrite(). It simply calls pwrite() on non-zoned btrfs (opened without O_DIRECT). On zoned mode (opened with O_DIRECT), it allocates an aligned bounce buffer, copies the contents and uses it for direct-IO writing. Writes in device_zero_blocks() and btrfs_wipe_existing_sb() are a little tricky. We don't have fs_info on our hands, so use zinfo to determine it is a zoned device or not. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-20 18:59:23 +02:00
Naohiro Aota	585ac14d1a	btrfs-progs: use btrfs_device_size() instead of device_get_partition_size_fd() device_get_partition_size_fd() fails if we pass a regular file. This can happen when trying to create an emulated zoned filesystem on a regular file. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-08 20:46:35 +02:00
David Sterba	d0ea2b2af4	btrfs-progs: zoned: also exclude raid1c3 and raid1c4 from supported profiles The enumeration of profiles not available for zoned mode in btrfs_load_block_group_zone_info was lacking the 3 and 4 copy raid1, add them. Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-07 18:39:58 +02:00
Johannes Thumshirn	c22e9487a7	btrfs-progs: remove max_zone_append_size logic max_zone_append_size is unused and can as well be removed just like we did on the kernel side. Keep one sanity check though, so we're not adding devices to a zoned FS that aren't supporting zone append. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-06 16:49:07 +02:00
Naohiro Aota	53ec59ead0	btrfs-progs: do not zone reset on emulated zoned mode We cannot zone reset a regular file with emulated zones. So, mkfs.btrfs on such a file causes the following error. ERROR: zoned: failed to reset device '/home/naota/tmp/btrfs.img' zones: Inappropriate ioctl for device Introduce btrfs_zoned_device_info->emulated to distinguish the zones are emulated or not. And, use it to decide it needs zone reset or not. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-06 16:48:56 +02:00
David Sterba	96a5cf0719	btrfs-progs: handle EINVAL when reading zone size on older kernels A combination of new progs and old kernel may lead to problems with detecting zone size by ioctl. Fixed by #376 but still incomplete because old kernels may return EINVAL for unsupported ioctl. This should be ENOTTY but hasn't been like that until kernel 5.11. As we always pass valid arguments to the ioctl we can't conflate the two and can EINVAL the same way as ENOTTY. Issue: #399 Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-20 11:31:09 +02:00
David Sterba	c3ee6a8a09	btrfs-progs: unify GPL header comments Add the GPL v2 header to files where it was missing and is not from an external source, update to the most recent version with the address. Signed-off-by: David Sterba <dsterba@suse.com>	2021-09-07 13:58:44 +02:00
Sidong Yang	94f3b75c00	btrfs-progs: zoned: fix memory leak in btrfs_sb_io() In btrfs_sb_io(), blk_zone_report is used for getting information about zones. But it is not freed if code goes in usual path. This patch frees the variable just after it used. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Sidong Yang <realwakka@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-07-02 17:27:53 +02:00
David Sterba	1dc6f33c28	btrfs-progs: zoned: use fixed width type when reading zone size The ioctl BLKGETZONESZ expects 32bit integer, declare the target variable as such. Signed-off-by: David Sterba <dsterba@suse.com>	2021-07-02 17:27:53 +02:00
David Sterba	6134973527	btrfs-progs: zoned: make it work without kernel support There's a report that a system with 4.19 kernel fails boot because device scan exits with error. This is because zoned support is compiled in btrfs-progs but not in kernel. To make new progs and old kernels work, do a fallback when the zoned ioctl is not available, as if it were a non-zoned device. There is no other option, but this is safe at least for the device scan that would not error out. Any unaligned writes to a zoned device will fail as expected. Issue: #376 Signed-off-by: David Sterba <dsterba@suse.com>	2021-06-07 17:38:46 +02:00
David Sterba	b19a603d62	btrfs-progs: remove unnecessary linux/*.h includes Decrease dependency on system headers, remove where they're not needed or became stale after code moved. The path-utils.h encapsulate path operations so include linux/limits.h here, that's where PATH_MAX is defined. Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:47 +02:00
David Sterba	aa56bf3a31	btrfs-progs: zoned: replace raw ioctl with a helper for device size Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
David Sterba	c7b5f884e0	btrfs-progs: add prefix to zero_blocks This is a public helper for devices, add the prefix to make it clear. Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
David Sterba	2b5d4f2e6f	btrfs-progs: add prefix to discard_blocks This is a helper for devices, make it clear in the function name. Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
David Sterba	bc6864967b	btrfs-progs: add prefix to exported queue_param As this is a public helper, add a prefix that makes it clear what is the queue related to. Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
David Sterba	38254c4934	btrfs-progs: kerncompat: add const_ilog2 The newly added zoned mode constants can utilize the const ilog2 version. Copy it from kernel include/linux/log2.h. Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
Naohiro Aota	8c2dfa6387	btrfs-progs: zoned: wipe temporary superblocks in superblock log zone mkfs.btrfs uses a temporary superblock during the initialization process. The temporary superblock uses BTRFS_MAGIC_TEMPORARY as its magic which is different from a regular superblock. As a result, libblkid, which only supports the usual magic, cannot recognize the volume as btrfs. So, let's wipe the temporary magic before writing out the usual superblock. Technically, we can add the temporary magic to the libblkid's table. But, it will result in recognizing a half-baked filesystem as btrfs, which is not ideal. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
Naohiro Aota	8bbb0c5744	btrfs-progs: zoned: support zero out on zoned block device If we zero out a region in a sequential write required zone, we cannot write to the region until we reset the zone. Thus, we must prohibit zeroing out to a sequential write required zone. zero_dev_clamped() is modified to take the zone information and it calls zero_zone_blocks() if the device is host managed to avoid writing to sequential write required zones. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
Naohiro Aota	58ec593892	btrfs-progs: zoned: support resetting zoned device All zones of zoned block devices should be reset before writing. Support this by introducing PREP_DEVICE_ZONED. btrfs_reset_all_zones() walk all the zones on a device, and reset a zone if it is sequential required zone, or discard the zone range otherwise. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
Naohiro Aota	bfdb3ae237	btrfs-progs: zoned: reset zone of freed block group When freeing a chunk, we can/should reset the underlying device zones for the chunk. Introduce btrfs_reset_chunk_zones() and reset the zones. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	bfd34b7876	btrfs-progs: zoned: redirty clean extent buffers Tree manipulating operations like merging nodes often release once-allocated tree nodes. Btrfs cleans such nodes so that pages in the node are not uselessly written out. On ZONED drives, however, such optimization blocks the following IOs as the cancellation of the write out of the freed blocks breaks the sequential write sequence expected by the device. Check if next dirty extent buffer is continuous to a previously written one. If not, it redirty extent buffers between the previous one and the next one, so that all dirty buffers are written sequentially. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	feff533e34	btrfs-progs: zoned: calculate allocation offset for conventional zones Conventional zones do not have a write pointer, so we cannot use it to determine the allocation offset for sequential allocation if a block group contains a conventional zone. But instead, we can consider the end of the highest addressed extent in the block group for the allocation offset. For new block group, we cannot calculate the allocation offset by consulting the extent tree, because it can cause deadlock by taking extent buffer lock after chunk mutex, which is already taken in btrfs_make_block_group(). Since it is a new block group anyways, we can simply set the allocation offset to 0. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	f08410f078	btrfs-progs: zoned: load zone's allocation offset A zoned filesystem must allocate blocks at the zones' write pointer. The device's write pointer position can be mapped to a logical address within a block group. To facilitate this, add an "alloc_offset" to the block group to track the logical addresses of the write pointer. This logical address is populated in btrfs_load_block_group_zone_info() from the write pointers of corresponding zones. For now, zoned filesystems the single profile. Supporting non-single profile with zone append writing is not trivial. For example, in the DUP profile, we send a zone append writing IO to two zones on a device. The device reply with written LBAs for the IOs. If the offsets of the returned addresses from the beginning of the zone are different, then it results in different logical addresses. We need fine-grained logical to physical mapping to support such separated physical address issue. Since it should require additional metadata type, disable non-single profiles for now. This commit supports the case all the zones in a block group are sequential. The next patch will handle the case having a conventional zone. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	b031fe84fd	btrfs-progs: zoned: implement zoned chunk allocator Implement a zoned chunk and device extent allocator. One device zone becomes a device extent so that a zone reset affects only this device extent and does not change the state of blocks in the neighbor device extents. To implement the allocator, we need to extend the following functions for a zoned filesystem: - init_alloc_chunk_ctl - dev_extent_search_start - dev_extent_hole_check - decide_stripe_size Here, dev_extent_hole_check() is newly introduced to check the validity of a hole found. init_alloc_chunk_ctl_zoned() is mostly the same as regular one. It always set the stripe_size to the zone size and aligns the parameters to the zone size. dev_extent_search_start() only aligns the start offset to zone boundaries. We don't care about the first 1MB like in regular filesystem because we anyway reserve the first two zones for superblock logging. dev_extent_hole_check_zoned() checks if zones in given hole are either conventional or empty sequential zones. Also, it skips zones reserved for superblock logging. With the change to the hole, the new hole may now contain pending extents. So, in this case, loop again to check that. Finally, decide_stripe_size_zoned() should shrink the number of devices instead of stripe size because we need to honor stripe_size == zone_size. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	8ef9313cf2	btrfs-progs: zoned: implement log-structured superblock Superblock (and its copies) is the only data structure in btrfs which has a fixed location on a device. Since we cannot overwrite in a sequential write required zone, we cannot place superblock in the zone. One easy solution is limiting superblock and copies to be placed only in conventional zones. However, this method has two downsides: one is reduced number of superblock copies. The location of the second copy of superblock is 256GB, which is in a sequential write required zone on typical devices in the market today. So, the number of superblock and copies is limited to be two. Second downside is that we cannot support devices which have no conventional zones at all. To solve these two problems, we employ superblock log writing. It uses two adjacent zones as a circular buffer to write updated superblocks. Once the first zone is filled up, start writing into the second one. Then, when both zones are filled up and before starting to write to the first zone again, reset the first zone. We can determine the position of the latest superblock by reading write pointer information from a device. One corner case is when both zones are full. For this situation, we read out the last superblock of each zone, and compare them to determine which zone is older. The following zones are reserved as the circular buffer on ZONED btrfs. - primary superblock: offset 0B (and the following zone) - first copy: offset 512G (and the following zone) - Second copy: offset 4T (4096G, and the following zone) If these reserved zones are conventional, superblock is written fixed at the start of the zone without logging. Currently, superblock reading/writing is done by pread/pwrite. This commit replace the call sites with sbread/sbwrite to wrap the functions. For zoned btrfs, btrfs_sb_io which is called from sbread/sbwrite reverses the IO position back to a mirror number, maps the mirror number into the superblock logging position, and do the IO. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	49d5ce4d0f	btrfs-progs: zoned: allow zoned filesystems on non-zoned block devices Run a zoned filesystem on non-zoned devices. This is done by "slicing up" the block device into fixed-sized chunks and emulate a conventional zone on each of them. The emulated zone size is determined from the size of device extent. This is mainly aimed at testing of zoned filesystems, i.e. the zoned chunk allocator, on regular block devices. Currently, we always use EMULATED_ZONE_SIZE (256MiB) for the emulated zone size. In the future, this will be customized by mkfs option. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	707f0716e0	btrfs-progs: zoned: disallow mixed-bg in ZONED mode Placing both data and metadata in a block group is impossible in ZONED mode. For data, we can allocate a space for it and write it immediately after the allocation. For metadata, however, we cannot do that, because the logical addresses are recorded in other metadata buffers to build up the trees. As a result, a data buffer can be placed after a metadata buffer, which is not written yet. Writing out the data buffer will break the sequential write rule. Check and disallow MIXED_BG with ZONED mode. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	3c0f83e541	btrfs-progs: zoned: introduce max_zone_append_size The zone append write command has a maximum IO size restriction it accepts. This is because a zone append write command cannot be split, as we ask the device to place the data into a specific target zone and the device responds with the actual written location of the data. Introduce max_zone_append_size to zone_info and fs_info to track the value, so we can limit all I/O to a zoned block device that we want to write using the zone append command to the device's limits. Zone append command is mandatory for zoned btrfs. So, reject a device with max_zone_append_size == 0. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	7e520022ff	btrfs-progs: zoned: check and enable ZONED mode Introduce function btrfs_check_zoned_mode() to check if ZONED flag is enabled on the file system and if the file system consists of zoned devices with equal zone size. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	384840b9c0	btrfs-progs: zoned: get zone information of zoned block devices Get the zone information (number of zones and zone size) from all the devices, if the volume contains a zoned block device. To avoid costly run-time zone report commands to test the device zones type during block allocation, it also records all the zone status (zone type, write pointer position, etc.). Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00

37 Commits