From f9659c72357e81564f497f4900699527002b6920 Mon Sep 17 00:00:00 2001 From: Qu Wenruo Date: Tue, 19 Apr 2022 19:17:42 +0800 Subject: [PATCH] btrfs-progs: fix an error path which can lead to empty device list [BUG] With the incoming delayed chunk item insertion feature, there is a super weird failure at mkfs/022: ====== RUN CHECK ./mkfs.btrfs -f --rootdir tmp.KnKpP5 -d dup -b 350M tests/test.img ... Checksum: crc32c Number of devices: 0 Devices: ID SIZE PATH Note the "Number of devices: 0" line, this means our fs_info->fs_devices->devices list is empty. And since our rw device list is empty, we won't finish the mkfs with proper superblock magic, and cause later btrfs check to fail. [CAUSE] Although the failure is only triggered by the incoming delayed chunk item insertion feature, the bug itself is here for a while. In btrfs_alloc_chunk(), we move rw devices to our @private_devs list first, then in create_chunk(), we move it back to our rw devices list. This dance is pretty dangerous, especially if btrfs_alloc_dev_extent() failed inside create_chunk(), and current profile have multiple stripes (including DUP), we will exit create_chunk() directly, without moving the remaining devices in @private_devs list back to @dev_list. Furthermore, btrfs_alloc_chunk() is expected to return -ENOSPC, as we call btrfs_alloc_chunk() to pre-allocate chunks, and ignore the -ENOSPC error if it's just a pre-allocation failure. This existing error path can lead to the empty rw list seen above. [FIX] After create_chunk(), unconditionally move all devices in @private_devs back to rw device list. And add extra check to make sure our rw device list is never empty after a chunk allocation attempt. Reviewed-by: Josef Bacik Reviewed-by: Johannes Thumshirn Signed-off-by: Qu Wenruo Signed-off-by: David Sterba --- kernel-shared/volumes.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/kernel-shared/volumes.c b/kernel-shared/volumes.c index 598ac553..ba9a7306 100644 --- a/kernel-shared/volumes.c +++ b/kernel-shared/volumes.c @@ -1558,6 +1558,21 @@ again: } ret = create_chunk(trans, info, &ctl, &private_devs); + + /* + * This can happen if above create_chunk() failed, we need to move all + * devices back to dev_list. + */ + while (!list_empty(&private_devs)) { + device = list_entry(private_devs.next, struct btrfs_device, + dev_list); + list_move(&device->dev_list, dev_list); + } + /* + * All private devs moved back to @dev_list, now dev_list should not be + * empty. + */ + ASSERT(!list_empty(dev_list)); *start = ctl.start; *num_bytes = ctl.num_bytes;