btrfs-progs: fix an error path which can lead to empty device list

[BUG]
With the incoming delayed chunk item insertion feature, there is a super
weird failure at mkfs/022:

  ====== RUN CHECK ./mkfs.btrfs -f --rootdir tmp.KnKpP5 -d dup -b 350M tests/test.img
  ...
  Checksum:           crc32c
  Number of devices:  0
  Devices:
     ID        SIZE  PATH

Note the "Number of devices: 0" line, this means our
fs_info->fs_devices->devices list is empty.

And since our rw device list is empty, we won't finish the mkfs with
proper superblock magic, and cause later btrfs check to fail.

[CAUSE]
Although the failure is only triggered by the incoming delayed chunk
item insertion feature, the bug itself is here for a while.

In btrfs_alloc_chunk(), we move rw devices to our @private_devs list
first, then in create_chunk(), we move it back to our rw devices list.

This dance is pretty dangerous, especially if btrfs_alloc_dev_extent()
failed inside create_chunk(), and current profile have multiple stripes
(including DUP), we will exit create_chunk() directly, without moving the
remaining devices in @private_devs list back to @dev_list.

Furthermore, btrfs_alloc_chunk() is expected to return -ENOSPC, as we
call btrfs_alloc_chunk() to pre-allocate chunks, and ignore the -ENOSPC
error if it's just a pre-allocation failure.

This existing error path can lead to the empty rw list seen above.

[FIX]
After create_chunk(), unconditionally move all devices in @private_devs
back to rw device list.

And add extra check to make sure our rw device list is never empty after
a chunk allocation attempt.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This commit is contained in:
Qu Wenruo 2022-04-19 19:17:42 +08:00 committed by David Sterba
parent 4a940ab2c0
commit f9659c7235
1 changed files with 15 additions and 0 deletions

View File

@ -1558,6 +1558,21 @@ again:
}
ret = create_chunk(trans, info, &ctl, &private_devs);
/*
* This can happen if above create_chunk() failed, we need to move all
* devices back to dev_list.
*/
while (!list_empty(&private_devs)) {
device = list_entry(private_devs.next, struct btrfs_device,
dev_list);
list_move(&device->dev_list, dev_list);
}
/*
* All private devs moved back to @dev_list, now dev_list should not be
* empty.
*/
ASSERT(!list_empty(dev_list));
*start = ctl.start;
*num_bytes = ctl.num_bytes;