btrfs-progs/mkfs
Qu Wenruo c6464d3f99 btrfs-progs: mkfs: rework how we traverse rootdir
[PITFALLS]
There are several hidden pitfalls of the existing traverse_directory():

- Hand written preorder traversal
  There is already a better written standard library function, nftw()
  doing exactly what we need.

- Over-designed path list
  To properly handle the directory change, we have structure
  directory_name_entry, to record every inode until rootdir.

  But it has two string members, dir_name and path, which is a little
  confusing and overkilled.
  As for preorder traversal, we will never need to read the parent's
  filename, just its btrfs inode number.

  And it's exported while no one utilizes it out of mkfs/rootdir.c.

- Weird inode numbers
  We use the inode number from st->st_ino, with an extra offset.
  This by itself is not safe, if the rootdir has child directories in
  another filesystem.

  And this results very weird inode numbers, e.g:

	item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
	item 6 key (88347519 INODE_ITEM 0) itemoff 15815 itemsize 160
	item 16 key (88347520 INODE_ITEM 0) itemoff 15363 itemsize 160
	item 20 key (88347521 INODE_ITEM 0) itemoff 15119 itemsize 160
	item 24 key (88347522 INODE_ITEM 0) itemoff 14875 itemsize 160
	item 26 key (88347523 INODE_ITEM 0) itemoff 14700 itemsize 160
	item 28 key (88347524 INODE_ITEM 0) itemoff 14525 itemsize 160
	item 30 key (88347557 INODE_ITEM 0) itemoff 14350 itemsize 160
	item 32 key (88347566 INODE_ITEM 0) itemoff 14175 itemsize 160

  Which is far from a regular fs created by copying the data.

- Weird directory inode size calculation
  Unlike kernel, which updated the directory inode size every time new
  child inodes are added, we calculate the directory inode size by
  searching all its children first, then later new inodes linked to this
  directory won't touch the inode size.

- Bad hard link detection and cross mount point handling
  The hard link detection is purely based on the st_ino returned from
  the host filesystem, this means we do not have extra checks whether
  the inode is even inside the same fs.

  And we directly reuse st_nlink from the host filesystem, if there
  is a hard link out of rootdir, the st_nlink will be incorrect and
  cause a corrupted fs.

Enhance all these points by:

- Use nftw() to do the preorder traversal
  It also provides the extra level detection, which is pretty handy.

- Use a simple local inode_entry to record each parent
  The only value is a u64 to record the inode number.
  And one simple rootdir_path structure to record the list of
  inode_entry, alone with the current level.

  This rootdir_path structure along with two helpers,
  rootdir_path_push() and rootdir_path_pop(), along with the
  preorder traversal provided by nftw(), are enough for us to record
  all the parent directories until the rootdir.

- Grab new inode number properly
  Just call btrfs_get_free_objectid() to grab a proper inode number,
  other than using some weird calculated value.

- Treat every inode as a new one
  This means we will have no hard link support for now.

  But I still believe it's a good trade-off, especially considering the
  old handling is buggy for several corner cases.

- Use btrfs_insert_inode() and btrfs_add_link() to update directory
  inode automatically

With all the refactoring, the code is shorter and easier to read.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
2024-08-14 23:58:27 +02:00
..
common.c btrfs-progs: introduce btrfs_make_subvolume() 2024-07-30 19:54:04 +02:00
common.h btrfs-progs: introduce btrfs_make_subvolume() 2024-07-30 19:54:04 +02:00
main.c btrfs-progs: mkfs: simplify mkfs_main() cleanup 2024-07-30 20:04:38 +02:00
Makefile btrfs-progs: build: add stub makefile to image and mkfs 2019-07-04 15:36:01 +02:00
rootdir.c btrfs-progs: mkfs: rework how we traverse rootdir 2024-08-14 23:58:27 +02:00
rootdir.h btrfs-progs: mkfs: rework how we traverse rootdir 2024-08-14 23:58:27 +02:00