Commit Graph

421 Commits

Author SHA1 Message Date
David Sterba bc2317381d btrfs-progs: kernel-shared: sync tree-checker.c
Sync from kernel 6.12 queue:

- dir type range
- DEV_EXTENT item checks

Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-17 16:47:41 +02:00
Qu Wenruo bc0995297f btrfs-progs: convert: fix inline extent size for symlink
[BUG]
Sometimes test case btrfs/012 fails randomly, with the failure to read a
symlink:

     QA output created by 012
     Checking converted btrfs against the original one:
    -OK
    +readlink: Structure needs cleaning
     Checking saved ext2 image against the original one:
     OK

Furthermore, this will trigger a kernel error message:

 BTRFS critical (device dm-2): regular/prealloc extent found for non-regular inode 133081

[CAUSE]
For that specific inode 133081, the tree dump looks like this:

        item 127 key (133081 INODE_ITEM 0) itemoff 40984 itemsize 160
                generation 1 transid 1 size 4095 nbytes 4096
                block group 0 mode 120777 links 1 uid 0 gid 0 rdev 0
                sequence 0 flags 0x0(none)
        item 128 key (133081 INODE_REF 133080) itemoff 40972 itemsize 12
                index 2 namelen 2 name: l3
        item 129 key (133081 EXTENT_DATA 0) itemoff 40919 itemsize 53
                generation 4 type 1 (regular)
                extent data disk byte 2147483648 nr 38080512
                extent data offset 37974016 nr 4096 ram 38080512
                extent compression 0 (none)

Note that the symlink inode size is 4095, which is the maximum size
(PATH_MAX minus the terminating NUL).
But nbytes is 4096, exactly matching the sector size of the filesystem.

This results in the creation of a regular extent, but btrfs does not
accept a symlink with a regular/preallocated extent, so the kernel
rejects the read and the readlink call fails.

The root cause is in the convert code, where for symlinks we always
create a data extent with its size + 1, causing the above problem.

I guess the original code was meant to handle the terminating NUL, but
in btrfs we never need to store the terminating NUL for inline extents
or file names.

Thus this pitfall in btrfs-convert leads to the above invalid data
extent and fails the test case.

[FIX]
- Fix the ext2 and reiserfs symbolic link creation code
  To remove the terminating NUL.

- Add extra checks for the size of a symbolic link
  Btrfs has extra limits on the size of a symbolic link, as btrfs must
  store symbolic link targets as inlined extents.

  This means for 4K node sized btrfs, the size limit is smaller than the
  usual PATH_MAX - 1 (only around 4000 bytes instead of 4095).

  So for certain nodesize, some filesystems can not be converted to
  btrfs.
  (this should be rare, because the default nodesize is 16K already)

- Split the symbolic link and inline data extent size checks
  For symbolic links the real limit is PATH_MAX - 1 (removing the
  terminating NUL), but for inline data extents the limit is
  sectorsize - 1, which can be different from 4096 - 1 (e.g. 64K sector
  size).
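
A minimal sketch of the split size checks, not the actual btrfs-convert
code. The function names and the max_inline_size parameter (standing for
whatever inline limit the chosen nodesize allows) are assumptions made for
illustration:

```c
#include <stdbool.h>
#include <stddef.h>

/* PATH_MAX on Linux, redefined locally so the sketch is self-contained. */
#define SKETCH_PATH_MAX ((size_t)4096)

/* A symlink target must fit in PATH_MAX - 1 and in one inline extent. */
static bool sketch_symlink_fits(size_t target_len, size_t max_inline_size)
{
	return target_len <= SKETCH_PATH_MAX - 1 &&
	       target_len <= max_inline_size;
}

/* Inline file data is limited by sectorsize - 1 instead of PATH_MAX - 1. */
static bool sketch_inline_data_fits(size_t data_len, size_t sectorsize,
				    size_t max_inline_size)
{
	return data_len <= sectorsize - 1 && data_len <= max_inline_size;
}
```

Under these assumptions a 4095-byte target passes when the inline limit is
large, and is rejected when the inline limit is only around 4000 bytes.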

Pull-request: #884
Signed-off-by: Qu Wenruo <wqu@suse.com>
2024-09-17 14:33:22 +02:00
David Sterba a5b7e414da btrfs-progs: kernel-shared: update const of parameters accessors.h
Sync up with kernel and fix warnings reported by -Wcast-qual.
Most of the change is due to extent_buffer::data, which is a direct
struct member, unlike in kernel where it's an array of pages. The
const qualifier cannot be used the same way so it's dropped in affected
helpers.

Signed-off-by: David Sterba <dsterba@suse.com>
2024-08-14 23:59:36 +02:00
Qu Wenruo afae10ddb6 btrfs-progs: constify the name parameter of btrfs_add_link()
The name is never touched, thus it should be const.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
2024-08-14 23:58:24 +02:00
Qu Wenruo 99dc37bcfe btrfs-progs: cross-port btrfs_uuid_tree_add() from kernel
The modification is minimal:

- Replace WARN_ON() with UASSERT()

- Remove the @trans parameter for btrfs_extend_item() and
  btrfs_mark_buffer_dirty()
  As progs version doesn't need a transaction handler.

- Remove the btrfs_uuid_tree_add() in mkfs/main.c

Signed-off-by: Qu Wenruo <wqu@suse.com>
2024-07-30 20:02:42 +02:00
Qu Wenruo 8efa8092aa btrfs-progs: move uuid-tree definitions to kernel-shared/uuid-tree.h
Currently we already have a kernel-shared/uuid-tree.c, which is mostly
shared with kernel.

Kernel also has a uuid-tree.h, but we are still using ctree.h for the
header.

Move all the uuid-tree related definitions to kernel-shared/uuid-tree.h,
making future code sync easier.

Signed-off-by: Qu Wenruo <wqu@suse.com>
2024-07-30 20:01:59 +02:00
Mark Harmstone 32ab0e6328 btrfs-progs: set transid in btrfs_insert_dir_item
btrfs_insert_dir_item wasn't setting the transid field in
btrfs_dir_item. Set it to the current transaction ID rather than writing
uninitialized memory to disk.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Mark Harmstone <maharmstone@fb.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
2024-07-30 20:01:38 +02:00
Yaroslav Halchenko 16a7cbca91 btrfs-progs: run codespell throughout fixing typos automagically
Spell checking can now run in automated mode.

=== Do not change lines below ===
{
 "chain": [],
 "cmd": "codespell -w",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^

Author: Yaroslav Halchenko <debian@onerussian.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-30 19:56:08 +02:00
Qu Wenruo 9ad15a7301 btrfs-progs: use btrfs_link_subvolume() to replace btrfs_mksubvol()
The function btrfs_mksubvol() is very different between btrfs-progs and
kernel, the former version is really just linking a subvolume to another
directory inode, but the kernel version is really to make a completely
new subvolume.

Instead of same-named function, introduce btrfs_link_subvolume() and use
it to replace the old btrfs_mksubvol().

This is done by:

- Introduce btrfs_link_subvolume()
  Which does extra checks before doing any modification:
  * Make sure the target inode is a directory
  * Make sure no filename conflict

  Then do the linkage:
  * Add the dir_item/dir_index into the parent inode
  * Add the forward and backward root refs into tree root

- Introduce link_image_subvolume() helper
  Currently btrfs_mksubvol() has a dedicated convert filename retry
  behavior, which is unnecessary and should be done by the convert code.

  Now move the filename retry behavior into the helper.

- Remove btrfs_mksubvol()
  Since there is only one caller utilizing btrfs_mksubvol(), and it's
  now gone, we can remove the old btrfs_mksubvol().

Signed-off-by: Qu Wenruo <wqu@suse.com>
2024-07-30 19:54:50 +02:00
Qu Wenruo 9b74d80919 btrfs-progs: remove fs_info parameter from btrfs_create_tree()
The @fs_info parameter can be easily extracted from @trans, and kernel
has already removed the parameter.

Signed-off-by: Qu Wenruo <wqu@suse.com>
2024-07-30 19:54:00 +02:00
David Sterba ef73193623 btrfs-progs: dump-tree: escape special characters in paths or xattrs
Filenames can contain a newline (or other funny characters), this makes
the dump-tree output confusing, same for xattr names or values that can
contain binary data.  Encode the special characters in the C-style
('\e' -> "\e", or \NNN if there's no single letter representation). This
is based on isprint() as it's expected either on a terminal or in a dump
file.
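
A rough sketch of this kind of escaping, not the dump-tree implementation.
The function name is hypothetical, only the standard C single-letter
escapes are handled (so '\e' is not), and treating \NNN as three octal
digits is an assumption:

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Escape the non-printable bytes of src[0..len) into dst, C-style.
 * Returns the number of characters written, not counting the NUL.
 * The output is truncated if dst is too small.
 */
static size_t sketch_escape(const char *src, size_t len, char *dst,
			    size_t dst_size)
{
	static const char codes[] = "\a\b\f\n\r\t\v";
	static const char letters[] = "abfnrtv";
	size_t out = 0;

	if (dst_size == 0)
		return 0;
	for (size_t i = 0; i < len; i++) {
		unsigned char c = (unsigned char)src[i];
		const char *p;

		/* Worst case is "\NNN" plus the terminating NUL. */
		if (out + 5 > dst_size)
			break;
		if (isprint(c)) {
			dst[out++] = (char)c;
			continue;
		}
		/* strchr() would also match the terminator, so skip NUL. */
		p = c ? strchr(codes, c) : NULL;
		if (p) {
			dst[out++] = '\\';
			dst[out++] = letters[p - codes];
		} else {
			out += (size_t)snprintf(dst + out, dst_size - out,
						"\\%03o", (unsigned int)c);
		}
	}
	dst[out] = '\0';
	return out;
}
```

With this sketch a name containing a newline is printed on one line with a
literal backslash followed by n, and a byte such as 0x01 becomes \001.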

Issue: #350
Issue: #407
Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-30 19:53:33 +02:00
Johannes Thumshirn 7c549b5f7c btrfs-progs: remove raid stripe encoding
Remove the not needed encoding and reserved fields in struct
raid_stripe_extent.

This saves 8 bytes per stripe extent.

Note: this is a format change and previously created filesystems with
raid-stripe-tree will not be accessible. A similar patch is needed in
the kernel.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-06-24 19:40:18 +02:00
David Sterba 4db925911c btrfs-progs: use strncpy_null everywhere
Use the safe version of strncpy that makes sure the string is
terminated.

To be noted:

- the conversion in scrub path handling was skipped
- sizes of device paths in some ioctl related structures is
  BTRFS_DEVICE_PATH_NAME_MAX + 1

Recently gcc 13.3 started to detect problems with our use of strncpy
potentially lacking the null terminator, warnings like:

cmds/inspect.c: In function ‘cmd_inspect_logical_resolve’:
cmds/inspect.c:294:33: warning: ‘__builtin_strncpy’ specified bound 4096 equals destination size [-Wstringop-truncation]
  294 |                                 strncpy(mount_path, mounted, PATH_MAX);
      |                                 ^
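
For illustration, a terminating copy helper could look like the sketch
below. The exact signature of strncpy_null is not shown in this message, so
the name and parameters here are assumptions:

```c
#include <string.h>

/*
 * Copy at most n - 1 bytes from src and always terminate dest.
 * n is the full size of the destination buffer.
 */
static char *sketch_strncpy_null(char *dest, const char *src, size_t n)
{
	if (n == 0)
		return dest;
	strncpy(dest, src, n - 1);
	dest[n - 1] = '\0';
	return dest;
}
```

Unlike a bare strncpy(dest, src, sizeof(dest)), the result is a valid
string even when src is longer than the destination.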

Signed-off-by: David Sterba <dsterba@suse.com>
2024-06-24 19:18:48 +02:00
Qu Wenruo 5dc737c42c btrfs-progs: print-tree: handle all supported flags
Although we already have a pretty good array defined for all
super/compat_ro/incompat flags, we still rely on a manually defined mask
to do the printing.

This can lead to easy de-sync between the definition and the flags.

Change it to automatically iterate through the array to calculate the
flags, and add the remaining super flags.

Pull-request: #810
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-06-24 19:17:53 +02:00
Qu Wenruo 2f8a6ee294 btrfs-progs: fix the conflicting super block flags
[BUG]
There is a bug report that a canceled checksum conversion (still
experimental feature) resulted in unexpected super flags:

csum_type		0 (crc32c)
csum_size		4
csum			0x14973811 [match]
bytenr			65536
flags			0x1000000001
			( WRITTEN |
			  CHANGING_FSID_V2 )
magic			_BHRfS_M [match]

While for a filesystem under checksum conversion it should have either
CHANGING_DATA_CSUM or CHANGING_META_CSUM.

[CAUSE]
It turns out that, because btrfs-progs keeps its own extra flags inside
its own ctree.h headers, not the shared uapi headers, we have
conflicting super flags:

kernel-shared/uapi/btrfs_tree.h:#define BTRFS_SUPER_FLAG_METADUMP_V2	(1ULL << 34)
kernel-shared/uapi/btrfs_tree.h:#define BTRFS_SUPER_FLAG_CHANGING_FSID	(1ULL << 35)
kernel-shared/uapi/btrfs_tree.h:#define BTRFS_SUPER_FLAG_CHANGING_FSID_V2 (1ULL << 36)
kernel-shared/ctree.h:#define BTRFS_SUPER_FLAG_CHANGING_DATA_CSUM	(1ULL << 36)
kernel-shared/ctree.h:#define BTRFS_SUPER_FLAG_CHANGING_META_CSUM	(1ULL << 37)

Note that CHANGING_FSID_V2 is conflicting with CHANGING_DATA_CSUM.
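
The overlap can be checked mechanically. A small sketch using the bit
values from the listing above (the SKETCH_ names and the helper are made up
for this example):

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values copied from the listing above, names are illustrative. */
#define SKETCH_CHANGING_FSID_V2		(1ULL << 36)
#define SKETCH_OLD_CHANGING_DATA_CSUM	(1ULL << 36)
#define SKETCH_OLD_CHANGING_META_CSUM	(1ULL << 37)

/* Two flag masks conflict if they share at least one bit. */
static bool sketch_flags_overlap(uint64_t a, uint64_t b)
{
	return (a & b) != 0;
}
```

The reported super flags value 0x1000000001 has bit 36 set, so a printer
that only knows the uapi names reports it as CHANGING_FSID_V2.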

[FIX]
Cross port the proper updated uapi headers into btrfs-progs, and remove
the definition from ctree.h.

This would change the value for CHANGING_DATA_CSUM and
CHANGING_META_CSUM, but considering they are experimental features, and
kernel would reject them anyway, the damage is not that huge and we can
accept such change before exposing it to end users.

Pull-request: #810
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-06-24 19:17:49 +02:00
Qu Wenruo 0eeb12aef5 btrfs-progs: error out immediately if an unknown backref type is found
There is a bug report that for fuzzed image
bko-155621-bad-block-group-offset.raw, "btrfs check --mode=lowmem
--repair" would lead to an endless loop.

Unlike original mode, lowmem mode relies on the backref walk to properly
go through each root, but unfortunately inside __add_inline_refs() we
don't handle unknown backref types correctly, so the walk never moves
forward and loops forever.

Fix it by erroring out to prevent an endless loop.

Issue: #788
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-06-05 19:48:04 +02:00
Qu Wenruo cef75dde63 btrfs-progs: print-tree: do sanity checks for dir items
There is a bug report that with UBSAN enabled, fuzz/006 test case
crashes.

It turns out that the image bko-154021-invalid-drop-level.raw has
invalid dir items, where the name/data len goes beyond the item.

And if we try to read beyond the eb boundary, UBSAN gets triggered.

Normally in kernel tree-checker would reject such metadata in the first
place, but in btrfs-progs we can not be that strict or we cannot do a
lot of repair.

So here just enhance print_dir_item() to do extra sanity checks for
data/name len before reading the contents.

Issue: #805
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-06-05 19:46:41 +02:00
Qu Wenruo 6ad89f67a9 btrfs-progs: print-tree: add support for dev-replace item
This is inspired by a recent bug that csum change doesn't detect
finished dev-replace.

At the time of that csum change patch, there was no print-tree support to
show the content of btrfs_dev_replace_item, which contributed to the bug.

Add the new output for btrfs_dev_replace_item, and the example looks
like this:

	item 1 key (0 DEV_REPLACE 0) itemoff 16171 itemsize 72
		src devid -1 cursor left 1179648000 cursor right 1179648000 mode ALWAYS
		state FINISHED write errors 0 uncorrectable read errors 0
		start time 1717282771 (2024-06-02 08:29:31)
		stop time 1717282771 (2024-06-02 08:29:31)

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-06-03 21:41:27 +02:00
Naohiro Aota edd80fbde3 btrfs-progs: support byte length for zone resetting
Even with "mkfs.btrfs -b", mkfs.btrfs resets all the zones on the device.
Limit the reset target within the specified length.

Also, we need to check that there is no active zone outside of the FS
range. Having an active zone outside FS reduces the number of zones btrfs
can write simultaneously. Technically, we can still scan all the device
zones and keep active zones outside FS intact and try to live with the
limited active zones. But, that will make btrfs operations harder.

It is generally a bad idea to use "-b" for non-test usage on a device with
an active zone limit in the first place. You really need to take care that
the FS and the area outside the FS together do not go over the limit. That
means you'll never be able to use zones outside the FS anyway.

So, until there is a strong request for that, I don't think it's worthwhile
to do so.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-06-03 21:26:39 +02:00
Qu Wenruo cae94956d9 btrfs-progs: dump-tree: support simple quota mode status flags
[BUG]
For simple quota mode btrfs, dump tree does not show the extra flags
correctly:

 # mkfs.btrfs -f -O squota $dev
 # btrfs inspect dump-tree -t quota $dev | grep QGROUP_STATUS -A1
	item 0 key (0 QGROUP_STATUS 0) itemoff 16243 itemsize 40
		version 1 generation 10 flags ON scan 0 enable_gen 7

Note just ON is shown, but squota has one extra bit set for it.

[CAUSE]
Just no support for the new flag.

[FIX]
Add the new flag support, also to be consistent with other flags string
output, add output for extra unknown flags.

With a hand crafted image, the output with unknown flags looks like
this:
	item 0 key (0 QGROUP_STATUS 0) itemoff 16243 itemsize 40
		version 1 generation 10 flags ON|SIMPLE_MODE|UNKNOWN(0xf00) scan 0 enable_gen 7
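
A sketch of how such a flags string can be produced from a table, with the
leftover bits reported as UNKNOWN. The table, the bit positions and the
function name are assumptions for illustration and are not taken from
print-tree.c:

```c
#include <stdint.h>
#include <stdio.h>

struct sketch_flag_name {
	uint64_t bit;
	const char *name;
};

/* Assumed bit layout of the qgroup status flags. */
static const struct sketch_flag_name sketch_flags[] = {
	{ 1ULL << 0, "ON" },
	{ 1ULL << 1, "RESCAN" },
	{ 1ULL << 2, "INCONSISTENT" },
	{ 1ULL << 3, "SIMPLE_MODE" },
};

/* Write "A|B|UNKNOWN(0x...)" into buf, truncating if it does not fit. */
static void sketch_format_flags(uint64_t flags, char *buf, size_t size)
{
	const size_t nr = sizeof(sketch_flags) / sizeof(sketch_flags[0]);
	uint64_t known = 0;
	size_t used = 0;

	if (size == 0)
		return;
	buf[0] = '\0';
	for (size_t i = 0; i < nr; i++) {
		/* Collect every known bit, set or not, for the final mask. */
		known |= sketch_flags[i].bit;
		if (!(flags & sketch_flags[i].bit))
			continue;
		used += (size_t)snprintf(buf + used, size - used, "%s%s",
					 used ? "|" : "", sketch_flags[i].name);
		if (used >= size)
			return;	/* output was truncated */
	}
	if (flags & ~known)
		snprintf(buf + used, size - used, "%sUNKNOWN(0x%llx)",
			 used ? "|" : "",
			 (unsigned long long)(flags & ~known));
}
```

With this table, the value 0xf09 formats as ON|SIMPLE_MODE|UNKNOWN(0xf00),
matching the hand crafted example.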

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-05-10 15:20:16 +02:00
David Sterba 7f396f5ced btrfs-progs: reorder key initializations
Use the objectid, type, offset natural order as it's more readable and
we're used to read keys like that.

Signed-off-by: David Sterba <dsterba@suse.com>
2024-04-30 21:49:15 +02:00
David Sterba 12bdb72d50 btrfs-progs: print-tree: fix deref before check in btrfs_print_tree()
Reported by 'gcc -fanalyzer':
kernel-shared/print-tree.c:1745:12: warning: check of ‘eb’ for NULL after already dereferencing it [-Wanalyzer-deref-before-check]

The fs_info is initialized before we check 'eb' but we always get a
valid one so no need to validate it.

Signed-off-by: David Sterba <dsterba@suse.com>
2024-04-18 19:16:15 +02:00
David Sterba 844caf8639 btrfs-progs: fix double free on error in read_raid56()
Reported by 'gcc -fanalyzer':
kernel-shared/extent_io.c: In function ‘read_raid56’:
./include/kerncompat.h:393:18: warning: dereference of NULL ‘pointers’ [CWE-476] [-Wanalyzer-null-dereference]

After allocation of the pointers array fails it's dereferenced in the
exit block. We can return immediately instead.

Signed-off-by: David Sterba <dsterba@suse.com>
2024-04-18 19:16:15 +02:00
Boris Burkov 682f676eb3 btrfs-progs: enable send v3 correctly (use EXPERIMENTAL instead of CONFIG_BTRFS_DEBUG)
The send v3 protocol is enabled in kernel by a different config option
than in btrfs-progs to actually work. Now v3 can be tested when
configured and built with --enable-experimental.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-18 23:19:52 +01:00
David Sterba 2edd439617 btrfs-progs: use unsigned types for bit shifts
Bit shifts should be done on unsigned type as a matter of good practice
to avoid any problems with bit overflowing to the sign bit.
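
As a small illustration (the macro name is made up for this example): with
a 32-bit int, 1 << 31 shifts into the sign bit, which is undefined
behaviour in C, while the unsigned form is well defined for every bit
position of the type:

```c
#include <stdint.h>

/* Shift an unsigned 64-bit one, so no bit ever lands in a sign bit. */
#define SKETCH_BIT(n)	(UINT64_C(1) << (n))
```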

Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-12 22:05:09 +01:00
David Sterba 1c551e22cf btrfs-progs: make all parameters of rb_tree search/insert const
Tree comparators never change parameters, make them all const and also
change the rb-tree prototypes.

Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-12 21:43:54 +01:00
David Sterba a80d717db2 btrfs-progs: minor source sync with kernel 6.8
Sync a few more files on the source level with kernel 6.8.

- type cleanups
- defines and enums
- comments
- parameter updates
- error handling

Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-12 21:22:56 +01:00
David Sterba bec6bc8eee btrfs-progs: minor source sync with kernel 6.8-rc3
Sync a few more files on the source level with kernel 6.8-rc3, no
functional changes.

Signed-off-by: David Sterba <dsterba@suse.com>
2024-02-08 09:30:16 +01:00
Qu Wenruo e54514aaea btrfs-progs: fix stray fd close in open_ctree_fs_info()
[BUG]
Although commit b2a1be83b8 ("btrfs-progs: mkfs: keep file descriptors
open during whole time") is making sure we're only closing the writeable
fds after the fs is properly created, there is still a missing fd not
following the requirement.

And this explains the issue why sometimes after mkfs.btrfs, lsblk still
doesn't give a valid uuid.

Shown by the strace output (the command is "mkfs.btrfs -f
/dev/test/scratch1"):

  openat(AT_FDCWD, "/dev/test/scratch1", O_RDWR) = 5 <<< Writeable open
  fadvise64(5, 0, 0, POSIX_FADV_DONTNEED) = 0
  sysinfo({uptime=2529, loads=[8704, 6272, 2496], totalram=4104548352, freeram=3376611328, sharedram=9211904, bufferram=43016192, totalswap=3221221376, freeswap=3221221376, procs=190, totalhigh=0, freehigh=0, mem_unit=1}) = 0
  lseek(5, 0, SEEK_END)                   = 10737418240
  lseek(5, 0, SEEK_SET)                   = 0
  ......
  close(5)                                = 0 <<< Closed now
  pwrite64(6, "O\250\22\261\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 1163264) = 16384
  pwrite64(6, "\201\316\272\342\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 1179648) = 16384
  pwrite64(6, "K}S\t\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 1196032) = 16384
  pwrite64(6, "\207j$\265\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 1212416) = 16384
  pwrite64(6, "q\267;\336\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 5242880) = 16384
  fsync(6) <<< But we're still writing into the disk.

[CAUSE]
After more digging, it turns out we have a very obvious escape in
open_ctree_fs_info():

	open_ctree_fs_info()
	|- fp = open(oca->filename, flags);
	|- info = __open_ctree_fd();
	|- close(fp);

As later we only do IO using the device fd, this close() seems fine.

But the truth is, for mkfs usage, this fs_info is a temporary one, with
a special magic number for the disk.  And since mkfs is doing writeable
operations, this close() would immediately trigger udev scan.

And since at this stage, the fs is not yet fully created, udev can race
with mkfs, and may get the invalid temporary superblock.

[FIX]
Introduce a new btrfs_fs_info member, initial_fd, for
open_ctree_fs_info() to record the fd.

And on close_ctree(), if we find fs_info::initial_fd is a valid fd, then
close it.

By this, we make sure all writeable fds are only closed after we have
written valid super blocks into the disk.
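
The idea can be sketched as below. Only the member name initial_fd comes
from this message, the struct and function names are hypothetical stand-ins
for btrfs_fs_info, open_ctree_fs_info() and close_ctree():

```c
#include <fcntl.h>
#include <unistd.h>

/* Stand-in for the part of btrfs_fs_info that matters here. */
struct sketch_fs_info {
	int initial_fd;	/* -1 when no fd was recorded */
};

/* Open the device and record the fd instead of closing it right away. */
static int sketch_open(struct sketch_fs_info *info, const char *path,
		       int flags)
{
	int fd = open(path, flags);

	if (fd < 0)
		return -1;
	info->initial_fd = fd;
	return 0;
}

/* Close the recorded fd only at teardown, after all writes are done. */
static void sketch_close(struct sketch_fs_info *info)
{
	if (info->initial_fd >= 0) {
		close(info->initial_fd);
		info->initial_fd = -1;
	}
}
```

Keeping the fd open until teardown means the close that triggers the udev
scan happens only once the final super blocks are on disk.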

Issue: #734
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-02-08 08:30:37 +01:00
Qu Wenruo 389c959d6d btrfs-progs: implement arg_strtou64_with_suffix() with a new helper
This patch introduces a new parser helper, parse_u64_with_suffix(),
which has a better error handling, following all the parse_*()
helpers to return non-zero value for errors.

This new helper is going to replace parse_size_from_string(), which
would directly call exit(1) to stop the whole program.

Furthermore most callers of parse_size_from_string() are expecting
exit(1) for error, so that they can skip the error handling.

For those call sites, introduce a wrapper, arg_strtou64_with_suffix(),
to do that.  The only disadvantage is a little less detailed error
report for why the parse failed, but for most cases the generic error
string should be enough.
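
The split between a parser that reports errors and a wrapper that exits
can be sketched as follows. The function names, the accepted suffix set
(k, m, g, t) and the error text are assumptions and not the real helpers:

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical parser: returns 0 on success, a negative errno on failure. */
static int sketch_parse_u64_with_suffix(const char *str, uint64_t *result)
{
	unsigned long long value;
	unsigned int shift = 0;
	char *end;

	/* Reject empty input, signs and leading whitespace up front. */
	if (!str || !isdigit((unsigned char)str[0]))
		return -EINVAL;

	errno = 0;
	value = strtoull(str, &end, 10);
	if (errno == ERANGE)
		return -ERANGE;

	switch (tolower((unsigned char)*end)) {
	case '\0':
		break;
	case 'k':
		shift = 10;
		break;
	case 'm':
		shift = 20;
		break;
	case 'g':
		shift = 30;
		break;
	case 't':
		shift = 40;
		break;
	default:
		return -EINVAL;
	}
	/* Nothing may follow the single suffix letter. */
	if (*end != '\0' && end[1] != '\0')
		return -EINVAL;
	/* Refuse values that would overflow once the suffix is applied. */
	if (value > (UINT64_MAX >> shift))
		return -ERANGE;

	*result = value << shift;
	return 0;
}

/* Hypothetical wrapper for call sites that just want to exit on error. */
static uint64_t sketch_arg_strtou64_with_suffix(const char *str)
{
	uint64_t value;

	if (sketch_parse_u64_with_suffix(str, &value)) {
		fprintf(stderr, "ERROR: invalid size: %s\n",
			str ? str : "(null)");
		exit(1);
	}
	return value;
}
```

Callers that need their own error handling use the parser directly, the
rest keep the one-line call through the wrapper.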

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-01-18 02:14:23 +01:00
Qu Wenruo 94ace90508 btrfs-progs: tree-checker: dump the tree block when hitting an error
Unlike kernel where tree-checker would provide enough info so later we
can use "btrfs inspect dump-tree" to catch the offending tree block, in
progs we may not even have a btrfs to start "btrfs inspect dump-tree",
e.g. during btrfs-convert.

To make later debugging easier, let's call btrfs_print_tree() for every
error we hit inside tree-checker.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-01-12 16:34:44 +01:00
Boris Burkov d7c6baf82f btrfs-progs: make OWNER_REF_KEY type value smallest among inline refs
Companion patch to progs for the same change in the kernel. Inline refs
are expected to have non-decreasing type value but owner ref violated
this and got away with it via special parsing. Fix the inconsistency
while it is still experimental.

Link: https://lore.kernel.org/linux-btrfs/20231103134547.GA3548732@perftesting/T/#mca2c0e21ecb7a0da616dd09980b9f008c3c00f63
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-11-09 15:24:46 +01:00
Sergei Trofimovich 8a687cf954 kernel-shared: uapi: fix BTRFS_IOC_SCAN_DEV definition
Without the change `BTRFS_IOC_SCAN_DEV` aliased with `BTRFS_IOC_FORGET_DEV`.
It's a regression introduced in fcd9142b6 "btrfs-progs: docs: formatting,
fixups, updates".

It manifests as a sudden device disappearance when device is scanned:

    machine # [    4.095032] Btrfs loaded, crc32c=crc32c-intel, zoned=no, fsverity=no
    machine # ERROR: device scan failed on '/dev/vdb': No such file or directory
    machine # ERROR: device scan failed on '/dev/vdc': No such file or directory
    (finished: must succeed: mkfs.btrfs -d raid0 /dev/vdb /dev/vdc, in 10.31 seconds)

Issue: #704
Pull-request: #706
Reported-by: Atemu <atemu.main@gmail.com>
Bug: https://github.com/NixOS/nixpkgs/issues/265668
Author: Sergei Trofimovich <slyich@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-11-05 23:10:28 +01:00
David Sterba fcd9142b67 btrfs-progs: docs: formatting, fixups, updates
- update Status page
- new features in 6.7
- more ioctls
- CSS fix to wrap long lines in tables

[ci skip]

Signed-off-by: David Sterba <dsterba@suse.com>
2023-11-03 18:04:37 +01:00
David Sterba d739e3b73a btrfs-progs: kernel-shared: use kmalloc and kfree
All the code in kernel-shared should use the proper memory allocation
helpers.

Signed-off-by: David Sterba <dsterba@suse.com>
2023-11-03 18:04:37 +01:00
David Sterba b4f43d72ff btrfs-progs: mkfs: support parametric zone size
In experimental build, read global '--param zone-size=SIZE' and use it
as emulated zone size.  This is for testing only, will be promoted to a
proper option in the future.

Signed-off-by: David Sterba <dsterba@suse.com>
2023-11-03 18:04:37 +01:00
Qu Wenruo ad8a831a74 btrfs-progs: dump-tree: output the sequence number for inline references
Commit 6cf11f3e38 ("btrfs-progs: check: check order of inline extent
refs") fixes a problem that btrfs check never properly verified the
sequence of inline references.

It's not obvious because by default kernel handles EXTENT_DATA_REF_KEY
using its own hash, resulting in some seemingly out-of-order output:

	item 0 key (13631488 EXTENT_ITEM 4096) itemoff 16143 itemsize 140
		refs 4 gen 7 flags DATA
		extent data backref root FS_TREE objectid 258 offset 0 count 1
		extent data backref root FS_TREE objectid 257 offset 0 count 1
		extent data backref root FS_TREE objectid 260 offset 0 count 1
		extent data backref root FS_TREE objectid 259 offset 0 count 1

At a quick glance, no one can tell whether the above inline backref items
are in any order.

To make such sequence more obvious, let dump-tree output a new prefix
to indicate the type and the internal sequence number.

For the above case, the new output would look like this:

        item 0 key (13631488 EXTENT_ITEM 4096) itemoff 16143 itemsize 140
                refs 4 gen 7 flags DATA
                (178 0xdfb591fbbf5f519) extent data backref root FS_TREE objectid 258 offset 0 count 1
                (178 0xdfb591fa80d95ea) extent data backref root FS_TREE objectid 257 offset 0 count 1
                (178 0xdfb591f9c0534ff) extent data backref root FS_TREE objectid 260 offset 0 count 1
                (178 0xdfb591f49f9f8e7) extent data backref root FS_TREE objectid 259 offset 0 count 1

Although still not that obvious, it should show the inline data backrefs
have descending sequence numbers.

For the type part, it's counter-intuitively in ascending order, which is
not that easy to produce.
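
The ordering rule can be expressed as a small check, sketched here with
made-up names (this is not the btrfs check code): types must not decrease,
and within one type the sequence number must not increase.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The two values printed in the new "(type seq)" prefix. */
struct sketch_inline_ref {
	uint8_t type;
	uint64_t seq;
};

/* Types ascend, and within the same type the sequence descends. */
static bool sketch_refs_ordered(const struct sketch_inline_ref *refs,
				size_t nr)
{
	for (size_t i = 1; i < nr; i++) {
		if (refs[i].type < refs[i - 1].type)
			return false;
		if (refs[i].type == refs[i - 1].type &&
		    refs[i].seq > refs[i - 1].seq)
			return false;
	}
	return true;
}
```

Fed with the four (178, hash) pairs from the example output, the check
passes even though the objectids 258, 257, 260, 259 look unordered.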

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-23 15:52:12 +02:00
Naohiro Aota 8816a65fec btrfs-progs: zoned: check SB zone existence properly
Currently, write_dev_supers() compares the superblock location vs the size
of the device to check if it can write the superblock. This is not correct
for a zoned device, whose superblock location is different than a regular
device.

Introduce check_sb_location() to check if the superblock zone exists for
the zoned case.

Running btrfs check can fail on a certain zoned device setup (e.g,
zone size = 128MB, device size = 16GB).

From generic/330:

  yes | btrfs check --repair --force /dev/nullb1
  [1/7] checking root items
  Fixed 0 roots.
  [2/7] checking extents
  ERROR: zoned: failed to read zone info of 4096 and 4097: Invalid argument
  ERROR: failed to write super block for devid 1: write error: Input/output error
  failed to write new super block err -5
  failed to repair damaged filesystem, aborting

This happens because write_dev_supers() is comparing the original
superblock location vs the device size to check if it can write out a
superblock copy or not.

For the above example, since the first copy location (64MB) < device size
(16GB), it tries to write out the copy. But, the copy must be written into
zone 4096 (512G / zone size (128M) = 4096), which is out of the device.
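
The arithmetic can be sketched as follows. The function name is made up,
and the assumption that a superblock log occupies two consecutive zones is
taken from the error above mentioning zones 4096 and 4097:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Check that the zones backing a superblock copy exist on the device.
 * sb_log_offset is the byte offset where the copy's log zones start.
 */
static bool sketch_sb_zones_exist(uint64_t sb_log_offset, uint64_t zone_size,
				  uint64_t device_size)
{
	uint64_t first_zone, nr_zones;

	if (zone_size == 0)
		return false;
	first_zone = sb_log_offset / zone_size;
	nr_zones = device_size / zone_size;
	/* Both zones of the log must be inside the device. */
	return first_zone + 1 < nr_zones;
}
```

For a 16G device with 128M zones there are only 128 zones, so the copy
that needs zone 4096 is skipped instead of failing with an I/O error.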

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-21 15:51:06 +02:00
Naohiro Aota 58148d5209 btrfs-progs: zoned: introduce sb_bytenr_to_sb_zone()
Introduce sb_bytenr_to_sb_zone(), which converts the original superblock
location to the zone number of superblock log writing.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-17 19:34:00 +02:00
David Sterba b421fdff95 btrfs-progs: move raid-stripe-tree and squota build out of experimental
The kernel patches for RST and squota are queued for 6.7, we need to be
able to test the features so it's not necessary to hide the mkfs support
under experimental build. The kernel may still need debug build to
enable mount.

Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-17 19:33:59 +02:00
Qu Wenruo e79f18a4a7 btrfs-progs: introduce a basic metadata free space reservation check
Unlike kernel, in btrfs-progs btrfs_start_transaction() never checks if
there is enough metadata space.

This can lead to a very dangerous situation where there is no metadata
space left at all, deadlocking future tree operations.

This patch introduces a very basic version of metadata/system free space
check by:

- Check if there is enough metadata/system space left
  If there is enough, go as usual.

- If there is not enough space left, try allocating a new chunk

- Recheck if the new space can meet our demand
  If not, return ERR_PTR(-ENOSPC).
  Otherwise, allocate a new trans handle to the caller.
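
The three steps above can be sketched with a toy space model. The struct,
the chunk allocator and the int return value are simplifications invented
for this example; the real code works on space infos and returns a
transaction handle or an error pointer:

```c
#include <errno.h>
#include <stdint.h>

/* Toy model of one space info (metadata or system). */
struct sketch_space_info {
	uint64_t total;		/* bytes in allocated chunks */
	uint64_t used;		/* bytes already used */
	uint64_t unallocated;	/* device space not yet in any chunk */
};

/* Grow the space info by one chunk if the device has room for it. */
static int sketch_try_alloc_chunk(struct sketch_space_info *s,
				  uint64_t chunk_size)
{
	if (s->unallocated < chunk_size)
		return -ENOSPC;
	s->unallocated -= chunk_size;
	s->total += chunk_size;
	return 0;
}

/* Check, try a new chunk, recheck. Returns 0 or -ENOSPC. */
static int sketch_reserve_metadata(struct sketch_space_info *s,
				   uint64_t need, uint64_t chunk_size)
{
	if (s->total - s->used >= need)
		return 0;
	/* A failed allocation is caught by the recheck below. */
	(void)sketch_try_alloc_chunk(s, chunk_size);
	if (s->total - s->used >= need)
		return 0;
	return -ENOSPC;
}
```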

This is possible thanks to the simplified transaction model in
btrfs-progs:

- We don't allow joining a transaction
  This means we don't need to handle complex cases like data ordered
  extents, which need to reserve space first, then join the current
  transaction and use the reserved blocks.

- We don't allow multiple transaction handles for one transaction
  Since btrfs-progs is single threaded, we always start a transaction
  and then commit it.

However there is a feature that must be an exception for the new
metadata/system free space check:

- btrfs check --init-extent-tree
  All the meta/system free space checks are based on the space info,
  which is loaded from block group items.
  When rebuilding the extent tree we can no longer have an accurate
  view, so we have to disable the feature for the whole execution if
  we're rebuilding the extent tree.

For now, there is no regression exposed during the self tests, but I
really hope this can be an extra safety net to prevent causing ENOSPC
deadlock in btrfs-progs.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-17 19:33:59 +02:00
Qu Wenruo 930c6362d1 btrfs-progs: fix all variable shadowing
There is quite a lot of variable shadowing in btrfs-progs, most of it
just reusing some common names like tmp.
Those cases are quite safe and the shadowed ones are even of a different
type.

But there are some exceptions:

- @end in traverse_tree_blocks()
  There is already an @end with the same type, but a different meaning
  (the end of the current extent buffer passed in).
  Just rename it to @child_end.

- @start in generate_new_data_csums_range()
  Just rename it to @csum_start.

- @size of fixup_chunk_tree_block()
  This one is particularly bad, we declare a local @size and initialize
  it to -1, then before we really utilize the variable @size, we
  immediately reset it to 0, then pass it to logical_to_physical().
  Then there is a location to check if @size is -1, which will always be
  true.

  According to the code in logical_to_physical(), @size would be clamped
  down by its original value, thus our local @size will always be 0.

  This patch would rename the local @size to @found_size, and only set
  it to -1.
  The call site is only to pass something as logical_to_physical()
  requires a non-NULL pointer.
  We don't really need to bother the returned value.

- duplicated @ref declaration in run_delayed_tree_ref()
- duplicated @super_flags in change_meta_csums()
  Just delete the duplicated one.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-10 19:16:29 +02:00
Qu Wenruo c788977878 btrfs-progs: pull in the full max/min/clamp implementation from kernel
The current implementation would introduce variable shadowing because
both max() and min() use the same __x and __y.

This may not be a big deal, but since kernel is already handling it
properly using __UNIQUE_ID() macro, and has more checks, we can
cross-port the kernel version to btrfs-progs.

There are some dependencies needed, they are all small enough and thus
can be put into the helper.

- __PASTE()
- __UNIQUE_ID()
- BUILD_BUG_ON_ZERO()
- __is_constexpr()

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-10 19:16:29 +02:00
Johannes Thumshirn dfc866bfef btrfs-progs: remove stride length from on-disk format
The stride length has been removed from kernel code, remove it here as
well.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-06 17:26:56 +02:00
Johannes Thumshirn 4acadd1d42 btrfs-progs: remove stride length from tree dump
The length has been removed from kernel, remove it here as well.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-06 17:26:54 +02:00
Josef Bacik 2a34fd3892 btrfs-progs: cleanup dirty buffers on transaction abort
When adding the extent buffer leak detection I started getting failures
on some of the fuzz tests.  This is because we don't clean up dirty
buffers for aborted transactions, we just leave them dirty and thus we
leak them.  Fix this up by making btrfs_commit_transaction() on an
aborted transaction properly cleanup the dirty buffers that exist in the
system.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-03 01:11:57 +02:00
David Sterba 21aa6777b2 btrfs-progs: clean up includes, using include-what-you-use
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-03 01:11:57 +02:00
David Sterba d4cf2a3b4c btrfs-progs: kernel-shared: sync delayed-refs.[ch]
Update parts of struct btrfs_delayed_ref_head and update where used,
add more prototypes. More still needs to be synced.

Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-03 01:11:57 +02:00
Josef Bacik 47dc6bd8a9 btrfs-progs: update btrfs_split_item to match the in-kernel definition
In the kernel new_key is const, update the definition in btrfs-progs to
match the in-kernel definition.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-03 01:11:57 +02:00
Josef Bacik 692b68110e btrfs-progs: inline btrfs_name_hash and btrfs_extref_hash
This is the opposite of what we do in the kernel, however in the kernel
we put the helpers in dir-item.h and inode-item.h respectively.  Those
do not exist in btrfs-progs right now, so instead of doing all that work
right now simply inline them in ctree.h to make it easier to sync
ctree.c from the kernel.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-03 01:11:57 +02:00