Even with "mkfs.btrfs -b", mkfs.btrfs resets all the zones on the device.
Limit the zone reset to the range within the specified length.
Also, we need to check that there is no active zone outside of the FS
range. Having an active zone outside the FS reduces the number of zones btrfs
can write simultaneously. Technically, we could still scan all the device
zones, keep active zones outside the FS intact, and try to live with the
limited active zones. But that would make btrfs operations harder.
It is generally a bad idea to use "-b" for non-test usage on a device with
an active zone limit in the first place. You really need to take care that
the FS and the area outside the FS together do not go over the limit. That
means you'll never be able to use zones outside the FS anyway.
So, until there is a strong request for that, I don't think it's worthwhile
to do so.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
block_count and dev_block_count count the size in bytes, so comparing
them with e.g. "min_dev_size" is confusing. Rename them to represent
the unit better.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There's a report:
ERROR: Failed to send/receive subvolume: .../testbackup.20240330T1102 -> .../testbackup.20240330T1102
ERROR: ... Command execution failed (exitcode=1)
ERROR: ... sh: btrfs send '.../testbackup.20240330T1102' | ssh user@host.lan 'sudo -n btrfs receive '\''...'\'''
ERROR: ... invalid tlv in cmd tlv_type = 816
This is send/receive between arm64 and armv5el hosts, with btrfs-progs
6.2.1. The last known working version is 5.16. This looked like another
custom protocol extension by NAS vendors, but that was a false lead:
this is indeed a bug in stream parsing after the changes to the v2
protocol.
The most likely explanation is that the armv5 host requires strict
alignment for reads (32bit type must be 4 byte aligned) but the way the
raw data buffer is mapped to the cmd structure in read_cmd() does not
guarantee that.
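
One common way to avoid this class of problem is to decode the header
fields from the raw buffer byte by byte instead of casting the buffer
pointer to a structure. The following is only a sketch of that technique,
not the actual fix: the structure layout and the names sketch_cmd_header,
get_le16(), get_le32() and read_header_unaligned() are hypothetical, and
little-endian encoding of the fields is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical header layout used only for this illustration:
 * 32-bit length, 16-bit command, 32-bit checksum, little-endian,
 * stored back to back in the stream without padding.
 */
struct sketch_cmd_header {
	uint32_t len;
	uint16_t cmd;
	uint32_t crc;
};

/* Assemble a little-endian 32-bit value from single byte loads. */
static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Assemble a little-endian 16-bit value from single byte loads. */
static uint16_t get_le16(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Decode the header from 'buf', which may start at any address.
 * Only byte-sized loads touch the raw buffer, so no alignment
 * requirement of the host CPU can be violated.
 */
static void read_header_unaligned(const unsigned char *buf,
				  struct sketch_cmd_header *hdr)
{
	hdr->len = get_le32(buf);
	hdr->cmd = get_le16(buf + 4);
	hdr->crc = get_le32(buf + 6);
}
```

The decoded copy lives in a properly aligned local structure, so the rest
of the parser can access its members directly.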
Issue: #770
Fixes: aa1ca3789e ("btrfs-progs: receive: support v2 send stream DATA tlv format")
Signed-off-by: David Sterba <dsterba@suse.com>
Use the natural order objectid, type, offset as it's more readable and
we're used to reading keys like that.
Signed-off-by: David Sterba <dsterba@suse.com>
This is a followup to 884a609a77 ("btrfs-progs: add basename
wrappers for unified semantics"). Test cli/019-subvolume-create-parents
fails as there are paths with trailing slashes.
With the GNU semantics basename(3) does not change its argument, but this
is problematic for paths with trailing slashes. Such paths are not
uncommon and could potentially break things.
To minimize the impact of the basename behaviour depending on whether
libgen.h is included, use the single wrapper in path utils, which has to
include libgen.h anyway for dirname. Our code passes writable buffers to
basename.
Issue: #778
Signed-off-by: David Sterba <dsterba@suse.com>
What basename(3) does with the argument depends on _GNU_SOURCE and
inclusion of libgen.h. This is problematic on Musl (1.2.5) as reported.
We want the GNU semantics, which do not modify the argument. A common way
to make this portable is to add our own helper. This is now implemented
in path_basename(), which does not use the libc-provided basename but
preserves the semantics. The path_dirname() helper is just for parity and
is otherwise the same as dirname().
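
For illustration, a helper with the GNU semantics can be as small as the
sketch below. This is not the actual path_basename() implementation; the
name sketch_basename() is hypothetical.

```c
#include <string.h>

/*
 * Hypothetical helper: return a pointer to the part of 'path' after
 * the last slash, or 'path' itself when there is no slash. The
 * argument is never modified, matching the GNU basename semantics.
 */
static const char *sketch_basename(const char *path)
{
	const char *slash = strrchr(path, '/');

	return slash ? slash + 1 : path;
}
```

Note that with these semantics a path ending in a slash, such as "dir/",
yields an empty string.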
Sources:
- https://bugs.gentoo.org/926288
- https://git.musl-libc.org/cgit/musl/commit/?id=725e17ed6dff4d0cd22487bb64470881e86a92e7
Issue: #778
Signed-off-by: David Sterba <dsterba@suse.com>
Reported by 'gcc -fanalyzer':
common/utils.c:1203:9: warning: use of uninitialized value ‘data’ [CWE-457] [-Wanalyzer-use-of-uninitialized-value]
There are several return parameters passed to
btrfs_get_string_for_multiple_profiles(). If it fails early, no values
are assigned, so free() would be called on an uninitialized stack value.
Initialize all the pointers.
Signed-off-by: David Sterba <dsterba@suse.com>
Reported by 'gcc -fanalyzer':
common/format-output.c:168:1: warning: missing call to ‘va_end’ [-Wanalyzer-va-list-leak]
There's a temporary va_list used in fmt_set_unquoted() but va_copy() must
be paired with va_end(), which is missing.
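
The pairing rule can be shown with a small, self-contained sketch. The
helpers sketch_vformat() and sketch_format() are hypothetical and are not
taken from common/format-output.c.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: format into a newly allocated string. The
 * argument list is walked twice, so the first pass uses a copy.
 */
static char *sketch_vformat(const char *fmt, va_list args)
{
	va_list tmp;
	char *buf;
	int len;

	/* First pass on a copy, only to measure the output length. */
	va_copy(tmp, args);
	len = vsnprintf(NULL, 0, fmt, tmp);
	va_end(tmp);	/* every va_copy() needs a matching va_end() */
	if (len < 0)
		return NULL;

	buf = malloc((size_t)len + 1);
	if (!buf)
		return NULL;
	/* Second pass consumes the caller's list. */
	vsnprintf(buf, (size_t)len + 1, fmt, args);
	return buf;
}

/* Variadic front end; va_start() is likewise paired with va_end(). */
static char *sketch_format(const char *fmt, ...)
{
	va_list args;
	char *ret;

	va_start(args, fmt);
	ret = sketch_vformat(fmt, args);
	va_end(args);
	return ret;
}
```

The caller owns the returned string and has to free() it.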
Signed-off-by: David Sterba <dsterba@suse.com>
Reported by 'gcc -fanalyzer':
common/path-utils.c:401:16: warning: use of possibly-NULL ‘curr_dir’ where non-null expected [CWE-690] [-Wanalyzer-possible-null-argument]
There's an unhandled strdup() call in path_is_in_dir() so tmp could be
potentially NULL and passed down in the function. This is in the path
utilities so we assume the buffer is a path and can use the safe copy.
Signed-off-by: David Sterba <dsterba@suse.com>
Reported by 'gcc -fanalyzer':
common/string-table.c:62:17: warning: leak of ‘msg’ [CWE-401] [-Wanalyzer-malloc-leak]
The 'msg' is still allocated when returning from the function due to an
error, free it.
Signed-off-by: David Sterba <dsterba@suse.com>
Reported by 'gcc -fanalyzer':
common/device-scan.c:222:20: warning: dereference of NULL ‘device’ [CWE-476] [-Wanalyzer-null-dereference]
If the allocation of device fails then we can't free device->zone_info
at the out label. To fix that, return immediately, as the allocation is
at the beginning of the function.
Signed-off-by: David Sterba <dsterba@suse.com>
Add a template for read/write error messages and use it for the write of
the superblock when adding a device. sbwrite() is a wrapper around write
that makes sure the zoned devices are accessed correctly.
Signed-off-by: David Sterba <dsterba@suse.com>
Use a local copy of the search header for proper aligned access instead
of the unaligned helpers, move the definitions to the closest scope.
Signed-off-by: David Sterba <dsterba@suse.com>
Use tree search ioctl wrappers for code that is considered internal, i.e.
leaving out libbtrfs (legacy) and libbtrfsutil (needs its own API for
that). The conversion is a mostly direct use of what the API provides.
Signed-off-by: David Sterba <dsterba@suse.com>
For unclear reasons, using the v2 ioctl leads to an infinite loop in
load_chunk_info() in 'btrfs fi usage' when only one valid item is
returned. This can be reproduced by mkfs-tests/001.
Debugging shows that the buffer is all zeros from the second item on,
although nr_items=4 is returned. Switching the same code to use v1 makes
it work again. This is puzzling as it's the same code in the kernel.
We want to make the switch eventually so only disable the detection so
other code can use the new API.
Signed-off-by: David Sterba <dsterba@suse.com>
Add wrappers around v1 and v2 of TREE_SEARCH ioctl so it can be
transparently used by code. The structures partially overlap but due to
the buffer size the v2 is offset and also needs a filler to expand the
flexible buffer.
Usage:
- define struct btrfs_tree_search_args, all zeros
- btrfs_tree_search_sk() reads offset of the search key within the
structures
- btrfs_tree_search_ioctl() detects support and calls the highest
supported ioctl version, v2 has been supported since 3.14 but we want
to keep backward compatibility
- btrfs_tree_search_data() reads data from the buffer previously filled
by the ioctl, a sequence of (search header, data)
Signed-off-by: David Sterba <dsterba@suse.com>
The buffer size check is needed and has already caught problems when
adding the raid-stripe-tree, so improve the error reporting.
Signed-off-by: David Sterba <dsterba@suse.com>
Add new error message template and use it to report invalid range
overlaps and do proper error handling.
Signed-off-by: David Sterba <dsterba@suse.com>
Use a more descriptive name, the interface is generic so it should use
the generic term for file/directory.
Signed-off-by: David Sterba <dsterba@suse.com>
There are many places that pass false as verbosity argument and then
print an error message, or don't print any message in error cases.
Use btrfs_open_file_or_dir_fd() that will be verbose in case of an error
with the same semantics.
Signed-off-by: David Sterba <dsterba@suse.com>
Returning -errno values where possible is common elsewhere in the code,
do that for the open helpers too.
Signed-off-by: David Sterba <dsterba@suse.com>
For historical reasons the helpers [btrfs_]open_dir... also return
the 'DIR *dirstream' value when a directory is opened.
However, this is never used. So avoid calling opendir() and return
only the fd.
Replace open_file_or_dir() with btrfs_open_fd2() removing any reference
to the unused/useless dirstream variables. btrfs_open_fd2() is required
to avoid spurious error messages.
Signed-off-by: Goffredo Baroncelli <kreijack@libero.it>
Signed-off-by: David Sterba <dsterba@suse.com>
For historical reasons the helpers [btrfs_]open_dir... also return
the 'DIR *dirstream' value when a directory is opened.
However, this is never used. So avoid calling opendir() and return
only the fd.
Replace the last btrfs_open_dir() call with btrfs_open_dir_fd()
removing any reference to the unused/useless dirstream variables.
Also update the add_seen_fsid() function removing any reference to dir
stream (again this is never used).
Signed-off-by: Goffredo Baroncelli <kreijack@libero.it>
Signed-off-by: David Sterba <dsterba@suse.com>
For historical reasons the helpers [btrfs_]open_dir... also return
the 'DIR *dirstream' value when a directory is opened.
However, this is never used. So avoid calling opendir() and return only
the fd. This is a preparatory patch.
Signed-off-by: Goffredo Baroncelli <kreijack@libero.it>
Signed-off-by: David Sterba <dsterba@suse.com>
There's a report that 'btrfs balance start --enqueue' does not properly
wait when there are multiple instances started. The command does a busy
wait instead of timeouts.
Strace output:
0.000006 pselect6(5, NULL, NULL, [4], {tv_sec=60, tv_nsec=0}, NULL) = 1 (except [4], left {tv_sec=59, tv_nsec=999999716})
0.000008 pselect6(5, NULL, NULL, [4], {tv_sec=29, tv_nsec=999999000}, NULL) = 1 (except [4], left {tv_sec=29, tv_nsec=999998786})
After the first select there's almost the entire time left, the second
one starts right after it.
Polling/selecting sysfs files is possible under some conditions:
- the file descriptor must be reopened before each poll/select
- the whole buffer must be read too
With that in place it now works as expected. The remaining timeout logic
is slightly adjusted to wait at most 10 seconds so the pending jobs do
not wait too long if there's still a lot of time left from the first
select.
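
The two conditions above can be sketched as a helper that is called once
per wait. This is only an illustration of the technique, not the code
used by the command; the name sketch_wait_sysfs_change() and the buffer
size are assumptions.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until the sysfs file at 'path' signals a
 * change or until 'seconds' have elapsed.
 *
 * Returns a positive value on change, 0 on timeout, -errno on error.
 */
static int sketch_wait_sysfs_change(const char *path, int seconds)
{
	struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };
	char buf[4096];
	fd_set fds;
	ssize_t nread;
	int fd;
	int ret;

	/* Condition 1: open a fresh descriptor for every wait. */
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	/* Condition 2: read the whole contents before waiting. */
	do {
		nread = read(fd, buf, sizeof(buf));
	} while (nread > 0);
	if (nread < 0) {
		ret = -errno;
		close(fd);
		return ret;
	}

	/* sysfs reports changes as an exceptional condition. */
	FD_ZERO(&fds);
	FD_SET(fd, &fds);
	ret = select(fd + 1, NULL, NULL, &fds, &tv);
	if (ret < 0)
		ret = -errno;

	close(fd);
	return ret;
}
```

A caller would loop over this helper and re-check the state of the
exclusive operation after each return, since a wakeup does not say which
state the file has changed to.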
Issue: #746
Signed-off-by: David Sterba <dsterba@suse.com>
Be verbose about the potential compatibility problems with the
sectorsize and page size. Also print the page size in the overview.
Signed-off-by: David Sterba <dsterba@suse.com>
The symbol BTRFS_UPDATE_KERNEL seems to be unused since 2f55fd7019
("btrfs-progs: optimize btrfs_scan_lblkid() for multiple calls"), remove
it.
Signed-off-by: David Sterba <dsterba@suse.com>
The raid-stripe-tree can be enabled for convert, though it's still
considered incomplete and slightly experimental. Due to that the tests
need to be adjusted to check for support and possibly skip the mount.
Possible remaining options to add: quota, squota
Issue: #694
Signed-off-by: David Sterba <dsterba@suse.com>
This patch introduces a new parser helper, parse_u64_with_suffix(),
which has better error handling and, like all the parse_*() helpers,
returns a non-zero value on error.
This new helper is going to replace parse_size_from_string(), which
directly calls exit(1) to stop the whole program.
Furthermore, most callers of parse_size_from_string() expect exit(1) on
error, so that they can skip the error handling.
For those call sites, introduce a wrapper, arg_strtou64_with_suffix(),
to do that. The only disadvantage is a slightly less detailed error
report about why the parse failed, but for most cases the generic error
string should be enough.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Both functions do the same thing, the only difference is in error
handling: parse_u64() requires callers to handle errors, while
arg_strtou64() calls exit(1).
This patch converts arg_strtou64() to use parse_u64(), and uses the
return value to output different error messages.
This also means the return value of parse_u64() is now more than just
0 or 1: it is -EINVAL for an invalid string (no numeric string at all,
any trailing characters, or a negative value), and -ERANGE for
overflow.
The existing callers are only checking if the return value is 0, thus
not really affected.
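
A minimal sketch of a parser with the return values described above
follows. It is not the actual parse_u64() code: the name
sketch_parse_u64() is hypothetical and base 10 is an assumption made for
the example.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical parser: returns 0 and stores the value on success,
 * -EINVAL for an invalid string, -ERANGE on overflow.
 */
static int sketch_parse_u64(const char *str, uint64_t *result)
{
	char *endptr;
	unsigned long long val;

	/*
	 * strtoull() skips leading whitespace and silently accepts a
	 * minus sign, so negative input has to be rejected explicitly.
	 */
	while (isspace((unsigned char)*str))
		str++;
	if (*str == '-')
		return -EINVAL;

	/* ULLONG_MAX is also a valid result, so errno must start at 0. */
	errno = 0;
	val = strtoull(str, &endptr, 10);
	/* No digits at all, or trailing characters after the number. */
	if (endptr == str || *endptr != '\0')
		return -EINVAL;
	if (val == ULLONG_MAX && errno == ERANGE)
		return -ERANGE;

	*result = val;
	return 0;
}
```

A caller that only tests the return value against 0 behaves the same as
before, while a wrapper that exits on error can pick a message based on
the specific error code.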
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a report about failed btrfs-convert, which shows the following
error:
Create btrfs metadata
corrupt leaf: root=5 block=5001931145216 slot=1 ino=89911763, invalid previous key objectid, have 89911762 expect 89911763
leaf 5001931145216 items 336 free space 7 generation 90 owner FS_TREE
leaf 5001931145216 flags 0x1(WRITTEN) backref revision 1
fs uuid 8b69f018-37c3-4b30-b859-42ccfcbe2449
chunk uuid 448ce78c-ea41-49f6-99dc-46ad80b93da9
item 0 key (89911762 INODE_REF 3858733) itemoff 16222 itemsize 61
index 171 namelen 51 name: [FILENAME1]
item 1 key (89911763 INODE_REF 3858733) itemoff 16161 itemsize 61
index 103 namelen 51 name: [FILENAME2]
[CAUSE]
When iterating a directory, btrfs-convert inserts the DIR_ITEMs, along
with the INODE_REF of each child inode.
This leads to the stray INODE_REFs above and triggers the tree-checker.
This can only happen for a large fs, as in most cases all these modified
tree blocks are cached, so the tree-checker isn't triggered.
But when the tree block cache is not hit and we have to read from disk,
such behavior can lead to the tree-checker error above.
[FIX]
Insert a dummy INODE_ITEM for the INODE_REF first. The inode items are
updated later when iterating the child inodes of the directory.
Issue: #731
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Recently we had a scrub use-after-free caused by an unaligned chunk
length. Although a fix was submitted, we may want to do extra checks
of a chunk's alignment.
This patch adds such a check for the starting bytenr and length of a
chunk, to make sure they are properly aligned to the 64K stripe boundary.
By default, the check only leads to a warning and is not treated as an
error, as we expect the kernel to handle such misalignment without any
problem.
But if the new debug environment variable,
BTRFS_PROGS_DEBUG_STRICT_CHUNK_ALIGNMENT, is specified, then we treat
it as an error, so that we can detect unexpected chunks from
btrfs-progs and fix them before they reach end users.
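
The behaviour described above could look roughly like the following
sketch. The function names, the message format and the -EINVAL return
value are hypothetical; whether the real code looks at the value of the
variable or only at its presence is also an assumption here.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* The 64K stripe boundary mentioned above. */
#define SKETCH_STRIPE_LEN	(64 * 1024ULL)

/* Assumption: the variable only has to be set, its value is ignored. */
static int sketch_strict_alignment(void)
{
	return getenv("BTRFS_PROGS_DEBUG_STRICT_CHUNK_ALIGNMENT") != NULL;
}

/*
 * Check that both the start and the length of a chunk are multiples
 * of the stripe length. A misaligned chunk only produces a warning,
 * unless 'strict' is set, in which case an error is returned.
 */
static int sketch_check_chunk_alignment(uint64_t start, uint64_t length,
					int strict)
{
	if (start % SKETCH_STRIPE_LEN == 0 &&
	    length % SKETCH_STRIPE_LEN == 0)
		return 0;

	fprintf(stderr, "%s: chunk start %llu length %llu not aligned to %llu\n",
		strict ? "ERROR" : "WARNING",
		(unsigned long long)start, (unsigned long long)length,
		SKETCH_STRIPE_LEN);
	return strict ? -EINVAL : 0;
}
```

A caller would pass sketch_strict_alignment() as the last argument, so a
test suite can turn the warning into a failure just by exporting the
variable.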
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
To be consistent with the rest of the code the sysfs helper should
return the -errno instead of passing -1 from various syscalls. Update
callers that relied on -1 as the invalid file descriptor.
Signed-off-by: David Sterba <dsterba@suse.com>
The enqueue option should let the user know that the expected operation
hasn't started yet and that it's waiting for another one. Although the
exclusive operations can take long, the two reasons should be
distinguished.
Signed-off-by: David Sterba <dsterba@suse.com>
strtoull() may legitimately return the boundary values. If callers can
expect that and check errno to verify it, then errno must be reset
before the call.
Signed-off-by: David Sterba <dsterba@suse.com>