[BUG]
There is a report that, for zoned devices btrfstune is unable to convert
it to block group tree.
# btrfstune /dev/nullb0 --convert-to-block-group-tree
Error reading 1342193664, -1
Error reading 1342193664, -1
ERROR: cannot read chunk root
ERROR: open ctree failed
[CAUSE]
For read-write opened zoned devices, all the read/write has to be
aligned to its sector size.
However btrfs stores its metadata by extent_buffer::data[], which has
all the structures before it, thus never aligned to zoned device sector
size.
Normally we would require btrfs_pread() and btrfs_pwrite() to do the
extra alignment, but during open_ctree(), we are not aware if a device
is zoned or not.
Thus we rely on if the fd is opened with O_DIRECT flag, if the fd has
O_DIRECT, then we would temporarily set fs_info->zoned for chunk tree
read.
Unforunately not all open_ctree_fd() callers have the flags set
properly, and btrfstune is one of the missing call site.
This makes all the read not properly aligned and cause read failure.
[FIX]
Just manually check if the target device is a zoned one, and set
O_DIRECT accordingly.
Issue: #765
Pull-request: #767
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a bug report that mkfs.btrfs can not specify block-group-tree
feature along with zoned devices:
# mkfs.btrfs /dev/nullb0 -O block-group-tree,zoned
btrfs-progs v6.7.1
See https://btrfs.readthedocs.io for more information.
Resetting device zones /dev/nullb0 (40 zones) ...
NOTE: several default settings have changed in version 5.15, please make sure
this does not affect your deployments:
- DUP for metadata (-m dup)
- enabled no-holes (-O no-holes)
- enabled free-space-tree (-R free-space-tree)
ERROR: error during mkfs: Invalid argument
[CAUSE]
During mkfs, we need to write all the 7 or 8 tree blocks into the
metadata zone, and since it's zoned device, we need to fulfill all the
requirement for zoned writes, including:
- All writes must be in sequential bytenr
- Buffer must be aligned to sector size
The sequential bytenr requirement is already met by the mkfs design, but
the second requirement on memory alignment is never met for metadata, as
we put the contents of a leaf in extent_buffer::data[], which is after a
lot of small members.
Thus metadata IO buffer would never be aligned to sector size (normally
4K).
And we require btrfs_pwrite() and btrfs_pread() to handle the memory
alignment for us.
However in create_block_group_tree() we didn't use btrfs_pwrite(), but
plain pwrite() call directly, which would lead to -EINVAL error due to
memory alignment problem.
[FIX]
Just call btrfs_pwrite() instead of the plain pwrite() in
create_block_group_tree().
Issue: #765
Pull-request: #767
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is a missing newline for a successful
--convert-from-block-group-tree run, meanwhile
--convert-to-block-group-tree has the correct newline.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It was reported in https://lore.kernel.org/linux-btrfs/e7ce9995-93cb-4904-875c-684d4494765f@web.de/
that compression does not happen on direct io files. This is incorrectly
documented that it works but this is not true. Compression works on
buffered writes and relies on page cache, while direct io avoids that
and takes a different path in code.
[ci skip]
Pull-request: #764
Author: HAN Yuwei <hrx@bupt.moe>
Signed-off-by: David Sterba <dsterba@suse.com>
Use the templated error message for transaction failures, use the same
pattern assigning the ret and errno.
Signed-off-by: David Sterba <dsterba@suse.com>
The send v3 protocol is enabled in kernel by a different config option
than in btrfs-progs to actually work. Now v3 can be tested when
configured and built with --enable-experimental.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Use the templated error message for transaction start failures, use the
same pattern assigning the ret and errno.
Signed-off-by: David Sterba <dsterba@suse.com>
Turn all BUG_ONs to error handling and push it to the caller. The error
conditions are almost certainly corruptions so we can't do anything
about that.
Signed-off-by: David Sterba <dsterba@suse.com>
The error values of enter_shared_node() are mixing int and bool, unify
that to be 1 == true, 0 == false, <0 errors. Update callers to handle
errors.
Inline the add_shared_node() helper as it's trivial and makes handling
errors easier. As all errors can be now returned, do proper error
handling instead of all remaining BUG_ONs.
Signed-off-by: David Sterba <dsterba@suse.com>
Handle the BUG_ONs inside splice_shared_node() and move them to the
callers. As there's a big loop and external tree cache updated there's
not error cleanup done.
Signed-off-by: David Sterba <dsterba@suse.com>
Free the newly allocated structures when 'mod' is requests and insertion
fails. All exit paths from the function now don't leave anything to
clean up.
Signed-off-by: David Sterba <dsterba@suse.com>
Add template for read/write error messages and use it for write of
superblock when adding a device. sbwrite() is wrapper around write that
makes sure the zoned devices are accessed correctly.
Signed-off-by: David Sterba <dsterba@suse.com>
Use a local copy of the search header for proper aligned access instead
of the unaligned helpers, move the definitions to the closest scope.
Signed-off-by: David Sterba <dsterba@suse.com>
Use tree search ioctl wrappers for code that is considered internal, ie.
leaving out libbtrfs (legacy), libbtrfsutil (needs own API for that).
Conversion is mostly direct of what the API provides.
Signed-off-by: David Sterba <dsterba@suse.com>
For unclear reasons using the v2 ioctl leads to an infinite loop in
'btrfs fi usage' in load_chunk_info() when there's only one valid item
returned and then it keeps looping. Can be reproduced by mkfs-tests/001.
After debugging, from second item in the buffer there's all zeros, while
it's returned nr_items=4. Switching the same code to use v1 makes it
work again. It's puzzling as it's the same code in kernel.
We want to make the switch eventually so only disable the detection so
other code can use the new API.
Signed-off-by: David Sterba <dsterba@suse.com>
Add wrappers around v1 and v2 of TREE_SEARCH ioctl so it can be
transparently used by code. The structures partially overlap but due to
the buffer size the v2 is offset and also needs a filler to expand the
flexible buffer.
Usage:
- define struct btrfs_tree_search_args, all zeros
- btrfs_tree_search_sk() reads offset of the search key within the
structures
- btrfs_tree_search_ioctl() detect support and call the highest
supported ioctl version, v2 has been supported since 3.14 but we want
to keep backward compatibility
- btrfs_tree_search_data() read data from the buffer previously filled
by ioctl, a sequence of (search header, data)
Signed-off-by: David Sterba <dsterba@suse.com>
The buffer size check is needed and has already caught problems when
adding the raid-stripe-tree, do a better error reporting.
Signed-off-by: David Sterba <dsterba@suse.com>
Add new error message template and use it to report invalid range
overlaps and do proper error handling.
Signed-off-by: David Sterba <dsterba@suse.com>
Bit shifts should be done on unsigned type as a matter of good practice
to avoid any problems with bit overflowing to the sign bit.
Signed-off-by: David Sterba <dsterba@suse.com>
Bit shifts should be done on unsigned type as a matter of good practice
to avoid any problems with bit overflowing to the sign bit.
Signed-off-by: David Sterba <dsterba@suse.com>
Bit shifts should be done on unsigned type as a matter of good practice
to avoid any problems with bit overflowing to the sign bit.
Signed-off-by: David Sterba <dsterba@suse.com>
Bit shifts should be done on unsigned type as a matter of good practice
to avoid any problems with bit overflowing to the sign bit.
Signed-off-by: David Sterba <dsterba@suse.com>
Bit shifts should be done on unsigned type as a matter of good practice
to avoid any problems with bit overflowing to the sign bit.
Signed-off-by: David Sterba <dsterba@suse.com>
Sync a few more file on the source level with kernel 6.8.
- type cleanups
- defines and enums
- comments
- parameter updates
- error handling
Signed-off-by: David Sterba <dsterba@suse.com>
The initial version of libbtrfsutil did not follow a unified naming
scheme that's usually used for libraries like those provide by
util-linux. Add aliases that are "btrfs_util_" + object + action +
suffix.
The library version changes to 1.3 but there's no new functionality,
only the aliases added. New functions can be added in the future without
possible confusion when the same action could apply to different
objects.
Issue: #574
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
With the latest kernel patch to reject invalid qgroupids in
btrfs_qgroup_inherit structure, "btrfs subvolume create" or "btrfs
subvolume snapshot" can lead to the following output:
# mkfs.btrfs -O quota -f $dev
# mount $dev $mnt
# btrfs subvolume create -i 2/0 $mnt/subv1
Create subvolume '/mnt/btrfs/subv1'
ERROR: cannot create subvolume: No such file or directory
The "btrfs subvolume" command output the first line, seemingly to
indicate a successful subvolume creation, then followed by an error
message.
This can be a little confusing on whether if the subvolume is created or
not.
[FIX]
Fix the output by only outputting the regular line if the ioctl
succeeded.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Remove btrfs_qgroup_inherit_add_copy() and the command line interface.
This was designed to add a pair of source/destination qgroups into
btrfs_qgroup_inherit structure, so that rfer/excl numbers would be
copied from the source qgroup into the destination one.
This behavior has been intentionally hidden since 2016, as such copy will
cause qgroup inconsistent immediately and a rescan would reset whatever
numbers copied anyway.
Now we're going to reject the copy behavior from kernel, there is no
need to keep those hidden (and already disabled for "subvolume create")
case.
Remove btrfs_qgroup_inherit_add_copy() call, and cleanup the
undocumented options.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Feature from https://github.com/sphinx-doc/sphinx/pull/2064 enable
navigation to be able to navigate documentation using the arrow keys.
Pull-request: #754
Author: Martin <spleefer90@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Use a more descriptive name, the interface is generic so it should use
the generic term for file/directory.
Signed-off-by: David Sterba <dsterba@suse.com>