The in-kernel version of read_tree_block adds some extra sanity checks
to make sure we don't return blocks that don't match what we expect.
This includes the owning root, the level, and the expected first key.
We don't actually do these checks in btrfs-progs, however kernel code
we're going to sync will expect this calling convention, so update it to
match the in-kernel code and then update all the callers.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch syncs file-item.h into btrfs-progs. This carries with it an
API change for btrfs_del_csums, which takes a root argument in the
kernel, so all callsites have been updated accordingly.
I didn't sync file-item.c because it carries with it a bunch of bio
related helpers which are difficult to adapt to the kernel.
Additionally there's a few helpers in the local copy of file-item.c that
aren't in the kernel that are required for different tools.
This requires more cleanups in both the kernel and progs in order to
sync file-item.c, so for now just do file-item.h in order to pull things
out of ctree.h.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This syncs accessors.[ch] from the kernel. For the most part
accessors.h will remain the same, there's just some helpers that need to
be adjusted for eb->data instead of eb->pages. Additionally accessors.c
needed to be completely updated to deal with this as well.
This is a set of files where we will likely only sync the header going
forward, and leave the C file in place as it needs to be specific to
btrfs-progs.
This forced a few "unrelated" changes
- Using btrfs_dir_item_ftype() instead of btrfs_dir_item_type(). This
is due to the encryption changes, and was simpler to just do in this
patch.
- Adjusting some of the print tree code to use the actual helpers and
not the btrfs-progs ones.
A local definition of static_assert is used to avoid compilation
failures on older gcc (< 9) where the 2nd parameter is mandatory.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that the libbtrfs stuff has it's own local copy of ctree.h and
ioctl.h, let's rename these qgroup struct members to match the kernel
names, this way it'll make it easier to sync the kernel code into
btrfs-progs.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[FALSE ALERT]
There is a false alert when compiling btrfs-progs using gcc 12.2.1:
$ make D=1
kernel-shared/print-tree.c: In function 'print_sys_chunk_array':
kernel-shared/print-tree.c:1797:9: warning: 'buf' may be used uninitialized [-Wmaybe-uninitialized]
1797 | write_extent_buffer(buf, sb, 0, sizeof(*sb));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from ./kernel-shared/ctree.h:27,
from kernel-shared/print-tree.c:24:
./kernel-shared/extent_io.h:148:6: note: by argument 1 of type 'const struct extent_buffer *' to 'write_extent_buffer' declared here
148 | void write_extent_buffer(const struct extent_buffer *eb, const void *src,
| ^~~~~~~~~~~~~~~~~~~
[CAUSE]
This is a false alert, the uninitialized part of buf will not be
utilized at all during write_extent_buffer().
[FIX]
Instead of allocating such ad-hoc buffer, go a more formal way by
calling alloc_dummy_extent_buffer(), which would properly set all
the members.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The checksum conversion is still experimental and still does not convert
all filesystems correctly. Do not use on valuable data.
Previous implementation copied the UUID conversion which was not a good
base for the checksum conversion so it left out basically all trees
except extent and checksum.
This update adds the base for the required safety features:
- let the old csum tree intact until the full conversion is done (ie.
all data are still verifiable)
- add on-disk status tracking item, this should keep the from/to
checksum conversion, last generation to catch potential updates of the
underlying filesystem if conversion is interrupted and the filesystem
mounted
- convert most of the fundamental trees, the subvolumes, tree log and
relocation trees are not converted
- trees are converted in-place to avoid potentially running out of space
but this might be better done by transaction protection with a
temporary tree
Known issues:
- not all trees are converted
- not all checksums are correctly inserted into the new tree and reading
the files leads to EIO
Issue: #438
Signed-off-by: David Sterba <dsterba@suse.com>
[FALSE ALERT]
Unlike gcc, clang doesn't really understand the comments, thus it's
reportings tons of fall through related errors:
cmds/reflink.c:124:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
case 'r':
^
cmds/reflink.c:124:3: note: insert '__attribute__((fallthrough));' to silence this warning
case 'r':
^
__attribute__((fallthrough));
cmds/reflink.c:124:3: note: insert 'break;' to avoid fall-through
case 'r':
^
break;
[CAUSE]
Although gcc is fine with /* fallthrough */ comments, clang is not.
[FIX]
So just introduce a fallthrough macro to handle the situation properly,
and use that macro instead.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This reverts commit 03451430de.
(It's not 1:1, there are some additional trivial fixups in cmds/qgroup.c)
This breaks a lot of 3rd party tools that depend on it as Neal reports:
* btrfs-assistant
* buildah
* cri-o
* podman
* skopeo
* containerd
* moby/docker
* snapper
* source-to-image
Link: https://lore.kernel.org/linux-btrfs/CAEg-Je8L7jieKdoWoZBuBZ6RdXwvwrx04AB0fOZF1fr5Pb-o1g@mail.gmail.com/
Reported-by: Neal Gompa <ngompa@fedoraproject.org>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch copies in compression.h from the kernel. This is relatively
straightforward, we just have to drop the compression types definition
from ctree.h, and update the image to use BTRFS_NR_COMPRESS_TYPES
instead of BTRFS_COMPRESS_LAST, and add a few things to kerncompat.h to
make everything build smoothly.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We return __u16 in the kernel, as this is actually the size of
btrfs_qgroup_level. Adjust the existing helpers and update all the
callers to deal with the new size appropriately. This will make syncing
the kernel code cleaner.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We're going to sync the kernel source into btrfs-progs, and in the
kernel we have all these qgroup fields named with short names instead of
the full name, so rename
referenced -> rfer
compressed -> cmpr
exclusive -> excl
to match the kernel and update all the users, this will make the sync
cleaner.
ioctl.h is a public header but there are no users of the
btrfs_qgroup_limit structure.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Kernel function name is btrfs_qgroup_subvolid so rename it in progs. The
libbtrfs can't API be changed without versioning so at least add the new
helper.
Signed-off-by: David Sterba <dsterba@suse.com>
Currently we put EXTENT_TREE_V2 incompat flag entry under EXPERIMENTAL
features, thus at compile time, incompat_flags_array[] is determined at
compile time.
But the truth is, we have @supported_flags for __print_readable_flag(),
which is already defined based on EXPERIMENTAL flag.
Thus for __print_readable_flag(), we can always include the entry for
EXTENT_TREE_V2, and only print the flag if it's in the @supported_flags
By this, we can remove one EXPERIMENTAL ifdef.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The extent-tree-v2 is still experimental but it should be printed among
the other incompat flags if enabled by the build.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The radix-tree is not used in userspace code. In kernel it's for
tracking unpersisted and in-memory structures and has been replaced by
the xarray.
Signed-off-by: David Sterba <dsterba@suse.com>
Block group tree feature is completely a standalone feature, and it has
been over 5 years before the initial introduction to solve the long
mount time.
I don't really want to waste another 5 years waiting for a feature which
may or may not work, but definitely not properly reviewed for its
preparation patches.
So this patch will separate the block group tree feature into a
standalone compat RO feature.
There is a catch, in mkfs create_block_group_tree(), current
tree-checker only accepts block group item with valid chunk_objectid,
but the existing code from extent-tree-v2 didn't properly initialize it.
This patch will also fix above mentioned problem so kernel can mount it
correctly.
Now mkfs/fsck should be able to handle the fs with block group tree.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The extent tree v2 (thankfully not yet fully materialized) needs a
new root for storing all block group items.
My initial proposal years ago just added a new tree rootid, and load it
from tree root, just like what we did for quota/free space tree/uuid/extent
roots.
But the extent tree v2 patches introduced a completely new (and to me,
wasteful) way to store block group tree root into super block.
Currently there are only 3 trees stored in super blocks, and they all
have their valid reasons:
- Chunk root
Needed for bootstrap.
- Tree root
Really the entrance of all trees.
- Log root
This is special as log root has to be updated out of existing
transaction mechanism.
There is not even any reason to put block group root into super blocks,
the block group tree is updated at the same timing as old extent tree,
no need for extra bootstrap/out-of-transaction update.
So just move block group root from super block into tree root.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is the same on-disk format update synchronized from the kernel
code.
Unlike kernel, there are two callers reading this member:
- btrfs inspect dump-super
It's just printing the value, add a notice about deprecation.
- btrfs-find-root
In that case, since we always got 0, the root search for log root
should never find a perfect match.
Use btrfs_super_geneartion() + 1 to provide a better result.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu noticed that the full checksums are still printed even if the
experimental build is not enabled. This is caused by wrong use of #ifdef
(as the macro is always defined), this must be "#if".
Fixes: 1bb6fb896dfc ("btrfs-progs: btrfstune: experimental, new option to switch csums")
Reported-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For the default CRC32C checksum, print-tree now prints tons of
unnecessary padding zeros:
btrfs-progs v5.17
chunk tree
leaf 22036480 items 7 free space 15430 generation 6 owner CHUNK_TREE
leaf 22036480 flags 0x1(WRITTEN) backref revision 1
checksum stored 0ac1b9fa00000000000000000000000000000000000000000000000000000000
checksum calced 0ac1b9fa00000000000000000000000000000000000000000000000000000000
fs uuid 3d95b7e3-3ab6-4927-af56-c58aa634342e
This is caused by commit 1bb6fb896dfc ("btrfs-progs: btrfstune:
experimental, new option to switch csums"), and it looks like most
distros just enable EXPERIMENTAL features by default.
(Which is a good thing to provide much better coverage).
So here we just limit the csum print to the utilized csum size.
Now the output looks like:
btrfs-progs v5.17
chunk tree
leaf 22036480 items 4 free space 15781 generation 6 owner CHUNK_TREE
leaf 22036480 flags 0x1(WRITTEN) backref revision 1
checksum stored 676b812f
checksum calced 676b812f
fs uuid d11f8799-b6dc-415d-b1ed-cebe6da5f0b7
Fixes: 1bb6fb896dfc ("btrfs-progs: btrfstune: experimental, new option to switch csums")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
'btrfs inspect-internals dump-tree' doesn't currently know about the two
types of verity items and prints them as 'UNKNOWN.36' or 'UNKNOWN.37'.
So add them to the known item types.
Suggested-by: Boris Burkov <boris@bur.io>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: David Sterba <dsterba@suse.com>
Add the appropriate support to the print tree and dump tree code to spit
out the block group tree.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that all callers are using the _nr variations we can simply rename
these helpers to btrfs_item_##member/btrfs_set_item_##member and change
the actual item SETGET funcs to raw_item_##member/set_raw_item_##member
and then change all callers to drop the _nr part.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This matches how the kernel does it, simply pass in the slot and fix up
btrfs_file_extent_inline_item_len to use the btrfs_item_nr() helper and
the correct define. Fixup all the callers to use the slot now instead
of passing in the btrfs_item.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have this pattern in a lot of places
item = btrfs_item_nr(slot);
btrfs_item_size(leaf, item);
btrfs_item_offset(leaf, item);
when we could simply use
btrfs_item_size_nr(leaf, slot);
btrfs_item_offset_nr(leaf, slot);
Fix all callers of btrfs_item_size() and btrfs_item_offset() to use the
_nr variation of the helpers.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This helper only takes the nodesize, but in the future it'll take a bool
to indicate if we're extent tree v2. The remaining users are all where
we only have extent_buffer, but we should always have a valid
eb->fs_info in these cases, so add BUG_ON()'s for the !eb->fs_info case
and then convert these callers to use BTRFS_LEAF_DATA_SIZE which takes
the fs_info.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is still work in progress but can survive some stress testing.
There are still some sanity checks missing, do not user this on valuable
data. To enables this, configure must be run with the experimental
features enabled.
$ mkfs.btrfs --csum crc32c /dev/sdx
$ <mount, fill with data, unmount>
$ btrfstune --csum sha256
Will change the checksum to sha256.
Implementation:
- set bit on superblock when the checksums are being changed (similar to
the uuid rewrite)
- metadata checksums are overwritten in place
- data checksums:
- the checksum tree is completely deleted and no checksums are
verified
- data blocks are enumerated and all checksums generated (same as
check --init-csum-tree)
To make it usable, it should be restartable and track the current
progress somehow. Also the previous data checksums should be verified
any time they're available.
Signed-off-by: David Sterba <dsterba@suse.com>
Add the on disk definitions for the block group tree. This will be part
of the super block so we need to add the appropriate helpers to the
super block, as well as adding it to the backup roots.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Just like kernel commit 22b6331d9617 ("btrfs: store precalculated
csum_size in fs_info"), we can cache csum_size and csum_type in
btrfs_fs_info.
Furthermore, there is already a 32 bits hole in btrfs_fs_info, and we
can fit csum_type and csum_size into the hole without increase the size
of btrfs_fs_info.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
We noticed 'btrfs check' outputs something like
leaf 30408704 flags 0x0(P1逅?) backref revision 1
but we expected:
leaf 30408704 flags 0x0() backref revision 1
[CAUSE]
Some XX_flags_to_str() failed to make sure the result string always ends
with '\0' in some case.
[FIX]
Reset the buffer at the beginnig.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Wang Yugui (wangyugui@e16-tech.com)
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
Since commit dad03fac3b ("btrfs-progs: switch btrfs_group_profile_str
to use raid table"), fstests/btrfs/023 and btrfs/151 will always fail.
The failure of btrfs/151 explains the reason pretty well:
btrfs/151 1s ... - output mismatch
--- tests/btrfs/151.out 2019-10-22 15:18:14.068965341 +0800
+++ ~/xfstests-dev/results//btrfs/151.out.bad 2021-11-02 17:13:43.879999994 +0800
@@ -1,2 +1,2 @@
QA output created by 151
-Data, RAID1
+Data, raid1
...
(Run 'diff -u ~/xfstests-dev/tests/btrfs/151.out ~/xfstests-dev/results//btrfs/151.out.bad' to see the entire diff)
[CAUSE]
Commit dad03fac3b ("btrfs-progs: switch btrfs_group_profile_str to use
raid table") will use btrfs_raid_array[index].raid_name, which is all
lower case.
[FIX]
There is no need to bring such output format change.
So here we split the btrfs_raid_attr::raid_name[] into upper_name[] and
lower_name[], and make upper and lower case helpers for callers to use.
Now there are several types of callers referring to lower_name and
upper_name:
- parse_bg_profile()
It uses strcasecmp(), either case would be fine.
- btrfs_group_profile_str()
Originally it uses upper case for all profiles except "single".
Now unified to upper case.
- sprint_profiles()
It uses lower case.
- bg_flags_to_str()
It uses upper case.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
Commit ("btrfs-progs: use raid table for profile names in
print-tree.c") introduced one bug in block group and chunk flags output
and changed the behavior:
item 1 key (FIRST_CHUNK_TREE CHUNK_ITEM 13631488) itemoff 16105 itemsize 80
length 8388608 owner 2 stripe_len 65536 type SINGLE
...
item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 22020096) itemoff 15993 itemsize 112
length 8388608 owner 2 stripe_len 65536 type DUP
...
item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 30408704) itemoff 15881 itemsize 112
length 268435456 owner 2 stripe_len 65536 type DUP
...
Note that, the flag string only contains the profile (SINGLE/DUP/etc...)
no type (DATA/METADATA/SYSTEM).
And we have new "SINGLE" string, even that profile has no extra bit to
indicate that.
[CAUSE]
The "SINGLE" part is caused by the raid array which has a name for
SINGLE profile, even it doesn't have the corresponding bit.
The missing type string is caused by a code bug:
strcpy(buf, name);
while (*tmp) {
*tmp = toupper(*tmp);
tmp++;
}
strcpy(ret, buf);
The last strcpy() call overrides the existing string in @ret.
[FIX]
- Enhance string handling using strn*()/snprintf()
- Add extra "UKNOWN.0x%llx" output for unknown profiles
- Call proper strncat() to merge type and profile
- Add extra handling for "SINGLE" to keep the old output
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add new options to dumps checksums in node headers and in the checksum
items:
$ btrfs inspect dump-tree --csum-headers image
root tree
leaf 471515136 items 19 free space 12186 generation 15 owner ROOT_TREE
leaf 471515136 flags 0x1(WRITTEN) backref revision 1 csum 0x756b2d54
fs uuid df0348df-5773-47dd-81e9-a18221461239
For nodes/leaves it's appended on the 2nd line of the header.
Checksum items are stored in leaves as EXTENT_CSUM key type, with offset
value as the logical offset starting. As the array would be hard to
parse or match, each offset value is printed with the checksum. For
crc32c it's 4 values on a line, for xxhash it's 2 and for the long
256bit checksums it's one checksum per line.
$ btrfs inspect dump-tree --csum-items image
leaf 5423104 items 1 free space 30 generation 6 owner CSUM_TREE
leaf 5423104 flags 0x1(WRITTEN) backref revision 1
fs uuid bd7c981e-16ff-4081-a734-3ef5d50cafc1
chunk uuid 13f4c76c-7845-4984-88ed-f01b52e05cf8
item 0 key (EXTENT_CSUM EXTENT_CSUM 22020096) itemoff 55 itemsize 16228
range start 22020096 end 38637568 length 16617472
[22020096] 0x8941f998 [22024192] 0x8941f998 [22028288] 0x8941f998 [22032384] 0x8941f998
[22036480] 0x8941f998 [22040576] 0x8941f998 [22044672] 0x8941f998 [22048768] 0x8941f998
...
$ btrfs inspect dump-tree --csum-items image
leaf 5718016 items 1 free space 7746 generation 6 owner CSUM_TREE
leaf 5718016 flags 0x1(WRITTEN) backref revision 1
fs uuid f453a5b4-8b4a-4fbf-90a2-2925e4fe2335
chunk uuid eb1da63b-248b-44c2-82da-71b2564bf50e
item 0 key (EXTENT_CSUM EXTENT_CSUM 52387840) itemoff 7771 itemsize 8512
range start 52387840 end 53477376 length 1089536
[52387840] 0x686ede9288c391e7e05026e56f2f91bfd879987a040ea98445dabc76f55b8e5f
[52391936] 0x686ede9288c391e7e05026e56f2f91bfd879987a040ea98445dabc76f55b8e5f
...
The options are not on by default, the header checksum is not important
for the structures. Data checksums can be quite big so that would make
the dump long and without any actual data to match against.
Signed-off-by: David Sterba <dsterba@suse.com>
Replace follow and traverse by one parameter that takes bits to affect
the behaviour. This allows to extend btrfs_print_tree output with more
modes from one place.
Signed-off-by: David Sterba <dsterba@suse.com>
With the zoned feature enabled, a zoned block device-aware btrfs
allocates block groups aligned to the device zones and always written in
sequential zones at the zone write pointer position.
It also supports "emulated" zoned mode on a non-zoned device. In the
emulated mode, btrfs emulates conventional zones by slicing the device
into fixed-size zones.
We don't support conversion from the ext4 volume with the zoned feature
because we can't be sure all the converted block groups are aligned to
zone boundaries.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For passing authentication keys to the checksumming functions we need a
container for the key.
Pass in a btrfs_fs_info to btrfs_csum_data() so we can use the fs_info
as a container for the authentication key.
Note this is not always possible for all callers of btrfs_csum_data() so
we're just passing in NULL for now
Functions calling btrfs_csum_data() with a NULL fs_info argument are
currently not supported in the context of an authenticated file system.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
While debugging a corruption problem I realized we don't spit out the
flags for nodes, which is needed when debugging relocation problems so
we know which nodes are the RELOC root items and which are the actual fs
tree's items. Fix this by unifying the header printing helper so both
leaf's and nodes get the same information printed out.
node 41070940160 level 1 items 34 free space 87 generation 7709536 owner ROOT_TREE
node 41070940160 flags 0x1(WRITTEN) backref revision 1
Same for leaves:
leaf 41070944256 items 12 free space 515 generation 7709536 owner ROOT_TREE
leaf 41070944256 flags 0x1(WRITTEN) backref revision 1
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>