We were previously getting away with this because the
load_global_roots() treated ENOENT like everything was a-ok. However
that was a bug and fixing that bug uncovered a problem where we were
unconditionally trying to load the free space tree. Fix that by
skipping the load if we do not have the compat bit set.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The fuzz-test/003 was infinite looping when I reworked the code to
re-calculate the used bytes for the superblock. This is because fsck
wasn't properly fixing the bad extent before my change, it just happened
to error out nicely, whereas my change made it so we go the wrong bytes
used count and just infinite looped trying to fix the problem.
Fix this by sanity checking the extent when we try to re-calculate the
bytes_used. This makes us no longer infinite loop so we can get through
the fuzz tests.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
003-multi-check-unmounted was taking a long time, this is because it was
doing the 10 second countdown for each iteration of --init-csum-tree.
Fix this by using --force to bypass the countdown.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While running make test on other patches I noticed we are now
segfaulting on the fuzz tests. This is because when I converted us to a
rb tree for the global roots we lost the ability to catch that there's
no extent root at all. Before we'd populate a dummy
fs_info->extent_root with a not uptodate node, but now you simply don't
get an extent root in the rb_tree. Fix the check_global_roots_uptodate
helper to count how many roots we find and make sure it matches the
number we expect. If it doesn't then we can return -EIO.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
'make clean' can fail in case the _build directory is not present, can
be observed with the build-tests.sh. Use plain 'rm -fd'.
Signed-off-by: David Sterba <dsterba@suse.com>
The fsstress tool is a useful file generator, pull it from fstests as
it's not packaged as a standalone tool anywhere and the LTP version is
out of date.
The file has been modified to build, some xfs-specific ioctls are not
supported.
Signed-off-by: David Sterba <dsterba@suse.com>
This is still work in progress but can survive some stress testing.
There are still some sanity checks missing, do not user this on valuable
data. To enables this, configure must be run with the experimental
features enabled.
$ mkfs.btrfs --csum crc32c /dev/sdx
$ <mount, fill with data, unmount>
$ btrfstune --csum sha256
Will change the checksum to sha256.
Implementation:
- set bit on superblock when the checksums are being changed (similar to
the uuid rewrite)
- metadata checksums are overwritten in place
- data checksums:
- the checksum tree is completely deleted and no checksums are
verified
- data blocks are enumerated and all checksums generated (same as
check --init-csum-tree)
To make it usable, it should be restartable and track the current
progress somehow. Also the previous data checksums should be verified
any time they're available.
Signed-off-by: David Sterba <dsterba@suse.com>
There was a workaround for asciidoc/xmlto build because page btrfs.5 and
btrfs.8 used the same intermediate file. To avoid clash for the section
5 page it is part of the name, this is still needed, but for sphinx we
can use the final name as it will get the right suffix. This affects
only the generated manual pages.
Signed-off-by: David Sterba <dsterba@suse.com>
Update sphinx build so that by default it now builds the manual pages,
in quiet mode. 'make html' builds the html, other sphinx targets are
available, see 'make help'. Installation now works as well.
Note: sphinx is still conditional and must be selected by
ASCIIDOOC_TOOL=sphinx ./configure
Signed-off-by: David Sterba <dsterba@suse.com>
Building 32bit binaries on 64bit hosts is tedious due to all the
required libraries with the right target, also the static versions.
Limit the build only to the native arch.
Signed-off-by: David Sterba <dsterba@suse.com>
There are some configure options not covered by build tests, eg. the
--disable-zoned build broke recently.
Signed-off-by: David Sterba <dsterba@suse.com>
A snapshot could be created in an existing directory, explain the
difference in the command line help options.
Pull-request: #117
Author: Howard <hwj@BridgeportContractor.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Update the remaining erroneous entry of -s in the 'exit status' section
to -c in accordance with the changes made in:
"btrfs-progs: dev stats: update option name for checking non-zero status"
Pull-request: #146
Signed-off-by: Philip Guyton <philip@yewtreeapps.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The zoned support requires a header file and some structures for full
support. There are distros that have only part of that and the
autodetection at configure time does not handle that properly, assuming
the user requested the support.
Check if there was any of the --*able-zoned options and if not detect
the support level and fallback to no zoned eventually, not requiring
users to specify --disable-zoned as before.
Issue: #425
Signed-off-by: David Sterba <dsterba@suse.com>
Since 5.15 enables a runtime feature, this statement is incorrect.
Pull-request: #437
Author: dathide <47128084+dathide@users.noreply.github.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Question forwarded from wiki:
https://btrfs.wiki.kernel.org/index.php?title=Talk:Compression&diff=0&oldid=33545
(info few months old) higher levels: upstream and library has levels 1..22.
mapping to the upstream levels 1 to 15
As implemented in kernel zstd.c, max level is 15 and the level selection
is direct without any translation.
Pull-request: #442
Author: jtagcat <git-12dbd862@jtag.cat>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
In commit 88895a920f ("btrfs-progs: use profile_supported in mkfs as
well") there's a wrapper but not available on non-zoned builds. Add it.
Issue: #445
Signed-off-by: David Sterba <dsterba@suse.com>
Pass BTRFS_BLOCK_GROUP_DATA and BTRFS_BLOCK_GROUP_METADATA to
zoned_profile_supported(), so we can actually distinguish if it is a data
or a meta-data block group.
Fixes: 8f914d518a46 ("btrfs-progs: zoned support DUP on metadata block groups")
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For !extent tree v2 we should validate the key.offset == 0, and for
extent tree v2 we should validate that key.offset < nr_global_roots. If
this fails we need to fail to load the global root so that the
appropriate action is taken.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The kernel uses 'unsigned long' for u64 specifically for ppc64 and
mips64.
Remove asm/types.h include as it will get included properly later.
Fixe -Wformat warnings.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The image has a key in extent tree, (30457856 METADATA_ITEM 256), which
has invalid level (256 > BTRFS_MAX_LEVEL).
Make sure check can at least detect such problem.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a bug report that kernel tree-checker rejected an invalid
metadata item:
corrupt leaf: block=934474399744 slot=68 extent bytenr=425173254144 len=16384 invalid tree level, have 33554432 expect [0, 7]
But original mode btrfs-check reports nothing wrong.
(lowmem mode will just crash, and fixed in previous patch).
[CAUSE]
For original mode it doesn't really check tree level, thus didn't find
the problem.
[FIX]
I don't have a good idea to completely make original mode to verify the
level in backref and in the tree block (while lowmem does that).
But at least we can detect obviously corrupted level just like kernel.
Now original mode will detect such problem:
...
[2/7] checking extents
ERROR: tree block 30457856 has bad backref level, has 256 expect [0, 7]
ref mismatch on [30457856 16384] extent item 0, found 1
tree backref 30457856 root 5 not found in extent tree
backpointer mismatch on [30457856 16384]
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space tree
...
Reported-by: Stickstoff <stickstoff@posteo.de>
Link: https://lore.kernel.org/linux-btrfs/6ed4cd5a-7430-f326-4056-25ae7eb44416@posteo.de/
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When running lowmem mode with METADATA_ITEM which has invalid level, it
will crash with the following backtrace:
(gdb) bt
#0 0x0000555555616b0b in btrfs_header_bytenr (eb=0x4)
at ./kernel-shared/ctree.h:2134
#1 0x0000555555620c78 in check_tree_block_backref (root_id=5,
bytenr=30457856, level=256) at check/mode-lowmem.c:3818
#2 0x0000555555621f6c in check_extent_item (path=0x7fffffffd9c0)
at check/mode-lowmem.c:4334
#3 0x00005555556235a5 in check_leaf_items (root=0x555555688e10,
path=0x7fffffffd9c0, nrefs=0x7fffffffda30, account_bytes=1)
at check/mode-lowmem.c:4835
#4 0x0000555555623c6d in walk_down_tree (root=0x555555688e10,
path=0x7fffffffd9c0, level=0x7fffffffd984, nrefs=0x7fffffffda30,
check_all=1) at check/mode-lowmem.c:4967
#5 0x000055555562494f in check_btrfs_root (root=0x555555688e10, check_all=1)
at check/mode-lowmem.c:5266
#6 0x00005555556254ee in check_chunks_and_extents_lowmem ()
at check/mode-lowmem.c:5556
#7 0x00005555555f0b82 in do_check_chunks_and_extents () at check/main.c:9114
#8 0x00005555555f50ea in cmd_check (cmd=0x55555567c640 <cmd_struct_check>,
argc=3, argv=0x7fffffffdec0) at check/main.c:10892
#9 0x000055555556b2b1 in cmd_execute (argv=0x7fffffffdec0, argc=3,
cmd=0x55555567c640 <cmd_struct_check>) at cmds/commands.h:125
[CAUSE]
For function check_extent_item() it will go through inline extent items
and then check their backrefs.
But for METADATA_ITEM, it doesn't really validate key.offset, which is
u64 and can contain value way larger than BTRFS_MAX_LEVEL (mostly caused
by bit flip).
In that case, if we have a larger value like 256 in key.offset, then
later check_tree_block_backref() will use 256 as level, and overflow
path->nodes[level] and crash.
[FIX]
Just verify the level, no matter if it's from btrfs_tree_block_level()
(which is just u8), or it's from key.offset (which is u64).
To do the check properly and detect higher bits corruption, also change
the type of @level from u8 to u64.
Now lowmem mode can detect the problem properly:
...
[2/7] checking extents
ERROR: tree block 30457856 has bad backref level, has 256 expect [0, 7]
ERROR: extent[30457856 16384] level mismatch, wanted: 0, have: 256
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space tree
...
Reviewed-by: Su Yue <l@damenly.su>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The search for default subvolume could fail for two reasons, the lack of
CAP_SYS_ADMIN for TREE_SEARCH ioctl is one but the default subvolume
could be unset as well, thus no restrictions for deletion.
Signed-off-by: David Sterba <dsterba@suse.com>
Checking the default subvolume uses TREE_SEARCH which is a CAP_SYS_ADMIN
only operation, and thus will fail when unprivileged, even if we have
permissions to actually delete the subvolume.
This produces a warning even if all is ok. Let's hide it if we're not
root (root but !CAP is odd enough to warn).
Fixes 87804a3f06 ("btrfs-progs: subvolume: check deleting default subvolume")
Link: https://bugs.debian.org/998840
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: David Sterba <dsterba@suse.com>
Support using BTRFS_BLOCK_GROUP_DUP on metadata (and system) block groups.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently we have two places checking if a block-group profile is
supported on a zoned device, one in mkfs/main.c and one in
kernel-shared/zoned.c.
Use the one from kernel-shared/zoned.c in mkfs as well, unifying all
checks.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Unlike kernel, we have cached physical address of extent_buffer in
dev_bytenr. Print it for better debug experience.
Signed-off-by: Su Yue <l@damenly.su>
Signed-off-by: David Sterba <dsterba@suse.com>
As zoned btrfs uses regular writes for metadata, it needs zone write
locking in the IO scheduler. Add a udev rule that configures an IO
scheduler doing zone write locking.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add some scripts for convenience, so far there was only one for musl as
it usually breaks first, but we've had some problems on centos due to
old kernel headers and potential breakage when changing kerncpomat.h.
Signed-off-by: David Sterba <dsterba@suse.com>
This new test script will create a fs with the following situations:
- Preallocated extents (no data csum)
- Nodatasum inodes (no data csum)
- Partially written preallocated extents (no data csum for part of the
extent)
- Regular data extents (with data csum)
- Regular data extents then hole punched (with data csum)
- Preallocated data, then written, then hole punched (with data csum)
- Compressed extents (with data csum)
And make sure after --init-csum-tree (with or without
--init-extent-tree) the result fs can still pass check.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When using `--init-csum-tree` with `--init-extent-tree`, csum tree
population will be done by iterating all file extent items.
This allow us to skip preallocated extents, but it still has the
following problems:
- Inodes with NODATASUM
- Hole file extents
- Written preallocated extents
We will generate csum for the whole extent, while other part may still
be unwritten.
Make it to follow the same behavior of recently introduced
fill_csum_for_file_extent(), so we can generate correct csum.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
If a btrfs filesystem has preallocated file extents, 'btrfs check
--init-csum-tree' will create csum item for preallocated extents, and
cause error:
# mkfs.btrfs -f test.img
# mount test.img /mnt/btrfs
# fallocate -l 32K /mnt/btrfs/file
# umount /mnt/btrfs
# btrfs check --init-csum-tree --force test.img
...
[4/7] checking fs roots
root 5 inode 257 errors 800, odd csum item
ERROR: errors found in fs roots
found 376832 bytes used, error(s) found
And the csum tree is not empty, containing csum for the preallocated
extent:
$ btrfs ins dump-tree -t csum test.img
btrfs-progs v5.15.1
checksum tree key (CSUM_TREE ROOT_ITEM 0)
leaf 30408704 items 1 free space 16226 generation 9 owner CSUM_TREE
leaf 30408704 flags 0x1(WRITTEN) backref revision 1
fs uuid ecc79835-5611-4609-b985-e4ccd6f15b54
chunk uuid b1c75553-5b82-4aa6-bbbe-e7f50643b1a8
item 0 key (EXTENT_CSUM EXTENT_CSUM 13631488) itemoff 16251 itemsize 32
range start 13631488 end 13664256 length 32768
[CAUSE]
For `--init-csum-tree` alone, we will use extent tree to iterate each
data extent, and calculate csum for them.
But extent items alone can not tell us if the file extent belongs to a
NODATASUM inode, nor if it's preallocated.
Thus we create csums for those data extents, and cause the problem.
[FIX]
However the fix is not that simple, we can not just generate csum for
non-preallocated range.
As the following case we still need csum for the un-referred part:
xfs_io -f -c "pwrite 0 8K" -c "sync" -c "punch 0 4K"
So here we have to go another direction by:
- Always generate csum for the whole data extent
This is the same as the old code
- Iterate the file extents, and delete csum for preallocated range
or NODATASUM inodes
Issue: #430
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This part has no mode specific operations, just move them into
mode-common.[ch].
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When calling iterate_extent_inodes() on data extents with indirect ref
(with inline or keyed EXTENT_DATA_REF_KEY), it will fail to execute the
call back function at all.
[CAUSE]
In function find_parent_nodes(), we only add the target tree block if a
backref has @parent populated.
For indirect backref like EXTENT_DATA_REF_KEY, we rely on
__resolve_indirect_ref() to get the parent leaves.
However __resolve_indirect_ref() only grabs backrefs from
&prefstate->pending_indirect_refs.
Meaning callers should queue any indirect backref to
pending_indirect_refs.
But unfortunately in __add_prelim_ref() and __add_missing_keys(), none
of them properly queue the indirect backrefs to pending_indirect_refs,
but directly to pending.
Making all indirect backrefs never got resolved, thus no callback
function executed
[FIX]
Fix __add_prelim_ref() and __add_missing_keys() to properly queue
indirect backrefs to the correct list.
Currently there is no such direct user in btrfs-progs, but later csum
tree re-initialization code will rely this to do proper csum
re-calculate (to avoid preallocated/nodatasum extents).
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There's still problem left with compilation on musl and kernel < 5.11,
because __ALIGN_KERNEL is not defined anymore:
../bin/ld: kernel-shared/volumes.o: in function `create_chunk':
volumes.c:(.text+0x17f8): undefined reference to `__ALIGN_KERNEL'
Due to the entangled includes and unconditional definition of
__ALIGN_KERNEL, we can't use #ifdef in kerncompat.h to define it
eventually (as kerncompat.h is the first include). Instead add local
definitions of the macros and rename them to avoid name clashes.
Pull-request: #433
Signed-off-by: David Sterba <dsterba@suse.com>