[BUG]
There is a bug report that kernel tree-checker rejected an invalid
metadata item:
corrupt leaf: block=934474399744 slot=68 extent bytenr=425173254144 len=16384 invalid tree level, have 33554432 expect [0, 7]
But original mode btrfs-check reports nothing wrong.
(lowmem mode will just crash, and fixed in previous patch).
[CAUSE]
For original mode it doesn't really check tree level, thus didn't find
the problem.
[FIX]
I don't have a good idea to completely make original mode to verify the
level in backref and in the tree block (while lowmem does that).
But at least we can detect obviously corrupted level just like kernel.
Now original mode will detect such problem:
...
[2/7] checking extents
ERROR: tree block 30457856 has bad backref level, has 256 expect [0, 7]
ref mismatch on [30457856 16384] extent item 0, found 1
tree backref 30457856 root 5 not found in extent tree
backpointer mismatch on [30457856 16384]
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space tree
...
Reported-by: Stickstoff <stickstoff@posteo.de>
Link: https://lore.kernel.org/linux-btrfs/6ed4cd5a-7430-f326-4056-25ae7eb44416@posteo.de/
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When running lowmem mode with METADATA_ITEM which has invalid level, it
will crash with the following backtrace:
(gdb) bt
#0 0x0000555555616b0b in btrfs_header_bytenr (eb=0x4)
at ./kernel-shared/ctree.h:2134
#1 0x0000555555620c78 in check_tree_block_backref (root_id=5,
bytenr=30457856, level=256) at check/mode-lowmem.c:3818
#2 0x0000555555621f6c in check_extent_item (path=0x7fffffffd9c0)
at check/mode-lowmem.c:4334
#3 0x00005555556235a5 in check_leaf_items (root=0x555555688e10,
path=0x7fffffffd9c0, nrefs=0x7fffffffda30, account_bytes=1)
at check/mode-lowmem.c:4835
#4 0x0000555555623c6d in walk_down_tree (root=0x555555688e10,
path=0x7fffffffd9c0, level=0x7fffffffd984, nrefs=0x7fffffffda30,
check_all=1) at check/mode-lowmem.c:4967
#5 0x000055555562494f in check_btrfs_root (root=0x555555688e10, check_all=1)
at check/mode-lowmem.c:5266
#6 0x00005555556254ee in check_chunks_and_extents_lowmem ()
at check/mode-lowmem.c:5556
#7 0x00005555555f0b82 in do_check_chunks_and_extents () at check/main.c:9114
#8 0x00005555555f50ea in cmd_check (cmd=0x55555567c640 <cmd_struct_check>,
argc=3, argv=0x7fffffffdec0) at check/main.c:10892
#9 0x000055555556b2b1 in cmd_execute (argv=0x7fffffffdec0, argc=3,
cmd=0x55555567c640 <cmd_struct_check>) at cmds/commands.h:125
[CAUSE]
For function check_extent_item() it will go through inline extent items
and then check their backrefs.
But for METADATA_ITEM, it doesn't really validate key.offset, which is
u64 and can contain value way larger than BTRFS_MAX_LEVEL (mostly caused
by bit flip).
In that case, if we have a larger value like 256 in key.offset, then
later check_tree_block_backref() will use 256 as level, and overflow
path->nodes[level] and crash.
[FIX]
Just verify the level, no matter if it's from btrfs_tree_block_level()
(which is just u8), or it's from key.offset (which is u64).
To do the check properly and detect higher bits corruption, also change
the type of @level from u8 to u64.
Now lowmem mode can detect the problem properly:
...
[2/7] checking extents
ERROR: tree block 30457856 has bad backref level, has 256 expect [0, 7]
ERROR: extent[30457856 16384] level mismatch, wanted: 0, have: 256
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space tree
...
Reviewed-by: Su Yue <l@damenly.su>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When using `--init-csum-tree` with `--init-extent-tree`, csum tree
population will be done by iterating all file extent items.
This allow us to skip preallocated extents, but it still has the
following problems:
- Inodes with NODATASUM
- Hole file extents
- Written preallocated extents
We will generate csum for the whole extent, while other part may still
be unwritten.
Make it to follow the same behavior of recently introduced
fill_csum_for_file_extent(), so we can generate correct csum.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
If a btrfs filesystem has preallocated file extents, 'btrfs check
--init-csum-tree' will create csum item for preallocated extents, and
cause error:
# mkfs.btrfs -f test.img
# mount test.img /mnt/btrfs
# fallocate -l 32K /mnt/btrfs/file
# umount /mnt/btrfs
# btrfs check --init-csum-tree --force test.img
...
[4/7] checking fs roots
root 5 inode 257 errors 800, odd csum item
ERROR: errors found in fs roots
found 376832 bytes used, error(s) found
And the csum tree is not empty, containing csum for the preallocated
extent:
$ btrfs ins dump-tree -t csum test.img
btrfs-progs v5.15.1
checksum tree key (CSUM_TREE ROOT_ITEM 0)
leaf 30408704 items 1 free space 16226 generation 9 owner CSUM_TREE
leaf 30408704 flags 0x1(WRITTEN) backref revision 1
fs uuid ecc79835-5611-4609-b985-e4ccd6f15b54
chunk uuid b1c75553-5b82-4aa6-bbbe-e7f50643b1a8
item 0 key (EXTENT_CSUM EXTENT_CSUM 13631488) itemoff 16251 itemsize 32
range start 13631488 end 13664256 length 32768
[CAUSE]
For `--init-csum-tree` alone, we will use extent tree to iterate each
data extent, and calculate csum for them.
But extent items alone can not tell us if the file extent belongs to a
NODATASUM inode, nor if it's preallocated.
Thus we create csums for those data extents, and cause the problem.
[FIX]
However the fix is not that simple, we can not just generate csum for
non-preallocated range.
As the following case we still need csum for the un-referred part:
xfs_io -f -c "pwrite 0 8K" -c "sync" -c "punch 0 4K"
So here we have to go another direction by:
- Always generate csum for the whole data extent
This is the same as the old code
- Iterate the file extents, and delete csum for preallocated range
or NODATASUM inodes
Issue: #430
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This part has no mode specific operations, just move them into
mode-common.[ch].
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch fixes potential bugs in fixup_extent_refs(). If
btrfs_start_transaction() fails in some way and returns error ptr, It
goes to out logic. But old code checkes whether it is null and it calls
commit. This patch solves the problem with make that it calls only if
ret is no error.
Issue: #409
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We may have multiple extent roots, so cycle through all of the extent
roots and populate the csum tree based on the content of every extent
root we have in the file system.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Use the global roots tree to find all of the csum roots in the system
and check all of them as appropriate.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Instead of checking the csum and extent tree individually, walk through
the global roots and validate them all. This will work properly if we
have extent tree v1 or extent tree v2.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Instead of looking for the first extent root or csum root in memory,
scan through the tree root and re-init any root items that match the
given objectid. This will allow reinit to work with both extent tree v1
and extent tree v2 global roots when using the --reinit option.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When we switch to multiple global trees we'll need to access the
appropriate extent root depending on the block group or possibly root.
To handle this, use a helper in most places and then the actual root in
places where it is required. We will whittle down the direct accessors
with future patches, but this does the bulk of the preparatory work.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is going to be used for the extent tree v2 stuff more commonly, so
move it out so that it is accessible from everywhere that we need it.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We will walk all referenced tree blocks during check in order to avoid
writing over any referenced blocks during fsck. However in the future
we're going to need to do this for things like fixing block group
accounting with extent tree v2. This is because extent tree v2 will not
refer to all of the allocated blocks in the extent tree. Refactor the
code some to allow us to send down an arbitrary extent_io_tree so we can
use this helper for any case where we need to figure out where all the
used space is on an extent tree v2 file system.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
With extent tree v2 we will have per-block group checksums, so add a
helper to access the csum root and rename the fs_info csum_root to
_csum_root to catch all the places that are accessing it directly.
Convert everybody to use the helper except for internal things.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We pass the csum root from way high in the call chain in check down to
where we actually need it. However we can just get it from the fs_info
in these places, so clean up the functions to skip passing around the
csum root needlessly.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We are going to need to start looking up the csum root based on the
bytenr with extent tree v2. To that end stop passing the root to the
csum related functions so that can be done in the helper functions
themselves.
There's an unrelated deletion of a function prototype that no longer
exists.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We will lookup implied refs for every root id for every block that we
find when verifying qgroups, but we don't need to worry about non
fstrees, so skip them here.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
I screwed up a fix where we're setting the bytenr range as dirty when
marking all tree blocks used, I was looking at btrfs_pin_extent and put
->nodesize for end instead of the actual end, which is bytenr +
->nodesize - 1. Fix this up so it's correct.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
My tool ntfs2btrfs has been creating btrfs volumes in a way that the
kernel doesn't like, but which isn't picked up by btrfs check - see
maharmstone/ntfs2btrfs#23 for the details, including a backtrace. This
patch adds a check for when a csum item contains too many entries -
effectively it ensures that there's always at least sizeof(struct
btrfs_item) bytes free in the tree, otherwise btrfs_del_csums can throw
an error.
max_entries is the value of the __MAX_CSUM_ITEMS macro in
fs/btrfs/file-item.c.
Pull-request: #401
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Just like kernel commit 22b6331d9617 ("btrfs: store precalculated
csum_size in fs_info"), we can cache csum_size and csum_type in
btrfs_fs_info.
Furthermore, there is already a 32 bits hole in btrfs_fs_info, and we
can fit csum_type and csum_size into the hole without increase the size
of btrfs_fs_info.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When compiling btrfs-progs on 32bit x86 using GCC 11.1.0, there are
several warnings:
In file included from ./common/utils.h:30,
from check/main.c:36:
check/main.c: In function 'run_next_block':
./common/messages.h:42:31: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'u32' {aka 'unsigned int'} [-Wformat=]
42 | __btrfs_error((fmt), ##__VA_ARGS__); \
| ^~~~~
check/main.c:6496:33: note: in expansion of macro 'error'
6496 | error(
| ^~~~~
In file included from ./common/utils.h:30,
from kernel-shared/volumes.c:32:
kernel-shared/volumes.c: In function 'btrfs_check_chunk_valid':
./common/messages.h:42:31: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'u32' {aka 'unsigned int'} [-Wformat=]
42 | __btrfs_error((fmt), ##__VA_ARGS__); \
| ^~~~~
kernel-shared/volumes.c:2052:17: note: in expansion of macro 'error'
2052 | error("invalid chunk item size, have %u expect [%zu, %lu)",
| ^~~~~
image/main.c: In function 'search_for_chunk_blocks':
./common/messages.h:42:31: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Wformat=]
42 | __btrfs_error((fmt), ##__VA_ARGS__); \
| ^~~~~
image/main.c:2122:33: note: in expansion of macro 'error'
2122 | error(
| ^~~~~
There are two types of problems:
- __BTRFS_LEAF_DATA_SIZE()
This macro has no type definition, making it behaves differently on
different arches.
Fix this by following kernel to use inline function to make its return
value fixed to u32.
- size_t related output
For x86_64 %lu is OK but not for x86.
Fix this by using %zu.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When a special image (diverted from fsck/012) has its unused slots (slot
number >= nritems) with garbage, lowmem mode btrfs check can crash:
(gdb) run check --mode=lowmem ~/downloads/good.img.restored
Starting program: /home/adam/btrfs/btrfs-progs/btrfs check --mode=lowmem ~/downloads/good.img.restored
...
ERROR: root 5 INODE[5044031582654955520] nlink(257228800) not equal to inode_refs(0)
ERROR: root 5 INODE[5044031582654955520] nbytes 474624 not equal to extent_size 0
Program received signal SIGSEGV, Segmentation fault.
0x0000555555639b11 in btrfs_inode_size (eb=0x5555558a7540, s=0x642e6cd1) at ./kernel-shared/ctree.h:1703
1703 BTRFS_SETGET_FUNCS(inode_size, struct btrfs_inode_item, size, 64);
(gdb) bt
#0 0x0000555555639b11 in btrfs_inode_size (eb=0x5555558a7540, s=0x642e6cd1) at ./kernel-shared/ctree.h:1703
#1 0x0000555555641544 in check_inode_item (root=0x5555556c2290, path=0x7fffffffd960) at check/mode-lowmem.c:2628
[CAUSE]
At check_inode_item() we have path->slot[0] at 29, while the tree block
only has 26 items.
This happens because two reasons:
- btrfs_next_item() never reverts its slots
Even if we failed to read next leaf.
- check_inode_item() doesn't inform the caller that a fatal error
happened
In check_inode_item(), if btrfs_next_item() failed, it goes to out
label, which doesn't really set @err properly.
This means, when check_inode_item() fails at btrfs_next_item(), it will
increase path->slots[0], while it's already beyond current tree block
nritems.
When the slot increases furthermore, and if the unused item slots have
some garbage, we will get invalid btrfs_item_ptr() result, and causing
above segfault.
[FIX]
Fix the problems by two ways:
- Make btrfs_next_item() to revert its path->slots[0] on failure
- Properly detect fatal error from check_inode_item()
By this, we will no longer crash on the crafted image.
Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Issue: #412
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It's not used, so just remove it.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Help string for "--clear-ino-cache" option is not following the
indentation of other help strings:
repair options:
--init-csum-tree create a new CRC tree
--init-extent-tree create a new extent tree
--clear-space-cache v1|v2 clear space cache for v1 or v2
--clear-ino-cache clear ino cache leftover items
The problem is caused by the usage of tab instead of space.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a report that, btrfstune can even work while the fs has transid
mismatch problems.
$ btrfstune -f -u /dev/sdb1
Current fsid: b2b5ae8d-4c49-45f0-b42e-46fe7dcfcb07
New fsid: b2b5ae8d-4c49-45f0-b42e-46fe7dcfcb07
Set superblock flag CHANGING_FSID
Change fsid in extents
parent transid verify failed on 792854528 wanted 20103 found 20091
parent transid verify failed on 792854528 wanted 20103 found 20091
parent transid verify failed on 792854528 wanted 20103 found 20091
Ignoring transid failure
parent transid verify failed on 792870912 wanted 20103 found 20091
parent transid verify failed on 792870912 wanted 20103 found 20091
parent transid verify failed on 792870912 wanted 20103 found 20091
Ignoring transid failure
parent transid verify failed on 792887296 wanted 20103 found 20091
parent transid verify failed on 792887296 wanted 20103 found 20091
parent transid verify failed on 792887296 wanted 20103 found 20091
Ignoring transid failure
ERROR: child eb corrupted: parent bytenr=38010880 item=69 parent level=1 child level=1
ERROR: failed to change UUID of metadata: -5
ERROR: btrfstune failed
This leaves a corrupted fs even more corrupted, and due to the extra
CHANGING_FSID flag, btrfs check will not even try to run on it:
Opening filesystem to check...
ERROR: Filesystem UUID change in progress
ERROR: cannot open file system
[CAUSE]
Unlike kernel, btrfs-progs has a less strict check on transid mismatch.
In read_tree_block() we will fall back to use the tree block even its
transid mismatch if we can't find any better copy.
However not all commands in btrfs-progs needs this feature, only
btrfs-check (which may fix the problem) and btrfs-restore (it just tries
to ignore any problems) really utilize this feature.
[FIX]
Introduce a new open ctree flag, OPEN_CTREE_ALLOW_TRANSID_MISMATCH, to
be explicit about whether we really want to ignore transid error.
Currently only btrfs-check and btrfs-restore will utilize this new flag.
Also add btrfs-image to allow opening such fs with transid error.
Link: https://www.reddit.com/r/btrfs/comments/pivpqk/failure_during_btrfstune_u/
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is part of the checker and unfortunately also the public header, so
we can only copy it to the right directory.
Signed-off-by: David Sterba <dsterba@suse.com>
In kernel space we hardly use btrfs_disk_key, unless for very lowlevel
code.
There is no need to intentionally use btrfs_disk_key in btrfs-progs
either.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When running "btrfs check --check-data-csum" on fs with corrupted data,
the error message almost makes no sense:
$ btrfs check --check-data-csum /dev/test/test
Opening filesystem to check...
Checking filesystem on /dev/test/test
UUID: c31afe0a-55bc-4e7d-aba0-9dfa9ddf8090
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking csums against data
mirror 1 bytenr 13631488 csum 19 expected csum 152 <<<
ERROR: errors found in csum tree
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 147456 bytes used, error(s) found
total csum bytes: 16
total tree bytes: 131072
total fs tree bytes: 32768
total extent tree bytes: 16384
btree space waste bytes: 124799
file data blocks allocated: 16384
referenced 16384
[CAUSE]
We're just outputting the first byte and in decimal, which is completely
different from what we did in kernel space, nor what we did for metadata
csum mismatch.
[FIX]
Use btrfs_format_csum() for btrfs-check to output csum.
Now the result looks much better:
[5/7] checking csums against data
mirror 1 bytenr 13631488 csum 0x13fec125 expected csum 0x98757625
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
- Change it void
The old one always return csum_size.
- Use snprintf()
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We do not detect problems with our bytes_used counter in the super
block. Thankfully the same method to fix block groups is used to re-set
the value in the super block, so simply add some extra code to validate
the bytes_used field and then piggy back on the repair code for block
groups.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
By enabling the lowmem checks properly I uncovered the case where test
fsck/007 will infinite loop at the detection stage. This is because
when checking the inode item we will just btrfs_next_item(), and because
we ignore check tree block failures at read time we don't get an -EIO
from btrfs_next_leaf. Generally what check usually does is validate the
leaves/nodes as we hit them, but in this case we're not doing that. Fix
this by checking the leaf if we move to the next one and if it fails
bail. This allows us to pass the fsck/007 test with lowmem.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The repair cycle in the main check will drop all of our cache and loop
through again to make sure everything is still good to go.
Unfortunately we record our unaligned extent records on a per-root list
so they can be retrieved when we're checking the fs roots. This isn't
straightforward to clean up, so instead simply check our current list of
unaligned extent records when we are adding a new one to make sure we're
not duplicating our efforts. This makes us able to pass fsck/001 with
my super bytes_used fix applied.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We can already fix this problem with the block accounting code, we just
need to keep track of how much we should have used on the file system,
and then check it against the bytes_super. The repair just piggy backs
on the block group used repair.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Test 044 was failing with lowmem because it was not bubbling up the
error to the user. This is because we try to allow repair the
opportunity to clear the error, however if repair isn't set we simply do
not add the temporary error to the main error return variable. Fix this
by adding the tmp_err to err before moving on to the next item.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have a check that will return an error only if ret < 0, but we return
the lowmem specific errors which are all > 0. Fix this by simply
checking if (ret). This allows test 010 to pass with lowmem properly.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The lowmem mode validates the used field of the block group item, but
the normal mode does not. Fix this by keeping a running tally of what
we think the used value for the block group should be, and then if it
mismatches report an error and fix the problem if we have repair set.
We have to keep track of pending extents because we process leaves as we
see them, so it could be much later in the process that we find the
block group item to associate the extents with.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For the incoming extra page size support for subpage (sectorsize <
PAGE_SIZE) cases, the support for metadata will be a critical point.
Currently for subpage support, we require 64K page size, so that no
matter whatever the nodesize is, it will be contained inside one page.
And we will reject any tree block which crosses page boundary.
But for other page size, especially 16K page size, we must support
nodesize differently.
For nodesize < PAGE_SIZE, we will have the same requirement (tree blocks
can't cross page boundary).
While for nodesize >= PAGE_SIZE, we will require the tree blocks to be
page aligned.
To support such feature, we will make btrfs-check to reports more
subpage related warnings for metadata.
This patch will report any tree block which is not nodesize aligned as a
warning.
Existing mkfs/convert has already make sure all new tree blocks are
nodesize aligned, this is just for older converted filesystems.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Linux VFS doesn't allow directory to have hard links, thus for btrfs
on-disk directory inode items, their nlinks should never go beyond 1.
Lowmem mode already has the check and will report it without problem.
Only original mode needs this update.
Reported-by: Pepperpoint <pepperpoint@mb.ardentcoding.com>
Link: https://lore.kernel.org/linux-btrfs/162648632340.7.1932907459648384384.10178178@mb.ardentcoding.com/
Reviewed-by: Su Yue <l@damenly.su>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently v1 space cache clearing will delete one cache inode just in
one transaction, and then start a new transaction to delete the next
inode.
This is far from efficient and can make the already slow v1 space cache
deleting even slower, as large fs has tons of cache inodes to delete.
This patch will speed up the process by batching up to 16 inode deletion
into one transaction.
A quick benchmark of deleting 702 v1 space cache inodes would look like
this:
Unpatched: 4.898s
Patched: 0.087s
Which is obviously a big win.
Reported-by: Joshua <joshua@mailmag.net>
Link: https://lore.kernel.org/linux-btrfs/0b4cf70fc883e28c97d893a3b2f81b11@mailmag.net/
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is a report from the mailing list that one user got its filesystem
with device item bytes_used mismatch.
This problem leaves the device item with some ghost bytes_used, meaning
even if we delete all device extents of that device, the bytes_used
still won't be 0.
This itself is not a big deal, but when the user used up all its
unallocated space, write time tree-checker can be triggered and make the
fs RO, as the new device::bytes_used can be larger than
device::total_bytes.
Thus we need to fix the problem in btrfs-check to avoid above write-time
tree check warning.
This patch will add the ability to reset a device's bytes_used to both
original mode and lowmem mode.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
User reported that test fsck-tests/037-freespacetree-repair fails:
# TEST=037\* ./fsck-tests.sh
[TEST/fsck] 037-freespacetree-repair
btrfs check should have detected corruption
test failed for case 037-freespacetree-repair
The test tries to corrupt FST, call btrfs check readonly then repair FST
using btrfs check. Above case failed at the second readonly check step.
Test log said "cache and super generation don't match, space cache will
be invalidated" which is printed by validate_free_space_cache().
If cache_generation of the superblock is not -1ULL,
validate_free_space_cache() requires that cache_generation must equal
to the superblock's generation. Otherwise, it skips the check of space
cache(v1, v2) like the above case where the sb cache_generation is 0.
Since kernel commit 948462294577 ("btrfs: keep sb cache_generation
consistent with space_cache"), sb cache_generation will be set to be 0
once space cache v1 is disabled (nospace_cache/space_cache=v2). But
progs check was forgotten to be added the 0 case support.
Fix it by adding the condition if sb cache_generation is 0 in
validate_free_space_cache() as the 0 case is valid now since the
kernel commit mentioned above.
Issue: #338
Signed-off-by: Su Yue <l@damenly.su>
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Move the file to common as it's used by several parts, while still
keeping the name 'repair' although the only thing it does is adding a
corrupted extent.
Signed-off-by: David Sterba <dsterba@suse.com>
There's a group of functions that are related to opening filesystem in
various modes, this can be moved to a separate file.
Signed-off-by: David Sterba <dsterba@suse.com>
For passing authentication keys to the checksumming functions we need a
container for the key.
Pass in a btrfs_fs_info to btrfs_csum_data() so we can use the fs_info
as a container for the authentication key.
Note this is not always possible for all callers of btrfs_csum_data() so
we're just passing in NULL for now
Functions calling btrfs_csum_data() with a NULL fs_info argument are
currently not supported in the context of an authenticated file system.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Extending open_ctree with more parameters would be difficult, we'll need
to add more so factor out the parameters to a structure for easier
extension.
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When btrfs-check is executed on even newly created fs, it can report
tree blocks crossing 64K page boundary like this:
Opening filesystem to check...
Checking filesystem on /dev/test/test
UUID: 80d734c8-dcbc-411b-9623-a10bd9e7767f
[1/7] checking root items
[2/7] checking extents
WARNING: tree block [30523392, 30539776) crosses 64K page boudnary, may cause problem for 64K page system
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 131072 bytes used, no error found
total csum bytes: 0
total tree bytes: 131072
total fs tree bytes: 32768
total extent tree bytes: 16384
btree space waste bytes: 125199
file data blocks allocated: 0
referenced 0
[CAUSE]
Tree block [30523392, 30539776) is at the last 16K slot of page.
As 30523392 % 65536 = 49152, and 30539776 % 65536 = 0.
The cross boundary check is using exclusive end, which causes false
alerts.
[FIX]
Use inclusive end to do the cross 64K boundary check.
Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Issue: #352
Issue: #354
Fixes: fc38ae7f48 ("btrfs-progs: check: detect and warn about tree blocks crossing 64K page boundary")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>