Help string for "--clear-ino-cache" option is not following the
indentation of other help strings:
repair options:
--init-csum-tree create a new CRC tree
--init-extent-tree create a new extent tree
--clear-space-cache v1|v2 clear space cache for v1 or v2
--clear-ino-cache clear ino cache leftover items
The problem is caused by the usage of tab instead of space.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a report that, btrfstune can even work while the fs has transid
mismatch problems.
$ btrfstune -f -u /dev/sdb1
Current fsid: b2b5ae8d-4c49-45f0-b42e-46fe7dcfcb07
New fsid: b2b5ae8d-4c49-45f0-b42e-46fe7dcfcb07
Set superblock flag CHANGING_FSID
Change fsid in extents
parent transid verify failed on 792854528 wanted 20103 found 20091
parent transid verify failed on 792854528 wanted 20103 found 20091
parent transid verify failed on 792854528 wanted 20103 found 20091
Ignoring transid failure
parent transid verify failed on 792870912 wanted 20103 found 20091
parent transid verify failed on 792870912 wanted 20103 found 20091
parent transid verify failed on 792870912 wanted 20103 found 20091
Ignoring transid failure
parent transid verify failed on 792887296 wanted 20103 found 20091
parent transid verify failed on 792887296 wanted 20103 found 20091
parent transid verify failed on 792887296 wanted 20103 found 20091
Ignoring transid failure
ERROR: child eb corrupted: parent bytenr=38010880 item=69 parent level=1 child level=1
ERROR: failed to change UUID of metadata: -5
ERROR: btrfstune failed
This leaves a corrupted fs even more corrupted, and due to the extra
CHANGING_FSID flag, btrfs check will not even try to run on it:
Opening filesystem to check...
ERROR: Filesystem UUID change in progress
ERROR: cannot open file system
[CAUSE]
Unlike kernel, btrfs-progs has a less strict check on transid mismatch.
In read_tree_block() we will fall back to use the tree block even its
transid mismatch if we can't find any better copy.
However not all commands in btrfs-progs needs this feature, only
btrfs-check (which may fix the problem) and btrfs-restore (it just tries
to ignore any problems) really utilize this feature.
[FIX]
Introduce a new open ctree flag, OPEN_CTREE_ALLOW_TRANSID_MISMATCH, to
be explicit about whether we really want to ignore transid error.
Currently only btrfs-check and btrfs-restore will utilize this new flag.
Also add btrfs-image to allow opening such fs with transid error.
Link: https://www.reddit.com/r/btrfs/comments/pivpqk/failure_during_btrfstune_u/
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In kernel space we hardly use btrfs_disk_key, unless for very lowlevel
code.
There is no need to intentionally use btrfs_disk_key in btrfs-progs
either.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When running "btrfs check --check-data-csum" on fs with corrupted data,
the error message almost makes no sense:
$ btrfs check --check-data-csum /dev/test/test
Opening filesystem to check...
Checking filesystem on /dev/test/test
UUID: c31afe0a-55bc-4e7d-aba0-9dfa9ddf8090
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking csums against data
mirror 1 bytenr 13631488 csum 19 expected csum 152 <<<
ERROR: errors found in csum tree
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 147456 bytes used, error(s) found
total csum bytes: 16
total tree bytes: 131072
total fs tree bytes: 32768
total extent tree bytes: 16384
btree space waste bytes: 124799
file data blocks allocated: 16384
referenced 16384
[CAUSE]
We're just outputting the first byte and in decimal, which is completely
different from what we did in kernel space, nor what we did for metadata
csum mismatch.
[FIX]
Use btrfs_format_csum() for btrfs-check to output csum.
Now the result looks much better:
[5/7] checking csums against data
mirror 1 bytenr 13631488 csum 0x13fec125 expected csum 0x98757625
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
- Change it void
The old one always return csum_size.
- Use snprintf()
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We do not detect problems with our bytes_used counter in the super
block. Thankfully the same method to fix block groups is used to re-set
the value in the super block, so simply add some extra code to validate
the bytes_used field and then piggy back on the repair code for block
groups.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The repair cycle in the main check will drop all of our cache and loop
through again to make sure everything is still good to go.
Unfortunately we record our unaligned extent records on a per-root list
so they can be retrieved when we're checking the fs roots. This isn't
straightforward to clean up, so instead simply check our current list of
unaligned extent records when we are adding a new one to make sure we're
not duplicating our efforts. This makes us able to pass fsck/001 with
my super bytes_used fix applied.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The lowmem mode validates the used field of the block group item, but
the normal mode does not. Fix this by keeping a running tally of what
we think the used value for the block group should be, and then if it
mismatches report an error and fix the problem if we have repair set.
We have to keep track of pending extents because we process leaves as we
see them, so it could be much later in the process that we find the
block group item to associate the extents with.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For the incoming extra page size support for subpage (sectorsize <
PAGE_SIZE) cases, the support for metadata will be a critical point.
Currently for subpage support, we require 64K page size, so that no
matter whatever the nodesize is, it will be contained inside one page.
And we will reject any tree block which crosses page boundary.
But for other page size, especially 16K page size, we must support
nodesize differently.
For nodesize < PAGE_SIZE, we will have the same requirement (tree blocks
can't cross page boundary).
While for nodesize >= PAGE_SIZE, we will require the tree blocks to be
page aligned.
To support such feature, we will make btrfs-check to reports more
subpage related warnings for metadata.
This patch will report any tree block which is not nodesize aligned as a
warning.
Existing mkfs/convert has already make sure all new tree blocks are
nodesize aligned, this is just for older converted filesystems.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Linux VFS doesn't allow directory to have hard links, thus for btrfs
on-disk directory inode items, their nlinks should never go beyond 1.
Lowmem mode already has the check and will report it without problem.
Only original mode needs this update.
Reported-by: Pepperpoint <pepperpoint@mb.ardentcoding.com>
Link: https://lore.kernel.org/linux-btrfs/162648632340.7.1932907459648384384.10178178@mb.ardentcoding.com/
Reviewed-by: Su Yue <l@damenly.su>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently v1 space cache clearing will delete one cache inode just in
one transaction, and then start a new transaction to delete the next
inode.
This is far from efficient and can make the already slow v1 space cache
deleting even slower, as large fs has tons of cache inodes to delete.
This patch will speed up the process by batching up to 16 inode deletion
into one transaction.
A quick benchmark of deleting 702 v1 space cache inodes would look like
this:
Unpatched: 4.898s
Patched: 0.087s
Which is obviously a big win.
Reported-by: Joshua <joshua@mailmag.net>
Link: https://lore.kernel.org/linux-btrfs/0b4cf70fc883e28c97d893a3b2f81b11@mailmag.net/
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is a report from the mailing list that one user got its filesystem
with device item bytes_used mismatch.
This problem leaves the device item with some ghost bytes_used, meaning
even if we delete all device extents of that device, the bytes_used
still won't be 0.
This itself is not a big deal, but when the user used up all its
unallocated space, write time tree-checker can be triggered and make the
fs RO, as the new device::bytes_used can be larger than
device::total_bytes.
Thus we need to fix the problem in btrfs-check to avoid above write-time
tree check warning.
This patch will add the ability to reset a device's bytes_used to both
original mode and lowmem mode.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
User reported that test fsck-tests/037-freespacetree-repair fails:
# TEST=037\* ./fsck-tests.sh
[TEST/fsck] 037-freespacetree-repair
btrfs check should have detected corruption
test failed for case 037-freespacetree-repair
The test tries to corrupt FST, call btrfs check readonly then repair FST
using btrfs check. Above case failed at the second readonly check step.
Test log said "cache and super generation don't match, space cache will
be invalidated" which is printed by validate_free_space_cache().
If cache_generation of the superblock is not -1ULL,
validate_free_space_cache() requires that cache_generation must equal
to the superblock's generation. Otherwise, it skips the check of space
cache(v1, v2) like the above case where the sb cache_generation is 0.
Since kernel commit 948462294577 ("btrfs: keep sb cache_generation
consistent with space_cache"), sb cache_generation will be set to be 0
once space cache v1 is disabled (nospace_cache/space_cache=v2). But
progs check was forgotten to be added the 0 case support.
Fix it by adding the condition if sb cache_generation is 0 in
validate_free_space_cache() as the 0 case is valid now since the
kernel commit mentioned above.
Issue: #338
Signed-off-by: Su Yue <l@damenly.su>
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Move the file to common as it's used by several parts, while still
keeping the name 'repair' although the only thing it does is adding a
corrupted extent.
Signed-off-by: David Sterba <dsterba@suse.com>
There's a group of functions that are related to opening filesystem in
various modes, this can be moved to a separate file.
Signed-off-by: David Sterba <dsterba@suse.com>
For passing authentication keys to the checksumming functions we need a
container for the key.
Pass in a btrfs_fs_info to btrfs_csum_data() so we can use the fs_info
as a container for the authentication key.
Note this is not always possible for all callers of btrfs_csum_data() so
we're just passing in NULL for now
Functions calling btrfs_csum_data() with a NULL fs_info argument are
currently not supported in the context of an authenticated file system.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Extending open_ctree with more parameters would be difficult, we'll need
to add more so factor out the parameters to a structure for easier
extension.
Signed-off-by: David Sterba <dsterba@suse.com>
For the incoming subpage support, there is a new requirement for tree
blocks. Tree blocks should not cross 64K page boundary.
For current btrfs-progs and kernel, there shouldn't be any causes to
create such tree blocks. But still, we want to detect such tree blocks
in the wild before subpage support fully lands in upstream.
This patch will add such check for both lowmem and original mode.
Currently it's just a warning, since there aren't many users using 64K
page size yet.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are cases where v1 free space cache is still left while user has
already enabled v2 cache. In that case, we still want to force v1 space
cache cleanup in btrfs-check.
This patch will only warn and not exit if v2 is detected while the user
asked to clear v1.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Inode cache feature is going to be removed in kernel 5.11. After this
kernel version items left on disk by this feature will take some extra
space. Testing showed that the size is actually negligible but for
completeness' sake give ability to users to remove such left-overs.
This is achieved by iterating every fs root and removing respective
items as well as relevant csum extents since the ino cache used the csum
tree for csums.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While debugging some corruption, I got confused because it appeared as
if we had an invalid parent set on a extent reference, because of this
message:
tree backref 67014213632 parent 5 root 5 not found in extent tree
But it turns out that parent and the root are a union, and we were just
printing it out regardless of the type of backref it was. Fix the error
message to be consistent with the other mismatch messages, simply print
parent or root, depending on the ref type.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is pretty much the same as for lowmem mode, it will try to reset
the extent item generation using either the tree block generation or
current transid.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In check_block(), we unconditionally reset extent_record::generation.
This is in fact correct, but this makes original mode fail to detect bad
extent item generation.
So change to behavior to set the generation if and only if the tree
block generation is higher.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If btrfs check detects an error on the root inode of a subvolume it
prints:
Opening filesystem to check...
Checking filesystem on /dev/vdc
UUID: 4ac7a216-bf97-4c5f-9899-0f203c20d8af
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
root 5 root dir 256 error
ERROR: errors found in fs roots
found 196608 bytes used, error(s) found
total csum bytes: 0
total tree bytes: 131072
total fs tree bytes: 32768
total extent tree bytes: 16384
btree space waste bytes: 124376
file data blocks allocated: 65536
referenced 65536
This is not very helpful since there is no specific information about
the exact error. This is due to the fact that check_root_dir doesn't
set inode_record::errors accordingly. This patch rectifies this and now
the output would look like:
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
root 5 inode 256 errors 2000, link count wrong
ERROR: errors found in fs roots
found 196608 bytes used, error(s) found
total csum bytes: 0
total tree bytes: 131072
total fs tree bytes: 32768
total extent tree bytes: 16384
btree space waste bytes: 124376
file data blocks allocated: 65536
referenced 65536
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The inode transid detect and repair is reusing the existing inode
geneartion code.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function exists in kernel side but using the _item suffix, and
objectid argument is placed before the name argument. Change the
function to reflect the kernel version.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Allow repair_qgroups() to do silent repair, so it can acts as offline
qgroup rescan.
This provides the basis for later mkfs quota support.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The original qgroup-verify integrates qgroup classification into
report_qgroups(). This behavior makes silent qgroup repair (or offline
rescan) impossible.
To repair qgroup, we must call report_qgroups() to trigger bad qgroup
classification, which will output error message.
This patch moves bad qgroup classification from report_qgroups() to
qgroup_verify_all(). Now report_qgroups() is pretty lightweight, only
doing basic qgroup difference report thus change it type to void.
And since the functionality of qgroup_verify_all() changes, change
callers to handle the new return value correctly.
Signed-off-by: Qu Wenruo <wqu@suse.com>
[ removed some comments ]
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is a report about checksum item overlap, which makes newer btrfs
kernel to reject it due to tree-checker.
Now let btrfs-progs have the same ability to detect such problem.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This would sync the code between kernel and btrfs-progs, and save at
least 1 byte for each btrfs_block_group_cache.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The whole maybe_repair_root_item() and repair_root_items() functions are
introduced to handle an ancient bug in v3.17.
However in certain extent tree corruption case, such early exit would
only exit the whole check process early on, preventing user to know
what's really wrong about the fs.
So this patch will allow the check to continue, since the ancient bug is
no long that common.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We can easily get the level from @eb parameter, thus the level is not
needed.
This is inspired by the work of Marek in U-boot.
Cc: Marek Behun <marek.behun@nic.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If we have memory allocation failure in add_cache_extent(), it will
simply exit with one error message.
That's definitely not proper, especially when all but one call sites
have handled the error.
This patch will return -ENOMEM for add_cache_extent(), and fix the only
call site which doesn't handle error from it.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
btrfs check can return strange return value for shell:
[Inferior 1 (process 48641) exited with code 0213]
^^^^
[CAUSE]
It's caused by the incorrect handling of qgroup error.
qgroup_report_ret can be -117 (-EUCLEAN), using that value with exit()
can cause overflow, causing return value not properly recognized.
[FIX]
Fix it by sanitize the return value to 0 or 1.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Reviewed-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
Valgrind reports the following error for fsck/002 (which only supports
original mode):
==97088== Conditional jump or move depends on uninitialised value(s)
==97088== at 0x15BFF6: add_data_backref (main.c:4884)
==97088== by 0x16025C: run_next_block (main.c:6452)
==97088== by 0x165539: deal_root_from_list (main.c:8471)
==97088== by 0x166040: check_chunks_and_extents (main.c:8753)
==97088== by 0x166441: do_check_chunks_and_extents (main.c:8842)
==97088== by 0x169D13: cmd_check (main.c:10324)
==97088== by 0x11CDC6: cmd_execute (commands.h:125)
==97088== by 0x11D712: main (btrfs.c:386)
[CAUSE]
In alloc_data_backref(), only ref->node is set to 0.
While ref->disk_bytenr is not initialized at all.
And then in add_data_backref(), if @back is a newly allocated data
backref, we use the garbage from back->disk_bytenr to determine if we
should reset them.
[FIX]
Fix it by initialize the whole data_backref structure in
alloc_data_backref().
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
With valgrind, fsck/002 test with original mode would report the
following valgrind error:
==90600== Conditional jump or move depends on uninitialised value(s)
==90600== at 0x15C280: pick_next_pending (main.c:4949)
==90600== by 0x15F3CF: run_next_block (main.c:6175)
==90600== by 0x1655CC: deal_root_from_list (main.c:8486)
==90600== by 0x1660C7: check_chunks_and_extents (main.c:8762)
==90600== by 0x166439: do_check_chunks_and_extents (main.c:8842)
==90600== by 0x169D0B: cmd_check (main.c:10324)
==90600== by 0x11CDC6: cmd_execute (commands.h:125)
==90600== by 0x11D712: main (btrfs.c:386)
[CAUSE]
The problem happens like this:
deal_root_from_list(@list is empty)
|- stack @last is not initialized
|- while(!list_empty(list)) {} is skipped
|- run_next_block(&last);
|- pick_next_pending(*last);
|- node_start = last;
Since the stack @last is not initialized in deal_root_from_list(), the
final node_start = last assignment would just fetch the garbage from
stack.
[FIX]
Fix the problem by initializing @last to 0, as that's exactly what the
first while loop did.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>