To make original mode to repair imode error in subvolume trees, this
patch will do:
- Remove the show-stopper checks for root->objectid.
Now repair_imode_original() will accept inodes in subvolume trees.
- Export detect_imode() for original mode
Due to the call requirement, original mode must use an existing trans
handler to do the repair, thus we need to re-implement most of the
work done in repair_imode_common().
- Make repair_imode_original() to use detect_imode().
- Free the path after reset_imode()
reset_imode() keeps the path, as lowmem mode uses path to locate its
current check position.
But for original mode, the unreleased path can cause later repair to
report warning, so we need to manually release the path.
- Update rec->imode after imode reset
So later repair depending on rec->imode can get correct value.
- Move the repair before repair_inode_nlinks()
repair_inode_nlinks() needs correct imode to add DIR_INDEX/DIR_ITEM.
So moving the repair before repair_inode_nlinks() makes the latter
repair happier.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function will be later used by common mode code, so export it.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since kernel is going to reject any root item which is newer than super
block generation, we need to provide a way to fix such problem in
btrfs-check.
This patch addes the ability to report and repair root generation in
lowmem mode.
This is done by cowing the root node, so we will update the root
generation along with the root node generation at commit transaction
time.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Unlike inodes in fs roots, we don't really check the inode items in root
tree, in fact we just skip everything other than ROOT_ITEM and ROOT_REF.
This makes invalid inode items sneak into root tree.
For example:
item 9 key (256 INODE_ITEM 0) itemoff 13702 itemsize 160
generation 30 transid 30 size 65536 nbytes 1507328
block group 0 mode 0 links 1 uid 0 gid 0 rdev 0
^ Should be 100600
sequence 23 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
atime 0.0 (1970-01-01 08:00:00)
ctime 1553491158.189771625 (2019-03-25 13:19:18)
mtime 0.0 (1970-01-01 08:00:00)
otime 0.0 (1970-01-01 08:00:00)
There is a report of such problem in the mail list.
This patch will check and repair inode items of free space cache inodes in
lowmem mode.
Since free space cache inodes doesn't have INODE_REF but still has 1
link, we can't use check_inode_item() directly.
Instead we only check the inode mode, as that's the important part.
The check and repair function: check_repair_free_space_inode() is also
exported for original mode.
Signed-off-by: Qu Wenruo <wqu@suse.com>
In root tree, we only have 2 types of inodes:
- ROOT_TREE_DIR inode
Its mode is fixed to 40755
- free space cache inodes
Its mode is fixed to 100600
This patch will add the ability to repair such inodes to lowmem mode.
For fs/subvolume tree error, at least we haven't see such corruption
yet, so we don't need to rush to fix corruption in fs trees yet.
The repair function, reset_imode() and repair_imode_common() can be
reused by later original mode patch, so it's placed in check/mode-common.c.
Signed-off-by: Qu Wenruo <wqu@suse.com>
There is one report about invalid free space cache inode mode.
Normally free space cache inode should have mode 100600 (regular file,
no uid/gid/sticky bit, rw------ bit).
But in that report, we have free space cache inode mode as 0.
So at least btrfs check should report invalid inode mode.
This patch will at least make btrfs check lowmem mode to detect this
problem.
Please note that, this check only applies to inodes in fs/subvol trees.
It doesn't apply to free space cache inodes.
Reported-by: Thorsten Hirsch <t.hirsch@web.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
For DIR_ITEM with mismatch hash, we could just remove the offending dir
item from the tree.
Lowmem mode will handle the rest, either re-create the correct dir_item
or move the orphan inode to lost+found.
This is especially important for old filesystems, since later kernel
introduces stricter tree-checker, which could detect such hash mismatch
and refuse to read the corrupted leaf.
With this repair ability, user could repair with 'btrfs check
--mode=lowmem --repair'.
Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1111991
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We reuse the task_position enum and task_ctx struct of the original progress
indicator, adding more values and fields for our needs.
Then add hooks in all steps of the check to properly record progress.
Here's how the output looks like on a 22 Tb 5-disk RAID1 FS:
Opening filesystem to check...
Checking filesystem on /dev/mapper/luks-ST10000VN0004-XXXXXXXX
UUID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
[1/7] checking extents (0:20:21 elapsed, 950958 items checked)
[2/7] checking root items (0:01:29 elapsed, 15121 items checked)
[3/7] checking free space cache (0:00:11 elapsed, 4928 items checked)
[4/7] checking fs roots (0:51:31 elapsed, 600892 items checked)
[5/7] checking csums (0:14:35 elapsed, 754522 items checked)
[6/7] checking root refs (0:00:00 elapsed, 232 items checked)
[7/7] checking quota groups skipped (not enabled on this FS)
found 5286458060800 bytes used, no error found
Signed-off-by: Stéphane Lesimple <stephane_btrfs@lesimple.fr>
Signed-off-by: David Sterba <dsterba@suse.com>
Commit d17d6663c99c ("btrfs-progs: lowmem check: Fix regression which
screws up extent allocator") removes pin_metadata_blocks() from lowmem
repair. So we have to find another way to exclude extents which should
be occupied by existing tree blocks.
Modify pin_down_tree_blocks() and rename it to traverse_tree_blocks
for sharing code with new function exclude_metadata_blocks().
* exclude_metadata_blocks() traverses and marks extents of all tree
blocks dirty in fs_info->excluded_extents.
* cleanup_excluded_extents() is responsible for cleanup.
Export them to mode-common.h since they will be used both in original
and lowmem modes.
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Move pin_down_tree_blocks from main.c to mode-common.c for
further patches.
And export pin_metadata_blocks to mode-common.h.
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Under some cases the filesystem checker reports an error when it finds
checksum items for an extent that is referenced by an inode as a prealloc
extent. Such cases are not an error when the extent is actually shared
(was cloned/reflinked) with other inodes and was written through one of
those other inodes.
Example:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ touch /mnt/foo
$ xfs_io -c "falloc 0 256K" /mnt/foo
$ sync
$ xfs_io -c "pwrite -S 0xab 0 256K" /mnt/foo
$ touch /mnt/bar
$ xfs_io -c "reflink /mnt/foo 0 0 256K" /mnt/bar
$ xfs_io -c "fsync" /mnt/bar
<power fail>
$ mount /dev/sdb /mnt
$ umount /mnt
$ btrfs check /dev/sdc
Checking filesystem on /dev/sdb
UUID: 52d3006e-ee3b-40eb-aa21-e56253a03d39
checking extents
checking free space cache
checking fs roots
root 5 inode 257 errors 800, odd csum item
ERROR: errors found in fs roots
found 688128 bytes used, error(s) found
total csum bytes: 256
total tree bytes: 163840
total fs tree bytes: 65536
total extent tree bytes: 16384
btree space waste bytes: 138819
file data blocks allocated: 10747904
referenced 10747904
$ echo $?
1
So teach check to not report such cases as errors by checking if the
extent is shared with other inodes and if so, consider it an error the
existence of checksum items only if all those other inodes are referencing
the extent as a prealloc extent.
This case can be hit often when running the generic/475 testcase from
fstests.
A test case will follow in a separate patch.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>