For the incoming subpage support, there is a new requirement for tree
blocks. Tree blocks should not cross 64K page boundary.
For current btrfs-progs and kernel, there shouldn't be any causes to
create such tree blocks. But still, we want to detect such tree blocks
in the wild before subpage support fully lands in upstream.
This patch will add such check for both lowmem and original mode.
Currently it's just a warning, since there aren't many users using 64K
page size yet.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is an internal report about bad extent item generation triggering
tree-checker.
This patch will add the repair ability to btrfs check --mode=lowmem
mode, by resetting the generation field of extent item.
Currently the correct generation for tree block is fetched from its
header, while for data extent it uses transid as fallback.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are quite some reports on kernel rejecting invalid inode
generation, but it turns out to be that, kernel is just rejecting inode
transid. It's a bug in kernel error message.
To solve the problem and make the fs mountable again, add the detect and
repair support for lowmem mode.
The implementation is pretty much the same, just re-use the existing
inode generation detect and repair code.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The variable @root is only set but not utilized, while we only utilize
@root1.
Replace @root1 with @root, then remove the @root1.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In check_chunk_item() we search extent tree for block group item.
Refactor this part into a separate function, find_block_group_item(), so
that later skinny-bg-tree feature can reuse it.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This would sync the code between kernel and btrfs-progs, and save at
least 1 byte for each btrfs_block_group_cache.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There are some reports on fsck/001 test segfault failure with lowmem mode.
While I failed to reproduce it, valgrind still catches it with the
following output:
Delete backref in extent [12845056 1048576]
ERROR: file extent [257, 0] has unaligned disk bytenr: 755944791, should be aligned to 4096
ERROR: file extent[257 0] root 5 owner 5 backref lost
Deleted root 5 item[257, 108, 0]
==29080== Conditional jump or move depends on uninitialised value(s)
==29080== at 0x1A81D7: btrfs_release_path (ctree.c:97)
==29080== by 0x192C33: repair_extent_data_item (mode-lowmem.c:3330)
==29080== by 0x1962FF: check_leaf_items (mode-lowmem.c:4696)
==29080== by 0x196ABF: walk_down_tree (mode-lowmem.c:4858)
==29080== by 0x197762: check_btrfs_root (mode-lowmem.c:5157)
==29080== by 0x198335: check_chunks_and_extents_lowmem (mode-lowmem.c:5450)
==29080== by 0x166414: do_check_chunks_and_extents (main.c:8829)
==29080== by 0x169CF7: cmd_check (main.c:10313)
==29080== by 0x11CDC6: cmd_execute (commands.h:125)
==29080== by 0x11D712: main (btrfs.c:386)
==29080==
[CAUSE]
In repair_extent_data_item() if we find unaligned file extent, we just
delete it and kick in hole punch procedure.
The problem is, file extent deletion is done before initializing @path.
And when the deletion is done without problem, we will goto out tag,
which will release @path, containing uninitialized values, and
triggering segfault.
[FIX]
Don't try to abort trans nor free path if we're going through file
extent deletion routine.
Fixes: 0617bde3bc ("btrfs-progs: lowmem: delete unaligned bytes extent data under repair")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Lowmem check had the opposite problem of normal check, it caught gaps
that started at 0, but would still fail with my fixes in place. This is
because lowmem check doesn't take into account the isize of the inode.
Address this by making sure we do not complain about gaps that are after
isize. This makes lowmem pass with my fixes applied, and still fail
without my fixes.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We are going to touch dirty_bgs in transaction directly, so every call
chain should pass @trans to the leaf functions.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <Damenly_Su@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since older `btrfs check --init-extent-tree` could cause invalid
EXTENT_ITEM generation for data extents, add such check to lowmem mode
check.
Also add such generation check to file extents too.
For the repair part, I don't have any good idea yet. So affected user
may depend on --init-extent-tree again.
Reviewed-by: Su Yue <Damenly_Su@gmx.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are at least two bug reports of kernel tree-checker complaining
about invalid inode generation.
All offending inodes seem to be caused by old kernel around 2014, with
inode generation overflow.
So add such check and repair ability to lowmem mode check first.
This involves:
- Calculate the inode generation upper limit
If it's an inode from log tree, then the upper limit is
super_generation + 1, otherwise it's super_generation.
- Check if the inode generation is larger than the upper limit
- Repair by resetting inode generation to current transaction
generation
Reported-by: Charles Wright <charles.v.wright@gmail.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For lowmem mode, if we hit a bad inode mode, normally it is reported
when we checking the DIR_INDEX/DIR_ITEM of the parent inode.
If we didn't repair at that time, the error will be recorded even if we
fixed it later.
So this patch will check for INODE_ITEM_MISMATCH error type, and if it's
really caused by invalid imode, repair it and clear the error.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For orphan inodes, kernel won't update its nbytes and size since it's a
waste of time.
So lowmem check can report false alert on some orphan inodes.
Fix it by checking if the inode is an orphan before
complaining/repairing its nbytes.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since kernel is going to reject any root item which is newer than super
block generation, we need to provide a way to fix such problem in
btrfs-check.
This patch addes the ability to report and repair root generation in
lowmem mode.
This is done by cowing the root node, so we will update the root
generation along with the root node generation at commit transaction
time.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Create directory for all sources that can be used by anything that's not
rellated to a relevant kernel part, all common functions, helpers,
utilities that do not fit any other specific category.
The traditional location would be probably lib/ with all things that are
statically linked to the main binaries, but we have libbtrfs and
libbtrfsutil so this would be confusing.
Signed-off-by: David Sterba <dsterba@suse.com>
In lowmem mode, we check fs roots and free space cache by iterating
each root item and inode item, using btrfs_next_item() and a path
pointing to the root tree.
However in repair mode, check_fs_root() can modify the fs root, thus
CoWs the tree root, and the old path in check_fs
It could lead to strange behavior, e.g. after repairing a fs tree, the
path can point to a fs tree.
Since no ROOT_ITEM exists in fs tree, all remaining trees are skipped in
repair mode.
This bug exists from the early time of lowmem mode repair, and is only
exposed by recent free space inode check code. (Fs tree inodes are
passed to free space inode check, causing false alerts and repair
failure).
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is one report of compressed extent happens in btrfs, but has no
csum and then leads to possible decompress error screwing up kernel
memory.
Although it's a kernel bug, and won't cause problem until compressed
data get corrupted, let's catch such problem in advance.
This patch will catch any unexpected compressed extent with:
1) missing csum
2) nodatasum flag set in the inode item
This is for lowmem mode.
Reported-by: James Harvey <jamespharvey20@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Unlike inodes in fs roots, we don't really check the inode items in root
tree, in fact we just skip everything other than ROOT_ITEM and ROOT_REF.
This makes invalid inode items sneak into root tree.
For example:
item 9 key (256 INODE_ITEM 0) itemoff 13702 itemsize 160
generation 30 transid 30 size 65536 nbytes 1507328
block group 0 mode 0 links 1 uid 0 gid 0 rdev 0
^ Should be 100600
sequence 23 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
atime 0.0 (1970-01-01 08:00:00)
ctime 1553491158.189771625 (2019-03-25 13:19:18)
mtime 0.0 (1970-01-01 08:00:00)
otime 0.0 (1970-01-01 08:00:00)
There is a report of such problem in the mail list.
This patch will check and repair inode items of free space cache inodes in
lowmem mode.
Since free space cache inodes doesn't have INODE_REF but still has 1
link, we can't use check_inode_item() directly.
Instead we only check the inode mode, as that's the important part.
The check and repair function: check_repair_free_space_inode() is also
exported for original mode.
Signed-off-by: Qu Wenruo <wqu@suse.com>
In root tree, we only have 2 types of inodes:
- ROOT_TREE_DIR inode
Its mode is fixed to 40755
- free space cache inodes
Its mode is fixed to 100600
This patch will add the ability to repair such inodes to lowmem mode.
For fs/subvolume tree error, at least we haven't see such corruption
yet, so we don't need to rush to fix corruption in fs trees yet.
The repair function, reset_imode() and repair_imode_common() can be
reused by later original mode patch, so it's placed in check/mode-common.c.
Signed-off-by: Qu Wenruo <wqu@suse.com>
There is one report about invalid free space cache inode mode.
Normally free space cache inode should have mode 100600 (regular file,
no uid/gid/sticky bit, rw------ bit).
But in that report, we have free space cache inode mode as 0.
So at least btrfs check should report invalid inode mode.
This patch will at least make btrfs check lowmem mode to detect this
problem.
Please note that, this check only applies to inodes in fs/subvol trees.
It doesn't apply to free space cache inodes.
Reported-by: Thorsten Hirsch <t.hirsch@web.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
If found a extent data item has unaligned part, lowmem repair
just deletes it.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
The function can delete items in trees besides extent tree.
Rename and move it for further use.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
[Update comment, solve merge conflicts]
Signed-off-by: Qu Wenruo <wqu@suse.com>
Add support to check unaligned disk_bytenr for extent_data.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Previously, @err are assigned immediately after check but before
repair.
repair_extent_item()'s return value also confuses the caller. If
error has been repaired and returns 0, check_extent_item() will try
to continue check the nonexistent and cause flase alerts.
Here make repair_extent_item()'s return codes only represents status
of the extent item, error bits are handled in caller of the repair
function.
Change of @err after repair.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
[Solve conflicts with DIR_ITEM hash mismatch patchset]
Signed-off-by: Qu Wenruo <wqu@suse.com>
For files, lowmem repair will try to check nbytes and isize,
but isize check depends nbytes.
Once bytes has been repaired, then isize should be checked and
repaired.
So move nbytes check before isize check. Also set nbytes to
extent_size once repaired successfully.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Since repair will do CoW, the outer path may be invalid.
This patch will add an argument, @path, to punch_extent_hole().
When punch_extent_hole() returns, path will still point to the same key.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
[Update comment and commit message]
Signed-off-by: Qu Wenruo <wqu@suse.com>
The 'end' parameter of check_file_extent tracks the ending offset of the
last checked extent. This is used to detect gaps between adjacent extents.
Currently such gaps are wrongly detected since for regular extents only
the size of the extent is added to the 'end' parameter. This results in
wrongly considering all extents of a file as having gaps between them
when only 2 of them really have a gap as seen in the example below.
Solution:
The extent_end variable should set to the sum of the offset and the
extent_num_bytes of the file extent.
Example:
Suppose that lowmem check the following file extent of inode 257.
item 6 key (257 EXTENT_DATA 0) itemoff 15813 itemsize 53
generation 6 type 1 (regular)
extent data disk byte 13631488 nr 4096
extent data offset 0 nr 4096 ram 4096
extent compression 0 (none)
item 7 key (257 EXTENT_DATA 8192) itemoff 15760 itemsize 53
generation 6 type 1 (regular)
extent data disk byte 13631488 nr 4096
extent data offset 0 nr 4096 ram 4096
extent compression 0 (none)
item 8 key (257 EXTENT_DATA 12288) itemoff 15707 itemsize 53
generation 6 type 1 (regular)
extent data disk byte 13631488 nr 4096
extent data offset 0 nr 4096 ram 4096
extent compression 0 (none)
For inode 257, check_inode_item set extent_end to 0, then call
check_file_extent to check item {6,7,8}.
item 6)
offset(0) == extent_end(0)
extent_end = extent_end(0) + extent_num_bytes(4096)
item 7)
offset(8192) != extent_end(4096)
extent_end = extent_end(4096) + extent_num_bytes(4096)
^^^
The old extent_end should replace by offset(8192).
item 8)
offset(12288) != extent_end(8192)
^^^
But there is no gap between item {7,8}.
Fixes: d88da10ddd ("btrfs-progs: check: introduce function to check file extent")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
[Move this patch as the 1st patch, since it's an independent fix]
Signed-off-by: Qu Wenruo <wqu@suse.com>
For DIR_ITEM with mismatch hash, we could just remove the offending dir
item from the tree.
Lowmem mode will handle the rest, either re-create the correct dir_item
or move the orphan inode to lost+found.
This is especially important for old filesystems, since later kernel
introduces stricter tree-checker, which could detect such hash mismatch
and refuse to read the corrupted leaf.
With this repair ability, user could repair with 'btrfs check
--mode=lowmem --repair'.
Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1111991
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add such check to check_dev_item(), since at that time we're also
iterating dev extents for dev item accounting.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Move "\n" at end of the sentence to print.
Fixes: 281eec7a9d ("btrfs-progs: check: repair inode nbytes in lowmem mode")
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Found using -Wmissing-prototypes in GCC. This should improve LTO
behavior.
Note that set_free_space_tree_thresholds is an unused function. Adding
inline seems to remove the unused function warning.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Similar to the changes where strerror(errno) was converted, continue
with the remaining cases where the argument was stored in another
variable.
The savings in object size are about 4500 bytes:
$ size btrfs.old btrfs.new
text data bss dec hex filename
805055 24248 19748 849051 cf49b btrfs.old
804527 24248 19748 848523 cf28b btrfs.new
Signed-off-by: David Sterba <dsterba@suse.com>
Make the checks in check_file_extent a bit more explicit. First we check
for unknown type and fail accordingly. Then we check for inline extent
and handle it in the newly introduced check_file_extent_inline. Finally
if none of the above checks triggered then we must have a regular or
prealloc extents.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>