Sometimes it is necessary to check a mounted filesystem. This should
work fine on a quiescent filesystem or a read-only mount. Changes made
to the block device by the kernel might confuse the userspace checker,
and it might crash when it reads stale data.
Repair without mount checks is not supported right now.
Signed-off-by: David Sterba <dsterba@suse.cz>
The pointers to critical roots must be valid before we start using them,
e.g. in the space clearing code.
Signed-off-by: David Sterba <dsterba@suse.com>
Code added in 2009 (95d3f20b51) for a very short-lived change in
the format is of no concern to us nowadays.
Signed-off-by: David Sterba <dsterba@suse.com>
As btrfs_update_block_group fails when the block group is not found in
the cache, we can exit btrfs_free_block_group; there is not much to
roll back. The caller will also exit in turn.
Signed-off-by: David Sterba <dsterba@suse.com>
Tree blocks are always nodesize. As readahead is only an optimization,
exact size is not required and is only advisory.
Signed-off-by: David Sterba <dsterba@suse.com>
I found that some btrfs command options do not work because of an
incorrect getopt_long() setup.
This fixes "btrfs check -Q/-E".
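As an illustration only: one common way such a mismatch arises is a short
option that is handled in the switch but missing from the optstring, or
missing its ':' for an argument. The sketch below keeps the optstring and
the long option table in agreement. The struct and function names are
hypothetical, and the long option names are assumed to be the ones that
correspond to -Q and -E; this is not the actual patch.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Hypothetical container for the two options discussed above. */
struct check_opts {
	int qgroup_report;          /* set by -Q */
	const char *subvol_extents; /* argument of -E */
};

/* Returns 0 on success, -1 on an unknown option or missing argument. */
static int parse_check_opts(int argc, char **argv, struct check_opts *o)
{
	/* Assumed long names; each maps to the matching short option. */
	static const struct option long_opts[] = {
		{ "qgroup-report",  no_argument,       NULL, 'Q' },
		{ "subvol-extents", required_argument, NULL, 'E' },
		{ NULL, 0, NULL, 0 }
	};
	int c;

	assert(o != NULL);
	o->qgroup_report = 0;
	o->subvol_extents = NULL;
	optind = 1; /* restart scanning so the function can be called again */

	/* "QE:" must list every short option; ':' marks a required argument. */
	while ((c = getopt_long(argc, argv, "QE:", long_opts, NULL)) != -1) {
		switch (c) {
		case 'Q':
			o->qgroup_report = 1;
			break;
		case 'E':
			o->subvol_extents = optarg;
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

If 'E' were listed without the trailing ':', optarg would stay NULL and the
subvolume id would be treated as a positional argument.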
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Although lowmem mode can detect a name and hash mismatch in a dir_item,
it does so only indirectly, by checking inode_ref.
This patch enhances the dir_item check by also comparing name and
hash when checking dir_items.
Reported-by: Filippe LeMarchand <gasinvein@gmail.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In original mode, we don't check if the name in dir_item matches the
hash in key.offset.
In the following case, original mode will report nothing wrong while
lowmem mode will detect the name and hash mismatch.
------
item 72 key (79177 DIR_ITEM 54846528) itemoff 12380 itemsize 88
location key (4222342 INODE_ITEM 0) type FILE
transid 170929 data_len 0 name_len 14
name: deprecated.sxt
location key (13590433 INODE_ITEM 0) type FILE
transid 796448 data_len 0 name_len 14
name: deprecated.txt
------
In the above case, the hash of "deprecated.txt" matches 54846528,
while the hash of "deprecated.sxt" should be 2008317993.
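A minimal sketch of the comparison, assuming the directory name hash is
CRC-32C seeded with (u32)~1 and without a final inversion. All function
names here are hypothetical; the real checker operates on extent buffers
rather than C strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-32C (Castagnoli) update, reflected polynomial 0x82F63B78.
 * No initial or final inversion is applied; the caller chooses the seed.
 */
static uint32_t crc32c_update(uint32_t crc, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Assumption: the name hash is CRC-32C seeded with (u32)~1. */
static uint64_t name_hash(const char *name, size_t len)
{
	return crc32c_update((uint32_t)~1, name, len);
}

/* Returns 1 when the name hashes to the DIR_ITEM key offset, else 0. */
static int dir_item_name_matches(uint64_t key_offset, const char *name,
				 size_t name_len)
{
	assert(name != NULL);
	return name_hash(name, name_len) == key_offset;
}
```

A checker would call dir_item_name_matches() for every name stored in a
DIR_ITEM and report an error for each entry whose hash differs from
key.offset.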
Reported-by: Filippe LeMarchand <gasinvein@gmail.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Just to keep the 1st parameter the same as the kernel.
We can also save a few lines since the parameter is shorter now.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When checking chunk or dev extent, lowmem mode uses chunk length as dev
extent length, and if they mismatch, report missing chunk or dev extent
like:
------
ERROR: chunk[256 4324327424) stripe 0 did not find the related dev extent
ERROR: chunk[256 4324327424) stripe 1 did not find the related dev extent
ERROR: chunk[256 4324327424) stripe 2 did not find the related dev extent
------
However, chunk length equals dev extent length only for the
Single/DUP/RAID1 profiles.
For other profiles, this will cause tons of false alerts.
Fix it by using correct stripe length when checking chunk and dev extent
items.
This fixes the mkfs test failure when using lowmem mode check.
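A rough sketch of the per-profile calculation. The enum and function name
are hypothetical, and the divisors for the striped profiles are stated as
assumptions (data stripes per profile), not quoted from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical profile enum for this sketch; not the on-disk flag bits. */
enum profile {
	P_SINGLE, P_DUP, P_RAID1, P_RAID0, P_RAID10, P_RAID5, P_RAID6
};

/*
 * Expected length of each dev extent backing a chunk.
 * Returns 0 when the stripe geometry is invalid for the profile.
 */
static uint64_t dev_extent_length(enum profile p, uint64_t chunk_len,
				  int num_stripes, int sub_stripes)
{
	int data_stripes;

	assert(num_stripes > 0);
	switch (p) {
	case P_RAID0:
		data_stripes = num_stripes;
		break;
	case P_RAID10:
		data_stripes = sub_stripes > 0 ? num_stripes / sub_stripes : 0;
		break;
	case P_RAID5:
		data_stripes = num_stripes - 1; /* one parity stripe */
		break;
	case P_RAID6:
		data_stripes = num_stripes - 2; /* two parity stripes */
		break;
	default:
		/* Single/DUP/RAID1: every stripe holds the whole chunk. */
		data_stripes = 1;
		break;
	}
	if (data_stripes <= 0)
		return 0;
	return chunk_len / data_stripes;
}
```

With this helper, a 3 GiB RAID0 chunk on three devices is matched against
1 GiB dev extents instead of 3 GiB ones.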
Reported-by: Kai Krakow <hurikhan77@gmail.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Before this patch, btrfs check lowmem mode manually checks each chunk
item it finds, even though we already have the generic chunk validation
checker, btrfs_check_chunk_valid().
This patch uses btrfs_check_chunk_valid() to replace the open-coded
chunk validation checker in check_chunk_item().
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The only reason read_tree_block() needs a btrfs_root parameter is to get
its node/sector size.
And long ago, I introduced a compatible interface,
read_tree_block_fs_info(), to pass btrfs_fs_info instead of btrfs_root.
Since we have cleaned up all root->sector/node/stripesize users, we
should be OK to refactor the read_tree_block() function.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
As Qu mentioned in this thread
(https://www.spinics.net/lists/linux-btrfs/msg64469.html), compression
can cause a regular extent to co-exist with an inlined extent. This
coexistence makes things confusing. Since it is currently allowed and
can appear in a filesystem, fix btrfsck so that it does not emit a
bunch of error reports that would make users uneasy.
When checking a file extent, record the extent_end of the regular extent
to check if there is a gap between the regular extents. Normally there
is only one inlined extent, so the extent_end of inlined extent is
useless. However, if a regular extent can co-exist with an inlined
extent, the extent_end of the inlined extent also needs to be recorded.
Reported-by: Marc MERLIN <marc@merlins.org>
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since the incompat feature NO_HOLES still allows us to have an explicit
hole file extent, current check is too strict and will cause false
alerts like:
root 5 EXTENT_DATA[257, 0] shouldn't be hole
Fix it by removing the strict file hole extent check.
Link: https://www.spinics.net/lists/linux-btrfs/msg66374.html
Reported-by: Henk Slager <eye1tm@gmail.com>
Tested-by: Henk Slager <eye1tm@gmail.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When reading out name from inode_ref, it's possible that corrupted
name_len can lead to read beyond boundary of item or even extent buffer.
This happens when checking fuzzed image /tmp/bko-161811.raw, for both
lowmem mode and original mode.
Below is the example from lowmem mode.
ERROR: root 5 INODE REF[256 256] doesn't have related DIR_INDEX[256 216172782113783808] namelen 255 filename bar filetype 0
ERROR: root 5 INODE REF[256 256] doesn't have related DIR_ITEM[256 1306590535] namelen 255 filename bar filetype 0
WARNING: root 5 INODE[256] mode 0 shouldn't have DIR_INDEX[256 1167283096]
WARNING: root 5 DIR_ITEM[256 1167283096] name too long
==13013== Invalid read of size 1
==13013== at 0x4C31A38: memmove (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13013== by 0x431518: read_extent_buffer (extent_io.c:863)
==13013== by 0x4752AB: check_dir_item (cmds-check.c:4627)
==13013== by 0x475E5C: check_inode_item (cmds-check.c:4911)
==13013== by 0x476200: check_fs_first_inode (cmds-check.c:5011)
==13013== by 0x476276: check_fs_root_v2 (cmds-check.c:5044)
==13013== by 0x4769FB: check_fs_roots_v2 (cmds-check.c:5242)
==13013== by 0x488B5B: cmd_check (cmds-check.c:13033)
==13013== by 0x40A8C5: main (btrfs.c:246)
==13013== Address 0x5c95b80 is 0 bytes after a block of size 4,224 alloc'd
==13013== at 0x4C2CF35: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13013== by 0x4307E0: __alloc_extent_buffer (extent_io.c:538)
==13013== by 0x430C37: alloc_extent_buffer (extent_io.c:642)
==13013== by 0x413DFE: btrfs_find_create_tree_block (disk-io.c:193)
==13013== by 0x414370: read_tree_block_fs_info (disk-io.c:340)
==13013== by 0x40B5D5: read_tree_block (disk-io.h:125)
==13013== by 0x40CFD2: read_node_slot (ctree.c:652)
==13013== by 0x40E5EB: btrfs_search_slot (ctree.c:1172)
==13013== by 0x4761A8: check_fs_first_inode (cmds-check.c:5001)
==13013== by 0x476276: check_fs_root_v2 (cmds-check.c:5044)
==13013== by 0x4769FB: check_fs_roots_v2 (cmds-check.c:5242)
==13013== by 0x488B5B: cmd_check (cmds-check.c:13033)
Fix it by double checking dir_item, name_len against item boundary
before trying to read out name from extent buffer, for both original
mode and lowmem mode.
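The boundary check can be sketched as below. The function name and the
flat offset parameters are hypothetical; they stand in for the item
pointer arithmetic done on the extent buffer.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 when the name bytes [name_start, name_start + name_len) lie
 * inside the item [item_start, item_start + item_size) and the item lies
 * inside a buffer of buf_len bytes; returns 0 otherwise.
 */
static int name_within_item(uint32_t buf_len, uint32_t item_start,
			    uint32_t item_size, uint32_t name_start,
			    uint32_t name_len)
{
	/* Use 64-bit sums so a corrupted length cannot wrap around. */
	uint64_t item_end = (uint64_t)item_start + item_size;
	uint64_t name_end = (uint64_t)name_start + name_len;

	assert(buf_len > 0);
	if (item_end > buf_len)
		return 0;
	if (name_start < item_start)
		return 0;
	return name_end <= item_end;
}
```

Only when this returns 1 is it safe to copy name_len bytes out of the
buffer; otherwise the checker should report the item as corrupted.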
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When reading out name from inode_ref, it's possible that corrupted
name_len can lead to read beyond boundary of item or even extent buffer.
This happens when checking fuzzed image /tmp/bko-161811.raw, for both
lowmem mode and original mode.
ERROR: root 5 INODE REF[256 256] doesn't have related DIR_INDEX[256 504403158265495680] namelen 0 filename filetype 0
ERROR: root 5 INODE REF[256 256] doesn't have related DIR_ITEM[256 4294967294] namelen 0 filename filetype 0
WARNING: root 5 INODE_REF[256 256] name too long
==13022== Invalid read of size 8
==13022== at 0x4C319BE: memmove (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13022== by 0x431518: read_extent_buffer (extent_io.c:863)
==13022== by 0x474730: check_inode_ref (cmds-check.c:4307)
==13022== by 0x475D65: check_inode_item (cmds-check.c:4890)
==13022== by 0x476200: check_fs_first_inode (cmds-check.c:5011)
==13022== by 0x476276: check_fs_root_v2 (cmds-check.c:5044)
==13022== by 0x4769FB: check_fs_roots_v2 (cmds-check.c:5242)
==13022== by 0x488B5B: cmd_check (cmds-check.c:13033)
==13022== by 0x40A8C5: main (btrfs.c:246)
==13022== Address 0x5c96780 is 0 bytes after a block of size 4,224 alloc'd
==13022== at 0x4C2CF35: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13022== by 0x4307E0: __alloc_extent_buffer (extent_io.c:538)
==13022== by 0x430C37: alloc_extent_buffer (extent_io.c:642)
==13022== by 0x413DFE: btrfs_find_create_tree_block (disk-io.c:193)
==13022== by 0x414370: read_tree_block_fs_info (disk-io.c:340)
==13022== by 0x40B5D5: read_tree_block (disk-io.h:125)
==13022== by 0x40CFD2: read_node_slot (ctree.c:652)
==13022== by 0x40E5EB: btrfs_search_slot (ctree.c:1172)
==13022== by 0x4761A8: check_fs_first_inode (cmds-check.c:5001)
==13022== by 0x476276: check_fs_root_v2 (cmds-check.c:5044)
==13022== by 0x4769FB: check_fs_roots_v2 (cmds-check.c:5242)
==13022== by 0x488B5B: cmd_check (cmds-check.c:13033)
Fix it by double checking inode_ref, name_len against item boundary
before trying to read out name from extent buffer, for both original
mode and lowmem mode.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
fsck/004-no-dir-index makes valgrind complain about invalid reads.
==31890== Invalid read of size 1
==31890== at 0x453D09: repair_inode_backrefs (cmds-check.c:2690)
==31890== by 0x453D09: check_inode_recs (cmds-check.c:3330)
==31890== by 0x453D09: check_fs_root (cmds-check.c:4012)
==31890== by 0x45E788: check_fs_roots (cmds-check.c:4098)
==31890== by 0x45E788: cmd_check (cmds-check.c:13031)
==31890== by 0x40A88A: main (btrfs.c:246)
==31890== Address 0x5cb7b90 is 16 bytes inside a block of size 50 free'd
==31890== at 0x4C2C14B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31890== by 0x453D08: repair_inode_backrefs (cmds-check.c:2684)
==31890== by 0x453D08: check_inode_recs (cmds-check.c:3330)
==31890== by 0x453D08: check_fs_root (cmds-check.c:4012)
==31890== by 0x45E788: check_fs_roots (cmds-check.c:4098)
==31890== by 0x45E788: cmd_check (cmds-check.c:13031)
==31890== by 0x40A88A: main (btrfs.c:246)
==31890== Block was alloc'd at
==31890== at 0x4C2AF1F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31890== by 0x45055C: get_inode_backref (cmds-check.c:1075)
==31890== by 0x45055C: add_inode_backref (cmds-check.c:1097)
==31890== by 0x45180C: process_dir_item (cmds-check.c:1525)
==31890== by 0x45180C: process_one_leaf (cmds-check.c:1838)
==31890== by 0x45180C: walk_down_tree (cmds-check.c:2134)
==31890== by 0x45180C: check_fs_root (cmds-check.c:3957)
==31890== by 0x45E788: check_fs_roots (cmds-check.c:4098)
==31890== by 0x45E788: cmd_check (cmds-check.c:13031)
==31890== by 0x40A88A: main (btrfs.c:246)
==31890==
==31890== Invalid read of size 8
==31890== at 0x452D66: repair_inode_backrefs (cmds-check.c:2731)
==31890== by 0x452D66: check_inode_recs (cmds-check.c:3330)
==31890== by 0x452D66: check_fs_root (cmds-check.c:4012)
==31890== by 0x45E788: check_fs_roots (cmds-check.c:4098)
==31890== by 0x45E788: cmd_check (cmds-check.c:13031)
==31890== by 0x40A88A: main (btrfs.c:246)
==31890== Address 0x5cb7b90 is 16 bytes inside a block of size 50 free'd
==31890== at 0x4C2C14B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31890== by 0x453D08: repair_inode_backrefs (cmds-check.c:2684)
==31890== by 0x453D08: check_inode_recs (cmds-check.c:3330)
==31890== by 0x453D08: check_fs_root (cmds-check.c:4012)
==31890== by 0x45E788: check_fs_roots (cmds-check.c:4098)
==31890== by 0x45E788: cmd_check (cmds-check.c:13031)
==31890== by 0x40A88A: main (btrfs.c:246)
==31890== Block was alloc'd at
==31890== at 0x4C2AF1F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31890== by 0x45055C: get_inode_backref (cmds-check.c:1075)
==31890== by 0x45055C: add_inode_backref (cmds-check.c:1097)
==31890== by 0x45180C: process_dir_item (cmds-check.c:1525)
==31890== by 0x45180C: process_one_leaf (cmds-check.c:1838)
==31890== by 0x45180C: walk_down_tree (cmds-check.c:2134)
==31890== by 0x45180C: check_fs_root (cmds-check.c:3957)
==31890== by 0x45E788: check_fs_roots (cmds-check.c:4098)
==31890== by 0x45E788: cmd_check (cmds-check.c:13031)
==31890== by 0x40A88A: main (btrfs.c:246)
==31890==
While iterating over backrefs in repair_inode_backrefs, there are
several ways to repair a backref, depending on
backref->found_dir_item and backref->found_dir_index. Two of these
branches may free the backref, but the following checks still access
the freed memory.
Because these branches are independent, fix it by letting
repair_inode_backrefs skip to the next backref after the free.
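The pattern can be sketched with a simplified list. The record layout and
the branch conditions below are hypothetical; only the "continue right
after free" shape is taken from the description above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified backref record for this sketch. */
struct backref {
	struct backref *next;
	int found_dir_item;
	int found_dir_index;
	int repaired;
};

/* Walks the list, frees unusable entries, and returns how many were freed. */
static int repair_backrefs(struct backref **head)
{
	struct backref **link = head;
	int freed = 0;

	assert(head != NULL);
	while (*link) {
		struct backref *cur = *link;

		if (!cur->found_dir_item && !cur->found_dir_index) {
			*link = cur->next; /* unlink before freeing */
			free(cur);
			freed++;
			continue; /* cur is gone: do not fall through */
		}
		/* Later, independent branch that still dereferences cur. */
		if (cur->found_dir_item != cur->found_dir_index)
			cur->repaired = 1;
		link = &cur->next;
	}
	return freed;
}
```

Without the continue, the second branch would read cur->found_dir_item
from freed memory, which is the kind of access valgrind reports above.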
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since we memset tmpl, max_size==0. This does not seem consistent with nr = 1.
In check_extent_refs, we will call:
set_extent_dirty(root->fs_info->excluded_extents,
rec->start,
rec->start + rec->max_size - 1);
This ends up with BUG_ON(end < start) in insert_state.
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When this happens, we will trip a BUG_ON(end < start) in insert_state
because in check_extent_refs, we use this max_size expecting it's not zero:
set_extent_dirty(root->fs_info->excluded_extents,
rec->start,
rec->start + rec->max_size - 1);
See https://bugzilla.redhat.com/show_bug.cgi?id=1435567
for an example where this scenario occurs.
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.com>
See https://bugzilla.redhat.com/show_bug.cgi?id=1435567 for an example
where the message occurs.
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
[ un-indent strings overflowing 80 cols ]
Signed-off-by: David Sterba <dsterba@suse.com>
Fuzzed image bko-161821.raw causes btrfs check to hit a segmentation
fault. The function check_owner_ref attempts to access a nonexistent
quota tree when dealing with extent_item [4198400 4096] in the corrupted
filesystem.
The function btrfs_new_fs_info always allocates memory for
fs_info->quota_root regardless of whether quota_tree exists or not.
Additionally, the function btrfs_read_fs_root will directly return
fs_info->quota_root if location->objectid == BTRFS_QUOTA_TREE_OBJECTID.
This patch does the following things:
1. Do extra check and return ENOENT if quota tree does not exist in the
function btrfs_read_fs_root.
2. Free useless fs_info->quota_root in the function btrfs_setup_all_roots
to reduce confusion.
3. free_extent_buffer even if check_child_node failed in the function
walk_down_tree.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
fsck/003-shift-offsets makes valgrind complain about memory leaks.
==5910==
==5910== HEAP SUMMARY:
==5910== in use at exit: 1,112 bytes in 11 blocks
==5910== total heap usage: 161 allocs, 150 frees, 164,800 bytes allocated
==5910==
==5910== 216 (72 direct, 144 indirect) bytes in 1 blocks are definitely lost in loss record 3 of 5
==5910== at 0x4C2AF1F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5910== by 0x4815A3: add_root_item_to_list (cmds-check.c:9683)
==5910== by 0x481CE2: check_chunks_and_extents (cmds-check.c:9886)
==5910== by 0x48888B: cmd_check (cmds-check.c:12977)
==5910== by 0x40A8C5: main (btrfs.c:246)
==5910==
The check_chunks_and_extents() memory leaks are caused by not freeing
added root items of normal_trees and dropping_trees.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In check_extent_data_item(), after checking extent item of one data
extent, we search inlined data backref, then EXTENT_DATA_REF_KEY.
But we didn't search SHARED_DATA_REF, so if the backref is a
SHARED_DATA_REF, we will raise a false alert about a lost backref.
Fix by also checking SHARED_DATA_REF_KEY in check_extent_data_item().
Reported-by: Chris Murphy <chris@colorremedies.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In lowmem mode, 'walk_down_tree_v2' returns negative values whether
the error is fatal or not. This causes the loop that calls
'walk_down_tree_v2' to break even when the error is tolerable, so the
processing of subsequent nodes is skipped.
Fix it by redefining the meanings of the values 'walk_down_tree_v2'
returns.
Do a similar fix for 'process_one_leaf_v2'.
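One possible convention, shown only as an assumption since the exact
meanings are defined by the patch itself: negative values are fatal and
stop the walk, positive values are non-fatal error bits that are recorded
while the walk continues. All names below are hypothetical.

```c
#include <assert.h>

/* Hypothetical summary of one tree walk. */
struct walk_result {
	int err_bits; /* OR of all non-fatal error bits seen */
	int visited;  /* number of nodes processed */
	int fatal;    /* first negative return value, or 0 */
};

/*
 * node_status[i] stands in for the value returned when processing node i:
 *   < 0  fatal error, stop walking
 *   > 0  non-fatal error bits, record them and keep walking
 *   == 0 no error
 */
static struct walk_result walk_nodes(const int *node_status, int n)
{
	struct walk_result r = { 0, 0, 0 };

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		int ret = node_status[i];

		r.visited++;
		if (ret < 0) {
			r.fatal = ret;
			break;
		}
		r.err_bits |= ret;
	}
	return r;
}
```

With this split, a tolerable problem in one node no longer hides problems
in the nodes that follow it.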
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If first inode item is missing, lowmem check will detect it but does not
output any error message.
Add error message for it.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Old lowmem check doesn't check if the inline extent is compressed and
always checks extent numbytes against inline item size.
And when it finds the extent numbytes mismatch with inline item size it
doesn't output any error message, just return error silently, making it
quite hard to debug.
Fix it by only checking extent numbytes against inline item size when
the extent is not compressed, and output error message.
Reported-by: Christoph Anton Mitterer <calestyo@scientia.net>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If one extent item has no inline ref, btrfs lowmem mode check can give
false alert without outputting any error message.
The problem is lowmem mode always assumes that extent item has inline
refs, and when it encounters such case it flags the extent item has
wrong size, but doesn't output the error message.
Although such an image has already been submitted, at commit time
another bug in the cmds-check return value kept the test from detecting
the problem until that bug was fixed.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Btrfs lowmem check can report false csum error like:
ERROR: root 5 EXTENT_DATA[257 0] datasum missing
ERROR: root 5 EXTENT_DATA[257 4096] prealloc shouldn't have datasum
This is because the lowmem check code always compares the found csum
size with the whole extent that the data extent points to.
Normally that's OK, but when a prealloc extent is written, or a reflink
is done, a data extent can point to part of a larger extent, making the
csum check wrong.
To fix it, the csum check part is modified to handle plain and
compressed extents in different ways:
1) Plain extent
Only search csums for the range it refers to.
So the search range is from (disk_bytenr + extent_offset) and search
length is (extent_num_bytes)
2) Compressed extent
Search the whole extent.
Search range is from (disk_bytenr) and search length is
(disk_num_bytes)
Reported-by: Chris Murphy <chris@colorremedies.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Although we output an error like "errors found in extent allocation
tree or chunk allocation", we lack such output for other trees, leaving
the final "found error is %d" to catch the last return value (which is
sometimes cleared).
This patch adds extra error messages for the top-level error paths, and
changes the last "found error is %d" to "error(s) found" or "no error
found".
Cc: Christoph Anton Mitterer <calestyo@scientia.net>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since btrfs_search_slot() can point to a slot beyond the leaf's
capacity, we should pay extra attention when searching forward from
that slot.
In lowmem check, several places search forward this way:
1) Block group item used space check
2) Device item used space check
3) Data extent backref check.
In the following case for block group item check, btrfs lowmem mode
check will skip the block group and report false alert:
leaf 29405184 items 37 free space 1273 generation 11 owner 2
...
item 36 key (77594624 EXTENT_ITEM 2097152)
extent refs 1 gen 8 flags DATA
extent data backref root 5 objectid 265 offset 0 count 1
leaf 29409280 items 43 free space 670 generation 11 owner 2
item 0 key (96468992 EXTENT_ITEM 2097152)
extent refs 1 gen 8 flags DATA
extent data backref root 5 objectid 274 offset 0 count 1
item 1 key (96468992 BLOCK_GROUP_ITEM 33554432)
block group used 2265088 chunk_objectid 256 flags DATA
When checking block group item, we will search key (96468992 0 0) to
start from the first item in the block group.
While search_slot() will point to leaf 29405184, slot 37 which is beyond
leaf capacity.
And when reading key from slot 37, uninitialized data can be read out
and cause us to exit block group item check, leading to false alert.
Fix it by checking path.slot[0] before reading out the key.
Reported-by: Christoph Anton Mitterer <calestyo@scientia.net>
Reported-by: Chris Murphy <chris@colorremedies.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>