While testing snapshot deletion with dm-log-writes I saw that I was
failing the fsck sometimes when the fs was actually in the correct
state. This is because we only skip blocks on the same level of
root_item->drop_level. If the drop_level < the root level then we could
very well walk into nodes that we wouldn't actually walk into on fs
mount, because the drop_progress is further ahead in the slot of the
root. Instead only process the slots of the nodes that are above the
drop_progress key. With this patch in place we no longer improperly
fail to check fs'es that have a drop_progress set with a drop_level <
root level.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Create directory for all sources that can be used by anything that's not
rellated to a relevant kernel part, all common functions, helpers,
utilities that do not fit any other specific category.
The traditional location would be probably lib/ with all things that are
statically linked to the main binaries, but we have libbtrfs and
libbtrfsutil so this would be confusing.
Signed-off-by: David Sterba <dsterba@suse.com>
Commit 756105181e ("btrfs-progs: check: supplement extent backref
list with rbtree") changed the backref implementation to use rb tree
and also commented the old implementations. It's been almost 2 years
since that change and it's unlikely the old version will ever be used,
so just remove it.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that every call site has a cmd_struct, we can just pass the cmd_struct
to usage to print the usager information. This allows us to interpret
the format flags we'll add later in this series to inform the user of
which output formats any given command supports.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch passes the cmd_struct to the command callback function. This
has several purposes: It allows the command callback to identify which
command was used to call it. It also gives us direct access to the
usage associated with that command.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Rather than having global command usage and callbacks used to create
cmd_structs in the command array, establish the cmd_struct structures
separately and use those. The next commit in the series passes the
cmd_struct to the command callbacks such that we can access flags
and determine which of several potential command we were called as.
This establishes several macros to more easily define the commands
within each command's source.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In lowmem mode, we check fs roots and free space cache by iterating
each root item and inode item, using btrfs_next_item() and a path
pointing to the root tree.
However in repair mode, check_fs_root() can modify the fs root, thus
CoWs the tree root, and the old path in check_fs
It could lead to strange behavior, e.g. after repairing a fs tree, the
path can point to a fs tree.
Since no ROOT_ITEM exists in fs tree, all remaining trees are skipped in
repair mode.
This bug exists from the early time of lowmem mode repair, and is only
exposed by recent free space inode check code. (Fs tree inodes are
passed to free space inode check, causing false alerts and repair
failure).
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
BTRFS_COMPAT_EXTENT_TREE_V0 is introduced for a short time in kernel,
and it's over 10 years ago.
Nowadays there should be no user for that feature, and kernel has remove
this support in Jun, 2018. There is no need for btrfs-progs to support
it.
This patch will remove EXTENT_TREE_V0 related code and replace those
BUG_ON() to a more graceful error message.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is one report of compressed extent happens in btrfs, but has no
csum and then leads to possible decompress error screwing up kernel
memory.
Although it's a kernel bug, and won't cause problem until compressed
data get corrupted, let's catch such problem in advance.
This patch will catch any unexpected compressed extent with:
1) 0 or less than expected csum
2) nodatasum flag set in the inode item
This is for original mode.
Reported-by: James Harvey <jamespharvey20@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is one report of compressed extent happens in btrfs, but has no
csum and then leads to possible decompress error screwing up kernel
memory.
Although it's a kernel bug, and won't cause problem until compressed
data get corrupted, let's catch such problem in advance.
This patch will catch any unexpected compressed extent with:
1) missing csum
2) nodatasum flag set in the inode item
This is for lowmem mode.
Reported-by: James Harvey <jamespharvey20@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When repairing a file system created by a very old kernel, I ran into
issues fixing up the extent flags since fixup_extent_flags assumed
that a METADATA_ITEM would be present if the record was for metadata.
Since METADATA_ITEMs don't exist without skinny metadata, we need to
fall back to EXTENT_ITEMs. This also falls back to EXTENT_ITEMs even
with skinny metadata enabled as other parts of the tools do.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Just like lowmem mode, also check and repair free space cache inode
item.
And since we don't really have a good timing/function to check free
space chace inodes, we use the same common mode
check_repair_free_space_inode() when iterating root tree.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Unlike inodes in fs roots, we don't really check the inode items in root
tree, in fact we just skip everything other than ROOT_ITEM and ROOT_REF.
This makes invalid inode items sneak into root tree.
For example:
item 9 key (256 INODE_ITEM 0) itemoff 13702 itemsize 160
generation 30 transid 30 size 65536 nbytes 1507328
block group 0 mode 0 links 1 uid 0 gid 0 rdev 0
^ Should be 100600
sequence 23 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
atime 0.0 (1970-01-01 08:00:00)
ctime 1553491158.189771625 (2019-03-25 13:19:18)
mtime 0.0 (1970-01-01 08:00:00)
otime 0.0 (1970-01-01 08:00:00)
There is a report of such problem in the mail list.
This patch will check and repair inode items of free space cache inodes in
lowmem mode.
Since free space cache inodes doesn't have INODE_REF but still has 1
link, we can't use check_inode_item() directly.
Instead we only check the inode mode, as that's the important part.
The check and repair function: check_repair_free_space_inode() is also
exported for original mode.
Signed-off-by: Qu Wenruo <wqu@suse.com>
In root tree, we only have 2 types of inodes:
- ROOT_TREE_DIR inode
Its mode is fixed to 40755
- free space cache inodes
Its mode is fixed to 100600
This patch will add the ability to repair such inodes to lowmem mode.
For fs/subvolume tree error, at least we haven't see such corruption
yet, so we don't need to rush to fix corruption in fs trees yet.
The repair function, reset_imode() and repair_imode_common() can be
reused by later original mode patch, so it's placed in check/mode-common.c.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Just like lowmem mode, check inode mode, specially for S_IFMT bits and
beyond.
Please note that, this check only applies to inodes in fs/subvol trees.
It doesn't apply to free space cache inodes.
Reported-by: Thorsten Hirsch <t.hirsch@web.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
There is one report about invalid free space cache inode mode.
Normally free space cache inode should have mode 100600 (regular file,
no uid/gid/sticky bit, rw------ bit).
But in that report, we have free space cache inode mode as 0.
So at least btrfs check should report invalid inode mode.
This patch will at least make btrfs check lowmem mode to detect this
problem.
Please note that, this check only applies to inodes in fs/subvol trees.
It doesn't apply to free space cache inodes.
Reported-by: Thorsten Hirsch <t.hirsch@web.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
For test case fsck-tests/001-bad-file-extent-bytenr, we have an
obviously hand crafted image with unaligned file extent:
item 7 key (257 EXTENT_DATA 0) itemoff 3453 itemsize 53
generation 6 type 1 (regular)
extent data disk byte 755944791 nr 1048576
extent data offset 0 nr 1048576 ram 1048576
extent compression 0 (none)
disk bytenr 755944791 is obviously unaligned (not even).
For such obviously corrupted file extent, we should just delete the file
extent.
Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
[Update commit message and comment]
Signed-off-by: Qu Wenruo <wqu@suse.com>
Function find_possible_backrefs() is used to locate the file extents
referring to an data extent.
For data extent backref, its btrfs_extent_data_ref structure has
the following members:
- root
Which root refers to this data extent
- objectid
Which inode refers to this data extent
- offset
Search *hint*.
Its value is @file_offset - @extent_offset.
While for @file_offset, it's directly recorded in (INO EXTENT_DATA
FILE_OFFSET) key.
So when searching the file extents refers to this data extent, we can't
use btrfs_extent_data_ref::offset as search key::offset.
We must search from file offset 0, and iterate all file extents until we
hit a file extent matches the data backref.
Thankfully such time consuming behavior is not triggered frequently,
it only gets called for repair, so it shouldn't affect normal check
routine.
Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
[Update commit message]
Signed-off-by: Qu Wenruo <wqu@suse.com>
Commit 0ddf63c09f ("btrfs-progs: Record orphan data extent ref to
corresponding root.") introduces the ability to record a file extent
even all other related info is lost (data backref, inode item).
However this patch only records such info without doing any proper
repair, further more, it could even record invalid file extents, and the
report part only happens after all check is done.
Since we will later introduce proper file extent repair functionality,
we could revert that patch.
Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
[Update commit message, solve merge conflicts]
Signed-off-by: Qu Wenruo <wqu@suse.com>
Commit ad03f840f0 ("btrfs-progs: Add repair and report function for
orphan file extent.") will record and try to repair orphan file extents
by:
- Removing the orphan file extent item if no extent backref can be found
Or
- Re-insert a file extent using data backref
Especially the later case is far from ideal, as normally extent tree is
more fragile and corruption prone.
Use any data from extent tree to try to repair could easily lead to
further corruption.
So here we revert commit ad03f840f0 ("btrfs-progs: Add repair and report
function for orphan file extent.") to cleanup the space for later proper
repair in original mode.
Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
[Update commit message, solve conflicts with DIR_ITEM hash mismatch patchset]
Signed-off-by: Qu Wenruo <wqu@suse.com>
If found a extent data item has unaligned part, lowmem repair
just deletes it.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
The function can delete items in trees besides extent tree.
Rename and move it for further use.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
[Update comment, solve merge conflicts]
Signed-off-by: Qu Wenruo <wqu@suse.com>
Add support to check unaligned disk_bytenr for extent_data.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Previously, @err are assigned immediately after check but before
repair.
repair_extent_item()'s return value also confuses the caller. If
error has been repaired and returns 0, check_extent_item() will try
to continue check the nonexistent and cause flase alerts.
Here make repair_extent_item()'s return codes only represents status
of the extent item, error bits are handled in caller of the repair
function.
Change of @err after repair.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
[Solve conflicts with DIR_ITEM hash mismatch patchset]
Signed-off-by: Qu Wenruo <wqu@suse.com>
For files, lowmem repair will try to check nbytes and isize,
but isize check depends nbytes.
Once bytes has been repaired, then isize should be checked and
repaired.
So move nbytes check before isize check. Also set nbytes to
extent_size once repaired successfully.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Since repair will do CoW, the outer path may be invalid.
This patch will add an argument, @path, to punch_extent_hole().
When punch_extent_hole() returns, path will still point to the same key.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
[Update comment and commit message]
Signed-off-by: Qu Wenruo <wqu@suse.com>
The 'end' parameter of check_file_extent tracks the ending offset of the
last checked extent. This is used to detect gaps between adjacent extents.
Currently such gaps are wrongly detected since for regular extents only
the size of the extent is added to the 'end' parameter. This results in
wrongly considering all extents of a file as having gaps between them
when only 2 of them really have a gap as seen in the example below.
Solution:
The extent_end variable should set to the sum of the offset and the
extent_num_bytes of the file extent.
Example:
Suppose that lowmem check the following file extent of inode 257.
item 6 key (257 EXTENT_DATA 0) itemoff 15813 itemsize 53
generation 6 type 1 (regular)
extent data disk byte 13631488 nr 4096
extent data offset 0 nr 4096 ram 4096
extent compression 0 (none)
item 7 key (257 EXTENT_DATA 8192) itemoff 15760 itemsize 53
generation 6 type 1 (regular)
extent data disk byte 13631488 nr 4096
extent data offset 0 nr 4096 ram 4096
extent compression 0 (none)
item 8 key (257 EXTENT_DATA 12288) itemoff 15707 itemsize 53
generation 6 type 1 (regular)
extent data disk byte 13631488 nr 4096
extent data offset 0 nr 4096 ram 4096
extent compression 0 (none)
For inode 257, check_inode_item set extent_end to 0, then call
check_file_extent to check item {6,7,8}.
item 6)
offset(0) == extent_end(0)
extent_end = extent_end(0) + extent_num_bytes(4096)
item 7)
offset(8192) != extent_end(4096)
extent_end = extent_end(4096) + extent_num_bytes(4096)
^^^
The old extent_end should replace by offset(8192).
item 8)
offset(12288) != extent_end(8192)
^^^
But there is no gap between item {7,8}.
Fixes: d88da10ddd ("btrfs-progs: check: introduce function to check file extent")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
[Move this patch as the 1st patch, since it's an independent fix]
Signed-off-by: Qu Wenruo <wqu@suse.com>
GCC 8.2.1 will report the following error:
check/main.c: In function 'try_repair_inode':
check/main.c:2606:5: warning: 'ret' may be used uninitialized in this function [-Wmaybe-uninitialized]
if (!ret) {
^
check/main.c:2584:6: note: 'ret' was declared here
int ret;
^~~
The offending code is in repair_mismatch_dir_hash():
int ret;
printf(
"Deleting bad dir items with invalid hash for root %llu ino %llu\n",
root->root_key.objectid, rec->ino);
while (!list_empty(&rec->mismatch_dir_hash)) {
/* do some repair */
}
if (!ret) { <<< Here
/* do some fix */
}
The truth is, to enter try_repair_inode(), we must have
I_ERR_MISMATCH_DIR_HASH bit set for rec->errors.
And just after we set I_ERR_MISMATCH_DIR_HASH, we call
add_mismatch_dir_hash() and handled its error correctly.
So it's impossible to to skip the while loop.
Fix it by initializing @ret to -EUCLEAN, so even we hit some impossible
case, repair_mismatch_dir_hash() won't falsely consider the mismatch
hash fixed.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Commit e578b59bf6 ("btrfs-progs: convert strerror to implicit %m")
missed adding braces after a conditional so we will see the following
message whenever a tree block needs repair, regardless of whether repair
was successful: "Failed to repair btree: Success"
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The repair function is reusing delete_corrupted_dir_item().
Since the error can happen for root dir inode, also call
try_repair_inode() on root dir inode.
This is especially important for old filesystems, since later kernel
introduces stricter tree-checker, which could detect such hash mismatch
and refuse to read the corrupted leaf.
With this repair ability, user could repair with btrfs check --repair.
Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1111991
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This changes reporting from current in-place, like:
ERROR: DIR_ITEM[256 751495445] name foor.WvG1c1TdU namelen 13 filetype 1 mismatch with its hash, wanted 751495445 have 2870353892
root 5 root dir 256 error
To new summary report at the end of the pass:
root 5 root dir 256 error
root 5 inode 256 errors 40000
Dir items with mismatch hash:
name: foor.WvG1c1Td namelen: 13 wanted 0xab161fe4 has 0x2ccae915
Also, with mismatch_dir_hash_record structure, it provides the base for
later original mode repair.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For DIR_ITEM with mismatch hash, we could just remove the offending dir
item from the tree.
Lowmem mode will handle the rest, either re-create the correct dir_item
or move the orphan inode to lost+found.
This is especially important for old filesystems, since later kernel
introduces stricter tree-checker, which could detect such hash mismatch
and refuse to read the corrupted leaf.
With this repair ability, user could repair with 'btrfs check
--mode=lowmem --repair'.
Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1111991
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Unlike lowmem mode check, we don't have good place for original mode to
check overlapping device extents.
So this patch introduces a new function, btrfs_check_dev_extents(), to
handle such extents.
Reported-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add such check to check_dev_item(), since at that time we're also
iterating dev extents for dev item accounting.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add support for a new metadata_uuid field. This is just a preparatory
commit which switches all users of the fsid field for metdata comparison
purposes to utilize the new field. This more or less mirrors the
kernel patch, additionally:
* Update 'btrfs inspect-internal dump-super' to account for the new
field. This involes introducing the 'metadata_uuid' line to the
output and updating the logic for comparing the fs uuid to the
dev_item uuid.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Move "\n" at end of the sentence to print.
Fixes: 281eec7a9d ("btrfs-progs: check: repair inode nbytes in lowmem mode")
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This header is exported to /usr/include/btrfs but there are no known
users, so the change should be safe.
Generated by https://github.com/jsoref/spelling
Issue: #154
Author: Josh Soref <jsoref@users.noreply.github.com>
Signed-off-by: David Sterba <dsterba@suse.com>