To make original mode to repair imode error in subvolume trees, this
patch will do:
- Remove the show-stopper checks for root->objectid.
Now repair_imode_original() will accept inodes in subvolume trees.
- Export detect_imode() for original mode
Due to the call requirement, original mode must use an existing trans
handler to do the repair, thus we need to re-implement most of the
work done in repair_imode_common().
- Make repair_imode_original() to use detect_imode().
- Free the path after reset_imode()
reset_imode() keeps the path, as lowmem mode uses path to locate its
current check position.
But for original mode, the unreleased path can cause later repair to
report warning, so we need to manually release the path.
- Update rec->imode after imode reset
So later repair depending on rec->imode can get correct value.
- Move the repair before repair_inode_nlinks()
repair_inode_nlinks() needs correct imode to add DIR_INDEX/DIR_ITEM.
So moving the repair before repair_inode_nlinks() makes the latter
repair happier.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For lowmem mode, if we hit a bad inode mode, normally it is reported
when we checking the DIR_INDEX/DIR_ITEM of the parent inode.
If we didn't repair at that time, the error will be recorded even if we
fixed it later.
So this patch will check for INODE_ITEM_MISMATCH error type, and if it's
really caused by invalid imode, repair it and clear the error.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[[PROBLEM]]
Before this patch, repair_imode_common() can only handle two types of
inodes:
- Free space cache inodes
- ROOT DIR inodes
For inodes in subvolume trees, the core complexity is how to determine
the correct imode, thus it was not implemented.
However there are more reports of incorrect imode in subvolume trees, we
need to support such fix.
[[ENHANCEMENT]]
So this patch adds a new function, detect_imode(), to detect imode for
inodes in subvolume trees. The policy here is, try our best to find a
valid imode to recovery. If no convicing info can be found, fail out.
That function will determine imode by:
1) Search for INODE_REF of the inode
If we have INODE_REF, we will then try to find DIR_ITEM/DIR_INDEX.
As long as one valid DIR_ITEM or DIR_INDEX can be found, we convert
the BTRFS_FT_* to imode, then call it a day.
This should be the most accurate way.
2) Search for DIR_INDEX/DIR_ITEM belongs to this inode
If above search fails, we falls back to locate the DIR_INDEX/DIR_ITEM
just after the INODE_ITEM.
Thus this only works for non-empty directory.
If any can be found, it's definitely a directory.
3) Search for EXTENT_DATA belongs to this inode
If EXTENT_DATA can be found, it's either REG or LNK.
Thus this only works for non-empty file or soft link.
For this case, we default to REG, as user can inspect the file to
determine if it's a file or just a path.
4) Use rdev to detect BLK/CHR
If all above fails, but INODE_ITEM has non-zero rdev, then it's either
a BLK or CHR file. Then we default to BLK.
5) Fail out if none of above methods succeeded
No educated guess to make things worse.
[[SHORTCOMING]]
The above search is not perfect, there are cases where we can't repair:
E.g. orphan empty regular inode. Since it's already orphan, it has no
INODE_REF. And it's regular empty file, it has no DIR_INDEX nor
EXTENT_DATA nor rdev. Thus we can't recover. Although for this case, it
really doesn't matter as it's already orphan and will be deleted anyway.
Furthermore, due to the DIR_ITEM/DIR_INDEX/INODE_REF repair code which
can happen before imode repair, it's possible that DIR_ITEM search code
may not be executed. If there is only DIR_ITEM remaining, repair code
will remove the DIR_ITEM completely and move the inode to lost+found,
leaving us no info to rebuild imode. If there is DIR_INDEX missing,
repair code will re-insert the DIR_INDEX, then imode repair code will go
DIR_INDEX directly.
But overall, the repair code should handle the invalid imode caused by
older kernels without problem.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a function, find_file_type(), to find filetype using info from
INODE_REF, including dir_id from key index/name from inode_ref_item.
This function will:
- Search DIR_INDEX first
DIR_INDEX is easier since there is only one item in it.
- Validate the DIR_INDEX item
If the DIR_INDEX is valid, use the filetype and call it a day.
- Search DIR_ITEM then
It needs extra iteration since it's possible to have hash collision.
- Validate the DIR_ITEM
If valid, call it a day. Or return -ENOENT;
This would be used as the primary method to determine the imode in later
imode repair code.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function will be later used by common mode code, so export it.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Discovered with cppcheck. Fix signed/unsigned int mismatches, sizeof and
long formats.
Pull-request: #197
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Update the checksumming API to be able to cope with more checksum types
than just CRC32C. The finalization call is merged into btrfs_csum_data.
There are some fixme's and asserts added that need to be resolved.
Co-developed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
In preparation to supporting new checksum algorithm pass the checksum type
to btrfs_csum_data/btrfs_csum_final, this allows us to encapsulate any
differences in processing into the respective functions
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Pass pointer to a generic buffer instead of fixed size that crc32c
currently uses.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
We want just one header for the check API (similar to what mkfs does)
but as btrfsck.h is exported header (libbtrfs), it needs some
deprecation beriod before it's moved through there are probably no users
of that header file in particular.
Copy the header to check, all modifications and cleanups won't affect
the public header.
Signed-off-by: David Sterba <dsterba@suse.com>
For orphan inodes, kernel won't update its nbytes and size since it's a
waste of time.
So lowmem check can report false alert on some orphan inodes.
Fix it by checking if the inode is an orphan before
complaining/repairing its nbytes.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We don't update the inode when evicting it, so the nbytes will be wrong
in between transaction commits. This isn't a problem, stop complaining
about it to make generic/269 stop randomly failing. The orphan outdated
inodes can be still present but check will not skip them.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add such ability to original mode to fix root generation mismatch, which
can be rejected by kernel.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since kernel is going to reject any root item which is newer than super
block generation, we need to provide a way to fix such problem in
btrfs-check.
This patch addes the ability to report and repair root generation in
lowmem mode.
This is done by cowing the root node, so we will update the root
generation along with the root node generation at commit transaction
time.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The option -E has a mandatory argument that was missing from the short
option spec, thus it always crashed.
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: David Sterba <dsterba@suse.com>
"btrfsck -Q" segfaults because it does not call qgroup_set_item_count_ptr()
properly:
# btrfsck -Q /dev/sdk
Opening filesystem to check...
Checking filesystem on /dev/sdk
UUID: 34a35bbc-43f8-40f0-8043-65ed33f2e6c3
Print quota groups for /dev/sdk
UUID: 34a35bbc-43f8-40f0-8043-65ed33f2e6c3
Segmentation fault (core dumped)
Since "struct task_ctx ctx" is global, we can just move
qgroup_set_item_count_ptr() much earlier stage in the check process to
avoid to forget initializing it.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While testing snapshot deletion with dm-log-writes I saw that I was
failing the fsck sometimes when the fs was actually in the correct
state. This is because we only skip blocks on the same level of
root_item->drop_level. If the drop_level < the root level then we could
very well walk into nodes that we wouldn't actually walk into on fs
mount, because the drop_progress is further ahead in the slot of the
root. Instead only process the slots of the nodes that are above the
drop_progress key. With this patch in place we no longer improperly
fail to check fs'es that have a drop_progress set with a drop_level <
root level.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Create directory for all sources that can be used by anything that's not
rellated to a relevant kernel part, all common functions, helpers,
utilities that do not fit any other specific category.
The traditional location would be probably lib/ with all things that are
statically linked to the main binaries, but we have libbtrfs and
libbtrfsutil so this would be confusing.
Signed-off-by: David Sterba <dsterba@suse.com>
Commit 756105181e ("btrfs-progs: check: supplement extent backref
list with rbtree") changed the backref implementation to use rb tree
and also commented the old implementations. It's been almost 2 years
since that change and it's unlikely the old version will ever be used,
so just remove it.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that every call site has a cmd_struct, we can just pass the cmd_struct
to usage to print the usager information. This allows us to interpret
the format flags we'll add later in this series to inform the user of
which output formats any given command supports.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch passes the cmd_struct to the command callback function. This
has several purposes: It allows the command callback to identify which
command was used to call it. It also gives us direct access to the
usage associated with that command.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Rather than having global command usage and callbacks used to create
cmd_structs in the command array, establish the cmd_struct structures
separately and use those. The next commit in the series passes the
cmd_struct to the command callbacks such that we can access flags
and determine which of several potential command we were called as.
This establishes several macros to more easily define the commands
within each command's source.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In lowmem mode, we check fs roots and free space cache by iterating
each root item and inode item, using btrfs_next_item() and a path
pointing to the root tree.
However in repair mode, check_fs_root() can modify the fs root, thus
CoWs the tree root, and the old path in check_fs
It could lead to strange behavior, e.g. after repairing a fs tree, the
path can point to a fs tree.
Since no ROOT_ITEM exists in fs tree, all remaining trees are skipped in
repair mode.
This bug exists from the early time of lowmem mode repair, and is only
exposed by recent free space inode check code. (Fs tree inodes are
passed to free space inode check, causing false alerts and repair
failure).
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
BTRFS_COMPAT_EXTENT_TREE_V0 is introduced for a short time in kernel,
and it's over 10 years ago.
Nowadays there should be no user for that feature, and kernel has remove
this support in Jun, 2018. There is no need for btrfs-progs to support
it.
This patch will remove EXTENT_TREE_V0 related code and replace those
BUG_ON() to a more graceful error message.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is one report of compressed extent happens in btrfs, but has no
csum and then leads to possible decompress error screwing up kernel
memory.
Although it's a kernel bug, and won't cause problem until compressed
data get corrupted, let's catch such problem in advance.
This patch will catch any unexpected compressed extent with:
1) 0 or less than expected csum
2) nodatasum flag set in the inode item
This is for original mode.
Reported-by: James Harvey <jamespharvey20@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is one report of compressed extent happens in btrfs, but has no
csum and then leads to possible decompress error screwing up kernel
memory.
Although it's a kernel bug, and won't cause problem until compressed
data get corrupted, let's catch such problem in advance.
This patch will catch any unexpected compressed extent with:
1) missing csum
2) nodatasum flag set in the inode item
This is for lowmem mode.
Reported-by: James Harvey <jamespharvey20@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When repairing a file system created by a very old kernel, I ran into
issues fixing up the extent flags since fixup_extent_flags assumed
that a METADATA_ITEM would be present if the record was for metadata.
Since METADATA_ITEMs don't exist without skinny metadata, we need to
fall back to EXTENT_ITEMs. This also falls back to EXTENT_ITEMs even
with skinny metadata enabled as other parts of the tools do.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Just like lowmem mode, also check and repair free space cache inode
item.
And since we don't really have a good timing/function to check free
space chace inodes, we use the same common mode
check_repair_free_space_inode() when iterating root tree.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Unlike inodes in fs roots, we don't really check the inode items in root
tree, in fact we just skip everything other than ROOT_ITEM and ROOT_REF.
This makes invalid inode items sneak into root tree.
For example:
item 9 key (256 INODE_ITEM 0) itemoff 13702 itemsize 160
generation 30 transid 30 size 65536 nbytes 1507328
block group 0 mode 0 links 1 uid 0 gid 0 rdev 0
^ Should be 100600
sequence 23 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
atime 0.0 (1970-01-01 08:00:00)
ctime 1553491158.189771625 (2019-03-25 13:19:18)
mtime 0.0 (1970-01-01 08:00:00)
otime 0.0 (1970-01-01 08:00:00)
There is a report of such problem in the mail list.
This patch will check and repair inode items of free space cache inodes in
lowmem mode.
Since free space cache inodes doesn't have INODE_REF but still has 1
link, we can't use check_inode_item() directly.
Instead we only check the inode mode, as that's the important part.
The check and repair function: check_repair_free_space_inode() is also
exported for original mode.
Signed-off-by: Qu Wenruo <wqu@suse.com>
In root tree, we only have 2 types of inodes:
- ROOT_TREE_DIR inode
Its mode is fixed to 40755
- free space cache inodes
Its mode is fixed to 100600
This patch will add the ability to repair such inodes to lowmem mode.
For fs/subvolume tree error, at least we haven't see such corruption
yet, so we don't need to rush to fix corruption in fs trees yet.
The repair function, reset_imode() and repair_imode_common() can be
reused by later original mode patch, so it's placed in check/mode-common.c.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Just like lowmem mode, check inode mode, specially for S_IFMT bits and
beyond.
Please note that, this check only applies to inodes in fs/subvol trees.
It doesn't apply to free space cache inodes.
Reported-by: Thorsten Hirsch <t.hirsch@web.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
There is one report about invalid free space cache inode mode.
Normally free space cache inode should have mode 100600 (regular file,
no uid/gid/sticky bit, rw------ bit).
But in that report, we have free space cache inode mode as 0.
So at least btrfs check should report invalid inode mode.
This patch will at least make btrfs check lowmem mode to detect this
problem.
Please note that, this check only applies to inodes in fs/subvol trees.
It doesn't apply to free space cache inodes.
Reported-by: Thorsten Hirsch <t.hirsch@web.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
For test case fsck-tests/001-bad-file-extent-bytenr, we have an
obviously hand crafted image with unaligned file extent:
item 7 key (257 EXTENT_DATA 0) itemoff 3453 itemsize 53
generation 6 type 1 (regular)
extent data disk byte 755944791 nr 1048576
extent data offset 0 nr 1048576 ram 1048576
extent compression 0 (none)
disk bytenr 755944791 is obviously unaligned (not even).
For such obviously corrupted file extent, we should just delete the file
extent.
Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
[Update commit message and comment]
Signed-off-by: Qu Wenruo <wqu@suse.com>
Function find_possible_backrefs() is used to locate the file extents
referring to an data extent.
For data extent backref, its btrfs_extent_data_ref structure has
the following members:
- root
Which root refers to this data extent
- objectid
Which inode refers to this data extent
- offset
Search *hint*.
Its value is @file_offset - @extent_offset.
While for @file_offset, it's directly recorded in (INO EXTENT_DATA
FILE_OFFSET) key.
So when searching the file extents refers to this data extent, we can't
use btrfs_extent_data_ref::offset as search key::offset.
We must search from file offset 0, and iterate all file extents until we
hit a file extent matches the data backref.
Thankfully such time consuming behavior is not triggered frequently,
it only gets called for repair, so it shouldn't affect normal check
routine.
Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
[Update commit message]
Signed-off-by: Qu Wenruo <wqu@suse.com>
Commit 0ddf63c09f ("btrfs-progs: Record orphan data extent ref to
corresponding root.") introduces the ability to record a file extent
even all other related info is lost (data backref, inode item).
However this patch only records such info without doing any proper
repair, further more, it could even record invalid file extents, and the
report part only happens after all check is done.
Since we will later introduce proper file extent repair functionality,
we could revert that patch.
Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
[Update commit message, solve merge conflicts]
Signed-off-by: Qu Wenruo <wqu@suse.com>
Commit ad03f840f0 ("btrfs-progs: Add repair and report function for
orphan file extent.") will record and try to repair orphan file extents
by:
- Removing the orphan file extent item if no extent backref can be found
Or
- Re-insert a file extent using data backref
Especially the later case is far from ideal, as normally extent tree is
more fragile and corruption prone.
Use any data from extent tree to try to repair could easily lead to
further corruption.
So here we revert commit ad03f840f0 ("btrfs-progs: Add repair and report
function for orphan file extent.") to cleanup the space for later proper
repair in original mode.
Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
[Update commit message, solve conflicts with DIR_ITEM hash mismatch patchset]
Signed-off-by: Qu Wenruo <wqu@suse.com>
If found a extent data item has unaligned part, lowmem repair
just deletes it.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
The function can delete items in trees besides extent tree.
Rename and move it for further use.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
[Update comment, solve merge conflicts]
Signed-off-by: Qu Wenruo <wqu@suse.com>
Add support to check unaligned disk_bytenr for extent_data.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>