In copy_inode_rec(), a shallow copy happens on rec->holes rb_root.
So for shared inode case, new rec->holes still points to old rb_root,
and when the old inode record is freed, the new inode_rec->holes will
points to garbage and cause segfault when we try to free new
inode_rec->holes.
Fix it by calling copy_file_extent_holes() to do deep copy.
Reported-by: Eric Sandeen <sandeen@redhat.com>
Reported-by: Filipe David Manana <fdmanana@gmail.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
make[1]: Nothing to be done for `all'.
cmds-check.c: In function ‘del_file_extent_hole’:
cmds-check.c:289:26: warning: ‘prev.start’ may be used uninitialized in this function
cmds-check.c:289:26: warning: ‘prev.len’ may be used uninitialized in this function
cmds-check.c:290:26: warning: ‘next.start’ may be used uninitialized in this function
cmds-check.c:290:26: warning: ‘next.len’ may be used uninitialized in this function
Reported-by: Anand Jain <Anand.Jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
We're not using it anywhere. The best practice is to add enums with
values > 255 for the long options, option index counting is error prone.
Signed-off-by: David Sterba <dsterba@suse.cz>
Before this patch, csum tree rebuild will not work with extent tree
rebuild, since extent tree rebuild will only build up basic block
groups, but csum tree rebuild needs data extents to rebuild.
So if one use btrfsck with --init-csum-tree and --init-extent-tree, csum
tree will be empty and tons of "missing csum" error will be outputted.
This patch allows csum tree rebuild get its data from fs/subvol trees
using regular file extents (which is also the only one using csum tree
currently).
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[renamed to fill_csum_tree_from_one_fs_root]
Signed-off-by: David Sterba <dsterba@suse.cz>
There is a case that can cause nlink fix function.
For example, lost+found dir already has the following files:
---------------------------
|ino |filename |
|-------------------------|
|258 |normal_file |
|259 |normal_file.260 |
---------------------------
The next inode to be fixed is the following:
---------------------------
|260 |normail_file |
---------------------------
And when trying to move inode to lost+found dir, its file name conflicts
with inode 258, and even add ".INO" suffix, it still conflicts with
inode 259.
Since the move failed, the LINK_COUNT_ERR flag is not cleared, the inode
record will not be freed, btrfsck will try fix it again and again,
causing the infinite loop.
The patch will first change the ".INO" suffix naming to a loop behavior,
and clear the LINK_COUNT_ERR flag anyway to avoid infinite loop.
Reported-by: Naohiro Aota <naota@elisp.net>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
We can have FULL_BACKREF set or not set when we need the opposite, this patch
fixes this problem by setting a bit when the flag is set improperly. This way
we can either correct the problem when we re-create the extent item if the
backrefs are also wrong, or we can just set the flag properly in the extent
item. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
We don't want to keep extent records pinned down if we fix stuff as we may need
the space and we can be pretty sure that these records are correct. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
We hold a transaction open for the entirety of fixing extent refs. This works
out ok most of the time but we can be tight on space and run out of space when
fixing things. To get around this just push down the transaction starting dance
into the functions that actually fix things. This keeps us from ending up with
ENOSPC because we pinned everything and allows the code to be a bit simpler.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
The METADUMP super flag makes us skip doing the chunk tree reading which isn't
helpful for the new restore since we have a valid chunk tree. But we still want
to have a way for the kernel to know that this is a metadump restore so it
doesn't do things like verify data checksums. We also want to skip some of the
device extent checks in fsck since those will obviously not match. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
The data reloc root is weird with it's csums. It'll copy an entire extent and
then log any csums it finds, which makes it look weird when it comes to prealloc
extents. So just skip the data reloc tree, it's special and we just don't need
to worry about it. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
We have logic to fix the root locations for roots in response to a corruption
bug we had earlier. However this work doesn't apply to reloc roots and can
screw things up worse, so make sure we skip any reloc roots that we find.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
If we fix bad blocks during run_next_block we will return -EAGAIN to loop around
and start again. The deal_with_roots work messed up this handling, this patch
fixes it. With this patch we can properly deal with broken tree blocks.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Previously we used to just set FULL_BACKREF if we couldn't lookup an extent info
for an extent. Now we just bail out if we can't lookup the extent info, which
is less than good since fsck is supposed to fix these very problems. So instead
figure out the flag we are supposed to use and pass that along instead. This
patch also provides a test image to test this functionality. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Allow read_tree_block() and read_node_slot() to return error pointer.
This should help caller to get more specified error number.
For existing callers, change (!eb) judgmentt to
(!extent_buffer_uptodate(eb)) to keep the compatibility, and for caller
missing the check, use PTR_ERR(eb) if possible.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Since orphan extents are handled in previous patches, now just punch
holes to fill the file extents hole.
Also since now btrfsck is aware of whether the inode has orphan file
extent, allow repair_inode_no_item() to determine filetype according to
orphan file extent.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
In some fs tree leaf/node corruption case, file extents may be lost, but
in extent tree, its record may still exists.
This provide the possibility for such orphan file extents to be
recovered even we can't recover its compression and other info, we can
still insert it as a normal non-compression file extent.
This patch provides the repair and report function for such orphan file
extent.
Even after such repair, user may still need to try to decompress its
data if user knows that is a compressed extent.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Record every file extent discontinuous hole in inode_record using a
rb_tree member.
Before the patch, btrfsck will only record the first file extent hole by
using first_extent_gap, that's good for detecting error, but not
suitable for fixing it.
This patch provides the ability to record every file extent hole and
report it.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Before this patch, when a extent's data ref points to a invalid key in
fs tree, this happens if a leaf/node of fs tree is corrupted, btrfsck
can't do any repair and just exit.
In fact, such problem can be handled in fs tree repair routines, rebuild
the inode item(if missing) and add back the extent data (with some
assumption).
So this patch records such data extent refs for later fs tree recovery
routine.
TODO:
Restore orphan data extent refs into btrfs_root is not the best
method. It's best to directly restore it into inode_record, however
current extent tree and fs tree can't cooperate together, so use
btrfs_root as a temporary storage until inode_cache is built.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
- use standard PACKAGE_{NAME,VERSION,STRING,URL,...} autoconf macros
rather than homemade BTRFS_BUILD_VERSION
- don't #include version.h, now the file is necessary for library API only
Note that "btrfs version" returns "btrfs-progs <version>" instead of
the original confusing "btrfs <version>".
Signed-off-by: Karel Zak <kzak@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Particularly during support conversations, people get confused about
which options to use with btrfs check. Adding a flag, --readonly, which
implies the default read-only behaviour and which conflicts with the
read-write operations, should help make the behaviour of the tool clear.
Signed-off-by: Hugo Mills <hugo@carfax.org.uk>
Signed-off-by: David Sterba <dsterba@suse.cz>
The current approach to option parsing, where long-only options are
selected on the basis of their position in the long_options array is
fragile and painful to modify if options are to be inserted into the
list, rather than appended.
Instead, use the last field of struct option to return a value which
cannot be a char (and hence a short option), and simply switch on those
within the case statement.
Signed-off-by: Hugo Mills <hugo@carfax.org.uk>
Signed-off-by: David Sterba <dsterba@suse.cz>
glibc 2.10+ (5+ years old) enables all the desired features:
_XOPEN_SOURCE 700, __XOPEN2K8, POSIX_C_SOURCE, DEFAULT_SOURCE; with a
single _GNU_SOURCE define in the makefile alone. For portability to
other libc implementations (e.g. dietlibc) _XOPEN_SOURCE=700 is also
defined.
This also resolves Debian bug report filed by Michael Tautschnig -
"Inconsistent use of _XOPEN_SOURCE results in conflicting
declarations". Whilst I was not able to reproduce the results, the
reported fact is that _XOPEN_SOURCE set to 500 in one set of files
(e.g. cmds-filesystem.c) generates/defines different struct stat from
other files (cmds-replace.c).
This patch thus cleans up all feature defines, and sets them at a
consistent level.
Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=747969
Signed-off-by: Dimitri John Ledkov <dimitri.j.ledkov@intel.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
The original check_inode_recs() will return -1 if found any error in a
inode_record. This is OK for original design since there is almost
nothing can repair at that time.
However more and more error from nlink mismatch to missing inode item
can be repaired in try_repair_inode(), check_inode_recs() should not
increase the error count if the inode can be repair.
With this patch, repair function for leaf-corruption will not return
error if all corruption inode can be recovered.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
The commit f495a2ac66 ("btrfs-progs: fsck: remove unfriendly BUG_ON()
for searching tree failure") is causing tons of extent buffer leak if some
csum mismatches in btrfsck.
This is caused by a misplaced btrfs_release_path(), fix it.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
In case the buffer is corrupted and the for loop does not happen, we'd
return garbage. The caller retunrs -EIO in case of any corruption, use
that value in fix_key_order.
Resolves-coverity-id: 1246944
Signed-off-by: David Sterba <dsterba@suse.cz>
In reset_nlink(), we believe rec->found_link as accurate number of the
valid links. However it only records the number of found DIR_ITEM, so we
can't use it as reliable value.
Before this patch, in some case, leaf corruption recovery will believe
there is a valid backref but don't add_link() since it can't find any
valid one and don't put it into the lost+found dir.
So the recovered inode will be considered as an orphan item without
orphan item and repair_inode_orphan_item() will add orphan item for it,
causing all the filename/filetype we recovered being a waste of time.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
If the tree's empty, we'll get NULL and dereference it.
Resolves-Coverity-CID: 1248828
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
If some critical roots are corrupt, reapr_root_items() will fail,
this is detected by fsck_tests.sh's extent rebuilding tests.
Signed-off-by: Wang Shilong <wangshilong1991@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
This patch makes us to rebuild a really corrupt extent tree with snapshots.
To implement this, we have to verify whether a block is FULL BACKREF.
This idea come from Josef Bacik:
1) We walk down the original tree, every eb we encounter has
btrfs_header_owner(eb) == root->objectid. We add normal references
for this root (BTRFS_TREE_BLOCK_REF_KEY) for this root. World peace
is achieved.
2) We walk down the snapshotted tree. Say we didn't change anything
at all, it was just a clean snapshot and then boom. So the
btrfs_header_owner(root->node) == root->objectid, so normal backref.
We walk down to the next level, where btrfs_header_owner(eb) !=
root->objectid, but the level above did, so we add normal refs for all
of these blocks. We go down the next level, now our
btrfs_header_owner(parent) != root->objectid and
btrfs_header_owner(eb) != root->objectid. This is where we need to
now go back and see if btrfs_header_owner(eb) currently has a ref on
eb. If it does we are done, move on to the next block in this same
level, we don't have to go further down.
3) Harder case, we snapshotted and then changed things in the original
root. Do the same thing as in step 2, but now we get down to
btrfs_header_owner(eb) != root->objectid && btrfs_header_owner(parent)
!= root->objectid. We lookup the references we have for eb and notice
that btrfs_header_owner(eb) no longer refers to eb. So now we must
set FULL_BACKREF on this extent reference and add a
SHARED_BLOCK_REF_KEY for this eb using the parent->start as the
offset. And we need to keep walking down and doing the same thing
until we either hit level 0 or btrfs_header_owner(eb) has a ref on the
block.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Wang Shilong <wangshilong1991@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Previously, we deal with node block firstly and then leaf block which can
maximize readahead. However, to rebuild extent tree, we need deal with snapshot
one by one.
This patch makes us deal with snapshot one by one if we need rebuild extent
tree otherwise we drop into previous way.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Wang Shilong <wangshilong1991@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Add a basic inode item rebuild function for I_ERR_NO_INODE_ITEM.
The main use case is to repair btrfs which fs root has corrupted leaf,
but it is already working for case if the corrupteed fs root leaf/node
contains no inode extent_data.
The repair needs 3 elements for inode rebuild:
1. inode number
This is quite easy, existing inode_record codes will detect it quite
well.
2. inode type
This is the trick part. The only reliable method is to recovery it from
parent's dir_index/item.
The remaining method will search for regular file extent for FILE
type or child's backref for DIR(todo).
Fallback will be FILE.
Inode name(inode_ref) will be recoverd by nlink repair function.
This is just a fundamental implement, some advanced recovery can be
improved later with btrfs-progs infrastructure change.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Current btrfsck can skip corrupted leaf and even repair some corrupted
one if their bytenr or key order is wrong.
However when it comes to csum error leaf, btrfsck can only skip them,
which is OK for read-only iteration, but is bad for write.
This patch introduce the new repair_btree() function to recover the
btree, deleting all the corrupted leaf/node including corresponding
extent, allowing later write into the btree.
This patch provides the basis for later recovery with corrupted leaves.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When leaf/node is corrupted in fs/subvolume root, btrfsck can ignore it
without much pain except some stderr messages complaining about it.
But this works fine doing read-only works, if we want to do deeper
recovery like rebuild missing inodes in the b+tree, it will cause
problem.
At least, info user that there is something wrong in the btree,
and this patch provides the base for later btree repair.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
[BUG]
At least two users have already hit a bug in btrfs causing file
missing(Chromium config file).
The missing file's nlink is still 1 but its backref points to non-exist
parent inode.
This should be a kernel bug, but btrfsck fix is needed anyway.
[FIX]
For such nlink mismatch inode, we will reset all the inode_ref with its
dir_index/item (including valid one), and re-add the valids.
If there is no valid backref for it, create 'lost+found' under the root
of the subvolume and add link to the directory.
Reported-by: Mike Gavrilov <mikhail.v.gavrilov@gmail.com>
Reported-by: Ed Tomlinson <edt@aei.ca>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Add find_file_name() and find_file_type() function for later nlink and
inode_item repair.
Later nlink repair will use both function and and inode_item repair will
use find_file_type().
They are done by searching the backref list, dir_item/index for type
search and dir_item/index or inode_ref for name search.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Add btrfs_unlink() and btrfs_add_link() functions in inode.c,
for the incoming btrfs_mkdir() and later inode operations functions.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Record highest inode number before inode repair.
This is especially important for corrupted leaf case.
Under that case, if use btrfs_find_free_objectid, it may find a ino
existing in corrupted leaf but dropped by btree_recover.
If that happens, created dir will be referenced incorrectly since there
may be inode_ref or dir_index/item refers to it.
So we must record the highest inode number according to the inode_cache.
Inode_cache is OK since when a inode_ref or dir_index/item is found even
the referenced source is not found, it will be created.
If we record the highest inode number of inode_cache, and use
highest_inode + 1 as 'lost+found' dir, it will ensure the newly created
dir not conflicting with any possible inode.
This provides the basis for nlink or inode rebuild for repairing btrfs
with leaf/node corruption.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Sometimes we have a pretty corrupted fs but have an old tree bytenr that we
could use, add the ability to specify the tree root bytenr. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Tested-by: Ansgar Hockmann-Stolle <ansgar.hockmann-stolle@uni-osnabrueck.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
Before the patch, chunk will be considered bad if the corresponding
block group is missing, even the only uncertain data is the 'used'
member of the block group.
This patch will try to recalculate the 'used' value of the block group
and rebuild it.
So even only chunk item and dev extent item is found, the chunk can be
recovered.
Although if extent tree is damanged and needed extent item can't be
read, the block group's 'used' value will be the block group length, to
prevent any later write/block reserve damaging the block group.
In that case, we will prompt user and recommend them to use
'--init-extent-tree' to rebuild extent tree if possible.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Before this patch, when btrfsck found an error in root dir, it will only
output the following message "root %llu root dir %llu error" without any
detailed error.
Just add print_inode_error() to print out the whole error.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
If we just don't have the root dirid stuff go ahead and re-create it, since it
is easily recreated. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
If we have all the other items but no inode item we can recreate it for the most
part, with the exception of the permissions and ownership. Add this ability to
btrfsck. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
If we have everything except the dir item and dir index we can easily replace
them, so add this ability to btrfsck. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>