We have --init-csum-tree, which just empties the csum tree. I'm not sure why we
would ever need this, but we definitely need to be able to rebuild the csum tree
in some cases. This patch adds the ability to completely rebuild the crc tree
by reading all of the data and adding csum entries for them. This patch doesn't
pay attention to NODATASUM inodes; it'll happily add csums for everything.
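As a rough illustration of "reading all of the data and adding csum entries", the sketch below computes one CRC-32C value per sector of a data buffer. The 4096-byte sector size, the function names and the flat output array are assumptions of this sketch, and it uses the textbook CRC-32C (initial value and final inversion included); it does not show how the patch seeds the checksum or inserts items into the csum tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTORSIZE 4096u	/* assumed sector size for this sketch */

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/*
 * Hypothetical helper: fill sums[] with one checksum per full sector of
 * data[] and return the number of sectors processed.
 */
static size_t csum_extent(const uint8_t *data, size_t len,
			  uint32_t *sums, size_t max_sums)
{
	size_t nr = len / SECTORSIZE;

	assert(data != NULL && sums != NULL);
	if (nr > max_sums)
		nr = max_sums;
	for (size_t i = 0; i < nr; i++)
		sums[i] = crc32c(data + i * SECTORSIZE, SECTORSIZE);
	return nr;
}
```

A rebuild along these lines would walk every data extent, read it, and store the resulting per-sector values; skipping NODATASUM inodes would need an extra check that this sketch (like the patch) does not make.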
Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When we have non-inlined extent references, we were failing to find the
corresponding extent item for an existing csum item in the csum tree.
Reproducer:
mkfs.btrfs -f /dev/sdd
mount /dev/sdd /mnt
xfs_io -f -c "falloc 780366 135302" /mnt/foo
xfs_io -c "falloc 327680 151552" /mnt/foo
xfs_io -c "pwrite -S 0xff -b 131072 0 131072" /mnt/foo
sync
for i in `seq 1 40`; do btrfs subvolume snapshot /mnt /mnt/snap$i ; done
umount /mnt
btrfs check /dev/sdd
The check command exited with status 1 and the following output:
Checking filesystem on /dev/sdd
UUID: 2416ab5f-9d71-457e-bb13-a27d4f6b399a
checking extents
checking free space cache
checking fs roots
checking csums
There are no extents for csum range 12980224-12984320
Csum exists for 12980224-12984320 but there is no extent record
found 1388544 bytes used err is 1
total csum bytes: 132
total tree bytes: 704512
total fs tree bytes: 573440
total extent tree bytes: 16384
btree space waste bytes: 564479
file data blocks allocated: 19341312
referenced 14606336
Btrfs v3.14.1-94-g80597e7
After this change it no longer erroneously reports a missing extent for the
csum item and exits with a status of 0.
Also added missing btrfs_prev_leaf() return value checks, as we were ignoring
errors and non-existence of left siblings completely.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
After a system crash or a balance that failed with ENOSPC,
there may still be some reloc roots left.
The way we store reloc root is different from fs root:
reloc root's root key(BTRFS_RELOC_TREE_OBJECTID, ROOT_ITEM, objectid)
fs root's root key(objectid, ROOT_ITEM, -1)
reloc data's root key(BTRFS_DATA_RELOC_TREE_OBJECTID, ROOT_ITEM, 0)
So this patch uses the right key to search for the corresponding root node,
and avoids using the normal fs root cache for reloc roots.
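The three key layouts listed above can be sketched as a small key builder. The struct, the enum and the numeric values of the constants are assumptions made for this illustration (they are meant to mirror the btrfs headers but are defined locally here); the point is only which field carries the subvolume objectid in each case.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins; the values are assumptions of this sketch. */
#define ROOT_ITEM_KEY			132
#define RELOC_TREE_OBJECTID		((uint64_t)-8)
#define DATA_RELOC_TREE_OBJECTID	((uint64_t)-9)

struct root_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

enum root_kind { FS_ROOT, RELOC_ROOT, DATA_RELOC_ROOT };

/* Build the search key for a root, given the subvolume's objectid. */
static struct root_key make_root_key(enum root_kind kind, uint64_t objectid)
{
	struct root_key key = { .type = ROOT_ITEM_KEY };

	switch (kind) {
	case RELOC_ROOT:
		/* reloc root: the subvolume objectid lives in the offset */
		key.objectid = RELOC_TREE_OBJECTID;
		key.offset = objectid;
		break;
	case DATA_RELOC_ROOT:
		key.objectid = DATA_RELOC_TREE_OBJECTID;
		key.offset = 0;
		break;
	case FS_ROOT:
		/* fs root: offset -1 */
		key.objectid = objectid;
		key.offset = (uint64_t)-1;
		break;
	default:
		assert(!"unknown root kind");
	}
	return key;
}
```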
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
If btrfsck fails to repair, we hit something like the following:
Check tree block failed, want=29442048, have=0
Check tree block failed, want=29442048, have=0
Check tree block failed, want=29442048, have=0
Check tree block failed, want=29442048, have=0
Check tree block failed, want=29442048, have=0
read block failed check_tree_block
found 98304 bytes used err is 1
total csum bytes: 0
total tree bytes: 0
total fs tree bytes: 0
total extent tree bytes: 0
btree space waste bytes: 0
file data blocks allocated: 0
referenced 0
Btrfs v3.14.2-rc2-63-g3944f15
btrfs: transaction.h:38: btrfs_start_transaction: Assertion `!(root->commit_root)' failed.
Aborted (core dumped)
This is because in repair mode we start a transaction, and if we error out,
we don't finish this transaction. So close_ctree() will try to start and
commit a transaction, which triggers the above assertion failure.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Currently btrfsck hits an assertion failure when some tree searches fail.
It is true that a filesystem may have some corrupted metadata blocks,
and that btrfsck cannot deal with every such corruption. But users really
don't want a BUG_ON() here; instead, just return errors to the caller.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Repair mode will commit a transaction, after which we can no longer
load the log tree.
Give users a warning; if they really want to
continue, we will clear the log tree.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
This not only gives some speedup but also avoids looping forever
on a really broken filesystem.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
[BUG]
Some fsfuzzed btrfs images cause btrfsck to segfault.
[REPRODUCER]
Run btrfsck on a csum tree block corrupted image.
[REASON]
The check_csums() function calls btrfs_search_slot() on csum_tree but doesn't
check whether the csum_tree contains a valid extent_buffer, which causes
the segfault.
[FIX]
Check the csum_root->node before any search.
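A minimal sketch of the guard described in [FIX], using stand-in types rather than the real btrfs structures. The struct layouts, search_slot() and the choice of -ENOENT are assumptions of this sketch; the source only says that csum_root->node is checked before any search.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-ins for the real structures; only the fields used here. */
struct extent_buffer { int nritems; };
struct tree_root { struct extent_buffer *node; };

/* Stand-in for a tree search; it dereferences root->node unconditionally. */
static int search_slot(const struct tree_root *root)
{
	assert(root->node != NULL);
	return root->node->nritems;
}

/* Refuse to search a root whose node could not be read. */
static int check_csums_sketch(const struct tree_root *csum_root)
{
	if (!csum_root || !csum_root->node)
		return -ENOENT;
	return search_slot(csum_root);
}
```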
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
A user reported a WARN_ON() when trying to run btrfsck --repair on his fs with
bad key ordering. This was because the root that was broken wasn't part of the
transaction yet. We do this open coded thing in a few other places in fsck, so
just make it a helper function and make sure all the places that need to call it
do call it. With this patch he was able to run repair without it dying.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
David sent a quick patch that removed a BUG_ON(). I took a peek and
found that the function was already leaking an eb ref and only returned
0. So this fixes the leak and makes the function void and fixes up the
callers.
Accidentally-motivated-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Zach Brown <zab@zabbo.net>
Signed-off-by: David Sterba <dsterba@suse.cz>
Since this patch:
btrfs-progs: move the check_argc_* functions into utils.c
all tools, including the independent ones (e.g. btrfs-image, btrfs-convert),
can share the convenience of the check_argc_* functions, so this patch
adopts the argc check functions globally.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
The qgroup verification code can trivially be extended to provide
extended information on the extents which a subvolume root
references. Along with qgroup-verify, I have found this tool to be
invaluable when tracking down extent references.
The patch adds a switch to the check subcommand, '--subvol-extents',
which takes a single subvolume id as its argument. When run with the switch,
we'll print out each extent that the subvolume references. The extent
printout gives standard extent info you would expect along with
information on which other roots reference it.
Sample output follows - this is a few lines from a run on a subvolume
I've been testing qgroup changes on:
Print extent state for subvolume 281 on /dev/vdb2
UUID: 8203ca66-9858-4e3f-b447-5bbaacf79c02
Offset Len Root Refs Roots
12582912 20480 12 257 279 280 281 282 283 284 285 286 287 288 289
12603392 8192 12 257 279 280 281 282 283 284 285 286 287 288 289
12611584 12288 12 257 279 280 281 282 283 284 285 286 287 288 289
<snip a bunch of extents to show some variety>
124583936 16384 4 281 282 283 280
125075456 16384 4 280 281 282 283
126255104 16384 11 257 280 281 282 283 284 285 286 287 288 289
4763508736 4096 3 279 280 281
In case it wasn't clear, this applies on top of my qgroup verify patch:
"btrfs-progs: add quota group verify code"
A branch with all this can be found on github:
https://github.com/markfasheh/btrfs-progs-patches/tree/qgroup-verify
Please apply,
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
Steps to reproduce:
# mkfs.btrfs -f /dev/sda9 -b 2g
# mount /dev/sda9 /mnt
# dd if=/dev/zero of=/mnt/data bs=4k oflag=direct
# btrfs file df /mnt
Data, single: total=1.66GiB, used=1.66GiB
System, single: total=4.00MiB, used=16.00KiB
Metadata, single: total=200.00MiB, used=67.88MiB
For a filesystem without snapshots and with 70M of metadata, extent
checking uses up to about 110M of memory, which is a nightmare
for systems with little memory.
For a filesystem without snapshots it is very likely that an extent record
can be freed quickly, so improve this by checking whether the record
can be freed right after adding data/tree backrefs.
This patch reduces the peak memory cost of extent checking from 110M
to 40M for the above case.
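The idea of freeing a record as soon as its backrefs are accounted for can be sketched as follows. The record layout and helper names are hypothetical, and the real condition for dropping a record surely involves more state than two counters; this only shows the "try to free after each added backref" shape.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, much reduced extent record. */
struct extent_rec {
	uint64_t start;
	uint32_t expected_refs;	/* refs claimed by the extent item */
	uint32_t found_refs;	/* backrefs seen so far */
};

static struct extent_rec *new_extent_rec(uint64_t start, uint32_t expected)
{
	struct extent_rec *rec = calloc(1, sizeof(*rec));

	if (rec) {
		rec->start = start;
		rec->expected_refs = expected;
	}
	return rec;
}

/* Free the record once every expected ref was found; returns 1 if freed. */
static int maybe_free_extent_rec(struct extent_rec **recp)
{
	struct extent_rec *rec = *recp;

	if (rec->found_refs != rec->expected_refs)
		return 0;
	free(rec);
	*recp = NULL;
	return 1;
}

/* Account one backref, then immediately try to drop the record. */
static int add_backref(struct extent_rec **recp)
{
	assert(recp != NULL && *recp != NULL);
	(*recp)->found_refs++;
	return maybe_free_extent_rec(recp);
}
```

Without snapshots most extents have a single reference, so under this scheme most records would be released right after their first backref instead of living until the end of the scan.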
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
There's no reason to assume that the bad key order is in a leaf block,
so accessing level 0 of the path is going to be an error if it's actually
a node block that's bad.
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Hugo Mills <hugo@carfax.org.uk>
Signed-off-by: David Sterba <dsterba@suse.cz>
This patch adds an option '--check-data-csum' to verify data checksums.
fsck won't check data csums unless users specify this option explicitly.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
This patch adds functionality (in qgroup-verify.c) to compute bytecounts in
subvolume quota groups. The original groups are read in and stored in memory
so that after we compute our own bytecounts, we can compare them with those
on disk. A print function is provided to do this comparison and show the
results on the console.
A 'qgroup check' pass is added to btrfsck. If any subvolume quota groups
differ from what we compute, the differences for them are printed. We also
provide an option '--qgroup-report' which will run only the quota check code
and print a report on all quota groups. Other than making it possible to
verify that our qgroup changes work correctly, this mode can also be used in
xfstests for automated checking after qgroup tests.
This patch does not address the following:
- compressed counts are identical to non compressed, because kernel doesn't
make the distinction yet. Adding the code to verify compressed counts
shouldn't be hard at all though once kernel can do this.
- It is only concerned with subvolume quota groups (like most of
btrfs-progs).
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
It complains that errno never gets assigned zero in find-root, and since
errno is zero at program startup anyway, let's remove it.
The check for "copy is less than zero" can never be true because strtoull,
used by arg_strtou64, doesn't return a negative number.
Trivial space fixes.
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Let's use "errors" instead of "error" because more than one ref error
is possible. Also print error messages for unresolved refs in
check_root_refs.
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
The following kernel commit changed the definition of the inline function
btrfs_file_extent_inline_len():
commit 514ac8ad8793a097c0c9d89202c642479d6dfa34
Author: Chris Mason <clm@fb.com>
Date: Fri Jan 3 21:07:00 2014 -0800
Btrfs: don't use ram_bytes for uncompressed inline items
If we truncate an uncompressed inline item, ram_bytes isn't updated to reflect
the new size. The fix uses the size directly from the item header when
reading uncompressed inlines, and also fixes truncate to update the
size as it goes.
Not having this new definition implies that the restore tool might misbehave when
restoring files with an inline extent that got truncated on a kernel older than
release 3.14.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Though all tree blocks have the same size, we'd better use the right
index here.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Two changes:
1. Use a bit field for @found_rec.
2. u32 is enough to count duplicate extents.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
We still need to free the allocated cache memory in case an error happens.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
The @seen cache is used to avoid iterating over the same block more than once,
and we cannot free its entries until we have finished searching.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Free already allocated memory to item1_data if malloc fails for
item2_data in swap_values. Seems to be a typo from commit 70749a77.
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Previously, --init-extent-tree worked only because btrfs_lookup_extent_info()
blindly returned 0, which happens to work as long as there are no tree
blocks in *FULL BACKREF* mode in the broken filesystem.
It is just a coincidence that the --init-extent-tree option worked, so let's
do it the right way first.
For now, rebuilding the extent tree is not supported if there are any tree
blocks in *FULL BACKREF* mode, which means that if the broken filesystem
has snapshots, avoid using the --init-extent-tree option for now.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Committing roots won't update the root item in the tree root if it finds
that the updated root's bytenr is the same as before.
However, this is not right for fsck; we need to update the tree root in
the following cases:
1. We overwrite the previous root node.
2. We reinit the reloc data tree. This is because we skipped pinning the
reloc data tree earlier, which means we can allocate the same block as before.
Fix this by updating the tree root ourselves for the above cases.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
There are two bugs when resetting balance:
1. We skip reinitializing the reloc data tree if no reloc root is found;
this is not right because we don't pin the reloc data tree beforehand.
2. We should insert the root dir into the reloc data tree, otherwise fsck
will fail.
Fix both problems by forcibly reinitializing the reloc data root and
inserting the root dir.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Resetting balance needs to cow blocks, which inserts extent items into
the extent tree. If we do this before reinitializing the extent root, we may
encounter EEXIST.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
To reinit the extent root we need to find a free extent. However,
we may have a really corrupted extent tree, so we can't rely
on the existing extent tree to cache block groups any more.
During testing, we failed to reinit the extent tree because we
could not find a free extent, so let's build the block group cache
ourselves first.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
When working with a user who had a broken file system I noticed that we were
reading a bad copy of a block when the other copy was perfectly fine. This is
because we don't keep track of the parent generation for tree blocks, so we just
read whichever copy we damned well please with no regards for which is best.
This fixes this problem by recording the parent generation of the tree block so
we can be sure to read the most correct copy before we check it, which will give
us a better chance of fixing really broken filesystems. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
We found that btrfsck outputs backref mismatches while the filesystem
is definitely ok.
The problem is that check_block() doesn't return the right value, so
btrfsck doesn't walk all tree blocks; we then don't get a consistent view
of the filesystem and fail to check extent refs etc.
Reported-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Steps to reproduce:
# mkfs.btrfs -f /dev/sda9
# btrfs check /dev/sda9 --init-extent-tree --init-csum-tree
# btrfs check /dev/sda9
While reinitializing the extent tree, we pin all metadata blocks to
avoid overwriting existing metadata space. However, that space is
unpinned after the transaction commits.
If we reinit the csum tree after reinitializing the extent tree, we may
overwrite existing space. Fix this problem by reinitializing the extent tree
and the csum tree in the same transaction.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Switch to the new helper arg_strtou64(), and also check that the user
specified a valid super copy.
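The kind of validation meant here can be sketched with plain strtoull(). parse_u64() and parse_super_copy() are hypothetical helpers written for this illustration (they are not the arg_strtou64() implementation), and the limit of three superblock copies is an assumption of the sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SUPER_MIRROR_MAX 3	/* assumed number of superblock copies */

/* Parse an unsigned 64-bit number, rejecting signs, junk and overflow. */
static int parse_u64(const char *str, uint64_t *out)
{
	char *end;
	unsigned long long val;

	assert(str != NULL && out != NULL);
	while (isspace((unsigned char)*str))
		str++;
	/* strtoull() silently accepts "-1" and wraps it, so reject it here. */
	if (*str == '-' || *str == '\0')
		return -EINVAL;
	errno = 0;
	val = strtoull(str, &end, 0);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == str || *end != '\0')
		return -EINVAL;
	*out = val;
	return 0;
}

/* A super copy index is valid only if it is below SUPER_MIRROR_MAX. */
static int parse_super_copy(const char *str, uint64_t *copy)
{
	int ret = parse_u64(str, copy);

	if (ret)
		return ret;
	return *copy < SUPER_MIRROR_MAX ? 0 : -EINVAL;
}
```

Because the parsed value is unsigned, a "less than zero" test on it can never fire; the range check against the upper bound is the one that matters.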
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Add close_ctree()s before the "returns" on errors after open_ctree()
Also merge the err returns into the "goto + single return" pattern.
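The "goto + single return" shape looks roughly like this. open_fs(), close_fs() and the fail_step parameter are stand-ins invented for the sketch (open_count exists only so the cleanup can be observed); they are not the open_ctree()/close_ctree() API.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct fs_handle { int unused; };

static int open_count;	/* outstanding handles, for illustration only */

static struct fs_handle *open_fs(void)
{
	struct fs_handle *fs = malloc(sizeof(*fs));

	if (fs)
		open_count++;
	return fs;
}

static void close_fs(struct fs_handle *fs)
{
	assert(fs != NULL);
	free(fs);
	open_count--;
}

/* fail_step simulates an error at a given step (0 means no error). */
static int run_check(int fail_step)
{
	struct fs_handle *fs;
	int ret = 0;

	fs = open_fs();
	if (!fs) {
		ret = -ENOMEM;
		goto out;
	}
	if (fail_step == 1) {
		ret = -EIO;
		goto close_out;	/* every later error path closes the fs */
	}
	if (fail_step == 2) {
		ret = -EINVAL;
		goto close_out;
	}
close_out:
	close_fs(fs);
out:
	return ret;
}
```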
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
The function call that set the ret parameter evaluated in this
BUG_ON was removed in a previous commit:
11be10f71e
Btrfs-progs: make fsck fix certain file extent inconsistencies
Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
The return value in process_one_leaf could be over-written while
looping over the items in the leaf.
This patch will preserve a non-zero return value to the calling
function if a non-zero return value is encountered in the loop.
The return value of one (1) is consistent with non-zero values
that could be returned while processing the leaf.
The only caller of this function (walk_down_tree) would ignore
the return value anyway. But this patch will correct the
behaviour in case future changes intend to utilize the return
value.
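The pattern described above, reduced to a toy loop. process_item() and its failure rule are invented for the sketch; what matters is that a later successful item no longer overwrites an earlier failure.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for per-item processing; negative items "fail". */
static int process_item(int item)
{
	return item < 0 ? -1 : 0;
}

/* Process every item and return 1 if any of them failed, else 0. */
static int process_leaf_sketch(const int *items, size_t nritems)
{
	int ret = 0;

	assert(items != NULL || nritems == 0);
	for (size_t i = 0; i < nritems; i++) {
		/* Keep going, but remember that something went wrong. */
		if (process_item(items[i]))
			ret = 1;
	}
	return ret;
}
```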
Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
This patch makes btrfsck open the disk in exclusive mode,
so that mount will fail while btrfsck is running.
Signed-off-by: Anand Jain <Anand.Jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
The following steps could trigger btrfs segfault:
mkfs -t btrfs -m raid5 -d raid5 /dev/loop{0..3}
losetup -d /dev/loop2
btrfs check /dev/loop0
The reason is that read_tree_block() returns NULL and
add_root_to_pending() dereferences it without checking it first.
Also replace a BUG_ON with proper error checking.
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
This adds the flag to ctree.h, adds the feature option to mkfs to turn it on and
fixes fsck so it doesn't complain about missing hole extents in files when this
flag is set.
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
A user had a fs where the objectid of an orphan item was not the actual orphan
item objectid. This screwed up fsck because the block has keys in the wrong
order, also the fs scanning stuff will freak out because we have an inode with
nlink 0 and no orphan item. So this patch is pretty big but is all related.
1) Deal with bad key ordering. We can easily fix this up, so fix the checking
stuff to tell us exactly what it found when it said there was a problem. Then
if it's bad key ordering we can reorder the keys and restart the scan.
2) Deal with bad keys. If we find an orphan item with the wrong objectid it's
likely to screw with stuff, so keep track of these sort of things with a
bad_item list and just run through and delete any objects that don't make sense.
So far we just do this for orphan items but we could extend this as new stuff
pops up.
3) Deal with missing orphan items. This is easy, if we have a file with i_nlink
set to 0 and no orphan item we can just add an orphan item.
4) Add the infrastructure to corrupt actual key values. Needed this to create a
test image to verify I was fixing things properly.
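Point 1 relies on keys within a block being strictly ascending by (objectid, type, offset). The sketch below checks and restores that order on a bare array of keys; the struct and helpers are stand-ins, and a real repair would also have to move the items (or block pointers) that belong to each key, which is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Compare by objectid, then type, then offset. */
static int key_cmp(const void *a, const void *b)
{
	const struct key *k1 = a, *k2 = b;

	if (k1->objectid != k2->objectid)
		return k1->objectid < k2->objectid ? -1 : 1;
	if (k1->type != k2->type)
		return k1->type < k2->type ? -1 : 1;
	if (k1->offset != k2->offset)
		return k1->offset < k2->offset ? -1 : 1;
	return 0;
}

/* Return 1 if the keys are strictly ascending, 0 otherwise. */
static int keys_in_order(const struct key *keys, size_t nr)
{
	for (size_t i = 1; i < nr; i++)
		if (key_cmp(&keys[i - 1], &keys[i]) >= 0)
			return 0;
	return 1;
}

/* Restore ascending order; the caller would then restart the scan. */
static void reorder_keys(struct key *keys, size_t nr)
{
	assert(keys != NULL || nr == 0);
	if (nr > 1)
		qsort(keys, nr, sizeof(*keys), key_cmp);
}
```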
This patch fixes the corrupt image I'm adding and passes the other make test
tests. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
When we re-init the extent root we make it completely empty, so when we reset a
pending balance we will fail to find refs for any blocks we may cow, which will
result in errors and we will exit out. We need to reset the balance first so
the normal cow stuff doesn't freak out and then we can re-init the extent tree.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
btrfsck reports backref error after running init-csum-tree
btrfsck --init-csum-tree /dev/sdc
btrfsck /dev/sdc
::
ref mismatch on [29474816 16384] extent item 1, found 0
Backref 29474816 root 7 not referenced back 0x1101d30
Incorrect global backref count on 29474816 found 1 wanted 0
backpointer mismatch on [29474816 16384]
owner ref check failed [29474816 16384]
Errors found in extent allocation tree or chunk allocation
::
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Internally, btrfs_header_chunk_tree_uuid() calculates an unsigned
long, but casts it to a pointer, while all callers cast it to unsigned
long again.
From btrfs commit b308bc2f05a86e728bd035e21a4974acd05f4d1e
Signed-off-by: Ross Kirk <ross.kirk@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
We use 37 as the allocation size for the uuid_unparse() output; this patch
defines BTRFS_UUID_UNPARSE_SIZE for it.
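Where the 37 comes from: a textual UUID is 32 hex digits plus 4 hyphens, and one more byte is needed for the terminating NUL. The sketch below formats a UUID by hand to make that arithmetic visible; uuid_to_string() and UUID_BYTES are names made up for this illustration and are not libuuid's uuid_unparse().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UUID_BYTES		16	/* hypothetical name for the raw size */
#define BTRFS_UUID_UNPARSE_SIZE	37	/* 36 characters + terminating NUL */

/* Format a raw UUID as lowercase 8-4-4-4-12 text. */
static void uuid_to_string(const uint8_t uuid[UUID_BYTES],
			   char out[BTRFS_UUID_UNPARSE_SIZE])
{
	static const char hex[] = "0123456789abcdef";
	size_t pos = 0;

	for (size_t i = 0; i < UUID_BYTES; i++) {
		/* Hyphens go before bytes 4, 6, 8 and 10. */
		if (i == 4 || i == 6 || i == 8 || i == 10)
			out[pos++] = '-';
		out[pos++] = hex[uuid[i] >> 4];
		out[pos++] = hex[uuid[i] & 0x0f];
	}
	assert(pos == BTRFS_UUID_UNPARSE_SIZE - 1);
	out[pos] = '\0';
}
```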
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
As we know, a new fs doesn't have a space cache, so we set the cache generation
of the super block to -1ULL, which is not equal to the fs generation. But the
check program didn't consider this case and output the following message:
cache and super generation don't match, space cache will be invalidated
This would baffle users, so we should avoid outputting such a
message. This patch fixes this problem.
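The special case can be expressed as a small predicate. The function name is hypothetical, and the assertion that a real fs generation is never -1 is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 1 if the space cache should be reported as stale, 0 otherwise.
 * A cache generation of -1ULL means no space cache was ever written,
 * which is not a mismatch worth reporting.
 */
static int cache_generation_mismatch(uint64_t cache_gen, uint64_t super_gen)
{
	assert(super_gen != UINT64_MAX);
	if (cache_gen == UINT64_MAX)
		return 0;
	return cache_gen != super_gen;
}
```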
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Unfortunately you can't run --init-extent-tree if you can't actually read the
extent root. Fix this by allowing partial starts with no extent root and then
have fsck only check to see if the extent root is uptodate _after_ the check to
see if we are init'ing the extent tree. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Running btrfsck on a btrfs with snapshots that are in the middle of being
dropped produces "ref mismatch" messages.
However, we do not want these messages, since a remount
will continue dropping the snapshots.
The messages are meaningless here.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
e0a04278 removed a bunch of dead code but left one little
bit; reinit is always 0, so btrfs_read_block_groups is
never called from here.
Resolves-Coverity-CID: 1125926
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>