Simple test case which preps a filesystem, then corrupts the FST and
finally repairs it. Tests both extent based and bitmap based FSTs.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that all the prerequisite code for proper support of free space
tree repair is in, it's time to wire it in. This is achieved by first
hooking the freespace tree to the __free_extent/alloc_reserved_tree_block
functions. And then introducing a wrapper function to contain the
existing check_space_cache and the newly introduced repair code.
Finally, it's important to note that FST repair code first clears the
existing FST in case of any problem found and rebuilds it from scratch.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The RO_FREE_SPACE_TREE(_VALID) flags are required in order to be able
to open an FST filesystem in repair mode. Add them to
BTRFS_FEATURE_COMPAT_RO_SUPP.
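To illustrate why the flags have to be part of the supported mask, here is
a minimal sketch of the usual compat_ro check. It is not the btrfs-progs
code: the SKETCH_* macros and unsupported_compat_ro() are hypothetical
names, and the bit positions are assumed to match the kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the compat_ro feature bits (assumed values). */
#define SKETCH_COMPAT_RO_FREE_SPACE_TREE	(1ULL << 0)
#define SKETCH_COMPAT_RO_FREE_SPACE_TREE_VALID	(1ULL << 1)

/* Mask of compat_ro features the tool is willing to open read-write. */
#define SKETCH_COMPAT_RO_SUPP \
	(SKETCH_COMPAT_RO_FREE_SPACE_TREE | \
	 SKETCH_COMPAT_RO_FREE_SPACE_TREE_VALID)

/* Return the compat_ro bits set in the superblock that are not supported. */
static uint64_t unsupported_compat_ro(uint64_t sb_flags, uint64_t supp_mask)
{
	return sb_flags & ~supp_mask;
}
```

With an empty mask every FST filesystem would be rejected for writing; with
both bits in the mask the check returns 0 and repair mode can proceed.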
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For now this doesn't change the functionality since FST code is not yet
enabled via the compat bits. But this will be needed when it's enabled
so that the FST is correctly modified during repair operations that
allocate/deallocate extents.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
To help implement the free space tree checker in user space, some kernel
functions are necessary: iterating/deleting/adding freespace items, some
internal search functions, and functions to populate a block group based
on the extent tree. The code is largely copy/paste from the kernel with
locking eliminated (i.e. free_space_lock). It supports reading/writing of
both bitmap and extent based FSTs.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This commit introduces explicit little endian bit operations. The only
difference from the existing bitops implementation is that bswap(32|64)
is called when the _le versions are invoked on a big-endian machine.
This is in preparation for adding free space tree conversion support.
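A minimal sketch of the idea, assuming 64-bit words and a runtime
endianness probe. The *_sketch helpers, swap64() and host_is_big_endian()
are hypothetical names for illustration, not the functions added by this
commit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Runtime endianness probe: inspect the first byte of the value 1. */
static int host_is_big_endian(void)
{
	const uint16_t probe = 1;
	unsigned char first;

	memcpy(&first, &probe, 1);
	return first == 0;
}

/* Portable 64-bit byte swap. */
static uint64_t swap64(uint64_t v)
{
	uint64_t out = 0;
	int i;

	for (i = 0; i < 8; i++) {
		out = (out << 8) | (v & 0xff);
		v >>= 8;
	}
	return out;
}

/* Convert between host and little-endian order; the swap is its own inverse. */
static uint64_t le64_swap_if_needed(uint64_t w)
{
	return host_is_big_endian() ? swap64(w) : w;
}

/* Set bit 'nr' of a bitmap whose words are stored little-endian. */
static void set_bit_le_sketch(uint64_t *map, unsigned int nr)
{
	uint64_t w = le64_swap_if_needed(map[nr / 64]);

	w |= (uint64_t)1 << (nr % 64);
	map[nr / 64] = le64_swap_if_needed(w);
}

/* Test bit 'nr' of a bitmap whose words are stored little-endian. */
static int test_bit_le_sketch(const uint64_t *map, unsigned int nr)
{
	uint64_t w = le64_swap_if_needed(map[nr / 64]);

	return (int)((w >> (nr % 64)) & 1);
}
```

Regardless of the host byte order, bit 9 ends up as bit 1 of byte 1 in
memory, which is what an on-disk little-endian bitmap expects.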
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Replace the existing find_*_bit functions with their kernel equivalents. This
reduces duplication, simplifies the code (we really have one worker
function _find_next_bit) and is quite likely faster. No functional
changes.
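The single-worker idea can be sketched as follows. This is a simplified
bit-at-a-time illustration, not the imported kernel code, which scans a
word at a time; the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Shared worker: find the next set bit at or after 'start'. XOR-ing each
 * word with 'invert' (0 or ~0UL) turns the same loop into a search for the
 * next zero bit. Returns 'nbits' when nothing is found.
 */
static unsigned long sketch_find_next(const unsigned long *addr,
				      unsigned long nbits,
				      unsigned long start,
				      unsigned long invert)
{
	unsigned long i;

	for (i = start; i < nbits; i++) {
		unsigned long word = addr[i / SKETCH_BITS_PER_LONG] ^ invert;

		if (word & (1UL << (i % SKETCH_BITS_PER_LONG)))
			return i;
	}
	return nbits;
}

/* Thin wrappers: both searches share the worker above. */
static unsigned long sketch_find_next_bit(const unsigned long *addr,
					  unsigned long nbits,
					  unsigned long start)
{
	return sketch_find_next(addr, nbits, start, 0UL);
}

static unsigned long sketch_find_next_zero_bit(const unsigned long *addr,
					       unsigned long nbits,
					       unsigned long start)
{
	return sketch_find_next(addr, nbits, start, ~0UL);
}
```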
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Those functions are in preparation for adding the freespace tree repair
code since it needs to be able to deal with bitmap based FSTs. This
patch adds extent_buffer_bitmap_set and extent_buffer_bitmap_clear
functions. Since in userspace we don't have to deal with page mappings
their implementation is vastly simplified by simply setting each bit in
the passed range.
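A minimal sketch of the simplified approach, using a flat byte array in
place of a real extent buffer. struct sketch_buffer and the sketch_bitmap_*
helpers are hypothetical; only the bit-by-bit loop reflects the description
above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an extent buffer: just a flat byte array. */
struct sketch_buffer {
	unsigned char data[32];
};

/* Set 'len' bits starting at bit 'pos' of the bitmap at byte offset 'start'. */
static void sketch_bitmap_set(struct sketch_buffer *eb, size_t start,
			      size_t pos, size_t len)
{
	size_t i;

	assert(start + (pos + len + 7) / 8 <= sizeof(eb->data));
	for (i = pos; i < pos + len; i++)
		eb->data[start + i / 8] |= (unsigned char)(1u << (i % 8));
}

/* Clear 'len' bits starting at bit 'pos' of the bitmap at byte offset 'start'. */
static void sketch_bitmap_clear(struct sketch_buffer *eb, size_t start,
				size_t pos, size_t len)
{
	size_t i;

	assert(start + (pos + len + 7) / 8 <= sizeof(eb->data));
	for (i = pos; i < pos + len; i++)
		eb->data[start + i / 8] &= (unsigned char)~(1u << (i % 8));
}
```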
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For completeness' sake, add code to btrfs_read_fs_root so that it can
handle the freespace tree.
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that delayed refs have been wired up, let's merge the two functions. In
the process also remove one BUG_ON since alloc_reserved_tree_block's
callers can handle errors. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that delayed refs have all been wired up, clean up the __free_extent2
adapter function since it's no longer needed. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Given that the new delayed refs infrastructure is implemented and wired
up, there is no point in keeping the old code. So just remove it.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This commit enables the delayed refs infrastructure. This entails doing
the following:
1. Replacing existing calls of btrfs_extent_post_op (which is the
equivalent of delayed refs) with the proper btrfs_run_delayed_refs,
as well as eliminating open-coded calls to finish_current_insert and
del_pending_extents which execute the delayed ops.
2. Wiring up the addition of delayed refs when freeing extents
(btrfs_free_extent) and when adding new extents (alloc_tree_block).
3. Adding calls to btrfs_run_delayed_refs in the transaction commit
path alongside comments explaining why every call is needed, since it's
not always obvious (those call sites were derived empirically by running
and debugging existing tests).
4. Correctly flagging the transaction in which we are reinitialising
the extent tree.
5. Moving the btrfs_write_dirty_block_groups call into the main
transaction commit routine since block groups should be written to
disk after the last delayed refs have been run.
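The queue-then-run pattern behind the steps above can be sketched with a
toy model. Nothing here is the btrfs-progs implementation: the sketch_*
types and functions are hypothetical, and "applying" a ref is reduced to
adjusting a counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

enum sketch_action { SKETCH_ADD_REF, SKETCH_DROP_REF };

/* One queued reference update. */
struct sketch_ref {
	enum sketch_action action;
	uint64_t bytenr;
	struct sketch_ref *next;
};

/* Toy transaction handle holding the pending updates in FIFO order. */
struct sketch_trans {
	struct sketch_ref *head;
	struct sketch_ref *tail;
	long applied;	/* net reference changes applied so far */
};

/* Queue an update instead of modifying the extent tree right away. */
static int sketch_add_delayed_ref(struct sketch_trans *trans,
				  enum sketch_action action, uint64_t bytenr)
{
	struct sketch_ref *ref = malloc(sizeof(*ref));

	if (!ref)
		return -ENOMEM;
	ref->action = action;
	ref->bytenr = bytenr;
	ref->next = NULL;
	if (trans->tail)
		trans->tail->next = ref;
	else
		trans->head = ref;
	trans->tail = ref;
	return 0;
}

/* Apply and free every queued update; returns how many were run. */
static int sketch_run_delayed_refs(struct sketch_trans *trans)
{
	int ran = 0;

	while (trans->head) {
		struct sketch_ref *ref = trans->head;

		trans->head = ref->next;
		trans->applied += (ref->action == SKETCH_ADD_REF) ? 1 : -1;
		free(ref);
		ran++;
	}
	trans->tail = NULL;
	return ran;
}
```

Nothing is applied until the run function is called, which is why the
commit path has to call it before the block groups are written out.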
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The root argument is used only to get a reference to the fs_info; this
can be achieved with the transaction handle being passed, so use that.
This is in preparation for moving this function into the main transaction
commit routine. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This commit pulls those portions of the kernel implementation of
delayed refs which are necessary to have them working in user-space.
I've done the following modifications:
1. Replaced all kmem_cache_alloc calls with kmalloc.
2. Removed all locking-related code, since we are single threaded in
userspace.
3. Removed code which deals with data refs - delayed refs in user space
are going to be used only for cowonly trees.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is a simple adapter function to convert the delayed-refs structures
to the current arguments of alloc_reserved_tree_block.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is a simple adapter to convert the delayed ref arguments
to the existing arguments of __free_extent.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The BTRFS_IOC_QGROUP_ASSIGN ioctl can return >0 if the qgroup is marked
inconsistent after a successful relationship assignment/removal.
We leak that value as the final return value of the btrfs command.
But according to the man page, a return value other than 0 means failure.
Fix this by resetting the return value to 0 for the --no-rescan case.
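A minimal sketch of the intended mapping from ioctl results to the command
exit status. The function name, its parameters and the warning text are
hypothetical; the real command issues the ioctls itself.

```c
#include <assert.h>
#include <stdio.h>

/*
 * assign_ret: result of the assign ioctl (<0 error, 0 ok, >0 ok but
 *             qgroups were marked inconsistent).
 * do_rescan:  non-zero unless --no-rescan was given.
 * rescan_ret: result of the follow-up rescan request, if one was made.
 */
static int sketch_assign_exit_code(int assign_ret, int do_rescan,
				   int rescan_ret)
{
	if (assign_ret < 0)
		return 1;
	if (assign_ret > 0) {
		if (do_rescan)
			return rescan_ret < 0 ? 1 : 0;
		/* Warn, but do not leak the positive value as a failure. */
		fprintf(stderr, "WARNING: quotas may be inconsistent, rescan needed\n");
		return 0;
	}
	return 0;
}
```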
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
During fuzz/007 we hit the following error:
====== RUN MAYFAIL btrfs rescue super-recover -y -v tests/fuzz-tests/images/bko-200409.raw.restored.scratch
ERROR: tree_root block unaligned: 33554431
ERROR: superblock checksum matches but it has invalid members
ERROR: tree_root block unaligned: 33554431
ERROR: superblock checksum matches but it has invalid members
ERROR: tree_root block unaligned: 33554431
ERROR: superblock checksum matches but it has invalid members
ERROR: failed to add chunk map start=12582912 len=8454144: -17 (File exists)
Couldn't read chunk tree
failed (ignored, ret=139): btrfs rescue super-recover -y -v tests/fuzz-tests/images/bko-200409.raw.restored.scratch
mayfail: returned code 139 (SEGFAULT), not ignored
test failed for case 007-simple-super-recover
[CAUSE]
In __open_ctree_fd(), if we have valid @open_ctree_flags and
btrfs_scan_fs_devices() succeeds without problems, no matter what
happens we will call btrfs_close_devices(), thus freeing all related
devices.
In super-recover, before we call open_ctree(), we have called
btrfs_scan_fs_devices() already, so btrfs_scan_fs_devices() should not
fail in open_ctree(), and fs_devices will always be freed in open_ctree()
or close_ctree().
[FIX]
So in super-recover.c, we should not call btrfs_close_devices(), or we
will find fs_devices->list freed, and trigger segfault when exiting.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Another BUG_ON() during fuzz/003:
====== RUN MAYFAIL btrfs check --repair tests/fuzz-tests/images/bko-199833-reloc-recovery-crash.raw.restored
[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
ctree.c:1650: leaf_space_used: Warning: assertion `data_len < 0` failed, value 1
bad key ordering 18 19
bad block 29409280
ERROR: errors found in extent allocation tree or chunk allocation
WARNING: minor unaligned/mismatch device size detected
WARNING: recommended to use 'btrfs rescue fix-device-size' to fix it
[3/7] checking free space cache
[4/7] checking fs roots
ctree.c:1650: leaf_space_used: Warning: assertion `data_len < 0` failed, value 1
bad key ordering 18 19
root 18446744073709551608 missing its root dir, recreating
Unable to find block group for 0
Unable to find block group for 0
Unable to find block group for 0
volumes.c:564: btrfs_alloc_dev_extent: BUG_ON `ret` triggered, value -28
failed (ignored, ret=134): btrfs check --repair tests/fuzz-tests/images/bko-199833-reloc-recovery-crash.raw.restored
mayfail: returned code 134 (SIGABRT), not ignored
test failed for case 003-multi-check-unmounted
However the culprit function btrfs_alloc_dev_extent() already has a proper
error handling label, err:. Just using that label solves the problem easily.
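The pattern looks roughly like this. It is a sketch with hypothetical
helpers (sketch_insert_item() simulates the failing insertion), not the
actual function body.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper simulating an item insertion that may hit ENOSPC. */
static int sketch_insert_item(int simulate_enospc)
{
	return simulate_enospc ? -ENOSPC : 0;
}

static int sketch_alloc_dev_extent(int simulate_enospc)
{
	char *path;
	int ret;

	path = malloc(64);
	if (!path)
		return -ENOMEM;

	ret = sketch_insert_item(simulate_enospc);
	if (ret < 0)
		goto err;	/* previously an abort via BUG_ON(ret) */

	/* ... further work on the success path would go here ... */
	ret = 0;
err:
	/* Shared cleanup for both the success and the error path. */
	free(path);
	return ret;
}
```

The caller now sees -ENOSPC (the -28 in the log above) and can fail the
command cleanly instead of receiving SIGABRT.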
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
An infinite loop can be triggered during fuzz/003:
====== RUN MAYFAIL btrfs check --repair tests/fuzz-tests/images/bko-199833-reloc-recovery-crash.raw.restored
[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
ctree.c:1650: leaf_space_used: Warning: assertion `data_len < 0` failed, value 1
bad key ordering 18 19
ctree.c:1650: leaf_space_used: Warning: assertion `data_len < 0` failed, value 1
bad key ordering 18 19
ctree.c:1650: leaf_space_used: Warning: assertion `data_len < 0` failed, value 1
bad key ordering 18 19
[CAUSE]
In try_to_fix_bad_block() it's possible that btrfs_find_all_roots()
finds no root referring to that tree block, thus we can't do any repair.
However in that case, we still return 0 since the last call assigning
@ret is btrfs_find_all_roots(), and the ulist while loop doesn't get run
at all.
And since try_to_fix_bad_block() returns 0, check_block() in
check/main.c will return -EAGAIN to re-check the tree block.
This leads to the infinite loop.
[FIX]
Change the default return value from 0 to -EIO in
try_to_fix_bad_block(), so if there is no tree referring to the bad tree
block, it won't cause an infinite loop anymore.
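A minimal sketch of why the default matters, with a plain array standing in
for the ulist of roots. sketch_try_to_fix() and the callback type are
hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical repair callback: returns 0 if the block was fixed in a root. */
typedef int (*sketch_fix_fn)(unsigned long long root_id);

static int sketch_try_to_fix(const unsigned long long *roots, size_t nr_roots,
			     sketch_fix_fn fix)
{
	int ret = -EIO;	/* default: no root refers to the block, nothing fixed */
	size_t i;

	for (i = 0; i < nr_roots; i++) {
		ret = fix(roots[i]);
		if (ret)
			break;
	}
	return ret;
}

/* Trivial callback that always reports a successful repair. */
static int sketch_fix_ok(unsigned long long root_id)
{
	(void)root_id;
	return 0;
}
```

With an empty root list the loop body never runs, so the caller sees -EIO
and does not ask for the block to be re-checked.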
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Another BUG_ON() during fuzz/003:
====== RUN MAYFAIL btrfs check --init-csum-tree tests/fuzz-tests/images/bko-161821.raw.restored
[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
parent transid verify failed on 4198400 wanted 14 found 1114126
parent transid verify failed on 4198400 wanted 14 found 1114126
Ignoring transid failure
owner ref check failed [4198400 4096]
repair deleting extent record: key [4198400,169,0]
adding new tree backref on start 4198400 len 4096 parent 0 root 5
Repaired extent references for 4198400
ref mismatch on [4222976 4096] extent item 1, found 0
backref 4222976 root 7 not referenced back 0x5617f8ecf780
incorrect global backref count on 4222976 found 1 wanted 0
backpointer mismatch on [4222976 4096]
owner ref check failed [4222976 4096]
repair deleting extent record: key [4222976,169,0]
Repaired extent references for 4222976
[3/7] checking free space cache
[4/7] checking fs roots
parent transid verify failed on 4198400 wanted 14 found 1114126
Ignoring transid failure
Wrong generation of child node/leaf, wanted: 1114126, have: 14
root 5 missing its root dir, recreating
parent transid verify failed on 4198400 wanted 14 found 1114126
Ignoring transid failure
ERROR: child eb corrupted: parent bytenr=4222976 item=0 parent level=1 child level=2
ERROR: errors found in fs roots
extent buffer leak: start 4222976 len 4096
extent_io.c:611: free_extent_buffer_internal: BUG_ON `eb->flags & EXTENT_DIRTY` triggered, value 1
failed (ignored, ret=134): btrfs check --init-csum-tree tests/fuzz-tests/images/bko-161821.raw.restored
mayfail: returned code 134 (SIGABRT), not ignored
test failed for case 003-multi-check-unmounted
Since we're shifting to use btrfs_abort_transaction() in btrfs-progs,
it will be more and more common to see dirty leaked eb. Instead of
BUG_ON(), we only need to report it as a warning.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When running test fuzz/003, we could hit the following BUG_ON:
====== RUN MAYFAIL btrfs check --init-csum-tree tests//fuzz-tests/images/bko-155621-bad-block-group-offset.raw.restored
Unable to find block group for 0
Unable to find block group for 0
Unable to find block group for 0
extent-tree.c:2657: alloc_tree_block: BUG_ON `ret` triggered, value -28
failed (ignored, ret=134): btrfs check --init-csum-tree tests/fuzz-tests/images/bko-155621-bad-block-group-offset.raw.restored
mayfail: returned code 134 (SIGABRT), not ignored
test failed for case 003-multi-check-unmounted
Just remove that BUG_ON() and exit gracefully; the caller handles the
errors.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
- group entries that belong together
- add / prefix for files that are at fixed location
- remove obsolete build targets
- remove automake support scripts
- add missing targets (.static)
Signed-off-by: David Sterba <dsterba@suse.com>
There's a regular manual page that matches the file glob mask *.8 so we
have to be more careful and remove only the known intermediate files.
Signed-off-by: David Sterba <dsterba@suse.com>
The manual pages are not compressed anymore and we can remove gzip from
build dependencies and build steps.
Signed-off-by: David Sterba <dsterba@suse.com>
Build systems do not typically compress man pages when installing them.
This is generally left to distro packaging mechanisms, which may end up
recompressing them using a different compressor.
Author: Mike Gilbert <floppym@gentoo.org>
Signed-off-by: David Sterba <dsterba@suse.com>
In order to install uncompressed manual pages we can't use the symlink
for the deprecated btrfsck page. Replace it with the source command
provided by the manual page format.
Old: man8/btrfsck.8.gz (symlink)
New: man8/btrfsck.8 (file)
Reported-by: Mike Gilbert <floppym@gentoo.org>
Signed-off-by: David Sterba <dsterba@suse.com>
When printing tree nodes, we output slots like:
key (EXTENT_TREE ROOT_ITEM 0) block 73625600 (17975) gen 16
The number in the parentheses is blockptr / nodesize.
However this number doesn't really do anything useful. And in fact for
an unaligned metadata block group (block group start bytenr is not aligned
to 16K), the number doesn't even make sense as it's rounded down.
In fact the kernel doesn't ever output such a divided result in its
print-tree.c.
Remove it so that later readers won't wonder what the number means.
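As a worked example of the rounding: the sample value 17975 above equals
73625600 / 4096, so that dump came from a 4K nodesize. The 16K case below
is an illustration with the same bytenr, and sketch_block_index() is a
hypothetical helper, not a function from print-tree.c.

```c
#include <assert.h>
#include <stdint.h>

/* The removed number: block pointer divided by nodesize, truncated. */
static uint64_t sketch_block_index(uint64_t blockptr, uint32_t nodesize)
{
	return blockptr / nodesize;
}
```

For blockptr 73625600 this yields 17975 with a 4096 nodesize, but 4493 with
a 16384 nodesize, silently dropping a remainder of 12288 bytes.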
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Before this patch:
$ ls nothingness
ls: cannot access 'nothingness': No such file or directory
$ btrfs inspect-internal dump-tree nothingness
ERROR: not a block device or regular file: nothingness
The confusing error message makes users think that the nonexistent file
exists but is of a wrong type.
This patch lets check_arg_type return -errno if realpath failed, and
prints strerror if check_arg_type failed and the returned code is
negative. Like:
$ btrfs inspect-internal dump-tree nothingness
ERROR: invalid argument: nothingness: No such file or directory
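The error propagation can be sketched as below. sketch_check_arg() and
sketch_report() are hypothetical names; only the realpath()/-errno/strerror
flow follows the description above.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 0 if 'input' resolves to a real path, or -errno if it does not. */
static int sketch_check_arg(const char *input)
{
	char resolved[PATH_MAX];

	if (!realpath(input, resolved))
		return -errno;
	return 0;
}

/* Print the reason for a negative return value instead of a generic message. */
static void sketch_report(const char *input, int ret)
{
	if (ret < 0)
		fprintf(stderr, "ERROR: invalid argument: %s: %s\n",
			input, strerror(-ret));
}
```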
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a test which ensures the kernel returns the correct error value
when missing device removal is requested. This test verifies that kernel
refactoring didn't break the return value.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The option '-R' of btrfs-scrub was documented by mistake as
'print raw statistics per-device instead of a summary'.
Here change it to 'raw print mode, print full data instead of
summary', which is how it actually works.
Fixes: 162257574a ("btrfs-progs: docs: update btrfs-scrub")
Reported-by: Chris Murphy <chris@colorremedies.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The commit d99615284a ("btrfs-progs:
fsck-tests: Add test image to check if btrfs check reports uninitialized
rescan as error") added test 035, which should have been 036.
Signed-off-by: David Sterba <dsterba@suse.com>
For dump-tree/dump-super the completion uses the default _filedir -d,
which is far from convenient. Use _filedir for
dump-tree/dump-super/inode-resolve just like rootid.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For developers it's pretty common to call "btrfs check" on a raw image
dump rather than a real block device. It's also possible for end users to
do some tests on loop devices.
So the current _btrfs_devs() is really making things worse. Use _filedir()
to replace _btrfs_devs() so it can complete any filenames, no matter if
it's just a file or a real block device.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are cases where btrfs_commit_transaction() itself can fail, mostly
due to ENOSPC when allocating space.
Don't panic out in this case.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When the 'btrfstune -u' command is interrupted, the final filesystem fsid
is not written to the superblock and it cannot be mounted. Too bad that
'btrfstune' cannot continue to finish the UUID change as it should.
This patch fixes that and passes the relaxed flags for superblock and
only warns when it detects the fsid mismatch. As this is something that
should be noted in case it would be needed for further debugging, it's
not just silent.
Signed-off-by: David Sterba <dsterba@suse.com>
Reset the ret value to zero after snprintf(), which returns the number
of written chars. Otherwise a non-zero value is returned after the command
succeeds with the -P option. Also set the return value from
__ino_to_path_fd() to reflect the final status for the default behavior.
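The pitfall is easy to reproduce in isolation. The helper below is a
hypothetical sketch, not the code touched by this patch: it reuses one ret
variable for the snprintf() result and for the function status.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "prepend/name" into 'out'. Returns 0 on success or -1 if the
 * result did not fit into 'outlen' bytes.
 */
static int sketch_build_path(char *out, size_t outlen, const char *prepend,
			     const char *name)
{
	int ret;

	ret = snprintf(out, outlen, "%s/%s", prepend, name);
	if (ret < 0 || (size_t)ret >= outlen)
		return -1;

	/* snprintf() returned a character count; do not leak it as status. */
	ret = 0;
	return ret;
}
```

Without the reset, a successful call would return the length of the
formatted string, which callers would treat as a failure.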
Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
During fstests/btrfs/166, it's possible to hit a certain case where
qgroup is just enabled but rescan hasn't kicked in.
At qgroup enable time, we set the INCONSISTENT flag and let a later
rescan clear it. If power loss occurs before the rescan starts,
it's possible we get a qgroup status item with ON|INCONSISTENT but
without the RESCAN flag.
And in that case, it will definitely cause a difference in qgroup
accounting as all numbers in the qgroup tree are 0.
Fix this false alert by also checking rescan progress from
btrfs_status_item. And if we find rescan progress is still 0,
INCONSISTENT flag set and no RESCAN flag set, we won't treat it as an
error.
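The condition can be sketched as a small predicate. The SKETCH_* flag
values are assumed to mirror the kernel's qgroup status flags, and
sketch_rescan_never_started() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the on-disk qgroup status flag bits. */
#define SKETCH_QGROUP_FLAG_ON			(1ULL << 0)
#define SKETCH_QGROUP_FLAG_RESCAN		(1ULL << 1)
#define SKETCH_QGROUP_FLAG_INCONSISTENT		(1ULL << 2)

/*
 * Return 1 if a qgroup accounting mismatch should not be treated as an
 * error because qgroups were enabled but the rescan never started.
 */
static int sketch_rescan_never_started(uint64_t flags, uint64_t rescan_progress)
{
	return rescan_progress == 0 &&
	       (flags & SKETCH_QGROUP_FLAG_INCONSISTENT) &&
	       !(flags & SKETCH_QGROUP_FLAG_RESCAN);
}
```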
Reported-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In check_inode_recs(), for repair mode we always reset @ret to 0. This
makes no sense, since later we check @ret to determine whether the repair
was successful.
Fix it by removing the offending overwrite.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Exposed by fuzz-tests/003-multi-check-unmounted/ on fuzzed image
bko-161811.raw.xz.
It's caused by the fact that when check_fs_roots() finds the tree root is
modified, it re-searches the tree root via the again: label.
However the again: label also resets the root objectid to 0.
If we failed to repair one fs root but still modified the tree root, we
will go into an infinite loop.
Fix it by recording which root we should skip for repair mode.
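A toy model of the restart loop with a recorded skip root. Everything here
is hypothetical: the struct, the function, and in particular the choice to
resume the walk after the skipped root are assumptions made for the sketch,
not a description of the patched code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of an fs root seen by the checker. */
struct sketch_root {
	uint64_t objectid;
	int damaged;
	int repairable;
};

/*
 * 'roots' is sorted by objectid. Every repair attempt is assumed to modify
 * the tree root, which forces a restart of the walk. Returns the number of
 * roots that could not be repaired.
 */
static int sketch_check_roots(struct sketch_root *roots, size_t nr,
			      int *restarts)
{
	uint64_t skip_root = 0;
	int errors = 0;
	size_t i;

again:
	for (i = 0; i < nr; i++) {
		/* Do not revisit roots up to the one that failed to repair. */
		if (skip_root && roots[i].objectid <= skip_root)
			continue;
		if (!roots[i].damaged)
			continue;

		if (roots[i].repairable) {
			roots[i].damaged = 0;
		} else {
			errors++;
			skip_root = roots[i].objectid;
		}
		(*restarts)++;
		goto again;
	}
	return errors;
}
```

Without skip_root, the first unrepairable root would be hit again after
every restart and the walk would never finish.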
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have the function btrfs_fsck_reinit_root() to reinit the csum or extent
tree. However this function can overwrite existing tree blocks when the
@overwrite parameter is set.
Such behavior is pretty dangerous, and no caller is using this feature
explicitly.
So just remove @overwrite parameter and allow btrfs_fsck_reinit_root()
to error out when it fails to allocate tree block.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The build system mentioned in the previous commit builds libraries in
both PIC and non-PIC mode. Shared libraries don't work in non-PIC mode, so
it expects a --disable-shared configure option, which most open source
libraries using autoconf have. Let's add it, too.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have a build system internally which only needs to build and install
the libraries out of a repository, not any binaries. There's no easy way
to do this in btrfs-progs currently. Add --disable-programs to
./configure to support this.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Prevent an unnecessary error from a failing fsync() when devices are
opened read-only.
Performed 'grep "writeable = " *.h *.c' to make sure there were no odd
situations where fsync() might still be desired here. They're all
straightforward. The only situation where writeable will be 0 is if
btrfs_open_devices is given flags without O_RDWR. There is no situation
where a writeable volume temporarily becomes unwriteable, or anything like
that. When a device is not opened O_RDWR, there's no reason to attempt
fsync().
utils.c
	int btrfs_add_to_fsid() {
		...
		device->writeable = 1;

volumes.c
	int btrfs_close_devices() {
		...
		while (!list_empty(&fs_devices->devices)) {
			...
			// just after the fsync() being patched
			device->writeable = 0;
	...
	int btrfs_open_devices() {
		...
		list_for_each_entry(device, &fs_devices->devices, dev_list) {
			...
			if (flags & O_RDWR)
				device->writeable = 1;
The kernel btrfs_close_devices() does not have a corresponding fsync() that I see.
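The guarded close can be sketched as follows. struct sketch_device and
sketch_close_device() are hypothetical simplifications of the real device
structure and close path.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical, reduced device record. */
struct sketch_device {
	int fd;
	int writeable;
};

/* Close a device, flushing it first only if it was opened for writing. */
static int sketch_close_device(struct sketch_device *dev)
{
	int ret = 0;

	if (dev->fd < 0)
		return 0;

	if (dev->writeable && fsync(dev->fd) == -1) {
		ret = -errno;
		perror("fsync");
	}
	if (close(dev->fd) == -1 && !ret)
		ret = -errno;

	dev->fd = -1;
	dev->writeable = 0;
	return ret;
}
```

A device opened with O_RDONLY never reaches fsync(), so a read-only open
cannot produce a spurious fsync() error on close.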
Signed-off-by: James Harvey <jamespharvey20@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>