Verify that a filesystem check operation (fsck) does not report the
following scenario as an error:
An extent is shared between two inodes, as a result of clone/reflink
operation, and for one of the inodes, lets call it inode A, the extent is
referenced through a file extent item as a prealloc extent, while for the
other inode, call it inode B, the extent is referenced through a regular
file extent item, that is, it was written to. The goal of this test is to
make sure a filesystem check operation will not report "odd csum items"
errors for the prealloc extent at inode A, because this scenario is valid
since the extent was written through inode B and therefore it is expected
to have checksum items in the filesystem's checksum btree for that shared
extent.
Such scenario can be created with the following steps for example:
mkfs.btrfs -f /dev/sdb
mount /dev/sdb /mnt
touch /mnt/foo
xfs_io -c "falloc 0 256K" /mnt/foo
sync
xfs_io -c "pwrite -S 0xab 0 256K" /mnt/foo
touch /mnt/bar
xfs_io -c "reflink /mnt/foo 0 0 256K" /mnt/bar
xfs_io -c "fsync" /mnt/bar
<power fail>
mount /dev/sdb /mnt
umount /mnt
This scenario is fixed by the following patch for the filesystem checker:
"Btrfs-progs: check, fix false error reports for shared prealloc extents"
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Under some cases the filesystem checker reports an error when it finds
checksum items for an extent that is referenced by an inode as a prealloc
extent. Such cases are not an error when the extent is actually shared
(was cloned/reflinked) with other inodes and was written through one of
those other inodes.
Example:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ touch /mnt/foo
$ xfs_io -c "falloc 0 256K" /mnt/foo
$ sync
$ xfs_io -c "pwrite -S 0xab 0 256K" /mnt/foo
$ touch /mnt/bar
$ xfs_io -c "reflink /mnt/foo 0 0 256K" /mnt/bar
$ xfs_io -c "fsync" /mnt/bar
<power fail>
$ mount /dev/sdb /mnt
$ umount /mnt
$ btrfs check /dev/sdc
Checking filesystem on /dev/sdb
UUID: 52d3006e-ee3b-40eb-aa21-e56253a03d39
checking extents
checking free space cache
checking fs roots
root 5 inode 257 errors 800, odd csum item
ERROR: errors found in fs roots
found 688128 bytes used, error(s) found
total csum bytes: 256
total tree bytes: 163840
total fs tree bytes: 65536
total extent tree bytes: 16384
btree space waste bytes: 138819
file data blocks allocated: 10747904
referenced 10747904
$ echo $?
1
So teach check to not report such cases as errors by checking if the
extent is shared with other inodes and if so, consider it an error the
existence of checksum items only if all those other inodes are referencing
the extent as a prealloc extent.
This case can be hit often when running the generic/475 testcase from
fstests.
A test case will follow in a separate patch.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Strangely, we have level check in btrfs_print_tree() while we don't have
the same check in read_node_slot().
That's to say, for the following corruption, btrfs_search_slot() or
btrfs_next_leaf() can return invalid leaf:
Parent eb:
node XXXXXX level 1
^^^^^^^
Child should be leaf (level 0)
...
key (XXX XXX XXX) block YYYYYY
Child eb:
leaf YYYYYY level 1
^^^^^^^
Something went wrong now
And for the corrupted leaf returned, later caller can be screwed up
easily.
Reported-by: Ralph Gauges <ralphgauges@googlemail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The commit cebf3b3722 ("btrfs-progs: introduce TEST_TOP and
INTERNAL_BIN for tests") did not convert all test paths. This would
break the exported testsutie.
Signed-off-by: David Sterba <dsterba@suse.com>
Regression test for false alerts in lowmem mode.
Signed-off-by: Qu Wenruo <wqu@suse.com>
[ update test ]
Signed-off-by: David Sterba <dsterba@suse.com>
Btrfs can delay inode deletion and in that case btrfs will unlink the
victim inode from its parent dir, and insert a marker to info btrfs to
delete it later.
In that case, such victim inode will have nlinks == 0, but is still
completely valid.
Original mode won't report such problem, but lowmem mode doesn't check
the ORPHAN_ITEM key for such inode, and can report false alert like:
------
ERROR: root 257 INODE[28891726] is orphan item
------
Fix such false alert by checking orphan item for inode whose nlink is 0.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Although it's not a good practice to format all patches under project
directory, sometimes lazy bones like me just like to put patches under
project directory.
Just ignore such patches.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add the testcase for false alert of data extent backref lost with the
extent offset.
The image can be reproduced by the following commands:
------
dev=~/test.img
mnt=/mnt/btrfs
umount $mnt &> /dev/null
fallocate -l 128M $dev
mkfs.btrfs $dev
mount $dev $mnt
for i in `seq 1 10`; do
xfs_io -f -c "pwrite 0 2K" $mnt/file$i
done
xfs_io -f -c "falloc 0 64K" $mnt/file11
for i in `seq 1 32`; do
xfs_io -f -c "reflink $mnt/file11 0 $(($i * 64))K 64K" $mnt/file11
done
xfs_io -f -c "reflink $mnt/file11 32K $((33 * 64))K 32K" $mnt/file11
btrfs subvolume snapshot $mnt $mnt/snap1
umount $mnt
btrfs-image -c9 $dev extent_data_ref.img
------
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Btrfs lowmem check reports the following false alert:
------
ERROR: file extent[267 2162688] root 256 owner 5 backref lost
------
The file extent is in the leaf which is shared by file tree 256 and fs
tree.
------
leaf 30605312 items 46 free space 4353 generation 7 owner 5
......
item 45 key (267 EXTENT_DATA 2162688) itemoff 5503 itemsize 53
generation 7 type 2 (prealloc)
prealloc data disk byte 13631488 nr 65536
prealloc data offset 32768 nr 32768
------
And there is the corresponding extent_data_ref item in the extent tree.
------
item 1 key (13631488 EXTENT_DATA_REF 1007496934287921081) itemoff 15274 itemsize 28
extent data backref root 5 objectid 267 offset 2129920 count 1
------
The offset of EXTENT_DATA_REF which is the hash of the owner root objectid,
the inode number and the calculated offset (file offset - extent offset).
What caused the false alert is the code mix up the owner root objectid and
the file tree objectid.
Fixes: b0d360b541 ("btrfs-progs: check: introduce function to check data backref in extent tree")
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Instead of the disk_bytenr and disk_num_bytes of the extent_item which the
file extent references, we should output the objectid and offset of the
file extent. And the leaf may be shared by the file trees, we should print
the objectid of the root and the owner of the leaf.
Fixes: b0d360b541 ("btrfs-progs: check: introduce function to check data backref in extent tree")
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When we found free space difference between free space cache and block
group item, we just discard this free space cache.
Normally such difference is caused by btrfs_reserve_extent() called by
delalloc which is out of a transaction.
And since all btrfs_release_extent() is called with a transaction, under
heavy race free space cache can have less free space than block group
item.
Normally kernel will detect such difference and just discard that cache.
However we must be more careful if free space cache has more free space
cache, and if that happens, paried with above race one invalid free
space cache can be loaded into kernel.
So if we find any free space cache who has more free space then block
group item, we report it as an error other than ignoring it.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The user may have specified a different version of Python than the
python3 from their $PATH (e.g., with PYTHON=/usr/bin/python3.6).
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The updated image now provides clang, so the variable is exported from
the base environment to docker. And we have python too.
Signed-off-by: David Sterba <dsterba@suse.com>
For a slight speed up of the build, cache reiserfs and zstd. A quick
cache validity is done, or it can be cleared manually on travis web UI.
Signed-off-by: David Sterba <dsterba@suse.com>
This gets the remaining occurrences that weren't covered by previous
conversions.
Signed-off-by: Omar Sandoval <osandov@fb.com>
[ fixup test_issubvolume due to removed dependency patch ]
Signed-off-by: David Sterba <dsterba@suse.com>
Now implemented with btrfs_util_subvolume_path(),
btrfs_util_subvolume_info(), and subvolume iterators.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Most of the interesting part of this command is the commit mode, so this
only saves a little bit of code.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The only thing of note here is the "top level" column. This used to mean
something else, but for a long time it has been equal to the parent ID.
I preserved this for backwards compatability.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We also support recursive deletion using a subvolume iterator.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Thanks to subvolume iterators, we can also implement recursive snapshot
fairly easily.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is how we can implement stuff like `btrfs subvol list`. Rather than
producing the entire list upfront, the iterator approach uses less
memory in the common case where the whole list is not stored (O(max
subvolume path length)). It supports both pre-order traversal (useful
for, e.g, recursive snapshot) and post-order traversal (useful for
recursive delete).
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
set_default_subvolume() is a trivial ioctl(), but there's no ioctl() for
get_default_subvolume(), so we need to search the root tree.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the future, btrfs_util_[gs]et_subvolume_flags() might be useful, but
since these are the only subvolume flags we've defined in all this time,
this will do for now.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This gets the the information in `btrfs subvolume show` from the root
item.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We can just walk up root backrefs with BTRFS_IOC_TREE_SEARCH and inode
paths with BTRFS_IOC_INO_LOOKUP.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Doing the ioctl() directly isn't too bad, but passing in a full path is
more convenient than opening the parent and passing the path component.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
These are the most trivial helpers in the library and will be used to
implement several of the more involved functions.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Systems with older kernels won't have these available, and the copies in
btrfs-progs aren't quite compatible, so for now, let's just copy these
in. We can potentially deduplicate some of this in the future.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
These were broken when the patch series got shuffled around.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Just as kernel find_free_dev_extent(), allow it to return maximum hole
size for us to build device list for later chunk allocator rework.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
As part of the effort to unify code and behavior between btrfs-progs and
kernel, copy the btrfs_raid_array from kernel to btrfs-progs.
So later we can use the btrfs_raid_array[] to get needed raid info other
than manually do if-else branches.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We want to hide struct btrfs_qgroup_inherit from the user because that
comes from the Btrfs UAPI headers. Instead, wrap it in a struct
btrfs_util_qgroup_inherit and provide helpers to manipulate it. This
will be used for subvolume and snapshot creation.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The C libbtrfsutil library isn't very useful for scripting, so we also
want bindings for Python. Writing unit tests in Python is also much
easier than doing so in C. Only Python 3 is supported; if someone really
wants Python 2 support, they can write their own bindings. This commit
is just the scaffolding.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently, users wishing to manage Btrfs filesystems programatically
have to shell out to btrfs-progs and parse the output. This isn't ideal.
The goal of libbtrfsutil is to provide a library version of as many of
the operations of btrfs-progs as possible and to migrate btrfs-progs to
use it.
Rather than simply refactoring the existing btrfs-progs code, the code
has to be written from scratch for a couple of reasons:
* A lot of the btrfs-progs code was not designed with a nice library API
in mind in terms of reusability, naming, and error reporting.
* libbtrfsutil is licensed under the LGPL, whereas btrfs-progs is under
the GPL, which makes it dubious to directly copy or move the code.
Eventually, most of the low-level btrfs-progs code should either live in
libbtrfsutil or the shared kernel/userspace filesystem code, and
btrfs-progs will just be the CLI wrapper.
This first commit just includes the build system changes, license,
README, and error reporting helper.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>