The kernel code no longer has BTRFS_CRC32_SIZE and only uses
btrfs_csum_sizes[]. So, update the progs code as well.
Suggested-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently, the top-level subvolume lacks the UUID. As a result, both
non-snapshot subvolume and snapshot of top-level subvolume do not have
Parent UUID and cannot be distinguisued. Therefore "fi show" of
top-level lists all the subvolumes which lacks the UUID in
"Snapshot(s)" filed. Also, it lacks the otime information.
Fix this by adding the UUID and otime at the mkfs time. As a
consequence, snapshots of top-level subvolume now have a Parent UUID and
UUID tree will create an entry for top-level subvolume at mount time.
This should not cause the problem for current kernel, but user program
which relies on the empty Parent UUID may be affected by this change.
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently we print the raw values of the owner field of leaf/nodes.
This can result in output like the following:
leaf 30490624 items 2 free space 16061 generation 4 owner 18446744073709551607
With the patch applied the same leaf looks like:
leaf 30490624 items 2 free space 16061 generation 4 owner DATA_RELOC_TREE
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a test case for mkfs --rootdir, using files with different file
sizes to check if invalid large inline extent could exist.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For inline compressed file extent, kernel doesn't allow inline extent
ram size larger than sector size and on-disk inline extent size should
not exceed BTRFS_MAX_INLINE_DATA_SIZE().
For inline uncompressed file extent, kernel doesn't allow inline extent
ram and on-disk size larger than either BTRFS_MAX_INLINE_DATA_SIZE() or
sector size.
Check it in original mode.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Just like convert, we need extra check against sector size for creating
inline extent.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[Bug]
On btrfs converted from ext*, one user reported the following kernel
warning:
------------[ cut here ]------------
BTRFS: Transaction aborted (error -95)
WARNING: CPU: 0 PID: 324 at fs/btrfs/inode.c:3042 btrfs_finish_ordered_io+0x7ab/0x850 [btrfs]
Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
RIP: 0010:btrfs_finish_ordered_io+0x7ab/0x850 [btrfs]
...
Call Trace:
normal_work_helper+0x39/0x370 [btrfs]
process_one_work+0x1ce/0x410
worker_thread+0x2b/0x3d0
? process_one_work+0x410/0x410
kthread+0x113/0x130
? kthread_create_on_node+0x70/0x70
? do_syscall_64+0x74/0x190
? SyS_exit_group+0x10/0x10
ret_from_fork+0x35/0x40
---[ end trace c8ed62ff6a525901 ]---
BTRFS: error (device dm-2) in
btrfs_finish_ordered_io:3042: errno=-95 unknown
BTRFS info (device dm-2): forced readonly
BTRFS error (device dm-2): pending csums is 6447104
[Cause]
The call trace and the unique return value points to
__btrfs_drop_extents(), when we tries to drop pages of an inline extent,
we will trigger such -EOPNOTSUPP.
However kernel has limitation on the size of inline file extent
(sector size for ram size and sector size - 1 for on-disk size),
btrfs-convert doesn't have the same limitation, resulting much larger
file extent.
The lack of correct inline extent size check dates back to 2008 when
btrfs-convert is added into btrfs-progs.
[Fix]
Fix the inline extent creation condition, not only using
BTRFS_MAX_INLINE_DATA_SIZE(), which is only the maximum size of inline
data according to nodesize, but also limit it against sector size.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When debuging with "btrfs inspect dump-tree", it's not that handy if we
want to iterate all child tree blocks starting from a specified block.
-b can only print a single block, while without -b "btrfs inspect dump-tree"
will need extra tree roots fulfilled to continue, which is not possible
for a damaged filesystem.
Add a new option --follow to iterate a sub-tree starting from block
specified by --block.
Signed-off-by: Qu Wenruo <wqu@suse.com>
[ remove the short option for now ]
Signed-off-by: David Sterba <dsterba@suse.com>
This patch enhances the tree block level mismatch by the following
methods:
1) Merge same warning branches into one
We had two branches showing the same message, and their condition
is also the same. Merge them
2) Only skip bad slot
The old code skipped all the remaining slots, here we just skip one
slot to output as many correct tree blocks as possible.
3) Enhance warning message
Output the parent bytenr and expected and wrong level, so we don't
need to refer to stdout to get which tree block is corrupted.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Commit Fixes: 8c36786c81 ("btrfs-progs: print-tree: Print offset as
tree objectid for ROOT_ITEM") changes how we translate offset of
ROOT_ITEM.
However the fact is, even for ROOT_ITEM, we have different meaning of
offset.
For tree reloc tree, it's indeed subvolume id. But for other trees,
it's the transid of when it's created.
Reported-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In latest e2fsprogs (1.44.0) definition of ext2_ext_attr_entry has
removed member e_value_block, as currently ext* doesn't support it set
anyway.
So remove such check so that we can pass compile.
Issue: #110
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199071
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Verify that a filesystem check operation (fsck) does not report the
following scenario as an error:
An extent is shared between two inodes, as a result of clone/reflink
operation, and for one of the inodes, lets call it inode A, the extent is
referenced through a file extent item as a prealloc extent, while for the
other inode, call it inode B, the extent is referenced through a regular
file extent item, that is, it was written to. The goal of this test is to
make sure a filesystem check operation will not report "odd csum items"
errors for the prealloc extent at inode A, because this scenario is valid
since the extent was written through inode B and therefore it is expected
to have checksum items in the filesystem's checksum btree for that shared
extent.
Such scenario can be created with the following steps for example:
mkfs.btrfs -f /dev/sdb
mount /dev/sdb /mnt
touch /mnt/foo
xfs_io -c "falloc 0 256K" /mnt/foo
sync
xfs_io -c "pwrite -S 0xab 0 256K" /mnt/foo
touch /mnt/bar
xfs_io -c "reflink /mnt/foo 0 0 256K" /mnt/bar
xfs_io -c "fsync" /mnt/bar
<power fail>
mount /dev/sdb /mnt
umount /mnt
This scenario is fixed by the following patch for the filesystem checker:
"Btrfs-progs: check, fix false error reports for shared prealloc extents"
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Under some cases the filesystem checker reports an error when it finds
checksum items for an extent that is referenced by an inode as a prealloc
extent. Such cases are not an error when the extent is actually shared
(was cloned/reflinked) with other inodes and was written through one of
those other inodes.
Example:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ touch /mnt/foo
$ xfs_io -c "falloc 0 256K" /mnt/foo
$ sync
$ xfs_io -c "pwrite -S 0xab 0 256K" /mnt/foo
$ touch /mnt/bar
$ xfs_io -c "reflink /mnt/foo 0 0 256K" /mnt/bar
$ xfs_io -c "fsync" /mnt/bar
<power fail>
$ mount /dev/sdb /mnt
$ umount /mnt
$ btrfs check /dev/sdc
Checking filesystem on /dev/sdb
UUID: 52d3006e-ee3b-40eb-aa21-e56253a03d39
checking extents
checking free space cache
checking fs roots
root 5 inode 257 errors 800, odd csum item
ERROR: errors found in fs roots
found 688128 bytes used, error(s) found
total csum bytes: 256
total tree bytes: 163840
total fs tree bytes: 65536
total extent tree bytes: 16384
btree space waste bytes: 138819
file data blocks allocated: 10747904
referenced 10747904
$ echo $?
1
So teach check to not report such cases as errors by checking if the
extent is shared with other inodes and if so, consider it an error the
existence of checksum items only if all those other inodes are referencing
the extent as a prealloc extent.
This case can be hit often when running the generic/475 testcase from
fstests.
A test case will follow in a separate patch.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Strangely, we have level check in btrfs_print_tree() while we don't have
the same check in read_node_slot().
That's to say, for the following corruption, btrfs_search_slot() or
btrfs_next_leaf() can return invalid leaf:
Parent eb:
node XXXXXX level 1
^^^^^^^
Child should be leaf (level 0)
...
key (XXX XXX XXX) block YYYYYY
Child eb:
leaf YYYYYY level 1
^^^^^^^
Something went wrong now
And for the corrupted leaf returned, later caller can be screwed up
easily.
Reported-by: Ralph Gauges <ralphgauges@googlemail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The commit cebf3b3722 ("btrfs-progs: introduce TEST_TOP and
INTERNAL_BIN for tests") did not convert all test paths. This would
break the exported testsutie.
Signed-off-by: David Sterba <dsterba@suse.com>
Regression test for false alerts in lowmem mode.
Signed-off-by: Qu Wenruo <wqu@suse.com>
[ update test ]
Signed-off-by: David Sterba <dsterba@suse.com>
Btrfs can delay inode deletion and in that case btrfs will unlink the
victim inode from its parent dir, and insert a marker to info btrfs to
delete it later.
In that case, such victim inode will have nlinks == 0, but is still
completely valid.
Original mode won't report such problem, but lowmem mode doesn't check
the ORPHAN_ITEM key for such inode, and can report false alert like:
------
ERROR: root 257 INODE[28891726] is orphan item
------
Fix such false alert by checking orphan item for inode whose nlink is 0.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Although it's not a good practice to format all patches under project
directory, sometimes lazy bones like me just like to put patches under
project directory.
Just ignore such patches.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add the testcase for false alert of data extent backref lost with the
extent offset.
The image can be reproduced by the following commands:
------
dev=~/test.img
mnt=/mnt/btrfs
umount $mnt &> /dev/null
fallocate -l 128M $dev
mkfs.btrfs $dev
mount $dev $mnt
for i in `seq 1 10`; do
xfs_io -f -c "pwrite 0 2K" $mnt/file$i
done
xfs_io -f -c "falloc 0 64K" $mnt/file11
for i in `seq 1 32`; do
xfs_io -f -c "reflink $mnt/file11 0 $(($i * 64))K 64K" $mnt/file11
done
xfs_io -f -c "reflink $mnt/file11 32K $((33 * 64))K 32K" $mnt/file11
btrfs subvolume snapshot $mnt $mnt/snap1
umount $mnt
btrfs-image -c9 $dev extent_data_ref.img
------
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Btrfs lowmem check reports the following false alert:
------
ERROR: file extent[267 2162688] root 256 owner 5 backref lost
------
The file extent is in the leaf which is shared by file tree 256 and fs
tree.
------
leaf 30605312 items 46 free space 4353 generation 7 owner 5
......
item 45 key (267 EXTENT_DATA 2162688) itemoff 5503 itemsize 53
generation 7 type 2 (prealloc)
prealloc data disk byte 13631488 nr 65536
prealloc data offset 32768 nr 32768
------
And there is the corresponding extent_data_ref item in the extent tree.
------
item 1 key (13631488 EXTENT_DATA_REF 1007496934287921081) itemoff 15274 itemsize 28
extent data backref root 5 objectid 267 offset 2129920 count 1
------
The offset of EXTENT_DATA_REF which is the hash of the owner root objectid,
the inode number and the calculated offset (file offset - extent offset).
What caused the false alert is the code mix up the owner root objectid and
the file tree objectid.
Fixes: b0d360b541 ("btrfs-progs: check: introduce function to check data backref in extent tree")
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Instead of the disk_bytenr and disk_num_bytes of the extent_item which the
file extent references, we should output the objectid and offset of the
file extent. And the leaf may be shared by the file trees, we should print
the objectid of the root and the owner of the leaf.
Fixes: b0d360b541 ("btrfs-progs: check: introduce function to check data backref in extent tree")
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When we found free space difference between free space cache and block
group item, we just discard this free space cache.
Normally such difference is caused by btrfs_reserve_extent() called by
delalloc which is out of a transaction.
And since all btrfs_release_extent() is called with a transaction, under
heavy race free space cache can have less free space than block group
item.
Normally kernel will detect such difference and just discard that cache.
However we must be more careful if free space cache has more free space
cache, and if that happens, paried with above race one invalid free
space cache can be loaded into kernel.
So if we find any free space cache who has more free space then block
group item, we report it as an error other than ignoring it.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The user may have specified a different version of Python than the
python3 from their $PATH (e.g., with PYTHON=/usr/bin/python3.6).
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The updated image now provides clang, so the variable is exported from
the base environment to docker. And we have python too.
Signed-off-by: David Sterba <dsterba@suse.com>
For a slight speed up of the build, cache reiserfs and zstd. A quick
cache validity is done, or it can be cleared manually on travis web UI.
Signed-off-by: David Sterba <dsterba@suse.com>
This gets the remaining occurrences that weren't covered by previous
conversions.
Signed-off-by: Omar Sandoval <osandov@fb.com>
[ fixup test_issubvolume due to removed dependency patch ]
Signed-off-by: David Sterba <dsterba@suse.com>
Now implemented with btrfs_util_subvolume_path(),
btrfs_util_subvolume_info(), and subvolume iterators.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Most of the interesting part of this command is the commit mode, so this
only saves a little bit of code.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The only thing of note here is the "top level" column. This used to mean
something else, but for a long time it has been equal to the parent ID.
I preserved this for backwards compatability.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>