The preferred order:
- system headers
- standard headers
- libraries
- kernel library
- kernel shared
- common headers
- other tools
- own headers
Signed-off-by: David Sterba <dsterba@suse.com>
Block group tree feature is completely a standalone feature, and it has
been over 5 years before the initial introduction to solve the long
mount time.
I don't really want to waste another 5 years waiting for a feature which
may or may not work, but definitely not properly reviewed for its
preparation patches.
So this patch will separate the block group tree feature into a
standalone compat RO feature.
There is a catch, in mkfs create_block_group_tree(), current
tree-checker only accepts block group item with valid chunk_objectid,
but the existing code from extent-tree-v2 didn't properly initialize it.
This patch will also fix above mentioned problem so kernel can mount it
correctly.
Now mkfs/fsck should be able to handle the fs with block group tree.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In read_tree_block, extent buffer EXTENT_BAD_TRANSID flagged will
be added into fs_info->recow_ebs with an increment of its refs.
The corresponding free_extent_buffer should be called after we
fix transid error by cowing extent buffer then remove them from
fs_info->recow_ebs.
Otherwise, extent buffers will be leaked as fsck-tests/002 reports:
===================================================================
====== RUN CHECK /root/btrfs-progs/btrfs check --repair --force ./default_case.img.restored
parent transid verify failed on 29360128 wanted 9 found 755944791
parent transid verify failed on 29360128 wanted 9 found 755944791
parent transid verify failed on 29360128 wanted 9 found 755944791
Ignoring transid failure
[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
extent buffer leak: start 29360128 len 4096
enabling repair mode
===================================================================
Fixes: c64485544b ("Btrfs-progs: keep track of transid failures and fix them if possible")
Signed-off-by: Su Yue <glass@fydeos.io>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a bug report that, one btrfs got its underlying device shrunk
accidentally.
Fortunately the user has no data at the truncated range. However kernel
will reject such filesystem, while btrfs-check reports nothing wrong
with it.
This can be easily reproduced by:
# truncate -s 1G test.img
# mkfs.btrfs test.img
# truncate -s 996M test.img
# btrfs check test.img
Opening filesystem to check...
Checking filesystem on test.img
UUID: dbf0a16d-f158-4383-9025-29d7f4c43f17
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 16527360 bytes used, no error found
^^^^^^^^^^^^^^
total csum bytes: 13836
total tree bytes: 2359296
total fs tree bytes: 2162688
total extent tree bytes: 65536
btree space waste bytes: 503569
file data blocks allocated: 14168064
referenced 14168064
[CAUSE]
Btrfs check really only checks the metadata cross references, not really
bothering if the underlying device has correct size. Thus we completely
ignored such size mismatch.
[FIX]
For both regular and lowmem mode, add extra check against the underlying
block device size.
If the block device size is smaller than its total_bytes, gives a error
message and error out.
Now the check looks like this for both modes:
...
[2/7] checking extents
ERROR: block device size is smaller than total_bytes in device item, has 1046478848 expect >= 1073741824
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space tree
...
found 16527360 bytes used, error(s) found
Issue: #504
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
Commit 06b6ad5e01 ("btrfs-progs: check: check for invalid free space
tree entries") makes btrfs check to report eb leakage even on newly
created btrfs:
# mkfs.btrfs -f test.img
# btrfs check test.img
Opening filesystem to check...
Checking filesystem on test.img
UUID: 13c26b6a-3b2c-49b3-94c7-80bcfa4e494b
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 147456 bytes used, no error found
total csum bytes: 0
total tree bytes: 147456
total fs tree bytes: 32768
total extent tree bytes: 16384
btree space waste bytes: 140595
file data blocks allocated: 0
referenced 0
extent buffer leak: start 30572544 len 16384 <<< Extent buffer leakage
[CAUSE]
The patch in mailinglist uses a dynamically allocated path while the
committed one has been converted to on-stack path, which is preferred.
However, the cleanup was not done properly. We only release the path
inside the while loop, no at out label. This means, if we hit error or
even just exhausted free space tree as expected, we will leak the path
to free space tree root.
Thus leading to the above leak report.
[FIX]
Fix the bug by calling btrfs_release_path() at out: label too.
This should make the code behave the same as the patch submitted to the
mailing list.
Fixes: 06b6ad5e01 ("btrfs-progs: check: check for invalid free space tree entries")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While testing some changes to how we reclaim block groups I started
hitting failures with my TEST_DEV. This occurred because I had a bug
and failed to properly remove a block groups free space tree entries.
However this wasn't caught in testing when it happened because
btrfs check only checks that the free space cache for the existing block
groups is valid, it doesn't check for free space entries that don't have
a corresponding block group.
Fix this by checking for free space entries that don't have a
corresponding block group. Additionally add a test image to validate
this fix.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add constant for initial value to avoid unexpected clashes with user
defined getopt values and shift the common size getopt values.
Signed-off-by: David Sterba <dsterba@suse.com>
There are several call sites utilizing btrfs_commit_transaction() just
to update members in super blocks, without any metadata update.
This can be problematic for some simple call sites, like zero_log_tree()
or check_and_repair_super_num_devs().
If we have big problems preventing the fs to be mounted in the first
place, and need to clear the log or super block size, but by some other
problems in extent tree, we're unable to allocate new blocks.
Then we fall into a deadlock that, we need to mount (even
ro,rescue=all) to collect extra info, but btrfs-progs can not do any
super block updates.
Fix the problem by allowing the following super blocks only operations
to be done without using btrfs_commit_transaction():
- btrfs_fix_super_size()
- check_and_repair_super_num_devs()
- zero_log_tree().
There are some exceptions in btrfstune.c, related to the csum type
conversion and seed flags.
In those btrfstune cases, we in fact wants to proper error report in
btrfs_commit_transaction(), as those operations are not mount critical,
and any early error can be helpful to expose any problems in the fs.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It should have been deleted, as CHANGES mentioned this in v5.14, but
obvious it's not.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When testing my new RAID56J code, there is a bug causing dev extents
overlapping.
Although both modes can detect the problem, lowmem has leaked some
extent buffers:
$ btrfs check --mode=lowmem /dev/test/scratch1
Opening filesystem to check...
Checking filesystem on /dev/test/scratch1
UUID: 65775ce9-bb9d-4f61-a210-beea52eef090
[1/7] checking root items
[2/7] checking extents
ERROR: dev extent devid 1 offset 1095761920 len 1073741824 overlap with previous dev extent end 1096810496
ERROR: dev extent devid 2 offset 1351614464 len 1073741824 overlap with previous dev extent end 1352663040
ERROR: dev extent devid 3 offset 1351614464 len 1073741824 overlap with previous dev extent end 1352663040
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs done with fs roots in lowmem mode, skipping
[7/7] checking quota groups skipped (not enabled on this FS)
found 3221372928 bytes used, error(s) found
total csum bytes: 0
total tree bytes: 147456
total fs tree bytes: 32768
total extent tree bytes: 16384
btree space waste bytes: 136231
file data blocks allocated: 3221225472
referenced 3221225472
extent buffer leak: start 30752768 len 16384
extent buffer leak: start 30752768 len 16384
extent buffer leak: start 30752768 len 16384
[CAUSE]
In the function check_dev_item(), we iterate through all the dev
extents, but when we found overlapping extents, we exit without
releasing the path, causing extent buffer leakage.
[FIX]
Just release the path before we exit the function.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The function read_extent_from_disk() is only a wrapper to read tree
block.
And read_extent_data() is just a while loop to eliminate short read
caused by stripe boundary.
In fact, a lot of call sites of read_extent_data() are either reading
metadata (thus no possible short read) or doing extra loop by
themselves.
This patch will replace those two functions with read_data_from_disk(),
making it the only entrance for data/metadata read.
And update read_data_from_disk() to return the read bytes, so caller can
do a simple while loop.
For the few callers of read_extent_data(), open-code a small while loop
for them.
This will allow later RAID56 read repair using P/Q much easier.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
The following script can lead to false positive from btrfs check:
mkfs.btrfs -f $dev1
mount $dev1 $mnt
btrfstune -S1 $dev1
mount $dev1 $mnt
btrfs dev add -f $dev2 $mnt
umount $mnt
# Now dev1 is seed, and dev2 is the rw fs.
btrfs check $dev2
...
[2/7] checking extents
WARNING: minor unaligned/mismatch device size detected
WARNING: recommended to use 'btrfs rescue fix-device-size' to fix it
...
This false positive only happens on $dev2, $dev1 is completely fine.
[CAUSE]
The warning is from is_super_size_valid(), in that function we verify
the super block total bytes (@super_bytes) is correct against the total
device bytes (@total_bytes).
However the when calculating @total_bytes, we only use devices in
current fs_devices, which only contains RW devices.
Thus all bytes from seed device are not taken into consideration, and
trigger the false positive.
[FIX]
Fix it by also iterating seed devices.
Since we're here, also output @total_bytes and @super_bytes when
outputting the warning message, to allow end users have a better idea on
what's going wrong.
Reviewed-by: Su Yue <l@damenly.su>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While working on my Windows driver, I found that it was inadvertently
allowing users to create xattrs with names longer than 255 bytes, which
wasn't being picked up by btrfs-check.
If the Linux driver encounters a file with an invalid xattr like this,
it makes the whole directory it's in inaccessible. If it's the root
directory, it'll refuse to mount the filesystem entirely.
Pull-request: #456
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a bug report of kernel rejecting fs which has a mismatch in
super num devices and num devices found in chunk tree.
But btrfs-check reports no problem about the fs.
[CAUSE]
We just didn't verify super num devices against the result found in
chunk tree.
[FIX]
Add such check and repair ability for btrfs-check.
The ability is mode independent.
Reported-by: Luca Béla Palkovics <luca.bela.palkovics@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CA+8xDSpvdm_U0QLBAnrH=zqDq_cWCOH5TiV46CKmp3igr44okQ@mail.gmail.com/
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a weird error message when running btrfs check on a specific
image:
[7/7] checking quota groups
ERROR: out of memory
ERROR: Loading qgroups from disk: -2
ERROR: failed to check quota groups
[CAUSE]
The "Out of memory" one is in load_quota_info(), which is output in two
cases:
- No memory can be allocated for btrfs_qgroup_list
AKA, real -ENOMEM.
- No qgroup can be found for either the child or the parent qgroup
This returnes -ENOENT.
Obvious the image has hit -ENOENT case, but the error message is fixed
to ENOMEM case.
[FIX]
Fix it by using %m to output the real reason of failure.
Reported-by: Andrei Borzenkov <arvidjaar@gmail.com>
Link: https://forums.opensuse.org/showthread.php/567851-btrfs-fails-to-load-qgroups-from-disk-with-error-2-(out-of-memory)
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
With the current set of changes we could probably do this check, but it
would involve changing the code quite a bit, and in the future we're not
going to track the metadata in the extent tree at all. Since this check
was for a very old kernel just skip it for extent tree v2.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We need to make sure we process the block group root, and mark its
blocks as used for the free space tree checking.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the case of per-bg roots we may be missing the root items. To
re-initialize them we want to add the root item as well as allocate the
empty block. To achieve this extract out the reinit root logic to a
helper that just takes the root key and then does the appropriate work
to allocate an empty root and update the root item. Fix the normal
reinit root helper to use this new helper.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The free space tree needs to be validated against all referenced blocks
in the file system, so use the btrfs_mark_used_blocks() helper to check
the free space tree and free space cache against. This will do the
right thing for both extent tree v1 and extent tree v2.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When we switch to per-block group extent roots we'll need to scan each
individual extent root. To make this easier in the future go ahead and
use the range of the block groups to scan the extents.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This makes the appropriate changes to enable the block group tree
checking for both lowmem and normal check modes. This is relatively
straightforward, simply need to use the helper to get the right root for
dealing with block groups.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When we're messing with block group items use the
btrfs_block_group_root() helper to get the correct root to search, and
this will do the right thing based on the file system flags.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Instead of accessing the extent root directory for modifying block
groups, use the helper which will do the correct thing based on the
flags of the file system.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When we change the size of the btrfs_header we're going to need to
change how these helpers calculate where to find the start of items or
block ptrs. To prepare for that make these helpers take the
extent_buffer as an argument so we can do the appropriate math based on
the version type.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that all callers are using the _nr variations we can simply rename
these helpers to btrfs_item_##member/btrfs_set_item_##member and change
the actual item SETGET funcs to raw_item_##member/set_raw_item_##member
and then change all callers to drop the _nr part.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
All callers use the btrfs_item_end_nr() variation, simply drop
btrfs_item_end() and make btrfs_item_end_nr() use the _nr() variations
of the item get helpers.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This matches how the kernel does it, simply pass in the slot and fix up
btrfs_file_extent_inline_item_len to use the btrfs_item_nr() helper and
the correct define. Fixup all the callers to use the slot now instead
of passing in the btrfs_item.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have a lot of the following patterns
item = btrfs_item_nr(nr);
btrfs_set_item_*(eb, item, val);
btrfs_set_item_*(eb, btrfs_item_nr(nr), val);
in a lot of places in our code. Instead add _nr variations of these
helpers and convert all of the users to this new helper.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have this pattern in a lot of places
item = btrfs_item_nr(slot);
btrfs_item_size(leaf, item);
btrfs_item_offset(leaf, item);
when we could simply use
btrfs_item_size_nr(leaf, slot);
btrfs_item_offset_nr(leaf, slot);
Fix all callers of btrfs_item_size() and btrfs_item_offset() to use the
_nr variation of the helpers.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When implementing the GC tree I started getting btrfsck errors when a
test rm -rf <directory> with files inside of it and immediately unmount,
leaving behind orphaned directory items that have GC items for them.
This made me realize that we don't actually handle this case currently
for our normal orphan path. If we fail to clean everything up and leave
behind the orphan items we'll fail fsck.
Fix this by not processing any backrefs we find if we found an inode
item and its nlink is 0. This allows us to pass the test case I've
provided to validate this patch.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While running make test on other patches I noticed we are now
segfaulting on the fuzz tests. This is because when I converted us to a
rb tree for the global roots we lost the ability to catch that there's
no extent root at all. Before we'd populate a dummy
fs_info->extent_root with a not uptodate node, but now you simply don't
get an extent root in the rb_tree. Fix the check_global_roots_uptodate
helper to count how many roots we find and make sure it matches the
number we expect. If it doesn't then we can return -EIO.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a bug report that kernel tree-checker rejected an invalid
metadata item:
corrupt leaf: block=934474399744 slot=68 extent bytenr=425173254144 len=16384 invalid tree level, have 33554432 expect [0, 7]
But original mode btrfs-check reports nothing wrong.
(lowmem mode will just crash, and fixed in previous patch).
[CAUSE]
For original mode it doesn't really check tree level, thus didn't find
the problem.
[FIX]
I don't have a good idea to completely make original mode to verify the
level in backref and in the tree block (while lowmem does that).
But at least we can detect obviously corrupted level just like kernel.
Now original mode will detect such problem:
...
[2/7] checking extents
ERROR: tree block 30457856 has bad backref level, has 256 expect [0, 7]
ref mismatch on [30457856 16384] extent item 0, found 1
tree backref 30457856 root 5 not found in extent tree
backpointer mismatch on [30457856 16384]
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space tree
...
Reported-by: Stickstoff <stickstoff@posteo.de>
Link: https://lore.kernel.org/linux-btrfs/6ed4cd5a-7430-f326-4056-25ae7eb44416@posteo.de/
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When running lowmem mode with METADATA_ITEM which has invalid level, it
will crash with the following backtrace:
(gdb) bt
#0 0x0000555555616b0b in btrfs_header_bytenr (eb=0x4)
at ./kernel-shared/ctree.h:2134
#1 0x0000555555620c78 in check_tree_block_backref (root_id=5,
bytenr=30457856, level=256) at check/mode-lowmem.c:3818
#2 0x0000555555621f6c in check_extent_item (path=0x7fffffffd9c0)
at check/mode-lowmem.c:4334
#3 0x00005555556235a5 in check_leaf_items (root=0x555555688e10,
path=0x7fffffffd9c0, nrefs=0x7fffffffda30, account_bytes=1)
at check/mode-lowmem.c:4835
#4 0x0000555555623c6d in walk_down_tree (root=0x555555688e10,
path=0x7fffffffd9c0, level=0x7fffffffd984, nrefs=0x7fffffffda30,
check_all=1) at check/mode-lowmem.c:4967
#5 0x000055555562494f in check_btrfs_root (root=0x555555688e10, check_all=1)
at check/mode-lowmem.c:5266
#6 0x00005555556254ee in check_chunks_and_extents_lowmem ()
at check/mode-lowmem.c:5556
#7 0x00005555555f0b82 in do_check_chunks_and_extents () at check/main.c:9114
#8 0x00005555555f50ea in cmd_check (cmd=0x55555567c640 <cmd_struct_check>,
argc=3, argv=0x7fffffffdec0) at check/main.c:10892
#9 0x000055555556b2b1 in cmd_execute (argv=0x7fffffffdec0, argc=3,
cmd=0x55555567c640 <cmd_struct_check>) at cmds/commands.h:125
[CAUSE]
For function check_extent_item() it will go through inline extent items
and then check their backrefs.
But for METADATA_ITEM, it doesn't really validate key.offset, which is
u64 and can contain value way larger than BTRFS_MAX_LEVEL (mostly caused
by bit flip).
In that case, if we have a larger value like 256 in key.offset, then
later check_tree_block_backref() will use 256 as level, and overflow
path->nodes[level] and crash.
[FIX]
Just verify the level, no matter if it's from btrfs_tree_block_level()
(which is just u8), or it's from key.offset (which is u64).
To do the check properly and detect higher bits corruption, also change
the type of @level from u8 to u64.
Now lowmem mode can detect the problem properly:
...
[2/7] checking extents
ERROR: tree block 30457856 has bad backref level, has 256 expect [0, 7]
ERROR: extent[30457856 16384] level mismatch, wanted: 0, have: 256
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space tree
...
Reviewed-by: Su Yue <l@damenly.su>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When using `--init-csum-tree` with `--init-extent-tree`, csum tree
population will be done by iterating all file extent items.
This allow us to skip preallocated extents, but it still has the
following problems:
- Inodes with NODATASUM
- Hole file extents
- Written preallocated extents
We will generate csum for the whole extent, while other part may still
be unwritten.
Make it to follow the same behavior of recently introduced
fill_csum_for_file_extent(), so we can generate correct csum.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
If a btrfs filesystem has preallocated file extents, 'btrfs check
--init-csum-tree' will create csum item for preallocated extents, and
cause error:
# mkfs.btrfs -f test.img
# mount test.img /mnt/btrfs
# fallocate -l 32K /mnt/btrfs/file
# umount /mnt/btrfs
# btrfs check --init-csum-tree --force test.img
...
[4/7] checking fs roots
root 5 inode 257 errors 800, odd csum item
ERROR: errors found in fs roots
found 376832 bytes used, error(s) found
And the csum tree is not empty, containing csum for the preallocated
extent:
$ btrfs ins dump-tree -t csum test.img
btrfs-progs v5.15.1
checksum tree key (CSUM_TREE ROOT_ITEM 0)
leaf 30408704 items 1 free space 16226 generation 9 owner CSUM_TREE
leaf 30408704 flags 0x1(WRITTEN) backref revision 1
fs uuid ecc79835-5611-4609-b985-e4ccd6f15b54
chunk uuid b1c75553-5b82-4aa6-bbbe-e7f50643b1a8
item 0 key (EXTENT_CSUM EXTENT_CSUM 13631488) itemoff 16251 itemsize 32
range start 13631488 end 13664256 length 32768
[CAUSE]
For `--init-csum-tree` alone, we will use extent tree to iterate each
data extent, and calculate csum for them.
But extent items alone can not tell us if the file extent belongs to a
NODATASUM inode, nor if it's preallocated.
Thus we create csums for those data extents, and cause the problem.
[FIX]
However the fix is not that simple, we can not just generate csum for
non-preallocated range.
As the following case we still need csum for the un-referred part:
xfs_io -f -c "pwrite 0 8K" -c "sync" -c "punch 0 4K"
So here we have to go another direction by:
- Always generate csum for the whole data extent
This is the same as the old code
- Iterate the file extents, and delete csum for preallocated range
or NODATASUM inodes
Issue: #430
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This part has no mode specific operations, just move them into
mode-common.[ch].
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch fixes potential bugs in fixup_extent_refs(). If
btrfs_start_transaction() fails in some way and returns error ptr, It
goes to out logic. But old code checkes whether it is null and it calls
commit. This patch solves the problem with make that it calls only if
ret is no error.
Issue: #409
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We may have multiple extent roots, so cycle through all of the extent
roots and populate the csum tree based on the content of every extent
root we have in the file system.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Use the global roots tree to find all of the csum roots in the system
and check all of them as appropriate.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Instead of checking the csum and extent tree individually, walk through
the global roots and validate them all. This will work properly if we
have extent tree v1 or extent tree v2.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Instead of looking for the first extent root or csum root in memory,
scan through the tree root and re-init any root items that match the
given objectid. This will allow reinit to work with both extent tree v1
and extent tree v2 global roots when using the --reinit option.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When we switch to multiple global trees we'll need to access the
appropriate extent root depending on the block group or possibly root.
To handle this, use a helper in most places and then the actual root in
places where it is required. We will whittle down the direct accessors
with future patches, but this does the bulk of the preparatory work.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is going to be used for the extent tree v2 stuff more commonly, so
move it out so that it is accessible from everywhere that we need it.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We will walk all referenced tree blocks during check in order to avoid
writing over any referenced blocks during fsck. However in the future
we're going to need to do this for things like fixing block group
accounting with extent tree v2. This is because extent tree v2 will not
refer to all of the allocated blocks in the extent tree. Refactor the
code some to allow us to send down an arbitrary extent_io_tree so we can
use this helper for any case where we need to figure out where all the
used space is on an extent tree v2 file system.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
With extent tree v2 we will have per-block group checksums, so add a
helper to access the csum root and rename the fs_info csum_root to
_csum_root to catch all the places that are accessing it directly.
Convert everybody to use the helper except for internal things.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We pass the csum root from way high in the call chain in check down to
where we actually need it. However we can just get it from the fs_info
in these places, so clean up the functions to skip passing around the
csum root needlessly.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We are going to need to start looking up the csum root based on the
bytenr with extent tree v2. To that end stop passing the root to the
csum related functions so that can be done in the helper functions
themselves.
There's an unrelated deletion of a function prototype that no longer
exists.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We will lookup implied refs for every root id for every block that we
find when verifying qgroups, but we don't need to worry about non
fstrees, so skip them here.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
I screwed up a fix where we're setting the bytenr range as dirty when
marking all tree blocks used, I was looking at btrfs_pin_extent and put
->nodesize for end instead of the actual end, which is bytenr +
->nodesize - 1. Fix this up so it's correct.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
My tool ntfs2btrfs has been creating btrfs volumes in a way that the
kernel doesn't like, but which isn't picked up by btrfs check - see
maharmstone/ntfs2btrfs#23 for the details, including a backtrace. This
patch adds a check for when a csum item contains too many entries -
effectively it ensures that there's always at least sizeof(struct
btrfs_item) bytes free in the tree, otherwise btrfs_del_csums can throw
an error.
max_entries is the value of the __MAX_CSUM_ITEMS macro in
fs/btrfs/file-item.c.
Pull-request: #401
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Signed-off-by: David Sterba <dsterba@suse.com>