Introduce a new test image, which has an extent item with no inlined
extent data ref, only keyed extent data refs.
Only in this case can we trigger the false data extent backref lost bug
in lowmem mode.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For a keyed extent data ref, the offset is the calculated offset (file
offset - file extent offset), just like for an inlined extent data ref.
However, the code uses the file offset to hash the extent data ref
offset, causing a false backref lost warning like:
------
ERROR: data extent[16913485824 7577600] backref lost
------
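As a rough illustration of the calculated offset (a sketch only, not the
btrfs-progs code; the helper name is hypothetical):

```python
def extent_data_ref_offset(file_offset: int, file_extent_offset: int) -> int:
    """Offset stored in an extent data ref (hypothetical helper name)."""
    return file_offset - file_extent_offset


# Two file extents that point into the same data extent at different
# positions still share one calculated offset, hence one data ref key.
first = extent_data_ref_offset(file_offset=0, file_extent_offset=0)
second = extent_data_ref_offset(file_offset=4096, file_extent_offset=4096)
```

Looking up the second extent by its raw file offset (4096) instead of the
calculated offset (0) would miss the existing ref.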
Fixes: b0d360b541 ("btrfs-progs: check: introduce function to check data backref in extent tree")
Reported-by: Chris Murphy <chris@colorremedies.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When lowmem fsck tries to find the backref of a specified file extent,
it searches the inlined data refs first.
However, an extent data ref contains the owner root objectid, the inode
number and the calculated offset (file offset - extent offset).
The code only checks the owner root objectid, not the inode number or
the calculated offset.
This makes lowmem mode fail to detect any backref mismatch if there is
an inlined data ref with the same owner objectid.
Fix it by also checking the extent data ref's objectid and offset.
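A minimal sketch of the stricter comparison, assuming a simplified
in-memory form of the ref (the class and function names are
illustrative, not the ones used in btrfs-progs):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtentDataRef:
    root: int      # owner root objectid
    objectid: int  # inode number
    offset: int    # file offset - file extent offset


def data_ref_matches(ref: ExtentDataRef, root: int, ino: int,
                     file_offset: int, extent_offset: int) -> bool:
    """Match on all three fields, not just the owner root."""
    return (ref.root == root
            and ref.objectid == ino
            and ref.offset == file_offset - extent_offset)
```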
Fixes: b0d360b541 ("btrfs-progs: check: introduce function to check data backref in extent tree")
Reported-by: Chris Murphy <chris@colorremedies.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
v4.14 btrfs-progs can't pass the new self test image with large tree
reloc trees. It fails on the later "shared_block_ref_only.raw.xz" test
image with a NULL pointer access.
[CAUSE]
For an image with a high (level >= 2) tree reloc tree, the ulist passed
to need_check() will be empty, as the tree reloc tree isn't accounted in
btrfs_find_all_roots(). Accessing ulist->roots with rb_first() then
returns a NULL pointer.
[FIX]
In need_check(), if @roots is empty, meaning the block belongs to a tree
reloc tree, always check it. This can be slow, but at least it's safe:
we won't skip any possibly wrong tree block.
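The decision can be sketched as follows. The rule for non-empty @roots
(only the smallest root id checks a shared block) is an assumption made
for illustration and is not stated in this message:

```python
from typing import Sequence


def need_check(root_id: int, roots: Sequence[int]) -> bool:
    """Decide whether the current root should check a tree block."""
    if not roots:
        # Empty list: the block belongs to a tree reloc tree, so no
        # other root will check it. Always check.
        return True
    # Assumption: a shared block is checked once, by the smallest root id.
    return root_id == min(roots)
```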
Fixes: 5e2dc77047 ("btrfs-progs: check: skip shared node or leaf check for low_memory mode")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
Commit 723427d7e6 ("btrfs-progs: check: change the way lowmem mode
traverses metadata") introduced a regression which can make some fsck
self test cases fail.
For fsck test case 004-no-dir-item, btrfs check --mode=lowmem --repair
can cause BUG_ON() with ret = -17 (-EEXIST) when committing transaction.
The problem happens with the following backtrace:
./btrfs(+0x22045)[0x555d0dade045]
./btrfs(+0x2216f)[0x555d0dade16f]
./btrfs(+0x29df1)[0x555d0dae5df1]
./btrfs(+0x2a142)[0x555d0dae6142]
./btrfs(btrfs_alloc_free_block+0x78)[0x555d0dae6202]
./btrfs(__btrfs_cow_block+0x177)[0x555d0dad00a2]
./btrfs(btrfs_cow_block+0x116)[0x555d0dad05a8]
./btrfs(commit_tree_roots+0x91)[0x555d0db1fd4f]
./btrfs(btrfs_commit_transaction+0x18c)[0x555d0db20100]
./btrfs(btrfs_fix_super_size+0x190)[0x555d0db005a4]
./btrfs(btrfs_fix_device_and_super_size+0x177)[0x555d0db00771]
./btrfs(cmd_check+0x1757)[0x555d0db4f6ab]
./btrfs(main+0x138)[0x555d0dace5dd]
/usr/lib/libc.so.6(__libc_start_main+0xea)[0x7fa5e4613f6a]
./btrfs(_start+0x2a)[0x555d0dacddda]
The bug is triggered because the extent allocator considers the range
[29360128, 29376512) free and allocates it. However, when inserting the
EXTENT_ITEM, btrfs finds there is already a tree block there (the fs
tree root), returning -EEXIST and causing the later BUG_ON().
[CAUSE]
The cause is that in repair mode, lowmem check always pins all metadata
blocks. However, pinned metadata blocks are unpinned when the
transaction commits, and are then marked as *FREE* space.
So the extent allocator later considers such ranges free and allocates
them incorrectly.
[FIX]
Don't pin metadata blocks without a valid reason or preparation (like
discarding all free space cache to re-calculate free space on the next
write).
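The failure sequence can be modelled with a toy allocator. This is only
a sketch of the pin, unpin and allocate interaction; the class and its
methods are invented for illustration and do not mirror the real extent
allocator:

```python
import errno


class ToyAllocator:
    """Toy model: block start addresses only, no sizes."""

    def __init__(self, used):
        self.used = set(used)  # blocks that already have an EXTENT_ITEM
        self.free = set()      # blocks the allocator believes are free
        self.pinned = set()

    def pin(self, start):
        self.pinned.add(start)

    def commit(self):
        # Unpinning on commit marks every pinned block as free space.
        self.free |= self.pinned
        self.pinned.clear()

    def alloc(self):
        if not self.free:
            return -errno.ENOSPC
        start = min(self.free)
        self.free.remove(start)
        if start in self.used:
            # Inserting the EXTENT_ITEM finds an existing tree block.
            return -errno.EEXIST
        self.used.add(start)
        return start


fs = ToyAllocator(used={29360128})
fs.pin(29360128)   # lowmem repair pins a live metadata block
fs.commit()        # the block is now wrongly treated as free
result = fs.alloc()
```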
Fixes: 723427d7e6 ("btrfs-progs: check: change the way lowmem mode traverses metadata")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The standalone utility btrfs-show-super has been obsoleted by 'btrfs
inspect-internal dump-super' but it's still in the repository and should
build in case somebody still uses it.
Reported-by: "John L. Center" <jlcenter15@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently ctime/otime/stime/rtime of ROOT_ITEM are not printed in
print_root_item(). Fix this and print them if the values are not zero.
The function print_timespec() is moved earlier in the file so it can be
reused.
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The helper script ./travis-should-run-test has been moved to a directory
in 4.13.3 but the path in the config was not updated. This was not
caught in the CI environment and the tests did not report a failure.
Signed-off-by: David Sterba <dsterba@suse.com>
Kernel 4.14 supports zstd. For version parity, btrfs-progs now requires
libzstd by default. This can still be disabled by
./configure --disable-zstd.
Signed-off-by: David Sterba <dsterba@suse.com>
Build with musl libc needs the sys/types.h header for the dev_t type,
since this header is not included indirectly. This fixes the following
build failure:
In file included from convert/source-fs.c:23:0:
./convert/source-fs.h:112:1: error: unknown type name ‘dev_t’
dev_t decode_dev(u32 dev);
^~~~~
convert/source-fs.c:31:1: error: unknown type name ‘dev_t’
dev_t decode_dev(u32 dev)
^~~~~
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: David Sterba <dsterba@suse.com>
For cases like reloc trees and subvolume trees, their key offset is the
tree id. The key will be printed as:
(TREE_RELOC ROOT_ITEM 18446744073709551607)
The number is a negative id printed as u64. It is long, and even
experienced engineers can't easily tell what it means.
This patch will change the output format to:
(TREE_RELOC ROOT_ITEM DATA_RELOC_TREE)
Special offset values like 0 or (u64)-1 are still shown as is.
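A small sketch of the formatting rule. The name table is a deliberately
tiny subset and the function name is hypothetical:

```python
U64_MAX = (1 << 64) - 1

# Small subset of well-known tree ids, for illustration only.
TREE_NAMES = {
    5: "FS_TREE",
    U64_MAX - 7: "TREE_RELOC",       # (u64)-8
    U64_MAX - 8: "DATA_RELOC_TREE",  # (u64)-9
}


def format_key_offset(offset: int) -> str:
    """Print a tree id by name; keep 0, (u64)-1 and unknown ids numeric."""
    if offset in (0, U64_MAX):
        return str(offset)
    return TREE_NAMES.get(offset, str(offset))
```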
Signed-off-by: Qu Wenruo <wqu@suse.com>
[ reword comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
The function update_qgroup has too many arguments, which makes it
difficult to use. Therefore, split it into update_qgroup_info,
update_qgroup_limit and update_qgroup_relation.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are reusable parts between update_qgroup and add_qgroup. So
introduce the function get_or_add_qgroup and use update_qgroup instead
of add_qgroup.
No functional changes.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add an image which can reproduce the extent item referencer count
mismatch false alert for lowmem mode.
Reported-by: Marc MERLIN <marc@merlins.org>
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The normal back reference counting doesn't care about extents referred
to by extent data in a shared leaf. The check_extent_data_backref
function needs to skip leaves whose owner doesn't match the root_id.
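A simplified sketch of the skip, assuming each leaf is reduced to its
owner and the disk bytenrs its file extents point to (this data layout
and the function name are illustrative only):

```python
def count_extent_data_refs(leaves, root_id, bytenr):
    """leaves: iterable of (leaf_owner, [disk_bytenr, ...]) pairs."""
    found = 0
    for owner, disk_bytenrs in leaves:
        if owner != root_id:
            # Shared leaf owned by another root: its references are not
            # counted for this root.
            continue
        found += sum(1 for b in disk_bytenrs if b == bytenr)
    return found
```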
Reported-by: Marc MERLIN <marc@merlins.org>
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Make lowmem mode output more detailed information about file extent
interrupt.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add an image in which an inlined extent coexists with a regular extent.
Reported-by: Marc MERLIN <marc@merlins.org>
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The image has 2 problems mixed:
1) Too small super total_bytes
The super total_bytes was manually modified to create this problem.
2) Unaligned dev item total_bytes
This was created by a v4.12 kernel, by adding a 128M + 2K device and
removing the original device.
That leaves an image with unaligned dev item total_bytes.
Signed-off-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Along with the rescue subcommand, also introduce check and repair for
these problems.
Unlike normal check functions, some of the checks are optional: even if
the image fails an optional check, the kernel can still run fine
(but may emit noisy kernel warnings).
So some checks, mainly for alignment, will not cause btrfs check to
fail, but only output a warning with instructions on how to fix it.
For repair, it just calls the same repair function as rescue, and is
included in 'btrfs check --repair'.
But 'btrfs rescue' is still the preferred method, since it can be used
independently of all the 'check' passes if we know exactly what problem
needs fixing.
Signed-off-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce new subcommand 'fix-device-size' to the rescue group, to fix
device size alignment-related problems.
Especially for people unable to mount their fs with super::total_bytes
mismatch, this tool will fix the problems and let the mount continue.
Reported-by: Asif Youssuff <yoasif@gmail.com>
Reported-by: Rich Rauenzahn <rrauenza@gmail.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Recent kernels (starting from v4.6) refuse to mount if the super block
total bytes is smaller than the sum of all devices' sizes.
This leaves end users unable to do anything with their otherwise quite
healthy fs.
To fix this problem, introduce a repair function to fix it on an
unmounted filesystem.
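The condition can be sketched as below. Using the plain sum of the
devices' total_bytes as the repaired value is an assumption of this
sketch, and the function name is hypothetical:

```python
def check_super_size(super_total_bytes, device_total_bytes):
    """Return (ok, expected) for a super block total_bytes value."""
    # Assumption: the expected value is the sum of all device sizes.
    expected = sum(device_total_bytes)
    return super_total_bytes >= expected, expected
```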
Reported-by: Asif Youssuff <yoasif@gmail.com>
Reported-by: Rich Rauenzahn <rrauenza@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Recent kernels introduced an alignment check for dev items, but older
kernels don't align the device size when adding a new device or
shrinking an existing device.
This causes a noisy kernel warning every time any DEV_ITEM gets updated.
Introduce a function to fix the device size on an unmounted filesystem.
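As an illustration of the alignment itself, here is a sketch that rounds
a device size down to the sector size. The 4096-byte sector size is an
assumption, and the helper name is hypothetical:

```python
SECTORSIZE = 4096  # assumed sector size for this example


def round_down(value: int, align: int) -> int:
    """Round value down to a multiple of align."""
    return value - value % align


unaligned = 128 * 1024 * 1024 + 2 * 1024  # a 128M + 2K device
aligned = round_down(unaligned, SECTORSIZE)
```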
Reported-by: Asif Youssuff <yoasif@gmail.com>
Reported-by: Rich Rauenzahn <rrauenza@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch updates the help/documentation of "btrfs device remove" in
two points:
1. Add an explanation of 'missing' for 'device remove'. This is
currently only written in the wiki page.
(https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices)
2. Add an example of device removal to the man page. This is because the
explanation of "remove" says "See the example section below", but
there is currently no example of removal.
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Satoru Takeuchi <satoru.takeuchi@gmail.com>
[ move "" from the macro to help strings ]
Signed-off-by: David Sterba <dsterba@suse.com>
State that 'delete' is an alias of 'remove', as the man page says.
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Satoru Takeuchi <satoru.takeuchi@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The underscore types are for ioctl structures and should not be used for
regular code that does not need them.
Signed-off-by: David Sterba <dsterba@suse.com>
Currently "fi usage" (and "dev usage") cannot run on a filesystem that
uses a seed device.
This is because FS_INFO ioctl returns the number of devices excluding
seeds, but load_device_info() tries to access valid device from devid 0
to max_id, and results in accessing seeds too (thus causing mismatching
number of devices).
Since only the size of non-seed devices matters, fix this by just
skipping seed device by checking device's fsid and comparing it to the
fsid obtained by FS_INFO ioctl.
Anand Jain:
%fi_args.num_devices provides number of devices excluding the seed device.
So when looping through the device list for a given fsid, determine if the
given device is a seed device by reading its superblock and then skip it
if it's a seed device. Reading of the superblock is done by the function
dev_to_fsid(), which can fail if the user is not root OR if the device has
media errors. So skip the seed check altogether if we fail to read
the device superblock and thus the fsid.
With this, we are now able to view the btrfs fi usage when the device is
bad.
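A rough sketch of that loop. The function name and the read_fsid
callback are stand-ins for illustration, not the actual
load_device_info() or dev_to_fsid() interfaces:

```python
def non_seed_devices(devices, fs_fsid, read_fsid):
    """Keep devices whose fsid matches fs_fsid, or whose fsid is unknown."""
    kept = []
    for dev in devices:
        try:
            fsid = read_fsid(dev)
        except OSError:
            # Superblock unreadable (not root, media error): skip the
            # seed check and keep the device.
            fsid = None
        if fsid is not None and fsid != fs_fsid:
            continue  # different fsid: treat as a seed device
        kept.append(dev)
    return kept
```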
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Move dev_to_fsid() from cmds-filesystem.c to cmds-fi-usage.c in order to
call it from both "fi show" and "fi usage".
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It seems to be a typo: in the (ret > 0) branch of check_mounted(),
zero-log sets the return value but doesn't return.
Fix it by adding back the missing return.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We should not free the string until we are done calling strtok.
If the string is freed early, the second and subsequent sort items
will in fact be ignored.
Fixes: 9fcdf8f894 ("btrfs-progs: don't write to optarg in btrfs_qgroup_parse_sort_string")
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
test_minimum_size() function is only called to check if provided device
is large enough.
However the minimal device size only needs to be calculated once, and
can be reused everywhere.
Refactor that function to make later modification easier.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For rollback, we only need to open the fs to check whether it meets the
conditions for rollback. Opening it RW makes rollback fail on btrfs with
v2 space cache.
In fact, we don't even start a transaction during rollback.
So open the fs RO for rollback, to avoid the v2 space cache problem.
Reported-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Reviewed-by: Gu JinXiang <gujx@cn.fujitsu.com>
Tested-by: Gu JinXiang <gujx@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The --rootdir option starts a transaction to fill the fs. However, if
something goes wrong, from ENOSPC to lack of permission, we don't commit
the transaction, and the uncommitted transaction triggers a BUG_ON:
------
extent buffer leak: start 29392896 len 16384
extent_io.c:579: free_extent_buffer: BUG_ON `eb->flags & EXTENT_DIRTY` triggered, value 1
------
The root fix is to introduce btrfs_abort_transaction() in btrfs-progs.
However, in this particular case we can work around it by force
committing the transaction.
During mkfs, the magic of btrfs is set to an invalid one, so without
setting fs_info->finalize_on_close() the fs can never be mounted.
So even if we force the commit of a wrong transaction, we won't make
things worse.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
On mkfs failure, especially --rootdir errors like EPERM/ENOSPC, the out
branch overwrites the return value, causing a wrong status code.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since we're calling btrfs_search_slot(), the return value can be
positive. However, we just pass that return value out, causing an
undefined return value.
This can cause mkfs to return 1, which indicates something went wrong.
Fix it.
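One way to normalise such a return value, sketched with a toy stand-in
for the search function (both functions below are illustrative and not
the btrfs-progs code):

```python
def toy_search_slot(keys, key):
    """Toy stand-in: 0 on exact match, 1 if not found, <0 on error."""
    return 0 if key in keys else 1


def lookup(keys, key):
    ret = toy_search_slot(keys, key)
    if ret < 0:
        return ret  # real error: pass it on
    # ret > 0 only means "no exact match", which is not an error for
    # this caller, so don't leak it as a failure code.
    return 0
```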
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When passing a directory larger than the block device via the --rootdir
parameter, we get the following backtrace:
------
extent-tree.c:2693: btrfs_reserve_extent: BUG_ON `ret` triggered, value -28
./mkfs.btrfs(+0x1a05d)[0x557939e6b05d]
./mkfs.btrfs(btrfs_reserve_extent+0xb5a)[0x557939e710c8]
./mkfs.btrfs(+0xb0b6)[0x557939e5c0b6]
./mkfs.btrfs(main+0x15d5)[0x557939e5de04]
/usr/lib/libc.so.6(__libc_start_main+0xea)[0x7f83b101af6a]
./mkfs.btrfs(_start+0x2a)[0x557939e5af5a]
------
Nothing special, just BUG_ON() abuse in ancient code.
Fix them by returning the error instead.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is a warning in btrfs-filesystem(8) saying that running 'defrag'
in Linux will almost certainly break reflinks, with much data potentially
being physically duplicated.
However, many users tend to read man pages *after* trying to run things
at their own risk and may miss this important information. This commit
adds a brief copy of this warning to the command's built-in help message,
where it has a good chance of being spotted before the user is stuck with
a crowded filesystem.
Pull-request: #73
Signed-off-by: Pavel Kretov <firegurafiku@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>