Commit Graph

247 Commits

Author SHA1 Message Date
Wang Xiaoguang
ac80fca06c btrfs-progs: check: make low memory mode support partially dropped snapshots
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-21 11:49:57 +02:00
David Sterba
50882c447d Revert "btrfs-progs: check: supplement extent backref list with rbtree"
This reverts commit 31d8235410.

False report of backref mismatches, lots of messages similar to:

Incorrect local backref count on 12713984 root 5 owner 257 offset 12845056 found 1 wanted 0 back 0x7b3ed0
backpointer mismatch on [12713984 131072]

Repairing will make things worse. A fix has been proposed, but is not
finalized so we go with a revert.

Reported-by: Chris Murphy <bugzilla@colorremedies.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=155791
Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-05 12:20:24 +02:00
David Sterba
b1c0235e11 Revert "btrfs-progs: check: switch to iterating over the backref_tree"
This reverts commit bbebe814c0.
2016-09-05 12:20:24 +02:00
Qu Wenruo
9a0f2f1e53 btrfs-progs: fsck: Avoid abort and BUG_ON in add_tree_backref
Add_tree_backref() can cause BUG_ON() and abort() in quite a lot of
cases, from the ENOMEM to existing tree backref records.

Change all these BUG_ON() and abort() to return proper values.
And modify all callers to handle such problems.

Reported-by: Lukas Lueg <lukas.lueg@gmail.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-05 10:05:01 +02:00
Qu Wenruo
77deae9cb7 btrfs-progs: fsck: Check bytenr alignment for extent item
Check bytenr alignment for extent item to filter invalid items early.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-05 10:04:40 +02:00
Qu Wenruo
0d2c2d4809 btrfs-progs: fsck: Check drop level before walking through fs tree
Exposed by fuzzed image from Lukas, which contains invalid drop level
(16), causing segfault when accessing path->nodes[drop_level].

This patch will check drop level against fs tree level and
BTRFS_MAX_LEVEL to avoid such problem.

Reported-by: Lukas Lueg <lukas.lueg@gmail.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-05 10:04:32 +02:00
Qu Wenruo
2f242115d1 btrfs-progs: Do extra chunk check before processing chunk item
Current we only do chunk validation check at mount time.

It's good for most case, but for fuzzed or manually crafted images, we
can insert a CHUNK_ITEM key into root tree.

Since mount time check will only check chunk tree, it will not check
CHUNK_ITEM in root tree.

Even with previous key type check against leaf owner, it is still
possible to modify the leaf owner to by-pass it.

So we still need to check chunk validation before processing it.

Reported-by: Lukas Lueg <lukas.lueg@gmail.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-05 10:04:16 +02:00
Qu Wenruo
f1934f4c78 btrfs-progs: check: ignore invalid key in invalid root
Btrfs tree implies a lot of restriction on which key types are allowed
in specific roots.

Like CHUNK_ITEM keys are only valid in chunk root.

This patch will add such check at run_next_block() for original mode.

Reported-by: Lukas Lueg <lukas.lueg@gmail.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-05 10:04:13 +02:00
David Sterba
c11bd9cfd2 btrfs-progs: pass OPEN_CTREE flags as unsigned
As we're passing a set of flags, the enum type is not appropriate.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-08-24 14:36:54 +02:00
David Sterba
87c1bd13c1 btrfs-progs: check: adjust command line options for the low-memory mode
Change the single-purpose option --low-memory to a generic option that
takes the mode. Currently supported are the original mode and the
low-memory in the same way.

Signed-off-by: David Sterba <dsterba@suse.com>
2016-08-17 20:00:00 +02:00
Lu Fengqi
bfecd8bd6a btrfs-progs: check: introduce low memory mode
Introduce a new fsck mode: low memory mode.

Old btrfsck is working efficiently but uses some memory for each extent
item.  This method will ensure extents are only iterated once at
extent/chunk tree check process.

But since it uses some memory for each extent item, for a large fs with
several TB metadata, this can easily eat up memory and cause OOM.

To handle such limitation and improve scalability, the new low-memory
mode will not use any heap memory to record which extent is checked.
Instead it will use extent backref to avoid most of uneeded checks on
shared fs/subvolume tree blocks.
And with the use forward and backward reference cross check, we can also
ensure every tree block is at least checked once.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-08-17 19:04:51 +02:00
Lu Fengqi
dad817d3ba btrfs-progs: check: introduce traversal function for fsck
Introduce a new function traverse_tree_block() to do pre-order
traversal, to co-operate with new fs/subvolume tree skip function.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-08-17 18:51:21 +02:00
Lu Fengqi
3974c9117e btrfs-progs: check: introduce function to speed up fs tree check
Introduce function should_check() to reduced duplicated tree block check
for fs/subvolume tree.

The idea is, we only check the fs/subvolue tree block if we have the
lowest referencer rootid, according to extent tree.

In that case, we can skip a lot of fs/subvolume tree block checks if
there are a lot of snapshots.

Although we will do a lot of extent tree searches for it.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-08-17 18:51:21 +02:00
Lu Fengqi
fbf33f2ffd btrfs-progs: check: introduce main entry function for checking leaf items
Introduce an entry function, check_leaf_items() to check all
known/valuable items and update related accounting like total_bytes and
csum_bytes.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-08-17 18:51:21 +02:00
Lu Fengqi
f6647e2131 btrfs-progs: check: introduce function to check chunk item
Introduce function check_chunk_item() to check a chunk item.
It will check all chunk stripes with dev extents and the corresponding
block group item.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-08-17 18:51:20 +02:00
Lu Fengqi
406a4cdea4 btrfs-progs: check: introduce function to check block group item
Introduce function check_block_group_item() to check a block group item.
It will check the referencer chunk and the used space accounting with
extent tree.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-08-17 18:51:20 +02:00
Lu Fengqi
f9ed952b37 btrfs-progs: check: introduce function to check dev used space
Introduce function check_dev_item() to check used space with dev extent
items.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-08-17 18:51:20 +02:00
Lu Fengqi
d93fde8234 btrfs-progs: check: introduce function to check dev extent item
Introduce function check_dev_extent_item() to find its referencer chunk.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-08-17 18:51:20 +02:00
Lu Fengqi
ea5052e51f btrfs-progs: check: introduce function to check an extent
Introduce function check_extent_item() using previously introduced
functions.

With previous function to check referencer and backref, this function
can be quite easy.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-08-17 18:51:20 +02:00
Lu Fengqi
bf5e74204e btrfs-progs: check: introduce function to check shared data backref
Introduce the function check_shared_data_backref() to check the
referencer of a given shared data backref.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-08-17 18:51:20 +02:00
Lu Fengqi
4e6bbaebb8 btrfs-progs: check: introduce function to check referencer for data backref
Introduce new function check_extent_data_backref() to search referencer
for a given data backref.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-08-17 18:51:20 +02:00
Lu Fengqi
1035ca5118 btrfs-progs: check: introduce function to check shared block ref
Introduce function check_shared_block_backref() to check shared block
ref.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-08-17 18:51:20 +02:00
Lu Fengqi
d07873c007 btrfs-progs: check: introduce function to check referencer of a backref
Introduce a new function check_tree_block_backref() to check if a
backref points to correct referencer.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-08-17 18:51:20 +02:00
Lu Fengqi
ae5271dd52 btrfs-progs: check: introduce function to query tree block level
Introduce function query_tree_block_level() to resolve tree block level
by the following method:
1) tree block backref level
2) tree block header level

And only when header level == backref level, and transid matches, it will
return the tree block level.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-08-17 18:51:19 +02:00
Lu Fengqi
b0d360b541 btrfs-progs: check: introduce function to check data backref in extent tree
Introduce a new function check_data_extent_item() to check if the
corresponding data backref exists in extent tree.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-08-17 18:51:19 +02:00
Lu Fengqi
ae60006cee btrfs-progs: check: introduce function to check tree block backref in extent tree
Introduce function check_tree_block_ref() to check whether a tree block
has correct backref in extent tree.

Unlike old extent tree check method, we only use search_slot() to search
reference, no extra structure will be allocated in heap to record what we
have checked.

This method may cause a little more IO, but should work for super large
fs without triggering OOM.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-08-17 18:51:19 +02:00
Wang Xiaoguang
a13bfba3e2 btrfs-progs: eliminate some unnecessary btrfs_lookup_extent_info() calls in walk_down_tree()
In walk_down_tree(), we may call btrfs_lookup_extent_info() for same tree
block many times, obviously unnecessary. Here we define a simple struct to
record whether we already have gotten tree block's refs:
        struct node_refs {
                u64 bytenr[BTRFS_MAX_LEVEL];
                u64 refs[BTRFS_MAX_LEVEL];
        };

I fill a disk partition with linux kernel source codes and use below
test script to have performance test.
	#!/bin/bash

	echo 3 > /proc/sys/vm/drop_caches
	for ((i = 0; i < 20; i++)); do
		time ./btrfsck  /dev/sdc5
	done 2>&1 | grep real | awk -F "[ms]" '{run_time += $2} END{print run_time / 20}'

Before this patch, it averagely took 0.8447s for every btrfsck execution,
and with this patch, it averagely took 0.7807s.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-08-16 16:02:03 +02:00
Mark Fasheh
a804254361 btrfs-progs: check: write corrected qgroup info to disk
Now that we can verify all qgroups, we can write the corrected qgroups out
to disk when '--repair' is specified. The qgroup status item is also updated
to clear any out-of-date state. The repair_ functions were modeled after the
inode repair code in cmds-check.c.

I also renamed the 'scan' member of qgroup_status_item to 'rescan' in order
to keep consistency with the kernel.

Testing this was easy, I just reproduced qgroup inconsistencies via the
usual routes and had btrfsck fix them.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-13 18:44:26 +02:00
David Sterba
72e2a08a5c btrfs-progs: factor out repair mode
Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-04 15:22:53 +02:00
Jeff Mahoney
bbebe814c0 btrfs-progs: check: switch to iterating over the backref_tree
We now have two data structures that can be used to iterate the same data
set, and there may be quite a few of them in memory.  Eliminating the
list_head member will reduce memory consumption while iterating over
the extent backrefs.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-04 14:27:14 +02:00
Jeff Mahoney
31d8235410 btrfs-progs: check: supplement extent backref list with rbtree
For the pathlogical case, like xfstests generic/297 that creates a
large file consisting of one, repeating reflinked extent, fsck can
take hours.  The root cause is that calling find_data_backref while
iterating the extent records is an O(n^2) algorithm.  For my
example test run, n was 2*2^20 and fsck was at 8 hours and counting.

This patch supplements the list with an rbtree and drops the runtime
of that testcase to about 20 seconds.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-04 14:24:40 +02:00
Jeff Mahoney
93014d2353 btrfs-progs: check: add helpers for converting between structures
We either open code list_entry calls or outright cast between types.  The
compiler will do the right thing if we use static inlines to do
typesafe conversions.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-04 14:22:17 +02:00
Mark Fasheh
183995781f btrfs-progs: free qgroup counts in btrfsck
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-17 17:16:44 +02:00
Nicholas D Steeves
bd2cc320af btrfs-progs: typo review of strings and comments
Signed-off-by: Nicholas D Steeves <nsteeves@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-01 14:56:56 +02:00
Qu Wenruo
e6de81e959 btrfs-progs: check: fix found bytes accounting error
In the new add_extent_rec_nolookup() function, we add bytes_used to
update found bytes accounting.

However there is a typo that we used tmpl->nr, which should be rec->nr.
This will make us to add 1 for data backref, instead the correct size.

Reported-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-11 15:53:06 +02:00
David Sterba
25b93eefe2 btrfs-progs: check: check stripe crossing against nodesize
The extent record's max_size might be 0 and the stripe crossing check
will report a false positive, should use the filesyste nodesize. There's
a global fs_info available.

Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-11 15:50:31 +02:00
David Sterba
b241a46c7c btrfs-progs: check: refactor add_extent_rec, reduce argument count
Similar to add_extent_rec_nolookup, pass the arguments via a temporary
structure. In case the extent is found, some of the values are not
assigned directly so the semantics is preserved.

Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-11 15:50:29 +02:00
David Sterba
5b1bbc8924 btrfs-progs: check: reduce size of extent_record
There are just 3 values of flag_block_full_backref, we can utilize a
bitfield and save 8 bytes (192 now).

Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-11 15:50:26 +02:00
David Sterba
427643f069 btrfs-progs: check: simplify assignments in add_extent_rec_nolookup
Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-11 15:50:24 +02:00
David Sterba
a087884799 btrfs-progs: check: pass a template to add_extent_rec_nolookup
Reduce number of parameters that just fill the extent_record from a
temporary template that's supposed to be zeroed and filled by the
callers.

Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-11 15:50:21 +02:00
Qu Wenruo
f172bd2b8d btrfs-progs: Fix return value bug of qgroups check
Before this patch, although btrfsck will check qgroups if quota is
enabled, it always return 0 even qgroup numbers are corrupted.

Fix it by allowing return value from report_qgroups function (formally
defined as print_qgroup_difference).

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-02 14:42:28 +02:00
David Sterba
5f8cd780b9 btrfs-progs: check: use add_extent_rec_nolookup after lookups
The lookup was duplicated, use the helper that does not do it.

Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-02 14:41:35 +02:00
David Sterba
e1a5ecc206 btrfs-progs: check: refactor add_extent_rec, separate lookup
Separate the part of add_extent_rec that comes after the lookup does not
succeed, there are callers interested in just this.

Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-02 14:41:27 +02:00
David Sterba
e83012d57e btrfs-progs: check: cleanup, move structure definitions to the beginning
Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-02 14:41:07 +02:00
David Sterba
3be6e3e7c9 btrfs-progs: deprecate and stop using btrfs_level_size
Size of a b-tree node is always nodesize, regardless of the level.

Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-02 14:40:23 +02:00
David Sterba
2a796d84af btrfs-progs: replace leafsize with nodesize
Nodesize is used in kernel, the values are always equal. We have to keep
leafsize in headers, similarly the tree setting functions still take and
set leafsize, but it's effectively a no-op.

Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-02 14:40:18 +02:00
Qu Wenruo
12234d0202 btrfs-progs: fsck: Fix a false metadata extent warning
At least 2 user from mail list reported btrfsck reported false alert of
"bad metadata [XXXX,YYYY) crossing stripe boundary".

While the reported number are all inside the same 64K boundary.
After some check, all the false alert have the same bytenr feature,
which can be divided by stripe size (64K).

The result seems to be initial 'max_size' can be 0, causing 'start' +
'max_size' - 1, to cross the stripe boundary.

Fix it by always update extent_record->cross_stripe when the
extent_record is updated, to avoid temporary false alert to be reported.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-02 14:40:05 +02:00
David Sterba
378e12701a btrfs-progs: check: unify naming of long option values
We use GETOP_VAL_ .

Signed-off-by: David Sterba <dsterba@suse.com>
2016-03-14 13:42:47 +01:00
David Sterba
28d924290e btrfs-progs: docs: update check options
Signed-off-by: David Sterba <dsterba@suse.com>
2016-03-14 13:42:47 +01:00
David Sterba
d3257893d5 btrfs-progs: check: drop short option for --chunk-tree
The need to specify the chunk root is not that common, we will reserve
the short option -c for later use.

Signed-off-by: David Sterba <dsterba@suse.com>
2016-03-14 13:42:47 +01:00