Do not use fprintf; adjust the messages and print the verbose errno
string, or at least the error code if there's no clear mapping to a string.
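A minimal sketch of the formatting rule described above: a negative return value is treated as -errno and printed with strerror(), and anything else is printed as a raw code. The helper name format_error() is hypothetical and not the btrfs-progs message API. It only illustrates how the text is built.

```c
#include <stdio.h>
#include <string.h>
#include <errno.h>

/*
 * Hypothetical helper: build an error message for a failed operation.
 * Negative values are assumed to be -errno and get a readable string;
 * other values have no clear mapping, so the raw code is printed.
 */
static int format_error(char *buf, size_t size, const char *what, int ret)
{
	if (ret < 0)
		return snprintf(buf, size, "ERROR: %s failed: %s",
				what, strerror(-ret));
	return snprintf(buf, size, "ERROR: %s failed: error code %d", what, ret);
}
```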
Signed-off-by: David Sterba <dsterba@suse.com>
Change the single-purpose option --low-memory to a generic option that
takes the mode. Currently supported modes are the original mode and the
low-memory mode, which works the same way as before.
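A sketch of how a generic mode option can map its argument to an internal mode. The enum and function names, and the accepted strings "original" and "lowmem", are assumptions made for illustration here.

```c
#include <string.h>

/* Hypothetical internal representation of the check modes. */
enum check_mode {
	CHECK_MODE_ORIGINAL,
	CHECK_MODE_LOWMEM,
	CHECK_MODE_UNKNOWN,
};

/* Translate the value passed to the mode option; unknown strings are
 * reported back so the caller can print usage and exit. */
static enum check_mode parse_check_mode(const char *str)
{
	if (strcmp(str, "original") == 0)
		return CHECK_MODE_ORIGINAL;
	if (strcmp(str, "lowmem") == 0)
		return CHECK_MODE_LOWMEM;
	return CHECK_MODE_UNKNOWN;
}
```

With a single option, adding a further mode later only needs a new enum value and one more string comparison rather than another dedicated flag.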
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a new fsck mode: low memory mode.
The old btrfsck works efficiently but uses some memory for each extent
item. This method ensures that extents are iterated only once during the
extent/chunk tree check.
But since it uses some memory for each extent item, on a large fs with
several TB of metadata it can easily eat up memory and cause OOM.
To handle this limitation and improve scalability, the new low-memory
mode does not use any heap memory to record which extents have been
checked. Instead, it uses extent backrefs to avoid most of the unneeded
checks on shared fs/subvolume tree blocks.
And by cross-checking forward and backward references, we can also
ensure that every tree block is checked at least once.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a new function traverse_tree_block() to do pre-order
traversal, to cooperate with the new fs/subvolume tree skip function.
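Pre-order traversal visits a node before any of its children, which is what lets a skip function decide at a node whether its whole subtree needs to be walked at all. The toy sketch below uses in-memory nodes instead of on-disk tree blocks; struct toy_node and the descend_fn hook are hypothetical.

```c
#include <stddef.h>

/* Toy stand-in for a tree block; real btrfs nodes reference their
 * children by block number, here children are plain pointers. */
struct toy_node {
	int id;
	int nr_children;
	struct toy_node *children[4];
};

/* Hook deciding whether to descend into a node's children (for example a
 * "skip shared subtree" check); NULL means always descend. */
typedef int (*descend_fn)(const struct toy_node *node);

/* Visit the node first, then its children from left to right. Visited ids
 * are appended to out[]; returns the new count. */
static int traverse_pre_order(const struct toy_node *node, descend_fn descend,
			      int *out, int count)
{
	int i;

	if (!node)
		return count;
	out[count++] = node->id;
	if (descend && !descend(node))
		return count;	/* node visited, its subtree skipped */
	for (i = 0; i < node->nr_children; i++)
		count = traverse_pre_order(node->children[i], descend, out, count);
	return count;
}
```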
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce function should_check() to reduce duplicated tree block checks
for fs/subvolume trees.
The idea is that we only check an fs/subvolume tree block if the current
root has the lowest referencer root id, according to the extent tree.
That way, we can skip a lot of fs/subvolume tree block checks if there
are many snapshots, at the cost of a lot of extent tree searches.
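The decision rule can be sketched as a minimum test over the referencing root ids. In the real tool these ids come from extent tree backrefs. Here they are passed in as a plain array, and should_check_block() is a hypothetical name.

```c
#include <stdint.h>
#include <stddef.h>

typedef uint64_t u64;

/*
 * Only the root with the lowest id among all referencers checks a shared
 * tree block; every other snapshot sharing it skips the duplicate work.
 */
static int should_check_block(u64 current_root, const u64 *ref_roots,
			      size_t nr_refs)
{
	size_t i;

	for (i = 0; i < nr_refs; i++)
		if (ref_roots[i] < current_root)
			return 0;	/* a lower root checks this block */
	return 1;
}
```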
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce an entry function, check_leaf_items() to check all
known/valuable items and update related accounting like total_bytes and
csum_bytes.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce function check_chunk_item() to check a chunk item.
It will check all chunk stripes with dev extents and the corresponding
block group item.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce function check_block_group_item() to check a block group item.
It will check the referencer chunk and the used space accounting with
extent tree.
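The used space accounting part of this check can be pictured as summing the extents that start inside the block group and comparing the total with the recorded "used" value. This is a simplified sketch with hypothetical types: it takes an in-memory extent list rather than searching the extent tree, and it assumes extents never cross a block group boundary.

```c
#include <stdint.h>
#include <stddef.h>

typedef uint64_t u64;

struct toy_extent {
	u64 start;
	u64 len;
};

/* Returns 1 if the extents starting in [bg_start, bg_start + bg_len)
 * add up exactly to the block group's "used" value. */
static int block_group_used_matches(u64 bg_start, u64 bg_len, u64 used,
				    const struct toy_extent *extents, size_t nr)
{
	u64 total = 0;
	size_t i;

	for (i = 0; i < nr; i++)
		if (extents[i].start >= bg_start &&
		    extents[i].start < bg_start + bg_len)
			total += extents[i].len;
	return total == used;
}
```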
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce function check_dev_item() to check used space with dev extent
items.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce function check_dev_extent_item() to find its referencer chunk.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce function check_extent_item() using the previously introduced
functions.
With the previous functions to check referencers and backrefs, this
function is quite simple.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce the function check_shared_data_backref() to check the
referencer of a given shared data backref.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce new function check_extent_data_backref() to search for the
referencer of a given data backref.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce function check_shared_block_backref() to check shared block
ref.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a new function check_tree_block_backref() to check if a
backref points to the correct referencer.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce function query_tree_block_level() to resolve the tree block
level by the following method:
1) tree block backref level
2) tree block header level
Only when the header level equals the backref level and the transid
matches does it return the tree block level.
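The acceptance rule above reduces to two comparisons. In this sketch the inputs are passed in directly. The function name, the -1 error value and the expected_transid parameter (the generation the block is expected to carry) are assumptions, not the actual interface.

```c
#include <stdint.h>

typedef uint64_t u64;

/* Return the level only when the extent tree backref and the block header
 * agree and the header transid is the expected one; -1 otherwise. */
static int resolve_tree_block_level(int backref_level, int header_level,
				    u64 header_transid, u64 expected_transid)
{
	if (header_level != backref_level)
		return -1;
	if (header_transid != expected_transid)
		return -1;
	return header_level;
}
```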
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a new function check_data_extent_item() to check if the
corresponding data backref exists in extent tree.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce function check_tree_block_ref() to check whether a tree block
has the correct backref in the extent tree.
Unlike the old extent tree check method, we only use search_slot() to
search for references; no extra structure is allocated on the heap to
record what we have checked.
This method may cause a little more IO, but should work for very large
filesystems without triggering OOM.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In walk_down_tree(), we may call btrfs_lookup_extent_info() for the same
tree block many times, which is unnecessary. Here we define a simple
struct to record whether we have already got the tree block's refs:
struct node_refs {
	u64 bytenr[BTRFS_MAX_LEVEL];
	u64 refs[BTRFS_MAX_LEVEL];
};
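A sketch of how such a per-level cache can be consulted: the slow lookup runs only when the cached bytenr for that level differs from the requested one. slow_lookup_refs() is a hypothetical stand-in for btrfs_lookup_extent_info() that counts its calls. This toy also treats bytenr 0 as an empty slot.

```c
#include <stdint.h>

typedef uint64_t u64;

#define BTRFS_MAX_LEVEL 8

struct node_refs {
	u64 bytenr[BTRFS_MAX_LEVEL];
	u64 refs[BTRFS_MAX_LEVEL];
};

/* Hypothetical stand-in for the expensive extent tree lookup. */
static int lookup_calls;
static u64 slow_lookup_refs(u64 bytenr)
{
	lookup_calls++;
	return bytenr % 3 + 1;
}

/* Return the ref count of the block at @level, reusing the cached value
 * when the same block is asked for again at that level. */
static u64 get_refs_cached(struct node_refs *nrefs, int level, u64 bytenr)
{
	if (nrefs->bytenr[level] != bytenr) {
		nrefs->refs[level] = slow_lookup_refs(bytenr);
		nrefs->bytenr[level] = bytenr;
	}
	return nrefs->refs[level];
}
```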
I filled a disk partition with Linux kernel source code and used the
test script below to measure performance.
#!/bin/bash
echo 3 > /proc/sys/vm/drop_caches
for ((i = 0; i < 20; i++)); do
	time ./btrfsck /dev/sdc5
done 2>&1 | grep real | awk -F "[ms]" '{run_time += $2} END{print run_time / 20}'
Before this patch, each btrfsck execution took 0.8447s on average;
with this patch, it took 0.7807s on average.
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In read_one_chunk(), we may add an empty entry for a missing device.
However, this entry wasn't being added to the dev_list, and so it never
got freed.
Signed-off-by: Justin Maggard <jmaggard@netgear.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The raid6 code matches kernel implementation that also does the
unaligned access. So to keep the code close, add helpers for unaligned
native access and use them. The helpers are local as we don't plan to
use them elsewhere.
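One portable way to write native-endian unaligned access helpers is to go through memcpy(), which lets the compiler pick an access that is safe on strict-alignment CPUs. The helper names below are hypothetical, and btrfs-progs may implement its helpers differently (for example with packed structs).

```c
#include <stdint.h>
#include <string.h>

/* Read a native-endian 32-bit value from a possibly unaligned address. */
static inline uint32_t get_unaligned_native32(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Store a native-endian 32-bit value to a possibly unaligned address. */
static inline void put_unaligned_native32(uint32_t v, void *p)
{
	memcpy(p, &v, sizeof(v));
}
```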
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This will not cause unaligned access as the checksum is at the beginning
of btrfs_header and thus aligned to a page, but for clarity use the
helper.
Signed-off-by: David Sterba <dsterba@suse.com>
The message about discard is printed unconditionally and does not
respect the --quiet option, e.g. in mkfs. Consolidate the operation
flags into one argument and add support for verbosity.
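A sketch of consolidated operation flags where verbosity is just another bit, so a quiet caller simply leaves it unset. The flag names, the message text and prepare_device_sketch() are hypothetical. The function returns the steps it performed only so the behaviour is easy to observe.

```c
#include <stdio.h>

/* Hypothetical operation flags passed as a single argument. */
#define PREP_ZERO_END	(1U << 0)
#define PREP_DISCARD	(1U << 1)
#define PREP_VERBOSE	(1U << 2)

/* The discard message is printed only when PREP_VERBOSE is set. */
static unsigned prepare_device_sketch(const char *path, unsigned opflags)
{
	unsigned done = 0;

	if (opflags & PREP_DISCARD) {
		if (opflags & PREP_VERBOSE)
			printf("Discarding device %s\n", path);
		done |= PREP_DISCARD;
	}
	if (opflags & PREP_ZERO_END)
		done |= PREP_ZERO_END;
	return done;
}
```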
Signed-off-by: David Sterba <dsterba@suse.com>
The extent_buffer::data might be unaligned with respect to unsigned long,
depending on the actual layout of the structure and the width of the int
types. Use explicit unaligned access helpers.
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
cmds-balance.c: In function 'cmd_balance_start':
cmds-balance.c:654:6: warning: ignoring return value of 'chdir', declared with
attribute warn_unused_result [-Wunused-result]
chdir("/");
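The warning goes away once the result of chdir() is used. A minimal sketch that returns -errno so the caller can report the failure. The function name is hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* Change to the root directory and propagate a failure instead of
 * silently ignoring the return value of chdir(). */
static int chdir_root_checked(void)
{
	if (chdir("/") < 0)
		return -errno;	/* caller reports it, e.g. via strerror(-ret) */
	return 0;
}
```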
Reported-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently, balance operations are run synchronously in the foreground.
This is nice for interactive management, but is kind of crappy when you
start looking at automation and similar things.
This patch adds an option to `btrfs balance start` to tell it to
daemonize prior to running the balance operation, thus allowing us to
perform balances asynchronously. The two biggest use cases I have for
this are starting a balance on a remote server without establishing a
full shell session, and being able to background the balance in a
recovery shell (which usually has no job control) so I can still get
progress information.
Because it simply daemonizes prior to calling the balance ioctl, this
doesn't actually need any kernel support.
Signed-off-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Copy le_test_bit() from the kernel and use that for the free space tree
bitmaps.
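With little-endian bit numbering, bit nr lives in byte nr / 8 at position nr % 8 within that byte, regardless of host endianness. The sketch below follows that rule on a byte array. The function name is hypothetical and it is not a copy of the kernel helper.

```c
#include <stddef.h>

/* Test bit @nr of a bitmap stored with little-endian bit numbering. */
static inline int le_test_bit_sketch(size_t nr, const unsigned char *addr)
{
	return 1U & (addr[nr / 8] >> (nr % 8));
}
```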
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
I have a valid btrfs image which contains:
...
item 10 key (1103101952 BLOCK_GROUP_ITEM 1288372224) itemoff 15947 itemsize 24
block group used 655360 chunk_objectid 256 flags DATA|RAID5
item 11 key (1103364096 EXTENT_ITEM 131072) itemoff 15894 itemsize 53
extent refs 1 gen 11 flags DATA
extent data backref root 5 objectid 258 offset 0 count 1
item 12 key (1103888384 EXTENT_ITEM 262144) itemoff 15841 itemsize 53
extent refs 1 gen 15 flags DATA
extent data backref root 1 objectid 256 offset 0 count 1
item 13 key (1104281600 EXTENT_ITEM 262144) itemoff 15788 itemsize 53
extent refs 1 gen 15 flags DATA
extent data backref root 1 objectid 257 offset 0 count 1
...
The extent [1103364096, 131072) has length 131072, but if we run
"btrfs-map-logical -l 1103364096 -b $((65536 * 3)) /dev/sda"
it returns mapping info for non-existent extents.
This is because it assumes that extents are contiguous in the logical
address space. When they are not, after one loop iteration
(cur_logical += cur_len) and mapping the next extent, we can get an
extent that is outside our search range, end up with a negative
@real_len, and print all mapping info up to the end of the disk.
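A sketch of a range walk that does not assume contiguous extents. It takes extents sorted by logical start, skips those entirely before the range, stops at the first one starting at or past the end, and clamps the rest, so a computed length can never go negative. The types and map_range() are hypothetical simplifications of the tool's loop. out_len must have room for nr entries.

```c
#include <stdint.h>
#include <stddef.h>

typedef uint64_t u64;

struct toy_extent {
	u64 start;
	u64 len;
};

/* Store in out_len[] the number of bytes of each extent that overlaps
 * [logical, logical + bytes); returns how many extents overlapped. */
static size_t map_range(const struct toy_extent *ext, size_t nr,
			u64 logical, u64 bytes, u64 *out_len)
{
	u64 end = logical + bytes;
	size_t i, n = 0;

	for (i = 0; i < nr; i++) {
		u64 s = ext[i].start;
		u64 e = ext[i].start + ext[i].len;

		if (s >= end)
			break;		/* past the requested range */
		if (e <= logical)
			continue;	/* entirely before the range */
		if (s < logical)
			s = logical;
		if (e > end)
			e = end;
		out_len[n++] = e - s;
	}
	return n;
}
```

Fed the three extents from the dump above with -l 1103364096 and -b $((65536 * 3)), this sketch reports only the first 131072 bytes and then stops, because the next extent starts beyond the requested range.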
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>