Before the patch, chunk will be considered bad if the corresponding
block group is missing, even the only uncertain data is the 'used'
member of the block group.
This patch will try to recalculate the 'used' value of the block group
and rebuild it.
So even only chunk item and dev extent item is found, the chunk can be
recovered.
Although if extent tree is damanged and needed extent item can't be
read, the block group's 'used' value will be the block group length, to
prevent any later write/block reserve damaging the block group.
In that case, we will prompt user and recommend them to use
'--init-extent-tree' to rebuild extent tree if possible.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
As of now commands mentioned below (with in [..]) are calling call register-device
ioctl BTRFS_IOC_SCAN_DEV for all the devices in the system.
Some issues with it:
BTRFS_IOC_SCAN_DEV: ioctl is a write operation, we don't want command like
btrfs-debug-tree threads to do that..
eg:
----
$ cat /proc/fs/btrfs/devlist | egrep fsid | wc -l
0
$ btrfs-debug-tree /dev/sde (num_device > 1)
$ cat /proc/fs/btrfs/devlist | egrep fsid | wc -l
5
----
btrfs_scan_fs_devices() ends up calling this ioctl only when num_device > 1.
That's inconsistency with in feature/bug.
We don't have to register _all_ the btrfs devices (again) in the system
without user consent.
Why its inconsistent:
function btrfs_scan_fs_devices() calls btrfs_scan_lblkid only when
num_devices is > 1, which in turn calls BTRFS_IOC_SCAN_DEV ioctl, if
conditions are met.
But main issue is we have too many consumers of btrfs_scan_fs_devices()
the names below with in [] is the cli leading to this function.
open_ctree_broken() [btrfs-find-root]
recover_prepare() [btrfs rescue super-recover]
__open_ctree_fd
(updates always except when flag OPEN_CTREE_RECOVER_SUPER is set and
flag OPEN_CTREE_RECOVER_SUPER is set only by 'btrfs rescue super-
recover' but still this thread sneaks through the open_ctree function
to call register-device-ioctl as show below).
open_ctree_fs_info
[btrfs-debug-tree]
[btrfs-image -r]
[btrfs check]
open_fs
[btrfs restore]
open_ctree
[calc-size]
[btrfs-corrupt-block]
[btrfs-image] (create)
[btrfs-map-logical]
[btrfs-select-super]
[btrfstune]
[btrfs-zero-log]
[tester]
[mkfs]
[quick-test.c]
[btrfs label set unmounted]
[btrfs get label unmounted]
[btrfs rescue super-recover]
open_ctree_fd
[btrfs-convert]
Fix:
In an effort to make register-device consistent, all calls to
btrfs_scan_fs_devices() will have 5th parameter set to 0. that means
we don't need 5th parameter at all. And with this function not calling
the register ioctl at all, finally we will have following two cli to call
the ioctl BTRFS_IOC_SCAN_DEV.
btrfs dev scan and
mkfs.btrfs
Threads needing to update kernel about a device would have to use
btrfs_register_one_device() separately.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
chunk-recover.c: In function btrfs_calc_stripe_index
chunk-recover.c:1481: warning: index may be used uninitialized in this function
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
recover_prepare() in chunk-recover.c alloc memory which only contains
sizeof(struct btrfs_super_block). This will cause glibc malloc error
after superblock csum is calculated.
Use BTRFS_SUPER_INFO_SIZE to fix the bug.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Btrfs-progs superblock checksum check is somewhat too restricted for
super-recover, since current btrfs-progs will only read the 1st
superblock and if you need super-recover the 1st superblock is
possibly already damaged.
The fix is introducing super_recover parameter for
btrfs_read_dev_super() and callers to allow scan backup superblocks if
needed.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
The chunk-recover.c/BTRFS_NUM_MIRRORS in the userspace means
the same thing as ctree.h/BTRFS_MAX_MIRRORS in the kernelspace,
so to stay consistent with the kernelspace, just make this movement
in the userspace:
chunk-recover.c/BTRFS_NUM_MIRRORS
===>
ctree.h/BTRFS_MAX_MIRRORS
This provides convenience for future use.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When run chunk-recover on a health btrfs(data profile raid0, with
plenty of data), the program has a chance to abort on the number
of mirrors of an extent.
According to the kernel code, the max mirror number of an extent
is 3 not 2:
ctree.h: BTRFS_MAX_MIRRORS 3
chunk-recover.c : BTRFS_NUM_MIRRORS 2
just change BTRFS_NUM_MIRRORS to 3, and everything goes well.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When deal with the p & q stripes for data profile raid6, chunk-recover
forgets to insert them into the chunk record. Just insert them back
freely.
Also wrap the insert procedure into a new function, fill_chunk_up.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
The 'num_unordered' will be recounted after 'goto out',
just remove it.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
If the csum of one stripe is not able to judge the order of two
device extents, the stripe may happen to belong to the device extent
that is already kicked out as ordered.
Take this condition into consideration, don't report failure and
give more tries with the stripes following.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When count the number of unordered device extents in chunk-recover,
the counter should be reinitialized to be used.
Also, introduce a new function for the counting job.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Free memory if open call fails. Prevent pthread_cancel on threads
which have already finished successfully. If all calls to
pthread_create and pthread_join are successful, we mistakenly call
pthread_cancel because cancel_from and cancel_to are both zero.
Make POSIX.1-2001 happy by supplying a non-NULL second argument to
pthread_setcanceltype.
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Fix corporate name for copyright.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
a clean up patch, the BTRFS_STRIPE_LEN is been duplicated across
btrfs-progs, the kernel defines it in volume.h so do the same
for progs.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Originally, multi devices are scanned one by one;
Now, one thread is used per device to scan.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Decide the raid0/5/6 data stripes' order using checksums.
For one chunk, fetch each 64k logical stripe
1. search its checksum in the csum tree
2. read the physical stripe data on each device
3. calc the data checksums
4. if one checksum matches the value from the csum tree,
then the logical stripe resides in that device,
the stripe order index can be calculated.
5. if more than one checksums match,
then use the successive csum in the tree to compare again.
6. if equal stripes are encountered, just fetch next stripe.
7. if some devices' order are still not decided, then they
can not be recovered.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
If no chunks need to be recovered, skip the recover works,
meanwhile the user won't be annoyed by the "ask_user".
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
When reading block groups we will searching it's corresponding chunk, however, at this
time, some chunks has not been built(data chunks raid0/raid10/raid56), don't bug_on here,
we will try to rebuild these chunks later.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
When allocating chunk root node, we should use nodesize rather than sectorsize,
this will casue regression when making other nodesize choice.(for example 16k size now)
Reported-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Internally, btrfs_header_chunk_tree_uuid() calculates an unsigned
long, but casts it to a pointer, while all callers cast it to unsigned
long again.
From btrfs commit b308bc2f05a86e728bd035e21a4974acd05f4d1e
Signed-off-by: Ross Kirk <ross.kirk@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Internally, btrfs_header_fsid() calculates an unsigned long, but casts
it to a pointer, while all callers cast it to unsigned long again.
Committed to btrfs as fba6aa75654394fccf2530041e9451414c28084f
Fix line length issues and match changes to kernelspace
Signed-off-by: Ross Kirk <ross.kirk@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Remove unused parameter, 'eb'. Unused since introduction in
7777e63b42
Signed-off-by: Ross Kirk <ross.kirk@gmail.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
If some fatal superblocks are damaged, running ioctl will return failure,
in this case, we should avoid run ioctl.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
The command has been moved and we should rename the files accordingly,
so the entry point is now in cmds-rescue.c and the core functionality
in it's own file.
Return codes of btrfs_recover_chunk_tree have been simplified not to
require a define and another file for defintion.
CC: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>