[BUG]
For a degraded RAID5, btrfs check will fail to even read the chunk root:
# mkfs.btrfs -f -m raid5 -d raid5 $dev1 $dev2 $dev3
# wipefs -fa $dev1
# btrfs check $dev2
Opening filesystem to check...
warning, device 1 is missing
bad tree block 22036480, bytenr mismatch, want=22036480, have=0
ERROR: cannot read chunk root
ERROR: cannot open file system
[CAUSE]
Although the read_tree_block() function in btrfs-progs properly
iterates the mirrors (mirror 1 reads from the disk directly,
mirror 2 is rebuilt from parity), the raid56 recovery path does not
handle the read error correctly.
The existing code tries to read the full stripe, but any read failure
(including a missing device) immediately causes an error:
for (i = 0; i < num_stripes; i++) {
	ret = btrfs_pread(multi->stripes[i].dev->fd, pointers[i],
			  BTRFS_STRIPE_LEN, multi->stripes[i].physical,
			  fs_info->zoned);
	if (ret < BTRFS_STRIPE_LEN) {
		ret = -EIO;
		goto out;
	}
}
[FIX]
To make the failed_a/failed_b calculation much easier, and to properly
handle the case of too many missing devices, this patch introduces a new
bitmap-based solution.
The new @failed_stripe_bitmap represents all the failed stripes.
The initial read marks all the missing devices in
@failed_stripe_bitmap, and all later operations work on that bitmap.
Only right before calling raid56_recov() is the bitmap converted to the
old failed_a/failed_b interface.
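A minimal sketch of the bitmap approach, assuming a 32-bit bitmap (so at
most 32 stripes). collect_failed_stripes() and bitmap_to_failed_ab() are
hypothetical helpers for illustration, not the btrfs-progs functions:

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: build the failed-stripe bitmap from per-stripe read
 * results instead of bailing out on the first short read.
 * read_ret[i] is what a pread()-like call returned for stripe i.
 */
static uint32_t collect_failed_stripes(const long *read_ret, int num_stripes,
				       long stripe_len)
{
	uint32_t failed_stripe_bitmap = 0;
	int i;

	for (i = 0; i < num_stripes; i++) {
		/* A short read or an error (negative value) marks the stripe failed */
		if (read_ret[i] < stripe_len)
			failed_stripe_bitmap |= 1U << i;
	}
	return failed_stripe_bitmap;
}

/*
 * Hypothetical helper: convert the bitmap to the legacy failed_a/failed_b
 * pair right before recovery. max_failures is 1 for RAID5 and 2 for RAID6.
 * Returns -EIO when more stripes failed than the profile can rebuild.
 */
static int bitmap_to_failed_ab(uint32_t failed_stripe_bitmap, int num_stripes,
			       int max_failures, int *failed_a, int *failed_b)
{
	int nr_failed = 0;
	int i;

	*failed_a = -1;
	*failed_b = -1;
	for (i = 0; i < num_stripes; i++) {
		if (!(failed_stripe_bitmap & (1U << i)))
			continue;
		if (++nr_failed > max_failures)
			return -EIO;
		if (*failed_a < 0)
			*failed_a = i;
		else
			*failed_b = i;
	}
	return 0;
}
```

Keeping the bitmap as the single record of failures turns the "too many
missing devices" check into one counter comparison, and only the final
call into the recovery code needs the two-index form.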
Now btrfs check can handle the above case properly:
# btrfs check $dev2
Opening filesystem to check...
warning, device 1 is missing
Checking filesystem on /dev/test/scratch2
UUID: 8b2e1cb4-f35b-4856-9b11-262d39d8458b
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 147456 bytes used, no error found
total csum bytes: 0
total tree bytes: 147456
total fs tree bytes: 32768
total extent tree bytes: 16384
btree space waste bytes: 139871
file data blocks allocated: 0
referenced 0
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The radix-tree is not used in userspace code. In the kernel it is used
for tracking unpersisted and in-memory structures, and it has been
replaced by the xarray.
Signed-off-by: David Sterba <dsterba@suse.com>
Some older compilers do not support the overflow builtins introduced in
5ad2aacd24 ("btrfs-progs: kernel-lib: sync include/overflow.h"). Add
stubs to make it compile. This fixes the CI build on CentOS 7.
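The actual stubs are not shown here. As a sketch only, a fallback for
compilers lacking __builtin_add_overflow()/__builtin_mul_overflow() can
check unsigned 64-bit arithmetic by hand. Unlike the kernel's type-generic
check_*_overflow() macros, these hypothetical helpers handle one fixed type:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fallback: returns true if a + b overflows; stores the wrapped sum */
static inline bool check_add_overflow_u64(uint64_t a, uint64_t b, uint64_t *d)
{
	*d = a + b;		/* unsigned addition wraps, no undefined behavior */
	return *d < a;		/* wrapped iff the result is smaller than an operand */
}

/* Hypothetical fallback: returns true if a * b overflows; stores the wrapped product */
static inline bool check_mul_overflow_u64(uint64_t a, uint64_t b, uint64_t *d)
{
	*d = a * b;
	return a != 0 && b > UINT64_MAX / a;
}
```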
Signed-off-by: David Sterba <dsterba@suse.com>
The snapper build fails due to updates to kernel-lib files: the type
casts do not work the same way in C++. Simplify READ_ONCE/WRITE_ONCE
even more and drop the use of 'new' as an identifier.
Issue: https://github.com/openSUSE/snapper/issues/725
Signed-off-by: David Sterba <dsterba@suse.com>
Copy inline helpers for the cached variant of the rbtree, not used yet.
Rename 'new' for C++ compatibility.
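The point of the cached variant is that rb_root_cached also stores a
pointer to the leftmost node, so rb_first_cached() is O(1). A simplified
sketch, assuming a plain unbalanced binary tree: the real rbtree also keeps
parent/color and rebalances through rb_insert_color_cached(). struct item
and insert_cached() are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* Stripped-down node: no parent/color, no rebalancing */
struct rb_node { struct rb_node *rb_left, *rb_right; };
struct rb_root { struct rb_node *rb_node; };
struct rb_root_cached {
	struct rb_root rb_root;
	struct rb_node *rb_leftmost;	/* cached smallest node */
};

#define rb_first_cached(root) ((root)->rb_leftmost)

/* node is the first member, so casting rb_node * to item * is valid */
struct item { struct rb_node node; int key; };

static void insert_cached(struct rb_root_cached *root, struct item *it)
{
	struct rb_node **link = &root->rb_root.rb_node;
	bool leftmost = true;

	it->node.rb_left = it->node.rb_right = NULL;
	while (*link) {
		struct item *cur = (struct item *)*link;

		if (it->key < cur->key) {
			link = &(*link)->rb_left;
		} else {
			link = &(*link)->rb_right;
			leftmost = false;	/* went right once: cannot be the minimum */
		}
	}
	*link = &it->node;
	if (leftmost)
		root->rb_leftmost = &it->node;
}
```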
Signed-off-by: David Sterba <dsterba@suse.com>
In order to use rb_root_cached we need to sync with the kernel sources.
Copy the file from linux.git/include/linux/rbtree_types.h, update it so
it is C++-protected for inclusion in libbtrfs, and remove duplicate
definitions.
Signed-off-by: David Sterba <dsterba@suse.com>
There is a bug in raid56_recov() which doesn't properly repair the case
where both a data stripe and P are corrupted:
/* Data and P*/
if (dest2 == nr_devs - 1)
	return raid6_recov_datap(nr_devs, stripe_len, dest1, data);
Note that dest1/dest2 indicate which slots are corrupted.
For RAID6 cases:
[0, nr_devs - 2) is for data stripes,
nr_devs - 2 is for P,
nr_devs - 1 is for Q.
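For the above code, the comment is correct, but the check condition is
wrong: it compares dest2 against Q's slot (nr_devs - 1) instead of P's
(nr_devs - 2). This causes btrfs-fuse, the only project using this code,
to report a RAID6 recovery error when two devices are missing.
Fix it by using the correct condition.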
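A sketch of the case selection with the corrected condition, using the
slot layout above. It assumes dest1 < dest2, with dest2 == -1 for a single
failure; the enum, raid6_classify() and the order of the checks are
assumptions for illustration, and the real raid56_recov() calls the
recovery routines rather than returning a tag:

```c
/* Hypothetical tags for the RAID6 recovery paths */
enum raid6_recov_case {
	RECOV_REGEN_PQ,		/* only parity lost: regenerate P and/or Q */
	RECOV_DATA_FROM_P,	/* one data stripe lost: XOR rebuild from P */
	RECOV_DATA_Q,		/* data + Q lost: rebuild data from P, then Q */
	RECOV_DATA_P,		/* data + P lost: rebuild data from Q, then P */
	RECOV_DATA2,		/* two data stripes lost */
};

static enum raid6_recov_case raid6_classify(int nr_devs, int dest1, int dest2)
{
	const int p_slot = nr_devs - 2;
	const int q_slot = nr_devs - 1;

	if (dest2 < 0)		/* single failure */
		return dest1 >= p_slot ? RECOV_REGEN_PQ : RECOV_DATA_FROM_P;
	if (dest1 == p_slot && dest2 == q_slot)
		return RECOV_REGEN_PQ;
	if (dest2 == q_slot)
		return RECOV_DATA_Q;
	/* Data and P: P lives at nr_devs - 2, not nr_devs - 1 */
	if (dest2 == p_slot)
		return RECOV_DATA_P;
	return RECOV_DATA2;
}
```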
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Reduce the dependency on system headers: remove includes that are not
needed or that became stale after code was moved. path-utils.h
encapsulates path operations, so include linux/limits.h there, as that
is where PATH_MAX is defined.
Signed-off-by: David Sterba <dsterba@suse.com>
With the introduction of xxhash64 to btrfs-progs we created a crypto/
directory for all the hashes used in btrfs (although no
cryptographically secure hash is there yet).
Move the crc32c implementation from kernel-lib/ to crypto/ as well so we
have all hashes consolidated.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Copy of include/linux/overflow.h from the kernel.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
[ split from the original patch ]
Signed-off-by: David Sterba <dsterba@suse.com>
Create a directory for all sources that can be used by anything not
related to a particular kernel part: common functions, helpers and
utilities that do not fit any other specific category.
The traditional location would probably be lib/, holding everything
statically linked into the main binaries, but we have libbtrfs and
libbtrfsutil, so this would be confusing.
Signed-off-by: David Sterba <dsterba@suse.com>
Replaced bswap with the _ variants (bswap_32 etc.). While it's a glibc
extension, all of the popular libc implementations (glibc, uClibc, musl,
BIONIC) seem to support it.
Added static inline to two functions to match the little-endian
variants. This fixes a linking error seen when compiling with gcc 7.3.0
and LTO, possibly a bug that was fixed later.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This commit introduces explicit little-endian bit operations. The only
difference from the existing bitops implementation is that bswap(32|64)
is called when the _le versions are invoked on a big-endian machine.
This is in preparation for adding free space tree conversion support.
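A sketch of the idea, assuming a bitmap stored as little-endian unsigned
long words: on a big-endian host each word is swapped with bswap_64() or
bswap_32() from <byteswap.h> before the bit is tested, and on a
little-endian host the word is used as-is. le_word_to_cpu() is a
hypothetical name; the byte-order test relies on the GCC/Clang predefined
__BYTE_ORDER__ macros:

```c
#include <byteswap.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Convert one little-endian bitmap word to CPU byte order */
static inline unsigned long le_word_to_cpu(unsigned long w)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
# if ULONG_MAX > 0xffffffffUL
	return bswap_64(w);
# else
	return bswap_32(w);
# endif
#else
	return w;		/* little-endian host: already in the right order */
#endif
}

/* Test bit nr of a little-endian bitmap, independent of host endianness */
static inline bool test_bit_le(size_t nr, const unsigned long *addr)
{
	unsigned long w = le_word_to_cpu(addr[nr / BITS_PER_LONG]);

	return (w >> (nr % BITS_PER_LONG)) & 1UL;
}
```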
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Replace the existing find_*_bit functions with their kernel equivalents.
This reduces duplication, simplifies the code (there is really only one
worker function, _find_next_bit) and is quite likely faster. No
functional changes.
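A simplified sketch of the single-worker structure, modelled on the
kernel's find_bit code: one helper scans word by word, and XORing each
word with an invert mask lets the same loop find either set or clear bits.
It assumes a GCC/Clang-compatible compiler for __builtin_ctzl(); the
set_bit_simple() helper is hypothetical:

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper used to build example bitmaps */
static void set_bit_simple(size_t nr, unsigned long *addr)
{
	addr[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

/*
 * Worker: invert == 0 finds the next set bit, invert == ~0UL the next
 * clear bit. Returns nbits if no such bit exists at or after start.
 */
static size_t _find_next_bit(const unsigned long *addr, size_t nbits,
			     size_t start, unsigned long invert)
{
	unsigned long tmp;

	if (start >= nbits)
		return nbits;

	tmp = addr[start / BITS_PER_LONG] ^ invert;
	/* Ignore bits below start in the first word */
	tmp &= ~0UL << (start % BITS_PER_LONG);
	start -= start % BITS_PER_LONG;

	while (!tmp) {
		start += BITS_PER_LONG;
		if (start >= nbits)
			return nbits;
		tmp = addr[start / BITS_PER_LONG] ^ invert;
	}

	start += (size_t)__builtin_ctzl(tmp);	/* index of lowest set bit */
	return start < nbits ? start : nbits;
}

static size_t find_next_bit(const unsigned long *addr, size_t nbits, size_t start)
{
	return _find_next_bit(addr, nbits, start, 0UL);
}

static size_t find_next_zero_bit(const unsigned long *addr, size_t nbits, size_t start)
{
	return _find_next_bit(addr, nbits, start, ~0UL);
}
```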
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The contents of tables.c have not changed for more than 15 years, and we
don't expect any changes to the current contents. New tables might still
be added; in that case the file should be regenerated using the included
mktables tool and updated.
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a wrapper to recover raid56 data.
The logic is the same as the kernel one, but with different interfaces,
since the kernel cares about performance while btrfs-progs doesn't care
that much.
The interface is also more caller-friendly inside btrfs-progs.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Copied from kernel lib/raid6/recov.c.
Minor modifications include:
- Rename from raid6_datap_recov_intx() to raid6_recov_datap()
- Rename parameter from faila to dest1
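What data+P recovery computes: with P and one data stripe D_x lost, Q is
still intact, so D_x = g^-x * (Q xor Q_x), where Q_x is the Q syndrome
computed with D_x treated as zero, and P is then regenerated by XOR. The
sketch below uses a slow bit-by-bit GF(2^8) multiply over the RAID-6
polynomial 0x11d instead of the kernel's precomputed tables. gf_mul(),
gf_pow2(), gen_pq() and recov_datap() are hypothetical names; the slot
layout (P at nr_devs - 2, Q at nr_devs - 1) follows the raid56 code:

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11d) */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
	uint8_t r = 0;

	while (b) {
		if (b & 1)
			r ^= a;
		/* a *= x, folding the x^8 term back in */
		a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
		b >>= 1;
	}
	return r;
}

/* g^e with generator g = 2; g^255 == 1, so negative exponents wrap */
static uint8_t gf_pow2(int e)
{
	uint8_t r = 1;

	e %= 255;
	if (e < 0)
		e += 255;
	while (e--)
		r = gf_mul(r, 2);
	return r;
}

/* P = XOR of data, Q = sum of g^i * D_i */
static void gen_pq(int nr_devs, size_t len, uint8_t **ptrs)
{
	int ndata = nr_devs - 2;

	for (size_t k = 0; k < len; k++) {
		uint8_t p = 0, q = 0;

		for (int i = 0; i < ndata; i++) {
			p ^= ptrs[i][k];
			q ^= gf_mul(gf_pow2(i), ptrs[i][k]);
		}
		ptrs[ndata][k] = p;
		ptrs[ndata + 1][k] = q;
	}
}

/* Rebuild data stripe faila and P from the surviving data and Q */
static void recov_datap(int nr_devs, size_t len, int faila, uint8_t **ptrs)
{
	int ndata = nr_devs - 2;
	uint8_t ginv = gf_pow2(-faila);		/* g^-faila */

	for (size_t k = 0; k < len; k++) {
		uint8_t qx = 0;

		for (int i = 0; i < ndata; i++)
			if (i != faila)
				qx ^= gf_mul(gf_pow2(i), ptrs[i][k]);
		ptrs[faila][k] = gf_mul(ginv, ptrs[ndata + 1][k] ^ qx);
	}
	gen_pq(nr_devs, len, ptrs);	/* regenerates P; Q is rewritten unchanged */
}
```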
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Copied from the raid6_2data_recov_intx1() function in kernel
lib/raid6/recov.c, with the following modifications:
- Rename to raid6_recov_data2() for shorter name
- s/kfree/free/g modification
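For reference, the algebra that two-data recovery solves, in GF(2^8)
notation with XOR as addition and generator g = 2. P_xy and Q_xy denote
the parities computed with the two lost data stripes D_x and D_y treated
as zero:

```latex
\begin{aligned}
P \oplus P_{xy} &= D_x \oplus D_y, \qquad
Q \oplus Q_{xy} = g^{x} D_x \oplus g^{y} D_y,\\
D_x &= \frac{g^{y-x}}{g^{y-x} \oplus 1}\,(P \oplus P_{xy})
      \;\oplus\; \frac{g^{-x}}{g^{y-x} \oplus 1}\,(Q \oplus Q_{xy}),\\
D_y &= (P \oplus P_{xy}) \oplus D_x .
\end{aligned}
```

The second line follows from substituting D_y from the first equation into
the Q equation and dividing by g^x. The denominator g^{y-x} xor 1 is
nonzero because x != y and g has order 255, so the system always has a
unique solution.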
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Use the kernel RAID6 Galois tables for later RAID6 recovery.
The Galois tables file, kernel-lib/tables.c, is generated by the
user-space program mktables.
The Galois field table declarations in kernel-lib/raid56.h are copied
verbatim from the kernel.
mktables.c is copied from the kernel with minor header/macro
modifications to ensure the generated tables.c works well in
btrfs-progs.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a new header, kernel-lib/raid56.h, for later raid56 work.
It contains two functions from the original btrfs-progs code:
void raid6_gen_syndrome(int disks, size_t bytes, void **ptrs);
int raid5_gen_result(int nr_devs, size_t stripe_len, int dest, void **data);
It will be expanded later, and some parts of it (the RAID6 recovery
part) may be kept in sync with the kernel.
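A minimal sketch of what a function with the raid5_gen_result() prototype
above can do: regenerate slot dest as the XOR of all other slots (data
and P alike). The body and the name raid5_gen_result_sketch() are
assumptions for illustration, not the btrfs-progs code; it returns 0 on
success and -EINVAL for an out-of-range dest:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

int raid5_gen_result_sketch(int nr_devs, size_t stripe_len, int dest, void **data)
{
	uint8_t *out;

	if (dest < 0 || dest >= nr_devs)
		return -EINVAL;

	out = data[dest];
	memset(out, 0, stripe_len);
	for (int i = 0; i < nr_devs; i++) {
		const uint8_t *in = data[i];

		if (i == dest)
			continue;
		/* XOR of every other stripe restores the missing one */
		for (size_t k = 0; k < stripe_len; k++)
			out[k] ^= in[k];
	}
	return 0;
}
```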
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[ unify gpl header, rename header macro ]
Signed-off-by: David Sterba <dsterba@suse.com>
Large numbers like (1024 * 1024 * 1024) may cost a reader or reviewer a
second to convert to 1G.
Introduce the kernel's include/linux/sizes.h and replace any immediate
number larger than 4096 (not including 4096) with SZ_*.
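For example, with a subset of the SZ_* constants (values matching the
hex definitions in the kernel's include/linux/sizes.h), a 1GiB size reads
at a glance. The chunk_size_* variables are hypothetical:

```c
/* Subset of include/linux/sizes.h */
#define SZ_4K	0x00001000
#define SZ_64K	0x00010000
#define SZ_1M	0x00100000
#define SZ_1G	0x40000000

/* Before: the reader has to multiply this out */
static const unsigned long long chunk_size_old = 1024 * 1024 * 1024;
/* After: same value, readable at a glance */
static const unsigned long long chunk_size_new = SZ_1G;
```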
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
ASAN reports that at some point the crc function gets an unaligned
buffer. The optimized Intel version casts the char pointer to an
unsigned long pointer, and the buffer in question is the filename
embedded in the directory items.
Signed-off-by: David Sterba <dsterba@suse.com>