The image contains one inode item with invalid generation. The image
can be crafted by "btrfs-corrupt-block -i 257 -f generation". It should
emulate the bad inode generation caused by older kernel around 2014.
The image is repairable for both original and lowmem mode.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are at least two bug reports of kernel tree-checker complaining
about invalid inode generation.
All offending inodes seem to be caused by old kernel around 2014, with
inode generation overflow.
So add such check and repair ability to lowmem mode check first.
This involves:
- Calculate the inode generation upper limit
Unlike the lowmem mode context, we don't have anyway to determine if
this inode belongs to log tree.
So we use super_generation + 1 as upper limit, just like what we did
in kernel tree checker.
- Check if the inode generation is larger than the upper limit
- Repair by resetting inode generation to current transaction
generation
The difference is, in original mode, we have a common trans handle for
all repair and reset path for each repair.
Reported-by: Charles Wright <charles.v.wright@gmail.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are at least two bug reports of kernel tree-checker complaining
about invalid inode generation.
All offending inodes seem to be caused by old kernel around 2014, with
inode generation overflow.
So add such check and repair ability to lowmem mode check first.
This involves:
- Calculate the inode generation upper limit
If it's an inode from log tree, then the upper limit is
super_generation + 1, otherwise it's super_generation.
- Check if the inode generation is larger than the upper limit
- Repair by resetting inode generation to current transaction
generation
Reported-by: Charles Wright <charles.v.wright@gmail.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add new test image for imode repair in subvolume trees.
The new test cases including the following cases:
- Regular file with bad imode
It still has the valid INODE_REF and parent dir has correct DIR_INDEX
and DIR_ITEM.
In this case, no matter if the file is empty or not, it should be
repaired using the info from DIR_INDEX of parent dir.
- Non-empty regular file with bad imode, and without INODE_REF
The file should be mostly an orphan, so no INODE_REF for imode lookup.
But it has EXTENT_DATA which should be enough for imode repair.
The repair also involves moving the orphan to lost+found dir.
- Non-empty dir with bad imode, and without INODE_REF
Pretty much the same case, but now a directory.
The repair also involves moving the orphan to lost+found dir.
Also rename the existing test case 039-bad-free-space-cache-inode-mode
to 039-bad-inode-mode, since now we can fix all bad imode.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
To make original mode to repair imode error in subvolume trees, this
patch will do:
- Remove the show-stopper checks for root->objectid.
Now repair_imode_original() will accept inodes in subvolume trees.
- Export detect_imode() for original mode
Due to the call requirement, original mode must use an existing trans
handler to do the repair, thus we need to re-implement most of the
work done in repair_imode_common().
- Make repair_imode_original() to use detect_imode().
- Free the path after reset_imode()
reset_imode() keeps the path, as lowmem mode uses path to locate its
current check position.
But for original mode, the unreleased path can cause later repair to
report warning, so we need to manually release the path.
- Update rec->imode after imode reset
So later repair depending on rec->imode can get correct value.
- Move the repair before repair_inode_nlinks()
repair_inode_nlinks() needs correct imode to add DIR_INDEX/DIR_ITEM.
So moving the repair before repair_inode_nlinks() makes the latter
repair happier.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For lowmem mode, if we hit a bad inode mode, normally it is reported
when we checking the DIR_INDEX/DIR_ITEM of the parent inode.
If we didn't repair at that time, the error will be recorded even if we
fixed it later.
So this patch will check for INODE_ITEM_MISMATCH error type, and if it's
really caused by invalid imode, repair it and clear the error.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[[PROBLEM]]
Before this patch, repair_imode_common() can only handle two types of
inodes:
- Free space cache inodes
- ROOT DIR inodes
For inodes in subvolume trees, the core complexity is how to determine
the correct imode, thus it was not implemented.
However there are more reports of incorrect imode in subvolume trees, we
need to support such fix.
[[ENHANCEMENT]]
So this patch adds a new function, detect_imode(), to detect imode for
inodes in subvolume trees. The policy here is, try our best to find a
valid imode to recovery. If no convicing info can be found, fail out.
That function will determine imode by:
1) Search for INODE_REF of the inode
If we have INODE_REF, we will then try to find DIR_ITEM/DIR_INDEX.
As long as one valid DIR_ITEM or DIR_INDEX can be found, we convert
the BTRFS_FT_* to imode, then call it a day.
This should be the most accurate way.
2) Search for DIR_INDEX/DIR_ITEM belongs to this inode
If above search fails, we falls back to locate the DIR_INDEX/DIR_ITEM
just after the INODE_ITEM.
Thus this only works for non-empty directory.
If any can be found, it's definitely a directory.
3) Search for EXTENT_DATA belongs to this inode
If EXTENT_DATA can be found, it's either REG or LNK.
Thus this only works for non-empty file or soft link.
For this case, we default to REG, as user can inspect the file to
determine if it's a file or just a path.
4) Use rdev to detect BLK/CHR
If all above fails, but INODE_ITEM has non-zero rdev, then it's either
a BLK or CHR file. Then we default to BLK.
5) Fail out if none of above methods succeeded
No educated guess to make things worse.
[[SHORTCOMING]]
The above search is not perfect, there are cases where we can't repair:
E.g. orphan empty regular inode. Since it's already orphan, it has no
INODE_REF. And it's regular empty file, it has no DIR_INDEX nor
EXTENT_DATA nor rdev. Thus we can't recover. Although for this case, it
really doesn't matter as it's already orphan and will be deleted anyway.
Furthermore, due to the DIR_ITEM/DIR_INDEX/INODE_REF repair code which
can happen before imode repair, it's possible that DIR_ITEM search code
may not be executed. If there is only DIR_ITEM remaining, repair code
will remove the DIR_ITEM completely and move the inode to lost+found,
leaving us no info to rebuild imode. If there is DIR_INDEX missing,
repair code will re-insert the DIR_INDEX, then imode repair code will go
DIR_INDEX directly.
But overall, the repair code should handle the invalid imode caused by
older kernels without problem.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a function, find_file_type(), to find filetype using info from
INODE_REF, including dir_id from key index/name from inode_ref_item.
This function will:
- Search DIR_INDEX first
DIR_INDEX is easier since there is only one item in it.
- Validate the DIR_INDEX item
If the DIR_INDEX is valid, use the filetype and call it a day.
- Search DIR_ITEM then
It needs extra iteration since it's possible to have hash collision.
- Validate the DIR_ITEM
If valid, call it a day. Or return -ENOENT;
This would be used as the primary method to determine the imode in later
imode repair code.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function will be later used by common mode code, so export it.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Before this patch, we were using a very inefficient way to search
chunks:
We iterate through all clusters to find the chunk root tree block first,
then re-iterate all clusters again to find every child tree block.
Each time we need to iterate all clusters just to find a chunk tree
block. This is obviously inefficient, especially when chunk tree gets
larger. So the original author leaves a comment on it:
/* If you have to ask you aren't worthy */
static int search_for_chunk_blocks()
This patch will change the behavior so that we will only iterate all
clusters once.
The idea behind the optimization is, since we have the superblock
restored first, we could use the CHUNK_ITEMs in
super_block::sys_chunk_array to build a SYSTEM chunk mapping.
Then, when we start to iterate through all items, we can easily skip
unrelated items at different level:
- At cluster level
If a cluster starts beyond last system chunk map, it must not contain
any chunk tree blocks (as chunk tree blocks only lives inside system
chunks)
- At item level
If one item has no intersection with any system chunk map, then it
must not contain any tree blocks.
By this, we can iterate through all clusters just once, and find out all
CHUNK_ITEMs.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a new helper function, is_in_sys_chunks(), to determine if an
item is in the range of system chunks.
Since btrfs-image will merge adjacent same type extents into one item,
this function is designed to return true for any bytes in system chunk
range.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently we are doing a pretty slow search for system chunks before
restoring real data.
The current behavior is to search all clusters for chunk tree root
first, then search all clusters again and again for every chunk tree
block.
This causes recursive calls and pretty slow start up, the only good news
is since chunk tree are normally small, we don't need to iterate too
many times, thus overall it's acceptable.
To address such bad behavior, we could take usage of system chunk array
in the super block.
By recording all system chunks ranges, we could easily determine if an
extent belongs to chunk tree, thus do one loop simple linear search for
chunk tree leaves.
This patch only introduces the code base for later patches.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is no need to allocate 2 * max_pending_size (which can be 256M) if
we're just extracting super block.
We only need to prepare BTRFS_SUPER_INFO_SIZE as buffer size.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We can easily get confusing error message like:
ERROR: restore failed: Success
This is caused by wrong "%m" usage, as we normally use ret to indicate
error, without populating errno.
This patch will fix it by output the return value directly as normally
we have extra error message to show more meaning message than the return
value.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The removed paragraph in btrfs-man5.asciidoc says the same as the
previous one.
Signed-off-by: Merlin Büge <merlin.buege@tuhh.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Add definition, crypto wrappers and support to mkfs for blake2 for
checksumming. There are 2 aliases either blake2 or blake2b.
Signed-off-by: David Sterba <dsterba@suse.com>
Upstream commit 997fa5ba1e14b52c554fb03ce39e579e6f27b90c,
git repository: git://github.com/BLAKE2/BLAKE2
The reference implemetation added in this patch is unchanged and will be
modified only to compile in current code base and with minimal other
modifications in case of future sync with upstream code. IOW, the coding
style should stay as-is and does not conform to the other btrfs-progs
code. This is an exception for xxhash and sha256 code as well.
Signed-off-by: David Sterba <dsterba@suse.com>
Add the definition to the checksum types and let mkfs accept it.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
A simple tool to microbenchmark performance of the hashes. Uses rdtsc
for timing, so works only on x86_64.
$ make hash-speedtest
$ ./hash-speedtest [iterations]
Block size: 4096
Iterations: 100000
NULL-NOP: cycles: 56061823, c/i 560
NULL-MEMCPY: cycles: 61296469, c/i 612
CRC32C: cycles: 179961796, c/i 1799
XXHASH: cycles: 138434590, c/i 1384
Signed-off-by: David Sterba <dsterba@suse.com>
The table won't change at runtime and the string name can be in a buffer
avoiding the pointer indirection. Make one entry aligned to 16 bytes,
plenty of space to store reasonably long csum names.
Signed-off-by: David Sterba <dsterba@suse.com>
The SHA256 is going to be used in the future, so this makes it a second
user and we also have the appropriate directory now.
Signed-off-by: David Sterba <dsterba@suse.com>
With the introduction of xxhash64 to btrfs-progs we created a crypto/
directory for all the hashes used in btrfs (although no
cryptographically secure hash is there yet).
Move the crc32c implementation from kernel-lib/ to crypto/ as well so we
have all hashes consolidated.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Copy of xxhash.[ch] from git://github.com/Cyan4973/xxHash, version
v0.7.1. The include xxh3.h has been commented out as we don't have it
here, otherwise the copy is unchnaged.
Signed-off-by: David Sterba <dsterba@suse.com>
The libbtrfs-test simulated build happens outside of the source
repository, but sometimes the system library is used instead of the repo
one. When -rpath does not work, force the correct library by LD_PRELOAD.
Signed-off-by: David Sterba <dsterba@suse.com>
Several people reported build breakage of snapper, due to missing
symbols in libbtrfs.so. Move the objects to the library objects, now we
don't have to worry about the new exports as the libbtrfs.sym is
unchanged. And there are no new .h files being exported though there are
the .o files in the library.
Issue: #214
Link: https://github.com/openSUSE/snapper/issues/500
Signed-off-by: David Sterba <dsterba@suse.com>
The shared library exports many functions that are not supposed to be
public, like rb-tree, crc32c or internal helpers but as this has been
potentially in use we should at least make a list. There's only a
subset being used by the snapper project.
Export majority of current symbols visible in libbtrfs so any future
additions to libbtrfs objects are automatically hidden and don't pollute
the namespace further.
Note that all projects should switch to libbtrfsutil rather than
libbtrfs that exists for historical reasons and will be deprecated in
the future.
Signed-off-by: David Sterba <dsterba@suse.com>
Make use of GitLab-CI nested virutal environment to start QEMU instance
inside containers and perform btrfs-progs build, execute unit test cases
and save the logs.
This allows to run the progs testsuite on a recent kernel, newer than
what CI instances usually provide. As this is is emulated, the runtime
is longer.
Issue: #171
Signed-off-by: Lakshmipathi.G <lakshmipathi.ganapathi@collabora.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Fix compiler warnings and errors in btrfs-sb-mod due to incorrect
conversion with the checksum updates.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
If we generate testsuite tarball by 'make testsuite', there is no
clean-test.sh in it. But some tests need cleanup if a testcase failed.
Signed-off-by: An Long <lan@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Discovered with cppcheck. Fix signed/unsigned int mismatches, sizeof and
long formats.
Pull-request: #197
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
A user reports that some symbols are missing from libbtrfs, eg.
radix_tree_init. This is correct and there are few more. The headers
exported through the library all need the respective object files.
The sources are GPL so is libbtrfs, which is known
Issue: #205
Signed-off-by: David Sterba <dsterba@suse.com>
As of 5.1, btrfs now supports compression levels for zstd. Let users
know about this in the man page.
Pull-request: #204
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: David Sterba <dsterba@suse.com>
uClibc does not provide backtrace() nor <execinfo.h>. When building
btrfs-progs, passing --disable-backtrace is enough to make it build with
uClibc. But once btrfs-progs is installed and another program/library
includes kerncompat.h, it fails to build with uClibc, because
BTRFS_DISABLE_BACKTRACE is not defined.
The most correct fix for this would be to have kerncompat.h generated
from kerncompat.h.in during the btrfs-progs build process, and tuned
depending on autoconf/automake variables. But as a quick fix that
follows the current strategy, we simply tweak the existing __GLIBC__
conditional. Indeed, uClibc pretends to be glibc and defines __GLIBC__,
but it does not replace it completely, hence the need to define
BTRFS_DISABLE_BACKTRACE when __GLIBC__ is not defined *or* when
__UCLIBC__ is defined.
Pull-request: #206
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
[Retrieved from: https://git.buildroot.net/buildroot/tree/package/btrfs-progs/0002-kerncompat.h-define-BTRFS_DISABLE_BACKTRACE-when-bui.patch]
Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
As mkfs will grow new checksums, print the used checksum in it's
versbose output.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Adding this table will make extending btrfs-progs with new checksum types
easier.
Also add accessor functions to access the table fields.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a helper to check if we have a valid csum type from the super block.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Add an option to mkfs to specify which checksum algorithm will be used
for the filesystem. Currently only crc32c is supported.
The option name is -c, presumably one of the comonly used options so it
gets the lowercase option.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Update the checksumming API to be able to cope with more checksum types
than just CRC32C. The finalization call is merged into btrfs_csum_data.
There are some fixme's and asserts added that need to be resolved.
Co-developed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
update_block_csum() in btrfs-sb-mod.c is always called with the 'is_sb'
argument set to 1.
Get rid of the special case for is_sb == 0.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>