This test ensures that the kernel correctly persists backup roots in
case the filesystem has been mounted from a backup root.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
[ cleanup to use common helpers ]
Signed-off-by: David Sterba <dsterba@suse.com>
As progs' transaction/CoW logic evolved over the years, the metadata block
corruption code failed to keep up. It's currently impossible to corrupt
the generation: the CoW logic not only sets it to the value of the
currently running transaction (__btrfs_cow_block), but the current code
will also ASSERT due to the following check in __btrfs_cow_block:
WARN_ON(!(buf->flags & EXTENT_BAD_TRANSID) &&
btrfs_header_generation(buf) > trans->transid);
Fix this by making the generation corruption code directly write
the modified block, outside of the transaction mechanism. At the same
time move the old code into BTRFS_METADATA_BLOCK_SHIFT_ITEMS handling
case, essentially leaving it unchanged.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We access btrfs_block_group_cache::item mostly for @used and @flags.
@flags is already a dedicated member in btrfs_block_group_cache, only
@used doesn't have a dedicated member.
This patch will remove btrfs_block_group_cache::item and add
btrfs_block_group_cache::used.
It's the btrfs-progs equivalent of the following kernel patches:
btrfs: move block_group_item::used to block group
btrfs: move block_group_item::flags to block group
btrfs: remove embedded block_group_cache::item
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs balance status supports both short and long option -v|--verbose
but usage failed to show it in its --help. This patch fixes the --help.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs balance start supports both short and long option -v|--verbose
however usage failed to show the long option. This patch fixes the --help.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Even when the -q option is specified, the receive sub-command is not
quiet, as shown below.
$ btrfs receive -q -f /tmp/t /btrfs1
At snapshot ss3
It must be quiet at least when it's been asked to be quiet.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This test uses tool dmsetup so add the global prereq.
Issue: #192
Signed-off-by: Su Yue <Damenly_Su@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Seems that 18.04 has arrived on travis, switch to it. The gcc is 7.4 and
kernel is unfortunately still 4.15.
Signed-off-by: David Sterba <dsterba@suse.com>
Avoid introducing new cases of implicit fallthrough by having this flag
always set, though a conditional check is needed to avoid build breakage
on older compilers or on CI.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When compiling with clang, this warning is shown:
common/utils.c:404:3: warning: declaration does not declare anything [-Wmissing-declarations]
__attribute__ ((fallthrough));
This attribute seems to silence the same warning in GCC. Replacing this
attribute with /* fallthrough */ fixes the warning for both gcc and
clang.
Full support for the attribute will be in clang 10, gcc supports that
now. Let's use what works for both and switch to the attribute in the
future.
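As an illustration of the comment-based annotation, here is a minimal
sketch. The function suffix_multiplier() is hypothetical and is not the
code from common/utils.c; it only shows where the comment goes when a
case is meant to continue into the next one.

```c
/*
 * Hypothetical example: each larger unit deliberately falls through to
 * the smaller ones so the multiplier accumulates.
 */
unsigned long long suffix_multiplier(char suffix)
{
	unsigned long long mult = 1;

	switch (suffix) {
	case 'g':
		mult *= 1024;
		/* fallthrough */
	case 'm':
		mult *= 1024;
		/* fallthrough */
	case 'k':
		mult *= 1024;
		break;
	default:
		/* Unknown suffix: treat the value as plain bytes. */
		break;
	}
	return mult;
}
```

The comment sits directly before the next case label, with no statement
in between.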
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch does the following refactor:
- Refactor parameter from @root to @fs_info
- Refactor the large loop body into another function
Now we have a helper function, read_one_block_group(), to handle
block group cache and space info related routines.
- Refactor the return value
Even though we have code handling ret > 0 from find_first_block_group(),
it never triggers, as when there are no more block groups,
find_first_block_group() just returns -ENOENT rather than 1.
This is super confusing, it's almost a miracle it even works.
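A minimal sketch of a loop that treats -ENOENT as the normal end of
iteration, as the return value convention above suggests. Both
next_entry() and count_entries() are hypothetical stand-ins, not the
btrfs-progs functions.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for find_first_block_group(): returns 0 and
 * advances *cursor while entries remain, and -ENOENT when none are left.
 * It never returns a positive value.
 */
static int next_entry(int *cursor, int nr_entries)
{
	if (*cursor >= nr_entries)
		return -ENOENT;
	(*cursor)++;
	return 0;
}

/*
 * Walk all entries. Returns the number of entries visited, or a negative
 * errno on a real failure. -ENOENT only means "no more entries".
 */
int count_entries(int nr_entries)
{
	int cursor = 0;
	int count = 0;
	int ret;

	while (1) {
		ret = next_entry(&cursor, nr_entries);
		if (ret == -ENOENT)
			break;		/* normal end of iteration */
		if (ret < 0)
			return ret;	/* genuine error, pass it up */
		count++;
	}
	return count;
}
```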
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The following functions are just using @root to reach fs_info:
- exclude_super_stripes
- free_excluded_extents
- add_excluded_extent
Refactor them to use fs_info directly.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
The image contains one inode item with invalid generation. The image
can be crafted by "btrfs-corrupt-block -i 257 -f generation". It should
emulate the bad inode generation caused by older kernel around 2014.
The image is repairable for both original and lowmem mode.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are at least two bug reports of kernel tree-checker complaining
about invalid inode generation.
All offending inodes seem to be caused by old kernel around 2014, with
inode generation overflow.
So add such check and repair ability to original mode check.
This involves:
- Calculate the inode generation upper limit
Unlike the lowmem mode context, we don't have any way to determine if
this inode belongs to log tree.
So we use super_generation + 1 as upper limit, just like what we did
in kernel tree checker.
- Check if the inode generation is larger than the upper limit
- Repair by resetting inode generation to current transaction
generation
The difference is, in original mode, we have a common trans handle for
all repairs and reset the path for each repair.
Reported-by: Charles Wright <charles.v.wright@gmail.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are at least two bug reports of kernel tree-checker complaining
about invalid inode generation.
All offending inodes seem to be caused by old kernel around 2014, with
inode generation overflow.
So add such check and repair ability to lowmem mode check first.
This involves:
- Calculate the inode generation upper limit
If it's an inode from log tree, then the upper limit is
super_generation + 1, otherwise it's super_generation.
- Check if the inode generation is larger than the upper limit
- Repair by resetting inode generation to current transaction
generation
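The three steps above can be sketched as follows. This is a simplified
illustration under assumed names (inode_gen_limit(), inode_gen_valid()
and repair_inode_gen() are hypothetical), operating on plain integers
rather than on tree items.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Upper limit for a valid inode generation: log tree inodes may be one
 * generation ahead of the superblock, all others may not.
 */
uint64_t inode_gen_limit(uint64_t super_generation, bool is_log_tree)
{
	return is_log_tree ? super_generation + 1 : super_generation;
}

/* An inode generation is valid if it does not exceed the upper limit. */
bool inode_gen_valid(uint64_t inode_gen, uint64_t super_generation,
		     bool is_log_tree)
{
	return inode_gen <= inode_gen_limit(super_generation, is_log_tree);
}

/*
 * Return the generation the inode should carry after repair: unchanged
 * if already valid, otherwise the current transaction generation.
 */
uint64_t repair_inode_gen(uint64_t inode_gen, uint64_t super_generation,
			  bool is_log_tree, uint64_t transid)
{
	if (inode_gen_valid(inode_gen, super_generation, is_log_tree))
		return inode_gen;
	return transid;
}
```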
Reported-by: Charles Wright <charles.v.wright@gmail.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add new test image for imode repair in subvolume trees.
The new test image includes the following cases:
- Regular file with bad imode
It still has the valid INODE_REF and parent dir has correct DIR_INDEX
and DIR_ITEM.
In this case, no matter if the file is empty or not, it should be
repaired using the info from DIR_INDEX of parent dir.
- Non-empty regular file with bad imode, and without INODE_REF
The file should be mostly an orphan, so no INODE_REF for imode lookup.
But it has EXTENT_DATA which should be enough for imode repair.
The repair also involves moving the orphan to lost+found dir.
- Non-empty dir with bad imode, and without INODE_REF
Pretty much the same case, but now a directory.
The repair also involves moving the orphan to lost+found dir.
Also rename the existing test case 039-bad-free-space-cache-inode-mode
to 039-bad-inode-mode, since now we can fix all bad imode.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
To make original mode repair imode errors in subvolume trees, this
patch will:
- Remove the show-stopper checks for root->objectid.
Now repair_imode_original() will accept inodes in subvolume trees.
- Export detect_imode() for original mode
Due to the call requirement, original mode must use an existing trans
handler to do the repair, thus we need to re-implement most of the
work done in repair_imode_common().
- Make repair_imode_original() use detect_imode().
- Free the path after reset_imode()
reset_imode() keeps the path, as lowmem mode uses path to locate its
current check position.
But for original mode, the unreleased path can cause later repair to
report warning, so we need to manually release the path.
- Update rec->imode after imode reset
So later repair depending on rec->imode can get correct value.
- Move the repair before repair_inode_nlinks()
repair_inode_nlinks() needs correct imode to add DIR_INDEX/DIR_ITEM.
So moving the repair before repair_inode_nlinks() makes the latter
repair happier.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For lowmem mode, if we hit a bad inode mode, normally it is reported
when we are checking the DIR_INDEX/DIR_ITEM of the parent inode.
If we didn't repair at that time, the error will be recorded even if we
fixed it later.
So this patch will check for INODE_ITEM_MISMATCH error type, and if it's
really caused by invalid imode, repair it and clear the error.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[[PROBLEM]]
Before this patch, repair_imode_common() can only handle two types of
inodes:
- Free space cache inodes
- ROOT DIR inodes
For inodes in subvolume trees, the core complexity is how to determine
the correct imode, thus it was not implemented.
However, there are more reports of incorrect imode in subvolume trees, so
we need to support such a fix.
[[ENHANCEMENT]]
So this patch adds a new function, detect_imode(), to detect imode for
inodes in subvolume trees. The policy here is, try our best to find a
valid imode to recover. If no convincing info can be found, fail out.
That function will determine imode by:
1) Search for INODE_REF of the inode
If we have INODE_REF, we will then try to find DIR_ITEM/DIR_INDEX.
As long as one valid DIR_ITEM or DIR_INDEX can be found, we convert
the BTRFS_FT_* to imode, then call it a day.
This should be the most accurate way.
2) Search for DIR_INDEX/DIR_ITEM belonging to this inode
If the above search fails, we fall back to locating the DIR_INDEX/DIR_ITEM
just after the INODE_ITEM.
Thus this only works for non-empty directory.
If any can be found, it's definitely a directory.
3) Search for EXTENT_DATA belonging to this inode
If EXTENT_DATA can be found, it's either REG or LNK.
Thus this only works for non-empty file or soft link.
For this case, we default to REG, as user can inspect the file to
determine if it's a file or just a path.
4) Use rdev to detect BLK/CHR
If all above fails, but INODE_ITEM has non-zero rdev, then it's either
a BLK or CHR file. Then we default to BLK.
5) Fail out if none of above methods succeeded
No educated guess to make things worse.
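The fallback order above can be sketched as a simple decision chain.
This is not the detect_imode() implementation: struct imode_hints and
detect_imode_sketch() are hypothetical, and the tree searches are
reduced to pre-computed flags. The SKETCH_IF* constants mirror the
Linux S_IFDIR, S_IFREG and S_IFBLK file type bits.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* File type bits, same values as S_IFDIR, S_IFREG and S_IFBLK on Linux. */
#define SKETCH_IFDIR	0040000
#define SKETCH_IFREG	0100000
#define SKETCH_IFBLK	0060000

/* Hypothetical summary of what the tree searches found for one inode. */
struct imode_hints {
	bool has_parent_entry;		/* valid DIR_ITEM/DIR_INDEX via INODE_REF */
	uint32_t parent_entry_imode;	/* imode converted from its BTRFS_FT_* */
	bool has_child_dir_items;	/* DIR_INDEX/DIR_ITEM after the INODE_ITEM */
	bool has_extent_data;		/* EXTENT_DATA belonging to the inode */
	uint64_t rdev;			/* rdev field of the INODE_ITEM */
};

/* Returns the detected file type bits, or -ENOENT if nothing is convincing. */
int detect_imode_sketch(const struct imode_hints *h)
{
	if (h->has_parent_entry)	/* 1) most accurate source */
		return (int)h->parent_entry_imode;
	if (h->has_child_dir_items)	/* 2) only directories have these */
		return SKETCH_IFDIR;
	if (h->has_extent_data)		/* 3) REG or LNK, default to REG */
		return SKETCH_IFREG;
	if (h->rdev)			/* 4) BLK or CHR, default to BLK */
		return SKETCH_IFBLK;
	return -ENOENT;			/* 5) no educated guess */
}
```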
[[SHORTCOMING]]
The above search is not perfect, there are cases where we can't repair:
E.g. orphan empty regular inode. Since it's already orphan, it has no
INODE_REF. And it's regular empty file, it has no DIR_INDEX nor
EXTENT_DATA nor rdev. Thus we can't recover. Although for this case, it
really doesn't matter as it's already orphan and will be deleted anyway.
Furthermore, due to the DIR_ITEM/DIR_INDEX/INODE_REF repair code which
can happen before imode repair, it's possible that DIR_ITEM search code
may not be executed. If there is only DIR_ITEM remaining, repair code
will remove the DIR_ITEM completely and move the inode to lost+found,
leaving us no info to rebuild imode. If there is DIR_INDEX missing,
repair code will re-insert the DIR_INDEX, then imode repair code will use
DIR_INDEX directly.
But overall, the repair code should handle the invalid imode caused by
older kernels without problem.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a function, find_file_type(), to find filetype using info from
INODE_REF, including dir_id from key index/name from inode_ref_item.
This function will:
- Search DIR_INDEX first
DIR_INDEX is easier since there is only one item in it.
- Validate the DIR_INDEX item
If the DIR_INDEX is valid, use the filetype and call it a day.
- Search DIR_ITEM then
It needs extra iteration since it's possible to have hash collision.
- Validate the DIR_ITEM
If valid, call it a day. Otherwise return -ENOENT.
This would be used as the primary method to determine the imode in later
imode repair code.
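Once a filetype has been found, turning it into an imode is a table
lookup. The sketch below assumes the BTRFS_FT_* values from the btrfs
on-disk format and the Linux S_IF* file type bits; the function name
filetype_to_imode() is hypothetical.

```c
#include <stdint.h>

/* Directory entry types, values as in the btrfs on-disk format. */
enum {
	BTRFS_FT_UNKNOWN	= 0,
	BTRFS_FT_REG_FILE	= 1,
	BTRFS_FT_DIR		= 2,
	BTRFS_FT_CHRDEV		= 3,
	BTRFS_FT_BLKDEV		= 4,
	BTRFS_FT_FIFO		= 5,
	BTRFS_FT_SOCK		= 6,
	BTRFS_FT_SYMLINK	= 7,
};

/*
 * Map a directory entry type to the matching file type bits (the Linux
 * S_IF* values). Returns 0 for unknown or out-of-range types.
 */
uint32_t filetype_to_imode(uint8_t filetype)
{
	static const uint32_t imode_by_type[] = {
		[BTRFS_FT_UNKNOWN]	= 0,
		[BTRFS_FT_REG_FILE]	= 0100000,	/* S_IFREG */
		[BTRFS_FT_DIR]		= 0040000,	/* S_IFDIR */
		[BTRFS_FT_CHRDEV]	= 0020000,	/* S_IFCHR */
		[BTRFS_FT_BLKDEV]	= 0060000,	/* S_IFBLK */
		[BTRFS_FT_FIFO]		= 0010000,	/* S_IFIFO */
		[BTRFS_FT_SOCK]		= 0140000,	/* S_IFSOCK */
		[BTRFS_FT_SYMLINK]	= 0120000,	/* S_IFLNK */
	};

	if (filetype >= sizeof(imode_by_type) / sizeof(imode_by_type[0]))
		return 0;
	return imode_by_type[filetype];
}
```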
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function will be later used by common mode code, so export it.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Before this patch, we were using a very inefficient way to search
chunks:
We iterate through all clusters to find the chunk root tree block first,
then re-iterate all clusters again to find every child tree block.
Each time we need to iterate all clusters just to find a chunk tree
block. This is obviously inefficient, especially when the chunk tree gets
larger. So the original author left a comment on it:
/* If you have to ask you aren't worthy */
static int search_for_chunk_blocks()
This patch will change the behavior so that we will only iterate all
clusters once.
The idea behind the optimization is, since we have the superblock
restored first, we could use the CHUNK_ITEMs in
super_block::sys_chunk_array to build a SYSTEM chunk mapping.
Then, when we start to iterate through all items, we can easily skip
unrelated items at different level:
- At cluster level
If a cluster starts beyond last system chunk map, it must not contain
any chunk tree blocks (as chunk tree blocks only live inside system
chunks)
- At item level
If one item has no intersection with any system chunk map, then it
must not contain any tree blocks.
By this, we can iterate through all clusters just once, and find out all
CHUNK_ITEMs.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a new helper function, is_in_sys_chunks(), to determine if an
item is in the range of system chunks.
Since btrfs-image will merge adjacent same type extents into one item,
this function is designed to return true for any bytes in system chunk
range.
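A minimal sketch of such a range check, assuming the system chunks are
kept as a plain array of (start, len) pairs. The struct and the
function signature here are hypothetical and may differ from the real
is_in_sys_chunks(); the point is that any overlap, not only full
containment, counts as a match.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one system chunk's logical byte range. */
struct sys_chunk_range {
	uint64_t start;
	uint64_t len;
};

/*
 * Return true if [start, start + len) shares at least one byte with any
 * system chunk. Ranges are half-open, so touching ends do not overlap.
 */
bool range_in_sys_chunks(const struct sys_chunk_range *chunks, size_t nr,
			 uint64_t start, uint64_t len)
{
	for (size_t i = 0; i < nr; i++) {
		uint64_t chunk_end = chunks[i].start + chunks[i].len;

		if (start < chunk_end && chunks[i].start < start + len)
			return true;
	}
	return false;
}
```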
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently we are doing a pretty slow search for system chunks before
restoring real data.
The current behavior is to search all clusters for chunk tree root
first, then search all clusters again and again for every chunk tree
block.
This causes recursive calls and pretty slow start up, the only good news
is that since chunk trees are normally small, we don't need to iterate too
many times, thus overall it's acceptable.
To address such bad behavior, we could make use of the system chunk array
in the super block.
By recording all system chunk ranges, we could easily determine if an
extent belongs to the chunk tree, and thus do a simple single-pass linear
search for chunk tree leaves.
This patch only introduces the code base for later patches.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is no need to allocate 2 * max_pending_size (which can be 256M) if
we're just extracting super block.
We only need to prepare BTRFS_SUPER_INFO_SIZE as buffer size.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We can easily get confusing error message like:
ERROR: restore failed: Success
This is caused by wrong "%m" usage, as we normally use ret to indicate
error, without populating errno.
This patch fixes it by outputting the return value directly, as we
normally have an extra error message that is more meaningful than the
return value.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The removed paragraph in btrfs-man5.asciidoc says the same as the
previous one.
Signed-off-by: Merlin Büge <merlin.buege@tuhh.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Add definition, crypto wrappers and support to mkfs for blake2 for
checksumming. There are 2 aliases, either blake2 or blake2b.
Signed-off-by: David Sterba <dsterba@suse.com>
Upstream commit 997fa5ba1e14b52c554fb03ce39e579e6f27b90c,
git repository: git://github.com/BLAKE2/BLAKE2
The reference implementation added in this patch is unchanged and will be
modified only to compile in current code base and with minimal other
modifications in case of future sync with upstream code. IOW, the coding
style should stay as-is and does not conform to the other btrfs-progs
code. The same exception applies to the xxhash and sha256 code as well.
Signed-off-by: David Sterba <dsterba@suse.com>
Add the definition to the checksum types and let mkfs accept it.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
A simple tool to microbenchmark performance of the hashes. Uses rdtsc
for timing, so works only on x86_64.
$ make hash-speedtest
$ ./hash-speedtest [iterations]
Block size: 4096
Iterations: 100000
NULL-NOP: cycles: 56061823, c/i 560
NULL-MEMCPY: cycles: 61296469, c/i 612
CRC32C: cycles: 179961796, c/i 1799
XXHASH: cycles: 138434590, c/i 1384
Signed-off-by: David Sterba <dsterba@suse.com>
The table won't change at runtime and the string name can be in a buffer
avoiding the pointer indirection. Make one entry aligned to 16 bytes,
plenty of space to store reasonably long csum names.
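A sketch of what such a table entry could look like. The struct layout
and names below are assumptions for illustration, with only the two
checksum types mentioned elsewhere in this series filled in.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One table entry: a 2-byte digest size plus an inline 14-byte name
 * buffer, 16 bytes in total with no pointer to chase.
 */
struct csum_entry {
	uint16_t size;		/* digest size in bytes */
	char name[14];		/* NUL-terminated name stored in place */
};

_Static_assert(sizeof(struct csum_entry) == 16,
	       "one entry is expected to be 16 bytes");

/* Read-only table, indexed by a hypothetical checksum type number. */
static const struct csum_entry csum_table[] = {
	{ .size = 4, .name = "crc32c" },
	{ .size = 8, .name = "xxhash64" },
};

#define CSUM_TABLE_LEN (sizeof(csum_table) / sizeof(csum_table[0]))

/* Digest size for a type, or 0 if the type is not in the table. */
uint16_t csum_size(unsigned int type)
{
	return type < CSUM_TABLE_LEN ? csum_table[type].size : 0;
}

/* Name for a type, or NULL if the type is not in the table. */
const char *csum_name(unsigned int type)
{
	return type < CSUM_TABLE_LEN ? csum_table[type].name : NULL;
}
```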
Signed-off-by: David Sterba <dsterba@suse.com>
The SHA256 is going to be used in the future, so this makes it a second
user and we also have the appropriate directory now.
Signed-off-by: David Sterba <dsterba@suse.com>
With the introduction of xxhash64 to btrfs-progs we created a crypto/
directory for all the hashes used in btrfs (although no
cryptographically secure hash is there yet).
Move the crc32c implementation from kernel-lib/ to crypto/ as well so we
have all hashes consolidated.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Copy of xxhash.[ch] from git://github.com/Cyan4973/xxHash, version
v0.7.1. The include xxh3.h has been commented out as we don't have it
here, otherwise the copy is unchanged.
Signed-off-by: David Sterba <dsterba@suse.com>
The libbtrfs-test simulated build happens outside of the source
repository, but sometimes the system library is used instead of the repo
one. When -rpath does not work, force the correct library by LD_PRELOAD.
Signed-off-by: David Sterba <dsterba@suse.com>
Several people reported build breakage of snapper, due to missing
symbols in libbtrfs.so. Move the objects to the library objects, now we
don't have to worry about the new exports as the libbtrfs.sym is
unchanged. And there are no new .h files being exported though there are
the .o files in the library.
Issue: #214
Link: https://github.com/openSUSE/snapper/issues/500
Signed-off-by: David Sterba <dsterba@suse.com>
The shared library exports many functions that are not supposed to be
public, like rb-tree, crc32c or internal helpers but as this has been
potentially in use we should at least make a list. There's only a
subset being used by the snapper project.
Export the majority of current symbols visible in libbtrfs so any future
additions to libbtrfs objects are automatically hidden and don't pollute
the namespace further.
Note that all projects should switch to libbtrfsutil rather than
libbtrfs that exists for historical reasons and will be deprecated in
the future.
Signed-off-by: David Sterba <dsterba@suse.com>