[BUG]
Sometimes test case btrfs/012 fails randomly, with the failure to read a
symlink:
QA output created by 012
Checking converted btrfs against the original one:
-OK
+readlink: Structure needs cleaning
Checking saved ext2 image against the original one:
OK
Furthermore, this will trigger a kernel error message:
BTRFS critical (device dm-2): regular/prealloc extent found for non-regular inode 133081
[CAUSE]
For that specific inode 133081, the tree dump looks like this:
item 127 key (133081 INODE_ITEM 0) itemoff 40984 itemsize 160
generation 1 transid 1 size 4095 nbytes 4096
block group 0 mode 120777 links 1 uid 0 gid 0 rdev 0
sequence 0 flags 0x0(none)
item 128 key (133081 INODE_REF 133080) itemoff 40972 itemsize 12
index 2 namelen 2 name: l3
item 129 key (133081 EXTENT_DATA 0) itemoff 40919 itemsize 53
generation 4 type 1 (regular)
extent data disk byte 2147483648 nr 38080512
extent data offset 37974016 nr 4096 ram 38080512
extent compression 0 (none)
Note that, the symlink inode size is 4095 at the max size (PATH_MAX,
removing the terminating NUL).
But the nbytes is 4096, exactly matching the sector size of the btrfs.
Thus it results the creation of a regular extent, but for btrfs we do
not accept a symlink with a regular/preallocated extent, thus kernel
rejects such read and failed the readlink call.
The root cause is in the convert code, where for symlinks we always
create a data extent with its size + 1, causing the above problem.
I guess the original code is to handle the terminating NUL, but in btrfs
we never need to store the terminating NUL for inline extents nor
file names.
Thus this pitfall in btrfs-convert leads to the above invalid data
extent and fail the test case.
[FIX]
- Fix the ext2 and reiserfs symbolic link creation code
To remove the terminating NUL.
- Add extra checks for the size of a symbolic link
Btrfs has extra limits on the size of a symbolic link, as btrfs must
store symbolic link targets as inlined extents.
This means for 4K node sized btrfs, the size limit is smaller than the
usual PATH_MAX - 1 (only around 4000 bytes instead of 4095).
So for certain nodesize, some filesystems can not be converted to
btrfs.
(this should be rare, because the default nodesize is 16K already)
- Split the symbolic link and inline data extent size checks
For symbolic links the real limit is PATH_MAX - 1 (removing the
terminating NUL), but for inline data extents the limit is
sectorsize - 1, which can be different from 4096 - 1 (e.g. 64K sector
size).
Pull-request: #884
Signed-off-by: Qu Wenruo <wqu@suse.com>
Use the objectid, type, offset natural order as it's more readable and
we're used to read keys like that.
Signed-off-by: David Sterba <dsterba@suse.com>
This flag is used by the kernel btrfs_search_slot to make sure that leaf
splitting decision doesn't subtract the size of an item. This is for
inline extent items and csum items where we know we're going to find the
item we want, and we're only going to want to extend it. Currently this
flag doesn't do anything, but when we sync ctree.c we'll stop making the
right decision WRT the leaf space, so add the flag usage in the places
we need it so we can sync ctree.c easily.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Similar to btrfs_truncate_item(), this is void in the kernel as the
failure case is simply BUG_ON(). Additionally there is no root
parameter as it's not used in the function at all. Make these changes
and update the callers.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is void in the kernel, and this makes sense in btrfs-progs as it
stands currently as it doesn't actually return an error if there's a
problem, it simply BUG()'s. Update this to be a void and update the
callers to make it easier to sync ctree.c.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the kernel we just pass the btrfs_fs_info, and we const'ify the
new_key. Update the btrfs-progs definition to make syncing ctree.c
easier.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch would modify btrfs_csum_file_block() to handle csum type
other than the one used in the current fs.
The new data checksum would use a different objectid (-13) to
distinguish with the existing one (-10).
This needs to change tree-checker to skip the item size checks,
since new csum can be larger than the original csum.
After this stage, the resulted csum tree would look like this:
item 0 key (CSUM_CHANGE EXTENT_CSUM 13631488) itemoff 8091 itemsize 8192
range start 13631488 end 22020096 length 8388608
item 1 key (EXTENT_CSUM EXTENT_CSUM 13631488) itemoff 7067 itemsize 1024
range start 13631488 end 14680064 length 1048576
Note the itemsize is 8 times the original one, as the original csum is
CRC32, while target csum is SHA256, which is 8 times the size.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The existing attempt for changing csum types is as the following:
- Create a new temporary csum root
- Generate new data csums into the temporary csum root
- Drop the old csum tree and make the temporary one as csum root
- Change the checksums for metadata in-place
Unfortunately after some experiments, the csum root switch method has a
big pitfall, the backref items in extent tree.
Those backref items still point back to the old tree, meaning without a
lot of extra tricks, the extent tree would be corrupted.
Thus we have to go a new single tree variant:
- Generate new data csums into the csum root
The new data csums would have a different objectid to distinguish
them.
- Drop the old data csum items
- Change the key objectids of the new csums
- Change the checksums for metadata in-place
This means unfortunately we have to revert most of the old code, and
update the temporary item format.
The new temporary item would only record the target csum type.
At every stage we have a method to determine the progress, thus no need
for an item, but in the future it's still open for change.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch syncs file-item.h into btrfs-progs. This carries with it an
API change for btrfs_del_csums, which takes a root argument in the
kernel, so all callsites have been updated accordingly.
I didn't sync file-item.c because it carries with it a bunch of bio
related helpers which are difficult to adapt to the kernel.
Additionally there's a few helpers in the local copy of file-item.c that
aren't in the kernel that are required for different tools.
This requires more cleanups in both the kernel and progs in order to
sync file-item.c, so for now just do file-item.h in order to pull things
out of ctree.h.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The radix-tree is not used in userspace code. In kernel it's for
tracking unpersisted and in-memory structures and has been replaced by
the xarray.
Signed-off-by: David Sterba <dsterba@suse.com>
Now that all callers are using the _nr variations we can simply rename
these helpers to btrfs_item_##member/btrfs_set_item_##member and change
the actual item SETGET funcs to raw_item_##member/set_raw_item_##member
and then change all callers to drop the _nr part.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is still work in progress but can survive some stress testing.
There are still some sanity checks missing, do not user this on valuable
data. To enables this, configure must be run with the experimental
features enabled.
$ mkfs.btrfs --csum crc32c /dev/sdx
$ <mount, fill with data, unmount>
$ btrfstune --csum sha256
Will change the checksum to sha256.
Implementation:
- set bit on superblock when the checksums are being changed (similar to
the uuid rewrite)
- metadata checksums are overwritten in place
- data checksums:
- the checksum tree is completely deleted and no checksums are
verified
- data blocks are enumerated and all checksums generated (same as
check --init-csum-tree)
To make it usable, it should be restartable and track the current
progress somehow. Also the previous data checksums should be verified
any time they're available.
Signed-off-by: David Sterba <dsterba@suse.com>
With extent tree v2 we will have per-block group checksums, so add a
helper to access the csum root and rename the fs_info csum_root to
_csum_root to catch all the places that are accessing it directly.
Convert everybody to use the helper except for internal things.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We are going to need to start looking up the csum root based on the
bytenr with extent tree v2. To that end stop passing the root to the
csum related functions so that can be done in the helper functions
themselves.
There's an unrelated deletion of a function prototype that no longer
exists.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Just like kernel commit 22b6331d9617 ("btrfs: store precalculated
csum_size in fs_info"), we can cache csum_size and csum_type in
btrfs_fs_info.
Furthermore, there is already a 32 bits hole in btrfs_fs_info, and we
can fit csum_type and csum_size into the hole without increase the size
of btrfs_fs_info.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function lies in the kernel-shared directory and is supposed to be
close to 1:1 copy with its kernel counterpart, yet it takes one extra
argument - root. But this is now unused to simply remove it.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The check condition (csum_result == 0) does not make sense anymore as
it's not the buffer and not the crc32c result as it used to be. The
message does not bring any value and looks like it's some debugging aid
from the old times (added in 2008 as bb7055ec21 ("Add some extra
debugging around file data checksum failures")).
Signed-off-by: David Sterba <dsterba@suse.com>
For passing authentication keys to the checksumming functions we need a
container for the key.
Pass in a btrfs_fs_info to btrfs_csum_data() so we can use the fs_info
as a container for the authentication key.
Note this is not always possible for all callers of btrfs_csum_data() so
we're just passing in NULL for now
Functions calling btrfs_csum_data() with a NULL fs_info argument are
currently not supported in the context of an authenticated file system.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>