There is an internal report that, during btrfs-convert to block-group
tree, by accident some systemd events triggered the mount of the target
fs.
This leads to double mount (one by kernel and one by the btrfs-progs),
which seems to cause quite some problems.
To avoid such accident, exclusively opens all devices if btrfs-progs is
doing write operations.
Pull-request: #888
Reported-by: pandada8 <pandada8@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
[BUG]
Sometimes test case btrfs/012 fails randomly, with the failure to read a
symlink:
QA output created by 012
Checking converted btrfs against the original one:
-OK
+readlink: Structure needs cleaning
Checking saved ext2 image against the original one:
OK
Furthermore, this will trigger a kernel error message:
BTRFS critical (device dm-2): regular/prealloc extent found for non-regular inode 133081
[CAUSE]
For that specific inode 133081, the tree dump looks like this:
item 127 key (133081 INODE_ITEM 0) itemoff 40984 itemsize 160
generation 1 transid 1 size 4095 nbytes 4096
block group 0 mode 120777 links 1 uid 0 gid 0 rdev 0
sequence 0 flags 0x0(none)
item 128 key (133081 INODE_REF 133080) itemoff 40972 itemsize 12
index 2 namelen 2 name: l3
item 129 key (133081 EXTENT_DATA 0) itemoff 40919 itemsize 53
generation 4 type 1 (regular)
extent data disk byte 2147483648 nr 38080512
extent data offset 37974016 nr 4096 ram 38080512
extent compression 0 (none)
Note that, the symlink inode size is 4095 at the max size (PATH_MAX,
removing the terminating NUL).
But the nbytes is 4096, exactly matching the sector size of the btrfs.
Thus it results the creation of a regular extent, but for btrfs we do
not accept a symlink with a regular/preallocated extent, thus kernel
rejects such read and failed the readlink call.
The root cause is in the convert code, where for symlinks we always
create a data extent with its size + 1, causing the above problem.
I guess the original code is to handle the terminating NUL, but in btrfs
we never need to store the terminating NUL for inline extents nor
file names.
Thus this pitfall in btrfs-convert leads to the above invalid data
extent and fail the test case.
[FIX]
- Fix the ext2 and reiserfs symbolic link creation code
To remove the terminating NUL.
- Add extra checks for the size of a symbolic link
Btrfs has extra limits on the size of a symbolic link, as btrfs must
store symbolic link targets as inlined extents.
This means for 4K node sized btrfs, the size limit is smaller than the
usual PATH_MAX - 1 (only around 4000 bytes instead of 4095).
So for certain nodesize, some filesystems can not be converted to
btrfs.
(this should be rare, because the default nodesize is 16K already)
- Split the symbolic link and inline data extent size checks
For symbolic links the real limit is PATH_MAX - 1 (removing the
terminating NUL), but for inline data extents the limit is
sectorsize - 1, which can be different from 4096 - 1 (e.g. 64K sector
size).
Pull-request: #884
Signed-off-by: Qu Wenruo <wqu@suse.com>
Currently mkfs uses its own create_uuid_tree(), but that function is
only handling FS_TREE. This means for btrfs-convert we do not generate
the uuid tree, nor add the UUID of the image subvolume. This can be a
problem if we're going to support multiple subvolumes during mkfs time.
To address this, introduce a new helper, btrfs_rebuild_uuid_tree():
- Create a new uuid tree if there is not one
- Remove all the existing items from uuid tree
- Iterate through all subvolumes
* If the subvolume has no valid UUID, regenerate one
* Add the uuid entry for the subvolume UUID
* If the subvolume has received UUID, also add it to UUID tree
By this, this new helper can handle all the uuid tree generation needs for:
- Current mkfs
Only one uuid entry for FS_TREE
- Current btrfs-convert
Only FS_TREE and the image subvolume
- Future multi-subvolume mkfs
As we do the scan for all subvolumes.
- Future "btrfs rescue rebuild-uuid-tree"
Signed-off-by: Qu Wenruo <wqu@suse.com>
[BUG]
When rolling back a converted btrfs, the filename output is corrupted:
$ btrfs-convert -r ~/test.img
btrfs-convert from btrfs-progs v6.9.2
Open filesystem for rollback:
Label:
UUID: df54baf3-c91e-4956-96f9-99413a857576
Restoring from: ext2_saved0ƨy/image
^^^ Corruption
Rollback succeeded
[CAUSE]
The error is in how we handle the filename. In btrfs all our strings
are not '\0' terminated, but with explicit length.
But in C, most strings are '\0' terminated, so after reading a filename
from btrfs, we need to manually terminate the string.
However the code adding the terminating '\0' looks like this:
/* Get the filename length. */
name_len = btrfs_root_ref_name_len(path.nodes[0], root_ref_item);
/*
* This should not happen, but as an extra handling for possible
* corrupted btrfs.
*/
if (name_len > sizeof(dir_name))
name_len = sizeof(dir_name) - 1;
/* Got the real filename into our buffer. */
read_extent_buffer(path.nodes[0], dir_name, (unsigned long)(root_ref_item + 1), name_len);
/* Terminate the string. */
dir_name[sizeof(dir_name) - 1] = 0;
The problem is, the final termination is totally wrong, it always make
the last buffer char '\0', not using the @name_len we read before.
[FIX]
Use @name_len to terminate the string, as we have already updated it to
handle buffer overflow, it can handle both the regular and corrupted
case.
Fixes: dc29a5c51d ("btrfs-progs: convert: update default output")
Signed-off-by: Qu Wenruo <wqu@suse.com>
The function btrfs_mksubvol() is very different between btrfs-progs and
kernel, the former version is really just linking a subvolume to another
directory inode, but the kernel version is really to make a completely
new subvolume.
Instead of same-named function, introduce btrfs_link_subvolume() and use
it to replace the old btrfs_mksubvol().
This is done by:
- Introduce btrfs_link_subvolume()
Which does extra checks before doing any modification:
* Make sure the target inode is a directory
* Make sure no filename conflict
Then do the linkage:
* Add the dir_item/dir_index into the parent inode
* Add the forward and backward root refs into tree root
- Introduce link_image_subvolume() helper
Currently btrfs_mksubvol() has a dedicated convert filename retry
behavior, which is unnecessary and should be done by the convert code.
Now move the filename retry behavior into the helper.
- Remove btrfs_mksubvol()
Since there is only one caller utilizing btrfs_mksubvol(), and it's
now gone, we can remove the old btrfs_mksubvol().
Signed-off-by: Qu Wenruo <wqu@suse.com>
There are two different subvolume/data reloc tree creation routines:
- create_subvol() from convert/main.c
* calls btrfs_copy_root() to create an empty root
This is not safe, as it relies on the source root to be empty.
* calls btrfs_read_fs_root() to add it to the cache and trace it
properly
* calls btrfs_make_root_dir() to initialize the empty new root
- create_data_reloc_tree() from mkfs/main.c
* calls btrfs_create_tree() to create an empty root
* Manually add the root to fs_root cache
This is only safe for data reloc tree as it's never updated
inside btrfs-progs.
But not safe for other subvolume trees.
* manually setup the root dir
Both have their good and bad aspects, so here we introduce a new helper,
btrfs_make_subvolume():
- Calls btrfs_create_tree() to create an empty root
- Calls btrfs_read_fs_root() to setup the cache and tracking properly
- Calls btrfs_make_root_dir() to initialize the root dir
- Calls btrfs_update_root() to reflect the rootdir change
So this new helper can replace both create_subvol() and
create_data_reloc_tree().
Signed-off-by: Qu Wenruo <wqu@suse.com>
Commit 4db925911c ("btrfs-progs: use strncpy_null everywhere")
replaced strncpy with strncpy_null, the maximum xattr name length is 255
(current limit), the target buffer is large enough for the whole size so
make sure the last character is also copied.
Signed-off-by: David Sterba <dsterba@suse.com>
Use the safe version of strncpy that makes sure the string is
terminated.
To be noted:
- the conversion in scrub path handling was skipped
- sizes of device paths in some ioctl related structures is
BTRFS_DEVICE_PATH_NAME_MAX + 1
Recently gcc 13.3 started to detect problems with our use of strncpy
potentially lacking the null terminator, warnings like:
cmds/inspect.c: In function ‘cmd_inspect_logical_resolve’:
cmds/inspect.c:294:33: warning: ‘__builtin_strncpy’ specified bound 4096 equals destination size [-Wstringop-truncation]
294 | strncpy(mount_path, mounted, PATH_MAX);
| ^
Signed-off-by: David Sterba <dsterba@suse.com>
Now that there's only __strncpy_null we can drop the underscore and move
it to string-utils as it's a generic string function rather than
something for paths.
Signed-off-by: David Sterba <dsterba@suse.com>
The label is of a fixed size 256 bytes and expects the zero terminator.
Using __strncpy_null is correct as it makes sure there's always the zero
termination but the argument passed in skips the last character.
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a bug report that btrfs-convert cannot handle unwritten extents
(EXT2_EXTENT_FLAGS_UNINIT set, which is pretty much the same as
BTRFS_FILE_EXTENT_PREALLOC), which can cause the converted image to have
incorrect contents.
[CAUSE]
Currently we use ext2fs_block_iterate2() to go through all data extents
of an ext2 inode, but it doesn't provide the info on if the range is
unwritten or not.
Thus for unwritten extents, the results btrfs would just treat it as
regular extents, and read the contents from disk other than setting the
contents to zero.
[FIX]
Instead of the ext2fs_block_iterate2(), here we follow the debugfs'
"dump_extents" command, to use ext2fs_extent_*() helpers to go through
every data extent of the inode, that's if the inode supports the
EXT4_EXTENTS_FL flag.
Now we can properly get the info of which extents are unwritten, and use
holes to replace those unwritten extents.
Reported-by: Yordan <y16267966@gmail.com>
Link: https://lore.kernel.org/all/d34c7d77a7f00c93bea6a4d6e83c7caf.mailbg@mail.bg/
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is a new release candidate for e2fsprogs https://github.com/tytso/e2fsprogs/releases/tag/v1.47.1-rc2
Linking btrfs-progs v6.8 against this version of e2fsprogs leads to the following compile error:
convert/source-ext2.c: In function 'ext4_copy_inode_timespec_extra':
convert/source-ext2.c:733:13: warning: implicit declaration of function 'inode_includes' [-Wimplicit-function-declaration]
733 | if (inode_includes(inode_size, i_ ## xtime ## _extra)) { \
| ^~~~~~~~~~~~~~
convert/source-ext2.c:769:9: note: in expansion of macro 'EXT4_COPY_XTIME'
769 | EXT4_COPY_XTIME(atime, dst, tv_sec, tv_nsec);
| ^~~~~~~~~~~~~~~
convert/source-ext2.c:733:40: error: 'i_atime_extra' undeclared (first use in this function)
733 | if (inode_includes(inode_size, i_ ## xtime ## _extra)) { \
| ^~
convert/source-ext2.c:769:9: note: in expansion of macro 'EXT4_COPY_XTIME'
769 | EXT4_COPY_XTIME(atime, dst, tv_sec, tv_nsec);
| ^~~~~~~~~~~~~~~
convert/source-ext2.c:733:40: note: each undeclared identifier is reported only once for each function it appears in
733 | if (inode_includes(inode_size, i_ ## xtime ## _extra)) { \
| ^~
convert/source-ext2.c:769:9: note: in expansion of macro 'EXT4_COPY_XTIME'
769 | EXT4_COPY_XTIME(atime, dst, tv_sec, tv_nsec);
| ^~~~~~~~~~~~~~~
convert/source-ext2.c:733:40: error: 'i_mtime_extra' undeclared (first use in this function)
733 | if (inode_includes(inode_size, i_ ## xtime ## _extra)) { \
| ^~
convert/source-ext2.c:770:9: note: in expansion of macro 'EXT4_COPY_XTIME'
770 | EXT4_COPY_XTIME(mtime, dst, tv_sec, tv_nsec);
| ^~~~~~~~~~~~~~~
convert/source-ext2.c:733:40: error: 'i_ctime_extra' undeclared (first use in this function)
733 | if (inode_includes(inode_size, i_ ## xtime ## _extra)) { \
| ^~
convert/source-ext2.c:771:9: note: in expansion of macro 'EXT4_COPY_XTIME'
771 | EXT4_COPY_XTIME(ctime, dst, tv_sec, tv_nsec);
| ^~~~~~~~~~~~~~~
convert/source-ext2.c:774:40: error: 'i_crtime_extra' undeclared (first use in this function)
774 | if (inode_includes(inode_size, i_crtime_extra)) {
| ^~~~~~~~~~~~~~
from tytso/e2fsprogs@ca8bc92
Fix inode_includes() macro to properly wrap "inode" parameter,
and rename to ext2fs_inode_includes() to avoid potential name
clashes. Use this to check inode field inclusion in debugfs
instead of bare constants for inode field offsets.
To fix that use the new prefixed macro and add backward compatibility that
would still use inode_includes().
Issue: #785
Signed-off-by: David Sterba <dsterba@suse.com>
Use the objectid, type, offset natural order as it's more readable and
we're used to read keys like that.
Signed-off-by: David Sterba <dsterba@suse.com>
This patch introduces a new parser helper, parse_u64_with_suffix(),
which has a better error handling, following all the parse_*()
helpers to return non-zero value for errors.
This new helper is going to replace parse_size_from_string(), which
would directly call exit(1) to stop the whole program.
Furthermore most callers of parse_size_from_string() are expecting
exit(1) for error, so that they can skip the error handling.
For those call sites, introduce a wrapper, arg_strtou64_with_suffix(),
to do that. The only disadvantage is a little less detailed error
report for why the parse failed, but for most cases the generic error
string should be enough.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a report about failed btrfs-convert, which shows the following
error:
Create btrfs metadata
corrupt leaf: root=5 block=5001931145216 slot=1 ino=89911763, invalid previous key objectid, have 89911762 expect 89911763
leaf 5001931145216 items 336 free space 7 generation 90 owner FS_TREE
leaf 5001931145216 flags 0x1(WRITTEN) backref revision 1
fs uuid 8b69f018-37c3-4b30-b859-42ccfcbe2449
chunk uuid 448ce78c-ea41-49f6-99dc-46ad80b93da9
item 0 key (89911762 INODE_REF 3858733) itemoff 16222 itemsize 61
index 171 namelen 51 name: [FILENAME1]
item 1 key (89911763 INODE_REF 3858733) itemoff 16161 itemsize 61
index 103 namelen 51 name: [FILENAME2]
[CAUSE]
When iterating a directory, btrfs-convert would insert the DIR_ITEMs,
along with the INODE_REF of that inode.
This leads to above stray INODE_REFs, and trigger the tree-checker.
This can only happen for large fs, as for most cases we have all these
modified tree blocks cached, thus tree-checker won't be triggered.
But when the tree block cache is not hit, and we have to read from disk,
then such behavior can lead to above tree-checker error.
[FIX]
Insert a dummy INODE_ITEM for the INODE_REF first, the inode items would
be updated when iterating the child inode of the directory.
Issue: #731
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Although kernel scrub code has been updated to handle the unaligned
chunk length, there is also no harm if we can allocate data chunk with
both start and length aligned.
This patch handles this by rounding up the end bytenr when allocating
data chunks for the conversion.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a report about failed btrfs-convert, which shows the following
error:
corrupt leaf: root=5 block=5001928998912 slot=1 ino=89911763, invalid previous key objectid, have 89911762 expect 89911763
ERROR: failed to copy ext2 inode 89911320: -5
ERROR: error during copy_inodes -5
WARNING: error during conversion, the original filesystem is not modified
[CAUSE]
Above error is triggered when checking the following items inside a
subvolume:
- inode ref
- dir item/index
- file extent
- xattr
This is to make sure these items have correct previous key.
However btrfs-convert is not following this requirement, it always
inserts those items first, then creates a btrfs_inode for it.
Thus it can lead to the error.
This can only happen for large fs, as for most cases we have all these
modified tree blocks cached, thus tree-checker won't be triggered.
But when the tree block cache is not hit, and we have to read from disk,
then such behavior can lead to above tree-checker error.
[FIX]
Make sure we insert the inode item first, then the file extents/dir
items/xattrs. And after the file extents/dir items/xattrs inserted, we
update the existing inode (to update its size and bytes).
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The clear-cache functionality is shared by several commands:
- btrfs check
For --clear-cache and --clear-ino-cache.
- btrfstune
Mostly for block-group-tree feature conversion.
- btrfs-convert
To enable the now default v2 space cache.
Thus it's no longer proper to keep clear-cache.[ch] under check/
directory, move them to common/ directory.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function and it's related functions only exist for the utilities
that populate existing file systems, and do not exist in the upstream
kernel. Move this function and the related function into it's own
common source file and out of the kernel-shared sources, and then update
all of the users to include the new location of this code.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This simply zero's out the path, and this is used everywhere we use a
stack path. Drop this usage and simply init the path's to empty instead
of using a function to do the memset.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
We got some test failures related to btrfs-convert with subpage, e.g.
btrfs/012, the failure would cause the following dmesg:
BTRFS warning (device nvme0n1p7): v1 space cache is not supported for page size 16384 with sectorsize 4096
BTRFS error (device nvme0n1p7): open_ctree failed
[CAUSE]
v1 space cache has tons of hard coded PAGE_SIZE usage, and considering
v2 space cache is going to replace it (which is already the new default
since v5.15 btrfs-progs), thus for btrfs subpage support, we just simply
reject the v1 space cache, and utilize v2 space cache when possible.
But there is special catch in btrfs-convert, although we're specifying
v2 space cache as the new default for btrfs-convert, it doesn't really
follow the specification at all.
Thus the converted filesystem will still go v1 space cache.
[FIX]
It can be a huge change to btrfs-convert to make the initial btrfs image
to support v2 cache.
Thus this patch would change the fs at the final stage, just before we
finalize the btrfs.
This patch would drop all the v1 cache created, then call
btrfs_create_free_space_tree() to populate the free space tree and
commit the superblock with needed compat_ro flags.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch would modify btrfs_csum_file_block() to handle csum type
other than the one used in the current fs.
The new data checksum would use a different objectid (-13) to
distinguish with the existing one (-10).
This needs to change tree-checker to skip the item size checks,
since new csum can be larger than the original csum.
After this stage, the resulted csum tree would look like this:
item 0 key (CSUM_CHANGE EXTENT_CSUM 13631488) itemoff 8091 itemsize 8192
range start 13631488 end 22020096 length 8388608
item 1 key (EXTENT_CSUM EXTENT_CSUM 13631488) itemoff 7067 itemsize 1024
range start 13631488 end 14680064 length 1048576
Note the itemsize is 8 times the original one, as the original csum is
CRC32, while target csum is SHA256, which is 8 times the size.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a report that btrfs-convert leads to bad csum for the image
file.
The reproducer looks like this:
(note the 64K block size, it's used to force a certain chunk layout)
# touch test.img
# truncate -s 10G test.img
# mkfs.ext4 -b 64K test.img
# btrfs-convert -N 64K test.img
# btrfs check --check-data-csum test.img
Opening filesystem to check...
Checking filesystem on /home/adam/test.img
UUID: 39d49537-a9f5-47f1-b6ab-7857707b9133
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking csums against data
mirror 1 bytenr 4563140608 csum 0x3f1fa0ef expected csum 0xa4c4c072
mirror 1 bytenr 4563206144 csum 0x55dcf0d3 expected csum 0xa4c4c072
mirror 1 bytenr 4563271680 csum 0x4491b00a expected csum 0xa4c4c072
mirror 1 bytenr 4563337216 csum 0x655d1f61 expected csum 0xa4c4c072
mirror 1 bytenr 4563402752 csum 0xd37114d3 expected csum 0xa4c4c072
mirror 1 bytenr 4563468288 csum 0x4c2dab30 expected csum 0xa4c4c072
mirror 1 bytenr 4563533824 csum 0xa80fceed expected csum 0xa4c4c072
mirror 1 bytenr 4563599360 csum 0xaf610db8 expected csum 0xa4c4c072
mirror 1 bytenr 4563795968 csum 0x67b3c8a0 expected csum 0xa4c4c072
ERROR: errors found in csum tree
[6/7] checking root refs
...
[CAUSE]
Above initial failure is for logical bytenr of 4563140608, which is
inside the relocated range of the image file offset [0, 1M).
During convert, we migrate the original image file ranges which would
later be covered by super and other reserved ranges.
The migration happens as:
- Read out the original data
- Reserve a new file extent
- Write the data back to the file extent
Note that, the new file extent can be inside some new data chunks,
thus it's no longer 1:1 mapped.
- Generate the new csum for the new file extent
The problem happens at the last stage. We should read out the data from
the new file extent, but we call read_disk_extent() using the logical
bytenr, however read_disk_extent() is not doing logical -> physical
mapping.
Thus we will read some garbage, not the newly written data, and use
those garbage to generate csum. And caused the above problem.
[FIX]
Instead of read_disk_extent(), call read_data_from_disk(), which would
do the proper logical -> physical mapping, thus would fix the bug.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The function write_and_map_eb() is quite abused as a way to write any
generic buffer back to disk.
But we have a more suitable function already, write_data_to_disk().
This patch would remove the abused write_data_to_disk() calls, and
convert the only three valid call sites to write_data_to_disk() instead.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch syncs file-item.h into btrfs-progs. This carries with it an
API change for btrfs_del_csums, which takes a root argument in the
kernel, so all callsites have been updated accordingly.
I didn't sync file-item.c because it carries with it a bunch of bio
related helpers which are difficult to adapt to the kernel.
Additionally there's a few helpers in the local copy of file-item.c that
aren't in the kernel that are required for different tools.
This requires more cleanups in both the kernel and progs in order to
sync file-item.c, so for now just do file-item.h in order to pull things
out of ctree.h.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We want to keep this file locally as we want to be uptodate with
upstream, so we can build btrfs-progs regardless of which kernel is
currently installed. Sync this with the upstream version and put it in
kernel-shared/uapi to maintain some semblance of where this file comes
from.
There are some changes that need to be synced back to kernel. A local
definition of static_assert is used to avoid compilation problems on gcc
(< 9) due to mandatory 2nd parameter.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While syncing messages.[ch] I had to back out the ASSERT() code in
kerncompat.h, which means we now rely on the kernel code for ASSERT().
In order to maintain some semblance of separation introduce UASSERT()
and use that in all the purely userspace code.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
Since e2fsprog 1.47, even with a newly created empty ext4 filesystem,
btrfs-convert would result an fs that btrfs-check would complain:
# mkfs.ext4 -F test.img
# btrfs-convert test.img
# btrfs-check test.img
Opening filesystem to check...
Checking filesystem on test.img
UUID: e45da158-8967-4e4d-9c9f-66b0d127dbce
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
root 5 inode 266 errors 2000, link count wrong
ERROR: errors found in fs roots
found 26333184 bytes used, error(s) found <<<
total csum bytes: 25540
total tree bytes: 180224
total fs tree bytes: 49152
total extent tree bytes: 16384
btree space waste bytes: 145423
file data blocks allocated: 33947648
referenced 26284032
[CAUSE]
Ext4 has a new compat feature, COMPAT_ORPHAN_FILE, as a better way to
track all the orphan inodes.
This new feature would create a new special inode for this purpose, and
such orphan file inode would not be reachable from any other inode, but
only from super block.
Unfortunately btrfs-convert only skip ext2 known special inodes, not the
newer one.
[FIX]
According to the kernel document, we can locate the orphan file inode
using ext2 super block s_orphan_file_inum, and skip it for
btrfs-convert.
And such skip would only happen if we have the definition of
EXT4_FEATURE_COMPAT_ORPHAN_FILE, to be compatible with older e2fsprogs.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Prepare a single location that will detect or set accelerated versions
of hash algorithms. Right now it's the crc32c, blake2 and sha256 do
an if-else switch while crc32c sets a function pointer.
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
Even with chunk_objectid bug fixed, mkfs.btrfs can still caused stack
overflow when enabling extent-tree-v2 feature (need experimental
features enabled):
# ./mkfs.btrfs -f -O extent-tree-v2 ~/test.img
btrfs-progs v5.19.1
See http://btrfs.wiki.kernel.org for more information.
ERROR: superblock magic doesn't match
NOTE: several default settings have changed in version 5.15, please make sure
this does not affect your deployments:
- DUP for metadata (-m dup)
- enabled no-holes (-O no-holes)
- enabled free-space-tree (-R free-space-tree)
Label: (null)
UUID: 205c61e7-f58e-4e8f-9dc2-38724f5c554b
Node size: 16384
Sector size: 4096
Filesystem size: 512.00MiB
Block group profiles:
Data: single 8.00MiB
Metadata: DUP 32.00MiB
System: DUP 8.00MiB
SSD detected: no
Zoned device: no
=================================================================
[... Skip full ASAN output ...]
==65655==ABORTING
[CAUSE]
For experimental build, we have unified feature output, but the old
buffer size is only 64 bytes, which is too small to cover the new full
feature string:
extref, skinny-metadata, no-holes, free-space-tree, block-group-tree, extent-tree-v2
Above feature string is already 84 bytes, over the 64 on-stack memory
size.
This can also be proved by the ASAN output:
==65655==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffc4e03b1d0 at pc 0x7ff0fc05fafe bp 0x7ffc4e03ac60 sp 0x7ffc4e03a408
WRITE of size 17 at 0x7ffc4e03b1d0 thread T0
#0 0x7ff0fc05fafd in __interceptor_strcat /usr/src/debug/gcc/libsanitizer/asan/asan_interceptors.cpp:377
#1 0x55cdb7b06ca5 in parse_features_to_string common/fsfeatures.c:316
#2 0x55cdb7b06ce1 in btrfs_parse_fs_features_to_string common/fsfeatures.c:324
#3 0x55cdb7a37226 in main mkfs/main.c:1783
#4 0x7ff0fbe3c28f (/usr/lib/libc.so.6+0x2328f)
#5 0x7ff0fbe3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
#6 0x55cdb7a2cb34 in _start ../sysdeps/x86_64/start.S:115
[FIX]
Introduce a new macro, BTRFS_FEATURE_STRING_BUF_SIZE, along with a new
sanity check helper, btrfs_assert_feature_buf_size().
The problem is I can not find a build time method to verify
BTRFS_FEATURE_STRING_BUF_SIZE is large enough to contain all feature
names, thus have to go the runtime function to do the BUG_ON() to verify
the macro size.
Now the minimal buffer size for experimental build is 138 bytes, just
bump it to 160 for future expansion.
And if further features go beyond that number, mkfs.btrfs/btrfs-convert
will immediately crash at that BUG_ON(), so we can definitely detect it.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Tested-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
Commit "btrfs-progs: prepare merging compat feature lists" tries to
merged "-O" and "-R" options, as they don't correctly represents
btrfs features.
But that commit caused the following bug during mkfs for experimental
build:
$ mkfs.btrfs -f -O block-group-tree /dev/nvme0n1
btrfs-progs v5.19.1
See http://btrfs.wiki.kernel.org for more information.
ERROR: superblock magic doesn't match
ERROR: illegal nodesize 16384 (not equal to 4096 for mixed block group)
[CAUSE]
Currently btrfs_parse_fs_features() will return a u64, and reuse the
same u64 for both incompat and compat RO flags for experimental branch.
This can easily leads to conflicts, as
BTRFS_FEATURE_INCOMPAT_MIXED_BLOCK_GROUP and
BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE both share the same bit
(1 << 2).
Thus for above case, mkfs.btrfs believe it has set MIXED_BLOCK_GROUP
feature, but what we really want is BLOCK_GROUP_TREE.
[FIX]
Instead of incorrectly re-using the same bits in btrfs_feature, split
the old flags into 3 flags:
- incompat_flag
- compat_ro_flag
- runtime_flag
The first two flags are easy to understand, the corresponding flag of
each feature.
The last runtime_flag is to compensate features which doesn't have any
on-disk flag set, like QUOTA and LIST_ALL.
And since we're no longer using a single u64 as features, we have to
introduce a new structure, btrfs_mkfs_features, to contain above 3
flags.
This also mean, things like default mkfs features must be converted to
use the new structure, thus those old macros are all converted to
const static structures:
- BTRFS_MKFS_DEFAULT_FEATURES + BTRFS_MKFS_DEFAULT_RUNTIME_FEATURES
-> btrfs_mkfs_default_features
- BTRFS_CONVERT_ALLOWED_FEATURES -> btrfs_convert_allowed_features
And since we're using a structure, it's not longer as easy to implement
a disallowed mask.
Thus functions with @mask_disallowed are all changed to using
an @allowed structure pointer (which can be NULL).
Finally if we have experimental features enabled, all features can be
specified by -O options, and we can output a unified feature list,
instead of the old split ones.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are cases where the BUG_ON should be replaced by error
handling as it's validating the data from the source filesystem or
possibility to convert. The unconverted cases are asserts and will be
replaced later.
Signed-off-by: David Sterba <dsterba@suse.com>
The (unsigned long long) type casts can be dropped, printf understands
%llu and u64 and does not warn. In cases where the type is not u64 keep
the cast.
Signed-off-by: David Sterba <dsterba@suse.com>
The logic at the beginning of this function to handle reserved ranges
was pretty complex and hard to follow. By refactoring it to use the
existing intersect_with_reserved() function, we can remove most of the
comparisons and boolean operators while preserving the exact same logic.
This change is only for readability. It does not change the logic itself
at all.
Author: Thomas Hebb <tommyhebb@gmail.com>
Pull-request: #494
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We currently open code a similar operation in create_image_file_range().
By exposing intersect_with_reserved() outside of source-fs.c and
slightly changing its semantics to return the entire range instead of
just the end address, we can reuse it in create_image_file_range().
Author: Thomas Hebb <tommyhebb@gmail.com>
Pull-request: #494
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When checking if the requested range starts in a valid region but later
hits a reserved range, we require the reserved range to end before the
requested one does.
This is incorrect. Since we're going to truncate the requested range
anyway, we want this check to pass even if the requested range ends
partway through a reserved range.
Fix the issue by checking against the reserved range's start address
instead of its end.
Luckily, I don't believe this bug makes a difference in the current code
path, since the range we pass to this function never ends before the end
of the filesystem.
Issue: #297
Issue: #349
Author: Thomas Hebb <tommyhebb@gmail.com>
Pull-request: #494
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
intersect_with_reserved() currently succeeds if (bytenr + num_bytes) is
greater than or equal to the first address in the range, assuming that
bytenr is also not past the end of the range.
This is wrong. (bytenr + num bytes) is one byte past the last address in
the range we're checking, meaning that our range only overlaps the
reserved range if it's strictly greater than the reserved range's start
address.
For example, imagine a range at 0x3000 with length 0x1000 that we're
checking against a reserved range that starts at 0x4000. The addresses
in our range are 0x3000-0x3fff: it doesn't overlap. But the current
check, (0x3000 + 0x1000 >= 0x4000), will erroneously pass.
Fix the issue by changing >= to >.
Issue: #297
Issue: #349
Author: Thomas Hebb <tommyhebb@gmail.com>
Pull-request: #494
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is currently defined in source-fs.h, but main.c uses it far more
than source-fs.c does. Put it in common.h instead, since it's a useful
standalone type.
Author: Thomas Hebb <tommyhebb@gmail.com>
Pull-request: #494
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The preferred order:
- system headers
- standard headers
- libraries
- kernel library
- kernel shared
- common headers
- other tools
- own headers
Signed-off-by: David Sterba <dsterba@suse.com>