Use the objectid, type, offset natural order as it's more readable and
we're used to read keys like that.
Signed-off-by: David Sterba <dsterba@suse.com>
This patch introduces a new parser helper, parse_u64_with_suffix(),
which has a better error handling, following all the parse_*()
helpers to return non-zero value for errors.
This new helper is going to replace parse_size_from_string(), which
would directly call exit(1) to stop the whole program.
Furthermore most callers of parse_size_from_string() are expecting
exit(1) for error, so that they can skip the error handling.
For those call sites, introduce a wrapper, arg_strtou64_with_suffix(),
to do that. The only disadvantage is a little less detailed error
report for why the parse failed, but for most cases the generic error
string should be enough.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Although kernel scrub code has been updated to handle the unaligned
chunk length, there is also no harm if we can allocate data chunk with
both start and length aligned.
This patch handles this by rounding up the end bytenr when allocating
data chunks for the conversion.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The clear-cache functionality is shared by several commands:
- btrfs check
For --clear-cache and --clear-ino-cache.
- btrfstune
Mostly for block-group-tree feature conversion.
- btrfs-convert
To enable the now default v2 space cache.
Thus it's no longer proper to keep clear-cache.[ch] under check/
directory, move them to common/ directory.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function and it's related functions only exist for the utilities
that populate existing file systems, and do not exist in the upstream
kernel. Move this function and the related function into it's own
common source file and out of the kernel-shared sources, and then update
all of the users to include the new location of this code.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This simply zero's out the path, and this is used everywhere we use a
stack path. Drop this usage and simply init the path's to empty instead
of using a function to do the memset.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
We got some test failures related to btrfs-convert with subpage, e.g.
btrfs/012, the failure would cause the following dmesg:
BTRFS warning (device nvme0n1p7): v1 space cache is not supported for page size 16384 with sectorsize 4096
BTRFS error (device nvme0n1p7): open_ctree failed
[CAUSE]
v1 space cache has tons of hard coded PAGE_SIZE usage, and considering
v2 space cache is going to replace it (which is already the new default
since v5.15 btrfs-progs), thus for btrfs subpage support, we just simply
reject the v1 space cache, and utilize v2 space cache when possible.
But there is special catch in btrfs-convert, although we're specifying
v2 space cache as the new default for btrfs-convert, it doesn't really
follow the specification at all.
Thus the converted filesystem will still go v1 space cache.
[FIX]
It can be a huge change to btrfs-convert to make the initial btrfs image
to support v2 cache.
Thus this patch would change the fs at the final stage, just before we
finalize the btrfs.
This patch would drop all the v1 cache created, then call
btrfs_create_free_space_tree() to populate the free space tree and
commit the superblock with needed compat_ro flags.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch would modify btrfs_csum_file_block() to handle csum type
other than the one used in the current fs.
The new data checksum would use a different objectid (-13) to
distinguish with the existing one (-10).
This needs to change tree-checker to skip the item size checks,
since new csum can be larger than the original csum.
After this stage, the resulted csum tree would look like this:
item 0 key (CSUM_CHANGE EXTENT_CSUM 13631488) itemoff 8091 itemsize 8192
range start 13631488 end 22020096 length 8388608
item 1 key (EXTENT_CSUM EXTENT_CSUM 13631488) itemoff 7067 itemsize 1024
range start 13631488 end 14680064 length 1048576
Note the itemsize is 8 times the original one, as the original csum is
CRC32, while target csum is SHA256, which is 8 times the size.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a report that btrfs-convert leads to bad csum for the image
file.
The reproducer looks like this:
(note the 64K block size, it's used to force a certain chunk layout)
# touch test.img
# truncate -s 10G test.img
# mkfs.ext4 -b 64K test.img
# btrfs-convert -N 64K test.img
# btrfs check --check-data-csum test.img
Opening filesystem to check...
Checking filesystem on /home/adam/test.img
UUID: 39d49537-a9f5-47f1-b6ab-7857707b9133
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking csums against data
mirror 1 bytenr 4563140608 csum 0x3f1fa0ef expected csum 0xa4c4c072
mirror 1 bytenr 4563206144 csum 0x55dcf0d3 expected csum 0xa4c4c072
mirror 1 bytenr 4563271680 csum 0x4491b00a expected csum 0xa4c4c072
mirror 1 bytenr 4563337216 csum 0x655d1f61 expected csum 0xa4c4c072
mirror 1 bytenr 4563402752 csum 0xd37114d3 expected csum 0xa4c4c072
mirror 1 bytenr 4563468288 csum 0x4c2dab30 expected csum 0xa4c4c072
mirror 1 bytenr 4563533824 csum 0xa80fceed expected csum 0xa4c4c072
mirror 1 bytenr 4563599360 csum 0xaf610db8 expected csum 0xa4c4c072
mirror 1 bytenr 4563795968 csum 0x67b3c8a0 expected csum 0xa4c4c072
ERROR: errors found in csum tree
[6/7] checking root refs
...
[CAUSE]
Above initial failure is for logical bytenr of 4563140608, which is
inside the relocated range of the image file offset [0, 1M).
During convert, we migrate the original image file ranges which would
later be covered by super and other reserved ranges.
The migration happens as:
- Read out the original data
- Reserve a new file extent
- Write the data back to the file extent
Note that, the new file extent can be inside some new data chunks,
thus it's no longer 1:1 mapped.
- Generate the new csum for the new file extent
The problem happens at the last stage. We should read out the data from
the new file extent, but we call read_disk_extent() using the logical
bytenr, however read_disk_extent() is not doing logical -> physical
mapping.
Thus we will read some garbage, not the newly written data, and use
those garbage to generate csum. And caused the above problem.
[FIX]
Instead of read_disk_extent(), call read_data_from_disk(), which would
do the proper logical -> physical mapping, thus would fix the bug.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The function write_and_map_eb() is quite abused as a way to write any
generic buffer back to disk.
But we have a more suitable function already, write_data_to_disk().
This patch would remove the abused write_data_to_disk() calls, and
convert the only three valid call sites to write_data_to_disk() instead.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch syncs file-item.h into btrfs-progs. This carries with it an
API change for btrfs_del_csums, which takes a root argument in the
kernel, so all callsites have been updated accordingly.
I didn't sync file-item.c because it carries with it a bunch of bio
related helpers which are difficult to adapt to the kernel.
Additionally there's a few helpers in the local copy of file-item.c that
aren't in the kernel that are required for different tools.
This requires more cleanups in both the kernel and progs in order to
sync file-item.c, so for now just do file-item.h in order to pull things
out of ctree.h.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While syncing messages.[ch] I had to back out the ASSERT() code in
kerncompat.h, which means we now rely on the kernel code for ASSERT().
In order to maintain some semblance of separation introduce UASSERT()
and use that in all the purely userspace code.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Prepare a single location that will detect or set accelerated versions
of hash algorithms. Right now it's the crc32c, blake2 and sha256 do
an if-else switch while crc32c sets a function pointer.
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
Even with chunk_objectid bug fixed, mkfs.btrfs can still caused stack
overflow when enabling extent-tree-v2 feature (need experimental
features enabled):
# ./mkfs.btrfs -f -O extent-tree-v2 ~/test.img
btrfs-progs v5.19.1
See http://btrfs.wiki.kernel.org for more information.
ERROR: superblock magic doesn't match
NOTE: several default settings have changed in version 5.15, please make sure
this does not affect your deployments:
- DUP for metadata (-m dup)
- enabled no-holes (-O no-holes)
- enabled free-space-tree (-R free-space-tree)
Label: (null)
UUID: 205c61e7-f58e-4e8f-9dc2-38724f5c554b
Node size: 16384
Sector size: 4096
Filesystem size: 512.00MiB
Block group profiles:
Data: single 8.00MiB
Metadata: DUP 32.00MiB
System: DUP 8.00MiB
SSD detected: no
Zoned device: no
=================================================================
[... Skip full ASAN output ...]
==65655==ABORTING
[CAUSE]
For experimental build, we have unified feature output, but the old
buffer size is only 64 bytes, which is too small to cover the new full
feature string:
extref, skinny-metadata, no-holes, free-space-tree, block-group-tree, extent-tree-v2
Above feature string is already 84 bytes, over the 64 on-stack memory
size.
This can also be proved by the ASAN output:
==65655==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffc4e03b1d0 at pc 0x7ff0fc05fafe bp 0x7ffc4e03ac60 sp 0x7ffc4e03a408
WRITE of size 17 at 0x7ffc4e03b1d0 thread T0
#0 0x7ff0fc05fafd in __interceptor_strcat /usr/src/debug/gcc/libsanitizer/asan/asan_interceptors.cpp:377
#1 0x55cdb7b06ca5 in parse_features_to_string common/fsfeatures.c:316
#2 0x55cdb7b06ce1 in btrfs_parse_fs_features_to_string common/fsfeatures.c:324
#3 0x55cdb7a37226 in main mkfs/main.c:1783
#4 0x7ff0fbe3c28f (/usr/lib/libc.so.6+0x2328f)
#5 0x7ff0fbe3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
#6 0x55cdb7a2cb34 in _start ../sysdeps/x86_64/start.S:115
[FIX]
Introduce a new macro, BTRFS_FEATURE_STRING_BUF_SIZE, along with a new
sanity check helper, btrfs_assert_feature_buf_size().
The problem is I can not find a build time method to verify
BTRFS_FEATURE_STRING_BUF_SIZE is large enough to contain all feature
names, thus have to go the runtime function to do the BUG_ON() to verify
the macro size.
Now the minimal buffer size for experimental build is 138 bytes, just
bump it to 160 for future expansion.
And if further features go beyond that number, mkfs.btrfs/btrfs-convert
will immediately crash at that BUG_ON(), so we can definitely detect it.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Tested-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
Commit "btrfs-progs: prepare merging compat feature lists" tries to
merged "-O" and "-R" options, as they don't correctly represents
btrfs features.
But that commit caused the following bug during mkfs for experimental
build:
$ mkfs.btrfs -f -O block-group-tree /dev/nvme0n1
btrfs-progs v5.19.1
See http://btrfs.wiki.kernel.org for more information.
ERROR: superblock magic doesn't match
ERROR: illegal nodesize 16384 (not equal to 4096 for mixed block group)
[CAUSE]
Currently btrfs_parse_fs_features() will return a u64, and reuse the
same u64 for both incompat and compat RO flags for experimental branch.
This can easily leads to conflicts, as
BTRFS_FEATURE_INCOMPAT_MIXED_BLOCK_GROUP and
BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE both share the same bit
(1 << 2).
Thus for above case, mkfs.btrfs believe it has set MIXED_BLOCK_GROUP
feature, but what we really want is BLOCK_GROUP_TREE.
[FIX]
Instead of incorrectly re-using the same bits in btrfs_feature, split
the old flags into 3 flags:
- incompat_flag
- compat_ro_flag
- runtime_flag
The first two flags are easy to understand, the corresponding flag of
each feature.
The last runtime_flag is to compensate features which doesn't have any
on-disk flag set, like QUOTA and LIST_ALL.
And since we're no longer using a single u64 as features, we have to
introduce a new structure, btrfs_mkfs_features, to contain above 3
flags.
This also mean, things like default mkfs features must be converted to
use the new structure, thus those old macros are all converted to
const static structures:
- BTRFS_MKFS_DEFAULT_FEATURES + BTRFS_MKFS_DEFAULT_RUNTIME_FEATURES
-> btrfs_mkfs_default_features
- BTRFS_CONVERT_ALLOWED_FEATURES -> btrfs_convert_allowed_features
And since we're using a structure, it's not longer as easy to implement
a disallowed mask.
Thus functions with @mask_disallowed are all changed to using
an @allowed structure pointer (which can be NULL).
Finally if we have experimental features enabled, all features can be
specified by -O options, and we can output a unified feature list,
instead of the old split ones.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are cases where the BUG_ON should be replaced by error
handling as it's validating the data from the source filesystem or
possibility to convert. The unconverted cases are asserts and will be
replaced later.
Signed-off-by: David Sterba <dsterba@suse.com>
The (unsigned long long) type casts can be dropped, printf understands
%llu and u64 and does not warn. In cases where the type is not u64 keep
the cast.
Signed-off-by: David Sterba <dsterba@suse.com>
The logic at the beginning of this function to handle reserved ranges
was pretty complex and hard to follow. By refactoring it to use the
existing intersect_with_reserved() function, we can remove most of the
comparisons and boolean operators while preserving the exact same logic.
This change is only for readability. It does not change the logic itself
at all.
Author: Thomas Hebb <tommyhebb@gmail.com>
Pull-request: #494
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When checking if the requested range starts in a valid region but later
hits a reserved range, we require the reserved range to end before the
requested one does.
This is incorrect. Since we're going to truncate the requested range
anyway, we want this check to pass even if the requested range ends
partway through a reserved range.
Fix the issue by checking against the reserved range's start address
instead of its end.
Luckily, I don't believe this bug makes a difference in the current code
path, since the range we pass to this function never ends before the end
of the filesystem.
Issue: #297
Issue: #349
Author: Thomas Hebb <tommyhebb@gmail.com>
Pull-request: #494
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The preferred order:
- system headers
- standard headers
- libraries
- kernel library
- kernel shared
- common headers
- other tools
- own headers
Signed-off-by: David Sterba <dsterba@suse.com>
On large (blockcount > 32bit) filesystems reading directly
super_block->s_blocks_count is not sufficient as the block count is held
in 2 separate 32 bit variables. Instead always use the provided
ext2fs_blocks_count to read the value. This can result in assertion
failure, when the block count is only held in the high 32 bits, in this
case s_block_counts would be zero, which would result in
btrfs_convert_context::block_count/total_bytes to also be 0 and hit an
assertion failure:
convert/main.c:1162: do_convert: Assertion `cctx.total_bytes != 0` failed, value 0
btrfs-convert(+0xffb0)[0x557defdabfb0]
btrfs-convert(main+0x6c5)[0x557defdaa125]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea)[0x7f66e1f8bd0a]
btrfs-convert(_start+0x2a)[0x557defdab52a]
Aborted
What's worse it can also result in btrfs-convert mistakenly thinking
that a filesystem is smaller than it actually is (ignoring the top 32 bits).
Link: https://lore.kernel.org/linux-btrfs/023b5ca9-0610-231b-fc4e-a72fe1377a5a@jansson.tech/
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add constant for initial value to avoid unexpected clashes with user
defined getopt values and shift the common size getopt values.
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When running some tests, I notice that my debug build of btrfs-convert
is throwing out garbage for target fs label:
$ ./btrfs-convert ~/test.img
btrfs-convert from btrfs-progs v5.17
Source filesystem:
Type: ext2
Label:
Blocksize: 4096
UUID: 29d159a8-cb46-41d3-8089-3c5c65e4afae
Target filesystem:
Label: @pcwU <<< Garbage here
Blocksize: 4096
Nodesize: 16384
UUID: 682bf5f2-8cb1-4390-b9ac-6883cd87ed39
Checksum: crc32c
...
[CAUSE]
The fslabel[] array is just not initialized, thus it can contain
garbage.
[FIX]
Initialize fslabel[] array to all zero.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The mkfs_config can hold the BTRFS_LEAF_DATA_SIZE, so calculate this at
config creation time and then use that value throughout convert instead
of calling __BTRFS_LEAF_DATA_SIZE.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When we switch to multiple global trees we'll need to access the
appropriate extent root depending on the block group or possibly root.
To handle this, use a helper in most places and then the actual root in
places where it is required. We will whittle down the direct accessors
with future patches, but this does the bulk of the preparatory work.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have this helper sitting in extent-tree.c, but it's a repair
function. I'm going to need to make changes to this for extent-tree-v2
and would rather this live outside of the code we need to share with the
kernel.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We are going to need to start looking up the csum root based on the
bytenr with extent tree v2. To that end stop passing the root to the
csum related functions so that can be done in the helper functions
themselves.
There's an unrelated deletion of a function prototype that no longer
exists.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are a lot of call sites where we use the following code snippet:
u8 super_block_data[BTRFS_SUPER_INFO_SIZE];
struct btrfs_super_block *sb;
u64 ret;
sb = (struct btrfs_super_block *)super_block_data;
The reason for this is, structure btrfs_super_block was smaller than
BTRFS_SUPER_INFO_SIZE.
Thus for anything with csum involved, we have to use a proper 4K buffer.
Since the recent unification of sizeof(struct btrfs_super_block), we no
longer need such workaround, and can use struct btrfs_super_block
directly to do any operation.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Several extent_buffer initializations miss fs_info initialization. This
is OK before the following patch ("btrfs-progs: use direct-io for zoned
device") as eb->fs_info is not always necessary. But, after that patch,
we will use fs_info to determine it is zoned or not and that causes
segfault in such cases.
Properly set fs_info when initializing extent_buffers to fix the issue.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Relax the condition about a unique uuid for convert, only print a
warning. In case we copy the uuid, it's expected that at the time the
conversion starts the uuid is not unique as it sill exists on the source
filesystem.
In case user sets the uuid manually but it's still the same one as on
the source filesystem we should also allow that, so it warns in this
case as well.
Update the test so it creates a block device where the uuid would be
also cached by blkid and lets the non-unique check succeed.
Issue: #404
Signed-off-by: David Sterba <dsterba@suse.com>
There are various parsing helpers scattered everywhere, unify them to
one file and start with helpers already in utils.c.
Signed-off-by: David Sterba <dsterba@suse.com>
Add new option --uuid to convert with the following modes:
- 'copy' -- copy the UUID from the source filesystem
- 'new' -- (default) generate new UUID
- UUID -- a valid UUID that will be set on btrfs
Based on patch from Florian
https://lore.kernel.org/linux-btrfs/1357486331-4615-2-git-send-email-falbrechtskirchinger@gmail.com/
and ported to contemporary codebase.
Issue: #391
Signed-off-by: David Sterba <dsterba@suse.com>
There's a group of functions that are related to opening filesystem in
various modes, this can be moved to a separate file.
Signed-off-by: David Sterba <dsterba@suse.com>
The libmount dependency has been added in commit 61ecaff036
("btrfs-progs: build: add libmount dependency"), and static build got
broken. There are functions that do basically the same thing and also
share the name, which in turn fails at link time.
ld: /../lib64/libmount.a(libcommon_la-canonicalize.o): in function `canonicalize_dm_name':
util-linux-2.34/lib/canonicalize.c:58: multiple definition of `canonicalize_dm_name';
common/path-utils.static.o:btrfs-progs/common/path-utils.c:286: first defined here
In case the collision can be resolved by renaming, it's done
(canonicalize_path and parse_size). There are 2 symbols from selinux
that are substituted by a weak aliases during the static build.
There's one new warning due to use of getgrnam_r in libmount that
depends on dynamic linking and may not work properly with static build.
We're not using the related functions directly or indirectly, so it
should be safe to ignore the warnings.
ld: ../lib64/libmount.a(la-utils.o): in function `mnt_get_gid':
util-linux-2.34/libmount/src/utils.c:625: warning: Using 'getgrnam_r' in statically linked applications
+requires at runtime the shared libraries from the glibc version used for linking
Issue: #333
Signed-off-by: David Sterba <dsterba@suse.com>
Now if an ENOSPC error happened, the free space report would help user
to determine if it's a real ENOSPC or a bug in convert.
The reported free space is the calculated free space, which doesn't
include super block space, nor merged data chunks.
The free space is always smaller than the reported available space of
the original fs, as we need extra padding space for used space to avoid
too fragmented data chunks.
The output would be:
$ ./btrfs-convert /dev/sda
create btrfs filesystem:
blocksize: 4096
nodesize: 16384
features: extref, skinny-metadata (default)
checksum: crc32c
free space report:
total: 10737418240
free: 0 (0.00%)
ERROR: unable to create initial ctree: No space left on device
WARNING: an error occurred during conversion, the original filesystem is not modified
Signed-off-by: Qu Wenruo <wqu@suse.com>
[ put total, free to separate lines ]
Signed-off-by: David Sterba <dsterba@suse.com>
The original fs is not touched until we migrate the super blocks.
Under most error cases, we fail before that thus the original fs is
still safe.
So change the error message according the stages we failed to reflect
that.
Signed-off-by: Qu Wenruo <wqu@suse.com>
[ adjust wording of messages ]
Signed-off-by: David Sterba <dsterba@suse.com>