Calculate average fanout between levels:
Levels: 4
Total nodes: 289048
On level 0: 288054
On level 1: 989 (avg fanout 291)
On level 2: 4 (avg fanout 247)
On level 3: 1 (avg fanout 4)
Signed-off-by: David Sterba <dsterba@suse.com>
The new test case will test whether btrfs-convert can handle 64G ext*
fs.
This exercise the cctx->total_bytes calculation where multiplying 4K
(unsigned int) and 16777216 (u32) could lead to bit overflow for
unsigned int and got 0, and screw up later free space calculation.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Commit "btrfs-progs: convert: prevent 32bit overflow for
cctx->total_bytes" added an assert to ensure that cctxx.total_bytes did
not overflow, but this ASSERT calls assert_trace, which expects a long
value.
By converting the u64 to long overflows in a 32bit machine, leading the
assert_trace to be triggered since cctx.total_bytes turns to zero.
Fix this problem by comparing the cctx.total_bytes with zero when
calling ASSERT.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When convert is called on a 64GiB ext4 fs, it fails like this:
$ btrfs-convert /dev/loop0p1
create btrfs filesystem:
blocksize: 4096
nodesize: 16384
features: extref, skinny-metadata (default)
checksum: crc32c
creating ext2 image file
ERROR: missing data block for bytenr 1048576
ERROR: failed to create ext2_saved/image: -2
WARNING: an error occurred during conversion, filesystem is partially created but not finalized and not mountable
Btrfs-convert also corrupts the source fs:
$ LANG=C e2fsck /dev/loop0p1 -f
e2fsck 1.45.6 (20-Mar-2020)
Resize inode not valid. Recreate<y>? yes
Pass 1: Checking inodes, blocks, and sizes
Deleted inode 3681 has zero dtime. Fix<y>? yes
Inodes that were part of a corrupted orphan linked list found. Fix<y>? yes
Inode 3744 was part of the orphaned inode list. FIXED.
Deleted inode 3745 has zero dtime. Fix<y>? yes
Inode 3747 has INLINE_DATA_FL flag on filesystem without inline data support.
Clear<y>? yes
...
[CAUSE]
After some debugging, the first strange behavior is, the value of
cctx->total_bytes is 0 in ext2_open_fs().
It turns out that, the value assign for cctx->total_bytes could lead to
bit overflow for the unsigned int value.
And that 0 cctx->total_bytes leads to various problems for later free
space calculation.
For example, in calculate_available_space(), we use cctx->total_bytes to
ensure we won't create a data chunk beyond device end:
cue_len = min(cctx->total_bytes - cur_off, cur_len);
If that cur_offset is also 0, we will create a cache_extent with 0 size,
which could cause a lot of problems for cache tree search.
[FIX]
Do manual casting for the multiply operation, so we could got a real u64
result. The fix will be applied to all supported fses (ext* and
reiserfs).
Reported-by: Christian Zangl <coralllama@gmail.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Kernel patch ff51bf02d107 ("btrfs: block-group: fix free-space bitmap
threshold") is needed to exercise both branches of the test, this can be
detected by lack of the objectid and offset parsed from the dump.
Issue: #288
Signed-off-by: David Sterba <dsterba@suse.com>
For workarounds or known missing support add helper to print a
notification about that, when _not_run is not suitable.
Signed-off-by: David Sterba <dsterba@suse.com>
- add missing raid1c34 profiles to the list of supported profiles
- document defaults change in 5.8
- update wording
Signed-off-by: David Sterba <dsterba@suse.com>
The single profile is better suited as default for data on multiple
devices. Switch from RAID0 because:
- it's easier to convert to other profiles, as single consumes some
chunks per device, but RAID0 has chunks on all devices regardless of
the used space
- RAID0 has no redundancy and compared one disk failure affects many
files due to striping, while with single the chances are a bit higher
that complete files are stored on one device
- when the device sizes are not equal and not even close to equal, the
maximum achievable size with RAID0 is size of the smallest device due
to striping, with single it's the sum of all device sizes
The changed defaults could affect scripts and deployments that rely on
the old values, but given the number of possible profiles for multiple
devices let's hope that they're specified explicitly in majority of
cases.
Issue: #270
Signed-off-by: David Sterba <dsterba@suse.com>
Extract the defaults for data and metadata profiles to a header and
use the symbolic names instead of hardcoding the profiles.
Signed-off-by: David Sterba <dsterba@suse.com>
The node/leaf stats have been calculated but never displayed. Moreover,
a more detailed information about counts on each level can be useful,
add it to the output of tree-stats.
Example output:
Levels: 3
Total nodes: 25692
On level 0: 25601
On level 1: 90
On level 2: 1
Issue: #266
Signed-off-by: David Sterba <dsterba@suse.com>
The options have been deprecated for a long time, the kernel mount
options are being removed too (in 5.9).
Signed-off-by: David Sterba <dsterba@suse.com>
The option -A was used long time ago for debugging and marked as
obsolete since 4.14.1. Remove the option and set the alloc start to the
default value 1MiB.
Signed-off-by: David Sterba <dsterba@suse.com>
There's still some interest in the btrfs-debugfs tool, make it work with
python v3 until we have a replacement.
Issue: #261
Signed-off-by: Lakshmipathi <lakshmipathi.ganapathi@collabora.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Many subcommands have their own verbosity options that are being
superseded by the global options. Update the help text to reflect that
where applicable.
Signed-off-by: David Sterba <dsterba@suse.com>
Add a test case to check if the converted fs has device extent beyond
boundary.
The disk layout of source ext4 fs needs some extents to make them
allocated at the very end of the fs. The script is from the original
reporter.
Also, since the existing convert tests always uses 512M as device size,
which is not suitable for this test case, make it to grab the existing
device size to co-operate with this test case.
Reported-by: Jiachen YANG <farseerfc@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
The following script could lead to corrupted btrfs fs after
btrfs-convert:
fallocate -l 1G test.img
mkfs.ext4 test.img
mount test.img $mnt
fallocate -l 200m $mnt/file1
fallocate -l 200m $mnt/file2
fallocate -l 200m $mnt/file3
fallocate -l 200m $mnt/file4
fallocate -l 205m $mnt/file1
fallocate -l 205m $mnt/file2
fallocate -l 205m $mnt/file3
fallocate -l 205m $mnt/file4
umount $mnt
btrfs-convert test.img
The result btrfs will have a device extent beyond its boundary:
pening filesystem to check...
Checking filesystem on test.img
UUID: bbcd7399-fd5b-41a7-81ae-d48bc6935e43
[1/7] checking root items
[2/7] checking extents
ERROR: dev extent devid 1 physical offset 993198080 len 85786624 is beyond device boundary 1073741824
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 913960960 bytes used, error(s) found
total csum bytes: 891500
total tree bytes: 1064960
total fs tree bytes: 49152
total extent tree bytes: 16384
btree space waste bytes: 144885
file data blocks allocated: 2129063936
referenced 1772728320
[CAUSE]
Btrfs-convert first collect all used blocks in the original fs, then
slightly enlarge the used blocks range as new btrfs data chunks.
However the enlarge part has a problem, that it doesn't take the device
boundary into consideration.
Thus it caused device extents and data chunks to go beyond device
boundary.
[FIX]
Just to extra check before inserting data chunks into
btrfs_convert_context::data_chunk.
Reported-by: Jiachen YANG <farseerfc@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[WARNING]
When compiling btrfs-progs, the following warning pops up:
In file included from /usr/include/stdio.h:867,
from ./kerncompat.h:22,
from common/fsfeatures.c:17:
In function 'printf',
inlined from 'process_features' at common/fsfeatures.c:192:4,
inlined from 'btrfs_process_runtime_features' at common/fsfeatures.c:205:2:
/usr/include/bits/stdio2.h:107:10: warning: '%s' directive argument is null [-Wformat-overflow=]
107 | return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ());
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This only occur with default make parameters. If compiling with D=1, the
warning just disappears.
The involved tool chain is:
- GCC 10.1.0
[CAUSE]
The offending code is:
static void process_features(u64 flags, enum feature_source source)
{
...
if (flags & feat->flag) {
printf("Turning ON incompat feature '%s': %s\n",
feat->name, feat->desc);
}
...
}
Currently, there is no runtime/fs feature without a name nor
description. So we shouldn't hit a feature with NULL as name nor
description.
This looks like a bug in GCC though.
[WORKAROUND]
However can workaround it by doing an explicit check on feat->name and
feat->desc to teach GCC not to do a wrong warning.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[WARNING]
When compiling btrfs-progs, there is one warning from convert ext2 code:
convert/source-ext2.c: In function 'ext2_open_fs':
convert/source-ext2.c:91:44: warning: pointer targets in passing argument 1 of 'strndup' differ in signedness [-Wpointer-sign]
91 | cctx->volume_name = strndup(ext2_fs->super->s_volume_name, 16);
| ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
| |
| __u8 * {aka unsigned char *}
In file included from ./kerncompat.h:25,
from convert/source-ext2.c:19:
/usr/include/string.h:175:35: note: expected 'const char *' but argument is of type '__u8 *' {aka 'unsigned char *'}
175 | extern char *strndup (const char *__string, size_t __n)
| ~~~~~~~~~~~~^~~~~~~~
The toolchain involved is:
- GCC 10.1.0
- e2fsprogs 1.45.6
[CAUSE]
Obviously, in the offending e2fsprogs, the volume label is using u8,
which is unsigned char, not char.
/*078*/ __u8 s_volume_name[EXT2_LABEL_LEN]; /* volume name, no NUL? */
[FIX]
Just do a forced conversion to suppress the warning is enough.
I don't think we need to apply -Wnopointer-sign yet.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For SINGLE and DUP RAID profiles we can get the num_stripes values from
btrfs_raid_attr::dev:stripes. For all other RAID profiles the value is
calculated anyways.
As this was the last remaining member of the btrfs_raid_profile_table we
can remove it as well.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The max_stripes value of btrfs_raid_profile_table is unused, so we can
just remove it as well.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Both btrfs_raid_attr and btrfs_raid_profile define the minimal number of
stripes for each raid profile.
The difference is in btrfs_raid_attr the number of stripes is called
devs_min and in btrfs_raid_profile its called min_stripes.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Both btrfs_raid_attr and btrfs_raid_profile define the number of
sub_stripes a raid type has.
Use the one from btrfs_raid_attr and get rid of the field in
btrfs_raid_profile.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a test case to ensure we can create a 350M fs with 128M rootdir.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a bug report that using DUP data profile, with a 400MiB source
dir, on a 7G disk leads to mkfs failure caused by ENOSPC.
[CAUSE]
After some debugging, it turns out that do_chunk_alloc() is always
passing SINGLE profile for new chunks.
The offending code looks like: extent-tree.c:: do_chunk_alloc()
ret = btrfs_alloc_chunk(trans, fs_info, &start, &num_bytes,
space_info->flags);
However since commit bce7dbba2859 ("Btrfs-progs: only build space info's
for the main flags"), we no longer store the profile bits in space_info
anymore.
This makes space_info never get updated properly, and causing us to
creating more and more chunks to eat up most of the disk with unused
SINGLE chunks, and finally leads to ENOSPC.
[FIX]
Fix the bug by passing the proper flags to btrfs_alloc_chunk().
Also, to address the original problem commit 2689259501c1 ("btrfs progs:
fix extra metadata chunk allocation in --mixed case") tries to fix, here
we do extra bit OR to ensure we get the proper flags.
Issue: #258
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Based on user questions, the word 'rewrite' may sound confusing as the
defragmetation also rewrites data but break extent sharing, while
balance does not do that.
Signed-off-by: David Sterba <dsterba@suse.com>
Enable the quiet option to the scrub cancel command.
Does the job quietly. For example:
$ btrfs -q scrub cancel <mnt>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Enable the quiet option to the subvolume snapshot command.
Does the job quietly. For example:
$ btrfs -q subvolume snapshot <src> <dest>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Enable the quiet option to the balance resume command.
Does the job quietly. For example:
$ btrfs -q balance resume <path>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Enable the quiet option to the balance start command.
Does the job quietly. For example:
$ btrfs -q balance start --full-balance <path>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Enable the quiet option to the subvolume delete command.
Does the job quietly. For example:
$ btrfs --quiet subvolume delete <path>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Enable the quiet option to the subvolume create command.
Does the job quietly. For example:
$ btrfs --quiet subvolume create <path>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Enable the quiet option to the quota rescan command.
Does the job quietly. For example:
$ btrfs --quiet quota rescan
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For backward compatibility with tools that may rely on the messages we
need a special level to print the message unless the verbosity settings
haven't been set on command line.
Signed-off-by: David Sterba <dsterba@suse.com>
All striping RAID Levels use the same method to set the number of RAID
stripes, so consolidate it.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since it's addition in 2009 BTRFS RAID5/6 uses a constant stripe length
and we're lacking the meta-data to define a per stripe length, so it's
unlikely to change in the near future for RAID5/6.
So let's not pretend something we don't do and remove the RAID5/6 stripe
length special casing.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Factor out setting of alloc_chuk_ctl fileds in a separate function
init_alloc_chunk_ctl.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that most of the RAID profile dependent chunk allocation parameters
have been converted to table lookus and moved out of the if-statement
maze, all that remains is the actual calculation of the number of stripes.
Compact the 5 if statements into a single switch statemnt to make the code
a bit more compact and more intuitive to follow.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For all RAID profiles in btrfs_alloc_chunk() we're doing a sanity check if
the number of stripes is smaller than the minimal number of stripes
needed for this profile.
Consolidate this per-profile check to a single check after assigning the
number of stripes to further reduce code.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For the two simple "RAID" profiles single and dup, we can simply fallback
to a table lookup to get num_stripes.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
All 3 different RAID1 profiles use the same calculation mehod for the
number of used stripes.
Now that we do table lookups fo rmost of the allocation control
parameters, we can consolidate all 3 RAID1 profiles into a single branch
for the calculation.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that we have a table holding the sub_stripes value we can consolidate
all setting of alloc_chunk_ctl::sub_stripes to a signle table lookup.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that we have a table holding the min_stripes value we can consolidate
all setting of alloc_chunk_ctl::min_stripes to a signle table lookup.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>