[BUG]
When running btrfs/011 with subpage case, even with RAID56 support, it
still fails with the following error:
QA output created by 011
*** test btrfs replace
mkfs failed
(see /home/adam/xfstests-dev/results//btrfs/011.full for details)
The full log shows:
---------workout "-m single -d single -M" 1 no 64-----------
ERROR: illegal nodesize 65536 (not equal to 4096 for mixed block group)
mkfs failed
This is a critical error, making test case to be aborted, without
checking the rest profiles.
[CAUSE]
Mkfs.btrfs always uses the maximum value between sectorsize and page
size for its mixed profile nodesize.
For subpage case, it means we always go PAGE_SIZE, no matter whatever
the sectorsize is passed in.
[FIX]
Just get rid of the direct PAGE_SIZE usage when determining nodesize for
mixed profiles.
And use sectorsize directly (either passed in by the user, or
determined from page size).
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently, we create the initial system block group in the zone 2. That
will create the BG at 64MB when the zone size is 32 MB, which collides with
the regular superblock location. It results in mount failure with:
BTRFS info (device nullb0): zoned mode enabled with zone size 33554432
BTRFS error (device nullb0): zoned: block group 67108864 must not contain super block
BTRFS error (device nullb0): failed to read block groups: -117
BTRFS error (device nullb0): open_ctree failed
Fix that by calculating the proper location of the initial system BG. It
avoids using zones reserved for zoned superblock logging and the zones
where a regular superblock resides.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that we have all of the supporting code, add the ability to create
all of the global roots for an extent tree v2 fs. This will default to
nr_cpu's, but also allow the user to specify how many global roots they
would like.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Our initial block group will use global root id 0 with extent tree v2,
so adjust the helper to take the chunk_objectid as an argument, as we'll
set this to 0 for extent tree v2 and then
BTRFS_FIRST_CHUNK_TREE_OBJECTID for extent tree v1.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We're going to start create global roots from mkfs, and we need to have
a offset set for the root key. Make the btrfs_create_tree() take a key
for the root_key instead of just the objectid so we can setup these new
style roots properly.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In order to make sure the file system is consistent we need to record
the number of global roots we should have in the super block. We could
infer this from the number of global roots we find, however this could
lead to interesting fuzzing problems, so add a source of truth to the
super block in order to make it easier to verify the file system is
consistent.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add the extent tree v2 table with the block group tree as a root, and
then create the empty root and use the proper root for cleanup up the
temporary block groups.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Instead of accessing the extent root directory for modifying block
groups, use the helper which will do the correct thing based on the
flags of the file system.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that all callers are using the _nr variations we can simply rename
these helpers to btrfs_item_##member/btrfs_set_item_##member and change
the actual item SETGET funcs to raw_item_##member/set_raw_item_##member
and then change all callers to drop the _nr part.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have a lot of the following patterns
item = btrfs_item_nr(nr);
btrfs_set_item_*(eb, item, val);
btrfs_set_item_*(eb, btrfs_item_nr(nr), val);
in a lot of places in our code. Instead add _nr variations of these
helpers and convert all of the users to this new helper.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This helper only takes the nodesize, but in the future it'll take a bool
to indicate if we're extent tree v2. The remaining users are all where
we only have extent_buffer, but we should always have a valid
eb->fs_info in these cases, so add BUG_ON()'s for the !eb->fs_info case
and then convert these callers to use BTRFS_LEAF_DATA_SIZE which takes
the fs_info.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We use __BTRFS_LEAF_DATA_SIZE() in a few places for mkfs. With extent
tree v2 we'll be increasing the size of btrfs_header, so it'll be kind
of annoying to add flags to all callers of __BTRFS_LEAF_DATA_SIZE, so
simply calculate it once and put it in the mkfs_config and use that.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pass BTRFS_BLOCK_GROUP_DATA and BTRFS_BLOCK_GROUP_METADATA to
zoned_profile_supported(), so we can actually distinguish if it is a data
or a meta-data block group.
Fixes: 8f914d518a46 ("btrfs-progs: zoned support DUP on metadata block groups")
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently we have two places checking if a block-group profile is
supported on a zoned device, one in mkfs/main.c and one in
kernel-shared/zoned.c.
Use the one from kernel-shared/zoned.c in mkfs as well, unifying all
checks.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
cfg->fs_uuid is either 0 or set to the value of the -U parameter
passed to mkfs.btrfs. However the value of the latter is already being
validated in the main mkfs function. Just remove the duplicated checks
in make_btrfs as they effectively can never be executed.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We want to enable developers to test the extent tree v2 features as they
are added, add the ability to mkfs an extent tree v2 fs if we have
experimental enabled.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We are going to have multiples of these trees with extent tree v2, so
add a rb tree to track them based on their root key value. This works
for both v1 and v2, so we can remove the direct pointers to these roots
in our fs_info.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We're going to have multiple free space roots in the future, so access
it via a helper in most cases. We will address the remaining direct
accesses in future patches.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When we switch to multiple global trees we'll need to access the
appropriate extent root depending on the block group or possibly root.
To handle this, use a helper in most places and then the actual root in
places where it is required. We will whittle down the direct accessors
with future patches, but this does the bulk of the preparatory work.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
With extent tree v2 we will have per-block group checksums, so add a
helper to access the csum root and rename the fs_info csum_root to
_csum_root to catch all the places that are accessing it directly.
Convert everybody to use the helper except for internal things.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We are going to need to start looking up the csum root based on the
bytenr with extent tree v2. To that end stop passing the root to the
csum related functions so that can be done in the helper functions
themselves.
There's an unrelated deletion of a function prototype that no longer
exists.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are a lot of call sites where we use the following code snippet:
u8 super_block_data[BTRFS_SUPER_INFO_SIZE];
struct btrfs_super_block *sb;
u64 ret;
sb = (struct btrfs_super_block *)super_block_data;
The reason for this is, structure btrfs_super_block was smaller than
BTRFS_SUPER_INFO_SIZE.
Thus for anything with csum involved, we have to use a proper 4K buffer.
Since the recent unification of sizeof(struct btrfs_super_block), we no
longer need such workaround, and can use struct btrfs_super_block
directly to do any operation.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In commit a138daac17 ("btrfs-progs: mkfs: set super_cache_generation
to 0 if we're using free space tree") the space cache (v1) generation
was reset to 0 to let kernel know it's not used when the free-space-tree
is enabled.
This got broken again in 5.14 when the free space tree code got
refactored in 4b6cf2a3eb ("btrfs-progs: mkfs: generate free space tree
at make_btrfs() time").
Reset the space cache generation to 0.
Issue: #414
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
Since btrfs-progs v5.14, mkfs.btrfs no longer cleans up the temporary
SINGLE metadata chunks if "-R free-space-tree" is specified:
$ mkfs.btrfs -f -R free-space-tree -m dup -d dup /dev/test/test
$ btrfs ins dump-tree -t chunk /dev/test/test | grep "type METADATA"
length 8388608 owner 2 stripe_len 65536 type METADATA
length 268435456 owner 2 stripe_len 65536 type METADATA|DUP
[CAUSE]
Since commit 4b6cf2a3eb ("btrfs-progs: mkfs: generate free space tree
at make_btrfs() time"), free space tree is created when the temporary
btrfs image is created.
This behavior itself has no problem at all. The problem happens when
"-m DUP -d DUP" (or other profiles) is specified.
This makes btrfs to create extra chunks, enlarging free space tree so
that it can be as high as level 1.
During mkfs, we rely on recow_roots() to re-COW all tree blocks to the
newly allocated chunks.
But __recow_root() can only handle tree root at level 0, as it forces
root node to be COWed, not bothering the children leaves/nodes.
This makes part of the free space cache tree still live on the old
temporary chunks, leaving later cleanup_temp_chunks() unable to delete
temporary SINGLE chunks.
[FIX]
Rework __recow_root() to do a proper COW of the whole tree.
But above rework is not enough, as if a free space tree block is
allocated during current transaction, but before new chunks added.
Then the reworked __recow_root() can't COW it, as btrfs_search_slot()
won't COW a tree block allocated in current transaction.
So this patch will also commit current transaction before calling
recow_roots(), to force us to re-cow all tree blocks.
This shouldn't be a problem, as at the time of calling, we should have
less than a dozen tree blocks, thus there won't be a performance impact.
Reported-by: FireFish5000 <firefish5000@gmail.com>
Fixes: 4b6cf2a3eb ("btrfs-progs: mkfs: generate free space tree at make_btrfs() time")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We need to use direct-IO for zoned devices to preserve the write
ordering. Instead of detecting if the device is zoned or not, we simply
use direct-IO for any kind of device (even if emulated zoned mode on a
regular device).
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Wrap pwrite with btrfs_pwrite(). It simply calls pwrite() on non-zoned
btrfs (opened without O_DIRECT). On zoned mode (opened with O_DIRECT),
it allocates an aligned bounce buffer, copies the contents and uses it
for direct-IO writing.
Writes in device_zero_blocks() and btrfs_wipe_existing_sb() are a little
tricky. We don't have fs_info on our hands, so use zinfo to determine it
is a zoned device or not.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Several extent_buffer initializations miss fs_info initialization. This
is OK before the following patch ("btrfs-progs: use direct-io for zoned
device") as eb->fs_info is not always necessary. But, after that patch,
we will use fs_info to determine it is zoned or not and that causes
segfault in such cases.
Properly set fs_info when initializing extent_buffers to fix the issue.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since zone_size() returns an emulated zone size even for non-zoned
device, we cannot use cfg.zone_size to determine the device is zoned or
not. Set zone_size = 0 on non-zoned mode.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Changing several defaults at once is desirable for easier reference,
rather than a number of scattered releases enabling each. The changes
are documented but printing a notice won't hurt as not everybody reads
the documentation or release notes.
Undesired features can be unselected by prepending ^ to the option name,
like:
$ mkfs.btrfs -O ^no-holes
Signed-off-by: David Sterba <dsterba@suse.com>
The original idea of not doing DUP on SSD was that the duplicate blocks
get deduplicated again by the driver firmware. This was in 2013, years
ago. Then it was speculative and even nowadays we don't have much
reliable information from vendors what optimizations are done on the
drive level.
After the year there's enough information gathered by user community and
there's no simple answer. Expensive drives are more reliable but less
common, for cheap consumer drive it's vice versa. The characteristics
are described in more detail in manual page btrfs(5) in section "SOLID
STATE DRIVES (SSD)".
The reasoning is based on numerous reports on IRC and technical
difficulty on mkfs side to do the right decision. The default is chosen
to be the safe option and up to user to change that based on informed
decision.
Issue: #319
Signed-off-by: David Sterba <dsterba@suse.com>
The free space tree is a better way to track the free space and has been
tested in the wild for a long time. The backward compatibility is
sufficient, several long term kernels. On-line conversion from v1 to v2
can be done by mount, switching from v2 to v1 can be done by 'btrfs
check'.
Issue: #295
Signed-off-by: David Sterba <dsterba@suse.com>
Make the helpers using crc32c not inline so the crc32c.h can be removed
from the public headers exported by libbtrfs.
Signed-off-by: David Sterba <dsterba@suse.com>
The default output of mkfs is intentionally verbose so we did not need
the verbosity option. For some additional information it could be useful
to increase the level in case it's wired to the global verbosity
settings.
Signed-off-by: David Sterba <dsterba@suse.com>
Add the GPL v2 header to files where it was missing and is not from an
external source, update to the most recent version with the address.
Signed-off-by: David Sterba <dsterba@suse.com>
There are various parsing helpers scattered everywhere, unify them to
one file and start with helpers already in utils.c.
Signed-off-by: David Sterba <dsterba@suse.com>
I will have a lot of preparatory patches to reduce the review pain of
this large feature. In order to enable that work define the incompat
flag. Once all of the work lands to support the feature there will be a
patch to actually enable us to select it and manipulate file systems
with that incompat flag set.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
With extent-tree-v2 we won't be able to cache block groups based on the
extent tree, so we need to have a valid free space tree before we open
the temporary file system to finish setting the file system up. Set up
the basic free space entries for our temporary system chunk if we have
the free space tree enabled and stop generating the tree after the fact.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently we build a bare-bones file system in make_btrfs(), and then we
load it up and fill in the rest of the file system after the fact. One
thing we omit in make_btrfs() is the block group item for the temporary
system chunk we allocate, because we just add it after we've opened the
file system.
However I want to be able to generate the free space tree at
make_btrfs() time, because extent tree v2 will not have an extent tree
that has every block allocated in the system. In order to do this I
need to make sure that the free space tree entries are added on block
group creation, which is annoying if we have to add this chunk after
I've created a free space tree.
So make future work simpler by simply adding our block group item at
make_btrfs() time, this way I can do the right things with the free
space tree in the generic make block group code without needing a
special case for our temporary system chunk.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
With extent tree v2 we're going to be writing some more empty trees for
the initial mkfs step, so take this common code and make it a helper.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For the root tree we were just hard setting the nritems to 4, which will
change when we move to extent tree v2. Instead set the nritems after
we've added all the root items we need to the root tree.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We were setting the super block's used bytes to a static number.
However the number of blocks we have to write has the correct used size,
so just add up the total number of blocks we're allocating as we
determine their offsets. This value will be used later which is why I'm
calculating it this way instead of doing the math to set the bytes_super
specifically.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We use these block's in order to keep track of which blocks need to be
added to the extent tree and where their roots need to be written.
However we skip MKFS_SUPER_BLOCK for all of these helpers, and we don't
actually need to keep track of the specific block we allocated because
it is always BTRFS_SUPER_INFO_OFFSET. Remove this enum as we don't need
it.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Allow creating trees more flexible, eg. in no fixed order. To handle
this we want to rework the initial mkfs step to take an array of the
blocks we want to create and use the array to keep track of which blocks
we need to create. Use that for current format, make it ready for extent
tree v2.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>