When we switch to multiple global trees we'll need to access the
appropriate extent root depending on the block group or possibly root.
To handle this, use a helper in most places and then the actual root in
places where it is required. We will whittle down the direct accessors
with future patches, but this does the bulk of the preparatory work.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
With extent tree v2 we will have per-block group checksums, so add a
helper to access the csum root and rename the fs_info csum_root to
_csum_root to catch all the places that are accessing it directly.
Convert everybody to use the helper except for internal things.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
Since btrfs-progs v5.14, mkfs.btrfs no longer cleans up the temporary
SINGLE metadata chunks if "-R free-space-tree" is specified:
$ mkfs.btrfs -f -R free-space-tree -m dup -d dup /dev/test/test
$ btrfs ins dump-tree -t chunk /dev/test/test | grep "type METADATA"
length 8388608 owner 2 stripe_len 65536 type METADATA
length 268435456 owner 2 stripe_len 65536 type METADATA|DUP
[CAUSE]
Since commit 4b6cf2a3eb ("btrfs-progs: mkfs: generate free space tree
at make_btrfs() time"), free space tree is created when the temporary
btrfs image is created.
This behavior itself has no problem at all. The problem happens when
"-m DUP -d DUP" (or other profiles) is specified.
This makes btrfs to create extra chunks, enlarging free space tree so
that it can be as high as level 1.
During mkfs, we rely on recow_roots() to re-COW all tree blocks to the
newly allocated chunks.
But __recow_root() can only handle tree root at level 0, as it forces
root node to be COWed, not bothering the children leaves/nodes.
This makes part of the free space cache tree still live on the old
temporary chunks, leaving later cleanup_temp_chunks() unable to delete
temporary SINGLE chunks.
[FIX]
Rework __recow_root() to do a proper COW of the whole tree.
But above rework is not enough, as if a free space tree block is
allocated during current transaction, but before new chunks added.
Then the reworked __recow_root() can't COW it, as btrfs_search_slot()
won't COW a tree block allocated in current transaction.
So this patch will also commit current transaction before calling
recow_roots(), to force us to re-cow all tree blocks.
This shouldn't be a problem, as at the time of calling, we should have
less than a dozen tree blocks, thus there won't be a performance impact.
Reported-by: FireFish5000 <firefish5000@gmail.com>
Fixes: 4b6cf2a3eb ("btrfs-progs: mkfs: generate free space tree at make_btrfs() time")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We need to use direct-IO for zoned devices to preserve the write
ordering. Instead of detecting if the device is zoned or not, we simply
use direct-IO for any kind of device (even if emulated zoned mode on a
regular device).
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since zone_size() returns an emulated zone size even for non-zoned
device, we cannot use cfg.zone_size to determine the device is zoned or
not. Set zone_size = 0 on non-zoned mode.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Changing several defaults at once is desirable for easier reference,
rather than a number of scattered releases enabling each. The changes
are documented but printing a notice won't hurt as not everybody reads
the documentation or release notes.
Undesired features can be unselected by prepending ^ to the option name,
like:
$ mkfs.btrfs -O ^no-holes
Signed-off-by: David Sterba <dsterba@suse.com>
The original idea of not doing DUP on SSD was that the duplicate blocks
get deduplicated again by the driver firmware. This was in 2013, years
ago. Then it was speculative and even nowadays we don't have much
reliable information from vendors what optimizations are done on the
drive level.
After the year there's enough information gathered by user community and
there's no simple answer. Expensive drives are more reliable but less
common, for cheap consumer drive it's vice versa. The characteristics
are described in more detail in manual page btrfs(5) in section "SOLID
STATE DRIVES (SSD)".
The reasoning is based on numerous reports on IRC and technical
difficulty on mkfs side to do the right decision. The default is chosen
to be the safe option and up to user to change that based on informed
decision.
Issue: #319
Signed-off-by: David Sterba <dsterba@suse.com>
The free space tree is a better way to track the free space and has been
tested in the wild for a long time. The backward compatibility is
sufficient, several long term kernels. On-line conversion from v1 to v2
can be done by mount, switching from v2 to v1 can be done by 'btrfs
check'.
Issue: #295
Signed-off-by: David Sterba <dsterba@suse.com>
Make the helpers using crc32c not inline so the crc32c.h can be removed
from the public headers exported by libbtrfs.
Signed-off-by: David Sterba <dsterba@suse.com>
The default output of mkfs is intentionally verbose so we did not need
the verbosity option. For some additional information it could be useful
to increase the level in case it's wired to the global verbosity
settings.
Signed-off-by: David Sterba <dsterba@suse.com>
There are various parsing helpers scattered everywhere, unify them to
one file and start with helpers already in utils.c.
Signed-off-by: David Sterba <dsterba@suse.com>
I will have a lot of preparatory patches to reduce the review pain of
this large feature. In order to enable that work define the incompat
flag. Once all of the work lands to support the feature there will be a
patch to actually enable us to select it and manipulate file systems
with that incompat flag set.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
With extent-tree-v2 we won't be able to cache block groups based on the
extent tree, so we need to have a valid free space tree before we open
the temporary file system to finish setting the file system up. Set up
the basic free space entries for our temporary system chunk if we have
the free space tree enabled and stop generating the tree after the fact.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently we build a bare-bones file system in make_btrfs(), and then we
load it up and fill in the rest of the file system after the fact. One
thing we omit in make_btrfs() is the block group item for the temporary
system chunk we allocate, because we just add it after we've opened the
file system.
However I want to be able to generate the free space tree at
make_btrfs() time, because extent tree v2 will not have an extent tree
that has every block allocated in the system. In order to do this I
need to make sure that the free space tree entries are added on block
group creation, which is annoying if we have to add this chunk after
I've created a free space tree.
So make future work simpler by simply adding our block group item at
make_btrfs() time, this way I can do the right things with the free
space tree in the generic make block group code without needing a
special case for our temporary system chunk.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The defaults for rotational devices are to enable DUP for metadata, this
does not yet work on zoned devices and fails with messages like:
Zoned: /dev/sda: host-managed device detected, setting zoned feature
ERROR: cannot use RAID/DUP profile in zoned mode
The RAID/DUP support will be implemented in the future and we don't want
to change the defaults to revert them back again. This makes it a bit
awkward for the user until this happens, so at least print a hint what
to do that single/single must be set manually.
Link: https://lore.kernel.org/linux-btrfs/20210706091922.38650-1-johannes.thumshirn@wdc.com/
Reported-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In order to create a usable zoned filesystem a minimum of 5 zones is
needed:
- 2 zones for the 1st superblock
- 1 zone for the system block group
- 1 zone for a metadata block group
- 1 zone for a data block group
Some tests in fstests create a sized filesystem and depending on the zone
size of the underlying device, it may happen, that this filesystem is too
small to be used. It's better to not create a filesystem at all than to
create an unusable filesystem.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The zone size belongs to the zoned section so indent it accordingly:
Label: (null)
UUID: 0d27fc11-8068-4f28-a1c5-5d97cbf2890a
Node size: 16384
Sector size: 4096
Filesystem size: 2.00GiB
Block group profiles:
Data: single 256.00MiB
Metadata: single 256.00MiB
System: single 256.00MiB
SSD detected: yes
Zoned device: yes
Zone size: 256.00MiB
Incompat features: extref, skinny-metadata, zoned
Runtime features:
Checksum: crc32c
Number of devices: 1
Devices:
ID SIZE PATH
1 2.00GiB /dev/nullb0
Signed-off-by: David Sterba <dsterba@suse.com>
There is a support to build on android but it's incomplete and there's
little interest to fix it.
To reinstate we'll need:
* fix remaining issues from
lore.kernel.org/linux-btrfs/20170802185111.187922-1-filipbystricky@google.com
* find CI host with Android support to verify build, either local eg. in
docker or in a hosted environment
* switch the make-based build to 'soong' (source.android.com/setup/build)
Issue: #357
Signed-off-by: David Sterba <dsterba@suse.com>
In zoned mode, chunks must be aligned to zone size to ensure sequential
writing to a block group maps to sequential writing to a device zone.
Thus, we need to tweak the position and the size of the initial system
block group.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This commit disables some features which are incompatible with zoned btrfs.
RAID/DUP is disabled because we cannot handle two zone append writes to
different zones in the kernel. MIXED_BG is disabled because the allocated
metadata region will be write holes for data writes. Space-cache (v1)
require in-place updatings.
It also disables the "--rootdir" option for now. The copying from a
directory needs some tweaks for zoned btrfs (e.g. zone size aware space
calculation), and we do not implement them yet.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Make mkfs.btrfs aware of the "zoned" feature flag and prepare the disks
for mkfs.btrfs. It automatically detects host-managed zoned device and
enables the future.
It also adds "zone_size" to struct btrfs_mkfs_config to track the zone
size.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce the queue_param helper function to get a device request queue
parameter. This helper will be used later to query information of a zoned
device.
Furthermore, rewrite is_ssd() using the helper function.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
[Naohiro] fixed error return value
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The correct checksum type value is set a few lines below, there seems
to be stale crc32 initialization. Also remove the crc32c.h include as
it's not used directly anymore.
Signed-off-by: David Sterba <dsterba@suse.com>
Extending open_ctree with more parameters would be difficult, we'll need
to add more so factor out the parameters to a structure for easier
extension.
Signed-off-by: David Sterba <dsterba@suse.com>
We all know there's some dark and scary corners with RAID5/6, but users
may not know. Add a warning message in mkfs so anybody trying to use
this will know things can go very wrong.
Issue: #265
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ reword message ]
Signed-off-by: David Sterba <dsterba@suse.com>
The libmount dependency has been added in commit 61ecaff036
("btrfs-progs: build: add libmount dependency"), and static build got
broken. There are functions that do basically the same thing and also
share the name, which in turn fails at link time.
ld: /../lib64/libmount.a(libcommon_la-canonicalize.o): in function `canonicalize_dm_name':
util-linux-2.34/lib/canonicalize.c:58: multiple definition of `canonicalize_dm_name';
common/path-utils.static.o:btrfs-progs/common/path-utils.c:286: first defined here
In case the collision can be resolved by renaming, it's done
(canonicalize_path and parse_size). There are 2 symbols from selinux
that are substituted by a weak aliases during the static build.
There's one new warning due to use of getgrnam_r in libmount that
depends on dynamic linking and may not work properly with static build.
We're not using the related functions directly or indirectly, so it
should be safe to ignore the warnings.
ld: ../lib64/libmount.a(la-utils.o): in function `mnt_get_gid':
util-linux-2.34/libmount/src/utils.c:625: warning: Using 'getgrnam_r' in statically linked applications
+requires at runtime the shared libraries from the glibc version used for linking
Issue: #333
Signed-off-by: David Sterba <dsterba@suse.com>
There are several problems for current sectorsize check:
- No check at all for sectorsize
This means you can even specify "-s 62k".
- No way to specify sectorsize smaller than page size
Fix all these problems by:
- Introduce btrfs_check_sectorsize()
To do:
* power of 2 check for sectorsize
* lower and upper boundary check for sectorsize
* warn about sectorsize mismatch with page size
- Remove the max() between page size and sectorsize
This allows us to override the sectorsize for 64K page systems.
- Make nodesize calculation based on sectorsize
No need to use page size any more.
Users who specify sectorsize manually really know what they are doing,
and we have warned them already.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a runtime feature (-R) flag for the free space tree. A filesystem
that is mkfs'd with -R free-space-tree then mounted with no options has
the same contents as one mkfs'd without the option, then mounted with
'-o space_cache=v2'.
The only tricky thing is in exactly how to call the tree creation code.
Using btrfs_create_free_space_tree as is did not quite work, because an
extra reference to the eb (root->commit_root) is leaked, which mkfs
complains about with a warning. I opted to follow how the uuid tree is
created by adding it to the dirty roots list for cleanup by
commit_tree_roots in commit_transaction. As a result,
btrfs_create_free_space_tree no longer exactly matches the version in
the kernel sources.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Extract the defaults for data and metadata profiles to a header and
use the symbolic names instead of hardcoding the profiles.
Signed-off-by: David Sterba <dsterba@suse.com>
The option -A was used long time ago for debugging and marked as
obsolete since 4.14.1. Remove the option and set the alloc start to the
default value 1MiB.
Signed-off-by: David Sterba <dsterba@suse.com>
Add support for enabling quotas at mkfs time. The qgroup accounting will
be consistent, ie. works with --rootdir.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Just like -O|--features, introduce -R|--runtime-features to enable
features that are now enabled on a mounted filesystem
Currently only mkfs is supported, convert is not supported yet.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Make the features structures more generic to allow mkfs-time and
mount-time sets to be defined.
This provides base for later mkfs support of mount-time features like
quotas.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a new function, setup_quota_root(), which will create quota
root, and do an offline rescan to ensure all quota accounting numbers
are correct.
Signed-off-by: Qu Wenruo <wqu@suse.com>
[ minor improvement in the fail path ]
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a new function, insert_qgroup_items(), to insert qgroup info
item and qgroup limit item for later mkfs qgroup support.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
To sync with the refactored kernel code. Also since we're here, sync
the function parameters with kernel too.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This would sync the code between kernel and btrfs-progs, and save at
least 1 byte for each btrfs_block_group_cache.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add definition, crypto wrappers and support to mkfs for blake2 for
checksumming. There are 2 aliases either blake2 or blake2b.
Signed-off-by: David Sterba <dsterba@suse.com>
Add the definition to the checksum types and let mkfs accept it.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
With the introduction of xxhash64 to btrfs-progs we created a crypto/
directory for all the hashes used in btrfs (although no
cryptographically secure hash is there yet).
Move the crc32c implementation from kernel-lib/ to crypto/ as well so we
have all hashes consolidated.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
As mkfs will grow new checksums, print the used checksum in it's
versbose output.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Add an option to mkfs to specify which checksum algorithm will be used
for the filesystem. Currently only crc32c is supported.
The option name is -c, presumably one of the comonly used options so it
gets the lowercase option.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Add checksum type to the definition structure for a new filesystem, this
will be used in following patches.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
When btrfs_add_to_fsid fails in mkfs we try to close the ctree. That
complains that we already have a transaction open. We should be taking
the error path and exit cleanly without writing.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When creating a filesystem with mixed block groups, we are creating two
space info objects to track used/reserved/pinned space, one only for data
and another one only for metadata.
This is making fstests test case generic/416 fail, with btrfs' check
reporting over an hundred errors about bad extents:
(...)
bad extent [17186816, 17190912), type mismatch with chunk
bad extent [17195008, 17199104), type mismatch with chunk
bad extent [17203200, 17207296), type mismatch with chunk
(...)
Because, surprisingly, this results in block groups that do not have the
BTRFS_BLOCK_GROUP_DATA flag set but have data extents allocated in them.
This is a regression introduced in btrfs-progs v5.2.
So fix this by making sure we only create one space info object, for both
metadata and data, when mixed block groups are enabled.
Fixes: c31edf610c ("btrfs-progs: Fix false ENOSPC alert by tracking used space correctly")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Build several standalone tools into one binary and switch the function
by name (symlink or hardlink).
* btrfs
* mkfs.btrfs
* btrfs-image
* btrfs-convert
* btrfstune
The static target is also supported. The name of resulting boxed
binaries is btrfs.box and btrfs.box.static . All the binaries can be
built at the same time without prior configuration.
text data bss dec hex filename
822454 27000 19724 869178 d433a btrfs
927314 28816 20812 976942 ee82e btrfs.box
2067745 58004 44736 2170485 211e75 btrfs.static
2627198 61724 83800 2772722 2a4ef2 btrfs.box.static
File sizes:
857496 btrfs
968536 btrfs.box
2141400 btrfs.static
2704472 btrfs.box.static
Standalone utilities:
512504 btrfs-convert
495960 btrfs-image
471224 btrfstune
491864 mkfs.btrfs
1747720 btrfs-convert.static
1411416 btrfs-image.static
1304256 btrfstune.static
1361696 mkfs.btrfs.static
So the shared 900K binary saves ~2M, or ~5.7M for static build.
Signed-off-by: David Sterba <dsterba@suse.cz>
[BUG]
There is a bug report of unexpected ENOSPC from btrfs-convert, issue #123.
After some debugging, even when we have enough unallocated space, we
still hit ENOSPC at btrfs_reserve_extent().
[CAUSE]
Btrfs-progs relies on chunk preallocator to make enough space for
data/metadata.
However after the introduction of delayed-ref, it's no longer reliable
to rely on btrfs_space_info::bytes_used and
btrfs_space_info::bytes_pinned to calculate used metadata space.
For a running transaction with a lot of allocated tree blocks,
btrfs_space_info::bytes_used stays its original value, and will only be
updated when running delayed ref.
This makes btrfs-progs chunk preallocator completely useless. And for
btrfs-convert/mkfs.btrfs --rootdir, if we're going to have enough
metadata to fill a metadata block group in one transaction, we will hit
ENOSPC no matter whether we have enough unallocated space.
[FIX]
This patch will introduce btrfs_space_info::bytes_reserved to track how
many space we have reserved but not yet committed to extent tree.
To support this change, this commit also introduces the following
modification:
- More comment on btrfs_space_info::bytes_*
To make code a little easier to read
- Export update_space_info() to preallocate empty data/metadata space
info for mkfs.
For mkfs, we only have a temporary fs image with SYSTEM chunk only.
Export update_space_info() so that we can preallocate empty
data/metadata space info before we start a transaction.
- Proper btrfs_space_info::bytes_reserved update
The timing is the as kernel (except we don't need to update
bytes_reserved for data extents)
* Increase bytes_reserved when call alloc_reserved_tree_block()
* Decrease bytes_reserved when running delayed refs
With the help of head->must_insert_reserved to determine whether we
need to decrease.
Issue: #123
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Although moderm hardware is fast enough and crc32c calculation is not a
hotspot, doing such optimization won't hurt anyway.
Issue: #175
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For data reloc tree creation, we copy its contents from the fs tree just
for its INODE_ITEM, INODE_REF and dirid. This hides the detail and is
not obvious for why we're copying from fs root.
This patch will create data reloc tree from scratch:
- Create root, including root item and new tree root
- Change dirid to BTRFS_FIRST_FREE_OBJECTID
- Insert root INODE_ITEM and INODE_REF
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Similar to the changes where strerror(errno) was converted, continue
with the remaining cases where the argument was stored in another
variable.
The savings in object size are about 4500 bytes:
$ size btrfs.old btrfs.new
text data bss dec hex filename
805055 24248 19748 849051 cf49b btrfs.old
804527 24248 19748 848523 cf28b btrfs.new
Signed-off-by: David Sterba <dsterba@suse.com>
The old flag OPEN_CTREE_FS_PARTIAL is in fact quite easy to be confused
with OPEN_CTREE_PARTIAL, which allow btrfs-progs to open damaged
filesystem (like corrupted extent/csum tree).
However OPEN_CTREE_FS_PARTIAL, unlike its name, is just allowing
btrfs-progs to open fs with temporary superblocks (which only has 6
basic trees on SINGLE meta/sys chunks).
The usage of FS_PARTIAL is really confusing here.
So rename OPEN_CTREE_FS_PARTIAL to OPEN_CTREE_TEMPORARY_SUPER, and add
extra comment for its behavior.
Also rename BTRFS_MAGIC_PARTIAL to BTRFS_MAGIC_TEMPORARY to keep the
naming consistent.
And with above comment, the usage of FS_PARTIAL in dump-tree is
obviously incorrect, fix it.
Fixes: 8698a2b9ba ("btrfs-progs: Allow inspect dump-tree to show specified tree block even some tree roots are corrupted")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
With mkfs.btrfs on a thin provisioned device with very small backing
size and big virtual size, all code works well in mkfs.btrfs until
close_ctree() is called.
close_ctree() fails to sync device due to small backing size while
closing devices. However, mkfs returns 0 in such situation which causes
failure of fstests generic/405.
So, let mkfs returns nonzero value if previous steps succeeded but
close_ctree() failed. Then fstests generic/405 passes now.
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We can easily create the uuid tree that's usually created after first
mount. The kernel will still check the tree on first mount so we don't
try to fake the uuid tree generation so it appears consistent, even if
it's empty.
Signed-off-by: David Sterba <dsterba@suse.com>
Currently, the top-level subvolume lacks the UUID. As a result, both
non-snapshot subvolume and snapshot of top-level subvolume do not have
Parent UUID and cannot be distinguisued. Therefore "fi show" of
top-level lists all the subvolumes which lacks the UUID in
"Snapshot(s)" filed. Also, it lacks the otime information.
Fix this by adding the UUID and otime at the mkfs time. As a
consequence, snapshots of top-level subvolume now have a Parent UUID and
UUID tree will create an entry for top-level subvolume at mount time.
This should not cause the problem for current kernel, but user program
which relies on the empty Parent UUID may be affected by this change.
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
@chunk_objectid of btrfs_make_block_group() function is always fixed to
BTRFS_FIRST_FREE_OBJECTID, so there is no need to pass it as parameter
explicitly.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
As btrfs is specific to Linux, %m can be used instead of strerror(errno)
in format strings. This has some size reduction benefits for embedded
systems.
glibc, musl, and uclibc-ng all support %m as a modifier to printf.
A quick glance at the BIONIC libc source indicates that it has
support for %m as well. BSDs and Windows do not but I do believe
them to be beyond the scope of btrfs-progs.
Compiled sizes on Ubuntu 16.04:
Before:
3916512 btrfs
233688 libbtrfs.so.0.1
4899 bcp
2367672 btrfs-convert
2208488 btrfs-corrupt-block
13302 btrfs-debugfs
2152160 btrfs-debug-tree
2136024 btrfs-find-root
2287592 btrfs-image
2144600 btrfs-map-logical
2130760 btrfs-select-super
2152608 btrfstune
2131760 btrfs-zero-log
2277752 mkfs.btrfs
9166 show-blocks
After:
3908744 btrfs
233256 libbtrfs.so.0.1
4899 bcp
2366560 btrfs-convert
2207432 btrfs-corrupt-block
13302 btrfs-debugfs
2151104 btrfs-debug-tree
2134968 btrfs-find-root
2281864 btrfs-image
2143536 btrfs-map-logical
2129704 btrfs-select-super
2151552 btrfstune
2130696 btrfs-zero-log
2276272 mkfs.btrfs
9166 show-blocks
Total savings: 23928 (24 kilo)bytes
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When creating btrfs, mkfs.btrfs will firstly create a temporary system
chunk as basis, and then created needed trees or new devices.
However the layout temporary system chunk is hard-coded and uses
reserved [0, 1M) range of devid 1.
Change the temporary chunk layout from old:
0 1M 4M 5M
|<----------- temp chunk -------------->|
And it's 1:1 mapped, which means it's a SINGLE chunk,
and stripe offset is also 0.
to new layout:
0 1M 4M 5M
|<----------- temp chunk -------------->|
And still keeps the 1:1 mapping.
However this also affects btrfs_min_dev_size() which still assume
temporary chunks starts at device offset 0.
The problem can only be exposed by "-m single" or "-M" where we reuse the
temporary chunk.
With other meta profiles, system and meta chunks are allocated by later
btrfs_alloc_chunk() call, and old SINGLE chunks are removed, so it will
be no such problem for other meta profiles.
Reported-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
[ folded fix for the minimal device size calculation ]
Signed-off-by: David Sterba <dsterba@suse.com>
For --rootdir, even for large existing file or block device, it will
always shrink the resulting filesystem.
The problem is, mkfs.btrfs will try to calculate the dir size, and use
it as @block_count to mkfs, which makes the filesystem shrunk.
Fix it by trying to get the original block device or file size as
@block_count, so mkfs.btrfs can use the full file/block device for
--rootdir option.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Commit 460e93f257 ("btrfs-progs: mkfs: check the status of file at mkfs")
will try to check the file state before creating fs on it.
The check is mostly fine for normal mkfs case, while for --rootdir
option, it's allowed to create a new file if the destination file
doesn't exist.
Fix it by allowing non-existent file if --rootdir is specified.
Fixes: 460e93f257 ("btrfs-progs: mkfs: check the status of file at mkfs")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Make --shrink a separate option for --rootdir, and change the default to
off.
The shrinking behaviour is not a commonly used feature but can be useful
for creating minimal pre-filled images, in one step, without requiring
to mount.
Signed-off-by: Qu Wenruo <wqu@suse.com>
[ update changelog and error messages ]
Signed-off-by: David Sterba <dsterba@suse.com>
Use the new dev extent based shrink method for rootdir option. This
restores the original behaviour when --rootdir will create a minimal
filesystem size.
Signed-off-by: Qu Wenruo <wqu@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
Use an easier method to calculate the estimate device size for
mkfs.btrfs --rootdir.
The new method will over-estimate, but should ensure we won't encounter
ENOSPC.
It relies on the following data:
1) number of inodes -- for metadata chunk size
2) rounded up data size of each regular inode -- for data chunk size
Total meta chunk size = round_up(nr_inode * (PATH_MAX * 3 + sectorsize),
min_chunk_size) * profile_multiplier
PATH_MAX is the maximum size possible for INODE_REF/DIR_INDEX/DIR_ITEM.
Sectorsize is the maximum size possible for inline extent.
min_chunk_size is 8M for SINGLE, and 32M for DUP, get from
btrfs_alloc_chunk().
profile_multiplier is 1 for Single, 2 for DUP.
Total data chunk size is much easier.
Total data chunk size = round_up(total_data_usage, min_chunk_size) *
profile_multiplier
Total_data_usage is the sum of *rounded up* size of each regular inode
use.
min_chunk_size is 8M for SINGLE, 64M for DUP, get from btrfS_alloc_chunk().
Same profile_multiplier for meta.
This over-estimate calculate is, of course inacurrate, but since we will
later shrink the fs to its real usage, it doesn't matter much now.
Signed-off-by: Qu Wenruo <wqu@suse.com>
[ update comments ]
Signed-off-by: David Sterba <dsterba@suse.com>
Remove the custom chunk allocator for mkfs. It is buggy in connection to
the --rootdir option and puts file data to the reerved 1M area. The
feature of the custom allocator was to reserve only minimal amount of
blockgroup space. This will temporarily stop working and will need an
explicit request by option, added by following patches.
Use the generic chunk allocator.
Signed-off-by: Qu Wenruo <wqu@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
Cleanup of temporary chunks should be done as soon as possible, and it
should be especially before doing large tree operations, like filling
the filesystem when using --rootdir.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since new --rootdir can allocate chunk, it will modify the chunk
allocation result.
This patch will update allocation info before verbose output to reflect
such info.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Also rename the function from size_sourcedir() to mkfs_size_dir().
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In fact, --rootdir option is getting more and more independent from
normal mkfs code.
So move image creation function, make_image() and its related code to
mkfs/rootdir.[ch], and rename the function to btrfs_mkfs_fill_dir().
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It's a waste of IO to fill the whole image before creating btrfs on it,
just wiping the first 1M, and then write 1 byte to the last position to
create a sparse file.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>