Use sbwrite instead of pwrite to support superblock logging in zoned
mode. In addition, call fsync() to persist the superblock to ensure the
write order. It also helps us to detect an unaligned write (write to a
position other than the write pointer) error.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In zoned mode, chunks must be aligned to zone size to ensure sequential
writing to a block group maps to sequential writing to a device zone.
Thus, we need to tweak the position and the size of the initial system
block group.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This commit disables some features which are incompatible with zoned btrfs.
RAID/DUP is disabled because we cannot handle two zone append writes to
different zones in the kernel. MIXED_BG is disabled because the allocated
metadata region will be write holes for data writes. Space-cache (v1)
require in-place updatings.
It also disables the "--rootdir" option for now. The copying from a
directory needs some tweaks for zoned btrfs (e.g. zone size aware space
calculation), and we do not implement them yet.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Make mkfs.btrfs aware of the "zoned" feature flag and prepare the disks
for mkfs.btrfs. It automatically detects host-managed zoned device and
enables the future.
It also adds "zone_size" to struct btrfs_mkfs_config to track the zone
size.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce the queue_param helper function to get a device request queue
parameter. This helper will be used later to query information of a zoned
device.
Furthermore, rewrite is_ssd() using the helper function.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
[Naohiro] fixed error return value
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The correct checksum type value is set a few lines below, there seems
to be stale crc32 initialization. Also remove the crc32c.h include as
it's not used directly anymore.
Signed-off-by: David Sterba <dsterba@suse.com>
Extending open_ctree with more parameters would be difficult, we'll need
to add more so factor out the parameters to a structure for easier
extension.
Signed-off-by: David Sterba <dsterba@suse.com>
We all know there's some dark and scary corners with RAID5/6, but users
may not know. Add a warning message in mkfs so anybody trying to use
this will know things can go very wrong.
Issue: #265
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ reword message ]
Signed-off-by: David Sterba <dsterba@suse.com>
The libmount dependency has been added in commit 61ecaff036
("btrfs-progs: build: add libmount dependency"), and static build got
broken. There are functions that do basically the same thing and also
share the name, which in turn fails at link time.
ld: /../lib64/libmount.a(libcommon_la-canonicalize.o): in function `canonicalize_dm_name':
util-linux-2.34/lib/canonicalize.c:58: multiple definition of `canonicalize_dm_name';
common/path-utils.static.o:btrfs-progs/common/path-utils.c:286: first defined here
In case the collision can be resolved by renaming, it's done
(canonicalize_path and parse_size). There are 2 symbols from selinux
that are substituted by a weak aliases during the static build.
There's one new warning due to use of getgrnam_r in libmount that
depends on dynamic linking and may not work properly with static build.
We're not using the related functions directly or indirectly, so it
should be safe to ignore the warnings.
ld: ../lib64/libmount.a(la-utils.o): in function `mnt_get_gid':
util-linux-2.34/libmount/src/utils.c:625: warning: Using 'getgrnam_r' in statically linked applications
+requires at runtime the shared libraries from the glibc version used for linking
Issue: #333
Signed-off-by: David Sterba <dsterba@suse.com>
There are several problems for current sectorsize check:
- No check at all for sectorsize
This means you can even specify "-s 62k".
- No way to specify sectorsize smaller than page size
Fix all these problems by:
- Introduce btrfs_check_sectorsize()
To do:
* power of 2 check for sectorsize
* lower and upper boundary check for sectorsize
* warn about sectorsize mismatch with page size
- Remove the max() between page size and sectorsize
This allows us to override the sectorsize for 64K page systems.
- Make nodesize calculation based on sectorsize
No need to use page size any more.
Users who specify sectorsize manually really know what they are doing,
and we have warned them already.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a runtime feature (-R) flag for the free space tree. A filesystem
that is mkfs'd with -R free-space-tree then mounted with no options has
the same contents as one mkfs'd without the option, then mounted with
'-o space_cache=v2'.
The only tricky thing is in exactly how to call the tree creation code.
Using btrfs_create_free_space_tree as is did not quite work, because an
extra reference to the eb (root->commit_root) is leaked, which mkfs
complains about with a warning. I opted to follow how the uuid tree is
created by adding it to the dirty roots list for cleanup by
commit_tree_roots in commit_transaction. As a result,
btrfs_create_free_space_tree no longer exactly matches the version in
the kernel sources.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
The single profile is better suited as default for data on multiple
devices. Switch from RAID0 because:
- it's easier to convert to other profiles, as single consumes some
chunks per device, but RAID0 has chunks on all devices regardless of
the used space
- RAID0 has no redundancy and compared one disk failure affects many
files due to striping, while with single the chances are a bit higher
that complete files are stored on one device
- when the device sizes are not equal and not even close to equal, the
maximum achievable size with RAID0 is size of the smallest device due
to striping, with single it's the sum of all device sizes
The changed defaults could affect scripts and deployments that rely on
the old values, but given the number of possible profiles for multiple
devices let's hope that they're specified explicitly in majority of
cases.
Issue: #270
Signed-off-by: David Sterba <dsterba@suse.com>
Extract the defaults for data and metadata profiles to a header and
use the symbolic names instead of hardcoding the profiles.
Signed-off-by: David Sterba <dsterba@suse.com>
The option -A was used long time ago for debugging and marked as
obsolete since 4.14.1. Remove the option and set the alloc start to the
default value 1MiB.
Signed-off-by: David Sterba <dsterba@suse.com>
Add support for enabling quotas at mkfs time. The qgroup accounting will
be consistent, ie. works with --rootdir.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Just like -O|--features, introduce -R|--runtime-features to enable
features that are now enabled on a mounted filesystem
Currently only mkfs is supported, convert is not supported yet.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Make the features structures more generic to allow mkfs-time and
mount-time sets to be defined.
This provides base for later mkfs support of mount-time features like
quotas.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a new function, setup_quota_root(), which will create quota
root, and do an offline rescan to ensure all quota accounting numbers
are correct.
Signed-off-by: Qu Wenruo <wqu@suse.com>
[ minor improvement in the fail path ]
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a new function, insert_qgroup_items(), to insert qgroup info
item and qgroup limit item for later mkfs qgroup support.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
To sync with the refactored kernel code. Also since we're here, sync
the function parameters with kernel too.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This would sync the code between kernel and btrfs-progs, and save at
least 1 byte for each btrfs_block_group_cache.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add definition, crypto wrappers and support to mkfs for blake2 for
checksumming. There are 2 aliases either blake2 or blake2b.
Signed-off-by: David Sterba <dsterba@suse.com>
Add the definition to the checksum types and let mkfs accept it.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
With the introduction of xxhash64 to btrfs-progs we created a crypto/
directory for all the hashes used in btrfs (although no
cryptographically secure hash is there yet).
Move the crc32c implementation from kernel-lib/ to crypto/ as well so we
have all hashes consolidated.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
As mkfs will grow new checksums, print the used checksum in it's
versbose output.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Adding this table will make extending btrfs-progs with new checksum types
easier.
Also add accessor functions to access the table fields.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Add an option to mkfs to specify which checksum algorithm will be used
for the filesystem. Currently only crc32c is supported.
The option name is -c, presumably one of the comonly used options so it
gets the lowercase option.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Add the checksum type to csum_tree_block_size(), __csum_tree_block_size()
and verify_tree_block_csum_silent().
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
The callers of csum_tree_block_size() blindly assume we're only having
crc32c as a possible checksum and thus pass in
btrfs_csum_sizes[BTRFS_CSUM_TYPE_CRC32] for the size argument of
csum_tree_block_size().
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Add checksum type to the definition structure for a new filesystem, this
will be used in following patches.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Use the return value of listxattr instead of tokenizing.
The end of the extended attribute list is indicated by the return value,
not an empty list item (two consecutive NULs). Using strtok in this way
thus sometimes caused add_xattr_item to reuse stack data in xattr_list
from the previous invocation, thus querying attributes that are not
actually in the file's xattr list.
Issue: #194
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Vladimir Panteleev <git@vladimir.panteleev.md>
Signed-off-by: David Sterba <dsterba@suse.com>
When btrfs_add_to_fsid fails in mkfs we try to close the ctree. That
complains that we already have a transaction open. We should be taking
the error path and exit cleanly without writing.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When creating a filesystem with mixed block groups, we are creating two
space info objects to track used/reserved/pinned space, one only for data
and another one only for metadata.
This is making fstests test case generic/416 fail, with btrfs' check
reporting over an hundred errors about bad extents:
(...)
bad extent [17186816, 17190912), type mismatch with chunk
bad extent [17195008, 17199104), type mismatch with chunk
bad extent [17203200, 17207296), type mismatch with chunk
(...)
Because, surprisingly, this results in block groups that do not have the
BTRFS_BLOCK_GROUP_DATA flag set but have data extents allocated in them.
This is a regression introduced in btrfs-progs v5.2.
So fix this by making sure we only create one space info object, for both
metadata and data, when mixed block groups are enabled.
Fixes: c31edf610c ("btrfs-progs: Fix false ENOSPC alert by tracking used space correctly")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Build several standalone tools into one binary and switch the function
by name (symlink or hardlink).
* btrfs
* mkfs.btrfs
* btrfs-image
* btrfs-convert
* btrfstune
The static target is also supported. The name of resulting boxed
binaries is btrfs.box and btrfs.box.static . All the binaries can be
built at the same time without prior configuration.
text data bss dec hex filename
822454 27000 19724 869178 d433a btrfs
927314 28816 20812 976942 ee82e btrfs.box
2067745 58004 44736 2170485 211e75 btrfs.static
2627198 61724 83800 2772722 2a4ef2 btrfs.box.static
File sizes:
857496 btrfs
968536 btrfs.box
2141400 btrfs.static
2704472 btrfs.box.static
Standalone utilities:
512504 btrfs-convert
495960 btrfs-image
471224 btrfstune
491864 mkfs.btrfs
1747720 btrfs-convert.static
1411416 btrfs-image.static
1304256 btrfstune.static
1361696 mkfs.btrfs.static
So the shared 900K binary saves ~2M, or ~5.7M for static build.
Signed-off-by: David Sterba <dsterba@suse.cz>