Print the total zone_unusable size in the summary for 'fi usage' for a
filesystem in zoned mode. It's a sum of all the zone_unusable values
from 'fi df'. Per-device stats are not implemented and would need more
complicated calculations from raw data; the kernel does not export that
(but it could).
As of 5.12, the zone_unusable is stored only in memory so we'd have to
map raw block device zones to the block groups and the live extents in
the associated block groups to get the exact numbers.
Example:
# btrfs fi usage /mnt
Overall:
Device size: 2.00GiB
Device allocated: 768.00MiB
Device unallocated: 1.25GiB
Device missing: 0.00B
Device zone unusable: 320.00KiB
Used: 128.00KiB
Free (estimated): 1.50GiB (min: 1.50GiB)
Free (statfs, df): 1.50GiB
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 3.25MiB (used: 32.00KiB)
Multiple profiles: no
Data,single: Size:256.00MiB, Used:0.00B (0.00%)
/dev/nullb0 256.00MiB
Metadata,single: Size:256.00MiB, Used:112.00KiB (0.04%)
/dev/nullb0 256.00MiB
System,single: Size:256.00MiB, Used:16.00KiB (0.01%)
/dev/nullb0 256.00MiB
Unallocated:
/dev/nullb0 1.25GiB
# btrfs fi df
Data, single: total=256.00MiB, used=0.00B, zone_unusable=0.00B
System, single: total=256.00MiB, used=16.00KiB, zone_unusable=160.00KiB
Metadata, single: total=256.00MiB, used=112.00KiB, zone_unusable=160.00KiB
GlobalReserve, single: total=3.25MiB, used=32.00KiB
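The summary value is the plain sum of the per-type zone_unusable fields.
A minimal sketch of that arithmetic in Python, fed with the 'fi df'
output above; parse_size() and total_zone_unusable() are hypothetical
helper names used for illustration, not btrfs-progs functions:

```python
import re

# Binary unit multipliers as printed by btrfs-progs (e.g. "160.00KiB").
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_size(text):
    """Convert a human-readable size such as '160.00KiB' to bytes."""
    match = re.fullmatch(r"([0-9.]+)(B|KiB|MiB|GiB|TiB)", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return round(float(match.group(1)) * UNITS[match.group(2)])


def total_zone_unusable(fi_df_output):
    """Sum the zone_unusable fields over all block group types."""
    sizes = re.findall(r"zone_unusable=([0-9.]+[A-Za-z]+)", fi_df_output)
    return sum(parse_size(size) for size in sizes)


FI_DF = """\
Data, single: total=256.00MiB, used=0.00B, zone_unusable=0.00B
System, single: total=256.00MiB, used=16.00KiB, zone_unusable=160.00KiB
Metadata, single: total=256.00MiB, used=112.00KiB, zone_unusable=160.00KiB
GlobalReserve, single: total=3.25MiB, used=32.00KiB
"""

print(total_zone_unusable(FI_DF))  # 327680 bytes, i.e. 320.00KiB
```

GlobalReserve has no zone_unusable field, so it does not contribute.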
Signed-off-by: David Sterba <dsterba@suse.com>
Getting the per bg type zone unusable space will be used in other size
reports like 'fi us', so export it to the device utils.
Signed-off-by: David Sterba <dsterba@suse.com>
Extend build coverage. The versions differ between the images; a build
can be run as:
$ ./docker-run --env CC=clang
Signed-off-by: David Sterba <dsterba@suse.com>
The runner script allows passing arguments to docker and to the final
command, separated by --. This did not work as expected: the arguments
were concatenated into the first member instead of being passed one by
one.
The following now works:
$ ./docker-run --env CC=clang
$ ./docker-run --env CC=clang --
$ ./docker-run --env CC=clang -- /bin/bash
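The intended splitting can be sketched as follows. This is an
illustration in Python, not the code of the docker-run script itself;
split_args() is a hypothetical name, and what the script does with an
empty command list is not covered here:

```python
def split_args(argv):
    """Split argv at the first '--' into (docker_args, command).

    Everything before the separator goes to docker, everything after it
    is the final command. Each argument stays a separate list member.
    """
    if "--" in argv:
        idx = argv.index("--")
        return argv[:idx], argv[idx + 1:]
    return list(argv), []


docker_args, command = split_args(["--env", "CC=clang", "--", "/bin/bash"])
print(docker_args)  # ['--env', 'CC=clang']
print(command)      # ['/bin/bash']
```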
Signed-off-by: David Sterba <dsterba@suse.com>
Support for recognizing a zoned btrfs in util-linux/blkid may take time
to get updated everywhere. Add a fallback check for the signature to
avoid accidental overwrites.
The second run below will not succeed on a zoned device:
$ mkfs.btrfs /dev/zoned1
$ mkfs.btrfs /dev/zoned1
WARNING: /dev/zoned1 contains zoned btrfs signature but was not detected by blkid, please update
ERROR: use the -f option to force overwrite of /dev/zoned1
Signed-off-by: David Sterba <dsterba@suse.com>
The zone size belongs to the zoned section so indent it accordingly:
Label: (null)
UUID: 0d27fc11-8068-4f28-a1c5-5d97cbf2890a
Node size: 16384
Sector size: 4096
Filesystem size: 2.00GiB
Block group profiles:
Data: single 256.00MiB
Metadata: single 256.00MiB
System: single 256.00MiB
SSD detected: yes
Zoned device: yes
Zone size: 256.00MiB
Incompat features: extref, skinny-metadata, zoned
Runtime features:
Checksum: crc32c
Number of devices: 1
Devices:
ID SIZE PATH
1 2.00GiB /dev/nullb0
Signed-off-by: David Sterba <dsterba@suse.com>
In the zoned mode there are parts of chunks that become unusable once
they get COWed and the zone must be reclaimed and reset to make the
space usable again. Provide a way to show the total size per block group
type in fi df:
$ btrfs fi df .
Data, single: total=1.00GiB, used=257.51MiB, zone_unusable=238.43MiB
System, single: total=256.00MiB, used=16.00KiB, zone_unusable=224.00KiB
Metadata, single: total=256.00MiB, used=816.00KiB, zone_unusable=8.61MiB
GlobalReserve, single: total=3.25MiB, used=0.00B
This will not be shown on non-zoned filesystems.
Signed-off-by: David Sterba <dsterba@suse.com>
Move installation of gzip before autotools as it would otherwise pull
busybox-gzip (and busybox) and that causes problems later.
Signed-off-by: David Sterba <dsterba@suse.com>
The free travis-ci.org service is going to be discontinued. The
replacement travis-ci.com could be used instead but is not exactly the
same.
The images provided by the service contained an old kernel, which
hindered testing of new features: tests were failing and the coverage
was incomplete. The docker images will be used to do build coverage in
another way. A hosted CI is still desired so the search continues.
Issue: #171
Signed-off-by: David Sterba <dsterba@suse.com>
The support for zoned mode is incomplete and won't change so we can
disable it on Leap 15.2 and Centos 8.
Signed-off-by: David Sterba <dsterba@suse.com>
The build now fails on older distros that have incomplete support for
zoned mode. A missing blkzoned.h automatically disables zoned support.
The member blk_zone.capacity was added in 5.9 and its absence would
fail the build, as would a missing BLKGETZONESZ.
Check each of them separately and fail the build unless --disable-zoned
is set. Build verified on Leap 15.2, Centos 7/8.
Signed-off-by: David Sterba <dsterba@suse.com>
In file test_filesystem.py the class name should be TestFilesystem, this
looks like a typo and does not affect functionality.
Signed-off-by: David Sterba <dsterba@suse.com>
Move the file to common as it's used by several parts, while still
keeping the name 'repair' although the only thing it does is adding a
corrupted extent.
Signed-off-by: David Sterba <dsterba@suse.com>
This new test case makes sure the restored image file has been properly
enlarged so that newer kernels won't complain.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When restoring a dumped image to a new file, in most cases the kernel
will reject it since version 5.11:
# mkfs.btrfs -f /dev/test/test
# btrfs-image /dev/test/test /tmp/dump
# btrfs-image -r /tmp/dump ~/test.img
# mount ~/test.img /mnt/btrfs
mount: /mnt/btrfs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.
# dmesg -t | tail -n 7
loop0: detected capacity change from 10592 to 0
BTRFS info (device loop0): disk space caching is enabled
BTRFS info (device loop0): has skinny extents
BTRFS info (device loop0): flagging fs with big metadata feature
BTRFS error (device loop0): device total_bytes should be at most 5423104 but found 10737418240
BTRFS error (device loop0): failed to read chunk tree: -22
BTRFS error (device loop0): open_ctree failed
[CAUSE]
When btrfs-image restores an image into a file and the source image
contains only a single device, we don't need to modify the chunk/device
tree, as we can reuse the existing chunk/dev tree without any problem.
This also means that for such a restore we don't enlarge the target
file. This behavior was fine at the time, as the kernel did not check
whether the device is smaller than the device size recorded in the
device tree.
But the later kernel commit 3a160a933111 ("btrfs: drop never met disk
total bytes check in verify_one_dev_extent") introduced a new check on
device size at mount time, rejecting any loop file which is smaller than
the original device size.
[FIX]
Enlarge the file for a single device restore if the restored file is
smaller than the device size.
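The idea of the fix can be sketched as below. This is a minimal Python
illustration under stated assumptions, not the btrfs-image code:
enlarge_to_device_size() is a hypothetical name, and the device size is
passed in directly instead of being read from the device tree:

```python
import os
import tempfile


def enlarge_to_device_size(path, dev_total_bytes):
    """Grow the file at path to dev_total_bytes if it is smaller.

    Returns True if the file was enlarged. The file is never shrunk.
    """
    current = os.stat(path).st_size
    if current >= dev_total_bytes:
        return False
    # Truncating to a larger size extends the file; the new range reads
    # back as zeros.
    os.truncate(path, dev_total_bytes)
    return True


# Usage: a 100-byte file stands in for a freshly restored image.
with tempfile.TemporaryDirectory() as tmp:
    image = os.path.join(tmp, "test.img")
    with open(image, "wb") as f:
        f.write(b"\0" * 100)
    grew = enlarge_to_device_size(image, 4096)
    size_after = os.stat(image).st_size
    grew_again = enlarge_to_device_size(image, 1024)
```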
Reported-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Su Yue <l@damenly.su>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In restore_metadump(), we call stat() but never use the result. This
call is a leftover from code refactoring, as the stat() call has been
moved into fixup_device_size(). We can safely remove the call.
Reviewed-by: Su Yue <l@damenly.su>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is support for building on Android but it's incomplete and
there's little interest in fixing it.
To reinstate it we'll need:
* fix remaining issues from
lore.kernel.org/linux-btrfs/20170802185111.187922-1-filipbystricky@google.com
* find CI host with Android support to verify build, either local eg. in
docker or in a hosted environment
* switch the make-based build to 'soong' (source.android.com/setup/build)
Issue: #357
Signed-off-by: David Sterba <dsterba@suse.com>
There's a group of functions related to opening a filesystem in various
modes; these can be moved to a separate file.
Signed-off-by: David Sterba <dsterba@suse.com>
Decrease dependency on system headers, remove them where they're not
needed or became stale after code moved. The path-utils.h header
encapsulates path operations so include linux/limits.h here, that's
where PATH_MAX is defined.
Signed-off-by: David Sterba <dsterba@suse.com>
The helper wraps a raw ioctl but some users may already have the fd and
not necessarily the path. Add a suitable helper for convenience.
Signed-off-by: David Sterba <dsterba@suse.com>
This helper hasn't been used since 63bbf2931d ("btrfs-progs: rework
calculations of fi usage") a few years ago and we don't need the statfs
based calculations anywhere.
Signed-off-by: David Sterba <dsterba@suse.com>
The newly added zoned mode constants can utilize the const ilog2
version. Copy it from kernel include/linux/log2.h.
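For reference, the value computed by ilog2 is the floor of the base-2
logarithm. The kernel helper can be evaluated at compile time; the
Python function below only illustrates the result and is not the copied
code:

```python
def ilog2(n):
    """Return floor(log2(n)) for a positive integer n."""
    if n <= 0:
        raise ValueError("ilog2 is undefined for n <= 0")
    # bit_length() is the position of the highest set bit, counted from 1.
    return n.bit_length() - 1


print(ilog2(4096))               # 12
print(ilog2(256 * 1024 * 1024))  # 28
```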
Signed-off-by: David Sterba <dsterba@suse.com>
This patch checks if the target file system is flagged as ZONED. If it is,
the device to be added is flagged PREP_DEVICE_ZONED. Also add checks to
prevent mixing non-zoned devices and zoned devices.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Check if the target file system is flagged as ZONED. If it is, the
device to be added is flagged PREP_DEVICE_ZONED. Also add checks to
prevent mixing non-zoned devices and zoned devices.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
mkfs.btrfs uses a temporary superblock during the initialization process.
The temporary superblock uses BTRFS_MAGIC_TEMPORARY as its magic which is
different from a regular superblock. As a result, libblkid, which only
supports the usual magic, cannot recognize the volume as btrfs. So, let's
wipe the temporary magic before writing out the usual superblock.
Technically, we can add the temporary magic to the libblkid's table. But,
it will result in recognizing a half-baked filesystem as btrfs, which is
not ideal.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Use sbwrite instead of pwrite to support superblock logging in zoned
mode. In addition, call fsync() to persist the superblock to ensure the
write order. It also helps us to detect an unaligned write (write to a
position other than the write pointer) error.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In zoned mode, chunks must be aligned to zone size to ensure sequential
writing to a block group maps to sequential writing to a device zone.
Thus, we need to tweak the position and the size of the initial system
block group.
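A minimal sketch of the alignment rule in Python. The helper names are
hypothetical and the sketch assumes that both the start and the size
are simply rounded up to the zone size; the exact offsets mkfs.btrfs
picks are not reproduced here:

```python
MIB = 1024 * 1024


def round_up(value, align):
    """Round value up to the next multiple of align."""
    return (value + align - 1) // align * align


def align_chunk_to_zones(start, length, zone_size):
    """Return (start, length) so the chunk covers whole zones only."""
    aligned_start = round_up(start, zone_size)
    # A chunk always occupies at least one full zone.
    aligned_length = max(round_up(length, zone_size), zone_size)
    return aligned_start, aligned_length


# A 4MiB chunk at 1MiB moves to the next zone boundary and grows to one
# full 256MiB zone.
start, length = align_chunk_to_zones(1 * MIB, 4 * MIB, 256 * MIB)
print(start // MIB, length // MIB)  # 256 256
```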
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This commit disables some features which are incompatible with zoned btrfs.
RAID/DUP is disabled because we cannot handle two zone append writes to
different zones in the kernel. MIXED_BG is disabled because the allocated
metadata region will be write holes for data writes. Space-cache (v1) is
disabled because it requires in-place updates.
It also disables the "--rootdir" option for now. The copying from a
directory needs some tweaks for zoned btrfs (e.g. zone size aware space
calculation), and we do not implement them yet.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Make mkfs.btrfs aware of the "zoned" feature flag and prepare the disks
for mkfs.btrfs. It automatically detects a host-managed zoned device and
enables the feature.
It also adds "zone_size" to struct btrfs_mkfs_config to track the zone
size.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We cannot overwrite the superblock magic in a sequential write required
zone. Instead, we can reset the zone to wipe it.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If we zero out a region in a sequential write required zone, we cannot
write to the region until we reset the zone. Thus, we must prohibit zeroing
out to a sequential write required zone.
zero_dev_clamped() is modified to take the zone information and it calls
zero_zone_blocks() if the device is host managed to avoid writing to
sequential write required zones.
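The range splitting can be sketched as below. This is a Python
illustration under assumptions, not the zero_zone_blocks() code: the
Zone tuple and zeroable_ranges() are hypothetical, and sequential zones
are simply skipped:

```python
from collections import namedtuple

# start and length in bytes; sequential is True for sequential write
# required zones and False for conventional zones.
Zone = namedtuple("Zone", "start length sequential")


def zeroable_ranges(zones, start, length):
    """Return the (offset, length) parts of the range in conventional zones."""
    end = start + length
    ranges = []
    for zone in zones:
        lo = max(start, zone.start)
        hi = min(end, zone.start + zone.length)
        if lo >= hi or zone.sequential:
            continue  # no overlap, or a zone that must not be zeroed
        ranges.append((lo, hi - lo))
    return ranges


zones = [Zone(0, 256, False), Zone(256, 256, True), Zone(512, 256, False)]
print(zeroable_ranges(zones, 100, 500))  # [(100, 156), (512, 88)]
```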
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
All zones of zoned block devices should be reset before writing. Support
this by introducing PREP_DEVICE_ZONED.
btrfs_reset_all_zones() walks all the zones on a device and resets a
zone if it is a sequential write required zone, or discards the zone
range otherwise.
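The per-zone decision can be sketched as follows. The Python below only
plans the actions and issues no ioctls; the Zone tuple and
plan_zone_prep() are hypothetical names, not part of btrfs-progs:

```python
from collections import namedtuple

# sequential is True for sequential write required zones.
Zone = namedtuple("Zone", "start length sequential")


def plan_zone_prep(zones):
    """Return one ('reset' | 'discard', start, length) action per zone."""
    return [
        ("reset" if zone.sequential else "discard", zone.start, zone.length)
        for zone in zones
    ]


zones = [Zone(0, 256, False), Zone(256, 256, True)]
print(plan_zone_prep(zones))
# [('discard', 0, 256), ('reset', 256, 256)]
```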
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When freeing a chunk, we can/should reset the underlying device zones
for the chunk. Introduce btrfs_reset_chunk_zones() and reset the zones.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Tree manipulating operations like merging nodes often release
once-allocated tree nodes. Btrfs cleans such nodes so that pages in the
node are not uselessly written out. On ZONED drives, however, such
optimization blocks the following IOs as the cancellation of the write
out of the freed blocks breaks the sequential write sequence expected by
the device.
Check if the next dirty extent buffer is contiguous with the previously
written one. If not, redirty the extent buffers between the previous one
and the next one, so that all dirty buffers are written sequentially.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Conventional zones do not have a write pointer, so we cannot use it to
determine the allocation offset for sequential allocation if a block
group contains a conventional zone.
But instead, we can consider the end of the highest addressed extent in
the block group for the allocation offset.
For a new block group, we cannot calculate the allocation offset by
consulting the extent tree, because it can cause deadlock by taking
extent buffer lock after chunk mutex, which is already taken in
btrfs_make_block_group(). Since it is a new block group anyways, we can
simply set the allocation offset to 0.
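The rule can be sketched as below. This is a minimal Python
illustration, not the btrfs-progs code: conventional_alloc_offset() is
a hypothetical name and the extents are passed in as a plain list
instead of being looked up in the extent tree:

```python
def conventional_alloc_offset(bg_start, extents, is_new=False):
    """Return the allocation offset relative to the block group start.

    extents is an iterable of (start, length) logical byte ranges that
    lie inside the block group.
    """
    if is_new:
        # Nothing has been allocated yet, no lookup needed.
        return 0
    highest_end = max((start + length for start, length in extents),
                      default=bg_start)
    return highest_end - bg_start


# Two extents in a block group starting at 1000; the highest one ends
# at 1096.
print(conventional_alloc_offset(1000, [(1000, 16), (1064, 32)]))  # 96
```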
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>