With the change of minimal number of zones, mkfs-tests/030-zoned-rst now
fails because the loopback device is 2GB and can contain 8x 256MB zones.
Use the nullb helpers to choose a smaller zone size.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add test for mkfs.btrfs zone reset behavior to check if
- it resets all the zones without "-b" option
- it detects an active zone outside of the FS range
- it do not reset a zone outside of the range
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Even with "mkfs.btrfs -b", mkfs.btrfs resets all the zones on the device.
Limit the reset target within the specified length.
Also, we need to check that there is no active zone outside of the FS
range. Having an active zone outside FS reduces the number of zones btrfs
can write simultaneously. Technically, we can still scan all the device
zones and keep active zones outside FS intact and try to live with the
limited active zones. But, that will make btrfs operations harder.
It is generally bad idea to use "-b" on a non-test usage on a device with
active zone limit in the first place. You really need to take care that FS
and outside the FS goes over the limit. That means you'll never be able to
use zones outside the FS anyway.
So, until there is a strong request for that, I don't think it's worthwhile
to do so.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While "byte_count" is eventually rounded down to sectorsize at make_btrfs()
or btrfs_add_to_fs_id(), it would be better round it down first and do the
size checks not to confuse the things.
Also, on a zoned device, creating a filesystem whose size is not aligned
to the zone boundary can be confusing. Round it down further to the zone
boundary.
The size calculation with a source directory is also tweaked to be aligned.
device_get_partition_size_fd_stat() must be aligned down not to exceed the
device size. And, btrfs_mkfs_size_dir() should have return sectorsize aligned
size. So, add an UASSERT for it.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently, we check if a device is larger than 5 zones to determine we can
create btrfs on the device or not. Actually, we need more zones to create
DUP block groups, so it fails with "ERROR: not enough free space to
allocate chunk". Implement proper support for non-SINGLE profile.
Also, current code does not ensure we can create tree-log BG and data
relocation BG, which are essential for the real usage. Count them as
requirement too.
The calculation for a regular btrfs is also adjusted to use dev_stripes
style.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
We are going to implement a better minimum size calculation for the zoned
mode. Move the current logic to btrfs_min_dev_size() and unify the size
checking path.
Also, convert "int mixed" to "bool mixed" while at it.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
test_minimum_size() already checks if each device can host the initial
block groups. There is no need to check if the first device can host the
initial system chunk again.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
block_count and dev_block_count are counting the size in bytes. And,
comparing them with e.g, "min_dev_size" is confusing. Rename them to
represent the unit better.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The dependency generation and change tracking was completely broken.
Partly after changes to move the generated files to .deps/.
- the generic rule did work as it could not reflect that dependency file
is not next to the source file
- inclusion of dependency files directive never found them because the
list was not updated to look to .deps, also there were some ancient
files mentioned
- the dependency generation commands were run before each target,
slowing down build
How it works now:
- dependencies are created before each .c file is compiled, so that way
any changes to the file get translated to the dependency files
- missing dependencies are create after first run
- there's single dependency file for all build types (box, static, both)
- same as before, 'make clean' will also delete the dependency files
Signed-off-by: David Sterba <dsterba@suse.com>
After patch "btrfs-progs: qgroup: handle stale qgroup deletion more
accurately" cleaning stale qgroups may not happen if the subvolume
cleaning is still in progress. Update the test so it's able to handle
that.
Signed-off-by: David Sterba <dsterba@suse.com>
Run tests with enabled sanitizers. There are still known problems with
leaks that will make the whole fail. This needs to be fixed before the
workflow can be enabled for devel or master.
Signed-off-by: David Sterba <dsterba@suse.com>
Use unaligned access helper for code that potentially or actually
accesses data that come from on-disk structures. This is for image or
chunk restore. This may pessimize some cases but is in general safer on
strict alignment architectures and has no effect on other architectures.
Related issue #770.
Signed-off-by: David Sterba <dsterba@suse.com>
The search header is used for extracting data from buffer returned by
the SEARCH_TREE ioctl and needs special access helpers as there are no
guarantees about alignment.
With -fsanitize=alignment this still leads to an error because address
of the members is taken, regardless of the unaligned access method is
used (both temporary memcpy to a structure or the packed struct cast).
Add another hint to compiler that the structure is special and add the
packed attribute. This fixes the sanitizer error.
Signed-off-by: David Sterba <dsterba@suse.com>
There's a lot of places with unsafe access to data that come from a
search buffer, which is packed and the structures there are not
guaranteed to be aligned, also accessing the on-disk format structures.
- search header - this is an in-memory buffer with a series of on-disk
structures, no alignment must be assumed
- anything that's not a byte buffer must be accessed as an unaligned
buffer (the exceptions are name-like buffers)
Signed-off-by: David Sterba <dsterba@suse.com>
We will need generic helpers for unaligned access with LE->CPU
conversion, so add them. Should be use for potentially unaligned read
from tree search buffer.
Signed-off-by: David Sterba <dsterba@suse.com>
Fix parsing of send stream, properly access potentially unaligned data.
This can happen on hosts with strict alignment (ARM v5 or v6).
Related issue: #770
Signed-off-by: David Sterba <dsterba@suse.com>
There will be an unplanned update to libbtrfs (fixing send/receive
stream parsing and unaligned data access). The current ABI is frozen and
won't change but at least the patch level should change. Update the build
to create all links up to the major.minor.patch. Until now it was just
major.minor:
- libbtrfs.so -> libbtrfs.so.0.1.2
- libbtrfs.so.0 -> libbtrfs.so.0.1.2
- libbtrfs.so.0.1 -> libbtrfs.so.0.1.2
- libbtrfs.so.0.1.2
Signed-off-by: David Sterba <dsterba@suse.com>
Same fix a previous commit, unaligned access on strict alignment hosts
could produce wrong results (reported on send/receive and arm5). As
libbtrfs has own copy of the code fix it here too, replacing leXX_to_cpu
with get_unaligned_leXX where appropriate. This means any access to raw
buffers that get cast to a structure.
Issue: #770
Signed-off-by: David Sterba <dsterba@suse.com>
There's a report:
ERROR: Failed to send/receive subvolume: .../testbackup.20240330T1102 -> .../testbackup.20240330T1102
ERROR: ... Command execution failed (exitcode=1)
ERROR: ... sh: btrfs send '.../testbackup.20240330T1102' | ssh user@host.lan 'sudo -n btrfs receive '\''...'\'''
ERROR: ... invalid tlv in cmd tlv_type = 816
This is send/receive between arm64 and armv5el hosts, with btrfs-progs
6.2.1. Last known working version is 5.16. This looked like another
custom protocol extension by NAS vendors but this was a false trace and
this is indeed a bug in stream parsing after changes to the v2 protocol.
The most likely explanation is that the armv5 host requires strict
alignment for reads (32bit type must be 4 byte aligned) but the way the
raw data buffer is mapped to the cmd structure in read_cmd() does not
guarantee that.
Issue: #770
Fixes: aa1ca3789e ("btrfs-progs: receive: support v2 send stream DATA tlv format")
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a bug report that, with very weird mount status, there can be
some mount source which can not be accessed:
/path/dev/exports fs 500G 57G 444G 12% /path/dev/exports
Strace shows we can not access the above mount source:
131065 stat("/path/dev/exports", 0x7ffed17b8e20) = -1 EACCES (Permission denied)
And lead to failed mount check:
131065 write(2, "ERROR: ", 7) = 7
131065 write(2, "cannot check mount status of /de"..., 56) = 56
131065 write(2, "\n", 1) = 1
[CAUSE]
The mount check is based on libblkid, which gives the mount source, and
for non-btrfs mounts, we call path_is_reg_or_block_device() to check if
we even need to continue checking.
But in above case, the mount source is another fs, and we can not access
the source.
So we error out causing the check_mounted() to return error.
[FIX]
There is never any guarantee we can access the mount source, but on the
other hand, I do not want to ignore all access failure for the mount
source.
Let test_status_for_mkfs() to only skip check_mounted() error if
@force_overwrite is true.
This would still keep the old strict checks on whether the target is
already mounted, but if the end user really knows that certain mount
source do not need to be checked, they can always pass "-f" option to
skip the false alerts.
Link: https://bugzilla.suse.com/show_bug.cgi?id=1223799
Reported-by: Jiri Belka <jiri.belka@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Recent patches updated stale qgroup handling, using 'unlinked' and
'dropped' where we otherwise use 'deleted' and 'cleaned'.
Signed-off-by: David Sterba <dsterba@suse.com>
Currently `btrfs qgroup show` command shows any 0 level qgroup without a
root backref as `<stale>`, which is not correct.
There are several more cases:
- Under deletion
The subvolume is not yet full dropped, but unlinked.
In that case we would not have a root backref item, but the qgroup is
not stale.
- Squota space holder
This is for squota mode, that a fully dropped subvolume still have
extents accounting on the already-gone subvolume.
In this case it's not stale either, and future accounting relies on
it.
This patch would add above special cases, and add an extra `SPECIAL
PATHS` section to explain all the cases, including `<stale>`.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The current stale qgroup deletion doesn't handle the following cases at
all:
- It doesn't detect stale qgroups correctly
The current check is using the root backref, which means unlinked but
not yet fully dropped subvolumes would mark its corresponding qgroups
stale.
This is incorrect. The real stale check should be based on the root
item, not root backref.
- Squota non-empty but stale qgroups
Such qgroups can not and should not be deleted, as future accounting
still require them.
- Full accounting mode, stale qgroups but not empty
Since qgroup numbers are inconsistent already, it's common to have
such stale qgroups with non-zero numbers.
Now it's dependent on the kernel to determine whether such qgroup can
be deleted.
Address the above problems:
- Do root_item based detection
So that btrfs_qgroup::stale would properly indicate if there is a
subvolume root item for the qgroup.
- Do not attempt to delete squota stale but non-empty qgroups
- Attempt to delete stale but non-empty qgroups for full accounting mode
And deletion failure would not count as an error.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This allows the users to identify if the running qgroup mode and whether
the numbers are already inconsistent.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since qgroup numbers are only updated at transaction commit time, it's
better to do a sync before reading the quota tree, to reduce the chance
of uncommitted qgroup changes.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The -m | -M option for btrfstune, sounds like metadata_uuid is being
changed which is wrong. The fsid is being changed the original fsid is
being copied into the metadata_uuid. So update the help text.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
For simple quota mode btrfs, dump tree does not show the extra flags
correctly:
# mkfs.btrfs -f -O squota $dev
# btrfs inspect dump-tree -t quota $dev | grep QGROUP_STATUS -A1
item 0 key (0 QGROUP_STATUS 0) itemoff 16243 itemsize 40
version 1 generation 10 flags ON scan 0 enable_gen 7
Note just ON is shown, but squota has one extra bit set for it.
[CAUSE]
Just no support for the new flag.
[FIX]
Add the new flag support, also to be consistent with other flags string
output, add output for extra unknown flags.
With a hand crafted image, the output with unknown flags looks like
this:
item 0 key (0 QGROUP_STATUS 0) itemoff 16243 itemsize 40
version 1 generation 10 flags ON|SIMPLE_MODE|UNKNOWN(0xf00) scan 0 enable_gen 7
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The new test case would contain a file system image, with the following
file:
item 4 key (257 INODE_ITEM 0) itemoff 15883 itemsize 160
generation 7 transid 8 size 16384 nbytes 16384
block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
sequence 258 flags 0x0(none)
atime 1714635006.328482575 (2024-05-02 17:00:06)
ctime 1714635013.394980640 (2024-05-02 17:00:13)
mtime 1714635013.394980640 (2024-05-02 17:00:13)
otime 1714635006.328482575 (2024-05-02 17:00:06)
item 5 key (257 INODE_REF 256) itemoff 15869 itemsize 14
index 2 namelen 4 name: file
item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
generation 7 type 1 (regular)
extent data disk byte 13631488 nr 1048576
extent data offset 0 nr 16384 ram 16384
extent compression 0 (none)
Note the ram bytes, which should be 1048576.
Furthermore, the inode size is truncated to 16K (originally 1M), so that
offset + num_bytes would still be no larger than ram_bytes.
So the only error is the mismatch between ram_bytes and disk_num_bytes
for the non-compressed data extent.
The image is hand crafted for now, as btrfs-corrupt-block doesn't not yet
support corrupting the ram_bytes field.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For non-compressed non-hole file extent items, the ram_bytes should
match disk_num_bytes.
But due to kernel bugs, we have several cases where ram_bytes is not
correctly updated.
Thankfully this is really a very minor mismatch and can never cause data
corruption since the kernel does not utilize ram_bytes for
non-compressed extents at all.
So here we just detect and repair it for original mode.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For non-compressed non-hole file extent items, the ram_bytes should
match disk_num_bytes.
But due to kernel bugs, we have several cases where ram_bytes is not
correctly updated.
Thankfully this is really a very minor mismatch and can never cause data
corruption since the kernel does not utilize ram_bytes for
non-compressed extents at all.
So here we just detect and repair it for lowmem mode.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is a new release candidate for e2fsprogs https://github.com/tytso/e2fsprogs/releases/tag/v1.47.1-rc2
Linking btrfs-progs v6.8 against this version of e2fsprogs leads to the following compile error:
convert/source-ext2.c: In function 'ext4_copy_inode_timespec_extra':
convert/source-ext2.c:733:13: warning: implicit declaration of function 'inode_includes' [-Wimplicit-function-declaration]
733 | if (inode_includes(inode_size, i_ ## xtime ## _extra)) { \
| ^~~~~~~~~~~~~~
convert/source-ext2.c:769:9: note: in expansion of macro 'EXT4_COPY_XTIME'
769 | EXT4_COPY_XTIME(atime, dst, tv_sec, tv_nsec);
| ^~~~~~~~~~~~~~~
convert/source-ext2.c:733:40: error: 'i_atime_extra' undeclared (first use in this function)
733 | if (inode_includes(inode_size, i_ ## xtime ## _extra)) { \
| ^~
convert/source-ext2.c:769:9: note: in expansion of macro 'EXT4_COPY_XTIME'
769 | EXT4_COPY_XTIME(atime, dst, tv_sec, tv_nsec);
| ^~~~~~~~~~~~~~~
convert/source-ext2.c:733:40: note: each undeclared identifier is reported only once for each function it appears in
733 | if (inode_includes(inode_size, i_ ## xtime ## _extra)) { \
| ^~
convert/source-ext2.c:769:9: note: in expansion of macro 'EXT4_COPY_XTIME'
769 | EXT4_COPY_XTIME(atime, dst, tv_sec, tv_nsec);
| ^~~~~~~~~~~~~~~
convert/source-ext2.c:733:40: error: 'i_mtime_extra' undeclared (first use in this function)
733 | if (inode_includes(inode_size, i_ ## xtime ## _extra)) { \
| ^~
convert/source-ext2.c:770:9: note: in expansion of macro 'EXT4_COPY_XTIME'
770 | EXT4_COPY_XTIME(mtime, dst, tv_sec, tv_nsec);
| ^~~~~~~~~~~~~~~
convert/source-ext2.c:733:40: error: 'i_ctime_extra' undeclared (first use in this function)
733 | if (inode_includes(inode_size, i_ ## xtime ## _extra)) { \
| ^~
convert/source-ext2.c:771:9: note: in expansion of macro 'EXT4_COPY_XTIME'
771 | EXT4_COPY_XTIME(ctime, dst, tv_sec, tv_nsec);
| ^~~~~~~~~~~~~~~
convert/source-ext2.c:774:40: error: 'i_crtime_extra' undeclared (first use in this function)
774 | if (inode_includes(inode_size, i_crtime_extra)) {
| ^~~~~~~~~~~~~~
from tytso/e2fsprogs@ca8bc92
Fix inode_includes() macro to properly wrap "inode" parameter,
and rename to ext2fs_inode_includes() to avoid potential name
clashes. Use this to check inode field inclusion in debugfs
instead of bare constants for inode field offsets.
To fix that use the new prefixed macro and add backward compatibility that
would still use inode_includes().
Issue: #785
Signed-off-by: David Sterba <dsterba@suse.com>
The inode_cache functionality is long gone and the 'rescue' group
provides the clearning functionality, no point keeping it in check.
Move the --clear-space-cache option to the deprecaeted section so it can
be removed soon.
Signed-off-by: David Sterba <dsterba@suse.com>
The test case relies on `--delete-qgroup` option which has been removed.
The feature needs to be implemented in kernel as it's more complex than
just calling an ioctl. The test case does not take the complexity of
subvolume dropping into consideration and only tested the simplest
cases.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This reverts commit 9da773aa46.
There are several problems related to the --delete-qgroup option:
- Currently kernel doesn't allow to delete non-empty qgroups
- A qgroup can only be empty after fully dropped and a transaction is
committed
The tool doesn't take either factor into consideration
- Things like drop_subtree_threshold or other operations can mark qgroup
inconsistent and skip accounting
This can mean the target qgroup will never be empty until next rescan
On the other hand, even we do it the proper way, it would hugely delay
the command (wait until the subvolume to be cleaned).
Furthermore, even if the waiting is handled properly,
drop_subtree_threshold can still prevent us deleting the qgroup (qgroup
numbers are inconsistent, and accounting is skipped completely).
So the qgroup cleanup needs kernel to make it work properly.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
Since commit 8049446bb0 ("btrfs-progs: docs: placeholder for
contents.rst file on older sphinx version"), on systems with much newer
sphinx-build, "make" would not work for Documentation directory:
$ make clean-all && ./autogen.sh && ./configure --prefix=/usr/ && make -j12
$ ls -alh Documentation/_build
ls: cannot access 'Documentation/_build': No such file or directory
The sphinx-build has a much newer version:
$ sphinx-build --version
sphinx-build 7.2.6
[CAUSE]
On systems which don't need the workaround, the phony target of
contents.rst seems to cause a dependency loop:
GNU Make 4.4.1
Built for x86_64-pc-linux-gnu
Copyright (C) 1988-2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Reading makefiles...
Reading makefile 'Makefile'...
Updating makefiles....
Considering target file 'Makefile'.
Looking for an implicit rule for 'Makefile'.
Trying pattern rule '%:' with stem 'Makefile'.
Found implicit rule '%:' for 'Makefile'.
Finished prerequisites of target file 'Makefile'.
No need to remake target 'Makefile'.
Updating goal targets....
Considering target file 'contents.rst'.
File 'contents.rst' does not exist.
Finished prerequisites of target file 'contents.rst'.
Must remake target 'contents.rst'.
Makefile:35: update target 'contents.rst' due to: target is .PHONY
if [ "$(sphinx-build --version | cut -d' ' -f2)" \< "1.7.7" ]; then \
touch contents.rst; \
fi
Putting child 0x64ee81960130 (contents.rst) PID 66833 on the chain.
Live child 0x64ee81960130 (contents.rst) PID 66833
Reaping winning child 0x64ee81960130 PID 66833
Removing child 0x64ee81960130 PID 66833 from chain.
Successfully remade target file 'contents.rst'.
All the default make doing is just try to generate contents.rst, but
since we have much newer version, we won't generate the file at all.
[FIX]
Instead of a phony target, just move the contents.rst generation into
man page target so that we won't cause loop target on contents.rst.
Fixes: 8049446bb0 ("btrfs-progs: docs: placeholder for contents.rst file on older sphinx version")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add some missing entries. Changes to supported levels:
- increase to 6.8 from 6.7 where applicable, there were fixes to squota
and temp-fsid
- raid-stripe-tree declares support from 6.7, however this is still
behind CONFIG_BTRFS_DEBUG option in kernel, there are some bugs
and the known lack of RAID56 support
[ci skip]
Signed-off-by: David Sterba <dsterba@suse.com>
Use the objectid, type, offset natural order as it's more readable and
we're used to read keys like that.
Signed-off-by: David Sterba <dsterba@suse.com>
A user who wants to shrink a btrfs filesystem within some other logical
device (like a partition) will likely want to adapt the size of the
underlying device, too.
This commit adds documentation that describes how the length of the
portion that btrfs uses of some device can be found out.
Thanks go out to Roman Mamedov <rm@romanrm.net> for hinting `btrfs
filesystem show` as alternative command.
Note: the granularity is one sectorsize and the input values are
silently rounded down to avoid bugs from converted filesystems that
would not adhere to the native btrfs constraints.
[ci skip]
Pull-request: #775
Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
Signed-off-by: David Sterba <dsterba@suse.com>