The initial proposal for file attributes was built on simply doing
SETFLAGS but this builds on an old and non-extensible interface that has
no direct mapping for all inode flags. There's a unified interface
fileattr that covers file attributes and xflags, it should be possible
to add new bits.
On the protocol level the value is copied as-is in the original inode
but this does not provide enough information how to apply the bits on
the receiving side. Eg. IMMUTABLE flag prevents any changes to the file
and has to be handled manually.
The receiving side does not apply the bits yet, only parses it from the
stream.
Signed-off-by: David Sterba <dsterba@suse.com>
Add constant for initial value to avoid unexpected clashes with user
defined getopt values and shift the common size getopt values.
Signed-off-by: David Sterba <dsterba@suse.com>
In send stream v2, send can emit a command for setting inode flags via
the setflags ioctl. Pass the flags attribute through to the ioctl call
in receive.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Send stream v2 can emit fallocate commands, so receive must support them
as well. The implementation simply passes along the arguments to the
syscall. Note that mode is encoded as a u32 in send stream but fallocate
takes an int, so there is a unsigned->signed conversion there.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a new btrfs_send_op and support for both dumping and proper receive
processing which does actual encoded writes.
Encoded writes are only allowed on a file descriptor opened with an
extra flag that allows encoded writes, so we also add support for this
flag when opening or reusing a file for writing.
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
The new format privileges the BTRFS_SEND_A_DATA attribute by
guaranteeing it will always be the last attribute in any command that
needs it, and by implicitly encoding the data length as the difference
between the total command length in the command header and the sizes of
the rest of the attributes (and of course the tlv_type identifying the
DATA attribute). To parse the new stream, we must read the tlv_type and
if it is not DATA, we proceed normally, but if it is DATA, we don't
parse a tlv_len but simply compute the length.
In addition, we add some bounds checking when parsing each chunk of
data, as well as for the tlv_len itself.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
In send stream v2, write commands can now be an arbitrary size. For that
reason, we can no longer allocate a fixed array in sctx for read_cmd.
Instead, read_cmd dynamically allocates sctx->read_buf. To avoid
needless reallocations, we reuse read_buf between read_cmd calls by also
keeping track of the size of the allocated buffer in sctx->read_buf_sz.
We do the first allocation of the old default size at the start of
processing the stream, and we only reallocate if we encounter a command
that needs a larger buffer.
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
An encoded extent can be up to 128K in length, which exceeds the largest
value expressible by the current send stream format's 16 bit tlv_len
field. Since encoded writes cannot be split into multiple writes by
btrfs send, the send stream format must change to accommodate encoded
writes.
Supporting this changed format requires retooling how we store the
commands we have processed. We currently store pointers to the struct
btrfs_tlv_headers in the command buffer. This is not sufficient to
represent the new BTRFS_SEND_A_DATA format. Instead, parse the attribute
headers and store them in a new struct btrfs_send_attribute which has a
32bit length field. This is transparent to users of the various TLV_GET
macros.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Headers that are only exported and not used for build do not need the
BTRFS_FLAT_INCLUDES switch (between local and installed headers). Now
that there are local copies of the shared headers drop the respective
part from local headers.
Signed-off-by: David Sterba <dsterba@suse.com>
Kernel commit efc0e69c2fea ("btrfs: introduce exclusive operation
BALANCE_PAUSED state") allows to start a device add when there's a
paused balance, eg. to let the balance finish when there's not enough
chunk space. Add the support for that, though this needs an updated
kernel to export the 'balance paused' in sysfs.
Signed-off-by: David Sterba <dsterba@suse.com>
We need to make sure we process the block group root, and mark its
blocks as used for the free space tree checking.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The fuzz-test/003 was infinite looping when I reworked the code to
re-calculate the used bytes for the superblock. This is because fsck
wasn't properly fixing the bad extent before my change, it just happened
to error out nicely, whereas my change made it so we go the wrong bytes
used count and just infinite looped trying to fix the problem.
Fix this by sanity checking the extent when we try to re-calculate the
bytes_used. This makes us no longer infinite loop so we can get through
the fuzz tests.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We want to enable developers to test the extent tree v2 features as they
are added, add the ability to mkfs an extent tree v2 fs if we have
experimental enabled.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We could have multiple extent roots, so add a helper to mark all the
used space in the FS based on any extent roots we find, and then use
this extent io tree to fixup the block group accounting.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When we switch to multiple global trees we'll need to access the
appropriate extent root depending on the block group or possibly root.
To handle this, use a helper in most places and then the actual root in
places where it is required. We will whittle down the direct accessors
with future patches, but this does the bulk of the preparatory work.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_mark_used_tree_blocks skips the reloc roots for some reason, which
causes problems because these blocks are in use, and we use this helper
to determine if the block accounting is correct with extent tree v2.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is going to be used for the extent tree v2 stuff more commonly, so
move it out so that it is accessible from everywhere that we need it.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have this helper sitting in extent-tree.c, but it's a repair
function. I'm going to need to make changes to this for extent-tree-v2
and would rather this live outside of the code we need to share with the
kernel.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
mkfs.btrfs v5.15 outputs a message even if the disk is a HDD without
TRIM/DISCARD support:
Performing full device TRIM /dev/sdc2 (326.03GiB) ...
[CAUSE]
mkfs.btrfs check TRIM/DISCARD support through the content of
queue/discard_granularity, but compare it against a wrong value.
When HDD without TRIM/DISCARD support, the content of
queue/discard_granularity is '0' '\n' '\0', rather than '0' '\0'.
[FIX]
- compare the value based on atoi() to provide more robustness
- delete unnecessary '\n' in pr_verbose()
Fixes: c50c448518 ("btrfs-progs: do sysfs detection of device discard capability")
Signed-off-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are a lot of call sites where we use the following code snippet:
u8 super_block_data[BTRFS_SUPER_INFO_SIZE];
struct btrfs_super_block *sb;
u64 ret;
sb = (struct btrfs_super_block *)super_block_data;
The reason for this is, structure btrfs_super_block was smaller than
BTRFS_SUPER_INFO_SIZE.
Thus for anything with csum involved, we have to use a proper 4K buffer.
Since the recent unification of sizeof(struct btrfs_super_block), we no
longer need such workaround, and can use struct btrfs_super_block
directly to do any operation.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
Since commit dad03fac3b ("btrfs-progs: switch btrfs_group_profile_str
to use raid table"), fstests/btrfs/023 and btrfs/151 will always fail.
The failure of btrfs/151 explains the reason pretty well:
btrfs/151 1s ... - output mismatch
--- tests/btrfs/151.out 2019-10-22 15:18:14.068965341 +0800
+++ ~/xfstests-dev/results//btrfs/151.out.bad 2021-11-02 17:13:43.879999994 +0800
@@ -1,2 +1,2 @@
QA output created by 151
-Data, RAID1
+Data, raid1
...
(Run 'diff -u ~/xfstests-dev/tests/btrfs/151.out ~/xfstests-dev/results//btrfs/151.out.bad' to see the entire diff)
[CAUSE]
Commit dad03fac3b ("btrfs-progs: switch btrfs_group_profile_str to use
raid table") will use btrfs_raid_array[index].raid_name, which is all
lower case.
[FIX]
There is no need to bring such output format change.
So here we split the btrfs_raid_attr::raid_name[] into upper_name[] and
lower_name[], and make upper and lower case helpers for callers to use.
Now there are several types of callers referring to lower_name and
upper_name:
- parse_bg_profile()
It uses strcasecmp(), either case would be fine.
- btrfs_group_profile_str()
Originally it uses upper case for all profiles except "single".
Now unified to upper case.
- sprint_profiles()
It uses lower case.
- bg_flags_to_str()
It uses upper case.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Commit ("btrfs-progs: switch btrfs_group_profile_str to use raid table")
introduced a regression that raid profile of GlobalReserve will be
printed as 'unknown'.
$ btrfs filesystem df /mnt/test
Data, single: total=5.02TiB, used=4.98TiB
System, single: total=4.00MiB, used=624.00KiB
Metadata, single: total=11.01GiB, used=6.94GiB
GlobalReserve, unknown: total=512.00MiB, used=0.00B
Fix it by:
- take BTRFS_BLOCK_GROUP_RESERVED into account when masking the block
group flags
- update the define of BTRFS_BLOCK_GROUP_RESERVED too so it's same as in
kernel
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Kernel emits inode number for all mkfile/mkdir/... commands but the
receive part does not pass it to the callbacks. At least document that
and read it from the stream in case we'd like to use it in the future.
Signed-off-by: David Sterba <dsterba@suse.com>
Use the raid table helper to avoid hard coding profiles for the given
number of devices in test_num_disk_vs_raid.
Signed-off-by: David Sterba <dsterba@suse.com>
Another duplication of the raid table, in this case missing the changes
to raid10 and raid0 minimum devices changed in a177ef7dd444
("btrfs-progs: mkfs: allow degenerate raid0/raid10").
Define and use a helper using the table value.
Signed-off-by: David Sterba <dsterba@suse.com>
Wrap pread with btrfs_pread as well.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Wrap pwrite with btrfs_pwrite(). It simply calls pwrite() on non-zoned
btrfs (opened without O_DIRECT). On zoned mode (opened with O_DIRECT),
it allocates an aligned bounce buffer, copies the contents and uses it
for direct-IO writing.
Writes in device_zero_blocks() and btrfs_wipe_existing_sb() are a little
tricky. We don't have fs_info on our hands, so use zinfo to determine it
is a zoned device or not.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since we cannot create ext*/reiserfs on a zoned device, it is useless to
allow ZONED feature when converting a file system. Drop ZONED flag from
BTRFS_CONVERT_ALLOWED_FEATURES.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Due to ambiguity of error values in the public API subvol_uuid_search,
there was second version added. There's a separate copy in libbtrfs and
we don't have to distinguish in the internal code. All callers check for
IS_ERR and NULL, so we can safely merge the helpers into one.
Signed-off-by: David Sterba <dsterba@suse.com>
After removing uuid search fallback code the structure has become
trivial and copies the fd that all callers have in their context.
Signed-off-by: David Sterba <dsterba@suse.com>
After the uuid search fallback code has been removed, the finit helper
has become empty and can be removed.
Signed-off-by: David Sterba <dsterba@suse.com>
There's a lot of code under BTRFS_COMPAT_SEND_NO_UUID_TREE to support
kernels < 3.12 that don't have uuid tree and the subvolume uuids are
searched in a slow way, building the uuid tree on the userspace side.
As the uuid tree is always created, the fallback code was not exercised
anyway due to 'uuid_tree_existed' check in subvol_uuid_search2.
Delete the code from the internal copy of send-utils. The support still
stays for libbtrfs and will be removed in the future.
Signed-off-by: David Sterba <dsterba@suse.com>
The btrfs_list_* functions come with some overhead and for simple path
resolution we can use btrfs_subvolid_resolve.
Signed-off-by: David Sterba <dsterba@suse.com>
We don't need to include this besides btrfs-list.c itself and
subvolume.c that does use the btrfs_list_* API.
Signed-off-by: David Sterba <dsterba@suse.com>
The separate file was needed for libbtrfs in the past to avoid pulling
utils.c in, but this is not needed after recent cleanups.
Signed-off-by: David Sterba <dsterba@suse.com>
The free space tree is a better way to track the free space and has been
tested in the wild for a long time. The backward compatibility is
sufficient, several long term kernels. On-line conversion from v1 to v2
can be done by mount, switching from v2 to v1 can be done by 'btrfs
check'.
Issue: #295
Signed-off-by: David Sterba <dsterba@suse.com>
The no-holes feature reduces consumption of metadata by not representing
file holes. Reducing metadata is a good thing in general, this is the
main goal to enable this by default.
There's a drawback, related to the missing information about holes. The
'check' tool cannot use it to cross-reference extent information and in
some cases may not be able to detect a problem.
The no-hole feature can be also enabled by 'btrfstune -n' on an
unmounted filesystem.
Issue: #405
Signed-off-by: David Sterba <dsterba@suse.com>
The term 'path' is confusing as we normally use it for filesystem paths,
while for multipath it's more related to the physical path by which the
devices are connected (though it also shows up as another path in the
filesystem).
Rename the helper doing the multipath detection so it's clear what path
is meant by that.
Signed-off-by: David Sterba <dsterba@suse.com>
Since libudev doesn't provide a static version of the library for static
build btrfs-progs will have to provide manual fallback. This change does
this by parsing the udev database files hosted at /run/udev/data/.
Under that directory every block device should have a file with the
following name: bMAJ:MIN. So implement the bare minimum code necessary
to parse this file and search for the presence of DM_MULTIPATH_DEVICE_PATH
udev attribute. This could likely be racy since access to the udev
database is done outside of libudev but that's the best that can be
done when implementing this manually and is only for a limited usecase
where static build has to be used.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently btrfs-progs will happily enumerate any device which has a
btrfs filesystem on it irrespective of its type. For the majority of
use cases that's fine and there haven't been any problems with that.
However, there was a recent report that in multipath scenario when
running "btrfs fi show" after a path flap (path going down and then
coming back up) instead of the multipath device being show the device
which represents the flapped path is shown. So a multipath filesystem
might look like:
Label: none uuid: d3c1261f-18be-4015-9fef-6b35759dfdba
Total devices 1 FS bytes used 192.00KiB
devid 1 size 10.00GiB used 536.00MiB path /dev/mapper/3600140501cc1f49e5364f0093869c763
/dev/mapper/xxx is actually backed by an arbitrary number of paths,
which in turn are presented to the system as ordinary SCSI devices i.e
/dev/sdX. If a path flaps and a user re-runs 'btrfs fi show' the output
would look like:
Label: none uuid: d3c1261f-18be-4015-9fef-6b35759dfdba
Total devices 1 FS bytes used 192.00KiB
devid 1 size 10.00GiB used 536.00MiB path /dev/sdd
This only occurs on unmounted filesystems as those are enumerated by
btrfs-progs, for mounted filesystem the kernel properly deals only with
the actual multipath device.
Turns out the output of this command is consumed by libraries and the
presence of a path device rather than the actual multipath causes
issues.
Fix this by checking for the presence of DM_MULTIPATH_DEVICE_PATH
udev attribute as multipath path devices are tagged with this attribute
by the multipath udev scripts.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The detection of the discard status of a device is done by issuing a
real discard request but on an empty range. This works in most cases.
However there's a case of a VirtualBox driver that returns 'Operation
not supported' in that case, and then discard is skipped during mkfs.
The other tools like fstrim check the sysfs queue file
discard_granularity which is the recommended way. Do that as well.
Issue: #390
Signed-off-by: David Sterba <dsterba@suse.com>
We cannot zone reset a regular file with emulated zones. So, mkfs.btrfs
on such a file causes the following error.
ERROR: zoned: failed to reset device '/home/naota/tmp/btrfs.img' zones: Inappropriate ioctl for device
Introduce btrfs_zoned_device_info->emulated to distinguish the zones are
emulated or not. And, use it to decide it needs zone reset or not.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Reading partition size using an ioctl requires the device open, but that
does not work for unprivileged users. This leads to 0 size in device
info structures filled by device_get_partition_size.
As a consequence, this also misreports such devices as missing in 'fi
us' overview:
$ btrfs fi us /
WARNING: cannot read detailed chunk info, per-device usage will not be shown, run as root
Overall:
Device size: 411.35GiB
Device allocated: 53.01GiB
Device unallocated: 358.34GiB
Device missing: 411.35GiB
Used: 31.99GiB
Free (estimated): 379.16GiB (min: 379.16GiB)
Free (statfs, df): 379.35GiB
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 194.77MiB (used: 0.00B)
Multiple profiles: no
There should be 0 for 'Device missing'.
Add a fallback to read the device size from sysfs in case the ioctl is
not available.
Issue: #395
Signed-off-by: David Sterba <dsterba@suse.com>
The function btrfs_list_get_path_rootid is exported to libbtrfs so it
needs to stay, but we can inline the implementation.
Signed-off-by: David Sterba <dsterba@suse.com>
The header contains the protocol definitions and is almost exactly the
same as the kernel version, move it to the proper directory.
Signed-off-by: David Sterba <dsterba@suse.com>
The helper open codes what we already have in the raid attr table, so
use it. We assume a valid flags so there's no error value.
Signed-off-by: David Sterba <dsterba@suse.com>
We'll use plain qgroupid parsing function elsewhere so split that part
from parse_qgroupid_or_path. The parsing is slightly reworked and goes
from start to end, while previously it looked up the slash and worked
from there. In case a valid qgroupid is also a valid path, the path must
be specified as absolute.
Signed-off-by: David Sterba <dsterba@suse.com>
This helper can parse a qgroupid or a path, so rename it accordingly, so
a plain qgroupid parsing can be factored out as a standalone helper.
Signed-off-by: David Sterba <dsterba@suse.com>
Add the GPL v2 header to files where it was missing and is not from an
external source, update to the most recent version with the address.
Signed-off-by: David Sterba <dsterba@suse.com>
There are some duplicate parsers of the profile names, factor out the
one from balance to the common code.
Signed-off-by: David Sterba <dsterba@suse.com>
There are various parsing helpers scattered everywhere, unify them to
one file and start with helpers already in utils.c.
Signed-off-by: David Sterba <dsterba@suse.com>
Kernel patch b2f78e88052bc0bee ("btrfs: allow degenerate raid0/raid10")
in
5.15 will allow mounting and converting to single device raid0 or two
device raid10. Let mkfs create such filesystem.
"The motivation is to allow to preserve the profile type as long as it
possible for some intermediate state (device removal, conversion), or
when there are disks of different size, with raid0 the otherwise
unusable space of the last device will be used too. Similarly for
raid10, though the two largest devices would need to be the same."
Signed-off-by: David Sterba <dsterba@suse.com>
- Change it void
The old one always return csum_size.
- Use snprintf()
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Function btrfs_format_csum() is a special helper only used in
btrfs-progs.
Move it to common/utils.[ch] other than leaving it in
kernel-shared/disk-io.c.
Since we're moving the code, also introduce a macro,
BTRFS_CSUM_STRING_LEN, to replace open-coded string length calculation.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
check_running_fs_exclop() can return 1 when exclop is changed to "none"
The ret is set by the return value of the select() operation. Checking
the exclusive op changes just the exclop variable while ret is still
set to 1.
Set ret = 0 if exclop is set to BTRFS_EXCL_NONE or BTRFS_EXCL_UNKNOWN.
Remove unnecessary continue statement at the end of the block.
The command appears to have executed, but does not. This was found when
balance which typically reports chunks relocated did not print anything
on screen.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Sysfs hides the zone size of a block device in the queue/chunk_sectors
file, so add a helper that will read it for us when given the short
device name (that can be found in FSID/devices).
Signed-off-by: David Sterba <dsterba@suse.com>
There are several directories in /sys/fs/btrfs/FSID that contain more
than one file/directory. Add a helper to open the directory so that the
file descriptor can be used for fdopendir.
Signed-off-by: David Sterba <dsterba@suse.com>
Commit 8ef9313cf298 ("btrfs-progs: zoned: implement log-structured
superblock") changed to write BTRFS_SUPER_INFO_SIZE bytes to device.
The before num of bytes to be written is sectorsize.
It causes mkfs.btrfs failed on my 16k pagesize kvm:
$ /usr/bin/mkfs.btrfs -s 16k -f -mraid0 /dev/vdb2 /dev/vdb3
btrfs-progs v5.12
See http://btrfs.wiki.kernel.org for more information.
ERROR: superblock magic doesn't match
ERROR: superblock magic doesn't match
common/device-scan.c:195: btrfs_add_to_fsid: BUG_ON `ret != sectorsize`
triggered, value 1
/usr/bin/mkfs.btrfs(btrfs_add_to_fsid+0x274)[0xaaab4fe8a5fc]
/usr/bin/mkfs.btrfs(main+0x1188)[0xaaab4fe4dc8c]
/usr/lib/libc.so.6(__libc_start_main+0xe8)[0xffff7223c538]
/usr/bin/mkfs.btrfs(+0xc558)[0xaaab4fe4c558]
[1] 225842 abort (core dumped) /usr/bin/mkfs.btrfs -s 16k -f -mraid0
/dev/vdb2 /dev/vdb3
btrfs_add_to_fsid() now always calls sbwrite() to write
BTRFS_SUPER_INFO_SIZE bytes to device, so change condition of
the BUG_ON().
Also add comments for sbread() and sbwrite().
Signed-off-by: Su Yue <l@damenly.su>
Signed-off-by: David Sterba <dsterba@suse.com>
Compiling with clang produces warnings like:
common/utils.c:1670:31: warning: use of GNU 'missing =' extension in designator [-Wgnu-designator]
[BTRFS_EXCLOP_SWAP_ACTIVATE] "swap activate",
^
=
Use the form with '=' to fix the warning.
Signed-off-by: David Sterba <dsterba@suse.com>
Getting the per bg type zone unusable space will be used in other size
reports like 'fi us', so export it to the device utils.
Signed-off-by: David Sterba <dsterba@suse.com>
Move the file to common as it's used by several parts, while still
keeping the name 'repair' although the only thing it does is adding a
corrupted extent.
Signed-off-by: David Sterba <dsterba@suse.com>
There's a group of functions that are related to opening filesystem in
various modes, this can be moved to a separate file.
Signed-off-by: David Sterba <dsterba@suse.com>
Decrease dependency on system headers, remove where they're not needed
or became stale after code moved. The path-utils.h encapsulate path
operations so include linux/limits.h here, that's where PATH_MAX is
defined.
Signed-off-by: David Sterba <dsterba@suse.com>
The helper wraps a raw ioctl but some users may already have the fd and
not necessarily the path. Add a suitable helper for convenience.
Signed-off-by: David Sterba <dsterba@suse.com>
This helper hasn't been used since 63bbf2931df2 ("btrfs-progs: rework
calculations of fi usage") a few years ago and we don't need the statfs
based calculations anywhere.
Signed-off-by: David Sterba <dsterba@suse.com>
We cannot overwrite superblock magic in a sequential required zone.
Instead, we can reset the zone to wipe it.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If we zero out a region in a sequential write required zone, we cannot
write to the region until we reset the zone. Thus, we must prohibit zeroing
out to a sequential write required zone.
zero_dev_clamped() is modified to take the zone information and it calls
zero_zone_blocks() if the device is host managed to avoid writing to
sequential write required zones.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
All zones of zoned block devices should be reset before writing. Support
this by introducing PREP_DEVICE_ZONED.
btrfs_reset_all_zones() walk all the zones on a device, and reset a zone if
it is sequential required zone, or discard the zone range otherwise.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Superblock (and its copies) is the only data structure in btrfs which has a
fixed location on a device. Since we cannot overwrite in a sequential write
required zone, we cannot place superblock in the zone. One easy solution
is limiting superblock and copies to be placed only in conventional zones.
However, this method has two downsides: one is reduced number of superblock
copies. The location of the second copy of superblock is 256GB, which is in
a sequential write required zone on typical devices in the market today.
So, the number of superblock and copies is limited to be two. Second
downside is that we cannot support devices which have no conventional zones
at all.
To solve these two problems, we employ superblock log writing. It uses two
adjacent zones as a circular buffer to write updated superblocks. Once the
first zone is filled up, start writing into the second one. Then, when
both zones are filled up and before starting to write to the first zone
again, reset the first zone.
We can determine the position of the latest superblock by reading write
pointer information from a device. One corner case is when both zones are
full. For this situation, we read out the last superblock of each zone, and
compare them to determine which zone is older.
The following zones are reserved as the circular buffer on ZONED btrfs.
- primary superblock: offset 0B (and the following zone)
- first copy: offset 512G (and the following zone)
- Second copy: offset 4T (4096G, and the following zone)
If these reserved zones are conventional, superblock is written fixed at
the start of the zone without logging.
Currently, superblock reading/writing is done by pread/pwrite. This
commit replace the call sites with sbread/sbwrite to wrap the functions.
For zoned btrfs, btrfs_sb_io which is called from sbread/sbwrite
reverses the IO position back to a mirror number, maps the mirror number
into the superblock logging position, and do the IO.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Get the zone information (number of zones and zone size) from all the
devices, if the volume contains a zoned block device. To avoid costly
run-time zone report commands to test the device zones type during block
allocation, it also records all the zone status (zone type, write
pointer position, etc.).
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
With the zoned feature enabled, a zoned block device-aware btrfs
allocates block groups aligned to the device zones and always written in
sequential zones at the zone write pointer position.
It also supports "emulated" zoned mode on a non-zoned device. In the
emulated mode, btrfs emulates conventional zones by slicing the device
into fixed-size zones.
We don't support conversion from the ext4 volume with the zoned feature
because we can't be sure all the converted block groups are aligned to
zone boundaries.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Likewise in the kernel code, provide fs_info access from struct
btrfs_device. This will help to unify the code between the kernel and
the userland.
Since fs_info can be NULL at the time of btrfs_add_to_fsid(), let's use
btrfs_open_devices() to set fs_info to the devices.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce the queue_param helper function to get a device request queue
parameter. This helper will be used later to query information of a zoned
device.
Furthermore, rewrite is_ssd() using the helper function.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
[Naohiro] fixed error return value
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently mkfs.btrfs will output a warning message if the sectorsize is
not the same as page size:
WARNING: the filesystem may not be mountable, sectorsize 4096 doesn't match page size 65536
But since btrfs subpage support for 64K page size is coming, this output
is populating the golden output of fstests, causing tons of false
alerts.
This patch will teach mkfs.btrfs to check
/sys/fs/btrfs/features/supported_sectorsizes and check if the sector
size is supported.
Then only output above warning message if the sector size is not
supported or the file is not found (ie. kernel does not export the file
yet).
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Line continuations and not simple "\n" for the json output, this got
inherited to the plain text output, but this is not necessary.
This also caused problems in fstests btrfs/006 where the extra newline
does not match the golden output and the test fails, when printing
device stats that now use the output formatter.
Change the plain text formatting to always expect that a fmt_print or a
manual line print (like is for the device stats) will append the newline
and remove it from the end of formatting.
Link: https://lore.kernel.org/linux-btrfs/CAL3q7H4b7QhL02aSOpN0-k_9P2EAbj1t+NkA6VwidKEg4S996w@mail.gmail.com
Reported-by: Filipe Manana <fdmanana@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The current mount detection code in btrfs receive is not quite perfect.
For example, suppose /tmp is mounted as a tmpfs. In that case,
btrfs receive /tmp2 will find /tmp as the longest mount that matches a
prefix of /tmp2 and blow up because it is not a btrfs filesystem, even
if /tmp2 is just a directory in / mounted as btrfs.
Fix this by replacing the substring check with a dirname recursion to
only check the directories in the path of the dir, rather than every
substring.
Add a new test for this case.
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_open_dir already has a check whether the passed path is a
directory and if so it returns a specific error code (-3) when such an
error occurs. Use this instead of open-coding the directory check. To
avoid regression in cli/003 test also move directory checks before fs
type in btrfs_open.
Output before this check:
ERROR: resize works on mounted filesystems and accepts only
directories as argument. Passing file containing a btrfs image
would resize the underlying filesystem instead of the image.
After:
ERROR: not a directory: /root/btrfs-progs/tests/test.img
ERROR: resize works on mounted filesystems and accepts only
directories as argument. Passing file containing a btrfs image
would resize the underlying filesystem instead of the image.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Partial revert of 922eaa7b5472 ("btrfs-progs: build: fix linking with
static libmount"), remove the necessary workarounds like the weak
symbols and link time warnings. Symbols renamed not to clash with
libmount (parse_size, canonicalize_path) haven't been reverted because
the new names are acceptable.
Signed-off-by: David Sterba <dsterba@suse.com>