The helper open codes what we already have in the raid attr table, so
use it. We assume a valid flags so there's no error value.
Signed-off-by: David Sterba <dsterba@suse.com>
We'll use plain qgroupid parsing function elsewhere so split that part
from parse_qgroupid_or_path. The parsing is slightly reworked and goes
from start to end, while previously it looked up the slash and worked
from there. In case a valid qgroupid is also a valid path, the path must
be specified as absolute.
Signed-off-by: David Sterba <dsterba@suse.com>
This helper can parse a qgroupid or a path, so rename it accordingly, so
a plain qgroupid parsing can be factored out as a standalone helper.
Signed-off-by: David Sterba <dsterba@suse.com>
Add the GPL v2 header to files where it was missing and is not from an
external source, update to the most recent version with the address.
Signed-off-by: David Sterba <dsterba@suse.com>
There are some duplicate parsers of the profile names, factor out the
one from balance to the common code.
Signed-off-by: David Sterba <dsterba@suse.com>
There are various parsing helpers scattered everywhere, unify them to
one file and start with helpers already in utils.c.
Signed-off-by: David Sterba <dsterba@suse.com>
Kernel patch b2f78e88052bc0bee ("btrfs: allow degenerate raid0/raid10")
in
5.15 will allow mounting and converting to single device raid0 or two
device raid10. Let mkfs create such filesystem.
"The motivation is to allow to preserve the profile type as long as it
possible for some intermediate state (device removal, conversion), or
when there are disks of different size, with raid0 the otherwise
unusable space of the last device will be used too. Similarly for
raid10, though the two largest devices would need to be the same."
Signed-off-by: David Sterba <dsterba@suse.com>
- Change it void
The old one always return csum_size.
- Use snprintf()
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Function btrfs_format_csum() is a special helper only used in
btrfs-progs.
Move it to common/utils.[ch] other than leaving it in
kernel-shared/disk-io.c.
Since we're moving the code, also introduce a macro,
BTRFS_CSUM_STRING_LEN, to replace open-coded string length calculation.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
check_running_fs_exclop() can return 1 when exclop is changed to "none"
The ret is set by the return value of the select() operation. Checking
the exclusive op changes just the exclop variable while ret is still
set to 1.
Set ret = 0 if exclop is set to BTRFS_EXCL_NONE or BTRFS_EXCL_UNKNOWN.
Remove unnecessary continue statement at the end of the block.
The command appears to have executed, but does not. This was found when
balance which typically reports chunks relocated did not print anything
on screen.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Sysfs hides the zone size of a block device in the queue/chunk_sectors
file, so add a helper that will read it for us when given the short
device name (that can be found in FSID/devices).
Signed-off-by: David Sterba <dsterba@suse.com>
There are several directories in /sys/fs/btrfs/FSID that contain more
than one file/directory. Add a helper to open the directory so that the
file descriptor can be used for fdopendir.
Signed-off-by: David Sterba <dsterba@suse.com>
Commit 8ef9313cf2 ("btrfs-progs: zoned: implement log-structured
superblock") changed to write BTRFS_SUPER_INFO_SIZE bytes to device.
The before num of bytes to be written is sectorsize.
It causes mkfs.btrfs failed on my 16k pagesize kvm:
$ /usr/bin/mkfs.btrfs -s 16k -f -mraid0 /dev/vdb2 /dev/vdb3
btrfs-progs v5.12
See http://btrfs.wiki.kernel.org for more information.
ERROR: superblock magic doesn't match
ERROR: superblock magic doesn't match
common/device-scan.c:195: btrfs_add_to_fsid: BUG_ON `ret != sectorsize`
triggered, value 1
/usr/bin/mkfs.btrfs(btrfs_add_to_fsid+0x274)[0xaaab4fe8a5fc]
/usr/bin/mkfs.btrfs(main+0x1188)[0xaaab4fe4dc8c]
/usr/lib/libc.so.6(__libc_start_main+0xe8)[0xffff7223c538]
/usr/bin/mkfs.btrfs(+0xc558)[0xaaab4fe4c558]
[1] 225842 abort (core dumped) /usr/bin/mkfs.btrfs -s 16k -f -mraid0
/dev/vdb2 /dev/vdb3
btrfs_add_to_fsid() now always calls sbwrite() to write
BTRFS_SUPER_INFO_SIZE bytes to device, so change condition of
the BUG_ON().
Also add comments for sbread() and sbwrite().
Signed-off-by: Su Yue <l@damenly.su>
Signed-off-by: David Sterba <dsterba@suse.com>
Compiling with clang produces warnings like:
common/utils.c:1670:31: warning: use of GNU 'missing =' extension in designator [-Wgnu-designator]
[BTRFS_EXCLOP_SWAP_ACTIVATE] "swap activate",
^
=
Use the form with '=' to fix the warning.
Signed-off-by: David Sterba <dsterba@suse.com>
Getting the per bg type zone unusable space will be used in other size
reports like 'fi us', so export it to the device utils.
Signed-off-by: David Sterba <dsterba@suse.com>
Move the file to common as it's used by several parts, while still
keeping the name 'repair' although the only thing it does is adding a
corrupted extent.
Signed-off-by: David Sterba <dsterba@suse.com>
There's a group of functions that are related to opening filesystem in
various modes, this can be moved to a separate file.
Signed-off-by: David Sterba <dsterba@suse.com>
Decrease dependency on system headers, remove where they're not needed
or became stale after code moved. The path-utils.h encapsulate path
operations so include linux/limits.h here, that's where PATH_MAX is
defined.
Signed-off-by: David Sterba <dsterba@suse.com>
The helper wraps a raw ioctl but some users may already have the fd and
not necessarily the path. Add a suitable helper for convenience.
Signed-off-by: David Sterba <dsterba@suse.com>
This helper hasn't been used since 63bbf2931d ("btrfs-progs: rework
calculations of fi usage") a few years ago and we don't need the statfs
based calculations anywhere.
Signed-off-by: David Sterba <dsterba@suse.com>
We cannot overwrite superblock magic in a sequential required zone.
Instead, we can reset the zone to wipe it.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If we zero out a region in a sequential write required zone, we cannot
write to the region until we reset the zone. Thus, we must prohibit zeroing
out to a sequential write required zone.
zero_dev_clamped() is modified to take the zone information and it calls
zero_zone_blocks() if the device is host managed to avoid writing to
sequential write required zones.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
All zones of zoned block devices should be reset before writing. Support
this by introducing PREP_DEVICE_ZONED.
btrfs_reset_all_zones() walk all the zones on a device, and reset a zone if
it is sequential required zone, or discard the zone range otherwise.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Superblock (and its copies) is the only data structure in btrfs which has a
fixed location on a device. Since we cannot overwrite in a sequential write
required zone, we cannot place superblock in the zone. One easy solution
is limiting superblock and copies to be placed only in conventional zones.
However, this method has two downsides: one is reduced number of superblock
copies. The location of the second copy of superblock is 256GB, which is in
a sequential write required zone on typical devices in the market today.
So, the number of superblock and copies is limited to be two. Second
downside is that we cannot support devices which have no conventional zones
at all.
To solve these two problems, we employ superblock log writing. It uses two
adjacent zones as a circular buffer to write updated superblocks. Once the
first zone is filled up, start writing into the second one. Then, when
both zones are filled up and before starting to write to the first zone
again, reset the first zone.
We can determine the position of the latest superblock by reading write
pointer information from a device. One corner case is when both zones are
full. For this situation, we read out the last superblock of each zone, and
compare them to determine which zone is older.
The following zones are reserved as the circular buffer on ZONED btrfs.
- primary superblock: offset 0B (and the following zone)
- first copy: offset 512G (and the following zone)
- Second copy: offset 4T (4096G, and the following zone)
If these reserved zones are conventional, superblock is written fixed at
the start of the zone without logging.
Currently, superblock reading/writing is done by pread/pwrite. This
commit replace the call sites with sbread/sbwrite to wrap the functions.
For zoned btrfs, btrfs_sb_io which is called from sbread/sbwrite
reverses the IO position back to a mirror number, maps the mirror number
into the superblock logging position, and do the IO.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Get the zone information (number of zones and zone size) from all the
devices, if the volume contains a zoned block device. To avoid costly
run-time zone report commands to test the device zones type during block
allocation, it also records all the zone status (zone type, write
pointer position, etc.).
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
With the zoned feature enabled, a zoned block device-aware btrfs
allocates block groups aligned to the device zones and always written in
sequential zones at the zone write pointer position.
It also supports "emulated" zoned mode on a non-zoned device. In the
emulated mode, btrfs emulates conventional zones by slicing the device
into fixed-size zones.
We don't support conversion from the ext4 volume with the zoned feature
because we can't be sure all the converted block groups are aligned to
zone boundaries.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Likewise in the kernel code, provide fs_info access from struct
btrfs_device. This will help to unify the code between the kernel and
the userland.
Since fs_info can be NULL at the time of btrfs_add_to_fsid(), let's use
btrfs_open_devices() to set fs_info to the devices.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce the queue_param helper function to get a device request queue
parameter. This helper will be used later to query information of a zoned
device.
Furthermore, rewrite is_ssd() using the helper function.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
[Naohiro] fixed error return value
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently mkfs.btrfs will output a warning message if the sectorsize is
not the same as page size:
WARNING: the filesystem may not be mountable, sectorsize 4096 doesn't match page size 65536
But since btrfs subpage support for 64K page size is coming, this output
is populating the golden output of fstests, causing tons of false
alerts.
This patch will teach mkfs.btrfs to check
/sys/fs/btrfs/features/supported_sectorsizes and check if the sector
size is supported.
Then only output above warning message if the sector size is not
supported or the file is not found (ie. kernel does not export the file
yet).
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Line continuations and not simple "\n" for the json output, this got
inherited to the plain text output, but this is not necessary.
This also caused problems in fstests btrfs/006 where the extra newline
does not match the golden output and the test fails, when printing
device stats that now use the output formatter.
Change the plain text formatting to always expect that a fmt_print or a
manual line print (like is for the device stats) will append the newline
and remove it from the end of formatting.
Link: https://lore.kernel.org/linux-btrfs/CAL3q7H4b7QhL02aSOpN0-k_9P2EAbj1t+NkA6VwidKEg4S996w@mail.gmail.com
Reported-by: Filipe Manana <fdmanana@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The current mount detection code in btrfs receive is not quite perfect.
For example, suppose /tmp is mounted as a tmpfs. In that case,
btrfs receive /tmp2 will find /tmp as the longest mount that matches a
prefix of /tmp2 and blow up because it is not a btrfs filesystem, even
if /tmp2 is just a directory in / mounted as btrfs.
Fix this by replacing the substring check with a dirname recursion to
only check the directories in the path of the dir, rather than every
substring.
Add a new test for this case.
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_open_dir already has a check whether the passed path is a
directory and if so it returns a specific error code (-3) when such an
error occurs. Use this instead of open-coding the directory check. To
avoid regression in cli/003 test also move directory checks before fs
type in btrfs_open.
Output before this check:
ERROR: resize works on mounted filesystems and accepts only
directories as argument. Passing file containing a btrfs image
would resize the underlying filesystem instead of the image.
After:
ERROR: not a directory: /root/btrfs-progs/tests/test.img
ERROR: resize works on mounted filesystems and accepts only
directories as argument. Passing file containing a btrfs image
would resize the underlying filesystem instead of the image.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Partial revert of 922eaa7b54 ("btrfs-progs: build: fix linking with
static libmount"), remove the necessary workarounds like the weak
symbols and link time warnings. Symbols renamed not to clash with
libmount (parse_size, canonicalize_path) haven't been reverted because
the new names are acceptable.
Signed-off-by: David Sterba <dsterba@suse.com>
In commit 57cfe29e69 ("btrfs-progs: utils: introduce
find_mount_fsroot") the entries in /proc/self/mountinfo are parsed by a
convenience library libmount, because getmntent does not provide the
information we need to distinguish bind mounts.
Using libmount turned out to be problematic in several ways:
- static build got broken due to clashing symbols, eg. for parsing size
or path canonicalization (#333)
- long-term distros do not have libmount new enough (2.24+) to provide
some functions (mnt_table_is_empty, #334)
- libmount internally uses getgrnam_r/mnt_get_uid/... that are not
static-build friendly, a warning is printed during link time for each
binary; we don't use any of the functions
- libmount has further library dependencies that we don't need:
$ ldd /usr/lib64/libmount.so.1
linux-vdso.so.1 (0x00007fff4f175000)
libc.so.6 => /lib64/libc.so.6 (0x00007f44a1763000)
libblkid.so.1 => /usr/lib64/libblkid.so.1 (0x00007f44a1730000)
libselinux.so.1 => /usr/lib64/libselinux.so.1 (0x00007f44a1704000)
/lib64/ld-linux-x86-64.so.2 (0x00007f44a1998000)
libpcre.so.1 => /usr/lib64/libpcre.so.1 (0x00007f44a166c000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f44a1666000)
namely selinux, pcre and dl.
Summing it up, libmount causes more trouble than it's worth using a
convenience library, we want to keep the dependencies minimal so the
custom mountinfo parser was inevitable.
Issue: #333
Issue: #334
Issue: #336
Signed-off-by: David Sterba <dsterba@suse.com>
The libmount dependency has been added in commit 61ecaff036
("btrfs-progs: build: add libmount dependency"), and static build got
broken. There are functions that do basically the same thing and also
share the name, which in turn fails at link time.
ld: /../lib64/libmount.a(libcommon_la-canonicalize.o): in function `canonicalize_dm_name':
util-linux-2.34/lib/canonicalize.c:58: multiple definition of `canonicalize_dm_name';
common/path-utils.static.o:btrfs-progs/common/path-utils.c:286: first defined here
In case the collision can be resolved by renaming, it's done
(canonicalize_path and parse_size). There are 2 symbols from selinux
that are substituted by a weak aliases during the static build.
There's one new warning due to use of getgrnam_r in libmount that
depends on dynamic linking and may not work properly with static build.
We're not using the related functions directly or indirectly, so it
should be safe to ignore the warnings.
ld: ../lib64/libmount.a(la-utils.o): in function `mnt_get_gid':
util-linux-2.34/libmount/src/utils.c:625: warning: Using 'getgrnam_r' in statically linked applications
+requires at runtime the shared libraries from the glibc version used for linking
Issue: #333
Signed-off-by: David Sterba <dsterba@suse.com>
To use optimized CRC implementation, the input buffer must be
unsigned long aligned. btrfs receive calculates checksum based on
read_buf, including btrfs_cmd_header (with zeroed CRC field)
and command content.
Reorder the buffer to the beginning of the structure and force the
alignment to 64, this should be cacheline friendly and could speed up
the data transfers.
Interesting parts from the report:
Sending host:
Fedora 33
AMD ThreadRipper 1920X - 128GB RAM
2x10GBit Ethernet, bonded
MegaRaid 9270
6x16TB Seagate Exos in RAID5
Receiving host:
Fedora 33
Intel i3-7300 - HT enabled - 32GB RAM
10GBit Ethernet, single connection
MegaRaid 9260
12x8TB WD NAS drives in RAID5
The 2 hosts are connected to the same 10G switch. The sender could definitely
saturate a 10GBit link. The practically achievable writes on the backup host
would be lower, but still at least 400MB/s. The file system contains mostly
large files of 1GB+, so there is little meta-data.
With btrfs send/receive I'm getting a steady transfer rate of 60MB/s. The copy
has been running for a little over 5 days now, having only transferred some
25TB. This is way too slow for this setup.
Analyzing resource usage, the sender side is fine, both the btrfs send and the
corresponding ssh process only use about 10-10% CPU, which on a 24 threaded
machine is virtually nothing. However, the receiver is running with a load of
~2.6, with the sshd using 30-50% CPU and the btrfs receive a further 60-70%.
The rest of the load comes from IO wait. So the bottleneck is the btrfs receive
clearly.
Issue: #324
Signed-off-by: Sheng Mao <shngmao@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This new function checks for filesystem path name that was mounted, thus
being different from find_mount_root. By using libmount we can easily
parse /proc/self/mountinfo file and check for the pathname field.
The function is useful to filter bind mounts with content different from
the original mount, thus making it safe to assume that the reported path
can be accessed by the user, with the right content.
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>