There's a group of functions related to opening the filesystem in
various modes, so move them to a separate file.
Signed-off-by: David Sterba <dsterba@suse.com>
Decrease dependency on system headers: remove them where they're not
needed or became stale after code moved. path-utils.h encapsulates path
operations, so include linux/limits.h there, as that's where PATH_MAX is
defined.
Signed-off-by: David Sterba <dsterba@suse.com>
This patch checks if the target file system is flagged as ZONED. If it is,
the device to be added is flagged PREP_DEVICE_ZONED. Also add checks to
prevent mixing non-zoned devices and zoned devices.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Check if the target file system is flagged as ZONED. If it is, the
device to be added is flagged PREP_DEVICE_ZONED. Also add checks to
prevent mixing non-zoned devices and zoned devices.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The superblock (and its copies) is the only data structure in btrfs which
has a fixed location on a device. Since we cannot overwrite in a sequential
write required zone, we cannot place the superblock in such a zone. One easy
solution is to limit the superblock and its copies to conventional zones.
However, this method has two downsides. The first is a reduced number of
superblock copies: the second copy of the superblock is located at 256GB,
which is in a sequential write required zone on typical devices on the
market today, so the number of superblock copies is limited to two. The
second downside is that we cannot support devices which have no
conventional zones at all.
To solve these two problems, we employ superblock log writing. It uses two
adjacent zones as a circular buffer to write updated superblocks. Once the
first zone is filled up, start writing into the second one. Then, when
both zones are filled up and before starting to write to the first zone
again, reset the first zone.
We can determine the position of the latest superblock by reading write
pointer information from a device. One corner case is when both zones are
full. For this situation, we read out the last superblock of each zone, and
compare them to determine which zone is older.
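The lookup described above can be sketched in C as below. This is a minimal sketch under stated assumptions: struct sb_zone and latest_sb_offset() are hypothetical names, the write pointers and the generation of each zone's last superblock are assumed to have been read already, and one superblock copy is assumed to take 4096 bytes.

```c
#include <stdbool.h>
#include <stdint.h>

#define SB_SIZE 4096ULL  /* assumed size of one superblock copy */

/* Hypothetical view of one zone of the reserved pair. */
struct sb_zone {
	uint64_t start;     /* byte offset of the zone on the device */
	uint64_t wp;        /* write pointer, relative to start */
	uint64_t capacity;  /* writable bytes in the zone */
	uint64_t last_gen;  /* generation of the last superblock, valid if wp > 0 */
};

/*
 * Return the device offset of the newest superblock in the zone pair,
 * or -1 if both zones are empty.
 */
static int64_t latest_sb_offset(const struct sb_zone z[2])
{
	bool empty0 = z[0].wp == 0, empty1 = z[1].wp == 0;
	bool full0 = z[0].wp >= z[0].capacity, full1 = z[1].wp >= z[1].capacity;
	int newest;

	if (empty0 && empty1)
		return -1;
	if (empty0 || empty1)
		newest = empty0 ? 1 : 0;
	else if (full0 && full1)
		/* Corner case: compare the last superblock of each zone. */
		newest = z[0].last_gen > z[1].last_gen ? 0 : 1;
	else
		/* One zone is full, the other is being filled: that one is newer. */
		newest = full0 ? 1 : 0;

	return (int64_t)(z[newest].start + z[newest].wp - SB_SIZE);
}
```

After both zones fill up and the first one is reset, the first zone is again the partially written one, so the same rule keeps pointing at the newest copy.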
The following zones are reserved as the circular buffer on ZONED btrfs.
- primary superblock: offset 0B (and the following zone)
- first copy: offset 512G (and the following zone)
- second copy: offset 4T (4096G, and the following zone)
If these reserved zones are conventional, the superblock is written at a
fixed position at the start of the zone, without logging.
Currently, superblock reading/writing is done by pread/pwrite. This
commit replaces the call sites with sbread/sbwrite, which wrap the
functions. For zoned btrfs, btrfs_sb_io, which is called from
sbread/sbwrite, reverses the IO position back to a mirror number, maps
the mirror number into the superblock logging position, and does the IO.
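A rough sketch of the position-to-mirror mapping, assuming the regular superblock offsets of 64KiB, 64MiB and 256GiB and the zoned reserved zones listed above. The helper names are hypothetical; the actual btrfs_sb_io() additionally has to find the current log position inside the zone pair, which is not shown.

```c
#include <stdint.h>

#define SZ_16K          (16ULL << 10)
#define SZ_64K          (64ULL << 10)
#define SB_MIRROR_SHIFT 12   /* 16K << 12 = 64M, 16K << 24 = 256G */
#define SB_MIRROR_MAX   3

/* Regular (non-zoned) superblock offsets: 64KiB, 64MiB, 256GiB. */
static uint64_t sb_offset(int mirror)
{
	if (mirror == 0)
		return SZ_64K;
	return SZ_16K << (SB_MIRROR_SHIFT * mirror);
}

/* Reverse mapping: which mirror does a pread/pwrite offset refer to? */
static int sb_offset_to_mirror(uint64_t offset)
{
	for (int i = 0; i < SB_MIRROR_MAX; i++)
		if (sb_offset(i) == offset)
			return i;
	return -1;   /* not a superblock location */
}

/* Start of the reserved zone pair on zoned btrfs: 0, 512GiB, 4TiB. */
static uint64_t zoned_sb_zone_start(int mirror)
{
	static const uint64_t starts[SB_MIRROR_MAX] = {
		0, 512ULL << 30, 4096ULL << 30,
	};
	return starts[mirror];
}
```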
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
As in the kernel code, provide fs_info access from struct
btrfs_device. This will help to unify the code between the kernel and
the userland.
Since fs_info can be NULL at the time of btrfs_add_to_fsid(), set
fs_info of the devices in btrfs_open_devices().
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Resizing to a size without a sign prefix prints wrong output:
$ btrfs fi resize 1:150g /srv/extra
Resize device id 1 (/dev/sdb1) from 298.09GiB to 0.00B
The resize operation itself takes effect, though.
Fix it by handling the case when mod is 0 in check_resize_args().
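A minimal sketch of the size computation, assuming the sign is already parsed into mod (+1, -1, or 0 when there is no sign). resize_target() is a hypothetical helper, not the btrfs-progs code, and it ignores the 'max' keyword.

```c
#include <stdint.h>

/*
 * Target size for 'btrfs filesystem resize'. @mod is +1 for "+size",
 * -1 for "-size" and 0 for an absolute size without a sign.
 */
static uint64_t resize_target(uint64_t old_size, int mod, uint64_t amount)
{
	if (mod == 0)
		return amount;               /* absolute, e.g. 1:150g */
	if (mod > 0)
		return old_size + amount;    /* grow, e.g. 1:+1g */
	/* shrink, e.g. 1:-1g; clamp instead of wrapping around */
	return amount > old_size ? 0 : old_size - amount;
}
```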
Issue: #307
Reported-by: Chris Murphy <lists@colorremedies.com>
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Su Yue <l@damenly.su>
Signed-off-by: David Sterba <dsterba@suse.com>
For passing authentication keys to the checksumming functions we need a
container for the key.
Pass in a btrfs_fs_info to btrfs_csum_data() so we can use the fs_info
as a container for the authentication key.
This is not possible for all callers of btrfs_csum_data(), so we're just
passing in NULL for those for now.
Functions calling btrfs_csum_data() with a NULL fs_info argument are
currently not supported in the context of an authenticated file system.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Extending open_ctree with more parameters would be difficult and we'll
need to add more, so factor out the parameters to a structure for easier
extension.
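The pattern can be sketched as follows; struct open_ctree_args, its fields and open_ctree_sketch() are hypothetical names chosen for illustration. With designated initializers, callers set only what they need and every other member is zero, so new fields can be added later without touching existing call sites.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical parameter block; the field names are illustrative only. */
struct open_ctree_args {
	const char *filename;
	uint64_t sb_bytenr;
	uint64_t root_tree_bytenr;
	uint64_t chunk_tree_bytenr;
	unsigned int flags;
};

/* Stand-in for open_ctree() that only reports what it was asked to do. */
static int open_ctree_sketch(const struct open_ctree_args *args)
{
	if (!args->filename)
		return -1;
	printf("open %s, sb at %llu, flags 0x%x\n", args->filename,
	       (unsigned long long)args->sb_bytenr, args->flags);
	return 0;
}
```

A caller writes `struct open_ctree_args args = { .filename = "/dev/sdb", .flags = 0x1 };` and passes `&args`.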
Signed-off-by: David Sterba <dsterba@suse.com>
Make output of 'btrfs filesystem resize' command more readable and
describe the changes in more detail.
Before:
Resize '/mnt' of '1:-1G'
After:
Resize device id 1 (/dev/vdb) from 4.00GiB to 3.00GiB
Issue: #307
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The warning is printed for profiles where it's not intended (like raid0
or raid1c4). Check the correct variable for the target profiles.
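A sketch of the corrected condition, assuming the convert targets of the data, metadata and system filters are available as profile bit masks. The bit values follow the btrfs on-disk format; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Block group profile bits from the btrfs on-disk format. */
#define BTRFS_BLOCK_GROUP_RAID0    (1ULL << 3)
#define BTRFS_BLOCK_GROUP_RAID1    (1ULL << 4)
#define BTRFS_BLOCK_GROUP_RAID5    (1ULL << 7)
#define BTRFS_BLOCK_GROUP_RAID6    (1ULL << 8)
#define BTRFS_BLOCK_GROUP_RAID1C4  (1ULL << 10)
#define BTRFS_BLOCK_GROUP_RAID56_MASK \
	(BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6)

/* Warn only when a *target* profile of the convert filters is RAID5/6. */
static bool convert_needs_raid56_warning(uint64_t data_target,
					 uint64_t meta_target,
					 uint64_t sys_target)
{
	return ((data_target | meta_target | sys_target) &
		BTRFS_BLOCK_GROUP_RAID56_MASK) != 0;
}
```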
Issue: #355
Fixes: 1ed5db8db4 ("btrfs-progs: balance convert: add a warning and countdown for RAID56 conversion")
Signed-off-by: David Sterba <dsterba@suse.com>
Enhance --force to also skip the timeout, similar to what --full-balance
does. As the warning is only about RAID56 and won't be necessary in the
future, don't add a separate option. The warning is still printed.
Signed-off-by: David Sterba <dsterba@suse.com>
Similar to the mkfs warning, add a warning to btrfs balance convert
options, with a countdown to allow the user to have time to cancel the
operation.
Issue: #265
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When replace is started with no-background and fails because
a BTRFS_FS_EXCL_OP is in progress, we still return 0 and also leak the
open target device, because cmd_replace_start() is missing the
goto leave_with_error for this error.
As a result, test case btrfs/064 reports in its seqres.full output:
Replacing /dev/sdf with /dev/sdc
ERROR: /dev/sdc is mounted
instead of:
Replacing /dev/sdc with /dev/sdf
ERROR: ioctl(DEV_REPLACE_START) '/mnt/scratch': add/delete/balance/replace/resize operation in progress
for the failed replace attempts in the test case.
Fix it by jumping to the error label which also fixes the leaked open
device.
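The single-exit cleanup pattern the fix relies on looks roughly like this. replace_start() and start_replace_ioctl() are hypothetical stand-ins: the ioctl is simulated by a flag that reports an exclusive operation in progress.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for the start ioctl: fails while an exclusive op runs. */
static int start_replace_ioctl(int fd, int excl_op_running)
{
	(void)fd;
	return excl_op_running ? -EBUSY : 0;
}

/* Returns 0 on success, 1 on error; the target fd never leaks. */
static int replace_start(const char *target, int excl_op_running)
{
	int ret;
	int fd_target = open(target, O_RDWR);

	if (fd_target < 0)
		return 1;

	ret = start_replace_ioctl(fd_target, excl_op_running);
	if (ret < 0) {
		fprintf(stderr, "ERROR: add/delete/balance/replace/resize operation in progress\n");
		goto leave_with_error;   /* the jump that was missing */
	}

	close(fd_target);
	return 0;

leave_with_error:
	close(fd_target);
	return 1;
}
```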
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a new subcommand 'btrfs rescue create-control-device' that creates
/dev/btrfs-control. This is helpful on systems that may not have `mknod`
installed and the device node is missing for some reason.
Issue: #223
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
[ update docs ]
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_open_dir already checks whether the passed path is a directory
and returns a specific error code (-3) if it is not. Use this instead of
open-coding the directory check. To avoid a regression in the cli/003
test, also move the directory check before the filesystem type check in
btrfs_open.
Output before this change:
ERROR: resize works on mounted filesystems and accepts only
directories as argument. Passing file containing a btrfs image
would resize the underlying filesystem instead of the image.
After:
ERROR: not a directory: /root/btrfs-progs/tests/test.img
ERROR: resize works on mounted filesystems and accepts only
directories as argument. Passing file containing a btrfs image
would resize the underlying filesystem instead of the image.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We had a few bugs on the kernel side of send/receive where capabilities
ended up being lost after receiving a send stream. They all stem from the
fact that the kernel used to send all xattrs before issuing the chown
command, and the latter clears any existing capabilities in a file or
directory.
Initially a workaround was added to btrfs-progs' receive command, in commit
123a2a0850 ("btrfs-progs: receive: restore capabilities after chown"),
and that fixed some instances of the problem. More recently, other instances
of the problem were found and a proper fix was made in the kernel, which fixes
the root problem by making send always emit the setxattr command for setting
capabilities after issuing a chown command. This was done in kernel commit
89efda52e6b693 ("btrfs: send: emit file capabilities after chown"), which
landed in kernel 5.8.
However, the workaround in the receive command now causes us to incorrectly
set a capability on a file that should not have it, because it assumes all
setxattr commands for a file always come before a chown.
Example reproducer:
$ cat send-caps.sh
#!/bin/bash
DEV1=/dev/sdh
DEV2=/dev/sdi
MNT1=/mnt/sdh
MNT2=/mnt/sdi
mkfs.btrfs -f $DEV1 > /dev/null
mkfs.btrfs -f $DEV2 > /dev/null
mount $DEV1 $MNT1
mount $DEV2 $MNT2
touch $MNT1/foo
touch $MNT1/bar
setcap cap_net_raw=p $MNT1/foo
btrfs subvolume snapshot -r $MNT1 $MNT1/snap1
btrfs send $MNT1/snap1 | btrfs receive $MNT2
echo
echo "capabilities on destination filesystem:"
echo
getcap $MNT2/snap1/foo
getcap $MNT2/snap1/bar
umount $MNT1
umount $MNT2
When running the test script, we can see that both files foo and bar get
the capability set, when only file foo should have it:
$ ./send-caps.sh
Create a readonly snapshot of '/mnt/sdh' in '/mnt/sdh/snap1'
At subvol /mnt/sdh/snap1
At subvol snap1
capabilities on destination filesystem:
/mnt/sdi/snap1/foo cap_net_raw=p
/mnt/sdi/snap1/bar cap_net_raw=p
Since the kernel fix was backported to all currently supported stable
releases (5.10.x, 5.4.x, 4.19.x, 4.14.x, 4.9.x and 4.4.x), remove the
workaround from receive. Having such a workaround relying on the order
of commands in a send stream is always troublesome and doomed to break
one day.
A test case for fstests will come soon.
Issue: #85
Issue: #202
Issue: #292
Reported-by: Richard Brown <rbrown@suse.de>
Reviewed-by: Su Yue <l@damenly.su>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The libmount dependency was added in commit 61ecaff036
("btrfs-progs: build: add libmount dependency") and broke the static
build. Both libmount and btrfs-progs define functions that do basically
the same thing and share the same name, which fails at link time:
ld: /../lib64/libmount.a(libcommon_la-canonicalize.o): in function `canonicalize_dm_name':
util-linux-2.34/lib/canonicalize.c:58: multiple definition of `canonicalize_dm_name';
common/path-utils.static.o:btrfs-progs/common/path-utils.c:286: first defined here
Where the collision can be resolved by renaming, it's done
(canonicalize_path and parse_size). Two symbols from selinux are
substituted by weak aliases during the static build.
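For reference, a weak alias in GCC or Clang on ELF targets looks like this. The symbol names are hypothetical, since the message does not name the two selinux symbols: a strong definition linked in elsewhere takes precedence, otherwise calls resolve to the stub.

```c
#include <stddef.h>

/* Fallback body shared by the substituted symbols. */
int stub_return_zero(const char *path)
{
	(void)path;
	return 0;
}

/*
 * Weak alias: if a strong hypothetical_selinux_fn() is linked in, it wins;
 * otherwise calls resolve to stub_return_zero().
 */
int hypothetical_selinux_fn(const char *path)
	__attribute__((weak, alias("stub_return_zero")));
```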
There's one new warning due to the use of getgrnam_r in libmount, which
depends on dynamic linking and may not work properly with a static build.
We're not using the related functions directly or indirectly, so it
should be safe to ignore the warnings.
ld: ../lib64/libmount.a(la-utils.o): in function `mnt_get_gid':
util-linux-2.34/libmount/src/utils.c:625: warning: Using 'getgrnam_r' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
Issue: #333
Signed-off-by: David Sterba <dsterba@suse.com>
Subvolume id 0 passed as the default subvolume is an internal alias for
the toplevel fs tree, and the kernel does that conversion. Until
2116398b1d ("btrfs-progs: use libbtrfsutil for set-default") there was
no manual conversion and the value was passed to the kernel as-is. The
switch to the libbtrfsutil API broke this (4.19).
$ btrfs subvol set-default 0 /path
In this case the default subvolume would be set to the subvolume
containing /path instead of the toplevel one.
Fix it by manually converting 0 to 5 when the user specifies it, to work
around the difference in the API, which we can't change.
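The fix amounts to a one-line normalization, sketched here with a hypothetical helper name; 5 is BTRFS_FS_TREE_OBJECTID, the id of the toplevel subvolume.

```c
#include <stdint.h>

#define BTRFS_FS_TREE_OBJECTID 5ULL   /* id of the toplevel subvolume */

/*
 * Hypothetical helper: resolve the 0 alias before calling libbtrfsutil,
 * which would otherwise use the subvolume containing the path.
 */
static uint64_t default_subvol_id(uint64_t id)
{
	return id == 0 ? BTRFS_FS_TREE_OBJECTID : id;
}
```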
Issue: #327
Reported-by: Chris Murphy
Signed-off-by: David Sterba <dsterba@suse.com>
Use find_mount_fsroot to ensure that we return a valid path to the user:
even if we return a bind mount, the btrfs path it uses is the same as in
the original mount.
This matters when bind mounts and normal 'mount -o subvol=/path' are
mixed.
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The long options array for send is missing the zero terminator, so
unknown options result in a crash:
# btrfs send --foo
Segmentation fault (core dumped)
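A sketch of a correctly terminated table for getopt_long(); the entries and their return values are illustrative, not the actual send option table. getopt_long() walks the array until it reaches an entry whose name is NULL, so without that entry it reads past the end when looking up an unknown option.

```c
#include <getopt.h>
#include <stddef.h>

/* Illustrative entries; the real table uses different values. */
static const struct option send_long_options[] = {
	{ "verbose", no_argument, NULL, 'v' },
	{ "quiet",   no_argument, NULL, 'q' },
	{ "no-data", no_argument, NULL, 'N' },
	{ NULL,      0,           NULL, 0 }   /* the zero terminator */
};

/* Parse the first option only; returns '?' for an unknown option. */
static int first_option(int argc, char **argv)
{
	opterr = 0;   /* don't print "unrecognized option" */
	optind = 1;
	return getopt_long(argc, argv, "vqN", send_long_options, NULL);
}
```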
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add support for json formatting. Switch the hard-coded printing code to
formatted printing with the output formatter. Json output is useful for
other programs that parse the output of the command.
The plain text format is not changed for backward compatibility, but this
requires a separate switch on the output type.
Example text format:
device: /dev/vdb
devid 1
write_io_errs: 0
read_io_errs: 0
flush_io_errs: 0
corruption_errs: 0
generation_errs: 0
Example json format:
{
  "__header": {
    "version": "1"
  },
  "device-stats": [
    {
      "device": "/dev/vdb",
      "devid": "1",
      "write_io_errs": "0",
      "read_io_errs": "0",
      "flush_io_errs": "0",
      "corruption_errs": "0",
      "generation_errs": "0"
    }
  ]
}
Issue: #291
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The two variants of unit options are not suitable for all commands, as
the short options could interfere with existing options or limit future
extensions.
In 'filesystem du' the short options are documented neither in the help
text nor in the documentation, so fix the code.
In 'scrub status' it's the same, but the documentation needs to be fixed
as well.
Signed-off-by: David Sterba <dsterba@suse.com>
The help text and documentation of the --rootid and --uuid parameters
is wrong as it does not say there's a required parameter. Add it and
enhance the docs to clarify what the options do.
Issue: #317
Signed-off-by: David Sterba <dsterba@suse.com>
User reported that 'btrfs subvolume show -u -- /mnt' causes double free.
The pointer subvol_path was freed in each iteration but kept its old
value. In the last iteration, BTRFS_UTIL_ERROR_STOP_ITERATION is
returned and subvol_path is freed a second time under the out goto
label.
Set subvol_path to NULL after each free() in the loop to fix the issue.
Issue: #317
Signed-off-by: Su Yue <l@damenly.su>
Signed-off-by: David Sterba <dsterba@suse.com>
The exclusive ops will not start if there's one already running. Now
that we have the sysfs export (since kernel 5.10) to check if there's
one already running, use it to allow enqueueing of the operations as a
convenience.
Supported enqueuing:
btrfs balance start --enqueue
btrfs filesystem resize --enqueue
btrfs device add --enqueue
btrfs device delete --enqueue
btrfs replace start --enqueue
This patch implements the functionality based on Goldwyn's patch
https://lore.kernel.org/linux-btrfs/?q=20200825150338.32610-4-rgoldwyn%40suse.de
but on top of previous preparatory patches.
Note that 'filesystem resize' options could confuse getopt as the
negative size change looks like a series of short options and there's no
way to make getopt ignore the short options, so there's a custom option
parser.
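A sketch of how a hand-written parser can tell a negative size from an option; looks_like_size() and first_positional() are hypothetical helpers, and the rule that a '-' followed by a digit is a size is an assumption for illustration.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* getopt would read "-1G" as the options -1 and -G; treat it as a size. */
static bool looks_like_size(const char *arg)
{
	return arg[0] == '-' && isdigit((unsigned char)arg[1]);
}

/* Return the index of the first positional argument (the size). */
static int first_positional(int argc, char **argv)
{
	for (int i = 1; i < argc; i++) {
		if (strcmp(argv[i], "--") == 0)
			return i + 1;
		if (argv[i][0] != '-' || looks_like_size(argv[i]))
			return i;
		/* otherwise it's an option such as --enqueue: skip it */
	}
	return argc;
}
```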
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Add available space information from statfs(). This can be different from
'Free (estimated)' in some cases. This provides more information about
filesystem usage, like below.
Overall:
Device size: 5.00GiB
Device allocated: 1.02GiB
Device unallocated: 3.98GiB
Device missing: 0.00B
Used: 88.00KiB
Free (estimated): 4.48GiB (min: 2.49GiB)
Free (statfs, df) 4.48GiB
Data ratio: 1.00
Metadata ratio: 2.00
Global reserve: 832.00KiB (used: 0.00B)
Multiple profiles: no
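A sketch of how the statfs-based value can be obtained. It uses POSIX statvfs() instead of Linux statfs(), an assumption made for portability; in both cases the number df shows is the count of available blocks (f_bavail) multiplied by the block size. statfs_available() is a hypothetical name.

```c
#include <stdint.h>
#include <sys/statvfs.h>

/* Space available to unprivileged users, as df shows it. 0 on success. */
static int statfs_available(const char *path, uint64_t *avail)
{
	struct statvfs st;

	if (statvfs(path, &st) < 0)
		return -1;
	*avail = (uint64_t)st.f_bavail * st.f_frsize;
	return 0;
}
```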
Issue: #306
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If the exclusive operation status is available in the sysfs file, check
whether one is already running. The check is done for:
- device add, remove, replace
- balance
- filesystem resize
All commands validate the arguments and do the check before the ioctl or
before any potentially irreversible operation (like clearing the device
before replacing).
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add long options for size units, affecting total and currently scrubbed
bytes. The rate depends on the device speed and could be
disproportionate to the size, so it is not affected, except by the --raw
option, which prints bytes without a unit suffix.
Signed-off-by: David Sterba <dsterba@suse.com>
Add ratio of the bytes scrubbed to total in the status output, like:
Total to scrub: 2.54TiB
Bytes scrubbed: 1.59TiB (62.58%)
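The percentage is a plain ratio; a sketch with a hypothetical helper name, guarding against a zero total:

```c
#include <stdint.h>

/* Percentage of bytes scrubbed so far; 0 when the total is unknown. */
static double scrub_percent(uint64_t scrubbed, uint64_t total)
{
	return total ? 100.0 * (double)scrubbed / (double)total : 0.0;
}
```

The status line would then print it with a "%.2f%%" format.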
Signed-off-by: David Sterba <dsterba@suse.com>
Currently most btrfs commands separate their output with empty lines,
which makes them more human readable. The scrub command, when used with
-d to show per-device information, does not. This makes it harder to find
the values for a given device because they are not separated from each
other.
This commit adds an empty line after each device summary to match the
output of other btrfs commands.
For some reason, one line in scrub status was the only one that did not
start with a capital letter. Now it is consistent with the rest.
Pull-request: #256
Author: Rafostar <Rafostar@users.noreply.github.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If subvolume deletion fails with EPERM, the most common reasons are that
it's a default subvolume (addressed by an earlier patch) or that the
subvolume is part of a send operation. The reason is printed only to the
system log and no information is available to user space, but at least
the warning can hint to the user what could be going on.
Signed-off-by: David Sterba <dsterba@suse.com>
Deleting the default subvolume is not permitted and kernel prints a
message to the system log. This is not immediately clear to the user and
we had requests to improve that.
Read the default subvolume id and reject the deletion without even
trying to delete it.
Issue: #274
Issue: #255
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207975
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Calculate average fanout between levels:
Levels: 4
Total nodes: 289048
On level 0: 288054
On level 1: 989 (avg fanout 291)
On level 2: 4 (avg fanout 247)
On level 3: 1 (avg fanout 4)
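The numbers above follow from dividing the node count of the level below by the node count of the given level, using integer division: 288054 / 989 = 291, 989 / 4 = 247 and 4 / 1 = 4. A sketch with hypothetical helper names:

```c
#include <stdint.h>
#include <stdio.h>

/* Average fanout of @level: nodes one level below / nodes on @level. */
static uint64_t avg_fanout(const uint64_t *nodes_per_level, int level)
{
	if (level == 0 || nodes_per_level[level] == 0)
		return 0;   /* leaves have no children */
	return nodes_per_level[level - 1] / nodes_per_level[level];
}

static void print_levels(const uint64_t *nodes, int levels)
{
	for (int i = 0; i < levels; i++) {
		if (i == 0)
			printf("On level %d: %llu\n", i,
			       (unsigned long long)nodes[i]);
		else
			printf("On level %d: %llu (avg fanout %llu)\n", i,
			       (unsigned long long)nodes[i],
			       (unsigned long long)avg_fanout(nodes, i));
	}
}
```

Calling print_levels() with { 288054, 989, 4, 1 } and 4 levels prints the per-level lines of the example.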
Signed-off-by: David Sterba <dsterba@suse.com>
The node/leaf stats have been calculated but never displayed. Moreover,
more detailed information about the counts on each level can be useful,
so add it to the output of tree-stats.
Example output:
Levels: 3
Total nodes: 25692
On level 0: 25601
On level 1: 90
On level 2: 1
Issue: #266
Signed-off-by: David Sterba <dsterba@suse.com>
Many subcommands have their own verbosity options that are being
superseded by the global options. Update the help text to reflect that
where applicable.
Signed-off-by: David Sterba <dsterba@suse.com>
Enable the quiet option for the scrub cancel command so that it does the
job quietly. For example:
$ btrfs -q scrub cancel <mnt>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Enable the quiet option for the subvolume snapshot command so that it does
the job quietly. For example:
$ btrfs -q subvolume snapshot <src> <dest>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Enable the quiet option for the balance resume command so that it does the
job quietly. For example:
$ btrfs -q balance resume <path>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Enable the quiet option for the balance start command so that it does the
job quietly. For example:
$ btrfs -q balance start --full-balance <path>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Enable the quiet option for the subvolume delete command so that it does
the job quietly. For example:
$ btrfs --quiet subvolume delete <path>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Enable the quiet option for the subvolume create command so that it does
the job quietly. For example:
$ btrfs --quiet subvolume create <path>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Enable the quiet option for the quota rescan command so that it does the
job quietly. For example:
$ btrfs --quiet quota rescan
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Enable the quiet option for the device scan command so that it does the
job quietly. For example:
$ btrfs -q device scan
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Function btrfs_scan_devices() is used by commands such as
'btrfs filesystem' and 'btrfs device'. With a verbose argument to
btrfs_scan_devices() we can control which call paths print the messages
when verbose is enabled by the global option.
Add a parameter %verbose to btrfs_scan_devices().
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Propagate global --verbose option down to the btrfs inspect-internal
logical-resolve subcommand.
Command 'btrfs inspect-internal logical-resolve' provides a local verbose
option; this patch makes it possible to enable it with the global
--verbose option.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Propagate global --verbose option down to the btrfs inspect-internal
inode-resolve subcommand.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Propagate global --verbose option down to the btrfs restore subcommand.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Propagate global --verbose option down to the btrfs rescue super-recover
subcommand.
Both global and local verbose options are now supported:
btrfs -v rescue super-recover
btrfs rescue super-recover -v
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Propagate global --verbose option down to the btrfs rescue chunk-recover
subcommand.
Both global and local verbose options are now supported and are aliases:
btrfs -v rescue chunk-recover
btrfs rescue chunk-recover -v
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Propagate global --verbose option down to the btrfs balance status
subcommand.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Propagate global --verbose option down to the btrfs balance start
subcommand.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Propagate global --verbose option down to the btrfs receive subcommand.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Propagate global --verbose option down to the btrfs subvolume delete
subcommand.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Propagate global --verbose and --quiet options down to the btrfs receive
subcommand.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Propagate global --verbose and --quiet options down to the btrfs send
subcommand.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
dump_superblock() is useful to debug e.g. btrfs-image errors, like the
fsck/012-* test case, where the superblock itself has something wrong
from the original image.
Export it so that we can call it in gdb.
Since we're exporting dump_superblock(), rename it to
btrfs_print_superblock() to follow the existing naming scheme.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The function print_filesystem_usage_overall() prints the info on the
basis of the r_*_chunk, r_*_used and l_*_chunks values computed for
data, metadata and system chunks.
For RAID1/10/1C3/1C4/DUP this information is easily accessible from the
data returned by load_space_info().
However, for RAID5/6 this is not true because the ratios between the l_*
and r_* values are not fixed but depend on the number of devices
involved in the chunk.
A new function called get_raid56_space_info() is created to compute
the values r_*_chunk and r_*_used for data, metadata and system
chunks in case of a RAID5/6 profile.
The r_*_chunk values are computed from the chunk_info array.
In order to compute the r_*_used values, a new function
get_raid56_logical_ratio() is created. This function computes the ratio
l_*_used / l_*_chunk from the ioctl_space_args array. So we can get:
'r_*_used' = 'r_*_chunk' * 'l_*_used' / 'l_*_chunk'
Even though this is not mathematically exact every time, it is true on
"average" (for example, if the RAID5 chunks use different numbers of
disks, the real values depend on which chunk contains the data).
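The estimate can be sketched as below; the helper names are hypothetical and differ from the functions named above. raid5_raw_size() illustrates why the ratio is not fixed: a RAID5 chunk on n devices stores one stripe of parity, so its raw size is logical * n / (n - 1), assuming the logical size divides evenly.

```c
#include <stdint.h>

/* Ratio of used to allocated logical space for one block group type. */
static double logical_used_ratio(uint64_t l_used, uint64_t l_chunk)
{
	return l_chunk ? (double)l_used / (double)l_chunk : 0.0;
}

/* r_used = r_chunk * l_used / l_chunk, an average over all chunks. */
static uint64_t raw_used_estimate(uint64_t r_chunk, uint64_t l_used,
				  uint64_t l_chunk)
{
	return (uint64_t)((double)r_chunk * logical_used_ratio(l_used, l_chunk));
}

/* Raw bytes of one RAID5 chunk on @ndev devices (one stripe of parity). */
static uint64_t raid5_raw_size(uint64_t logical, int ndev)
{
	return logical / (uint64_t)(ndev - 1) * (uint64_t)ndev;
}
```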
Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
Signed-off-by: David Sterba <dsterba@suse.com>
This would sync the code between kernel and btrfs-progs, and save at
least 1 byte for each btrfs_block_group_cache.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Update the summary of 'fi usage' where the multiple profiles will be
listed by type, like:
Multiple profiles: yes (data, metadata)
The string is returned from btrfs_test_for_multiple_profiles so the
callers don't have to assemble it together from the other profile
strings.
Signed-off-by: David Sterba <dsterba@suse.com>
The term 'mixed' is confusing as it's commonly used for mixed block
group profiles created by 'mkfs.btrfs --mixed'. We're interested in
multiple profiles for each type, so use the term 'multiple'.
Signed-off-by: David Sterba <dsterba@suse.com>
Add the warning to 'device usage' and 'filesystem df'.
Signed-off-by: Goffredo Baroncelli <kreijack@inwid.it>
Signed-off-by: David Sterba <dsterba@suse.com>
A new line in the "Overall" section is added to inform that 'Multiple
profiles' are present.
Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a check in some btrfs subcommands to detect if a filesystem
has mixed profiles for data/metadata/system. In this case
a warning is shown.
Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
Signed-off-by: David Sterba <dsterba@suse.com>
Some scripts may still rely on this message, so make it available with
-vv and keep -v sane.
Fixes: #127
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When we process a clone request, we look up the source subvolume by
UUID, even if the source is the subvolume that we're currently
receiving. Usually, this is fine. However, if for some reason we
previously received the same subvolume, then this will use paths
relative to the previously received subvolume instead of the current
one. This is incorrect, since the send stream may use temporary names
for the clone source. This can be reproduced as follows:
btrfs subvolume create subvol
dd if=/dev/urandom of=subvol/foo bs=1M count=1
cp --reflink subvol/foo subvol/bar
mkdir subvol/dir
mv subvol/foo subvol/dir/
btrfs property set subvol ro true
btrfs send -f send.data subvol
mkdir first second
btrfs receive -f send.data first
btrfs receive -f send.data second
The second receive results in this error:
ERROR: cannot open first/subvol/o259-7-0/foo: No such file or directory
Fix it by always cloning from the current subvolume if its UUID matches.
This has the nice side effect of avoiding unnecessary UUID tree lookups
in that case.
Fixes: f1c24cd80d ("Btrfs-progs: add btrfs send/receive commands")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The checks for a subvolume being modified after it was received have
been commented out since they were added back in commit f1c24cd80d
("Btrfs-progs: add btrfs send/receive commands"). Let's just get rid of
the noise.
If they had ever been in place, it would never have been possible
to do an incremental send after running dedupe against the parent
snapshot.
That particular use case used to cause send, on the kernel side, to fail
(initially with a BUG_ON() and later with -EIO returned to user
space), see commit b4f9a1a87a48 ("Btrfs: fix incremental send failure
after deduplication").
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
[ add Filipe's note ]
Signed-off-by: David Sterba <dsterba@suse.com>
The old code of copy_one_extent() is a mess:
- The main loop is implemented using goto
- @mirror_num is reset to 1 for each loop
- @mirror_num check against @num_copies is wrong for decompression error
Fix this mess by:
- Using read_extent_data()
  read_extent_data() has all the good wrapping of btrfs_map_block()
  and length check.
  This removes a lot of complexity.
- Adding an extra file extent offset check
  To prevent underflow for memory allocation.
- Doing a proper mirror_num check for decompression error
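A sketch of the loop shape that replaces the goto; read_any_mirror() and the callback are hypothetical, and a decompression error is assumed to be reported by the callback like any other read failure, so the next copy is tried.

```c
#include <stdio.h>

/* Hypothetical reader callback: 0 on success, <0 on read/decompress error. */
typedef int (*read_mirror_fn)(int mirror_num, void *ctx);

/*
 * Try each copy exactly once, from mirror 1 to num_copies, and stop at the
 * first success. Returns the mirror that worked, or -1 if all copies failed.
 */
static int read_any_mirror(int num_copies, read_mirror_fn read_fn, void *ctx)
{
	for (int mirror_num = 1; mirror_num <= num_copies; mirror_num++) {
		if (read_fn(mirror_num, ctx) == 0)
			return mirror_num;
		fprintf(stderr, "mirror %d failed, trying the next one\n",
			mirror_num);
	}
	return -1;
}

/* Example callback: pretend the first copy is corrupted. */
static int first_copy_bad(int mirror_num, void *ctx)
{
	(void)ctx;
	return mirror_num == 1 ? -1 : 0;
}
```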
Issue: #221
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Previously, no filenames/xattrs would be printed with --nofilename; to
keep the format of the dump, print a placeholder instead of the names.
This applies to:
* directory entries (files, directories, subvolumes)
* default subvolume
* extended attributes (name, value)
* hardlink names if stored inside another item
Note that lengths are not hidden because they can be calculated from the
item size anyway.
Signed-off-by: David Sterba <dsterba@suse.com>
On the mailing list, it's pretty common that a developer asks the
reporter for dump-tree output. It's better to protect those kind
reporters by hiding the filenames if the reporter wants.
This option will skip @name/@data output for the following items:
- DIR_INDEX
- DIR_ITEM
- INODE_REF
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
According to the documentation, btrfs qgroup remove takes the same
options as qgroup assign, i.e., --rescan and --no-rescan. However,
currently no options are accepted. Activate option handling also for
qgroup remove, so that automatic rescan can be disabled by the user.
Signed-off-by: Michael Lass <bevan@bi-co.net>
Signed-off-by: David Sterba <dsterba@suse.com>
LOGICAL_INO v1 ignored the reserved fields, so they could be filled
with random stack garbage and have no effect. LOGICAL_INO_V2 requires
all unused reserved bits to be set to zero, and returns EINVAL if they
are not, to guard against future kernel versions which may interpret
non-zero bit values.
Sometimes when 'btrfs ins log' runs, the stack garbage is zeros, so the
-o (ignore offsets) option for logical-resolve works. Sometimes the
stack garbage is something else, and 'btrfs ins log -o' fails with
invalid argument. This depends mostly on compiler version and build
environment details, so a binary typically either always works or never
works.
Fix by initializing logical-resolve's argument structure with a C99
compound literal zero.
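A sketch of the fix using a local struct that mirrors the layout of btrfs_ioctl_logical_ino_args from linux/btrfs.h (an assumption made to keep the example self-contained). Assigning the compound literal zeroes every member, including reserved[] and flags, regardless of what was on the stack.

```c
#include <stdint.h>

/* Local copy of the btrfs_ioctl_logical_ino_args layout (assumed). */
struct logical_ino_args {
	uint64_t logical;
	uint64_t size;
	uint64_t reserved[3];   /* must be zero for LOGICAL_INO_V2 */
	uint64_t flags;
	uint64_t inodes;
};

static struct logical_ino_args make_args(uint64_t logical, uint64_t size)
{
	struct logical_ino_args args;

	/* C99 compound literal: members not named are zero-initialized. */
	args = (struct logical_ino_args){ 0 };
	args.logical = logical;
	args.size = size;
	return args;
}
```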
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Signed-off-by: David Sterba <dsterba@suse.com>
This ioctl will be responsible for deleting a subvolume using its id.
This can be used when a system has a file system mounted from a
subvolume, rather than the root file system, like below:
/
@subvol1/
@subvol2/
@subvol_default/
If only @subvol_default is mounted, we have no path to reach @subvol1
(id 256) and @subvol2 (id 257), thus no way to delete them. The current
subvolume delete ioctl takes an open file descriptor as argument, and if
@subvol_default is mounted, we can't reach @subvol1 and @subvol2 from
the same mount point.
$ mount -o subvol=subvol_default /mnt
$ btrfs subvolume delete -i 257 /mnt
This will delete @subvol2 although its path is hidden.
Fixes: #152
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This commit organizes the block group caches in
btrfs_fs_info::block_group_cache_tree. Any dirty block groups are
linked into transaction_handle::dirty_bgs.
To keep the series bisectable, the conversion is done almost entirely in place:
1. Replace the old block group lookup functions with the new functions
introduced in the previous commits.
2. set_extent_bits(..., BLOCK_GROUP_DIRTY) calls are replaced by linking
the block group cache into trans::dirty_bgs. Checking and clearing the bits
are converted too.
3. set_extent_bits(..., bit | EXTENT_LOCKED) calls are replaced by the
new btrfs_add_block_group_cache(), which inserts caches into
btrfs_fs_info::block_group_cache_tree directly. Other operations are
converted to tree operations.
Signed-off-by: Su Yue <Damenly_Su@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>
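A simplified sketch of the idea behind trans::dirty_bgs: instead of tagging a range with a dirty bit in an extent tree, each dirty block group is linked once into a list owned by the transaction, and the list is drained at commit time. The structure and function names are hypothetical, and a plain singly linked list is used only to keep the example self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the real structures. */
struct bg_sketch {
	uint64_t start;
	bool on_dirty_list;
	struct bg_sketch *next_dirty;
};

struct trans_sketch {
	struct bg_sketch *dirty_bgs;	/* dirty block groups of this transaction */
};

/* Instead of setting a dirty bit: link the group into the list, at most once. */
static void mark_bg_dirty(struct trans_sketch *trans, struct bg_sketch *bg)
{
	if (bg->on_dirty_list)
		return;
	bg->on_dirty_list = true;
	bg->next_dirty = trans->dirty_bgs;
	trans->dirty_bgs = bg;
}

/* Instead of clearing dirty bits: drain the list at commit time.
 * Writing the block group item is omitted; returns how many were processed. */
static int write_dirty_bgs(struct trans_sketch *trans)
{
	int count = 0;

	while (trans->dirty_bgs) {
		struct bg_sketch *bg = trans->dirty_bgs;

		trans->dirty_bgs = bg->next_dirty;
		bg->next_dirty = NULL;
		bg->on_dirty_list = false;
		count++;
	}
	return count;
}
```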
We are going to touch dirty_bgs in transaction directly, so every call
chain should pass @trans to the leaf functions.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <Damenly_Su@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filesystems with nontrivial snapshots or dedupe will easily overflow
a 4K buffer. Bump the size up to the largest size supported by the
V1 ioctl.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Signed-off-by: David Sterba <dsterba@suse.com>
Increase the maximum buffer size to SZ_16M.
Add an option (-o) to set the ..._IGNORE_OFFSET flag.
If the buffer size is greater than 64K or the IGNORE_OFFSET option
is used, call ioctl V2; otherwise, use ioctl V1 to be compatible with
older kernels.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Signed-off-by: David Sterba <dsterba@suse.com>
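The version choice described above reduces to a small decision function. This sketch assumes the 64K V1 limit and the 16M maximum from the text; the constant and function names are made up for illustration, and the actual ioctl call is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Limits from the commit text; the names are hypothetical. */
#define SKETCH_SZ_64K	(64ULL * 1024)
#define SKETCH_SZ_16M	(16ULL * 1024 * 1024)

enum logical_ino_version { LOGICAL_INO_V1 = 1, LOGICAL_INO_V2 = 2 };

/* Use V1 whenever it suffices, so default invocations keep working on
 * older kernels without V2. Returns 0 if the size exceeds the maximum. */
static int pick_logical_ino_version(uint64_t bufsize, bool ignore_offset)
{
	if (bufsize > SKETCH_SZ_16M)
		return 0;
	if (bufsize > SKETCH_SZ_64K || ignore_offset)
		return LOGICAL_INO_V2;
	return LOGICAL_INO_V1;
}
```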
When the kernel returns ENOTCONN after the user tries to create a qgroup on
a subvolume without quota enabled, present a meaningful message to the
user. Kernels before 5.5 return EINVAL for that.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
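A sketch of the errno-to-message mapping this commit describes. The message wording and the function name are illustrative assumptions, not the exact btrfs-progs output. Only ENOTCONN identifies the "quota not enabled" case unambiguously; older kernels return the generic EINVAL, so that falls back to the standard description.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Map the errno from a failed qgroup create to a user-facing message. */
static const char *qgroup_create_error(int err)
{
	switch (err) {
	case ENOTCONN:
		/* Kernels 5.5 and newer: quotas are not enabled. */
		return "quota not enabled on this filesystem";
	case EINVAL:
		/* Kernels before 5.5: ambiguous, use the generic text. */
	default:
		return strerror(err);
	}
}
```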
This is based on an idea from Stanislaw Gruszka to print the ratio,
originally suggested for 'fi df', but we don't want to add new
things there and would rather let people use 'fi us' instead. The new output
fits there and is printed by default without options.
Example output:
$ btrfs fi us /mnt
[...]
Data,single: Size:339.00GiB, Used:172.05GiB (50.75%)
Metadata,single: Size:7.00GiB, Used:3.41GiB (48.70%)
System,single: Size:32.00MiB, Used:64.00KiB (0.20%)
Signed-off-by: David Sterba <dsterba@suse.com>
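The percentage is used / size * 100, printed with two decimals. A minimal sketch, assuming raw byte counts as input and reporting 0% for a zero-sized space instead of dividing by zero; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Write the "NN.NN%" part of the usage line into @buf. */
static void format_used_ratio(char *buf, size_t len, uint64_t used, uint64_t size)
{
	double pct = size ? (double)used * 100.0 / (double)size : 0.0;

	snprintf(buf, len, "%.2f%%", pct);
}
```

For instance, 64KiB used out of 32MiB is 0.1953125%, which prints as 0.20%, matching the System line in the example output.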
[BUG]
Even though "btrfs rescue zero-log" only resets btrfs_super_block::log_root
and btrfs_super_block::log_root_level, we still use a transaction to write
the super blocks of all devices.
This means we can't handle things like a corrupted extent tree:
checksum verify failed on 2172747776 found 000000B6 wanted 00000000
checksum verify failed on 2172747776 found 000000B6 wanted 00000000
bad tree block 2172747776, bytenr mismatch, want=2172747776, have=0
WARNING: could not setup extent tree, skipping it
Clearing log on /dev/nvme/btrfs, previous log_root 0, level 0
ERROR: Corrupted fs, no valid METADATA block group found
ERROR: attempt to start transaction over already running one
[CAUSE]
The transaction code has an extra check to ensure we have valid
METADATA block groups.
In fact, we don't need a transaction here at all.
[FIX]
Instead of committing a transaction, we can just call write_all_supers()
manually, so we can still handle multi-device filesystems while avoiding
the above error.
Also, add the OPEN_CTREE_NO_BLOCK_GROUPS open_ctree flag to make it more
robust.
Link: https://lore.kernel.org/linux-btrfs/CAKbQEqG35D_=8raTFH75-yCYoqH2OvpPEmpj2dxgo+PTc=cfhA@mail.gmail.com/
Reported-by: Christian Pernegger <pernegger@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We access btrfs_block_group_cache::item mostly for @used and @flags.
@flags already has a dedicated member in btrfs_block_group_cache; only
@used doesn't.
This patch removes btrfs_block_group_cache::item and adds
btrfs_block_group_cache::used.
It's the btrfs-progs equivalent of the following kernel patches:
btrfs: move block_group_item::used to block group
btrfs: move block_group_item::flags to block group
btrfs: remove embedded block_group_cache::item
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs balance status supports both the short and long option, -v|--verbose,
but the usage text failed to show them in --help. This patch fixes the --help.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs balance start supports both the short and long option, -v|--verbose,
but the usage text failed to show the long option. This patch fixes the --help.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Even when the -q option is specified, the receive subcommand is not quiet,
as shown below:
$ btrfs receive -q -f /tmp/t /btrfs1
At snapshot ss3
It must be quiet, at least when it has been asked to be quiet.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add the definition, crypto wrappers and mkfs support for blake2
checksumming. Two aliases are accepted: blake2 or blake2b.
Signed-off-by: David Sterba <dsterba@suse.com>
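A sketch of accepting both spellings when parsing the checksum type given to mkfs. The type values 0 to 3 match the on-disk btrfs checksum types (crc32c, xxhash64, sha256, blake2b). The function and enum names are hypothetical, and types other than crc32c and blake2 are left out for brevity.

```c
#include <assert.h>
#include <string.h>

/* On-disk checksum type values used by btrfs. */
enum csum_type_sketch {
	CSUM_SKETCH_CRC32C = 0,
	CSUM_SKETCH_XXHASH = 1,
	CSUM_SKETCH_SHA256 = 2,
	CSUM_SKETCH_BLAKE2 = 3,
};

/* Map a checksum name to its type value, or -1 if unknown. */
static int parse_csum_type_sketch(const char *s)
{
	if (strcmp(s, "crc32c") == 0)
		return CSUM_SKETCH_CRC32C;
	/* Both spellings select the same algorithm, BLAKE2b-256. */
	if (strcmp(s, "blake2") == 0 || strcmp(s, "blake2b") == 0)
		return CSUM_SKETCH_BLAKE2;
	return -1;
}
```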
Add the definition to the checksum types and let mkfs accept it.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
With the introduction of xxhash64 to btrfs-progs we created a crypto/
directory for all the hashes used in btrfs (although no
cryptographically secure hash is there yet).
Move the crc32c implementation from kernel-lib/ to crypto/ as well so we
have all hashes consolidated.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Adding this table will make extending btrfs-progs with new checksum types
easier.
Also add accessor functions for the table fields.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a helper to check if we have a valid csum type from the super block.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
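A sketch of what a checksum type table with accessors and a validity helper can look like. The table is indexed by the on-disk type value and lists the digest sizes of btrfs's four checksum algorithms (crc32c, xxhash64, sha256, blake2b). The names are hypothetical, and the accessors assert that the caller validated the type first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Indexed by the on-disk checksum type. */
static const struct {
	uint16_t size;		/* digest size in bytes */
	const char *name;
} csum_table_sketch[] = {
	[0] = { 4, "crc32c" },
	[1] = { 8, "xxhash64" },
	[2] = { 32, "sha256" },
	[3] = { 32, "blake2b" },
};

#define CSUM_TABLE_LEN (sizeof(csum_table_sketch) / sizeof(csum_table_sketch[0]))

/* Validity helper: is the type read from the super block known? */
static bool csum_type_is_valid(uint16_t type)
{
	return type < CSUM_TABLE_LEN;
}

static uint16_t csum_type_size(uint16_t type)
{
	assert(csum_type_is_valid(type));
	return csum_table_sketch[type].size;
}

static const char *csum_type_name(uint16_t type)
{
	assert(csum_type_is_valid(type));
	return csum_table_sketch[type].name;
}
```

Adding a new type then means adding one table row rather than touching every switch statement.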
Update the checksumming API to be able to cope with more checksum types
than just CRC32C. The finalization call is merged into btrfs_csum_data.
Some FIXMEs and asserts were added that still need to be resolved.
Co-developed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
In preparation for supporting new checksum algorithms, pass the checksum type
to btrfs_csum_data/btrfs_csum_final. This allows us to encapsulate any
differences in processing in the respective functions.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Pass a pointer to a generic buffer instead of the fixed-size value that
crc32c currently uses.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
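A sketch of the resulting API shape: a single call takes the checksum type and writes the finalized digest into a generic byte buffer sized for the largest digest. Only crc32c is implemented, using a plain bitwise CRC-32C. The function names and the 32-byte buffer constant are assumptions; the little-endian byte order mirrors how btrfs stores crc32c values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CSUM_MAX_SIZE 32	/* large enough for any supported digest */

/* Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78). */
static uint32_t crc32c_sw(uint32_t crc, const uint8_t *data, size_t len)
{
	while (len--) {
		crc ^= *data++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return crc;
}

/* Checksum @data into @out, finalization included. Returns 0 on success
 * or -1 for a type this sketch does not implement. */
static int csum_data_sketch(uint16_t type, const uint8_t *data, size_t len,
			    uint8_t out[CSUM_MAX_SIZE])
{
	memset(out, 0, CSUM_MAX_SIZE);
	switch (type) {
	case 0: {	/* crc32c */
		uint32_t crc = ~crc32c_sw(~0u, data, len);

		/* Store little-endian in the generic buffer. */
		out[0] = crc & 0xff;
		out[1] = (crc >> 8) & 0xff;
		out[2] = (crc >> 16) & 0xff;
		out[3] = (crc >> 24) & 0xff;
		return 0;
	}
	default:
		return -1;
	}
}
```

The standard CRC-32C check value for "123456789" is 0xE3069283, so the buffer starts with the bytes 83 92 06 e3.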
Add the checksum type to csum_tree_block_size(), __csum_tree_block_size()
and verify_tree_block_csum_silent().
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Cache the super-block's checksum type field in 'struct recover_control'.
This will be needed for further refactoring the checksum handling.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
The only difference between parse_limit and parse_size is that
parse_limit accepts "none" as a valid input. That's easy enough to
handle as a special case and lets us drop the duplicate code.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
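A sketch of the consolidation: parse_limit becomes parse_size plus one string comparison. The suffix handling and the "no limit" sentinel below are assumptions for illustration; the text above does not say what value "none" maps to, and the function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed sentinel meaning "no limit". */
#define LIMIT_NONE_SKETCH UINT64_MAX

/* Parse "<digits>[kKmMgGtT]" into bytes using binary multiples.
 * Returns 0 or -EINVAL; overflow of the shifted result is not checked. */
static int parse_size_sketch(const char *s, uint64_t *out)
{
	char *end;
	unsigned long long v;
	int shift = 0;

	/* strtoull would accept leading spaces and '-', so require a digit. */
	if (!s || *s < '0' || *s > '9')
		return -EINVAL;
	errno = 0;
	v = strtoull(s, &end, 10);
	if (errno)
		return -EINVAL;
	switch (*end) {
	case '\0': break;
	case 'k': case 'K': shift = 10; break;
	case 'm': case 'M': shift = 20; break;
	case 'g': case 'G': shift = 30; break;
	case 't': case 'T': shift = 40; break;
	default: return -EINVAL;
	}
	/* At most one suffix character is allowed. */
	if (*end != '\0' && end[1] != '\0')
		return -EINVAL;
	*out = (uint64_t)v << shift;
	return 0;
}

/* parse_limit is parse_size plus the special case "none". */
static int parse_limit_sketch(const char *s, uint64_t *out)
{
	if (s && strcmp(s, "none") == 0) {
		*out = LIMIT_NONE_SKETCH;
		return 0;
	}
	return parse_size_sketch(s, out);
}
```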
We want just one header for the check API (similar to what mkfs does),
but as btrfsck.h is an exported header (libbtrfs), it needs some
deprecation period before it's moved, though there are probably no users
of that header file in particular.
Copy the header to check/ so that modifications and cleanups won't affect
the public header.
Signed-off-by: David Sterba <dsterba@suse.com>
The default traversal has been switched to BFS due, update the
documentation accordingly. Also fix the help text of the command, which
omitted the options.
Signed-off-by: David Sterba <dsterba@suse.com>
Move the full-balance warning to before the fork, so that the user can
see and react to it.
Notes on test:
- Don't use grep -q, as it causes a SIGPIPE during the countdown, and
the balance thus doesn't start.
- The "balance cancel" is superfluous as the last command, but it
provides some idempotence and allows adding more tests below it.
Issue: #168
Signed-off-by: Vladimir Panteleev <git@vladimir.panteleev.md>
Signed-off-by: David Sterba <dsterba@suse.com>
User reports:
"When I execute btrfs restore -S to restore a symlink, it prints:
SYMLINK: 'dest/path/of/symlink' => 'symlink/target'
Failed to change owner: Bad file descriptor
And at cmds-restore.c#L937:
ret = fchownat(-1, file, btrfs_inode_uid(path.nodes[0], inode_item),
btrfs_inode_gid(path.nodes[0], inode_item),
AT_SYMLINK_NOFOLLOW);
"
The -1 is indeed a bad descriptor and should probably be AT_FDCWD, as
documented. The path passed as 'file' is always absolute, so the
semantics are unaffected.
Issue: #183
Signed-off-by: David Sterba <dsterba@suse.com>
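A minimal sketch of the corrected call, using only standard POSIX interfaces; the helper name is hypothetical. With AT_FDCWD a relative path resolves against the current directory and an absolute path ignores the descriptor, whereas -1 makes any relative path fail with EBADF, which matches the reported error.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Change the owner of the symlink itself, not of its target.
 * Returns 0 or a negative errno. */
static int restore_symlink_owner(const char *path, uid_t uid, gid_t gid)
{
	if (fchownat(AT_FDCWD, path, uid, gid, AT_SYMLINK_NOFOLLOW) < 0)
		return -errno;
	return 0;
}
```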
ETA is calculated incorrectly. It should simply be the current time in
seconds + sec_left, regardless of whether the job was resumed or not.
Pull-request: #190
Signed-off-by: Grzegorz Kowal <grzegorz@amuncode.org>
Signed-off-by: David Sterba <dsterba@suse.com>
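The corrected formula as a one-line sketch. The function name is hypothetical, and sec_left is assumed to be the estimated number of remaining seconds computed elsewhere, so work done before a resume needs no extra term.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* ETA = now + estimated seconds left, whether or not the job was resumed. */
static time_t compute_eta(time_t now, uint64_t sec_left)
{
	return now + (time_t)sec_left;
}
```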
In process_clone(), we're not checking the return value of strdup().
But, there's no reason to strdup() in the first place: we just pass the
path into path_cat_out(). Get rid of the strdup().
Fixes: f1c24cd80d ("Btrfs-progs: add btrfs send/receive commands")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
- add first line of the long description
- list the type values in all commands (set, get, list)
- enhance
- split option description
Signed-off-by: David Sterba <dsterba@suse.com>
The command 'btrfs inspect dump-tree <dev>' scans all the devices
of the filesystem by default.
So currently you cannot inspect each mirrored device independently.
This patch adds the option --noscan; when used, the system is not scanned
for partner devices and only the devices provided as arguments are
used.
For example:
btrfs inspect dump-tree --noscan <dev> [<dev>..]
This helps to debug degraded raid1 and raid10.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
fgets consumes at most n-1 bytes from the input buffer.
When a user types y\n, the newline is left in the buffer. As a result,
the next fgets uses that \n as the answer without waiting for the user to
type.
This patch also fixes a bug that dereferenced the return value without
checking whether it was NULL.
* Consume the `\n` from the stdin buffer
* Avoid a NULL pointer dereference: treat EOF as the default value
Pull-request: #182
Author: pjw91 <mail6543210@yahoo.com.tw>
Signed-off-by: David Sterba <dsterba@suse.com>
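A sketch of the input handling described above, reading from an arbitrary FILE * so it can be tested without a terminal. The function name, the 4-byte buffer and the "first character decides" rule are assumptions. The two fixes are the draining loop after a line longer than the buffer and the NULL check on fgets().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Read one yes/no answer from @in, consuming the whole input line. */
static bool ask_yes_no(FILE *in, bool def)
{
	char buf[4];
	int c;

	/* EOF or error: fall back to the default instead of dereferencing. */
	if (!fgets(buf, sizeof(buf), in))
		return def;
	/* No newline read yet: drain the rest of the line so it cannot be
	 * taken as the next answer. */
	if (!strchr(buf, '\n'))
		while ((c = fgetc(in)) != '\n' && c != EOF)
			;
	if (buf[0] == 'y' || buf[0] == 'Y')
		return true;
	if (buf[0] == 'n' || buf[0] == 'N')
		return false;
	return def;
}
```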
When a scrub completes or is cancelled, statistics are updated for
reporting in a later btrfs scrub status command and for resuming the
scrub. Most statistics (such as bytes scrubbed) are additive, so scrub
adds the statistics from the current run to the saved statistics.
However, the last_physical statistic is not additive. The value from the
current run should replace the saved value. The current code incorrectly
adds the last_physical from the current run to the previous saved value.
This bug causes the resume point to be incorrectly recorded, so large
areas of the disk are skipped when the scrub resumes. As an example,
assume a disk had 1000000 bytes and scrub was cancelled and resumed each
time 10% (100000 bytes) had been scrubbed.
Run | Start byte | Bytes scrubbed | Kernel last_physical  | Saved last_physical
1   | 0          | 100000         | 100000                | 100000
2   | 100000     | 100000         | 200000                | 300000
3   | 300000     | 100000         | 400000                | 700000
4   | 700000     | 100000         | 800000                | 1500000
5   | 1500000    | 0              | immediately completes | completed
In this example, only 40% of the disk is actually scrubbed.
This patch changes the saved/displayed last_physical to track the last
reported value from the kernel.
Signed-off-by: Graham R. Cobb <g.btrfs@cobb.uk.net>
Signed-off-by: David Sterba <dsterba@suse.com>
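A sketch of the corrected merge rule. The struct is a hypothetical two-field subset of the saved scrub statistics: counters are summed, while last_physical, being an absolute position reported by the kernel, is replaced.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the saved scrub statistics. */
struct scrub_progress_sketch {
	uint64_t data_bytes_scrubbed;	/* additive across runs */
	uint64_t last_physical;		/* a position, not a counter */
};

static void merge_scrub_progress(struct scrub_progress_sketch *saved,
				 const struct scrub_progress_sketch *cur)
{
	saved->data_bytes_scrubbed += cur->data_bytes_scrubbed;
	/* Replace, never add: this is where the next run resumes. */
	saved->last_physical = cur->last_physical;
}
```

Replaying the four cancelled runs from the example with this rule, each run resumes where the previous one stopped and the saved position ends at 400000 instead of 1500000.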
This adds a global --format option to request extended output formats
from each command.
We currently only support text mode. Command help reports which
output formats are available for each command. Global help reports
which formats are valid.
If an invalid format is requested, an error is reported that lists the
valid formats.
Each command sets a bitmask that describes which formats it is capable
of outputting. If a globally valid format is requested of a command
that doesn't support it, an error is reported and command usage dumped.
Commands don't need to specify that they support text output. All
commands are required to output text.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
[ use global config instead of passing cmd_context ]
Signed-off-by: David Sterba <dsterba@suse.com>
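A sketch of the bitmask check described above. The bit and struct names are hypothetical, and FMT_JSON stands in for a possible second format only so the rejection path can be shown; as stated above, text is currently the only supported format and is implied for every command.

```c
#include <assert.h>
#include <string.h>

#define FMT_TEXT	(1U << 0)
#define FMT_JSON	(1U << 1)	/* hypothetical second format */

struct cmd_sketch {
	const char *name;
	unsigned int formats;	/* formats beyond text; text is implied */
};

/* Map a --format value to its bit, or 0 if it is not globally valid. */
static unsigned int parse_format(const char *s)
{
	if (strcmp(s, "text") == 0)
		return FMT_TEXT;
	if (strcmp(s, "json") == 0)
		return FMT_JSON;
	return 0;
}

/* 0: ok; -1: unknown format (list valid ones); -2: the command does not
 * support it (print the command usage). */
static int check_format(const struct cmd_sketch *cmd, const char *s)
{
	unsigned int fmt = parse_format(s);

	if (!fmt)
		return -1;
	if (!((cmd->formats | FMT_TEXT) & fmt))
		return -2;
	return 0;
}
```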
For options that do not have the long description, the empty string is
required to mark where the options start. Some commands were missing
that.
Signed-off-by: David Sterba <dsterba@suse.com>
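A sketch of the usage-array layout for a command without a long description: the usage line and one-line description, then an empty string marking where the options begin, terminated by NULL. The array contents and the helper are hypothetical; the helper shows why a missing marker leaves the options impossible to locate.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char * const cmd_example_usage[] = {
	"btrfs example [options] <path>",
	"Short one-line description.",
	"",				/* options start after this marker */
	"-v|--verbose    print more details",
	"-q|--quiet      print less",
	NULL
};

/* Index of the first option line, or -1 if the "" marker is missing. */
static int first_option_index(const char * const *usage)
{
	for (int i = 0; usage[i]; i++)
		if (usage[i][0] == '\0')
			return usage[i + 1] ? i + 1 : -1;
	return -1;
}
```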