[BUG]
There is a report that `btrfs rescue clear-ino-cache` failed with a
tree block level mismatch:
# btrfs rescue clear-ino-cache /dev/mapper/rootext
Successfully cleaned up ino cache for root id: 5
Successfully cleaned up ino cache for root id: 257
Successfully cleaned up ino cache for root id: 258
corrupt node: root=7 block=647369064448 slot=0, invalid level for leaf, have 1 expect 0
node 647369064448 level 1 items 252 free space 241 generation 6065173 owner CSUM_TREE
node 647369064448 flags 0x1(WRITTEN) backref revision 1
fs uuid e6614f01-6f56-4776-8b0a-c260089c35e7
chunk uuid f665f535-4cfd-49e0-8be9-7f94bf59b75d
key (EXTENT_CSUM EXTENT_CSUM 3714473984) block 677126111232 gen 6065002
[...]
key (EXTENT_CSUM EXTENT_CSUM 6192357376) block 646396493824 gen 6065032
ERROR: failed to clear ino cache: Input/output error
[CAUSE]
During `btrfs rescue clear-ino-cache`, btrfs-progs iterates through
all the subvolumes and clears the inode cache inode from each subvolume.
The problem is in how we iterate the subvolumes.
We hold a path to the tree root, modify the fs for each found
subvolume, then call btrfs_next_item().
This is not safe, because the path to the tree root is no longer reliable
once we have modified the fs.
So the btrfs_next_item() call fails because the fs was modified
halfway, resulting in the above problem.
[FIX]
Instead of holding a path to a subvolume root item and modifying the fs
halfway, introduce a helper, find_next_root(), to locate the root
item whose objectid >= our target rootid, and return the found item key.
The path to the root tree is only held and then released inside
find_next_root().
This way, we won't hold any unrelated path while modifying the
filesystem.
And since we're here, also add back the missing newline printed once all
ino caches are cleared.
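The iteration pattern described above can be sketched outside of btrfs-progs. This is only an illustration under stated assumptions: the sorted array stands in for the root tree, and `struct root_table`, `find_next_root_id()`, `remove_root_id()` and `visit_all_roots()` are hypothetical names, not the real find_next_root() implementation. The point is that each step searches again from the last visited id + 1 and keeps no position across a modification.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the root tree: a sorted array of root ids. */
struct root_table {
	uint64_t ids[16];
	size_t count;
};

/*
 * Return 0 and store the first id >= target in *found, or return 1 if
 * no such id exists. The caller keeps nothing from the search itself.
 */
static int find_next_root_id(const struct root_table *t, uint64_t target,
			     uint64_t *found)
{
	for (size_t i = 0; i < t->count; i++) {
		if (t->ids[i] >= target) {
			*found = t->ids[i];
			return 0;
		}
	}
	return 1;
}

/* Stand-in for a modification: drop one id and shift the rest down. */
static void remove_root_id(struct root_table *t, uint64_t id)
{
	for (size_t i = 0; i < t->count; i++) {
		if (t->ids[i] == id) {
			for (size_t j = i + 1; j < t->count; j++)
				t->ids[j - 1] = t->ids[j];
			t->count--;
			return;
		}
	}
}

/*
 * Visit every root while modifying the table between lookups.
 * Returns the number of roots visited.
 */
static size_t visit_all_roots(struct root_table *t)
{
	uint64_t cur = 0;
	uint64_t found;
	size_t visited = 0;

	while (find_next_root_id(t, cur, &found) == 0) {
		visited++;
		/* The table changes here; a saved index would now be stale. */
		remove_root_id(t, found);
		if (found == UINT64_MAX)
			break;
		cur = found + 1;
	}
	return visited;
}
```

With ids 5, 257 and 258, an index-based loop that removes the current entry and then increments the index would skip 257, while the search-again loop visits all three.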
Pull-request: #890
Reported-by: Archange <archange@archlinux.org>
Link: https://lore.kernel.org/linux-btrfs/4803f696-2dc5-4987-a353-fce1272e93e7@archlinux.org/
Signed-off-by: Qu Wenruo <wqu@suse.com>
There is an internal report that, during btrfs-convert to block-group
tree, some systemd events accidentally triggered a mount of the target
fs.
This leads to a double mount (one by the kernel and one by btrfs-progs),
which seems to cause quite a few problems.
To avoid such accidents, open all devices exclusively if btrfs-progs is
doing write operations.
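A minimal sketch of what exclusive opening can look like on Linux, where open(2) with O_EXCL and without O_CREAT on a block device fails with EBUSY while the device is in use elsewhere (for example mounted). The helper names `device_open_flags()` and `open_device()` are hypothetical, and whether btrfs-progs selects the flags exactly this way is an assumption.

```c
#include <fcntl.h>
#include <stdbool.h>

/*
 * Pick open(2) flags for a device. Writers ask for O_EXCL so that the
 * open fails with EBUSY if the block device is already in use.
 */
static int device_open_flags(bool will_write)
{
	if (will_write)
		return O_RDWR | O_EXCL;
	return O_RDONLY;
}

/* Open a device; returns a file descriptor or -1 with errno set. */
static int open_device(const char *path, bool will_write)
{
	return open(path, device_open_flags(will_write));
}
```

Read-only users stay non-exclusive in this sketch, so inspection commands can still run against a mounted filesystem.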
Pull-request: #888
Reported-by: pandada8 <pandada8@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Remove the last newline in the output of 'btrfs filesystem show', keep the
line between two filesystems so the devices are visually grouped
together.
Pull-request: #866
Author: Matt Langford <github@matt.boats>
Signed-off-by: David Sterba <dsterba@suse.com>
process_clone() only searches by received_uuid, but the clone source could
exist under an earlier uuid that isn't the received_uuid. Mirror what
process_snapshot does: search by received_uuid first and, if that fails,
look up by the normal uuid.
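The two-step lookup can be sketched as follows. This is an illustration only: `struct subvol_entry` and `lookup_clone_source()` are hypothetical, and a flat array stands in for whatever lookup structure the receive code really uses.

```c
#include <stddef.h>
#include <string.h>

#define UUID_SIZE 16

/* Hypothetical in-memory subvolume record. */
struct subvol_entry {
	unsigned char uuid[UUID_SIZE];
	unsigned char received_uuid[UUID_SIZE];
	const char *path;
};

/* Find the clone source: received_uuid first, then the plain uuid. */
static const struct subvol_entry *
lookup_clone_source(const struct subvol_entry *tab, size_t n,
		    const unsigned char *wanted)
{
	/* First pass: match against received_uuid. */
	for (size_t i = 0; i < n; i++)
		if (memcmp(tab[i].received_uuid, wanted, UUID_SIZE) == 0)
			return &tab[i];
	/* Fallback: match against the subvolume's own uuid. */
	for (size_t i = 0; i < n; i++)
		if (memcmp(tab[i].uuid, wanted, UUID_SIZE) == 0)
			return &tab[i];
	return NULL;
}
```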
Fixes: https://github.com/kdave/btrfs-progs/issues/606
Issue: #606
Pull-request: #643
Pull-request: #862
Signed-off-by: Arsenii Skvortsov <ettavolt@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
ASAN test fails at misc/055 with the following leak:
Qgroupid Referenced Exclusive Path
-------- ---------- --------- ----
0/5 16.00KiB 16.00KiB <toplevel>
0/256 16.00KiB 16.00KiB <stale>
====== RUN CHECK /home/runner/work/btrfs-progs/btrfs-progs/btrfs qgroup clear-stale /home/runner/work/btrfs-progs/btrfs-progs/tests/mnt
=================================================================
==102571==ERROR: LeakSanitizer: detected memory leaks
Indirect leak of 4096 byte(s) in 1 object(s) allocated from:
#0 0x7fd1c98fbb37 in malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
#1 0x55aa2f8953f8 in btrfs_util_subvolume_path_fd libbtrfsutil/subvolume.c:178
#2 0x55aa2f8fa2a6 in get_or_add_qgroup cmds/qgroup.c:837
#3 0x55aa2f8fa7e9 in update_qgroup_info cmds/qgroup.c:883
#4 0x55aa2f8fd912 in __qgroups_search cmds/qgroup.c:1385
#5 0x55aa2f8fe196 in qgroups_search_all cmds/qgroup.c:1453
#6 0x55aa2f902a7c in cmd_qgroup_clear_stale cmds/qgroup.c:2281
#7 0x55aa2f73425b in cmd_execute cmds/commands.h:126
#8 0x55aa2f734bcc in handle_command_group /home/runner/work/btrfs-progs/btrfs-progs/btrfs.c:177
#9 0x55aa2f73425b in cmd_execute cmds/commands.h:126
#10 0x55aa2f735a96 in main /home/runner/work/btrfs-progs/btrfs-progs/btrfs.c:518
#11 0x7fd1c942a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 08134323d00289185684a4cd177d202f39c2a5f3)
#12 0x7fd1c942a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 08134323d00289185684a4cd177d202f39c2a5f3)
#13 0x55aa2f734144 in _start (/home/runner/work/btrfs-progs/btrfs-progs/btrfs+0x84144) (BuildId: 56f3dd838e1ae189c142c5d27fac025cd46deddb)
Indirect leak of 432 byte(s) in 2 object(s) allocated from:
#0 0x7fd1c98fb4d0 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77
#1 0x55aa2f8fa1a1 in get_or_add_qgroup cmds/qgroup.c:822
#2 0x55aa2f8fa7e9 in update_qgroup_info cmds/qgroup.c:883
#3 0x55aa2f8fd912 in __qgroups_search cmds/qgroup.c:1385
#4 0x55aa2f8fe196 in qgroups_search_all cmds/qgroup.c:1453
#5 0x55aa2f902a7c in cmd_qgroup_clear_stale cmds/qgroup.c:2281
#6 0x55aa2f73425b in cmd_execute cmds/commands.h:126
#7 0x55aa2f734bcc in handle_command_group /home/runner/work/btrfs-progs/btrfs-progs/btrfs.c:177
#8 0x55aa2f73425b in cmd_execute cmds/commands.h:126
#9 0x55aa2f735a96 in main /home/runner/work/btrfs-progs/btrfs-progs/btrfs.c:518
#10 0x7fd1c942a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 08134323d00289185684a4cd177d202f39c2a5f3)
#11 0x7fd1c942a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 08134323d00289185684a4cd177d202f39c2a5f3)
#12 0x55aa2f734144 in _start (/home/runner/work/btrfs-progs/btrfs-progs/btrfs+0x84144) (BuildId: 56f3dd838e1ae189c142c5d27fac025cd46deddb)
[CAUSE]
The above leaks are caused by two btrfs_qgroup structures and one path for
the toplevel qgroup.
We called qgroups_search_all() but never did any cleanup afterwards.
[FIX]
Call __free_all_qgroups() inside cmd_qgroup_clear_stale() to properly
free the qgroups.
Fixes: 701ab151c2 ("btrfs-progs: qgroup: new command to delete stale qgroups")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Commit d7492ec59e ("btrfs-progs: use on-stack buffer in
__ino_to_path_fd") was supposed to switch path buffer from dynamic
allocation to on-stack but it was done wrong. The btrfs_data_container
is a flexible array so it needs to be explicitly allocated to the right
size.
The conversion turned it into an array. Gcc 13.x started to warn about
access to fspath->val[i] being out of bounds. Fortunately the overall size
was 65536 bytes and only the first 4096 bytes were used.
cmds/inspect.c: In function ‘__ino_to_path_fd’:
cmds/inspect.c:86:35: warning: array subscript i is outside array bounds of ‘__u64[]’ {aka ‘long long unsigned int[]’} [-Warray-bounds=]
86 | ptr += fspath->val[i];
| ~~~~~~~~~~~^~~
In file included from ./kernel-shared/accessors.h:11,
from cmds/inspect.c:35:
./kernel-shared/uapi/btrfs.h:724:17: note: while referencing ‘val’
724 | __u64 val[]; /* out */
Add an on-stack buffer and map it over fspath, similar to the previous
dynamic array.
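A minimal sketch of mapping a structure with a flexible array member over an on-stack buffer. The `struct data_container` here is a simplified, hypothetical stand-in for btrfs_data_container, and the union is one way to get correct alignment; the actual fix may use a plain buffer and a cast instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container with a flexible array member. */
struct data_container {
	uint32_t elem_cnt;
	uint64_t val[];		/* storage must be provided by the user */
};

#define CONTAINER_BUF_SIZE 4096

/* Copy up to n values into an on-stack container and sum them back. */
static uint64_t sum_container_values(const uint64_t *src, uint32_t n)
{
	/* The union gives the byte buffer the alignment of the struct. */
	union {
		struct data_container hdr;
		unsigned char bytes[CONTAINER_BUF_SIZE];
	} buf;
	struct data_container *c = &buf.hdr;
	const size_t room = CONTAINER_BUF_SIZE -
			    offsetof(struct data_container, val);
	const uint32_t max = (uint32_t)(room / sizeof(uint64_t));
	uint64_t sum = 0;

	memset(&buf, 0, sizeof(buf));
	if (n > max)
		n = max;	/* never index past the backing buffer */
	c->elem_cnt = n;
	for (uint32_t i = 0; i < n; i++)
		c->val[i] = src[i];
	for (uint32_t i = 0; i < c->elem_cnt; i++)
		sum += c->val[i];
	return sum;
}
```

Declaring the struct itself on the stack would reserve no room for `val[]`, which is the situation the warning points at.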
Signed-off-by: David Sterba <dsterba@suse.com>
The list-chunk command is deemed to be reasonably complete so make it
visible in the default build. The output can be tweaked later.
Issue: #559
Signed-off-by: David Sterba <dsterba@suse.com>
The long options now allow passing the unit mode in the usual way; drop
the local variable for raw byte values.
Signed-off-by: David Sterba <dsterba@suse.com>
The string escaping functionality is more generic and can be used in
other commands (e.g. in dump-tree). Move it to the string utils.
Signed-off-by: David Sterba <dsterba@suse.com>
tree-stats currently displays only some global trees and fs-tree 5. Add
support to show the stats of a specified tree.
Issue: #268
Signed-off-by: Chung-Chiang Cheng <cccheng@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This makes btrfs scrub status --si show the Rate in metric units as well.
Before:
Total to scrub: 877.65GB
Rate: 609.22MiB/s
After:
Total to scrub: 877.65GB
Rate: 638.81MB/s
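The two Rate values above are consistent with the same speed expressed in binary and in metric units. A small sketch of the arithmetic, with a hypothetical helper name that is not part of btrfs-progs:

```c
/* Convert a rate in MiB/s (2^20 bytes) to MB/s (10^6 bytes). */
static double mib_to_mb(double mib_per_sec)
{
	return mib_per_sec * 1048576.0 / 1000000.0;
}
```

Here 609.22 * 1.048576 is about 638.81, matching the output.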
Pull-request: #832
Author: Ivan Kozik <ivan@ludios.org>
Signed-off-by: David Sterba <dsterba@suse.com>
Enhance the sorting capabilities of 'inspect list-chunks' to allow
multiple keys. Drop the gaps, this works only for pstart and it's hard
to make it work with arbitrary sort keys.
Usage is printed by default, assuming this is interesting information;
even if it slows down the output (due to extra lookups), it's more
convenient to print it rather than not.
The options related to usage and empty were removed.
Output changes:
- rename Number to PNumber, meaning physical number on the device
- print Devid, device number, can also be a sort key
Examples:
btrfs inspect list-chunks /mnt
btrfs inspect list-chunks --sort length,usage
btrfs inspect list-chunks --sort lstart
Depending on the sort key order, the output can be wild; in that case the
PNumber and LNumber give some hint where the chunks lie in their space.
Example output:
$ sudo ./btrfs inspect list-chunks --sort length,usage /
Devid PNumber Type/profile PStart Length PEnd LNumber LStart Usage%
----- ------- ----------------- --------- --------- --------- ------- --------- ------
1 7 Data/single 1.52GiB 16.00MiB 1.54GiB 69 191.68GiB 86.04
1 3 System/DUP 117.00MiB 32.00MiB 149.00MiB 40 140.17GiB 0.05
1 2 System/DUP 85.00MiB 32.00MiB 117.00MiB 39 140.17GiB 0.05
1 15 Data/single 8.04GiB 64.00MiB 8.10GiB 61 188.60GiB 94.46
1 1 Data/single 1.00MiB 84.00MiB 85.00MiB 68 191.60GiB 74.24
1 5 Metadata/DUP 341.00MiB 192.00MiB 533.00MiB 60 188.41GiB 82.58
1 4 Metadata/DUP 149.00MiB 192.00MiB 341.00MiB 59 188.41GiB 82.58
1 20 Metadata/DUP 9.29GiB 256.00MiB 9.54GiB 38 139.92GiB 57.76
1 19 Metadata/DUP 9.04GiB 256.00MiB 9.29GiB 37 139.92GiB 57.76
1 22 Metadata/DUP 9.79GiB 256.00MiB 10.04GiB 25 113.15GiB 57.93
1 21 Metadata/DUP 9.54GiB 256.00MiB 9.79GiB 24 113.15GiB 57.93
1 46 Metadata/DUP 29.29GiB 256.00MiB 29.54GiB 43 142.71GiB 62.38
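Sorting by several keys, as in `--sort length,usage`, is commonly done by chaining comparators: compare by the first key and consult the next one only on a tie. The sketch below assumes a hypothetical `struct chunk_row` with just two fields and a fixed key order; the real command builds the key list from the option string.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical chunk record with two sortable fields. */
struct chunk_row {
	uint64_t length;
	double usage;
};

static int cmp_length(const struct chunk_row *a, const struct chunk_row *b)
{
	return (a->length > b->length) - (a->length < b->length);
}

static int cmp_usage(const struct chunk_row *a, const struct chunk_row *b)
{
	return (a->usage > b->usage) - (a->usage < b->usage);
}

/* Compare by length first; fall through to usage on a tie. */
static int cmp_length_usage(const void *pa, const void *pb)
{
	const struct chunk_row *a = pa;
	const struct chunk_row *b = pb;
	int ret = cmp_length(a, b);

	if (ret == 0)
		ret = cmp_usage(a, b);
	return ret;
}

/* Sort rows in ascending order of (length, usage). */
static void sort_chunks(struct chunk_row *rows, size_t n)
{
	qsort(rows, n, sizeof(*rows), cmp_length_usage);
}
```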
Signed-off-by: David Sterba <dsterba@suse.com>
There's more information available in sysfs
(/sys/fs/btrfs/FSID/allocation) that we can print in 'fi df'. This is
still meant for debugging or deeper analysis of the filesystem; the
values need to be correctly interpreted with respect to the profiles,
persistence and other conditional features.
The extended output is not printed by default and for now is behind the
verbosity options:
$ btrfs -vv fi df /mnt
Data, single: total=47.06GiB, used=25.32GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=1.44GiB, used=961.20MiB
GlobalReserve, single: total=125.62MiB, used=0.00B
Data:
bg_reclaim_threshold 0%
bytes_may_use 8.00KiB
bytes_pinned 0.00B
bytes_readonly 64.00KiB
bytes_reserved 0.00B
bytes_used 25.32GiB
bytes_zone_unusable 0.00B
chunk_size 10.00GiB
disk_total 47.06GiB
disk_used 25.32GiB
total_bytes 47.06GiB
Metadata:
bg_reclaim_threshold 0%
bytes_may_use 126.62MiB
bytes_pinned 0.00B
bytes_readonly 0.00B
bytes_reserved 0.00B
bytes_used 961.20MiB
bytes_zone_unusable 0.00B
chunk_size 256.00MiB
disk_total 2.88GiB
disk_used 1.88GiB
total_bytes 1.44GiB
System:
bg_reclaim_threshold 0%
bytes_may_use 0.00B
bytes_pinned 0.00B
bytes_readonly 0.00B
bytes_reserved 0.00B
bytes_used 16.00KiB
bytes_zone_unusable 0.00B
chunk_size 32.00MiB
disk_total 64.00MiB
disk_used 32.00KiB
total_bytes 32.00MiB
Signed-off-by: David Sterba <dsterba@suse.com>
Commit 4db925911c ("btrfs-progs: use strncpy_null everywhere") did
not properly convert the subvolume name copying to strncpy_null() and
trimmed the last character.
Issue: #829
Signed-off-by: David Sterba <dsterba@suse.com>
The key=value pairs are separated only by one or more space characters;
'encoded_write' also uses ',', which is inconsistent with the rest.
Signed-off-by: David Sterba <dsterba@suse.com>
The xattr names are user strings but can still potentially contain
special characters (as reported). There doesn't seem to be a restriction
defined on the name.
The xattr values are length-encoded byte arrays so escaping needs to be
done.
The clone source is a path and by mistake lacked the encoding.
Issue: #818
Signed-off-by: David Sterba <dsterba@suse.com>
Use the safe version of strncpy that makes sure the string is
terminated.
To be noted:
- the conversion in scrub path handling was skipped
- sizes of device paths in some ioctl related structures are
BTRFS_DEVICE_PATH_NAME_MAX + 1
Recently gcc 13.3 started to detect problems with our use of strncpy
potentially lacking the null terminator, warnings like:
cmds/inspect.c: In function ‘cmd_inspect_logical_resolve’:
cmds/inspect.c:294:33: warning: ‘__builtin_strncpy’ specified bound 4096 equals destination size [-Wstringop-truncation]
294 | strncpy(mount_path, mounted, PATH_MAX);
| ^
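The warning fires because strncpy() leaves the destination unterminated when the source fills the whole bound. A terminating wrapper can be sketched as below; `copy_terminated()` is a hypothetical name and not necessarily how the btrfs-progs helper is written.

```c
#include <string.h>

/*
 * Copy at most n - 1 bytes of src into dest and always terminate.
 * n is the full size of the destination buffer.
 */
static char *copy_terminated(char *dest, const char *src, size_t n)
{
	if (n == 0)
		return dest;
	strncpy(dest, src, n - 1);
	dest[n - 1] = '\0';
	return dest;
}
```

Passing the full buffer size matters: passing size - 1 to such a helper would cut one more character than intended.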
Signed-off-by: David Sterba <dsterba@suse.com>
Now that there's only __strncpy_null we can drop the underscore and move
it to string-utils as it's a generic string function rather than
something for paths.
Signed-off-by: David Sterba <dsterba@suse.com>
The macro strncpy_null uses sizeof the first argument for the length,
but there are no checks and this works only for buffers with static
length, i.e. not pointers. This is error prone. Use the open coded
variant that makes the sizeof visible.
Signed-off-by: David Sterba <dsterba@suse.com>
Use unaligned access helper for code that potentially or actually
accesses data that come from on-disk structures. This is for image or
chunk restore. This may pessimize some cases but is in general safer on
strict alignment architectures and has no effect on other architectures.
Related issue #770.
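A typical unaligned read helper copies the bytes out before interpreting them, so no wide load is ever issued on a misaligned address. The sketch below is a generic illustration with a hypothetical name, not the helper used in btrfs-progs.

```c
#include <stdint.h>
#include <string.h>

/* Read a little-endian 64-bit value from a possibly unaligned address. */
static uint64_t read_le64_unaligned(const void *p)
{
	unsigned char b[8];
	uint64_t v = 0;

	/* Byte-wise copy is valid for any alignment. */
	memcpy(b, p, sizeof(b));
	/* Assemble from the most significant byte (last on disk) down. */
	for (int i = 7; i >= 0; i--)
		v = (v << 8) | b[i];
	return v;
}
```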
Signed-off-by: David Sterba <dsterba@suse.com>
Recent patches updated stale qgroup handling, using 'unlinked' and
'dropped' where we otherwise use 'deleted' and 'cleaned'.
Signed-off-by: David Sterba <dsterba@suse.com>
Currently the `btrfs qgroup show` command shows any level 0 qgroup without a
root backref as `<stale>`, which is not correct.
There are several more cases:
- Under deletion
The subvolume is not yet fully dropped, but unlinked.
In that case we would not have a root backref item, but the qgroup is
not stale.
- Squota space holder
This is for squota mode, where a fully dropped subvolume still has
extents accounted to the already-gone subvolume.
In this case it's not stale either, and future accounting relies on
it.
This patch adds the above special cases, and adds an extra `SPECIAL
PATHS` section to explain all the cases, including `<stale>`.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The current stale qgroup deletion doesn't handle the following cases at
all:
- It doesn't detect stale qgroups correctly
The current check is using the root backref, which means unlinked but
not yet fully dropped subvolumes would have their corresponding qgroups
marked stale.
This is incorrect. The real stale check should be based on the root
item, not root backref.
- Squota non-empty but stale qgroups
Such qgroups can not and should not be deleted, as future accounting
still requires them.
- Full accounting mode, stale qgroups but not empty
Since qgroup numbers are inconsistent already, it's common to have
such stale qgroups with non-zero numbers.
Now it's dependent on the kernel to determine whether such qgroup can
be deleted.
Address the above problems:
- Do root_item based detection
So that btrfs_qgroup::stale would properly indicate if there is a
subvolume root item for the qgroup.
- Do not attempt to delete squota stale but non-empty qgroups
- Attempt to delete stale but non-empty qgroups for full accounting mode
And deletion failure would not count as an error.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This allows users to identify the running qgroup mode and whether
the numbers are already inconsistent.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since qgroup numbers are only updated at transaction commit time, it's
better to do a sync before reading the quota tree, to reduce the chance
of uncommitted qgroup changes.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This reverts commit 9da773aa46.
There are several problems related to the --delete-qgroup option:
- Currently the kernel doesn't allow deleting non-empty qgroups
- A qgroup can only be empty after the subvolume is fully dropped and a
transaction is committed
The tool doesn't take either factor into consideration
- Things like drop_subtree_threshold or other operations can mark qgroup
inconsistent and skip accounting
This can mean the target qgroup will never be empty until next rescan
On the other hand, even if we do it the proper way, it would hugely delay
the command (waiting until the subvolume is cleaned).
Furthermore, even if the waiting is handled properly,
drop_subtree_threshold can still prevent us from deleting the qgroup (qgroup
numbers are inconsistent, and accounting is skipped completely).
So the qgroup cleanup needs the kernel to make it work properly.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Use the natural objectid, type, offset order as it's more readable and
we're used to reading keys like that.
Signed-off-by: David Sterba <dsterba@suse.com>
What basename(3) does with the argument depends on _GNU_SOURCE and
inclusion of libgen.h. This is problematic on Musl (1.2.5) as reported.
We want the GNU semantics that do not modify the argument. A common way
to make it portable is to add our own helper. This is now implemented in
path_basename(), which does not use the libc provided basename but preserves
the semantics. The path_dirname() is just for parity, otherwise the same as
dirname().
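A helper with GNU-like semantics can be very small. The following is a sketch under the assumption that only the non-modifying behaviour matters; `basename_nomodify()` is a hypothetical name and the real path_basename() may differ in signature and edge cases.

```c
#include <string.h>

/*
 * Return a pointer to the last component of path without modifying it,
 * following the GNU basename() behaviour: a trailing slash yields "".
 */
static const char *basename_nomodify(const char *path)
{
	const char *slash = strrchr(path, '/');

	return slash ? slash + 1 : path;
}
```

Because the result points into the original string, no allocation or copy is needed and the argument can be const.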
Sources:
- https://bugs.gentoo.org/926288
- https://git.musl-libc.org/cgit/musl/commit/?id=725e17ed6dff4d0cd22487bb64470881e86a92e7
Issue: #778
Signed-off-by: David Sterba <dsterba@suse.com>
Reported by 'gcc -fanalyzer':
cmds/inspect.c:1193:1: warning: leak of ‘ctx.stats’ [CWE-401] [-Wanalyzer-malloc-leak]
There are mixed returns and gotos for error handling and the returns
miss freeing of the ctx.stats. Unify all paths to the single label that
frees the buffers and rename it.
Signed-off-by: David Sterba <dsterba@suse.com>
Reported by 'gcc -fanalyzer':
cmds/scrub.c:1150:25: warning: use of possibly-NULL ‘path’ where non-null expected [CWE-690] [-Wanalyzer-possible-null-argument]
Initialization of the datafile path is done from a static string but
failure of the strdup() call is not handled. Store the path directly in
the buffer, it's later modified by mkdir_p().
Signed-off-by: David Sterba <dsterba@suse.com>
Reported by 'gcc -fanalyzer':
cmds/subvolume.c:1078:39: warning: use of possibly-NULL ‘name’ where non-null expected [CWE-690] [-Wanalyzer-possible-null-argument]
Failure of the name duplication is not handled and can potentially lead to
a NULL dereference later. Handle the error properly and return the template
error message.
Signed-off-by: David Sterba <dsterba@suse.com>
Reported by 'gcc -fanalyzer':
cmds/replace.c:357:17: warning: double ‘close’ of file descriptor ‘fdmnt’ [CWE-1341] [-Wanalyzer-fd-double-close]
The first close is done right before going to the label
'leave_with_error' but the variable is not reset to -1 so in the exit
block close() is called again.
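The usual way to make a shared exit block safe is to reset the descriptor after the early close. A minimal sketch, with a hypothetical helper name that is not taken from cmds/replace.c:

```c
#include <unistd.h>

/*
 * Close *fd if it is still open and mark it as closed. Returns 1 if
 * close() was called, 0 if the descriptor was already marked closed.
 */
static int close_once(int *fd)
{
	if (*fd < 0)
		return 0;
	close(*fd);
	*fd = -1;
	return 1;
}
```

With this pattern both the early close and the exit block can call the helper, and only the first call reaches close().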
Signed-off-by: David Sterba <dsterba@suse.com>
Use a local copy of the search header for proper aligned access instead
of the unaligned helpers, move the definitions to the closest scope.
Signed-off-by: David Sterba <dsterba@suse.com>
Use tree search ioctl wrappers for code that is considered internal, i.e.
leaving out libbtrfs (legacy) and libbtrfsutil (needs its own API for that).
The conversion is mostly a direct use of what the API provides.
Signed-off-by: David Sterba <dsterba@suse.com>
Bit shifts should be done on unsigned type as a matter of good practice
to avoid any problems with bit overflowing to the sign bit.
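As an illustration of the practice, a bit mask helper can do the shift on an unsigned 64-bit type, so shifting into the top bit is well defined. The name `bit_mask()` is hypothetical.

```c
#include <stdint.h>

/*
 * Build a single-bit mask for bit numbers 0..63. The shift is done on
 * an unsigned 64-bit type; with a plain int, 1 << 31 would shift into
 * the sign bit.
 */
static uint64_t bit_mask(unsigned int bit)
{
	return (uint64_t)1 << bit;
}
```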
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
With the latest kernel patch to reject invalid qgroupids in
btrfs_qgroup_inherit structure, "btrfs subvolume create" or "btrfs
subvolume snapshot" can lead to the following output:
# mkfs.btrfs -O quota -f $dev
# mount $dev $mnt
# btrfs subvolume create -i 2/0 $mnt/subv1
Create subvolume '/mnt/btrfs/subv1'
ERROR: cannot create subvolume: No such file or directory
The "btrfs subvolume" command outputs the first line, seemingly to
indicate a successful subvolume creation, followed by an error
message.
This can be a little confusing as to whether the subvolume was created or
not.
[FIX]
Fix the output by only outputting the regular line if the ioctl
succeeded.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Remove btrfs_qgroup_inherit_add_copy() and the command line interface.
This was designed to add a pair of source/destination qgroups into
btrfs_qgroup_inherit structure, so that rfer/excl numbers would be
copied from the source qgroup into the destination one.
This behavior has been intentionally hidden since 2016, as such a copy will
make qgroups inconsistent immediately and a rescan would reset whatever
numbers were copied anyway.
Now that the kernel is going to reject the copy behavior, there is no
need to keep those hidden (and already disabled for "subvolume create")
cases.
Remove btrfs_qgroup_inherit_add_copy() call, and cleanup the
undocumented options.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Use a more descriptive name, the interface is generic so it should use
the generic term for file/directory.
Signed-off-by: David Sterba <dsterba@suse.com>
There are some cases that disable verbosity (of errors) and then print
their own message. Enable the verbose error messages printed by
btrfs_open_fd2() as they are specific.
Signed-off-by: David Sterba <dsterba@suse.com>