Use unaligned access helper for code that potentially or actually
accesses data that come from on-disk structures. This is for image or
chunk restore. This may pessimize some cases but is in general safer on
strict alignment architectures and has no effect on other architectures.
Related issue #770.
Signed-off-by: David Sterba <dsterba@suse.com>
Recent patches updated stale qgroup handling, using 'unlinked' and
'dropped' where we otherwise use 'deleted' and 'cleaned'.
Signed-off-by: David Sterba <dsterba@suse.com>
Currently `btrfs qgroup show` command shows any 0 level qgroup without a
root backref as `<stale>`, which is not correct.
There are several more cases:
- Under deletion
The subvolume is not yet full dropped, but unlinked.
In that case we would not have a root backref item, but the qgroup is
not stale.
- Squota space holder
This is for squota mode, that a fully dropped subvolume still have
extents accounting on the already-gone subvolume.
In this case it's not stale either, and future accounting relies on
it.
This patch would add above special cases, and add an extra `SPECIAL
PATHS` section to explain all the cases, including `<stale>`.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The current stale qgroup deletion doesn't handle the following cases at
all:
- It doesn't detect stale qgroups correctly
The current check is using the root backref, which means unlinked but
not yet fully dropped subvolumes would mark its corresponding qgroups
stale.
This is incorrect. The real stale check should be based on the root
item, not root backref.
- Squota non-empty but stale qgroups
Such qgroups can not and should not be deleted, as future accounting
still require them.
- Full accounting mode, stale qgroups but not empty
Since qgroup numbers are inconsistent already, it's common to have
such stale qgroups with non-zero numbers.
Now it's dependent on the kernel to determine whether such qgroup can
be deleted.
Address the above problems:
- Do root_item based detection
So that btrfs_qgroup::stale would properly indicate if there is a
subvolume root item for the qgroup.
- Do not attempt to delete squota stale but non-empty qgroups
- Attempt to delete stale but non-empty qgroups for full accounting mode
And deletion failure would not count as an error.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This allows the users to identify if the running qgroup mode and whether
the numbers are already inconsistent.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since qgroup numbers are only updated at transaction commit time, it's
better to do a sync before reading the quota tree, to reduce the chance
of uncommitted qgroup changes.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This reverts commit 9da773aa46.
There are several problems related to the --delete-qgroup option:
- Currently kernel doesn't allow to delete non-empty qgroups
- A qgroup can only be empty after fully dropped and a transaction is
committed
The tool doesn't take either factor into consideration
- Things like drop_subtree_threshold or other operations can mark qgroup
inconsistent and skip accounting
This can mean the target qgroup will never be empty until next rescan
On the other hand, even we do it the proper way, it would hugely delay
the command (wait until the subvolume to be cleaned).
Furthermore, even if the waiting is handled properly,
drop_subtree_threshold can still prevent us deleting the qgroup (qgroup
numbers are inconsistent, and accounting is skipped completely).
So the qgroup cleanup needs kernel to make it work properly.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Use the objectid, type, offset natural order as it's more readable and
we're used to read keys like that.
Signed-off-by: David Sterba <dsterba@suse.com>
What basename(3) does with the argument depends on _GNU_SOURCE and
inclusion of libgen.h. This is problematic on Musl (1.2.5) as reported.
We want the GNU semantics that does not modify the argument. Common way
to make it portable is to add own helper. This is now implemented in
path_basename() that does not use the libc provided basename but preserves
the semantics. The path_dirname() is just for parity, otherwise same as
dirname().
Sources:
- https://bugs.gentoo.org/926288
- https://git.musl-libc.org/cgit/musl/commit/?id=725e17ed6dff4d0cd22487bb64470881e86a92e7
Issue: #778
Signed-off-by: David Sterba <dsterba@suse.com>
Reported by 'gcc -fanalyzer':
cmds/inspect.c:1193:1: warning: leak of ‘ctx.stats’ [CWE-401] [-Wanalyzer-malloc-leak]
There are mixed returns and gotos for error handling and the returns
miss freeing of the ctx.stats. Unify all paths to the single label that
frees the buffers and rename it.
Signed-off-by: David Sterba <dsterba@suse.com>
Reported by 'gcc -fanalyzer':
cmds/scrub.c:1150:25: warning: use of possibly-NULL ‘path’ where non-null expected [CWE-690] [-Wanalyzer-possible-null-argument]
Initialization of the datafile path is done from a static string but the
strdup() call is not handled. Store the path directly to the buffer,
it's later modified by mkdir_p().
Signed-off-by: David Sterba <dsterba@suse.com>
Reported by 'gcc -fanalyzer':
cmds/subvolume.c:1078:39: warning: use of possibly-NULL ‘name’ where non-null expected [CWE-690] [-Wanalyzer-possible-null-argument]
The failure name duplication is not handled and can potentially lead to
a NULL dereference later. Handle the error properly and return template
error message.
Signed-off-by: David Sterba <dsterba@suse.com>
Reported by 'gcc -fanalyzer':
cmds/replace.c:357:17: warning: double ‘close’ of file descriptor ‘fdmnt’ [CWE-1341] [-Wanalyzer-fd-double-close]
The first close is done right before going to the label
'leave_with_error' but the variable is not reset to -1 so in the exit
block close() is called again.
Signed-off-by: David Sterba <dsterba@suse.com>
Use a local copy of the search header for proper aligned access instead
of the unaligned helpers, move the definitions to the closest scope.
Signed-off-by: David Sterba <dsterba@suse.com>
Use tree search ioctl wrappers for code that is considered internal, ie.
leaving out libbtrfs (legacy), libbtrfsutil (needs own API for that).
Conversion is mostly direct of what the API provides.
Signed-off-by: David Sterba <dsterba@suse.com>
Bit shifts should be done on unsigned type as a matter of good practice
to avoid any problems with bit overflowing to the sign bit.
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
With the latest kernel patch to reject invalid qgroupids in
btrfs_qgroup_inherit structure, "btrfs subvolume create" or "btrfs
subvolume snapshot" can lead to the following output:
# mkfs.btrfs -O quota -f $dev
# mount $dev $mnt
# btrfs subvolume create -i 2/0 $mnt/subv1
Create subvolume '/mnt/btrfs/subv1'
ERROR: cannot create subvolume: No such file or directory
The "btrfs subvolume" command output the first line, seemingly to
indicate a successful subvolume creation, then followed by an error
message.
This can be a little confusing on whether if the subvolume is created or
not.
[FIX]
Fix the output by only outputting the regular line if the ioctl
succeeded.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Remove btrfs_qgroup_inherit_add_copy() and the command line interface.
This was designed to add a pair of source/destination qgroups into
btrfs_qgroup_inherit structure, so that rfer/excl numbers would be
copied from the source qgroup into the destination one.
This behavior has been intentionally hidden since 2016, as such copy will
cause qgroup inconsistent immediately and a rescan would reset whatever
numbers copied anyway.
Now we're going to reject the copy behavior from kernel, there is no
need to keep those hidden (and already disabled for "subvolume create")
case.
Remove btrfs_qgroup_inherit_add_copy() call, and cleanup the
undocumented options.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Use a more descriptive name, the interface is generic so it should use
the generic term for file/directory.
Signed-off-by: David Sterba <dsterba@suse.com>
There are some cases that disable verbosity (of errors) and then print
own message. Enable the verbose error messages printed by
btrfs_open_fd2() as they are specific.
Signed-off-by: David Sterba <dsterba@suse.com>
The code in scrub predates the global verbosity options and sets its own
quiet status. This is still used only for error messages that should be
printed even with -q. Drop that or replace with bconf.verbose status
check.
Signed-off-by: David Sterba <dsterba@suse.com>
There are many places that pass false as verbosity argument and then
print an error message, or don't print any message in error cases.
Use btrfs_open_file_or_dir_fd() that will be verbose in case of an error
with the same semantics.
Signed-off-by: David Sterba <dsterba@suse.com>
It's commonly used elsewhere in the code to return the -errno values if
possible, do that for the open helpers too.
Signed-off-by: David Sterba <dsterba@suse.com>
For historical reasons the helpers [btrfs_]open_dir... return also
the 'DIR *dirstream' value when a directory is opened.
However this is never used. So avoid calling diropen() and return
only the fd.
Replace the last reference to btrfs_open_file_or_dir3() with
btrfs_open_fd2().
Signed-off-by: Goffredo Baroncelli <kreijack@libero.it>
Signed-off-by: David Sterba <dsterba@suse.com>
For historical reasons the helpers [btrfs_]open_dir... return also
the 'DIR *dirstream' value when a directory is opened.
However this is never used. So avoid calling diropen() and return
only the fd.
Replace open_file_or_dir() with btrfs_open_fd2() removing any reference
to the unused/useless dirstream variables. btrfs_open_fd2() is required
to avoid spurious error messages.
Signed-off-by: Goffredo Baroncelli <kreijack@libero.it>
Signed-off-by: David Sterba <dsterba@suse.com>
For historical reasons the helpers [btrfs_]open_dir... return also
the 'DIR *dirstream' value when a directory is opened.
However this is never used. So avoid calling diropen() and return
only the fd.
Replace btrfs_open_file_or_dir() with btrfs_open_file_or_dir_fd()
removing any references to the unused/useless dirstream variables.
Signed-off-by: Goffredo Baroncelli <kreijack@libero.it>
Signed-off-by: David Sterba <dsterba@suse.com>
For historical reasons the helpers [btrfs_]open_dir... return also
the 'DIR *dirstream' value when a directory is opened.
However this is never used. So avoid calling diropen() and return
only the fd.
Replace open_file_or_dir3() with btrfs_open_fd2() removing any reference
to the unused/useless dirstream variables. btrfs_open_fd2() is needed
because sometime the callers need to set the RDONLY/RDWRITE mode, and to
avoid spurious messages.
Signed-off-by: Goffredo Baroncelli <kreijack@libero.it>
Signed-off-by: David Sterba <dsterba@suse.com>
For historical reasons the helpers [btrfs_]open_dir... return also
the 'DIR *dirstream' value when a directory is opened.
However this is never used. So avoid calling diropen() and return
only the fd.
Replace open_path_or_dev_mnt() with btrfs_open_mnt_fd() removing
any reference to the unused/useless dirstream variables.
Signed-off-by: Goffredo Baroncelli <kreijack@libero.it>
Signed-off-by: David Sterba <dsterba@suse.com>
For historical reasons the helpers [btrfs_]open_dir... return also
the 'DIR *dirstream' value when a directory is opened.
However this is never used. So avoid calling diropen() and return
only the fd.
Replace the last btrfs_open_dir() call with btrfs_open_dir_fd()
removing any reference to the unused/useless dirstream variables.
Also update the add_seen_fsid() function removing any reference to dir
stream (again this is never used).
Signed-off-by: Goffredo Baroncelli <kreijack@libero.it>
Signed-off-by: David Sterba <dsterba@suse.com>
For historical reasons the helpers [btrfs_]open_dir... return also
the 'DIR *dirstream' value when a directory is opened.
Replace btrfs_open_dir() with btrfs_open_dir_fd() removing
any reference to the unused/useless dirstream variables.
Calling btrfs_open_dir_fd() with only the path is equivalent to
btrfs_open_dir(_, _, 1).
Signed-off-by: Goffredo Baroncelli <kreijack@libero.it>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
In cmd_rescue_clear_ino_cache(), we opened the fs, but without
closing it using close_ctree().
[CAUSE]
This was introduced in 42404a4e44 ("btrfs-progs: move inode cache
removal to rescue group"), the original code inside btrfs check
had a "goto out_close;" to properly close the fs.
[FIX]
Manually call close_ctree() on the fs_info->tree_root.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There's a report that passing raw device mapper path and -d don't work
together:
yyy@xxx ~ $ sudo btrfs filesystem show /dev/dm-0
Label: none uuid: a7fbb8d6-ec5d-4e88-bd8b-c686553e0dc7
Total devices 1 FS bytes used 144.00KiB
devid 1 size 256.00MiB used 88.00MiB path /dev/mapper/da0972636816-LogVol00
With --all-devices
yyy@xxx ~ $ sudo btrfs filesystem show --all-devices /dev/dm-0
ERROR: not a valid btrfs filesystem: /dev/dm-0
Where dm-0 corresponds to the LogVol00 device from above.
Passing the option -d skips some steps but still uses the real path of
the device that is required for scanning and identification, while
blkid uses the canonicalized path.
The combination of raw device name and -d was not handled as the raw
path is not in cache and thus not recognized. Canonicalization fixes
that although this changes the device name in the output.
Issue: #732
Signed-off-by: David Sterba <dsterba@suse.com>
In case of a raid5/6 filesystem 'btrfs fi us' returns wrong values
without the root capabilities:
$ sudo btrfs fi us /tmp/raid5fs # as root
Overall:
Device size: 3.00GiB
Device allocated: 1.51GiB <--- OK
Device unallocated: 1.49GiB <--- OK
Device missing: 0.00B
Device slack: 0.00B
Used: 769.03MiB <--- OK
Free (estimated): 1.32GiB (min: 1.32GiB) <-OK
Free (statfs, df): 1.32GiB
Data ratio: 1.50 <--- OK
Metadata ratio: 1.50 <--- OK
Global reserve: 5.50MiB (used: 0.00B)
Multiple profiles: no
[...]
$ btrfs fi us /tmp/raid5fs # as user
WARNING: cannot read detailed chunk info, per-device usage will not be shown, run as root
Overall:
Device size: 3.00GiB
Device allocated: 0.00B <--- WRONG
Device unallocated: 3.00GiB <--- WRONG
Device missing: 0.00B
Device slack: 0.00B
Used: 0.00B <--- WRONG
Free (estimated): 0.00B (min: 8.00EiB) <- WRONG
Free (statfs, df): 1.32GiB
Data ratio: 0.00 <--- WRONG
Metadata ratio: 0.00 <--- WRONG
Global reserve: 5.50MiB (used: 0.00B)
Multiple profiles: no
[...]
The reason is that the BTRFS_IOC_SPACE_INFO ioctl doesn't return enough
information. To bypass it a scan of the chunks is required when a
raid5/6 profile is present.
To avoid providing wrong information, in case of a raid5/6 filesystem
without the root capabilities the "btrfs fi us" is not executed at all
and a warning with a suggestion to run it as root is printed.
$ ./btrfs fi us /tmp/t/
WARNING: cannot read detailed chunk info, per-device usage will not be shown, run as root
WARNING: due to the presence of a raid5/raid6 profile, we cannots compute some values;
WARNING: run as root instead.
Signed-off-by: Goffredo Baroncelli <kreijack@libero.it>
Signed-off-by: David Sterba <dsterba@suse.com>
If "btrfs dev us" is invoked by a not root user, it is impossible to
collect the chunk info data (not enough privileges). This causes
"btrfs dev us" to print as "Unallocated" value the size of the disk.
This patch handles the case where print_device_chunks() is invoked
without the chunk info data, printing "Unallocated N/A":
Before the patch:
$ btrfs dev us t/
WARNING: cannot read detailed chunk info, per-device usage will not be shown, run as root
/dev/loop0, ID: 1
Device size: 5.00GiB
Device slack: 0.00B
Unallocated: 5.00GiB <-- Wrong
$ sudo btrfs dev us t/
[sudo] password for ghigo:
/dev/loop0, ID: 1
Device size: 5.00GiB
Device slack: 0.00B
Data,single: 8.00MiB
Metadata,DUP: 512.00MiB
System,DUP: 16.00MiB
Unallocated: 4.48GiB <-- Correct
After the patch:
$ ./btrfs dev us /tmp/t/
WARNING: cannot read detailed chunk info, per-device usage will not be shown, run as root
/dev/loop0, ID: 1
Device size: 5.00GiB
Device slack: 0.00B
Unallocated: N/A
$ sudo ./btrfs dev us /tmp/t/
[sudo] password for ghigo:
/dev/loop0, ID: 1
Device size: 5.00GiB
Device slack: 0.00B
Data,single: 8.00MiB
Metadata,DUP: 512.00MiB
System,DUP: 16.00MiB
Unallocated: 4.48GiB
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Goffredo Baroncelli <kreijack@libero.it>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch introduces a new parser helper, parse_u64_with_suffix(),
which has a better error handling, following all the parse_*()
helpers to return non-zero value for errors.
This new helper is going to replace parse_size_from_string(), which
would directly call exit(1) to stop the whole program.
Furthermore most callers of parse_size_from_string() are expecting
exit(1) for error, so that they can skip the error handling.
For those call sites, introduce a wrapper, arg_strtou64_with_suffix(),
to do that. The only disadvantage is a little less detailed error
report for why the parse failed, but for most cases the generic error
string should be enough.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
On multi-device filesystems, a scrub limit may be applied to any of the
devices. Ensure that any limit found is not disregarded.
Since it's more intuitive, keep the lowest non-zero limit found, even
though at the present we don't actually use the exact value.
Pull-request: #733
Issue: #727
Fixes: 7e4a235df1 ("btrfs-progs: scrub status: print device speed limit in status if set")
Signed-off-by: Jonas Malaco <jonas@protocubo.io>
Signed-off-by: David Sterba <dsterba@suse.com>
On multi-device filesystems, a scrub limit may be applied to any of the
devices. Ensure that any limit found is not disregarded.
Since it's more intuitive, keep the lowest non-zero limit found, even
though at the present we don't actually use the exact value.
Pull-request: #733
Issue: #727
Fixes: 7e4a235df1 ("btrfs-progs: scrub status: print device speed limit in status if set")
Signed-off-by: Jonas Malaco <jonas@protocubo.io>
Signed-off-by: David Sterba <dsterba@suse.com>
On multi-device filesystems, scrub status should report "some limits
set" if at least one device has a scrub limit set.
However, with btrfs-progs 6.6.3, this was being reported regardless of
whether any limit actually being set:
# sudo btrfs scrub limit /more/butter
UUID: 989129d9-c96f-4d52-9d68-cbb6d9b2c499
Id Limit Path
-- ----- ---------
1 - /dev/sdc1
2 - /dev/sdd1
# sudo btrfs scrub status /more/butter/
UUID: 989129d9-c96f-4d52-9d68-cbb6d9b2c499
Scrub started: Mon Jan 15 02:00:30 2024
Status: running
Duration: 6:23:19
Time left: 0:49:08
ETA: Mon Jan 15 09:12:57 2024
Total to scrub: 9.83TiB
Bytes scrubbed: 8.72TiB (88.64%)
Rate: 397.47MiB/s (some device limits set)
Error summary: no errors found
Fix it by only setting `limit` to the special marker value 1 if at least
one actual limit is found.
Pull-request: #733
Issue: #727
Fixes: 7e4a235df1 ("btrfs-progs: scrub status: print device speed limit in status if set")
Signed-off-by: Jonas Malaco <jonas@protocubo.io>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When try to create a subvolume where the target path already exists, the
"btrfs" command doesn't return error code correctly.
# mkfs.btrfs -f $dev
# mount $dev $mnt
# touch $mnt/subv1
# btrfs subvolume create $mnt/subv1
ERROR: target path already exists: $mnt/subv1
# echo $?
0
[CAUSE]
The check on whether target exists is done by path_is_dir(), if it
returns 0 or 1, it means there is something in that path already.
But unfortunately commit 5aa959fb34 ("btrfs-progs: subvolume create:
accept multiple arguments") only changed the out label, which would
directly return @ret, not updating the return value correctly.
[FIX]
Make sure all error out branch has their @ret manually updated.
Fixes: 5aa959fb34 ("btrfs-progs: subvolume create: accept multiple arguments")
Issue: #730
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Scrubs which complete in under one second may carry a duration rounded
down to zero. This subsequently results in a bytes_per_sec value of
zero, which corresponds to the Rate metric output, causing intermittent
tests/btrfs/282 failures.
This change ensures that Rate reflects any sub-second bytes processed.
Time left and ETA metrics are also affected by this change, in that they
increase to account for (sub-second) bytes_per_sec.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>