There's only one caller of btrfs_list_alloc_comparer_set so move it
there. Also move the definitions of BTRFS_LIST_* to the header so they
can be used by both btrfs-list and subvolume.c.
Signed-off-by: David Sterba <dsterba@suse.com>
The actual implementation of find-new functionality is outside of
subvolume.c, copy it where it's supposed to be. No reformatting or style
changes.
Signed-off-by: David Sterba <dsterba@suse.com>
The main functionality of subvolume listing is now in btrfs-list.c but
there are no other commands using the API so this will be merged. It's a
lot of code so split it to another file.
Signed-off-by: David Sterba <dsterba@suse.com>
The btrfs_list_* functions come with some overhead and for simple path
resolution we can use btrfs_subvolid_resolve.
Signed-off-by: David Sterba <dsterba@suse.com>
We don't need to include this besides btrfs-list.c itself and
subvolume.c that does use the btrfs_list_* API.
Signed-off-by: David Sterba <dsterba@suse.com>
Add a slightly more convenient way to identify the subvolumes with bad
combination of flags and received uuid.
Signed-off-by: David Sterba <dsterba@suse.com>
Implement safety check when a read-only subvolume is getting switched
to read-write and there's received_uuid set.
This prevents accidental breakage of incremental send use case but
allows user to do the rw change anyway but resets the received_uuid in
that case.
As this is implemented entirely in userspace, it's racy and using the
raw ioctl won't prevent it nor reset the received_uuid. A change in the
ioctl implementation might do that in the future.
Signed-off-by: David Sterba <dsterba@suse.com>
Add option support to force the value change. This allows to do safety
checks by default and warn user that something might break. Using the
force will override that and changing the property should do change
itself and additionally any other changes that could break some
use cases.
Signed-off-by: David Sterba <dsterba@suse.com>
There are some send/receive related data not printed in subvol show,
while they're exported by the ioctls. Print them for convenience:
$ btrfs subvol show test
test
Name: test
UUID: dc16dd1b-825f-3245-94a8-557672d6cf85
Parent UUID: -
Received UUID: -
Creation time: 2021-05-17 16:17:14 +0200
Subvolume ID: 19112
Generation: 7730702
Gen at creation: 7730701
Parent ID: 5
Top level ID: 5
Flags: -
Send transid: 0
Send time: 2021-05-17 16:17:14 +0200
Receive transid: 0
Receive time: -
Snapshot(s):
test-snap
Signed-off-by: David Sterba <dsterba@suse.com>
I had to go back to find what BTRFS_ARG_REG is, add a comment for that.
And, search_umounted_fs_uuids() is also to find the seed device, so bring
the related comment above it.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The commands initializing a new device (mkfs, device add) do discard by
default, while this is missing from replace start. For parity add the
options with same name and semantics.
Issue: #390
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a report that, btrfstune can even work while the fs has transid
mismatch problems.
$ btrfstune -f -u /dev/sdb1
Current fsid: b2b5ae8d-4c49-45f0-b42e-46fe7dcfcb07
New fsid: b2b5ae8d-4c49-45f0-b42e-46fe7dcfcb07
Set superblock flag CHANGING_FSID
Change fsid in extents
parent transid verify failed on 792854528 wanted 20103 found 20091
parent transid verify failed on 792854528 wanted 20103 found 20091
parent transid verify failed on 792854528 wanted 20103 found 20091
Ignoring transid failure
parent transid verify failed on 792870912 wanted 20103 found 20091
parent transid verify failed on 792870912 wanted 20103 found 20091
parent transid verify failed on 792870912 wanted 20103 found 20091
Ignoring transid failure
parent transid verify failed on 792887296 wanted 20103 found 20091
parent transid verify failed on 792887296 wanted 20103 found 20091
parent transid verify failed on 792887296 wanted 20103 found 20091
Ignoring transid failure
ERROR: child eb corrupted: parent bytenr=38010880 item=69 parent level=1 child level=1
ERROR: failed to change UUID of metadata: -5
ERROR: btrfstune failed
This leaves a corrupted fs even more corrupted, and due to the extra
CHANGING_FSID flag, btrfs check will not even try to run on it:
Opening filesystem to check...
ERROR: Filesystem UUID change in progress
ERROR: cannot open file system
[CAUSE]
Unlike kernel, btrfs-progs has a less strict check on transid mismatch.
In read_tree_block() we will fall back to use the tree block even its
transid mismatch if we can't find any better copy.
However not all commands in btrfs-progs needs this feature, only
btrfs-check (which may fix the problem) and btrfs-restore (it just tries
to ignore any problems) really utilize this feature.
[FIX]
Introduce a new open ctree flag, OPEN_CTREE_ALLOW_TRANSID_MISMATCH, to
be explicit about whether we really want to ignore transid error.
Currently only btrfs-check and btrfs-restore will utilize this new flag.
Also add btrfs-image to allow opening such fs with transid error.
Link: https://www.reddit.com/r/btrfs/comments/pivpqk/failure_during_btrfstune_u/
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The refactoring f3a132fa1b ("btrfs-progs: factor out compression type
name parsing to common utils") caused a bug with parsing option -c with
defrag:
# btrfs fi defrag -v -czstd file
ERROR: unknown compression type: zstd
# btrfs fi defrag -v -clzo file
ERROR: unknown compression type: lzo
# btrfs fi defrag -v -czlib file
ERROR: unknown compression type: zlib
Fix it by properly checking the value representing unknown compression
algorithm.
Issue: #403
Signed-off-by: David Sterba <dsterba@suse.com>
The function btrfs_list_get_path_rootid is exported to libbtrfs so it
needs to stay, but we can inline the implementation.
Signed-off-by: David Sterba <dsterba@suse.com>
The property definitions and handlers are for the command line
processing, so merge it with the main source file.
Signed-off-by: David Sterba <dsterba@suse.com>
The header contains the protocol definitions and is almost exactly the
same as the kernel version, move it to the proper directory.
Signed-off-by: David Sterba <dsterba@suse.com>
Move everything related to the output formatting and filtering out of
qgroup.h and leave only the structures used by the public API.
Signed-off-by: David Sterba <dsterba@suse.com>
The exported functions provided by qgroups have been changed, now remove
the prefix from the local helpers.
Signed-off-by: David Sterba <dsterba@suse.com>
After merging the files, many functions can be made static, leaving only
a few helpers that are used by subvolume.
Signed-off-by: David Sterba <dsterba@suse.com>
The contents of top level qgroups.c is only for command line output and
filtering, we already have cmds/qgroup.c for that so merge the files.
Signed-off-by: David Sterba <dsterba@suse.com>
There are declarations that are namely for the command line out put,
filters and formatting. Move it to cmds/.
Signed-off-by: David Sterba <dsterba@suse.com>
Many qgroup commands accept the level/id format and also a path to
subvolume, the qgroup id is derived from that. This does not make sense
for the create command as we can't create the 0/subvolid qgroup (thus
can't be derived from the path), only the higher levels.
Signed-off-by: David Sterba <dsterba@suse.com>
This helper can parse a qgroupid or a path, so rename it accordingly, so
a plain qgroupid parsing can be factored out as a standalone helper.
Signed-off-by: David Sterba <dsterba@suse.com>
Add the GPL v2 header to files where it was missing and is not from an
external source, update to the most recent version with the address.
Signed-off-by: David Sterba <dsterba@suse.com>
There are some duplicate parsers of the profile names, factor out the
one from balance to the common code.
Signed-off-by: David Sterba <dsterba@suse.com>
There are various parsing helpers scattered everywhere, unify them to
one file and start with helpers already in utils.c.
Signed-off-by: David Sterba <dsterba@suse.com>
We can use the raid table to match profile names, additionally make the
test case insensitive. The single profile is not represented as a bit
and must be set manually for now.
Signed-off-by: David Sterba <dsterba@suse.com>
The declarations do not correspond to any command descriptors as they
have been moved to other command groups.
Signed-off-by: David Sterba <dsterba@suse.com>
There is a recent report of ghost subvolumes where such subvolumes has
no ROOT_REF/BACKREF, and 0 root ref. But without an orphan item, thus
kernel won't queue them for cleanup.
Such ghost subvolumes are just here to take up space, and no way to
delete them except by btrfs check, which will try to fix the problem by
adding orphan item.
There is a kernel patch submitted to allow btrfs to detect such ghost
subvolumes and queue them for cleanup.
But btrfs-progs will not continue to call the ioctl if it can't find the
full subvolume path.
Thus this patch will loose the restriction by allowing btrfs-progs to
continue to call the ioctl even if it can't grab the subvolume path.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is some code that using NAME_MAX but it doesn't include header
that is defined. This patch adds a line that includes linux/limits.h
which defines NAME_MAX.
Issue: #386
Issue: #385
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The Used and Free should be together, while all the device information
is in the first section.
Example:
Overall:
Device size: 128.00GiB
Device allocated: 24.00GiB
Device unallocated: 104.00GiB
Device missing: 0.00B
Device zone unusable: 5.13MiB
Device zone size: 256.00MiB
Used: 213.33MiB
Free (estimated): 111.79GiB (min: 111.79GiB)
Free (statfs, df): 111.79GiB
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 25.58MiB (used: 16.00KiB)
Multiple profiles: no
Signed-off-by: David Sterba <dsterba@suse.com>
Read device size and print it in the overall overview in zoned mode. The
total unusable size is there so the zone size is complementing it. It's
read from the first device assuming that kernel mandates that all
devices have the same zone size.
Example:
Overall:
Device size: 128.00GiB
Device allocated: 24.00GiB
Device unallocated: 104.00GiB
Device missing: 0.00B
Used: 213.33MiB
Device zone unusable: 5.13MiB
Device zone size: 256.00MiB
Free (estimated): 111.79GiB (min: 111.79GiB)
Free (statfs, df): 111.79GiB
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 25.58MiB (used: 16.00KiB)
Multiple profiles: no
Signed-off-by: David Sterba <dsterba@suse.com>
Print number of stripes for striped profiles in device usage commands.
It helps to see profiles easily. The output is like below.
/dev/vdc, ID: 1
Device size: 1.00GiB
Device slack: 0.00B
Data,RAID0/2: 912.62MiB
Data,RAID0/3: 912.62MiB
Metadata,RAID1: 102.38MiB
System,RAID1: 8.00MiB
Unallocated: 1.00MiB
Multiple lines can appear in case a balance conversion process was
interrupted or if there's been a new device added and new data written
to the full stripe.
Issue: #372
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There's another loop protection during scan of directory items. This can
fire under invalid conditions, ie. when there's no real endless loop.
The layout of b-tree items could trigger that and has been observed in
practice. This prevents automated restoration as it requires user
attention.
The number of loops is 1024, unjustified and without explanation. Errors
during traversing the leaves are checked so most errors would be caught.
A real loop in the directory items would require some crafting and would
not happen on a normal filesystem.
Issue: #59
Issue: #164
Issue: #237
Signed-off-by: David Sterba <dsterba@suse.com>
There's some kind of looping protection during copying file extents,
mostly likely to avoid endless loops on severely damaged filesystems.
This has been bothering users and makes restoring hard to automate as
it requires user attention to press 'y' or 'a'. This has not been well
documented either.
The number of loops is 1024 which looks arbitrary and hard to justify.
This eg. means that a file with many fragments hits the interactive
question more than once.
There are other checks when iterating the leaves that would catch
corruptions or other errors, so the looping would happen in some rare
and rather artificial case when some kind of loop exists inside the
extent items. This is not easily possible if possible at all as the
items do not directly reference other.
In case there's some genuine error found that would require a looping
protection, we'll add it or extend the checks to identify the loop.
Issue: #59
Issue: #164
Issue: #237
Signed-off-by: David Sterba <dsterba@suse.com>
Add new options to dumps checksums in node headers and in the checksum
items:
$ btrfs inspect dump-tree --csum-headers image
root tree
leaf 471515136 items 19 free space 12186 generation 15 owner ROOT_TREE
leaf 471515136 flags 0x1(WRITTEN) backref revision 1 csum 0x756b2d54
fs uuid df0348df-5773-47dd-81e9-a18221461239
For nodes/leaves it's appended on the 2nd line of the header.
Checksum items are stored in leaves as EXTENT_CSUM key type, with offset
value as the logical offset starting. As the array would be hard to
parse or match, each offset value is printed with the checksum. For
crc32c it's 4 values on a line, for xxhash it's 2 and for the long
256bit checksums it's one checksum per line.
$ btrfs inspect dump-tree --csum-items image
leaf 5423104 items 1 free space 30 generation 6 owner CSUM_TREE
leaf 5423104 flags 0x1(WRITTEN) backref revision 1
fs uuid bd7c981e-16ff-4081-a734-3ef5d50cafc1
chunk uuid 13f4c76c-7845-4984-88ed-f01b52e05cf8
item 0 key (EXTENT_CSUM EXTENT_CSUM 22020096) itemoff 55 itemsize 16228
range start 22020096 end 38637568 length 16617472
[22020096] 0x8941f998 [22024192] 0x8941f998 [22028288] 0x8941f998 [22032384] 0x8941f998
[22036480] 0x8941f998 [22040576] 0x8941f998 [22044672] 0x8941f998 [22048768] 0x8941f998
...
$ btrfs inspect dump-tree --csum-items image
leaf 5718016 items 1 free space 7746 generation 6 owner CSUM_TREE
leaf 5718016 flags 0x1(WRITTEN) backref revision 1
fs uuid f453a5b4-8b4a-4fbf-90a2-2925e4fe2335
chunk uuid eb1da63b-248b-44c2-82da-71b2564bf50e
item 0 key (EXTENT_CSUM EXTENT_CSUM 52387840) itemoff 7771 itemsize 8512
range start 52387840 end 53477376 length 1089536
[52387840] 0x686ede9288c391e7e05026e56f2f91bfd879987a040ea98445dabc76f55b8e5f
[52391936] 0x686ede9288c391e7e05026e56f2f91bfd879987a040ea98445dabc76f55b8e5f
...
The options are not on by default, the header checksum is not important
for the structures. Data checksums can be quite big so that would make
the dump long and without any actual data to match against.
Signed-off-by: David Sterba <dsterba@suse.com>
Replace follow and traverse by one parameter that takes bits to affect
the behaviour. This allows to extend btrfs_print_tree output with more
modes from one place.
Signed-off-by: David Sterba <dsterba@suse.com>
Recognize special resize amount 'cancel' for resize operation. This
will request kernel to stop running any resize operation (most likely
shrinking resize). This needs support in kernel, otherwise this will
fail due to another exclusive operation running (though could be the
same one).
The command returns after kernel finishes any work that got interrupted,
but this should not take long in kernels 5.10+ that allow interruptible
relocation. The waiting inside kernel is interruptible so this command
(and the waiting stage) can be interrupted.
The resize operation could relocate block groups but the nominal
filesystem size will be restored when resize won't finish. It's
recommended to review the filesystem state.
Note: in kernels 5.10+ sending a fatal signal (TERM, KILL, Ctrl-C) to
the process running the resize will cancel it too.
Example:
$ btrfs fi resize -10G /mnt
...
$ btrfs fi resize cancel /mnt
Signed-off-by: David Sterba <dsterba@suse.com>
Recognize special name 'cancel' for device deletion, that will request
kernel to stop running device deletion. This needs support in kernel,
otherwise this will fail due to another exclusive operation running
(though could be the same one).
The command returns after kernel finishes any work that got interrupted,
but this should not take long in kernels 5.10+ that allow interruptible
relocation. The waiting inside kernel is interruptible so this command
(and the waiting stage) can be interrupted.
The device size is restored when deletion does not finish but it's
recommended to review the filesystem state.
Note: in kernels 5.10+ sending a fatal signal (TERM, KILL, Ctrl-C) to
the process running the device deletion will cancel it too.
Example:
$ btrfs device delete /dev/sdx /mnt
...
$ btrfs device delete cancel /mnt
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs inspect-internal --help shows incomplete sentence. As shown
below:
btrfs inspect-internal --help
<snip>
btrfs inspect-internal min-dev-size [options] <path>
Get the minimum size the device can be shrunk to. The
btrfs inspect-internal dump-tree [options] <device> [<device> ..]
<snip>
The short help string can be multi-line but must be in one string. This
patch fixes it.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Print the total zone_unusable size in the summary for 'fi usage' for a
filesystem in zoned mode. It's a sum of all the zone_unusable values
from 'fi df'. Per-device stats are not implemented and would need more
complicated calculations from raw data, kernel does not export that (but
it could).
As of 5.12, the zone_unusable is stored only in memory so we'd have to
map raw block device zones to the block groups and the live extents in
the associated block groups to get the exact numbers.
Example:
# btrfs fi usage /mnt
Overall:
Device size: 2.00GiB
Device allocated: 768.00MiB
Device unallocated: 1.25GiB
Device missing: 0.00B
Device zone unusable: 320.00KiB
Used: 128.00KiB
Free (estimated): 1.50GiB (min: 1.50GiB)
Free (statfs, df): 1.50GiB
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 3.25MiB (used: 32.00KiB)
Multiple profiles: no
Data,single: Size:256.00MiB, Used:0.00B (0.00%)
/dev/nullb0 256.00MiB
Metadata,single: Size:256.00MiB, Used:112.00KiB (0.04%)
/dev/nullb0 256.00MiB
System,single: Size:256.00MiB, Used:16.00KiB (0.01%)
/dev/nullb0 256.00MiB
Unallocated:
/dev/nullb0 1.25GiB
# btrfs fi df
Data, single: total=256.00MiB, used=0.00B, zone_unusable=0.00B
System, single: total=256.00MiB, used=16.00KiB, zone_unusable=160.00KiB
Metadata, single: total=256.00MiB, used=112.00KiB, zone_unusable=160.00KiB
GlobalReserve, single: total=3.25MiB, used=32.00KiB
Signed-off-by: David Sterba <dsterba@suse.com>
Getting the per bg type zone unusable space will be used in other size
reports like 'fi us', so export it to the device utils.
Signed-off-by: David Sterba <dsterba@suse.com>