The defaults for rotational devices are to enable DUP for metadata, this
does not yet work on zoned devices and fails with messages like:
Zoned: /dev/sda: host-managed device detected, setting zoned feature
ERROR: cannot use RAID/DUP profile in zoned mode
The RAID/DUP support will be implemented in the future and we don't want
to change the defaults to revert them back again. This makes it a bit
awkward for the user until this happens, so at least print a hint what
to do that single/single must be set manually.
Link: https://lore.kernel.org/linux-btrfs/20210706091922.38650-1-johannes.thumshirn@wdc.com/
Reported-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Make sure btrfs check can detect such problem. Right now we have no way
to fix it yet.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Linux VFS doesn't allow directory to have hard links, thus for btrfs
on-disk directory inode items, their nlinks should never go beyond 1.
Lowmem mode already has the check and will report it without problem.
Only original mode needs this update.
Reported-by: Pepperpoint <pepperpoint@mb.ardentcoding.com>
Link: https://lore.kernel.org/linux-btrfs/162648632340.7.1932907459648384384.10178178@mb.ardentcoding.com/
Reviewed-by: Su Yue <l@damenly.su>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Subvolume iteration has a window between when we get a root ref (with
BTRFS_IOC_TREE_SEARCH or BTRFS_IOC_GET_SUBVOL_ROOTREF) and when we look
up the path of the parent directory (with BTRFS_IOC_INO_LOOKUP{,_USER}).
If the subvolume is moved or deleted and its old parent directory is
deleted during that window, then BTRFS_IOC_INO_LOOKUP{,_USER} will fail
with ENOENT. The iteration will then fail with ENOENT as well.
We originally encountered this bug with an application that called
`btrfs subvolume show` (which iterates subvolumes to find snapshots) in
parallel with other threads creating and deleting subvolumes. It can be
reproduced almost instantly with the included test cases.
Subvolume iteration should be robust against concurrent modifications to
subvolumes. So, if a subvolume's parent directory no longer exists, just
skip the subvolume, as it must have been deleted or moved elsewhere.
Reviewed-by: Neal Gompa <ngompa13@gmail.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Run ci/ci-build-musl to verify build of current branch works in
environment with musl libc. Also works for a given branch name.
Signed-off-by: David Sterba <dsterba@suse.com>
There is some code that using NAME_MAX but it doesn't include header
that is defined. This patch adds a line that includes linux/limits.h
which defines NAME_MAX.
Issue: #386
Issue: #385
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently v1 space cache clearing will delete one cache inode just in
one transaction, and then start a new transaction to delete the next
inode.
This is far from efficient and can make the already slow v1 space cache
deleting even slower, as large fs has tons of cache inodes to delete.
This patch will speed up the process by batching up to 16 inode deletion
into one transaction.
A quick benchmark of deleting 702 v1 space cache inodes would look like
this:
Unpatched: 4.898s
Patched: 0.087s
Which is obviously a big win.
Reported-by: Joshua <joshua@mailmag.net>
Link: https://lore.kernel.org/linux-btrfs/0b4cf70fc883e28c97d893a3b2f81b11@mailmag.net/
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since v4.19, btrfs-progs has full write support to free space tree, the
out-of-date warning in btrfs(5) has already confused some end user.
Update the content to avoid further confusion.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Make the libbtrfsutil library and license more visible in the overview.
Drop link to travis-ci.org CI as it's not used anymore.
Signed-off-by: David Sterba <dsterba@suse.com>
In btrfs_sb_io(), blk_zone_report is used for getting information about
zones. But it is not freed if code goes in usual path. This patch frees
the variable just after it used.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
check_running_fs_exclop() can return 1 when exclop is changed to "none"
The ret is set by the return value of the select() operation. Checking
the exclusive op changes just the exclop variable while ret is still
set to 1.
Set ret = 0 if exclop is set to BTRFS_EXCL_NONE or BTRFS_EXCL_UNKNOWN.
Remove unnecessary continue statement at the end of the block.
The command appears to have executed, but does not. This was found when
balance which typically reports chunks relocated did not print anything
on screen.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The Used and Free should be together, while all the device information
is in the first section.
Example:
Overall:
Device size: 128.00GiB
Device allocated: 24.00GiB
Device unallocated: 104.00GiB
Device missing: 0.00B
Device zone unusable: 5.13MiB
Device zone size: 256.00MiB
Used: 213.33MiB
Free (estimated): 111.79GiB (min: 111.79GiB)
Free (statfs, df): 111.79GiB
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 25.58MiB (used: 16.00KiB)
Multiple profiles: no
Signed-off-by: David Sterba <dsterba@suse.com>
Read device size and print it in the overall overview in zoned mode. The
total unusable size is there so the zone size is complementing it. It's
read from the first device assuming that kernel mandates that all
devices have the same zone size.
Example:
Overall:
Device size: 128.00GiB
Device allocated: 24.00GiB
Device unallocated: 104.00GiB
Device missing: 0.00B
Used: 213.33MiB
Device zone unusable: 5.13MiB
Device zone size: 256.00MiB
Free (estimated): 111.79GiB (min: 111.79GiB)
Free (statfs, df): 111.79GiB
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 25.58MiB (used: 16.00KiB)
Multiple profiles: no
Signed-off-by: David Sterba <dsterba@suse.com>
Sysfs hides the zone size of a block device in the queue/chunk_sectors
file, so add a helper that will read it for us when given the short
device name (that can be found in FSID/devices).
Signed-off-by: David Sterba <dsterba@suse.com>
There are several directories in /sys/fs/btrfs/FSID that contain more
than one file/directory. Add a helper to open the directory so that the
file descriptor can be used for fdopendir.
Signed-off-by: David Sterba <dsterba@suse.com>
If there's CONFIG_CRYPTO_SHA256=y in /proc/config.gz and no line with
'sha256' in /proc/modules, then the mount will use the generic
implementation.
After 'modprobe sha256' there's 'sha256_ssse3' in /proc/modules and the
sysfs checksum file would show e.g. 'sha256-avx2'.
Signed-off-by: David Sterba <dsterba@suse.com>
Print number of stripes for striped profiles in device usage commands.
It helps to see profiles easily. The output is like below.
/dev/vdc, ID: 1
Device size: 1.00GiB
Device slack: 0.00B
Data,RAID0/2: 912.62MiB
Data,RAID0/3: 912.62MiB
Metadata,RAID1: 102.38MiB
System,RAID1: 8.00MiB
Unallocated: 1.00MiB
Multiple lines can appear in case a balance conversion process was
interrupted or if there's been a new device added and new data written
to the full stripe.
Issue: #372
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The user transaction ioctls have been removed in kernel 4.17 by commit
7a5a07a81062 ("btrfs: Remove userspace transaction ioctls"), the
definitions are not relevant and can be removed.
The numbers could be reused in the future, eg. when there are no
maintained LTS kernels older than 4.19.
Signed-off-by: David Sterba <dsterba@suse.com>
There's another loop protection during scan of directory items. This can
fire under invalid conditions, ie. when there's no real endless loop.
The layout of b-tree items could trigger that and has been observed in
practice. This prevents automated restoration as it requires user
attention.
The number of loops is 1024, unjustified and without explanation. Errors
during traversing the leaves are checked so most errors would be caught.
A real loop in the directory items would require some crafting and would
not happen on a normal filesystem.
Issue: #59
Issue: #164
Issue: #237
Signed-off-by: David Sterba <dsterba@suse.com>
There's some kind of looping protection during copying file extents,
mostly likely to avoid endless loops on severely damaged filesystems.
This has been bothering users and makes restoring hard to automate as
it requires user attention to press 'y' or 'a'. This has not been well
documented either.
The number of loops is 1024 which looks arbitrary and hard to justify.
This eg. means that a file with many fragments hits the interactive
question more than once.
There are other checks when iterating the leaves that would catch
corruptions or other errors, so the looping would happen in some rare
and rather artificial case when some kind of loop exists inside the
extent items. This is not easily possible if possible at all as the
items do not directly reference other.
In case there's some genuine error found that would require a looping
protection, we'll add it or extend the checks to identify the loop.
Issue: #59
Issue: #164
Issue: #237
Signed-off-by: David Sterba <dsterba@suse.com>
The test image is manually crafted with 1MiB offset in the device item
of devid 1.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is a report from the mailing list that one user got its filesystem
with device item bytes_used mismatch.
This problem leaves the device item with some ghost bytes_used, meaning
even if we delete all device extents of that device, the bytes_used
still won't be 0.
This itself is not a big deal, but when the user used up all its
unallocated space, write time tree-checker can be triggered and make the
fs RO, as the new device::bytes_used can be larger than
device::total_bytes.
Thus we need to fix the problem in btrfs-check to avoid above write-time
tree check warning.
This patch will add the ability to reset a device's bytes_used to both
original mode and lowmem mode.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add new options to dumps checksums in node headers and in the checksum
items:
$ btrfs inspect dump-tree --csum-headers image
root tree
leaf 471515136 items 19 free space 12186 generation 15 owner ROOT_TREE
leaf 471515136 flags 0x1(WRITTEN) backref revision 1 csum 0x756b2d54
fs uuid df0348df-5773-47dd-81e9-a18221461239
For nodes/leaves it's appended on the 2nd line of the header.
Checksum items are stored in leaves as EXTENT_CSUM key type, with offset
value as the logical offset starting. As the array would be hard to
parse or match, each offset value is printed with the checksum. For
crc32c it's 4 values on a line, for xxhash it's 2 and for the long
256bit checksums it's one checksum per line.
$ btrfs inspect dump-tree --csum-items image
leaf 5423104 items 1 free space 30 generation 6 owner CSUM_TREE
leaf 5423104 flags 0x1(WRITTEN) backref revision 1
fs uuid bd7c981e-16ff-4081-a734-3ef5d50cafc1
chunk uuid 13f4c76c-7845-4984-88ed-f01b52e05cf8
item 0 key (EXTENT_CSUM EXTENT_CSUM 22020096) itemoff 55 itemsize 16228
range start 22020096 end 38637568 length 16617472
[22020096] 0x8941f998 [22024192] 0x8941f998 [22028288] 0x8941f998 [22032384] 0x8941f998
[22036480] 0x8941f998 [22040576] 0x8941f998 [22044672] 0x8941f998 [22048768] 0x8941f998
...
$ btrfs inspect dump-tree --csum-items image
leaf 5718016 items 1 free space 7746 generation 6 owner CSUM_TREE
leaf 5718016 flags 0x1(WRITTEN) backref revision 1
fs uuid f453a5b4-8b4a-4fbf-90a2-2925e4fe2335
chunk uuid eb1da63b-248b-44c2-82da-71b2564bf50e
item 0 key (EXTENT_CSUM EXTENT_CSUM 52387840) itemoff 7771 itemsize 8512
range start 52387840 end 53477376 length 1089536
[52387840] 0x686ede9288c391e7e05026e56f2f91bfd879987a040ea98445dabc76f55b8e5f
[52391936] 0x686ede9288c391e7e05026e56f2f91bfd879987a040ea98445dabc76f55b8e5f
...
The options are not on by default, the header checksum is not important
for the structures. Data checksums can be quite big so that would make
the dump long and without any actual data to match against.
Signed-off-by: David Sterba <dsterba@suse.com>
Replace follow and traverse by one parameter that takes bits to affect
the behaviour. This allows to extend btrfs_print_tree output with more
modes from one place.
Signed-off-by: David Sterba <dsterba@suse.com>
There's a report that a system with 4.19 kernel fails boot because
device scan exits with error. This is because zoned support is compiled
in btrfs-progs but not in kernel.
To make new progs and old kernels work, do a fallback when the zoned
ioctl is not available, as if it were a non-zoned device. There is no
other option, but this is safe at least for the device scan that would
not error out. Any unaligned writes to a zoned device will fail as
expected.
Issue: #376
Signed-off-by: David Sterba <dsterba@suse.com>
Internally it's blake2b but for the user facing output or other command
line interfaces let's call it just BLAKE2.
Signed-off-by: David Sterba <dsterba@suse.com>
With explicit width the default alignment is to the right, using space
is a gnu extension. Fix the following warnings:
crypto/hash-speedtest.c: In function ‘main’:
crypto/hash-speedtest.c:152:15: warning: ' ' flag used with ‘%s’ gnu_printf format [-Wformat=]
152 | printf("% 12s: ", c->name);
| ^
crypto/hash-speedtest.c:172:21: warning: ' ' flag used with ‘%u’ gnu_printf format [-Wformat=]
172 | printf("%s: % 12llu, %s/i % 8llu",
| ^
crypto/hash-speedtest.c:172:34: warning: ' ' flag used with ‘%u’ gnu_printf format [-Wformat=]
172 | printf("%s: % 12llu, %s/i % 8llu",
| ^
Signed-off-by: David Sterba <dsterba@suse.com>