We were setting the super block's used bytes to a static number.
However the number of blocks we have to write has the correct used size,
so just add up the total number of blocks we're allocating as we
determine their offsets. This value will be used later which is why I'm
calculating it this way instead of doing the math to set the bytes_super
specifically.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We use these block's in order to keep track of which blocks need to be
added to the extent tree and where their roots need to be written.
However we skip MKFS_SUPER_BLOCK for all of these helpers, and we don't
actually need to keep track of the specific block we allocated because
it is always BTRFS_SUPER_INFO_OFFSET. Remove this enum as we don't need
it.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Allow creating trees more flexible, eg. in no fixed order. To handle
this we want to rework the initial mkfs step to take an array of the
blocks we want to create and use the array to keep track of which blocks
we need to create. Use that for current format, make it ready for extent
tree v2.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Extend basic tests with the degenerate raid0 and raid10, coming in 5.15.
Mount of a freshly created filesystem works even on older kernels too.
Signed-off-by: David Sterba <dsterba@suse.com>
Kernel patch b2f78e88052bc0bee ("btrfs: allow degenerate raid0/raid10")
in
5.15 will allow mounting and converting to single device raid0 or two
device raid10. Let mkfs create such filesystem.
"The motivation is to allow to preserve the profile type as long as it
possible for some intermediate state (device removal, conversion), or
when there are disks of different size, with raid0 the otherwise
unusable space of the last device will be used too. Similarly for
raid10, though the two largest devices would need to be the same."
Signed-off-by: David Sterba <dsterba@suse.com>
Running convert-tests.sh Reported that the 019-ext4-copy-timestamps test
failed:
...
mount -o loop -t ext4 btrfs-progs/tests/test.img btrfs-progs/tests/mnt
====== RUN CHECK touch btrfs-progs/tests/mnt/file
====== RUN CHECK stat btrfs-progs/tests/mnt/file
File: 'btrfs-progs/tests/mnt/file'
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: 700h/1792d Inode: 13 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Context: unconfined_u:object_r:unlabeled_t:s0
Access: 2021-08-24 22:10:21.999209679 +0800
Modify: 2021-08-24 22:10:21.999209679 +0800
Change: 2021-08-24 22:10:21.999209679 +0800
...
btrfs-progs/btrfs-convert btrfs-progs/tests/test.img
...
====== RUN CHECK mount -t btrfs -o loop btrfs-progs/tests/test.img btrfs-progs/tests/mnt
====== RUN CHECK stat btrfs-progs/tests/mnt/file
File: 'btrfs-progs/tests/mnt/file'
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: 2ch/44d Inode: 267 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Context: unconfined_u:object_r:unlabeled_t:s0
Access: 2021-08-24 22:10:21.000000000 +0800
Modify: 2021-08-24 22:10:21.000000000 +0800
Change: 2021-08-24 22:10:21.000000000 +0800
...
atime on converted inode does not match
test failed for case 019-ext4-copy-timestamps
Obviously, the log says that btrfs-convert does not support nanoseconds.
I looked at the source code and found that only if ext2_fs.h defines
EXT4_EPOCH_MASK btrfs-convert to support nanoseconds. But in e2fsprogs,
EXT4_EPOCH_MASK was introduced in v1.43, but in some older versions,
such as v1.40, e2fsprogs actually supports nanoseconds. It seems that if
struct ext2_inode_large contains the i_atime_extra member, ext4 is
supports nanoseconds, so I updated the logic to determine whether the
current ext4 file system supports nanosecond precision. In addition, I
imported some definitions to encode and decode tv_nsec (copied from
e2fsprogs source code).
Author: Li Zhang <zhanglikernel@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When running "btrfs check --check-data-csum" on fs with corrupted data,
the error message almost makes no sense:
$ btrfs check --check-data-csum /dev/test/test
Opening filesystem to check...
Checking filesystem on /dev/test/test
UUID: c31afe0a-55bc-4e7d-aba0-9dfa9ddf8090
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking csums against data
mirror 1 bytenr 13631488 csum 19 expected csum 152 <<<
ERROR: errors found in csum tree
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 147456 bytes used, error(s) found
total csum bytes: 16
total tree bytes: 131072
total fs tree bytes: 32768
total extent tree bytes: 16384
btree space waste bytes: 124799
file data blocks allocated: 16384
referenced 16384
[CAUSE]
We're just outputting the first byte and in decimal, which is completely
different from what we did in kernel space, nor what we did for metadata
csum mismatch.
[FIX]
Use btrfs_format_csum() for btrfs-check to output csum.
Now the result looks much better:
[5/7] checking csums against data
mirror 1 bytenr 13631488 csum 0x13fec125 expected csum 0x98757625
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
- Change it void
The old one always return csum_size.
- Use snprintf()
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Function btrfs_format_csum() is a special helper only used in
btrfs-progs.
Move it to common/utils.[ch] other than leaving it in
kernel-shared/disk-io.c.
Since we're moving the code, also introduce a macro,
BTRFS_CSUM_STRING_LEN, to replace open-coded string length calculation.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is used to validate the detection and correction code in both fsck
modes for an invalid bytes_used value in the super block.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We do not detect problems with our bytes_used counter in the super
block. Thankfully the same method to fix block groups is used to re-set
the value in the super block, so simply add some extra code to validate
the bytes_used field and then piggy back on the repair code for block
groups.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
By enabling the lowmem checks properly I uncovered the case where test
fsck/007 will infinite loop at the detection stage. This is because
when checking the inode item we will just btrfs_next_item(), and because
we ignore check tree block failures at read time we don't get an -EIO
from btrfs_next_leaf. Generally what check usually does is validate the
leaves/nodes as we hit them, but in this case we're not doing that. Fix
this by checking the leaf if we move to the next one and if it fails
bail. This allows us to pass the fsck/007 test with lowmem.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The repair cycle in the main check will drop all of our cache and loop
through again to make sure everything is still good to go.
Unfortunately we record our unaligned extent records on a per-root list
so they can be retrieved when we're checking the fs roots. This isn't
straightforward to clean up, so instead simply check our current list of
unaligned extent records when we are adding a new one to make sure we're
not duplicating our efforts. This makes us able to pass fsck/001 with
my super bytes_used fix applied.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
By enabling the lowmem checks properly I uncovered the case where test
fsck/007 will infinite loop at the detection stage. This is because
when checking the inode item we will just btrfs_next_item(), and because
we ignore check tree block failures at read time we don't get an -EIO
from btrfs_next_leaf.
This occurs because we allow fsck to raw-read blocks even if they fail
basic sanity checks, because we want the opportunity to repair the
blocks. However this means corrupt blocks are sitting in cache marked
as uptodate. btrfs_search_slot() handles this by doing a check_block()
on every block we add to the path, so that anything that is doing a
search gets a proper -EIO.
btrfs_next_sibling_block() needs a similar check. With this fix we now
return -EIO on btrfs_next_leaf() properly and we no longer infinite loop
on fsck/007 with lowmem.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When I added the invalid super image I saw that the lowmem tests were
passing, despite not having the detection code yet. Turns out this is
because we weren't using a run command helper which does the proper
expansion and adds the --mode=lowmem option. Fix this to use the proper
handler, and now the lowmem test fails properly without my patch to add
this support to the lowmem mode.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We can already fix this problem with the block accounting code, we just
need to keep track of how much we should have used on the file system,
and then check it against the bytes_super. The repair just piggy backs
on the block group used repair.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Test 044 was failing with lowmem because it was not bubbling up the
error to the user. This is because we try to allow repair the
opportunity to clear the error, however if repair isn't set we simply do
not add the temporary error to the main error return variable. Fix this
by adding the tmp_err to err before moving on to the next item.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have a check that will return an error only if ret < 0, but we return
the lowmem specific errors which are all > 0. Fix this by simply
checking if (ret). This allows test 010 to pass with lowmem properly.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Long options are always preferred and in case there's a long list of
single letter options it's the best practice to keep the options sane.
Signed-off-by: David Sterba <dsterba@suse.com>
This image has a broken used field of a block group item to validate
fsck does the correct thing.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The lowmem mode validates the used field of the block group item, but
the normal mode does not. Fix this by keeping a running tally of what
we think the used value for the block group should be, and then if it
mismatches report an error and fix the problem if we have repair set.
We have to keep track of pending extents because we process leaves as we
see them, so it could be much later in the process that we find the
block group item to associate the extents with.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While doing the extent tree v2 stuff I noticed that fsck doesn't detect
an invalid ->used value on the block group item in the normal mode. To
build a test case for this I need the ability to corrupt block group
items. This allows us to corrupt the various fields of a block group.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a small device size misalignment between the super block device
size and the device extent size:
total_bytes 10737418240 <<<
bytes_used 15097856
dev_item.total_bytes 10737418240
dev_item.bytes_used 1094713344
item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
devid 1 total_bytes 1095761920 bytes_used 1094713344
^^^^^^^^^^
[CAUSE]
In fixup_device_size(), we only reset superblock device item size, which
will be overwritten in write_dev_supers() using btrfs_device::total_bytes.
And it doesn't touch btrfs_superblock::total_bytes either.
[FIX]
So fix the small mismatch by also resetting btrfs_device::total_bytes,
btrfs_device::bytes_used and btrfs_superblock::total_bytes.
Thankfully since commit 73dd4e3c87 ("btrfs-progs: image: Don't modify
the chunk and device tree if the source dump is single device") single
device dump won't have such problem, but it's still worth for
multi-device dump.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
With recent change to enlarge max_pending_size to 256M for data dump,
the decompress code requires quite a lot of memory, up to 256M * 4.
The reason is we're using wrapped uncompress() function call, which
needs the buffer to be large enough to contain the decompressed data.
This patch will re-work the decompress work to use inflate() which can
resume it decompression so that we can use a much smaller buffer size.
Use 512K as buffer size.
Now the memory consumption for restore is reduced to
cluster data size + 512K * nr_running_threads
Instead of the original one:
cluster data size + 1G * nr_running_threads
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This new experimental data dump feature will dump the whole image, not
only the existing tree blocks but also all its data extents(*).
This feature will rely on the new dump format (_DUmP_v1), as it needs
extra large extent size limit, and older btrfs-image dump can't handle
such large item/cluster size.
Since we're dumping all extents including data extents, for the restored
image there is no need to use any extra super block flags to inform
kernel.
Kernel should just treat the restored image as any ordinary btrfs.
This new feature will be hidden behind the experimental features, that's
to say, if --enable-experimental is not enabled, although we still have
the option, it will not do anything but output an error message.
*: The data extents will be dumped as is, that's to say, even for
preallocated extent, its (meaningless) data will be read out and
dumpped.
This behavior will cause extra space usage for the image, but we can
skip all the complex partially shared preallocated extent check.
Issue: #394
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The original dump format only contains a magic member to verify the
format, this means if we want to introduce new on-disk format or change
certain size limit, we can only introduce new magic as version.
Introduce the framework to allow multiple magic numbers to co-exist for
further extensions.
Introduce the following members for each dump version.
- max_pending_size
The threshold size of a cluster. It's not a hard limit but a soft
one. One cluster can go larger than max_pending_size for one item, but
next item would go to the next cluster.
- magic_cpu
The magic number in CPU byte order.
- extra_sb_flags
If the super block of this restore needs extra super block flags like
BTRFS_SUPER_FLAG_METADUMP_V2.
For incoming data dump feature, we don't need any extra super block
flags.
This change also implies that all image dumps will use the same magic
for all clusters. No mixing is allowed, as we will use the first cluster
to determine the dump version.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add --enable-experimental configure option that allows to merge unstable
features or partially implemented features. This is supposed to help
features that need time to settle, tweak output or formatting and would
require constant rebases and would have limited exposure to users that
could provide feedback.
If this is enabled, the following may change without notice:
- the whole feature may disappear in the future
- new command names could change or relocate to other subcommands
- parameter names
- output formatting
- json output
Signed-off-by: David Sterba <dsterba@suse.com>
For the incoming extra page size support for subpage (sectorsize <
PAGE_SIZE) cases, the support for metadata will be a critical point.
Currently for subpage support, we require 64K page size, so that no
matter whatever the nodesize is, it will be contained inside one page.
And we will reject any tree block which crosses page boundary.
But for other page size, especially 16K page size, we must support
nodesize differently.
For nodesize < PAGE_SIZE, we will have the same requirement (tree blocks
can't cross page boundary).
While for nodesize >= PAGE_SIZE, we will require the tree blocks to be
page aligned.
To support such feature, we will make btrfs-check to reports more
subpage related warnings for metadata.
This patch will report any tree block which is not nodesize aligned as a
warning.
Existing mkfs/convert has already make sure all new tree blocks are
nodesize aligned, this is just for older converted filesystems.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For fsck tests, we check the subpage warnings for each type 1 test, but
such type 1 tests are mostly read-only tests, and one of the test will
trigger new subpage related warnings (fsck/018).
For subpage related warnings, what we really care are write operations,
including mkfs, btrfs-convert and repair, not those read-only tests.
So skip the subpage warning check for fsck type 1 tests to prevent false
alert of later more strict subpage warnings.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are two types of test cases:
- Type 1 (without test.sh)
- Type 2 (test.sh, mostly will override check_image())
For Type 2 tests, we check subpage related warnings of btrfs-check, but
didn't check it for Type 1 test cases.
In fact, Type 1 test cases are more important, as they involve repair,
which can generate new tree blocks, and we want to make sure such new
tree blocks won't cause subpage related warnings.
This patch will add the extra check for Type 1 test cases.
And it will make sure the subpage related warnings are really from this
test case, to prevent false alerts.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The declarations do not correspond to any command descriptors as they
have been moved to other command groups.
Signed-off-by: David Sterba <dsterba@suse.com>
Add new option --uuid to convert with the following modes:
- 'copy' -- copy the UUID from the source filesystem
- 'new' -- (default) generate new UUID
- UUID -- a valid UUID that will be set on btrfs
Based on patch from Florian
https://lore.kernel.org/linux-btrfs/1357486331-4615-2-git-send-email-falbrechtskirchinger@gmail.com/
and ported to contemporary codebase.
Issue: #391
Signed-off-by: David Sterba <dsterba@suse.com>
There is a recent report of ghost subvolumes where such subvolumes has
no ROOT_REF/BACKREF, and 0 root ref. But without an orphan item, thus
kernel won't queue them for cleanup.
Such ghost subvolumes are just here to take up space, and no way to
delete them except by btrfs check, which will try to fix the problem by
adding orphan item.
There is a kernel patch submitted to allow btrfs to detect such ghost
subvolumes and queue them for cleanup.
But btrfs-progs will not continue to call the ioctl if it can't find the
full subvolume path.
Thus this patch will loose the restriction by allowing btrfs-progs to
continue to call the ioctl even if it can't grab the subvolume path.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[HICCUP]
There is a bug report that mkfs.btrfs -R free-space-tree still makes
kernel to try to cleanup the v1 space cache:
# mkfs.btrfs -R free-space-tree -f /dev/test/scratch1
# mount /dev/test/scratch1 /mnt/btrfs
# dmesg | grep cleaning
BTRFS info (device dm-6): cleaning free space cache v1
[CAUSE]
By default, mkfs.btrfs will set super cache generation to (u64)-1, which
will inform kernel that the v1 space cache is invalid, needs to
regenerate it.
But for free space cache tree, kernel will set super cache generation to
0, to indicate v1 space cache is not in use.
This means, even we enabled free space tree with all the RO compatible
bits and new tree, as long as super cache generation is not 0, kernel
still consider the fs has some invalid v1 space cache, and will try to
remove them.
[FIX]
This is not a big deal, but to make the "-R free-space-tree" to really
work as kernel, we also need to set super cache generation to 0.
Reported-by: Chris Murphy <lists@colorremedies.com>
Link: https://lore.kernel.org/linux-btrfs/CAJCQCtSvgzyOnxtrqQZZirSycEHp+g0eDH5c+Kw9mW=PgxuXmw@mail.gmail.com/
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The defaults for rotational devices are to enable DUP for metadata, this
does not yet work on zoned devices and fails with messages like:
Zoned: /dev/sda: host-managed device detected, setting zoned feature
ERROR: cannot use RAID/DUP profile in zoned mode
The RAID/DUP support will be implemented in the future and we don't want
to change the defaults to revert them back again. This makes it a bit
awkward for the user until this happens, so at least print a hint what
to do that single/single must be set manually.
Link: https://lore.kernel.org/linux-btrfs/20210706091922.38650-1-johannes.thumshirn@wdc.com/
Reported-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Make sure btrfs check can detect such problem. Right now we have no way
to fix it yet.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Linux VFS doesn't allow directory to have hard links, thus for btrfs
on-disk directory inode items, their nlinks should never go beyond 1.
Lowmem mode already has the check and will report it without problem.
Only original mode needs this update.
Reported-by: Pepperpoint <pepperpoint@mb.ardentcoding.com>
Link: https://lore.kernel.org/linux-btrfs/162648632340.7.1932907459648384384.10178178@mb.ardentcoding.com/
Reviewed-by: Su Yue <l@damenly.su>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Subvolume iteration has a window between when we get a root ref (with
BTRFS_IOC_TREE_SEARCH or BTRFS_IOC_GET_SUBVOL_ROOTREF) and when we look
up the path of the parent directory (with BTRFS_IOC_INO_LOOKUP{,_USER}).
If the subvolume is moved or deleted and its old parent directory is
deleted during that window, then BTRFS_IOC_INO_LOOKUP{,_USER} will fail
with ENOENT. The iteration will then fail with ENOENT as well.
We originally encountered this bug with an application that called
`btrfs subvolume show` (which iterates subvolumes to find snapshots) in
parallel with other threads creating and deleting subvolumes. It can be
reproduced almost instantly with the included test cases.
Subvolume iteration should be robust against concurrent modifications to
subvolumes. So, if a subvolume's parent directory no longer exists, just
skip the subvolume, as it must have been deleted or moved elsewhere.
Reviewed-by: Neal Gompa <ngompa13@gmail.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Run ci/ci-build-musl to verify build of current branch works in
environment with musl libc. Also works for a given branch name.
Signed-off-by: David Sterba <dsterba@suse.com>
There is some code that using NAME_MAX but it doesn't include header
that is defined. This patch adds a line that includes linux/limits.h
which defines NAME_MAX.
Issue: #386
Issue: #385
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently v1 space cache clearing will delete one cache inode just in
one transaction, and then start a new transaction to delete the next
inode.
This is far from efficient and can make the already slow v1 space cache
deleting even slower, as large fs has tons of cache inodes to delete.
This patch will speed up the process by batching up to 16 inode deletion
into one transaction.
A quick benchmark of deleting 702 v1 space cache inodes would look like
this:
Unpatched: 4.898s
Patched: 0.087s
Which is obviously a big win.
Reported-by: Joshua <joshua@mailmag.net>
Link: https://lore.kernel.org/linux-btrfs/0b4cf70fc883e28c97d893a3b2f81b11@mailmag.net/
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since v4.19, btrfs-progs has full write support to free space tree, the
out-of-date warning in btrfs(5) has already confused some end user.
Update the content to avoid further confusion.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>