The BUG_ON() condition in write_data_to_disk() is no longer correct.
Now write_raid56_with_parity() will return the bytes written of last
stripe.
Thus a success writeback can trigger the BUG_ON(ret).
Fix the condition to (ret < 0).
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
Shinichiro reported that "mkfs.btrfs -m DUP" is doing repeated write
into the device.
For non-zoned device this is not a big deal, but for zoned device this
is critical, as zoned device doesn't support overwrite at all.
[CAUSE]
The problem is related to write_and_map_eb() call, since commit
2a93728391 ("btrfs-progs: use write_data_to_disk() to replace
write_extent_to_disk()"), we call write_data_to_disk() for metadata
write back.
But the problem is, write_data_to_disk() will call btrfs_map_block()
with rw = WRITE.
By that btrfs_map_block() will always return all stripes, while in
write_data_to_disk() we also iterate through each mirror of the range.
This results above repeated writeback.
[FIX]
Fix this problem by completely remove @mirror argument
from write_data_to_disk().
With extra comments to explicitly show that function will write to
all mirrors.
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: 2a93728391 ("btrfs-progs: use write_data_to_disk() to replace write_extent_to_disk()")
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The helper `check_min_kernel_version` is duplicated and can be removed.
Signed-off-by: Chung-Chiang Cheng <cccheng@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Some older compilers do not support overflow builtins introduced in
5ad2aacd24 ("btrfs-progs: kernel-lib: sync include/overflow.h"). Add
stubs to make it compile. This fixes CI build of Centos 7.
Signed-off-by: David Sterba <dsterba@suse.com>
Use the autoconf archive macros for gcc builtin detection and add the
overflow from recently added from kernel.
New:
__builtin_add_overflow
__builtin_sub_overflow
__builtin_mul_overflow
Signed-off-by: David Sterba <dsterba@suse.com>
The help text is out of sync with many options, lacking the long
options, required arguments or mistakenly requiring arguments when the
value is read from another one.
Signed-off-by: David Sterba <dsterba@suse.com>
This file includes linux/fs.h which includes linux/mount.h and with
glibc 2.36 linux/mount.h and glibc mount.h are not compatible [1]
therefore try to avoid including both headers
[1] https://sourceware.org/glibc/wiki/Release/2.36
Signed-off-by: Khem Raj <raj.khem@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Attempting to dump a bad btrfs superblock returns successful exit status
zero. According to the manual page non-zero should be returned on
failure. Fix this.
$ btrfs inspect-internal dump-super /dev/zero
superblock: bytenr=65536, device=/dev/zero
---------------------------------------------------------
ERROR: bad magic on superblock on /dev/zero at 65536
$ echo $?
0
Signed-off-by: Mike Fleetwood <mike.fleetwood@googlemail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This compat flag is missing, but is being checked by mount, and could
well be present legitimately.
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
To corrupt holes/prealloc/inline extents, we need to mess with
extent data items. This patch makes it possible to modify
disk_bytenr with a specific value (useful for hole corruptions)
and to modify the type field (useful for prealloc corruptions)
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs-corrupt-block already has a mix of generic and specific corruption
options, but currently lacks the capacity for totally arbitrary
corruption in item data.
There is already a flag for corruption size (bytes/-b), so add a flag
for an offset and a value to memset the item with. Exercise the new
flags with a new variant for -I (item) corruption. Look up the item as
before, but instead of corrupting a field in the item struct, corrupt an
offset/size in the item data.
The motivating example for this is that in testing fsverity with btrfs,
we need to corrupt the generated Merkle tree--metadata item data which
is an opaque blob to btrfs.
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
The command group of 'replace' belongs to device and could be seen as
confusing. At minimum we can add an alias so now there's equivalent:
# btrfs replace start
# btrfs device replace start
Both commands will exist for backward compatibility, tough we might
revisit which one is the primary one.
Issue: #484
Signed-off-by: David Sterba <dsterba@suse.com>
This is in preparation for introducing tabular output for device stats. Simply
factor out string-specific output lines in a separate function.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Similar to kernel check_leaf(), calling btrfs_item_end_nr() may get a
reasonable value even an item has invalid offset/size due to u32
overflow.
Fix it by use u64 variable to store item data end in btrfs_check_leaf()
to avoid u32 overflow.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215299
Reported-by: Wenqing Liu <wenqingliu0120@gmail.com>
Signed-off-by: Su Yue <l@damenly.su>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a test to ensure that 'btrfs fi show' on a mounted filesystem, which
has a missing device will explicitly print which device is missing.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently when a device is missing for a mounted filesystem the output
that is produced is unhelpful:
Label: none uuid: 139ef309-021f-4b98-a3a8-ce230a83b1e2
Total devices 2 FS bytes used 128.00KiB
devid 1 size 5.00GiB used 1.26GiB path /dev/loop0
*** Some devices missing
While the context which prints this is perfectly capable of showing
which device exactly is missing, like so:
Label: none uuid: 4a85a40b-9b79-4bde-8e52-c65a550a176b
Total devices 2 FS bytes used 128.00KiB
devid 1 size 5.00GiB used 1.26GiB path /dev/loop0
devid 2 size 0 used 0 path /dev/loop1 MISSING
This is a lot more usable output as it presents the user with the id
of the missing device and its path.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is the same on-disk format update synchronized from the kernel
code.
Unlike kernel, there are two callers reading this member:
- btrfs inspect dump-super
It's just printing the value, add a notice about deprecation.
- btrfs-find-root
In that case, since we always got 0, the root search for log root
should never find a perfect match.
Use btrfs_super_geneartion() + 1 to provide a better result.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
On large (blockcount > 32bit) filesystems reading directly
super_block->s_blocks_count is not sufficient as the block count is held
in 2 separate 32 bit variables. Instead always use the provided
ext2fs_blocks_count to read the value. This can result in assertion
failure, when the block count is only held in the high 32 bits, in this
case s_block_counts would be zero, which would result in
btrfs_convert_context::block_count/total_bytes to also be 0 and hit an
assertion failure:
convert/main.c:1162: do_convert: Assertion `cctx.total_bytes != 0` failed, value 0
btrfs-convert(+0xffb0)[0x557defdabfb0]
btrfs-convert(main+0x6c5)[0x557defdaa125]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea)[0x7f66e1f8bd0a]
btrfs-convert(_start+0x2a)[0x557defdab52a]
Aborted
What's worse it can also result in btrfs-convert mistakenly thinking
that a filesystem is smaller than it actually is (ignoring the top 32 bits).
Link: https://lore.kernel.org/linux-btrfs/023b5ca9-0610-231b-fc4e-a72fe1377a5a@jansson.tech/
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The initial proposal for file attributes was built on simply doing
SETFLAGS but this builds on an old and non-extensible interface that has
no direct mapping for all inode flags. There's a unified interface
fileattr that covers file attributes and xflags, it should be possible
to add new bits.
On the protocol level the value is copied as-is in the original inode
but this does not provide enough information how to apply the bits on
the receiving side. Eg. IMMUTABLE flag prevents any changes to the file
and has to be handled manually.
The receiving side does not apply the bits yet, only parses it from the
stream.
Signed-off-by: David Sterba <dsterba@suse.com>
Add constant for initial value to avoid unexpected clashes with user
defined getopt values and shift the common size getopt values.
Signed-off-by: David Sterba <dsterba@suse.com>
Now that LZO and ZSTD are optional for not just restore, rename the
build variables to a more generic name and update configure summary.
Signed-off-by: David Sterba <dsterba@suse.com>
There are build-time options for LZO and ZSTD support, the stream v2+
supports compression. The help text lists what has been compiled in,
similar to what 'restore' does, with a similar limitation that a stream
with compressed data cannot be processed if any of the extents is
compressed.
Signed-off-by: David Sterba <dsterba@suse.com>
Copy contents from https://btrfs.wiki.kernel.org/index.php/Changelog#By_feature
The formatting is done by a definition and list, instead of a table.
Unfortunatelly RST does not wrap long text in table cells so the width
exceeds visible area.
Signed-off-by: David Sterba <dsterba@suse.com>
Fix the following warnings:
[SPHINX] man
../CHANGES:37: ERROR: Unexpected indentation.
../CHANGES:1183: WARNING: Block quote ends without a blank line; unexpected unindent.
./Documentation/DocConventions.rst:31: ERROR: Unexpected indentation.
./Documentation/Source-repositories.rst:2: WARNING: Duplicate explicit target name: "web access".
./Documentation/DocConventions.rst: WARNING: document isn't included in any toctree
The free format of CHANGES sometimes does not align with RST so fix it
so it's visually similar in both formats. Remove RST references to 'web
access', URLs are parsed and rendered clickable and we don't have full
reference list yet. Doc conventions have been updated to RST but not
finalized so put it to TODO section.
Signed-off-by: David Sterba <dsterba@suse.com>
Adapt the existing send/receive tests by passing '-o compress-force' to
the mount commands in a new test. After writing a few files in the
various compression formats, send/receive them with and without
--force-decompress to test both the encoded_write path and the fallback
to decode+write.
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
First, add a --proto option to allow specifying the desired send
protocol version. It defaults to one, the original version. In a couple
of releases once people are aware that protocol revisions are happening,
we can change it to default to zero, which means the latest version
supported by the kernel. This is based on Dave Sterba's patch.
Also add a --compressed-data flag to instruct the kernel to use
encoded_write commands for compressed extents. This requires an explicit
opt in separate from the protocol version because:
1. The user may not want compression on the receiving side, or may want
a different compression algorithm/level on the receiving side.
2. It has a soft requirement for kernel support on the receiving side
(btrfs-progs can fall back to decompressing and writing if the kernel
doesn't support BTRFS_IOC_ENCODED_WRITE, but the user may not be
prepared to pay that CPU cost). Going forward, since it's easier to
update progs than the kernel, I think we'll want to make new send
features that require kernel support opt-in, whereas anything that
only requires a progs update can happen automatically.
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In send stream v2, send can emit a command for setting inode flags via
the setflags ioctl. Pass the flags attribute through to the ioctl call
in receive.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Send stream v2 can emit fallocate commands, so receive must support them
as well. The implementation simply passes along the arguments to the
syscall. Note that mode is encoded as a u32 in send stream but fallocate
takes an int, so there is a unsigned->signed conversion there.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
An encoded_write can fail if the file system it is being applied to does
not support encoded writes or if it can't find enough contiguous space
to accommodate the encoded extent. In those cases, we can likely still
process an encoded_write by explicitly decoding the data and doing a
normal write.
Add the necessary fallback path for decoding data compressed with zlib,
lzo, or zstd. zlib and zstd have reusable decoding context data
structures which we cache in the receive context so that we don't have
to recreate them on every encoded_write.
Finally, add a command line flag for force-decompress which causes
receive to always use the fallback path rather than first attempting the
encoded write.
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a new btrfs_send_op and support for both dumping and proper receive
processing which does actual encoded writes.
Encoded writes are only allowed on a file descriptor opened with an
extra flag that allows encoded writes, so we also add support for this
flag when opening or reusing a file for writing.
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Update our copy of send.h from the kernel. This adds the new commands
and attributes for v2 as well as explicit enum numbering.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The new format privileges the BTRFS_SEND_A_DATA attribute by
guaranteeing it will always be the last attribute in any command that
needs it, and by implicitly encoding the data length as the difference
between the total command length in the command header and the sizes of
the rest of the attributes (and of course the tlv_type identifying the
DATA attribute). To parse the new stream, we must read the tlv_type and
if it is not DATA, we proceed normally, but if it is DATA, we don't
parse a tlv_len but simply compute the length.
In addition, we add some bounds checking when parsing each chunk of
data, as well as for the tlv_len itself.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
In send stream v2, write commands can now be an arbitrary size. For that
reason, we can no longer allocate a fixed array in sctx for read_cmd.
Instead, read_cmd dynamically allocates sctx->read_buf. To avoid
needless reallocations, we reuse read_buf between read_cmd calls by also
keeping track of the size of the allocated buffer in sctx->read_buf_sz.
We do the first allocation of the old default size at the start of
processing the stream, and we only reallocate if we encounter a command
that needs a larger buffer.
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
An encoded extent can be up to 128K in length, which exceeds the largest
value expressible by the current send stream format's 16 bit tlv_len
field. Since encoded writes cannot be split into multiple writes by
btrfs send, the send stream format must change to accommodate encoded
writes.
Supporting this changed format requires retooling how we store the
commands we have processed. We currently store pointers to the struct
btrfs_tlv_headers in the command buffer. This is not sufficient to
represent the new BTRFS_SEND_A_DATA format. Instead, parse the attribute
headers and store them in a new struct btrfs_send_attribute which has a
32bit length field. This is transparent to users of the various TLV_GET
macros.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
The header extent-cache.h is internal and provides only structures defined
in ctree.h, we don't need anything else besides the structures.
Signed-off-by: David Sterba <dsterba@suse.com>
The header extent_io.h is internal and provides only structures defined
in ctree.h, we don't need anything else besides the structures and
declarations used for inline functions in other headers.
Signed-off-by: David Sterba <dsterba@suse.com>