6f4a51886b
When attempting an encoded write, if it fails for some specific reason
like -EINVAL (when an offset is not sector size aligned) or -ENOSPC, we
then fall back to decompressing the data and writing it using regular
buffered IO. This logic, however, is not correct. One of the reasons is
that it assumes the unencoded offset is smaller than the unencoded file
length and that the two can be compared, but one is an offset and the
other is a length, not an end offset, so comparing them does not give
correct results. This bad logic will often result in not copying all
data, or even no data at all, causing silent data loss. This is easily
seen with the following reproducer:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdj
MNT=/mnt/sdj
umount $DEV &> /dev/null
mkfs.btrfs -f $DEV > /dev/null
mount -o compress $DEV $MNT
# File foo has a size of 33K, not aligned to the sector size.
xfs_io -f -c "pwrite -S 0xab 0 33K" $MNT/foo
xfs_io -f -c "pwrite -S 0xcd 0 64K" $MNT/bar
# Now clone the first 32K of file bar into foo at offset 0.
xfs_io -c "reflink $MNT/bar 0 0 32K" $MNT/foo
# Snapshot the default subvolume and create a full send stream (v2).
btrfs subvolume snapshot -r $MNT $MNT/snap
btrfs send --compressed-data -f /tmp/test.send $MNT/snap
echo -e "\nFile bar in the original filesystem:"
od -A d -t x1 $MNT/snap/bar
umount $MNT
mkfs.btrfs -f $DEV > /dev/null
mount $DEV $MNT
echo -e "\nReceiving stream in a new filesystem..."
btrfs receive -f /tmp/test.send $MNT
echo -e "\nFile bar in the new filesystem:"
od -A d -t x1 $MNT/snap/bar
umount $MNT
Running the test without this patch:
$ ./test.sh
(...)
File bar in the original filesystem:
0000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd
*
0065536
Receiving stream in a new filesystem...
At subvol snap
File bar in the new filesystem:
0000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd
*
0033792
We end up with file bar having less data, and a smaller size, than in the
original filesystem.
This happens because when processing file bar, send issues the following
commands:
clone bar - source=foo source offset=0 offset=0 length=32768
write bar - offset=32768 length=1024
encoded_write bar - offset=33792, len=4096, unencoded_offset=33792, unencoded_file_len=31744, unencoded_len=65536, compression=1, encryption=0
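As a worked example of how the fields of that encoded_write command relate to each other, the C sketch below recomputes them from the extent layout. It assumes, as the command suggests, that bar's compressed extent decompresses to 64K of data starting at file offset 0, and that the encoded write covers the file range [33K, 64K). The struct and the compute_fields() helper are hypothetical illustrations, not btrfs-progs code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of the fields carried by an encoded_write command. */
struct encoded_write_fields {
	uint64_t offset;             /* file offset where the data lands */
	uint64_t unencoded_offset;   /* where file data starts inside the decompressed extent */
	uint64_t unencoded_file_len; /* bytes of decompressed data that belong to the file */
	uint64_t unencoded_len;      /* total size of the decompressed extent */
};

/*
 * Compute the fields for the file range [range_start, range_end), assuming
 * the extent's decompressed data begins at file offset extent_start and is
 * extent_len bytes long (hypothetical helper, for illustration only).
 */
static struct encoded_write_fields
compute_fields(uint64_t extent_start, uint64_t extent_len,
	       uint64_t range_start, uint64_t range_end)
{
	struct encoded_write_fields f;

	assert(extent_start <= range_start && range_start <= range_end);
	assert(range_end <= extent_start + extent_len);

	f.offset = range_start;
	f.unencoded_offset = range_start - extent_start; /* an offset */
	f.unencoded_file_len = range_end - range_start;  /* a length */
	f.unencoded_len = extent_len;
	return f;
}
```

Calling compute_fields(0, 65536, 33792, 65536) yields the values in the stream: an unencoded offset of 33792 and an unencoded file length of 31744. This makes it visible that the first is a position and the second is a size, so neither bounds the other.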
The first 32K are cloned from file foo, as that file range is shared
between the files.
Then there's a regular write operation for the file range [32K, 33K),
since file foo has different data from bar for that file range.
Finally, for the remainder of file bar, the send side issues an encoded
write, since the extent is compressed in the source filesystem, covering
the remaining 31K of data starting at file offset 33792 (33K). The
receiver tries the encoded write, but it fails with -EINVAL since the
offset 33K is not sector size aligned, so it falls back to decompressing
the data and writing it using regular buffered writes. However,
decompress_and_write() then does no writes at all: 'pos' is initialized
to 33K (unencoded_offset) and unencoded_file_len is 31K, so the while
loop has no iterations.
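The C sketch below models the loop described above, with the write reduced to a byte counter. It is a simplified, hypothetical reconstruction based only on this description: the real decompress_and_write() also decompresses the buffer and issues the actual writes, and the chunk parameter here merely simulates short writes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model of the buggy fallback loop: 'pos' starts at the
 * unencoded offset (a position inside the decompressed buffer) but is
 * compared against unencoded_file_len (a length). The loop therefore runs
 * only while pos is below that length, whatever the real end should be.
 * Returns the number of bytes that would be written.
 */
static uint64_t buggy_bytes_written(uint64_t unencoded_offset,
				    uint64_t unencoded_file_len,
				    uint64_t chunk)
{
	uint64_t pos = unencoded_offset;
	uint64_t written = 0;

	assert(chunk > 0);
	while (pos < unencoded_file_len) {
		uint64_t len = unencoded_file_len - pos;

		if (len > chunk)
			len = chunk;
		written += len; /* stands in for the write call */
		pos += len;
	}
	return written;
}
```

With the values from the stream, buggy_bytes_written(33792, 31744, 4096) returns 0, which matches the empty tail of bar. With a smaller, non-zero unencoded offset such as 1024, the model copies 30720 of the 31744 bytes. This is the "not copying all data" case.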
Another case where we can fall back to decompression plus regular
buffered writes is when the destination filesystem has a sector size
larger than the sector size of the source filesystem (for example when
the source filesystem is on x86_64 with a 4K sector size and the
destination filesystem is on PowerPC with a 64K sector size). In that
scenario encoded write attempts will fail with -EINVAL due to offsets not
being aligned with the sector size of the destination filesystem, and
receive will attempt the fallback of decompressing the buffer and writing
the decompressed data using regular buffered IO.
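The alignment condition behind these -EINVAL failures can be illustrated with a small C helper. This is a hypothetical sketch of the arithmetic only; the real check is performed by the kernel when it handles the encoded write. It relies on btrfs sector sizes being powers of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if 'offset' is a multiple of 'sectorsize'. Because the
 * sector size is a power of two, masking the low bits is equivalent to
 * checking offset % sectorsize == 0.
 */
static bool is_sector_aligned(uint64_t offset, uint32_t sectorsize)
{
	assert(sectorsize != 0 && (sectorsize & (sectorsize - 1)) == 0);
	return (offset & ((uint64_t)sectorsize - 1)) == 0;
}
```

For instance, offset 33792 (33K) fails the check even for a 4K sector size. Offset 36864 (36K) passes for 4K but fails for 64K, which is the cross-architecture case described above.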
Fix this by tracking the number of written bytes instead, incrementing
it, together with the unencoded offset, after each write.
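The fixed loop can be sketched as the simplified C model below. It is not the actual patched btrfs-progs code. The hypothetical fixed_copy() copies from the decompressed buffer into an in-memory "file", with memcpy() standing in for the buffered write, and max_write simulates short writes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Simplified model of the fixed fallback loop. Only 'written' (bytes
 * already copied) is compared against unencoded_file_len. The read
 * position in the decompressed buffer (unencoded_offset) and the write
 * position in the file (offset) both advance by the amount written on
 * each iteration. Returns the total number of bytes copied.
 */
static uint64_t fixed_copy(const char *unencoded_data, uint64_t unencoded_offset,
			   uint64_t unencoded_file_len, char *file,
			   uint64_t offset, uint64_t max_write)
{
	uint64_t written = 0;

	assert(max_write > 0);
	while (written < unencoded_file_len) {
		uint64_t len = unencoded_file_len - written;

		if (len > max_write)
			len = max_write;
		/* stands in for writing 'len' bytes at file offset 'offset' */
		memcpy(file + offset, unencoded_data + unencoded_offset, len);
		written += len;
		offset += len;
		unencoded_offset += len;
	}
	return written;
}
```

With the values from the stream (unencoded offset 33792, unencoded file length 31744, file offset 33792), the model copies all 31744 bytes into [33K, 64K) of the file. Comparing a byte count against a length keeps the loop bound independent of where the data starts inside the decompressed extent.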
Fixes:
README.md
Btrfs-progs
Userspace utilities to manage btrfs filesystems. License: GPLv2.
Btrfs is a copy on write (COW) filesystem for Linux aimed at implementing advanced features while focusing on fault tolerance, repair and easy administration.
This repository hosts the following utilities and documentation:
- btrfs — the main administration tool (manual page)
- mkfs.btrfs — utility to create the filesystem (manual page)
- all-in-one binary in the busybox style with mkfs.btrfs, btrfs-image and other tools built-in (standalone tools)
- libbtrfsutil (LGPL v2.1) — C and python 3 bindings, see libbtrfsutil/README.md for more
- manual pages and documentation source published at btrfs.readthedocs.io
See INSTALL for build instructions and tests/README.md for testing information.
Release cycle
The major version releases are time-based and follow the cycle of the Linux kernel releases. The cycle usually takes 2 months. Minor version releases may happen in the meantime if there are bug fixes or minor useful improvements queued.
The release tags are signed with a GPG key ID F2B4 1200 C54E FB30 380C 1756 C565 D5F9 D76D 583B, and release tarballs are hosted at kernel.org.
See the file CHANGES or the changelogs on the wiki.
Reporting bugs
There are several ways, each has its own specifics and audience that can give feedback or work on a fix. The following list is sorted in the order of preference:
- github issue tracker
- to the mailing list linux-btrfs@vger.kernel.org -- (not required to subscribe), beware that the mail might get overlooked in other traffic
- IRC (irc.libera.chat #btrfs) -- good for discussions eg. if a bug is already known, but reports could miss developers' attention
- bugzilla.kernel.org -- (requires registration), set the product to Filesystems and component Btrfs, please put 'btrfs-progs' into the subject so it's clear that it's not a kernel bug report
Development
Patch submissions, development and general discussions take place on the linux-btrfs@vger.kernel.org mailing list; subscription is not required to post.
GitHub pull requests will not be accepted directly; the preferred way is to send patches to the mailing list instead. You can link to a branch in any git repository if the mails do not make it to the mailing list, or just for convenience (it makes testing easier).
The development model of btrfs-progs shares a lot with the kernel model. The GitHub way is different in some ways. We, the upstream community, expect that the patches meet some criteria (often lacking in GitHub contributions):
- one logical change per patch: eg. not mixing bugfixes, cleanups, features etc., sometimes it's not clear and will be usually pointed out during reviews
- proper subject line: eg. prefix with btrfs-progs: subpart, ..., descriptive yet not too long; see git log --oneline for some inspiration
- proper changelog: changelogs are often missing or lack an explanation of why the change was made, how something is broken, what the user-visible effects of the bug or the fix are, how an improvement helps, or what the intended use case is
- the Signed-off-by line: this documents who authored the change; you can read more about The Developer's Certificate of Origin (chapter 11)
- if you are not used to the signed-off style, your contributions won't be rejected just because it is missing; the Author: tag will be added as a substitute in order to allow contributions without much bother with formalities
Source code coding style and preferences follow the kernel coding style. You can find the editor settings in .editorconfig and use the EditorConfig plugin to let your editor apply them, or update your editor settings manually.
Testing
The testing documentation can be found in tests/ and continuous integration/container images in ci/.
Documentation updates
Documentation fixes or updates do not need much explanation, so sticking to the code rules in the previous section is not necessary. GitHub pull requests are OK, and patches can be sent to me directly without also going to the mailing list. Pointing out typos via IRC also works, although such reports might accidentally get lost in the noise.
Documents are written in RST and built by sphinx.
Third-party sources
Build dependencies are listed in INSTALL. Implementations of checksum/hash functions are provided by copies of the respective sources to avoid adding dependencies that would make deployments in rescue or limited environments harder. The implementations are portable and neither optimized for speed nor accelerated. Optionally it's possible to use the libgcrypt, libsodium or libkcapi implementations.
- CRC32C: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
- XXHASH: https://github.com/Cyan4973/xxHash
- SHA256: https://tools.ietf.org/html/rfc4634
- BLAKE2: https://github.com/BLAKE2/BLAKE2
Some other code is borrowed from the kernel, eg. the raid5 tables or data structure implementations.
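The minimal C sketch below shows what a portable but unaccelerated checksum implementation looks like. It is a bit-at-a-time CRC32C (the Castagnoli polynomial, in its reflected form 0x82F63B78). It is an illustration only, not the copy shipped in btrfs-progs, and the function name is hypothetical. The caller supplies the initial value and applies the final inversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bit-at-a-time CRC-32C, reflected polynomial 0x82F63B78. Portable and
 * slow: one inner iteration per input bit, no lookup tables and no use
 * of hardware CRC instructions.
 */
static uint32_t crc32c_bitwise(uint32_t crc, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	assert(buf != NULL || len == 0);
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			/* shift right; XOR in the polynomial if the low bit was set */
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}
```

With the conventional initial value 0xFFFFFFFF and a final inversion, the standard check input "123456789" yields 0xE3069283.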