As zoned btrfs uses regular writes for metadata, it needs zone write
locking in the IO scheduler. Add a udev rule that configures an IO
scheduler doing zone write locking.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add some scripts for convenience, so far there was only one for musl as
it usually breaks first, but we've had some problems on centos due to
old kernel headers and potential breakage when changing kerncpomat.h.
Signed-off-by: David Sterba <dsterba@suse.com>
This new test script will create a fs with the following situations:
- Preallocated extents (no data csum)
- Nodatasum inodes (no data csum)
- Partially written preallocated extents (no data csum for part of the
extent)
- Regular data extents (with data csum)
- Regular data extents then hole punched (with data csum)
- Preallocated data, then written, then hole punched (with data csum)
- Compressed extents (with data csum)
And make sure after --init-csum-tree (with or without
--init-extent-tree) the result fs can still pass check.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When using `--init-csum-tree` with `--init-extent-tree`, csum tree
population will be done by iterating all file extent items.
This allow us to skip preallocated extents, but it still has the
following problems:
- Inodes with NODATASUM
- Hole file extents
- Written preallocated extents
We will generate csum for the whole extent, while other part may still
be unwritten.
Make it to follow the same behavior of recently introduced
fill_csum_for_file_extent(), so we can generate correct csum.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
If a btrfs filesystem has preallocated file extents, 'btrfs check
--init-csum-tree' will create csum item for preallocated extents, and
cause error:
# mkfs.btrfs -f test.img
# mount test.img /mnt/btrfs
# fallocate -l 32K /mnt/btrfs/file
# umount /mnt/btrfs
# btrfs check --init-csum-tree --force test.img
...
[4/7] checking fs roots
root 5 inode 257 errors 800, odd csum item
ERROR: errors found in fs roots
found 376832 bytes used, error(s) found
And the csum tree is not empty, containing csum for the preallocated
extent:
$ btrfs ins dump-tree -t csum test.img
btrfs-progs v5.15.1
checksum tree key (CSUM_TREE ROOT_ITEM 0)
leaf 30408704 items 1 free space 16226 generation 9 owner CSUM_TREE
leaf 30408704 flags 0x1(WRITTEN) backref revision 1
fs uuid ecc79835-5611-4609-b985-e4ccd6f15b54
chunk uuid b1c75553-5b82-4aa6-bbbe-e7f50643b1a8
item 0 key (EXTENT_CSUM EXTENT_CSUM 13631488) itemoff 16251 itemsize 32
range start 13631488 end 13664256 length 32768
[CAUSE]
For `--init-csum-tree` alone, we will use extent tree to iterate each
data extent, and calculate csum for them.
But extent items alone can not tell us if the file extent belongs to a
NODATASUM inode, nor if it's preallocated.
Thus we create csums for those data extents, and cause the problem.
[FIX]
However the fix is not that simple, we can not just generate csum for
non-preallocated range.
As the following case we still need csum for the un-referred part:
xfs_io -f -c "pwrite 0 8K" -c "sync" -c "punch 0 4K"
So here we have to go another direction by:
- Always generate csum for the whole data extent
This is the same as the old code
- Iterate the file extents, and delete csum for preallocated range
or NODATASUM inodes
Issue: #430
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This part has no mode specific operations, just move them into
mode-common.[ch].
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When calling iterate_extent_inodes() on data extents with indirect ref
(with inline or keyed EXTENT_DATA_REF_KEY), it will fail to execute the
call back function at all.
[CAUSE]
In function find_parent_nodes(), we only add the target tree block if a
backref has @parent populated.
For indirect backref like EXTENT_DATA_REF_KEY, we rely on
__resolve_indirect_ref() to get the parent leaves.
However __resolve_indirect_ref() only grabs backrefs from
&prefstate->pending_indirect_refs.
Meaning callers should queue any indirect backref to
pending_indirect_refs.
But unfortunately in __add_prelim_ref() and __add_missing_keys(), none
of them properly queue the indirect backrefs to pending_indirect_refs,
but directly to pending.
Making all indirect backrefs never got resolved, thus no callback
function executed
[FIX]
Fix __add_prelim_ref() and __add_missing_keys() to properly queue
indirect backrefs to the correct list.
Currently there is no such direct user in btrfs-progs, but later csum
tree re-initialization code will rely this to do proper csum
re-calculate (to avoid preallocated/nodatasum extents).
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There's still problem left with compilation on musl and kernel < 5.11,
because __ALIGN_KERNEL is not defined anymore:
../bin/ld: kernel-shared/volumes.o: in function `create_chunk':
volumes.c:(.text+0x17f8): undefined reference to `__ALIGN_KERNEL'
Due to the entangled includes and unconditional definition of
__ALIGN_KERNEL, we can't use #ifdef in kerncompat.h to define it
eventually (as kerncompat.h is the first include). Instead add local
definitions of the macros and rename them to avoid name clashes.
Pull-request: #433
Signed-off-by: David Sterba <dsterba@suse.com>
cfg->fs_uuid is either 0 or set to the value of the -U parameter
passed to mkfs.btrfs. However the value of the latter is already being
validated in the main mkfs function. Just remove the duplicated checks
in make_btrfs as they effectively can never be executed.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pointer returned from get_parent needs additional handling otherwise
we could return an error and then try to free it. Reset the pointer when
the error occurs so the cleanup is always done on a valid pointer.
Issue: #423
Signed-off-by: David Sterba <dsterba@suse.com>
The max width of the page is set to 800px, that renders of commands and
output in a small box requiring a scrollbar. Increase the limit to
1200px. Also update some subvoulme related text.
Signed-off-by: David Sterba <dsterba@suse.com>
Make inclusion of sys/sysinfo.h conditional to avoid the following build
failure on musl:
In file included from .../i586-buildroot-linux-musl/sysroot/usr/include/linux/kernel.h:4,
from ./kerncompat.h:31,
from common/utils.c:42:
.../i586-buildroot-linux-musl/sysroot/usr/include/linux/sysinfo.h:7:8: error: redefinition of 'struct sysinfo'
7 | struct sysinfo {
| ^~~~~~~
In file included from common/utils.c:27:
.../i586-buildroot-linux-musl/sysroot/usr/include/sys/sysinfo.h:10:8: note: originally defined here
10 | struct sysinfo {
| ^~~~~~~
Fixes:
- http://autobuild.buildroot.org/results/16f44fb9dea72a7079e8e5517e760dd79d2724cc
The 'struct sysinfo' is defined in linux/sysinfo.h and sys/sysinfo.h,
while both must not be included at the same time. Stop including
linux/kernel.h that sometimes unconditionally includes sys/sysinfo.h and
causes the double definition for some reason. As we now include
linux/const.h directly, there's no other effective change.
Pull-request: #433
Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The function autodetect_object_types() tries to detect the type of
btrfs object passed. If it is an "inode" type (e.g. file) this function
returns the type as "inode". If it is a block device, it return it as
"block device".
However it doesn't handle the case where the object passed is a link
to a block device (which could be a valid btrfs device). For example
LVM/DM creates link to block devices. In this case it should return
the type as "block device".
This patch replace the lstat() call with a stat().
Reported-by: Boris Burkov <boris@bur.io>
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs-tools compile fails with mips, musl and 5.12+ headers.
The definition of __ALIGN_KERNEL has moved in 5.12+ kernels, so we
add an explicit include of const.h to pickup the macro:
| make: *** [Makefile:595: mkfs.btrfs] Error 1
| make: *** Waiting for unfinished jobs....
| libbtrfs.a(volumes.o): in function `dev_extent_search_start':
| /usr/src/debug/btrfs-tools/5.12.1-r0/git/kernel-shared/volumes.c:464: undefined reference to `__ALIGN_KERNEL'
| collect2: error: ld returned 1 exit status
This is safe for older kernel's as well, since the header still
exists, and is valid to include.
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
[remove invalid OE Upstream-status]
Signed-off-by: Stijn Tintel <stijn@linux-ipv6.be>
Signed-off-by: David Sterba <dsterba@suse.com>
When some error happens when trying to search for parent subvolume
then parent_subvol will contain errno so don't try to free that
Crash backtrace would look like:
0 process_snapshot at cmds/receive.c:358
358 free(parent_subvol->path);
1 0x00005646898aaa67 in read_and_process_cmd at common/send-stream.c:348
2 btrfs_read_and_process_send_stream at common/send-stream.c:525
3 0x00005646898c9b8b in do_receive at cmds/receive.c:1113
4 cmd_receive at cmds/receive.c:1316
5 0x00005646898750b1 in cmd_execute at cmds/commands.h:125
6 main at btrfs.c:405
(gdb) p parent_subvol
$1 = (struct subvol_info *) 0xfffffffffffffffe
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Dāvis Mosāns <davispuh@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The feature pages share the contents with the manual page section 5 so
put the contents to separate files. Progress: 2/3.
Signed-off-by: David Sterba <dsterba@suse.com>
The feature pages share the contents with the manual page section 5 so
put the contents to separate files.
Signed-off-by: David Sterba <dsterba@suse.com>
This patch fixes potential bugs in fixup_extent_refs(). If
btrfs_start_transaction() fails in some way and returns error ptr, It
goes to out logic. But old code checkes whether it is null and it calls
commit. This patch solves the problem with make that it calls only if
ret is no error.
Issue: #409
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We are going to start creating multiple sets of global trees, which at
the moment are the free space tree, csum tree, and extent tree.
Generally we will assign these at block group creation time, but Dave
would like to be able to have them per-subvolume at some point, so
reserve a slot for that as well.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add the on disk definitions for the block group tree. This will be part
of the super block so we need to add the appropriate helpers to the
super block, as well as adding it to the backup roots.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We want to enable developers to test the extent tree v2 features as they
are added, add the ability to mkfs an extent tree v2 fs if we have
experimental enabled.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We could have multiple extent roots, so add a helper to mark all the
used space in the FS based on any extent roots we find, and then use
this extent io tree to fixup the block group accounting.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We may have multiple extent roots, so cycle through all of the extent
roots and populate the csum tree based on the content of every extent
root we have in the file system.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Use the global roots tree to find all of the csum roots in the system
and check all of them as appropriate.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Instead of checking the csum and extent tree individually, walk through
the global roots and validate them all. This will work properly if we
have extent tree v1 or extent tree v2.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Instead of looking for the first extent root or csum root in memory,
scan through the tree root and re-init any root items that match the
given objectid. This will allow reinit to work with both extent tree v1
and extent tree v2 global roots when using the --reinit option.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We are going to have multiples of these trees with extent tree v2, so
add a rb tree to track them based on their root key value. This works
for both v1 and v2, so we can remove the direct pointers to these roots
in our fs_info.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We're going to have multiple free space roots in the future, so access
it via a helper in most cases. We will address the remaining direct
accesses in future patches.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When we switch to multiple global trees we'll need to access the
appropriate extent root depending on the block group or possibly root.
To handle this, use a helper in most places and then the actual root in
places where it is required. We will whittle down the direct accessors
with future patches, but this does the bulk of the preparatory work.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filesystem du command fails and exits when it access file that has
permission denied. But it can continue the command except the files.
This patch prints error message just like /bin/du does and it continues
if it can.
Issue: #421
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
At this point it's not clear what exactly needs the indices and they
appear in the left column with list of index pages which is a bit
confusing.
Signed-off-by: David Sterba <dsterba@suse.com>
There was a regression in version 5.15 due to changes in how the ratio
for the various raid profiles got calculated and this in turn had a
cascading effect on unallocated/allocated space reported.
Add a test to ensure this regression doesn't occur again.
Issue: #422
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_mark_used_tree_blocks skips the reloc roots for some reason, which
causes problems because these blocks are in use, and we use this helper
to determine if the block accounting is correct with extent tree v2.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is going to be used for the extent tree v2 stuff more commonly, so
move it out so that it is accessible from everywhere that we need it.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We will walk all referenced tree blocks during check in order to avoid
writing over any referenced blocks during fsck. However in the future
we're going to need to do this for things like fixing block group
accounting with extent tree v2. This is because extent tree v2 will not
refer to all of the allocated blocks in the extent tree. Refactor the
code some to allow us to send down an arbitrary extent_io_tree so we can
use this helper for any case where we need to figure out where all the
used space is on an extent tree v2 file system.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have this helper sitting in extent-tree.c, but it's a repair
function. I'm going to need to make changes to this for extent-tree-v2
and would rather this live outside of the code we need to share with the
kernel.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Extent tree v2 no longer tracks all allocated blocks on the file system,
so we'll have to default to walking trees to generate metadata images.
There's an annoying drawback with walking trees with btrfs-image where
we'll happily copy multiple blocks over and over again if there are
snapshots. Fix this by keeping track of blocks we've seen and simply
skipping blocks that we've already queued up for copying.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
With extent tree v2 we will have per-block group checksums, so add a
helper to access the csum root and rename the fs_info csum_root to
_csum_root to catch all the places that are accessing it directly.
Convert everybody to use the helper except for internal things.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>