scrub_read_file() is only ever called on a branch where (fd >= 0)
holds, so there is no need to check the passed-in argument.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
o Return 0 to indicate success when the detected errors were
corrected during scrubbing. This also makes scripting easier when
the return value is checked.
o Warn the user if uncorrectable errors are detected.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
mkfs can try to write outside of small devices. The zeroing code
doesn't test the device size and runs before mkfs tests for small
devices and exits.
Testers experienced this as small regular files being extended as mkfs
failed:
$ truncate -s 1m /tmp/some-file
$ strace -epwrite ./mkfs.btrfs /tmp/some-file
SMALL VOLUME: forcing mixed metadata/data groups
WARNING! - Btrfs v3.14.2 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using
pwrite(3, ..., 2097152, 0) = 2097152
pwrite(3, ..., 4096, 65536) = 4096
pwrite(3 ..., 2097152, 18446744073708503040) = -1 EINVAL (Invalid argument)
ERROR: failed to zero device '/tmp/some-file' - Input/output error
$ ls -lh /tmp/some-file
-rw-rw-r--. 1 zab zab 2.0M Jul 16 13:49 /tmp/some-file
This simple fix adds a helper that clamps a region to be zeroed to the
size of the device. It doesn't address the larger questions of whether
to modify the device before the size test or whether to zero regions
that have been trimmed.
Finally, the error handling mess after the zeroing calls is cleaned up.
zero_blocks() and its callers only return -errno.
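A minimal sketch of the clamping idea, for illustration only. The name
clamp_zero_range() and its signature are hypothetical and are not taken
from the patch; the sketch assumes byte offsets held in unsigned 64-bit
values, so a start that wrapped around (as in the strace output above)
ends up past the end of the device and yields an empty region.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the actual btrfs-progs code: shrink the region
 * [*start, *start + *len) so that it lies inside a device of dev_size bytes.
 */
static void clamp_zero_range(uint64_t dev_size, uint64_t *start, uint64_t *len)
{
	if (*start >= dev_size) {
		/* The region begins at or past the end: nothing to zero. */
		*start = dev_size;
		*len = 0;
		return;
	}
	/* dev_size - *start cannot underflow because *start < dev_size. */
	if (*len > dev_size - *start)
		*len = dev_size - *start;
}
```

With a 1M device, a 2M write at offset 0 is cut down to 1M, and a write at
the wrapped offset becomes a zero-length no-op.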
Signed-off-by: Zach Brown <zab@zabbo.net>
Signed-off-by: David Sterba <dsterba@suse.cz>
mkfs uses a cutoff size of '1024 * 1024 * 1024' to mark a device as a
small volume and force mixed block groups. Use a define for that.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
qgroup items are not deleted by btrfs when the underlying subvolume goes
away. As a result, btrfsck will print those as inconsistent. This can
clutter up the printout so we ignore them by default. They are still printed
if a full report (via --qgroup-report) is requested.
This patch and the ones it depends on (to do qgroup verification) can be
found at:
https://github.com/markfasheh/btrfs-progs-patches/tree/qgroup-verify
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
We recently merged a memory leak fix that makes xfstests/btrfs/012 fail.
The cause is that it only frees @fs_devices but leaves it on the global
fs_uuid list, which causes a 'Segmentation fault' when running
btrfs-convert. This fixes the problem.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
The superblock checksum check in btrfs-progs is too strict for
super-recover: current btrfs-progs only reads the 1st superblock, and
if you need super-recover the 1st superblock is possibly already
damaged.
The fix introduces a super_recover parameter for
btrfs_read_dev_super() and its callers to allow scanning backup
superblocks if needed.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
The qgroup verification code can trivially be extended to provide
extended information on the extents which a subvolume root
references. Along with qgroup-verify, I have found this tool to be
invaluable when tracking down extent references.
The patch adds a switch to the check subcommand, '--subvol-extents',
which takes a single subvolume id as its argument. When run with the switch,
we'll print out each extent that the subvolume references. The extent
printout gives standard extent info you would expect along with
information on which other roots reference it.
Sample output follows - this is a few lines from a run on a subvolume
I've been testing qgroup changes on:
Print extent state for subvolume 281 on /dev/vdb2
UUID: 8203ca66-9858-4e3f-b447-5bbaacf79c02
Offset Len Root Refs Roots
12582912 20480 12 257 279 280 281 282 283 284 285 286 287 288 289
12603392 8192 12 257 279 280 281 282 283 284 285 286 287 288 289
12611584 12288 12 257 279 280 281 282 283 284 285 286 287 288 289
<snip a bunch of extents to show some variety>
124583936 16384 4 281 282 283 280
125075456 16384 4 280 281 282 283
126255104 16384 11 257 280 281 282 283 284 285 286 287 288 289
4763508736 4096 3 279 280 281
In case it wasn't clear, this applies on top of my qgroup verify patch:
"btrfs-progs: add quota group verify code"
A branch with all this can be found on github:
https://github.com/markfasheh/btrfs-progs-patches/tree/qgroup-verify
Please apply,
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
btrfs-image requires at least 2 args to run:
one for the source dev/file, the other for the target dev/file.
This patch depends on patch:
btrfs-progs: move the check_argc_* functions into utils.c
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
To let the independent tools (e.g. btrfs-image, btrfs-convert, etc.)
share the convenience of the check_argc_* functions, move them into
utils.c.
Also add a new function "set_argv0" to set the correct tool name:
*btrfs-image*: too few arguments
The original btrfs* tools work as before.
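A rough sketch of how such a pair of helpers can fit together. The
sketch_ names, the fixed-size buffer and the return convention are
assumptions made for illustration; they are not the signatures in
utils.c.

```c
#include <stdio.h>
#include <string.h>

/* Tool name used as the prefix of error messages (hypothetical storage). */
static char sketch_argv0[64] = "btrfs";

/* Remember the name the tool was started as. */
static void sketch_set_argv0(char **argv)
{
	strncpy(sketch_argv0, argv[0], sizeof(sketch_argv0) - 1);
	sketch_argv0[sizeof(sketch_argv0) - 1] = '\0';
}

/* Return 1 and complain if fewer than 'expected' args were given, else 0. */
static int sketch_check_argc_min(int nargs, int expected)
{
	if (nargs < expected) {
		fprintf(stderr, "%s: too few arguments\n", sketch_argv0);
		return 1;
	}
	return 0;
}
```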
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
[moved argv0 and check_argc to utils.*]
Signed-off-by: David Sterba <dsterba@suse.cz>
Add more control to the balance behaviour.
The usage filter may not be fine-grained enough and can lead to moving
too many chunks at once. Another example use is in connection with the
drange+devid or vrange filters, which allow working with a specific
chunk or even with a chunk on a given device.
The limit filter applies last, the value of 0 means no limiting.
CC: Ilya Dryomov <idryomov@gmail.com>
CC: Hugo Mills <hugo@carfax.org.uk>
Signed-off-by: David Sterba <dsterba@suse.cz>
After the discussion in
http://thread.gmane.org/gmane.comp.file-systems.btrfs/36334
the 'X' will be mentioned in the manpage because new e2fsprogs/lsattr
will display it; it represents the NOCOMPRESS bit of an inode.
Signed-off-by: David Sterba <dsterba@suse.cz>
Don't bother freeing the buffer if the malloc failed.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Steps to reproduce:
# mkfs.btrfs -f /dev/sda9 -b 2g
# mount /dev/sda9 /mnt
# dd if=/dev/zero of=/mnt/data bs=4k oflag=direct
# btrfs file df /mnt
Data, single: total=1.66GiB, used=1.66GiB
System, single: total=4.00MiB, used=16.00KiB
Metadata, single: total=200.00MiB, used=67.88MiB
For a filesystem without snapshots and with 70M of metadata, extent
checking uses up to about 110M of memory, which is a nightmare
for systems with low memory.
For a filesystem without snapshots it is very likely that an extent
record can be freed quickly; improve this by checking whether the
record can be freed after adding data/tree backrefs.
This patch reduces the peak memory cost of extent checking from
110M to 40M for the above case.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Steps to reproduce:
# mkfs.btrfs -f <dev1>
# btrfs-image <dev1> <image_file>
# btrfs-image -r -o <image_file> <dev2>
# btrfs check <dev2>
btrfs check output:
: read block failed check_tree_block
: Couldn't read tree root
: Couldn't open file system
The btrfs-image should not mess with the chunk tree under the old_restore way.
The new restore way was introduced by:
commit d6f7e3da0d
Btrfs-progs: make btrfs-image restore with a valid chunk tree V2
...
And the following commit improved how the new restore builds the valid
chunk tree:
commit ef2a8889ef
Btrfs-progs: make image restore with the original device offsets
...
But the second commit should not affect the old_restore way, since the
old_restore way doesn't try to build a valid chunk tree at all.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
I found the following patch is insufficient.
===============================================================================
commit 6e6b32ddf58db54f714d0f263c2589f4859e8b5e
Author: Adam Buchbinder <abuchbinder@google.com>
Date: Fri Jun 13 16:43:56 2014 -0700
btrfs-progs: Fix a use-after-free in the volumes code.
===============================================================================
"btrfs filesystem show <dev>" with this patch causes segmentation fault
if "<dev>" is an unmounted Btrfs filesystem.
===============================================================================
Label: none uuid: <cut here>
Total devices 1 FS bytes used 112.00KiB
devid 1 size 59.12GiB used 2.04GiB path /dev/sdd1
Segmentation fault (core dumped)
===============================================================================
It's due to a double-free of fs_devices->list, as follows.
===============================================================================
cmd_show
-> list_del(&fs_devices->list) # 1st one.
-> btrfs_close_devices(fs_devices)
-> list_del(&fs_devices->list) # <- 2nd one introduced at 6e6b32dd.
Double-free happens here.
===============================================================================
The first list_del() can safely be removed because fs_devices->list
will be deleted by the second one shortly afterwards.
Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Cc: Adam Buchbinder <abuchbinder@google.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
If the malloc above fails, btrfs-image exits directly
without any error message.
Now just return -ENOMEM and let the caller print the
error message.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Handle the malloc failure for dump_worker in the same way as
the restore worker.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
BTRFS_NUM_MIRRORS in the userspace chunk-recover.c means
the same thing as BTRFS_MAX_MIRRORS in the kernelspace ctree.h,
so to stay consistent with the kernel, make this move
in userspace:
chunk-recover.c/BTRFS_NUM_MIRRORS
===>
ctree.h/BTRFS_MAX_MIRRORS
This provides convenience for future use.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
The chattr(1) manpage suffers from the same problems mount(1)
had: many options listed, not kept up to date for various
filesystems.
I've submitted a manpage update for chattr(1) which says to
refer to filesystem-specific manpages for supported attributes;
this patch updates btrfs(5) to list the attributes supported
by btrfs.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
[added some asciidoc markups, adjusted formatting]
Signed-off-by: David Sterba <dsterba@suse.cz>
Btrfs has a global block reservation, so even if mkfs.btrfs runs
without problems, there is still a possibility that the filesystem can't
be mounted.
For example, after running mkfs.btrfs on an 8M file on the x86_64
platform, the kernel will refuse to mount it due to ENOSPC: the system
block group takes 4M and the mixed block group takes 4M, and the global
block reservation takes all of the 4M from the mixed block group, which
makes btrfs unable to create the uuid tree.
This patch adds a minimum device size check before actually running mkfs.
The minimum size calculation is a simplified one:
minimum_size_for_each_dev = 2 * (system block group + global block rsv)
and global block rsv = leafsize << 10
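The formula can be written out as the following sketch. The function and
macro names are hypothetical, and the 4M system block group size is taken
from the example above rather than from the code.

```c
#include <stdint.h>

/* Assumed for illustration: the 4M system block group from the example. */
#define SKETCH_SYSTEM_GROUP_SIZE (4ULL * 1024 * 1024)

/* minimum_size_for_each_dev = 2 * (system block group + global block rsv) */
static uint64_t sketch_min_dev_size(uint32_t leafsize)
{
	/* global block rsv = leafsize << 10 */
	uint64_t global_rsv = (uint64_t)leafsize << 10;

	return 2 * (SKETCH_SYSTEM_GROUP_SIZE + global_rsv);
}
```

Under these assumptions a 4K leafsize gives a 16M minimum, which is why
the 8M file in the example would be rejected, and a 16K leafsize gives
40M.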
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
There's no reason to assume that the bad key order is in a leaf block,
so accessing level 0 of the path is going to be an error if it's actually
a node block that's bad.
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Hugo Mills <hugo@carfax.org.uk>
Signed-off-by: David Sterba <dsterba@suse.cz>
We need a test to verify that extent tree rebuilding works. This test
creates a strange filesystem with some snapshots, destroys the
extent root node, and runs fsck with "--init-extent-tree".
Since this test needs a btrfs internal tool (btrfs-corrupt-block),
I add it to btrfs-progs.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When btrfs-image fails to create an image, the invalid output file
should be deleted so that it is not used by mistake in the future.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
For btrfs-image,
dump may not come with option '-o'
-r may not come with option '-c', '-s', '-w', dev_cnt != 1
-m may not come with dev_cnt < 2
All of the above should be regarded as invalid combinations,
and the usage will show up.
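One possible reading of these rules as a validation function. The struct
and field names are hypothetical, and the sketch assumes that the
"dev_cnt != 1" rule applies to -r without -m, which the list above does
not state explicitly.

```c
#include <stdbool.h>

/* Hypothetical summary of the parsed command line. */
struct sketch_image_opts {
	bool restore;   /* -r given */
	bool multi;     /* -m given */
	bool opt_o;     /* -o given */
	bool opt_c;     /* -c given */
	bool opt_s;     /* -s given */
	bool opt_w;     /* -w given */
	int dev_cnt;    /* number of target devices */
};

/* Return true if the combination is acceptable, false to show the usage. */
static bool sketch_opts_valid(const struct sketch_image_opts *o)
{
	if (!o->restore && o->opt_o)
		return false;           /* dump with -o */
	if (o->restore && (o->opt_c || o->opt_s || o->opt_w))
		return false;           /* -r with -c, -s or -w */
	if (o->multi && o->dev_cnt < 2)
		return false;           /* -m needs at least two devices */
	if (o->restore && !o->multi && o->dev_cnt != 1)
		return false;           /* assumed: plain -r takes one device */
	return true;
}
```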
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
btrfs-image supports multiple devices when -m is specified.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
The value of the variable leaf does not have to be set on every
iteration of the while loop. Just move it outside.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Add some missing options, also improve some confusing
expressions.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
These two options are used for the same purpose, but they are mutually
exclusive. Make that clear to ordinary users.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Previously, if restore could not read the user-specified fs root, it
would output the following message:
Error reading root
With this patch, it outputs a message like:
Fail to read root 1000: No such file or directory
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Steps to reproduce:
# mkfs.btrfs -f /dev/sda9
# mount /dev/sda9 /mnt
# dd if=/dev/zero of=/mnt/data bs=1M count=1
# btrfs restore -r /dev/sda9 -r 2 -o /tmp
If the user doesn't give a valid fs/file root objectid, btrfs restore
still continues and doesn't restore anything. This is unfriendly; we
can check it first.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When things go wrong for lzo-compressed btrfs, feeding lzo1x_decompress_safe()
with corrupt data during restore can lead to crashes. Reduce the risk by adding
a check on the input length.
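A sketch of what a length check of this kind can look like; the check in
the patch itself is not reproduced here. The helper name, the 4-byte
little-endian length prefix and the 4096-byte page size are assumptions
for illustration, and the bound is the usual LZO1X worst-case expansion,
x + x/16 + 64 + 3.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u
/* Worst-case size of x bytes after LZO1X compression. */
#define SKETCH_LZO1X_WORST(x) ((x) + ((x) / 16) + 64 + 3)

/* Read a little-endian 32-bit value byte by byte. */
static uint32_t sketch_read_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Validate the length prefix found at 'off' before handing the segment
 * to the decompressor. Returns 0 and stores the length, or -1 if the
 * length cannot be right for the given buffer.
 */
static int sketch_next_segment_len(const unsigned char *buf, size_t buf_len,
				   size_t off, uint32_t *seg_len)
{
	uint32_t len;

	if (off > buf_len || buf_len - off < 4)
		return -1;      /* no room for the length prefix */
	len = sketch_read_le32(buf + off);
	if (len > SKETCH_LZO1X_WORST(SKETCH_PAGE_SIZE))
		return -1;      /* larger than any valid segment */
	if (len > buf_len - off - 4)
		return -1;      /* runs past the end of the buffer */
	*seg_len = len;
	return 0;
}
```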
Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net>
Signed-off-by: David Sterba <dsterba@suse.cz>
It's 32 bits as defined in ctree.h, but the struct had it as 64 bits.
Found using MemorySanitizer.
Signed-off-by: Adam Buchbinder <abuchbinder@google.com>
Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
On my system, this brings the FS conversion test suite's runtime from over
ten seconds down to under two.
Thanks to Julien Muchembled for the suggestion.
Signed-off-by: Adam Buchbinder <abuchbinder@google.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When a btrfs filesystem is written until it is full, there is no space
left for the free space cache.
btrfs check then outputs the following message, which is just noise:
# free space inode generation (0) did not match
free space cache generation (XXX)
When the free space cache is not written out normally,
the free space inode generation will be 0.
In this case, no noise should be output.
Also, check for a 0-sized inode earlier, together with the 0-generation inode.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When a non-numeric value is passed to parse_size(), it only gives the
error message "ERROR: size value is empty", which is quite
confusing for end users.
This patch introduces more meaningful error messages for the
following new cases:
1) Invalid size string (non-numeric string)
2) Negative size value (like "-1K")
This patch also makes full use of the endptr returned by strtoll() to
avoid an unneeded loop.
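A sketch of the endptr technique, under stated assumptions: the function
name, the enum of result codes and the overflow handling are hypothetical
and do not mirror the real parse_size(). It only shows how one strtoll()
call separates the "not numeric", "negative" and "bad suffix" cases.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum sketch_size_result {
	SIZE_OK,
	SIZE_ERR_NOT_NUMERIC,   /* case 1: no digits at all */
	SIZE_ERR_NEGATIVE,      /* case 2: e.g. "-1K" */
	SIZE_ERR_BAD_SUFFIX,
	SIZE_ERR_OVERFLOW,
};

static enum sketch_size_result sketch_parse_size(const char *s, uint64_t *out)
{
	const char *units = "kmgtpe";
	char *endptr;
	long long val;
	int shift = 0;

	errno = 0;
	val = strtoll(s, &endptr, 10);
	if (endptr == s)
		return SIZE_ERR_NOT_NUMERIC;    /* nothing was consumed */
	if (errno == ERANGE)
		return SIZE_ERR_OVERFLOW;
	if (val < 0)
		return SIZE_ERR_NEGATIVE;

	/* endptr already points at the suffix, so no scanning loop is needed. */
	if (*endptr != '\0') {
		const char *u = strchr(units, tolower((unsigned char)*endptr));

		if (!u || endptr[1] != '\0')
			return SIZE_ERR_BAD_SUFFIX;
		shift = 10 * (int)(u - units + 1);
	}
	if ((uint64_t)val > (UINT64_MAX >> shift))
		return SIZE_ERR_OVERFLOW;
	*out = (uint64_t)val << shift;
	return SIZE_OK;
}
```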
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
This depends on the kernel patch:
[PATCH] btrfs:replace EINVAL with EOPNOTSUPP for dev_replace
This catches EOPNOTSUPP and outputs a message saying that dev_replace
on raid56 is not currently supported. Note that the message is only
shown when dev_replace is not run in the background.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When running chunk-recover on a healthy btrfs (data profile raid0, with
plenty of data), the program may abort on the number
of mirrors of an extent.
According to the kernel code, the max mirror number of an extent
is 3, not 2:
ctree.h: BTRFS_MAX_MIRRORS 3
chunk-recover.c : BTRFS_NUM_MIRRORS 2
Just change BTRFS_NUM_MIRRORS to 3, and everything goes well.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When dealing with the p & q stripes for data profile raid6, chunk-recover
forgets to insert them into the chunk record. Just insert them back.
Also wrap the insert procedure in a new function, fill_chunk_up.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
These use the system's mke2fs, and don't require loop devices
or root privileges.
They don't pick up anything with the default flags right now,
but they do pick up some sanitizer issues when the tools are
compiled with any of -fsanitize={address,memory,thread}.
Signed-off-by: Adam Buchbinder <abuchbinder@google.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Currently btrfs-debug-tree outputs chunk/block group types as numbers,
which are hard to understand without checking the source for their
meaning.
This patch converts the numeric type output to human-readable strings.
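As an illustration of the conversion, here is a small sketch that turns a
block group type bitmask into a string. The function name and the
'|'-separated output format are assumptions; the bit values follow the
BTRFS_BLOCK_GROUP_* definitions in ctree.h.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Block group type bits (values as in ctree.h). */
#define SKETCH_BG_DATA     (1ULL << 0)
#define SKETCH_BG_SYSTEM   (1ULL << 1)
#define SKETCH_BG_METADATA (1ULL << 2)
#define SKETCH_BG_RAID0    (1ULL << 3)
#define SKETCH_BG_RAID1    (1ULL << 4)
#define SKETCH_BG_DUP      (1ULL << 5)
#define SKETCH_BG_RAID10   (1ULL << 6)
#define SKETCH_BG_RAID5    (1ULL << 7)
#define SKETCH_BG_RAID6    (1ULL << 8)

/* Write the names of all set bits into buf, separated by '|'; size > 0. */
static void sketch_bg_flags_to_str(uint64_t flags, char *buf, size_t size)
{
	static const struct { uint64_t bit; const char *name; } names[] = {
		{ SKETCH_BG_DATA, "DATA" },     { SKETCH_BG_SYSTEM, "SYSTEM" },
		{ SKETCH_BG_METADATA, "METADATA" }, { SKETCH_BG_RAID0, "RAID0" },
		{ SKETCH_BG_RAID1, "RAID1" },   { SKETCH_BG_DUP, "DUP" },
		{ SKETCH_BG_RAID10, "RAID10" }, { SKETCH_BG_RAID5, "RAID5" },
		{ SKETCH_BG_RAID6, "RAID6" },
	};
	size_t i;

	buf[0] = '\0';
	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		if (!(flags & names[i].bit))
			continue;
		/* strncat always terminates; leave room for the NUL byte. */
		if (buf[0] != '\0')
			strncat(buf, "|", size - strlen(buf) - 1);
		strncat(buf, names[i].name, size - strlen(buf) - 1);
	}
}
```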
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Currently btrfs-debug-tree outputs extent flags as numbers,
which are hard to understand without checking the source for their
meaning.
This patch converts the numeric flags output to human-readable strings.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
btrfs resize now supports parsing the size units k/m/g/t/p/e in kernel
space; adopt the change in the userspace manpage.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When a struct btrfs_fs_devices was being torn down by
btrfs_close_devices(), there was an invalidated pointer in the global
list fs_uuids which still pointed to it; if a device was closed and
then reopened (which btrfs-convert does), freed memory would be
accessed.
This was found using ThreadSanitizer (pretty much doing what
AddressSanitizer would, but not exiting after the first failure).
To reproduce, build with -fsanitize=thread and run 'make test'.
Representative output is below.
This change makes the current tests TSan-clean.
WARNING: ThreadSanitizer: heap-use-after-free (pid=29161)
Read of size 8 at 0x7d180000eee0 by main thread:
#0 memcmp ??:0
#1 find_fsid .../volumes.c:81
#2 device_list_add .../volumes.c:95
#3 btrfs_scan_one_device .../volumes.c:259
#4 btrfs_scan_fs_devices .../disk-io.c:1002
#5 __open_ctree_fd .../disk-io.c:1090
#6 open_ctree_fd .../disk-io.c:1191
#7 do_convert .../btrfs-convert.c:2317
#8 main .../btrfs-convert.c:2745
Previous write of size 8 at 0x7d180000eee0 by main thread:
#0 free ??:0
#1 btrfs_close_devices .../volumes.c:191
#2 close_ctree .../disk-io.c:1401
#3 do_convert .../btrfs-convert.c:2300
#4 main .../btrfs-convert.c:2745
Location is heap block of size 96 at 0x7d180000eee0 allocated by main thread:
#0 calloc ??:0 (exe+0x00000002acc6)
#1 device_list_add .../volumes.c:97
#2 btrfs_scan_one_device .../volumes.c:259
#3 btrfs_scan_fs_devices .../disk-io.c:1002
#4 __open_ctree_fd .../disk-io.c:1090
#5 open_ctree_fd .../disk-io.c:1191
#6 do_convert .../btrfs-convert.c:2256
#7 main .../btrfs-convert.c:2745
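The pattern behind the fix can be sketched as follows: an object that
lives on a global list has to be unlinked before it is freed, so a later
scan of the list never touches freed memory. All names here are
hypothetical stand-ins for fs_uuids, struct btrfs_fs_devices and
btrfs_close_devices(); this is not the patched code.

```c
#include <stdlib.h>

/* Minimal intrusive doubly linked list, in the style of list.h. */
struct sketch_list { struct sketch_list *next, *prev; };

/* Stand-in for the global fs_uuids list head (empty: points at itself). */
static struct sketch_list sketch_fs_uuids = { &sketch_fs_uuids, &sketch_fs_uuids };

struct sketch_fs_devices {
	struct sketch_list list;    /* must stay the first member */
	int fsid;
};

static void sketch_list_add(struct sketch_list *n, struct sketch_list *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void sketch_list_del(struct sketch_list *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/* Allocate a device set and register it on the global list. */
static struct sketch_fs_devices *sketch_open(int fsid)
{
	struct sketch_fs_devices *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->fsid = fsid;
	sketch_list_add(&d->list, &sketch_fs_uuids);
	return d;
}

/* Unlink first, then free: no stale pointer is left on the list. */
static void sketch_close(struct sketch_fs_devices *d)
{
	sketch_list_del(&d->list);
	free(d);
}

/* Walk the global list, as a later scan would. */
static int sketch_count(void)
{
	struct sketch_list *p;
	int n = 0;

	for (p = sketch_fs_uuids.next; p != &sketch_fs_uuids; p = p->next)
		n++;
	return n;
}
```

Closing and reopening, as btrfs-convert does, then walks a list that
only contains live entries.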
Signed-off-by: Adam Buchbinder <abuchbinder@google.com>
Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When running with UndefinedBehaviorSanitizer, the tests produce the following
error:
radix-tree.c:836:30: runtime error: shift exponent 18446744073709551613
is too large for 64-bit type 'unsigned long'
(That's a negative shift exponent represented as an unsigned long.)
Even though the value is discarded in those cases, it's still undefined
behavior; see the C99 standard, section 6.5.7, paragraph three: "If the
value of the right operand is negative [...] the behavior is undefined."
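A generic way to avoid this class of undefined behavior is to check the
shift count before shifting. The helper below is only a sketch with a
hypothetical name, and returning 0 for out-of-range counts is a choice
made for the example; it is not the change made to radix-tree.c.

```c
#include <limits.h>

/*
 * Shift 'value' right by 'shift' bits. A negative count, or one that is
 * at least the width of the type, would be undefined behavior in C, so
 * those cases are handled explicitly instead of being evaluated.
 */
static unsigned long sketch_safe_shr(unsigned long value, long shift)
{
	if (shift < 0 || shift >= (long)(sizeof(value) * CHAR_BIT))
		return 0;
	return value >> shift;
}
```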
Signed-off-by: Adam Buchbinder <abuchbinder@google.com>
Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>