Previously if restore could not read users specified fs root, it would
output following message:
Error reading root
With this patch, it will output message like:
Fail to read root 1000: No such file or directory
Signed-off-byr Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Steps to reproduce:
# mkfs.btrfs -f /dev/sda9
# mount /dev/sda9 /mnt
# dd if=/dev/zero of=/mnt/data bs=1M count=1
# btrfs restore -r /dev/sda9 -r 2 -o /tmp
If users don't input a valid fs/file root objectid, btrfs restore still
continue and don't restore anything, this is unfriendly, we could
check it firstly.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When things go wrong for lzo-compressed btrfs, feeding lzo1x_decompress_safe()
with corrupt data during restore can lead to crashes. Reduce the risk by adding
a check on the input length.
Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net>
Signed-off-by: David Sterba <dsterba@suse.cz>
It's 32 bits as defined in ctree.h, but the struct had it as 64 bits.
Found using MemorySanitizer.
Signed-off-by: Adam Buchbinder <abuchbinder@google.com>
Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
On my system, this brings the FS conversion test suite's runtime from over
ten seconds down to under two.
Thanks to Julien Muchembled for the suggestion.
Signed-off-by: Adam Buchbinder <abuchbinder@google.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When we write a btrfs to full and then we have no space left for
free space cache.
The btrfs check will output msg as follows which is noise indeed:
# free space inode generation (0) did not match
free space cache generation (XXX)
When the free space cache is not written out normally,
the free inode generation will be 0.
In this condition, no noise should be outputed.
Also, check 0-sized inode eariler together with 0-generationed inode.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When using parse_size(), even non-numeric value is passed, it will only
give error message "ERROR: size value is empty", which is quite
confusing for end users.
This patch will introduce more meaningful error message for the
following new cases
1) Invalid size string (non-numeric string)
2) Minus size value (like "-1K")
Also this patch will take full use of endptr returned by strtoll() to
reduce unneeded loop.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
This depends on the kernel patch:
[PATCH] btrfs:replace EINVAL with EOPNOTSUPP for dev_replace
This catches the EOPNOTSUPP and output msg that says dev_replace raid56
is not currently supported. Note that the msg will only be shown when
run dev_replace not in background.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When run chunk-recover on a health btrfs(data profile raid0, with
plenty of data), the program has a chance to abort on the number
of mirrors of an extent.
According to the kernel code, the max mirror number of an extent
is 3 not 2:
ctree.h: BTRFS_MAX_MIRRORS 3
chunk-recover.c : BTRFS_NUM_MIRRORS 2
just change BTRFS_NUM_MIRRORS to 3, and everything goes well.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When deal with the p & q stripes for data profile raid6, chunk-recover
forgets to insert them into the chunk record. Just insert them back
freely.
Also wrap the insert procedure into a new function, fill_chunk_up.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
These use the system's mke2fs, and don't require loop devices
or root privileges.
They don't pick up anything with the default flags right now,
but they do pick up some sanitizer issues when the tools are
compiled with any of -fsanitize={address,memory,thread}.
Signed-off-by: Adam Buchbinder <abuchbinder@google.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Current btrfs-debug-tree output chunk/block group type as numbers,
which makes it hard to understand and need to check the source to
understand the meaning.
This patch will convert numeric type output to human readable strings.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Current btrfs-debug-tree outputs extent flags as numbers,
which makes it hard to understand and need to check the source to
understand the meaning.
This patch will convert numberic flags output to human readable strings.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
btrfs resize now support size unit parse of k/m/g/t/p/e in kernel space,
adopt the changes in userspace manpage.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When a struct btrfs_fs_devices was being torn down by
btrfs_close_devices(), there was an invalidated pointer in the global
list fs_uuids which still pointed to it; if a device was closed and
then reopened (which btrfs-convert does), freed memory would be
accessed.
This was found using ThreadSanitizer (pretty much doing what
AddressSanitizer would, but not exiting after the first failure).
To reproduce, build with -fsanitize=thread and run 'make test'.
Representative output is below.
This change makes the current tests TSan-clean.
WARNING: ThreadSanitizer: heap-use-after-free (pid=29161)
Read of size 8 at 0x7d180000eee0 by main thread:
#0 memcmp ??:0
#1 find_fsid .../volumes.c:81
#2 device_list_add .../volumes.c:95
#3 btrfs_scan_one_device .../volumes.c:259
#4 btrfs_scan_fs_devices .../disk-io.c:1002
#5 __open_ctree_fd .../disk-io.c:1090
#6 open_ctree_fd .../disk-io.c:1191
#7 do_convert .../btrfs-convert.c:2317
#8 main .../btrfs-convert.c:2745
Previous write of size 8 at 0x7d180000eee0 by main thread:
#0 free ??:0
#1 btrfs_close_devices .../volumes.c:191
#2 close_ctree .../disk-io.c:1401
#3 do_convert .../btrfs-convert.c:2300
#4 main .../btrfs-convert.c:2745
Location is heap block of size 96 at 0x7d180000eee0 allocated by main thread:
#0 calloc ??:0 (exe+0x00000002acc6)
#1 device_list_add .../volumes.c:97
#2 btrfs_scan_one_device .../volumes.c:259
#3 btrfs_scan_fs_devices .../disk-io.c:1002
#4 __open_ctree_fd .../disk-io.c:1090
#5 open_ctree_fd .../disk-io.c:1191
#6 do_convert .../btrfs-convert.c:2256
#7 main .../btrfs-convert.c:2745
Signed-off-by: Adam Buchbinder <abuchbinder@google.com>
Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When running with UndefinedBehaviorSanitizer, the tests produce the following
error:
radix-tree.c:836:30: runtime error: shift exponent 18446744073709551613
is too large for 64-bit type 'unsigned long'
(That's a negative shift exponent represented as an unsigned long.)
Even though the value is discarded in those cases, it's still undefined
behavior; see the C99 standard, section 6.5.7, paragraph three: "If the
value of the right operand is negative [...] the behavior is undefined."
Signed-off-by: Adam Buchbinder <abuchbinder@google.com>
Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
This is a straight cut and paste from the util-linux
mount manpage into btrfs-mount.5
It's pretty much impossible for util-linux to keep up
with every filesystem out there, and Karel has more than
once expressed a wish that mount options move into fs-specific
manpages.
So, here we go.
The way btrfs asciidoc is generated, there's not a trivial
way to have both btrfs(5) and btrfs(8) so I named it btrfs-mount(5)
for now. A bit ick and I'm open to suggestions.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
The 'num_unordered' will be recounted after 'goto out',
just remove it.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
mount(8) will canonicalize pathnames before passing them to the kernel.
Links to e.g. /dev/sda will be resolved to /dev/sda. Links to /dev/dm-#
will be resolved using the name of the device mapper table to
/dev/mapper/<name>.
Btrfs will use whatever name the user passes to it, regardless of whether
it is canonical or not. That means that if a 'btrfs device ready' is
issued on any device node pointing to the original device, it will adopt
the new name instead of the name that was used during mount.
Mounting using /dev/sdb2 will result in df:
/dev/sdb2 209715200 39328 207577088 1% /mnt
lrwxrwxrwx 1 root root 4 Jun 4 13:36 /dev/whatever-i-like -> sdb2
/dev/whatever-i-like 209715200 39328 207577088 1% /mnt
Likewise, mounting with /dev/mapper/whatever and using /dev/dm-0 with a
btrfs device command results in df showing /dev/dm-0. This can happen with
multipath devices with friendly names enabled and doing something like
'partprobe' which (at least with our version) ends up issuing a 'change'
uevent on the sysfs node. That *always* uses the dm-# name, and we get
confused users.
This patch does the same canonicalization of the paths that mount does
so that we don't end up having inconsistent names reported by ->show_devices
later.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
[use PATH_MAX in canonicalize_dm_name]
Signed-off-by: David Sterba <dsterba@suse.cz>
This is the most important patch ever. ;)
I found this to be less aesthetically pleasing than it was before:
[CC] btrfstune.o
Making all in Documentation
ASCIIDOC btrfs-convert.xml
[LD] btrfstune
XMLTO btrfs-convert.8
[CC] btrfs-show-super.o
GZIP btrfs-convert.8.gz
[LD] btrfs-show-super
ASCIIDOC btrfs-debug-tree.xml
XMLTO btrfs-debug-tree.8
so I shortened the pretty-text to match what we had before.
Also, make clean "quiet" like it is in the top dir.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
[Changed to ASCII and XMLTO]
Signed-off-by: David Sterba <dsterba@suse.cz>
Man page for 'btrfs-balance' mentioned <filters> but does not explain
them, which make end users hard to use '-d', '-m' or '-s options.
This patch will use the explanations from
https://btrfs.wiki.kernel.org/index.php/Balance_Filters to enrich the
man page.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Add '-f' option for btrfs-show-super manpage,
This option implies that sys chunk array and backup roots info
will show up.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
This patch adds an option '--check-data-csum' to verify data checksums.
fsck won't check data csums unless users specify this option explictly.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When encountering a corrupted fs root node, fsck hit following message:
Check tree block failed, want=29360128, have=0
Check tree block failed, want=29360128, have=0
Check tree block failed, want=29360128, have=0
Check tree block failed, want=29360128, have=0
Check tree block failed, want=29360128, have=0
read block failed check_tree_block
Checking filesystem on /dev/sda9
UUID: 0d295d80-bae2-45f2-a106-120dbfd0e173
checking extents
Segmentation fault (core dumped)
This is because in btrfs_setup_all_roots(), we check
btrfs_read_fs_root() return value by verifing whether it is
NULL pointer, this is wrong since btrfs_read_fs_root() return
PTR_ERR(ret), fix it.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
If the csum of one stripe is not able to judge the order of two
device extents, the stripe may happen to belong to the device extent
that is already kicked out as ordered.
Take this condition into consideration, don't report failure and
give more tries with the stripes following.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When count the number of unordered device extents in chunk-recover,
the counter should be reinitialized to be used.
Also, introduce a new function for the counting job.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
This patch adds functionality (in qgroup-verify.c) to compute bytecounts in
subvolume quota groups. The original groups are read in and stored in memory
so that after we compute our own bytecounts, we can compare them with those
on disk. A print function is provided to do this comparison and show the
results on the console.
A 'qgroup check' pass is added to btrfsck. If any subvolume quota groups
differ from what we compute, the differences for them are printed. We also
provide an option '--qgroup-report' which will run only the quota check code
and print a report on all quota groups. Other than making it possible to
verify that our qgroup changes work correctly, this mode can also be used in
xfstests for automated checking after qgroup tests.
This patch does not address the following:
- compressed counts are identical to non compressed, because kernel doesn't
make the distinction yet. Adding the code to verify compressed counts
shouldn't be hard at all though once kernel can do this.
- It is only concerned with subvolume quota groups (like most of
btrfs-progs).
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
Due to either bugs in send (kernel) that generate a command against
a wrong path for example, or transient errors on the receiving side,
we stopped processing the send stream immediately and exited with
an error.
It's often desirable to continue processing the send stream even if an
error happens while processing a single command from the send stream.
This change just adds a --max-errors <N> parameter, whose default value
is 1 (preserving current behaviour), that allows to tolerate N errors
before stopping. A value of 0 means to never stop no matter how many
errors we get into while processing the send stream. Regardless of its
value, errors are always printed to stderr when they happen, just like
before this change.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
A mdrestore_struct was being written to without its mutex being held.
This race was found with ThreadSanitizer; the relevant part of the report
looks like this:
WARNING: ThreadSanitizer: data race (pid=18828)
Write of size 8 at 0x7fffffc3d088 by main thread:
#0 build_chunk_tree .../btrfs-progs/btrfs-image.c:2233
#1 __restore_metadump .../btrfs-progs/btrfs-image.c:2294
#2 restore_metadump .../btrfs-progs/btrfs-image.c:2345
#3 main .../btrfs-progs/btrfs-image.c:2545
Previous read of size 8 at 0x7fffffc3d088 by thread T1 (mutexes: write M0):
#0 restore_worker .../btrfs-progs/btrfs-image.c:1636
Location is stack of main thread.
Mutex M0 created at:
#0 pthread_mutex_init ??:0
#1 mdrestore_init .../btrfs-progs/btrfs-image.c:1766
#2 __restore_metadump .../btrfs-progs/btrfs-image.c:2286
#3 restore_metadump .../btrfs-progs/btrfs-image.c:2345
#4 main .../btrfs-progs/btrfs-image.c:2545
Thread T1 (tid=18830, running) created by main thread at:
#0 pthread_create ??:0
#1 mdrestore_init .../btrfs-progs/btrfs-image.c:1784
#2 __restore_metadump .../btrfs-progs/btrfs-image.c:2286
#3 restore_metadump .../btrfs-progs/btrfs-image.c:2345
#4 main .../btrfs-progs/btrfs-image.c:2545
Signed-off-by: Adam Buchbinder <abuchbinder@google.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Allow the specification of the filesystem UUID at mkfs time.
Non-unique unique IDs are rejected. This includes attempting
to re-mkfs with the same UUID; if you really want to do that,
you can mkfs with a new UUID, then re-mkfs with the one you
wanted.
(Implemented only for mkfs.btrfs, not btrfs-convert).
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
[converted help to asciidoc]
Signed-off-by: David Sterba <dsterba@suse.cz>
gcc 4.9.0 gives a warning: format ‘%d’ expects argument of type ‘int’,
but argument 2 has type ‘u64’
Using %llu and casting to unsigned long long (same as bytenr) fixes this.
Signed-off-by: Christian Hesse <mail@eworm.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
When I made all the btrfs-foo.c targets generic, I somehow
managed to break the libs definition for btrfs-fragments by
dropping the "s" off the end.
Fix that, although apparently nobody is building this tool. :)
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
The original print_dev_item() only prints device id,total bytes and
bytes used.
When it comes to debug duplicated device id, dev uuid
is needed to distinguish different devices since device id is not
reliable.
(Although the current dev replace implement will reuse the dev uuid,
so not really helpful)
This patch adds dev uuid output for print_dev_item().
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
The btrfstune -S option accepts a positive value to enable seeding,
and a zero to disable seeding, negtive is not allowed.
Add "positive, zero, negative" sentences to btrfstune manpage.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Fedora had a bug where a poor user thought that --alloc-start
meant that the filesystem would be created at that offset into
the device, rather than just starting allocations at that offset.
A subtle difference, but worth clarifying, because the manpage
is misleading on this point.
The original commit log for this option says:
Add mkfs.btrfs -A offset to control allocation start on devices
This is a utility option for the resizer, it makes sure to allocate
at offset bytes in the disk or higher. It ensures the resizer will have
something to move when testing it.
so allude to that intended use in the manpage.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
[converted to asciidoc]
Signed-off-by: David Sterba <dsterba@suse.cz>
The manpage of btrfsck.8 is supposed to link to btrfs-check.8 .
Reported-by: WorMzy Tykashi <wormzy.tykashi@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
man pages for btrfs-progs are compressed by gzip by default. In Makefile
the variable GZIP is use, this evaluates to 'gzip gzip' on my system.
>From man gzip:
> The environment variable GZIP can hold a set of default options for
> gzip. These options are interpreted first and can be overwritten by
> explicit command line parameters.
So using any other variable name fixes this.
Signed-off-by: Christian Hesse <mail@eworm.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
Very often while debugging filesystems with many subvolumes and/or
snapshots, specially when they are large, I want to see only the
content of one of the trees. So this change just adds an option
to btrfs-debug-tree to allow to specify the id of the tree we're
interesting in dumping to stdout.
Example: btrfs-debug-tree -t 257 /dev/sdc
Will only dump the tree of the first snapshot or subvolume that was
created.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
If we point btrfs-show-super at a not-btrfs-device and
try to print all superblocks, bad things are apt to happen:
superblock: bytenr=274877906944, device=/dev/sdc2
---------------------------------------------------------
btrfs-show-super: ctree.h:1984: btrfs_super_csum_size: \
Assertion `!(t >= (sizeof(btrfs_csum_sizes) / sizeof((btrfs_csum_sizes)[0])))' failed.
csum 0xAborted
Don't try to print superblocks that don't look like superblocks,
and add an "-f" (force) option to try anyway, if the user
really wants to give it a shot.
Fix some spelling & capitalization while we're at it.
The manpage says that if any problem happens, 1 will
be returned, but that's already not true today LOL, so
I didn't bother to make it true when we detect bad
sb magic, either...
I figure it's worth continuing and trying all superblocks
in case just one has a corrupt magic.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
[renamed -f to -F due to clash with existing option, converted
relevant docs to asciidoc]
Signed-off-by: David Sterba <dsterba@suse.cz>
The btrfs-rescue accepts exactly one arg for both
chunk-recover & super-recover, use check_argc_exact clearly.
Signed-off-by: David Sterba <dsterba@suse.cz>
Add sys chunk array and backup roots info if the new option '-f'
if specified.
This may be useful for debugging sys_chunk related issues.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
find_good_parent() uses assert to deal with the problem that clone
source's parent can't be found.
But in fact the assert is somewhat overkilled since subvol_uuid_search()
has enough error messages for debug and caller of find_good_parent() can
handle the problems in find_good_parent(), so the assert can be removed
without any problem.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When 'btrfs replace status' encounters an unknown dev replace status, it
will cause an assert, which is somewhat overkilled and can be replaced
with a normal error message.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
open_path_or_dev_mnt() is used to on *mounted* btrfs device or mount
point, when a unmounted btrfs device is passed, errno is set to EINVAL to
info the caller.
If ignore the errno and just print "ERROR: can't access '%s'", end users
will get confused.
This patch will add check for open_path_or_dev_mnt() caller and print
more meaningful error message when a unmounted btrfs device path is
given.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>