Steps to reproduce:
# mkfs.btrfs -f /dev/sda9 -b 2g
# mount /dev/sda9 /mnt
# dd if=/dev/zero of=/mnt/data bs=4k oflag=direct
# btrfs file df /mnt
Data, single: total=1.66GiB, used=1.66GiB
System, single: total=4.00MiB, used=16.00KiB
Metadata, single: total=200.00MiB, used=67.88MiB
For a filesystem without snapshots, 70M metadata, extent
checking eats max memory about 110M, this is a nightmare
for some system with low memory.
It is very likely that extent record can be freed quickly
for a filesystem without snapshots, improve this by trying
if it can free memory after adding data/tree backrefs.
This patch reduces max memory cost from 110M to 40M for
extents checking for the above case.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Steps to reproduce:
# mkfs.btrfs -f <dev1>
# btrfs-image <dev1> <image_file>
# btrfs-image -r -o <image_file> <dev2>
# btrfs check <dev2>
btrfs check output:
: read block failed check_tree_block
: Couldn't read tree root
: Couldn't open file system
The btrfs-image should not mess with the chunk tree under the old_restore way.
The new restore way was introduced by:
commit d6f7e3da0d
Btrfs-progs: make btrfs-image restore with a valid chunk tree V2
...
And the following commit enhanced the new restore on the valid chunk tree
building stuff:
commit ef2a8889ef
Btrfs-progs: make image restore with the original device offsets
...
But the second commit should not effect the old_restore way since the
old_restore way doesn't try to build a valid chunk tree at all.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
I found the following patch is insufficient.
===============================================================================
commit 6e6b32ddf58db54f714d0f263c2589f4859e8b5e
Author: Adam Buchbinder <abuchbinder@google.com>
Date: Fri Jun 13 16:43:56 2014 -0700
btrfs-progs: Fix a use-after-free in the volumes code.
===============================================================================
"btrfs filesystem show <dev>" with this patch causes segmentation fault
if "<dev>" is a not-mounted Btrfs filesystem.
===============================================================================
Label: none uuid: <cut here>
Total devices 1 FS bytes used 112.00KiB
devid 1 size 59.12GiB used 2.04GiB path /dev/sdd1
Segmentation fault (core dumped)
===============================================================================
It's due to double-free of fs_devices->list as follows.
===============================================================================
cmd_show
-> list_del(&fs_devices->list) # 1st one.
-> btrfs_close_devices(fs_devices)
-> list_del(&fs_devices->list) # <- 2nd one introduced at 6e6b32dd.
Double-free happens here.
===============================================================================
First list_del() can safely be removed because fs_devices->list will be
deleted by second one, soon.
Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Cc: Adam Buchbinder <abuchbinder@google.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
If the malloc above fails, the btrfs-image will exit directly
without any error messages.
Now just return the ENOMEM errno and let the caller prompt the
error message.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Handle the malloc failure for dump_worker in the same way as
the restore worker.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
The chunk-recover.c/BTRFS_NUM_MIRRORS in the userspace means
the same thing as ctree.h/BTRFS_MAX_MIRRORS in the kernelspace,
so to stay consistent with the kernelspace, just make this movement
in the userspace:
chunk-recover.c/BTRFS_NUM_MIRRORS
===>
ctree.h/BTRFS_MAX_MIRRORS
This provides convenience for future use.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
The chattr(1) manpage suffers from the same problems mount(1)
had: many options listed, not kept up to date for various
filesystems.
I've submitted a manpage update for chattr(1) which says to
refer to filesystem-specific manpages for supported attributes;
this patch updates btrfs(5) to list the attributes supported
by btrfs.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
[added some asciidoc markups, adjusted formatting]
Signed-off-by: David Sterba <dsterba@suse.cz>
Btrfs has global block reservation, so even mkfs.btrfs can execute
without problem, there is still a possibility that the filesystem can't
be mounted.
For example when mkfs.btrfs on a 8M file on x86_64 platform, kernel will
refuse to mount due to ENOSPC, since system block group takes 4M and
mixed block group takes 4M, and global block reservation will takes all
the 4M from mixed block group, which makes btrfs unable to create uuid
tree.
This patch will add minimum device size check before actually mkfs.
The minimum size calculation uses a simplified one:
minimum_size_for_each_dev = 2 * (system block group + global block rsv)
and global block rsv = leafsize << 10
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
There's no reason to assume that the bad key order is in a leaf block,
so accessing level 0 of the path is going to be an error if it's actually
a node block that's bad.
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Hugo Mills <hugo@carfax.org.uk>
Signed-off-by: David Sterba <dsterba@suse.cz>
We need test to verify extent tree rebuilding work, this test
create a strange filesystem with some snapshots, destroy
extent root node, and run fsck with "--init-extent-tree".
Since this tests need btrfs internal tool(btrfs-corrupt-block),so
i add this test into btrfs-progs.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When btrfs-image failed to create an image, the invalid output file
had better be deleted to prevent being used mistakenly in the future.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
For btrfs-image,
dump may not come with option '-o'
-r may not come with option '-c', '-s', '-w', dev_cnt != 1
-m may not come with dev_cnt < 2
All of the above should be regarded as invalid combinations,
and the usage will show up.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
The btrfs-image support multiple devices with -m specified.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
The value of variable leaf in while loop don't have to be set
for every round. Just move it outside.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Add some missing options, also improve some confusing
expressions.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
These two options are used for same purpose, but they are exclusive with
each other. Make it clear to common users.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Previously if restore could not read users specified fs root, it would
output following message:
Error reading root
With this patch, it will output message like:
Fail to read root 1000: No such file or directory
Signed-off-byr Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Steps to reproduce:
# mkfs.btrfs -f /dev/sda9
# mount /dev/sda9 /mnt
# dd if=/dev/zero of=/mnt/data bs=1M count=1
# btrfs restore -r /dev/sda9 -r 2 -o /tmp
If users don't input a valid fs/file root objectid, btrfs restore still
continue and don't restore anything, this is unfriendly, we could
check it firstly.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When things go wrong for lzo-compressed btrfs, feeding lzo1x_decompress_safe()
with corrupt data during restore can lead to crashes. Reduce the risk by adding
a check on the input length.
Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net>
Signed-off-by: David Sterba <dsterba@suse.cz>
It's 32 bits as defined in ctree.h, but the struct had it as 64 bits.
Found using MemorySanitizer.
Signed-off-by: Adam Buchbinder <abuchbinder@google.com>
Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
On my system, this brings the FS conversion test suite's runtime from over
ten seconds down to under two.
Thanks to Julien Muchembled for the suggestion.
Signed-off-by: Adam Buchbinder <abuchbinder@google.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When we write a btrfs to full and then we have no space left for
free space cache.
The btrfs check will output msg as follows which is noise indeed:
# free space inode generation (0) did not match
free space cache generation (XXX)
When the free space cache is not written out normally,
the free inode generation will be 0.
In this condition, no noise should be outputed.
Also, check 0-sized inode eariler together with 0-generationed inode.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When using parse_size(), even non-numeric value is passed, it will only
give error message "ERROR: size value is empty", which is quite
confusing for end users.
This patch will introduce more meaningful error message for the
following new cases
1) Invalid size string (non-numeric string)
2) Minus size value (like "-1K")
Also this patch will take full use of endptr returned by strtoll() to
reduce unneeded loop.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
This depends on the kernel patch:
[PATCH] btrfs:replace EINVAL with EOPNOTSUPP for dev_replace
This catches the EOPNOTSUPP and output msg that says dev_replace raid56
is not currently supported. Note that the msg will only be shown when
run dev_replace not in background.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When run chunk-recover on a health btrfs(data profile raid0, with
plenty of data), the program has a chance to abort on the number
of mirrors of an extent.
According to the kernel code, the max mirror number of an extent
is 3 not 2:
ctree.h: BTRFS_MAX_MIRRORS 3
chunk-recover.c : BTRFS_NUM_MIRRORS 2
just change BTRFS_NUM_MIRRORS to 3, and everything goes well.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When deal with the p & q stripes for data profile raid6, chunk-recover
forgets to insert them into the chunk record. Just insert them back
freely.
Also wrap the insert procedure into a new function, fill_chunk_up.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
These use the system's mke2fs, and don't require loop devices
or root privileges.
They don't pick up anything with the default flags right now,
but they do pick up some sanitizer issues when the tools are
compiled with any of -fsanitize={address,memory,thread}.
Signed-off-by: Adam Buchbinder <abuchbinder@google.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Current btrfs-debug-tree output chunk/block group type as numbers,
which makes it hard to understand and need to check the source to
understand the meaning.
This patch will convert numeric type output to human readable strings.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Current btrfs-debug-tree outputs extent flags as numbers,
which makes it hard to understand and need to check the source to
understand the meaning.
This patch will convert numberic flags output to human readable strings.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
btrfs resize now support size unit parse of k/m/g/t/p/e in kernel space,
adopt the changes in userspace manpage.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When a struct btrfs_fs_devices was being torn down by
btrfs_close_devices(), there was an invalidated pointer in the global
list fs_uuids which still pointed to it; if a device was closed and
then reopened (which btrfs-convert does), freed memory would be
accessed.
This was found using ThreadSanitizer (pretty much doing what
AddressSanitizer would, but not exiting after the first failure).
To reproduce, build with -fsanitize=thread and run 'make test'.
Representative output is below.
This change makes the current tests TSan-clean.
WARNING: ThreadSanitizer: heap-use-after-free (pid=29161)
Read of size 8 at 0x7d180000eee0 by main thread:
#0 memcmp ??:0
#1 find_fsid .../volumes.c:81
#2 device_list_add .../volumes.c:95
#3 btrfs_scan_one_device .../volumes.c:259
#4 btrfs_scan_fs_devices .../disk-io.c:1002
#5 __open_ctree_fd .../disk-io.c:1090
#6 open_ctree_fd .../disk-io.c:1191
#7 do_convert .../btrfs-convert.c:2317
#8 main .../btrfs-convert.c:2745
Previous write of size 8 at 0x7d180000eee0 by main thread:
#0 free ??:0
#1 btrfs_close_devices .../volumes.c:191
#2 close_ctree .../disk-io.c:1401
#3 do_convert .../btrfs-convert.c:2300
#4 main .../btrfs-convert.c:2745
Location is heap block of size 96 at 0x7d180000eee0 allocated by main thread:
#0 calloc ??:0 (exe+0x00000002acc6)
#1 device_list_add .../volumes.c:97
#2 btrfs_scan_one_device .../volumes.c:259
#3 btrfs_scan_fs_devices .../disk-io.c:1002
#4 __open_ctree_fd .../disk-io.c:1090
#5 open_ctree_fd .../disk-io.c:1191
#6 do_convert .../btrfs-convert.c:2256
#7 main .../btrfs-convert.c:2745
Signed-off-by: Adam Buchbinder <abuchbinder@google.com>
Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When running with UndefinedBehaviorSanitizer, the tests produce the following
error:
radix-tree.c:836:30: runtime error: shift exponent 18446744073709551613
is too large for 64-bit type 'unsigned long'
(That's a negative shift exponent represented as an unsigned long.)
Even though the value is discarded in those cases, it's still undefined
behavior; see the C99 standard, section 6.5.7, paragraph three: "If the
value of the right operand is negative [...] the behavior is undefined."
Signed-off-by: Adam Buchbinder <abuchbinder@google.com>
Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
This is a straight cut and paste from the util-linux
mount manpage into btrfs-mount.5
It's pretty much impossible for util-linux to keep up
with every filesystem out there, and Karel has more than
once expressed a wish that mount options move into fs-specific
manpages.
So, here we go.
The way btrfs asciidoc is generated, there's not a trivial
way to have both btrfs(5) and btrfs(8) so I named it btrfs-mount(5)
for now. A bit ick and I'm open to suggestions.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
The 'num_unordered' will be recounted after 'goto out',
just remove it.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
mount(8) will canonicalize pathnames before passing them to the kernel.
Links to e.g. /dev/sda will be resolved to /dev/sda. Links to /dev/dm-#
will be resolved using the name of the device mapper table to
/dev/mapper/<name>.
Btrfs will use whatever name the user passes to it, regardless of whether
it is canonical or not. That means that if a 'btrfs device ready' is
issued on any device node pointing to the original device, it will adopt
the new name instead of the name that was used during mount.
Mounting using /dev/sdb2 will result in df:
/dev/sdb2 209715200 39328 207577088 1% /mnt
lrwxrwxrwx 1 root root 4 Jun 4 13:36 /dev/whatever-i-like -> sdb2
/dev/whatever-i-like 209715200 39328 207577088 1% /mnt
Likewise, mounting with /dev/mapper/whatever and using /dev/dm-0 with a
btrfs device command results in df showing /dev/dm-0. This can happen with
multipath devices with friendly names enabled and doing something like
'partprobe' which (at least with our version) ends up issuing a 'change'
uevent on the sysfs node. That *always* uses the dm-# name, and we get
confused users.
This patch does the same canonicalization of the paths that mount does
so that we don't end up having inconsistent names reported by ->show_devices
later.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
[use PATH_MAX in canonicalize_dm_name]
Signed-off-by: David Sterba <dsterba@suse.cz>
This is the most important patch ever. ;)
I found this to be less aesthetically pleasing than it was before:
[CC] btrfstune.o
Making all in Documentation
ASCIIDOC btrfs-convert.xml
[LD] btrfstune
XMLTO btrfs-convert.8
[CC] btrfs-show-super.o
GZIP btrfs-convert.8.gz
[LD] btrfs-show-super
ASCIIDOC btrfs-debug-tree.xml
XMLTO btrfs-debug-tree.8
so I shortened the pretty-text to match what we had before.
Also, make clean "quiet" like it is in the top dir.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
[Changed to ASCII and XMLTO]
Signed-off-by: David Sterba <dsterba@suse.cz>
Man page for 'btrfs-balance' mentioned <filters> but does not explain
them, which make end users hard to use '-d', '-m' or '-s options.
This patch will use the explanations from
https://btrfs.wiki.kernel.org/index.php/Balance_Filters to enrich the
man page.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Add '-f' option for btrfs-show-super manpage,
This option implies that sys chunk array and backup roots info
will show up.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
This patch adds an option '--check-data-csum' to verify data checksums.
fsck won't check data csums unless users specify this option explictly.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When encountering a corrupted fs root node, fsck hit following message:
Check tree block failed, want=29360128, have=0
Check tree block failed, want=29360128, have=0
Check tree block failed, want=29360128, have=0
Check tree block failed, want=29360128, have=0
Check tree block failed, want=29360128, have=0
read block failed check_tree_block
Checking filesystem on /dev/sda9
UUID: 0d295d80-bae2-45f2-a106-120dbfd0e173
checking extents
Segmentation fault (core dumped)
This is because in btrfs_setup_all_roots(), we check
btrfs_read_fs_root() return value by verifing whether it is
NULL pointer, this is wrong since btrfs_read_fs_root() return
PTR_ERR(ret), fix it.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
If the csum of one stripe is not able to judge the order of two
device extents, the stripe may happen to belong to the device extent
that is already kicked out as ordered.
Take this condition into consideration, don't report failure and
give more tries with the stripes following.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When count the number of unordered device extents in chunk-recover,
the counter should be reinitialized to be used.
Also, introduce a new function for the counting job.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
This patch adds functionality (in qgroup-verify.c) to compute bytecounts in
subvolume quota groups. The original groups are read in and stored in memory
so that after we compute our own bytecounts, we can compare them with those
on disk. A print function is provided to do this comparison and show the
results on the console.
A 'qgroup check' pass is added to btrfsck. If any subvolume quota groups
differ from what we compute, the differences for them are printed. We also
provide an option '--qgroup-report' which will run only the quota check code
and print a report on all quota groups. Other than making it possible to
verify that our qgroup changes work correctly, this mode can also be used in
xfstests for automated checking after qgroup tests.
This patch does not address the following:
- compressed counts are identical to non compressed, because kernel doesn't
make the distinction yet. Adding the code to verify compressed counts
shouldn't be hard at all though once kernel can do this.
- It is only concerned with subvolume quota groups (like most of
btrfs-progs).
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
Due to either bugs in send (kernel) that generate a command against
a wrong path for example, or transient errors on the receiving side,
we stopped processing the send stream immediately and exited with
an error.
It's often desirable to continue processing the send stream even if an
error happens while processing a single command from the send stream.
This change just adds a --max-errors <N> parameter, whose default value
is 1 (preserving current behaviour), that allows to tolerate N errors
before stopping. A value of 0 means to never stop no matter how many
errors we get into while processing the send stream. Regardless of its
value, errors are always printed to stderr when they happen, just like
before this change.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>