The table has been updated, copy the changes so that we can utilize it
for cleanups.
Note, ncopies for raid5 and rai6 was wrong and is now correct.
Signed-off-by: David Sterba <dsterba@suse.com>
Mkfs tends to create pretty large metadata chunk compared to kernel:
Node size: 16384
Sector size: 4096
Filesystem size: 10.00GiB
Block group profiles:
Data: single 8.00MiB
Metadata: DUP 1.00GiB
System: DUP 8.00MiB
While kernel only tends to create 256MiB metadata chunk:
/* for larger filesystems, use larger metadata chunks */
if (fs_devices->total_rw_bytes > 50ULL * SZ_1G)
max_stripe_size = SZ_1G;
else
max_stripe_size = SZ_256M;
This won't cause problems in real world, but it's still better to make
the behavior unified.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Unlike kernel, btrfs-progs doesn't (yet) support devices grow/shrink,
the port only needs to handle open_ctree() time initialization (at
read_one_dev()), and btrfs_add_device() used for mkfs.
This provide the basis for incoming unification of chunk allocator
behavior.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add support for a new metadata_uuid field. This is just a preparatory
commit which switches all users of the fsid field for metdata comparison
purposes to utilize the new field. This more or less mirrors the
kernel patch, additionally:
* Update 'btrfs inspect-internal dump-super' to account for the new
field. This involes introducing the 'metadata_uuid' line to the
output and updating the logic for comparing the fs uuid to the
dev_item uuid.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have btrfs_alloc_dev_extent() accepting @convert flag to toggle
special handling for convert.
However the @convert flag only determines whether we call
find_free_dev_extent(), and we may later need to insert dev extents
without searching dev tree.
So refactor btrfs_alloc_dev_extent() into 2 functions,
- btrfs_alloc_dev_extent(), which will try to find free dev extent, and
- btrfs_insert_dev_extent(), which will just inserts a dev extent
For implementation, btrfs_alloc_dev_extent() will call
btrfs_insert_dev_extent() anyway, so there's no duplicated code.
This removes the need of @convert parameter, and make
btrfs_insert_dev_extent() public for later usage.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Similar to the changes where strerror(errno) was converted, continue
with the remaining cases where the argument was stored in another
variable.
The savings in object size are about 4500 bytes:
$ size btrfs.old btrfs.new
text data bss dec hex filename
805055 24248 19748 849051 cf49b btrfs.old
804527 24248 19748 848523 cf28b btrfs.new
Signed-off-by: David Sterba <dsterba@suse.com>
Another BUG_ON() during fuzz/003:
====== RUN MAYFAIL btrfs check --repair tests/fuzz-tests/images/bko-199833-reloc-recovery-crash.raw.restored
[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
ctree.c:1650: leaf_space_used: Warning: assertion `data_len < 0` failed, value 1
bad key ordering 18 19
bad block 29409280
ERROR: errors found in extent allocation tree or chunk allocation
WARNING: minor unaligned/mismatch device size detected
WARNING: recommended to use 'btrfs rescue fix-device-size' to fix it
[3/7] checking free space cache
[4/7] checking fs roots
ctree.c:1650: leaf_space_used: Warning: assertion `data_len < 0` failed, value 1
bad key ordering 18 19
root 18446744073709551608 missing its root dir, recreating
Unable to find block group for 0
Unable to find block group for 0
Unable to find block group for 0
volumes.c:564: btrfs_alloc_dev_extent: BUG_ON `ret` triggered, value -28
failed (ignored, ret=134): btrfs check --repair tests/fuzz-tests/images/bko-199833-reloc-recovery-crash.raw.restored
mayfail: returned code 134 (SIGABRT), not ignored
test failed for case 003-multi-check-unmounted
However the culprit function btrfs_alloc_dev_extent() has proper error
handling label err:, just using that label would solve the problem easily.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Prevent unnecessary error from failing fsync(), if opened read only.
Performed 'grep "writeable = " *.h *.c' to make sure there were no odd
situations where fsync() might still be desired here. They're all straight-
forward. The only situation where writeable will be 0 is if btrfs_open_devices
is given flags without O_RDWR. There is no situation where a writeable volume
temporarily becomes unwriteable, or anything like that. Given that it's being
opened O_RDWR, there's no reason to attempt fsync().
utils.c
int btrfs_add_to_fsid() {
...
device->writeable = 1;
volumes.c
int btrfs_close_devices() {
...
while (!list_empty(&fs_devices->devices)) {
...
// just after the fsync() being patched
267: device->writeable = 0;
...
int btrfs_open_devices() {
...
list_for_each_entry(device, &fs_devices->devices, dev_list) {
...
if (flags & O_RDWR)
332: device->writeable = 1
kernel btrfs_close_devices() does not have a corresponding fsync() that I see.
Signed-off-by: James Harvey <jamespharvey20@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
BUG_ON() can be triggered if some image contains overlappin chunks.
volumes.c:1930: read_one_chunk: BUG_ON `ret` triggered, value -17
btrfs(+0x2cf12)[0x5601efa17f12]
btrfs(+0x2fd8b)[0x5601efa1ad8b]
btrfs(btrfs_read_chunk_tree+0x2bf)[0x5601efa1b30f]
btrfs(btrfs_setup_chunk_tree_and_device_map+0xe8)[0x5601efa07718]
btrfs(+0x1c944)[0x5601efa07944]
btrfs(open_ctree_fs_info+0x90)[0x5601efa07b90]
btrfs(cmd_check+0x4d7)[0x5601efa4f8c7]
btrfs(main+0x88)[0x5601ef9fd768]
/usr/lib/libc.so.6(__libc_start_main+0xeb)[0x7f3c7787306b]
btrfs(_start+0x2a)[0x5601ef9fd88a]
Extent cache code can already detect it without problems, we only need
to remove the BUG_ON() and exit gracefully.
Reported-by: Xu Wen <wen.xu@gatech.edu>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200409
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Don't panic for btrfs_read_chunk_tree() if one device or chunk is
corrupted.
Caller can already handle it pretty well.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199839
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Presently btrfs-progs haven't pulled the enum defining the symbolic
names of read ahead constants. This commit adds the enum and
simultaneously converts all usages to respective symbolic name.
No functional change, just making the code human readable.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This parameter was introduced with the original implementation of the
function but has never really been used, so just remove it.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Make find_device to be consistent with kernel according
35c70103a528 ("btrfs: refactor find_device helper")
And, modify the compare condition from both devid and uuid to
devid or devid and uuid according
8f18cf13396c ("Btrfs: Make the resizer work based on shrinking and growing devices")
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Just as kernel find_free_dev_extent(), allow it to return maximum hole
size for us to build device list for later chunk allocator rework.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
As part of the effort to unify code and behavior between btrfs-progs and
kernel, copy the btrfs_raid_array from kernel to btrfs-progs.
So later we can use the btrfs_raid_array[] to get needed raid info other
than manually do if-else branches.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Do a cleanup. Also make it consistent with kernel. Use fs_info instead
of root for BTRFS_LEAF_DATA_SIZE, since maybe in some situation we do
not know root, but just know fs_info.
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
As btrfs is specific to Linux, %m can be used instead of strerror(errno)
in format strings. This has some size reduction benefits for embedded
systems.
glibc, musl, and uclibc-ng all support %m as a modifier to printf.
A quick glance at the BIONIC libc source indicates that it has
support for %m as well. BSDs and Windows do not but I do believe
them to be beyond the scope of btrfs-progs.
Compiled sizes on Ubuntu 16.04:
Before:
3916512 btrfs
233688 libbtrfs.so.0.1
4899 bcp
2367672 btrfs-convert
2208488 btrfs-corrupt-block
13302 btrfs-debugfs
2152160 btrfs-debug-tree
2136024 btrfs-find-root
2287592 btrfs-image
2144600 btrfs-map-logical
2130760 btrfs-select-super
2152608 btrfstune
2131760 btrfs-zero-log
2277752 mkfs.btrfs
9166 show-blocks
After:
3908744 btrfs
233256 libbtrfs.so.0.1
4899 bcp
2366560 btrfs-convert
2207432 btrfs-corrupt-block
13302 btrfs-debugfs
2151104 btrfs-debug-tree
2134968 btrfs-find-root
2281864 btrfs-image
2143536 btrfs-map-logical
2129704 btrfs-select-super
2151552 btrfstune
2130696 btrfs-zero-log
2276272 mkfs.btrfs
9166 show-blocks
Total savings: 23928 (24 kilo)bytes
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
@chunk_tree and @chunk_objectid of device extent is fixed to
BTRFS_CHUNK_TREE_OBJECTID and BTRFS_FIRST_CHUNK_TREE_OBJECTID
respectively.
There is no need to pass them as parameter explicitly.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Remove @trans parameter for find_free_dev_extent_start() and its
callers.
The function itself is doing read-only tree search, no use of
transaction.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The function is not used by anyone else outside of volumes.c, make it
static.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are a couple of places where instead of the more succinct
list_for_each_entry the code uses list_for_each. This results in
slightly more code with no additional benefit as well as no
coherent pattern. This patch makes the code uniform. No functional
changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
[ remove unused variable in uuid_search ]
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce new subcommand 'fix-device-size' to the rescue group, to fix
device size alignment-related problems.
Especially for people unable to mount their fs with super::total_bytes
mismatch, this tool will fix the problems and let the mount continue.
Reported-by: Asif Youssuff <yoasif@gmail.com>
Reported-by: Rich Rauenzahn <rrauenza@gmail.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Recent kernel (starting from v4.6) will refuse to mount if super block
total bytes is smaller than all devices' size.
This makes end user unable to do anything to their otherwise quite
healthy fs.
To fix such problem, introduce repair function to fix it on an unmounted
filesystem.
Reported-by: Asif Youssuff <yoasif@gmail.com>
Reported-by: Rich Rauenzahn <rrauenza@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Recent kernel introduced alignment check for dev item, however older
kernel doesn't align device size when adding new device or shrinking
existing device.
This makes noisy kernel warning every time when any DEV_ITEM gets updated.
Introduce function to fix device size on an unmounted filesystem.
Reported-by: Asif Youssuff <yoasif@gmail.com>
Reported-by: Rich Rauenzahn <rrauenza@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When passing directory larger than block device using --rootdir
parameter, we get the following backtrace:
------
extent-tree.c:2693: btrfs_reserve_extent: BUG_ON `ret` triggered, value -28
./mkfs.btrfs(+0x1a05d)[0x557939e6b05d]
./mkfs.btrfs(btrfs_reserve_extent+0xb5a)[0x557939e710c8]
./mkfs.btrfs(+0xb0b6)[0x557939e5c0b6]
./mkfs.btrfs(main+0x15d5)[0x557939e5de04]
/usr/lib/libc.so.6(__libc_start_main+0xea)[0x7f83b101af6a]
./mkfs.btrfs(_start+0x2a)[0x557939e5af5a]
------
Nothing special, just BUG_ON() abusing from ancient code.
Fix them by using correct return.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If one of btrfs' devices was pulled out and we've replaced it with a
new one, then they have the same uuid.
If that device gets reconnected, 'btrfs filesystem show' will show the
stale one instead of the new one, but on the kernel side btrfs has a fix
not to include the stale one, this could confuse users as people may
monitor btrfs by running that command.
This does the similar thing to what kernel side has done.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
[ format string adjustments ]
Signed-off-by: David Sterba <dsterba@suse.com>
Function find_next_chunk() is used to find next chunk start position,
which should only do search on chunk tree and objectid is set to
BTRFS_FIRST_CHUNK_TREE_OBJECTID.
So refactor the parameter list to get rid of @root, which should be
obtained from fs_info->chunk_root, and @objectid, which is set to
BTRFS_FIRST_CHUNK_TREE_OBJECTID.
Signed-off-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Metadata blocks are always nodesize. When reading the
superblock::sys_array, the actual size of data is fixed to 4k and
smaller than nodesize, but otherwise everything works as before.
Signed-off-by: David Sterba <dsterba@suse.com>
I've run into a couple filesystems where btrfs-find-root would spin
indefinitely.
If the first cache extent start location is 0, we end up in an infinite
loop in btrfs_next_bg(). Fix it by checking for that situation, and
jumping to the next bg if necessary.
Fixes: e2e0dae9 (btrfs-progs: volume: Fix a bug causing btrfs-find-root to skip first chunk)
Signed-off-by: Justin Maggard <jmaggard@netgear.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 functions are involved in this refactor: btrfs_make_block_group()
btrfs_make_block_groups(), btrfs_alloc_chunk, btrfs_alloc_data_chunk().
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
BTW, there is a duplicated definition of btrfs_add_device() in
volumes.h, also remove it.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Just to keep the 1st paramter the same as kernel.
We can also save a few lines since the parameter is shorter now.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a new function, btrfs_get_chunk_stripe_len() to get correct
stripe length.
This is very handy for lowmem mode, which checks the mapping between
device extent and chunk item.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_check_chunk_valid() doesn't check if
1) chunk flag has conflicting flags
For example chunk type DATA|METADATA|RAID1|RAID10 is completely
invalid, while current check_chunk_valid() can't detect it.
2) num_stripes is invalid for RAID10
Num_stripes 5 is not valid for RAID10.
This patch will enhance btrfs_check_chunk_valid() to handle above cases.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a new header, kernel-lib/raid56.h, for later raid56 works.
It contains 2 functions, from original btrfs-progs code:
void raid6_gen_syndrome(int disks, size_t bytes, void **ptrs);
int raid5_gen_result(int nr_devs, size_t stripe_len, int dest, void **data);
Will be expanded later and some part of it(RAID6 recover part) may keep
sync with kernel later.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[ unify gpl header, rename header macro ]
Signed-off-by: David Sterba <dsterba@suse.com>
In btrfs_check_chunk_valid() we calculate chunk item using open code,
use an existing helper btrfs_chunk_item_size() instead.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If the final fsync() on the Btrfs device fails, we just swallow the
error and don't alert the user in any way. This was uncovered by xfstest
generic/405, which checks that mkfs fails when it encounters EIO.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>