btrfs-convert sometimes shows 'Assertion failed' when converting a nearly
blank filesystem, for example:
create btrfs filesystem:
blocksize: 4096
nodesize: 16384
features: extref, skinny-metadata (default)
creating btrfs metadata.
creating ext2fs image file.
trans 7 running 5
ctree.c:363: btrfs_cow_block: Assertion `1` failed.
btrfs-convert(btrfs_cow_block+0x92)[0x40acaf]
btrfs-convert(btrfs_search_slot+0x1cb)[0x40c50f]
btrfs-convert(btrfs_csum_file_block+0x20f)[0x41d83a]
btrfs-convert[0x43422d]
btrfs-convert[0x4342cd]
btrfs-convert[0x4345ca]
btrfs-convert[0x434767]
btrfs-convert[0x435770]
btrfs-convert[0x439748]
btrfs-convert(main+0x13f8)[0x43b09d]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x335e01ecdd]
btrfs-convert[0x407649]
The reason is complex:
1: the main thread allocates a block of memory,
shared with the sub thread
2: the main thread kills the sub thread and frees the above memory
3: the main thread mallocs a new block (at the same address),
and uses it
4: the sub thread (which has not really quit) writes to
this address, causing this bug.
By adding some debug lines to the code, we can see the following output:
create btrfs filesystem:
blocksize: 4096
nodesize: 16384
features: extref, skinny-metadata (default)
creating btrfs metadata.
1: ctx(0x7ffe1abde230)->info=0xc65b80
2: task_period_start: will create periodic.timer_fd
3: task_stop: info->periodic.timer_fd = NULL
4: task_stop: begin pthread_cancel info->id=-1746053376
5: task_stop: done pthread_cancel ret=0
6: task_stop: begin info->postfn
7: task_period_stop: periodic.timer_fd NULL
8: task_stop: done info->postfn
9: task_stop: done all
10: creating ext2fs image file.
trans 7 running 5
11: task_period_start: create periodic.timer_fd done info->periodic.timer_fd(0xc65b80)=7
12: btrfs_cow_block: root->fs_info->generation(0xc63568) = 5 trans->transid(0xc65b80)=7
13: ctree.c:368: btrfs_cow_block: Assertion `1` failed.
./btrfs-convert(btrfs_cow_block+0xda)[0x40ad37]
./btrfs-convert(btrfs_search_slot+0x1cb)[0x40c5b4]
./btrfs-convert(btrfs_insert_empty_items+0xac)[0x40d9f6]
./btrfs-convert(btrfs_record_file_extent+0xc0)[0x4183fe]
./btrfs-convert[0x435796]
./btrfs-convert[0x439b0c]
./btrfs-convert(main+0x13f8)[0x43b45d]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x335e01ecdd]
./btrfs-convert[0x407689]
Conclusion:
a: the sub thread should have exited before step 5, but it is still running
in step 11
b: task_stop() didn't close periodic.timer_fd in step 3,
because periodic.timer_fd was not initialized yet.
c: address 0xc65b80 is overwritten by the sub thread in step 11,
but this address was freed and re-malloced by the main thread
before step 10, and used for trans->transid.
d: trans->transid, overwritten by the sub thread, caused the error
in step 13.
Fix:
pthread_cancel() only sends a cancellation request to the thread;
by default the thread quits at its next cancellation point.
To make the sub thread quit in time, this patch adds pthread_join()
after the pthread_cancel() call.
And to make pthread_join() work, pthread_detach() is removed.
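A minimal sketch of the cancel-then-join pattern described above, assuming a simplified shared structure. The names struct shared_info, periodic_worker() and start_and_stop_task() are hypothetical and not taken from btrfs-convert. It needs a POSIX threads implementation (link with -pthread on older glibc).

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for memory shared by the main thread and the
 * periodic sub thread; not the real btrfs-convert structure. */
struct shared_info {
	long ticks;		/* written only by the sub thread */
};

static void *periodic_worker(void *arg)
{
	struct shared_info *info = arg;
	const struct timespec period = { .tv_sec = 0, .tv_nsec = 1000000 };

	for (;;) {
		info->ticks++;
		/* nanosleep() is a cancellation point: a pending cancel
		 * request takes effect here, not inside pthread_cancel() */
		nanosleep(&period, NULL);
	}
	return NULL;
}

/* Start the sub thread, then stop it safely: cancel, join, and only
 * then free the shared memory. Returns 0 if the thread was cancelled. */
static int start_and_stop_task(void)
{
	struct shared_info *info = calloc(1, sizeof(*info));
	const struct timespec delay = { .tv_sec = 0, .tv_nsec = 5000000 };
	pthread_t id;
	void *ret = NULL;

	if (!info)
		return -1;
	/* The thread is left joinable: joining a detached thread is
	 * undefined, so pthread_detach() must not be called on it. */
	if (pthread_create(&id, NULL, periodic_worker, info) != 0) {
		free(info);
		return -1;
	}
	nanosleep(&delay, NULL);

	pthread_cancel(id);			/* only a request */
	if (pthread_join(id, &ret) != 0)	/* wait for the real exit */
		return -1;			/* leak rather than free in-use memory */

	free(info);				/* no thread can write here any more */
	return ret == PTHREAD_CANCELED ? 0 : -1;
}
```

Without the pthread_join() call, free(info) could run while the worker is still between cancellation points, which is exactly the use-after-free window described in the conclusion.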
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
To avoid the following mount error in the test:
mount: /root/btrfs/progs/tests/fsck-tests/012-leaf-corruption/test.img
is not a block device (maybe try `-o loop'?)
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
As convert implements its own extent allocation, avoid the same metadata
problem there too.
Reported-by: Chris Murphy <lists@colorremedies.com>
Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now the find_free_extent() function won't return a metadata extent that
crosses a stripe boundary.
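A minimal sketch of the idea, assuming BTRFS_STRIPE_LEN is 64K and the extent is no longer than one stripe. STRIPE_LEN, crosses_stripe_boundary() and stripe_aligned_start() are hypothetical names, not the actual find_free_extent() code.

```c
#include <assert.h>
#include <stdint.h>

#define STRIPE_LEN (64 * 1024ULL)	/* mirrors BTRFS_STRIPE_LEN (64K) */

/* Return 1 if [start, start + len) touches more than one stripe. */
static int crosses_stripe_boundary(uint64_t start, uint64_t len)
{
	assert(len > 0 && len <= STRIPE_LEN);
	return start / STRIPE_LEN != (start + len - 1) / STRIPE_LEN;
}

/* Hypothetical helper: if a metadata extent starting at 'start' would
 * cross a stripe boundary, move its start to that boundary instead. */
static uint64_t stripe_aligned_start(uint64_t start, uint64_t len)
{
	if (crosses_stripe_boundary(start, len))
		start = (start / STRIPE_LEN + 1) * STRIPE_LEN;
	return start;
}
```

In an allocator, the adjusted start still has to be checked for free space again, since the range after the boundary may already be in use.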
Reported-by: Chris Murphy <lists@colorremedies.com>
Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The kernel btrfs_map_block() function has a limitation: it can only
map a range of up to BTRFS_STRIPE_LEN, within one stripe.
As a result, scrub fails to scrub a tree block which crosses a stripe
boundary, triggering a BUG_ON().
Normally this is fine, as metadata is always in a metadata chunk and
BTRFS_STRIPE_LEN is always divisible by the node/leaf size.
So without mixed block groups, a tree block won't cross a stripe boundary.
But with mixed block groups, especially for a btrfs converted from ext4,
it's almost certain that one or more tree blocks are not aligned to the
node size and cross a stripe boundary, causing a bug in kernel scrub.
This patch reports the problem, although we don't have a good idea
how to fix it in user space until we add the ability to relocate tree
blocks in user space.
Also, the kernel code should be checked for similar tree block
allocation problems.
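For example, with a 16K nodesize, blocks at 0, 16K, 32K and 48K all fit inside the first 64K stripe, while a block at 60K (as can happen after converting from ext4) spans 60K..76K and crosses the 64K boundary. A sketch of such a report follows. report_crossing_tree_blocks() is a hypothetical helper, not the function added by the patch, and it assumes nodesize <= 64K, the btrfs maximum.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define STRIPE_LEN (64 * 1024ULL)	/* mirrors BTRFS_STRIPE_LEN (64K) */

/* Warn about every tree block that straddles a STRIPE_LEN boundary and
 * return how many were found. Assumes nodesize <= STRIPE_LEN. */
static size_t report_crossing_tree_blocks(const uint64_t *bytenr, size_t nr,
					  uint32_t nodesize)
{
	size_t bad = 0;

	for (size_t i = 0; i < nr; i++) {
		uint64_t offset_in_stripe = bytenr[i] % STRIPE_LEN;

		/* the block crosses if it runs past the end of its stripe */
		if (offset_in_stripe + nodesize > STRIPE_LEN) {
			fprintf(stderr,
				"tree block %" PRIu64 " crosses stripe boundary\n",
				bytenr[i]);
			bad++;
		}
	}
	return bad;
}
```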
Reported-by: Chris Murphy <lists@colorremedies.com>
Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Although stripe_len is fixed to BTRFS_STRIPE_LEN (64K) now, it's still
used in a lot of code, so just output it for users who want to trace the
source of stripe_len in the btrfs_map_bio() code.
Reported-by: Chris Murphy <lists@colorremedies.com>
Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs-progs outputs the following error message when resizing without
enough free space:
# btrfs filesystem resize +10g /mnt/btrfs_5gb
Resize '/mnt/btrfs_5gb' of '+10g'
ERROR: unable to resize '/mnt/btrfs_5gb' - File too large
#
This is not a good description for users, so this patch changes it to:
# ./btrfs filesystem resize +10G /mnt/tmp1
Resize '/mnt/tmp1' of '+10G'
ERROR: unable to resize '/mnt/tmp1' - no enouth free space
#
Reported-by: Taeha Kim <kthguru@gmail.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
A leftover from when recursive defrag was added.
Signed-off-by: Patrik Lundquist <patrik.lundquist@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Commit dedb1ebeee broke commit
96cfbbf0ea.
Casting a thresh value greater than (u32)-1 simply truncates bits, while
the desired value is (u32)-1 for the max defrag threshold.
I.e. "btrfs fi defrag -t 4g" is truncated to 0
and "-t 5g" to 1073741824.
Also added a missing newline.
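The arithmetic behind those numbers: 4g is 4294967296 = 2^32, so a plain (u32) cast keeps only the low 32 bits, which are all zero; 5g is 5368709120 = 2^32 + 1073741824, which becomes 1073741824. A saturating conversion avoids this. A minimal sketch, with clamp_defrag_thresh() as a hypothetical name:

```c
#include <stdint.h>

/* Convert a parsed 64-bit threshold to the 32-bit field, saturating at
 * (u32)-1 instead of silently dropping the high bits like a plain cast. */
static uint32_t clamp_defrag_thresh(uint64_t thresh)
{
	return thresh > UINT32_MAX ? UINT32_MAX : (uint32_t)thresh;
}
```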
Signed-off-by: Patrik Lundquist <patrik.lundquist@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
To run a given test, set the variable TEST like:
$ make test TEST=002-bad-transid
$ make test TEST=002-*
and only tests matching the value will be run. The pattern is a glob and
is passed to 'find -name'.
The convert tests do not follow the fsck and misc layout and are skipped
if TEST is set.
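The selection can be illustrated with fnmatch(3), which implements the shell-style pattern matching that POSIX specifies for 'find -name' on a file's base name. count_selected_tests() and the test names below are hypothetical; this is not how the Makefile itself is implemented.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Count how many test directory names a TEST glob would select. */
static size_t count_selected_tests(const char *pattern,
				   const char *const *names, size_t nr)
{
	size_t selected = 0;

	for (size_t i = 0; i < nr; i++)
		if (fnmatch(pattern, names[i], 0) == 0)	/* 0 means match */
			selected++;
	return selected;
}
```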
Signed-off-by: David Sterba <dsterba@suse.com>
The option was previously 'filesystem resize get_min_size' and is now
'inspect-internal min-dev-size'. We'd like to avoid cluttering the
'resize' syntax further.
The test has been updated to exercise the new option.
Signed-off-by: David Sterba <dsterba@suse.com>
Currently there is no way for a user to know the minimum size a
device of a btrfs filesystem can be resized to. Sometimes the value of
total allocated space (sum of all allocated chunks/device extents), which
can be parsed from 'btrfs filesystem show' and 'btrfs filesystem usage',
works as the minimum size, but sometimes it does not, namely when device
extents have to be relocated to holes (unallocated space) within the new
size of the device (the total allocated space sum).
This change adds the ability to reliably compute such a minimum value and
extends 'btrfs filesystem resize' with the following syntax to get that
value:
btrfs filesystem resize [devid:]get_min_size
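The condition above can be illustrated with a small sketch, assuming a plain array of device extents. struct dev_extent, total_allocated() and total_fits_without_relocation() are hypothetical names. The sketch only detects when the plain sum is not enough on its own; it does not compute the reliable minimum that this change adds.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device extent: a byte range allocated on one device. */
struct dev_extent {
	uint64_t start;
	uint64_t len;
};

/* Sum of all device extents, i.e. the "total allocated" value. */
static uint64_t total_allocated(const struct dev_extent *ext, size_t nr)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < nr; i++)
		sum += ext[i].len;
	return sum;
}

/*
 * The total is usable as the new size without moving anything only if
 * no extent ends beyond it. Otherwise some extents would first have to
 * be relocated into holes below that size, which is the case where the
 * plain sum is not a reliable minimum.
 */
static int total_fits_without_relocation(const struct dev_extent *ext,
					 size_t nr)
{
	uint64_t total = total_allocated(ext, nr);

	for (size_t i = 0; i < nr; i++)
		if (ext[i].start + ext[i].len > total)
			return 0;
	return 1;
}
```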
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
# mkfs.btrfs /dev/sdb /dev/sdd -m raid0 -d raid0
# mount /dev/sdb /mnt/btrfs
# btrfs balance start /mnt/btrfs
# btrfs fi df /mnt/btrfs
Data, single: total=1.00GiB, used=320.00KiB
System, single: total=32.00MiB, used=16.00KiB
Metadata, RAID0: total=256.00MiB, used=112.00KiB
GlobalReserve, single: total=16.00MiB, used=0.00B
Only metadata stays RAID0. Data and system go from RAID0 to single.
[REASON]
The problem is caused by the temporary single chunks.
mkfs always creates single data/metadata/system chunks first and then
adds the devices to the temporary btrfs.
When balancing all chunks, the data and system chunks are almost
empty, so balance moves them into the single chunks and removes the
old RAID0 chunks.
Metadata holds more data and triggers metadata chunk preallocation,
so a new RAID0 chunk is allocated and the old metadata is moved
there. The old RAID0 and single chunks are removed.
[FIX]
Now we add a new function to clean up the temporary chunks at the end of
the mkfs routine.
It cleans up the chunks which are empty and whose profile differs from
the mkfs profile.
So during balance, btrfs will always allocate a new chunk to keep the
profile, rather than moving data into the single chunk.
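A simplified sketch of that selection rule, assuming an in-memory array of block groups. struct bg, the enums and cleanup_temp_chunks() are hypothetical. The sketch only marks entries, whereas the real cleanup also has to free chunk items, dev extents and block group items.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified types for this sketch only. */
enum bg_type { BG_DATA, BG_METADATA, BG_SYSTEM, BG_NR_TYPES };
enum bg_profile { PROFILE_SINGLE, PROFILE_DUP, PROFILE_RAID0 };

struct bg {
	enum bg_type type;
	enum bg_profile profile;
	uint64_t used;		/* bytes in use */
	int freed;		/* set once the block group is cleaned up */
};

/*
 * Mark every block group that is empty and whose profile differs from
 * the profile requested for its type at mkfs time. mkfs_profile[] is
 * indexed by enum bg_type. Returns the number of block groups marked.
 */
static size_t cleanup_temp_chunks(struct bg *bgs, size_t nr,
				  const enum bg_profile mkfs_profile[BG_NR_TYPES])
{
	size_t removed = 0;

	for (size_t i = 0; i < nr; i++) {
		struct bg *bg = &bgs[i];

		if (bg->freed || bg->used != 0)
			continue;	/* keep anything that holds data */
		if (bg->profile == mkfs_profile[bg->type])
			continue;	/* already has the wanted profile */
		bg->freed = 1;
		removed++;
	}
	return removed;
}
```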
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The man page needs to be updated since RAID5/6 is now supported
by btrfs-replace.
Signed-off-by: Wang Yanfeng <wangyf-fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This reverts commit 5f8232e5c8.
This commit causes a regression:
$ mkfs.btrfs -f /dev/sda6
$ btrfsck /dev/sda6
Checking filesystem on /dev/sda6
UUID: 2ebb483c-1986-4610-802a-c6f3e6ab4b76
checking extents
Chunk[256, 228, 0]: length(4194304), offset(0), type(2) mismatch with
block group[0, 192, 4194304]: offset(4194304), objectid(0), flags(34)
Chunk[256, 228, 4194304]: length(8388608), offset(4194304), type(4)
mismatch with block group[4194304, 192, 8388608]: offset(8388608),
objectid(4194304), flags(36)
Block group[0, 4194304] (flags = 34) didn't find the relative chunk.
Block group[4194304, 8388608] (flags = 36) didn't find the relative
chunk.
......
The commit has the following bugs causing the problem.
1) A typo forgets to add meta/data_profile for alloc_chunk.
meta/data_profile is only added when allocating a block group, not
a chunk.
2) The type of the first system chunk cannot be modified yet.
The type of the first chunk and its stripe is hard coded in the
make_btrfs() function.
So even if we try to modify the type of the block group, we are unable
to change the type of the first chunk,
causing the chunk type mismatch problem.
The 1st bug can be fixed quite easily but the second cannot.
The good news is that the last patch "btrfs-progs: mkfs: Cleanup temporary
chunk to avoid strange balance behavior." from my patchset can handle it
quite well on its own.
So just revert the patch.
A new bug fix for btrfsck (err is 0 even when the chunk/extent tree is
corrupted) and new test cases for mkfs will follow soon.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function will be used to free an empty chunk.
This provides the basis for the later temp chunk cleanup.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce two functions, free_space_info and free_block_group_cache.
The former frees the space of an empty block group.
The latter frees the in-memory block group cache along with its
space in space_info and device space.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce two functions, free_chunk_item and free_system_chunk_item.
The former frees a chunk item in the chunk tree.
The latter frees a system chunk in the super block.
They are used by the later chunk/block group free function.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce two functions, free_dev_extent_item and
free_chunk_dev_extent_items, to free dev extent items in a chunk.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function is used to free a block group item. It must be called
with all the space in the block group pinned; otherwise tree blocks
may be allocated into the range.
The function is used by the later block group/chunk free function.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
As the chunk tree is only stored in the super block, a chunk tree commit
doesn't need to go through the tree root update;
doing so would trigger a BUG_ON.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a new test case for I_ERR_FILE_WRONG_NBYTES.
The new btrfs-image dump contains a 12K file,
but the nbytes in its inode item is a random number.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Some unknown kernel bug makes inode nbytes modification go out of sync
with file extent updates.
But it's quite easy to fix in btrfs-progs anyway, so just fix it by
adding a new function, repair_inode_nbytes, which uses the
found_size in inode_record.
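A minimal sketch of the repair logic, reduced to the two values involved (for the test image above: a 12K file whose nbytes is garbage). struct inode_rec_sketch and repair_nbytes_sketch() are hypothetical simplifications, not the real repair_inode_nbytes(), which also has to write the corrected value back to the inode item.

```c
#include <stdint.h>

/* Hypothetical subset of the fields involved. */
struct inode_rec_sketch {
	uint64_t nbytes;	/* value stored in the inode item */
	uint64_t found_size;	/* bytes accounted from file extents */
};

/* Returns 1 if nbytes was wrong and has been reset to found_size. */
static int repair_nbytes_sketch(struct inode_rec_sketch *rec)
{
	if (rec->nbytes == rec->found_size)
		return 0;	/* already consistent, nothing to do */
	rec->nbytes = rec->found_size;
	return 1;
}
```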
Reported-by: Christian <cdysthe@gmail.com>
Reported-by: Chris Murphy <lists@colorremedies.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filesystem creation has to solve some chicken-and-egg problems and
creates some temporary objects. In our case it's an extra single/single
pair of block groups that's not used unless the user explicitly asks
for it.
Example:
Data, single: total=8.00MiB, used=64.00KiB
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=153.56MiB, used=112.00KiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B
Even with a single-device filesystem and defaults, there are single
block groups for metadata and system. The single-device case is easy to
fix: we simply create the right type from the beginning.
Example:
Data, single: total=8.00MiB, used=64.00KiB
System, DUP: total=4.00MiB, used=16.00KiB
Metadata, DUP: total=136.00MiB, used=112.00KiB
GlobalReserve, single: total=16.00MiB, used=0.00B
A filesystem on top of multiple devices still leaves the single/single
groups behind.
Signed-off-by: David Sterba <dsterba@suse.com>
Enhance the leaf check to verify item ends that look otherwise fine but
would exceed the leaf. The same check is done in the kernel.
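A minimal sketch of such an item-end check. It assumes item data offsets are relative to the start of the leaf data area and that this area is the nodesize minus the 101-byte header (16283 bytes for a 16K node). struct item_sketch and item_end_exceeds_leaf() are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical, simplified item header: the item's data occupies
 * [offset, offset + size) within the leaf data area. */
struct item_sketch {
	uint32_t offset;
	uint32_t size;
};

/* Returns 1 if the item's data end lies beyond the leaf data area.
 * The sum is done in 64 bits so a huge offset + size cannot wrap. */
static int item_end_exceeds_leaf(const struct item_sketch *item,
				 uint32_t leaf_data_size)
{
	return (uint64_t)item->offset + item->size > leaf_data_size;
}
```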
Reported-by: Robert Marklund <robbelibobban@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The optional argument to attribute 'deprecated' was introduced in
gcc 4.5, so the code does not build on gcc 4.4, which is still in use. The
recommended replacements are mentioned in the comment, so it is not
necessary to repeat them via the attribute.
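A sketch of the gcc 4.4-compatible form. The DEPRECATED macro and the sketch_old_api()/sketch_new_api() functions are hypothetical. The attribute carries no message, and the replacement is named in the comment instead.

```c
/* Accepted by gcc 4.4 too: no message argument, which needs gcc >= 4.5. */
#define DEPRECATED __attribute__((deprecated))

/* Hypothetical replacement API. */
int sketch_new_api(int x)
{
	return x + 1;
}

/* Deprecated: use sketch_new_api() instead (the hint lives here in the
 * comment rather than in the attribute). */
int sketch_old_api(int x) DEPRECATED;
int sketch_old_api(int x)
{
	return sketch_new_api(x);
}
```

Callers of sketch_old_api() still get a deprecation warning; they just don't see the replacement hint in it.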
Reported-by: Amr El-Sharnoby <amr.elsharnoby@horizontechs.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
The original implementation doesn't output the nbytes for an inode.
Add it to the output, and since the output is now too long, reformat it
into multiple lines.
This is very handy for debugging related bugs.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
There's an awkward asymmetry between btrfs device add and btrfs device
delete. Resolve this by aliasing delete to remove.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
We're also going to want to support aliases, so rather than adding
another member, replace "hidden" with a "flags" member.
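With a single flags member, an alias such as delete for remove can be expressed as one more table entry sharing the same handler. A sketch with hypothetical names: CMD_HIDDEN, CMD_ALIAS, struct cmd_struct_sketch and find_cmd() are illustrations, not the btrfs-progs definitions.

```c
#include <stddef.h>
#include <string.h>

#define CMD_HIDDEN	(1U << 0)	/* not listed in help output */
#define CMD_ALIAS	(1U << 1)	/* alternative name for another entry */

struct cmd_struct_sketch {
	const char *token;
	int (*fn)(int argc, char **argv);
	unsigned int flags;		/* replaces a separate "hidden" member */
};

static int cmd_device_remove(int argc, char **argv)
{
	(void)argc;
	(void)argv;
	return 0;
}

static const struct cmd_struct_sketch device_cmds[] = {
	{ "remove", cmd_device_remove, 0 },
	/* "delete" runs the same handler but is marked as an alias */
	{ "delete", cmd_device_remove, CMD_ALIAS },
	{ NULL, NULL, 0 },
};

/* Look up a subcommand by name; aliases resolve to the same handler. */
static const struct cmd_struct_sketch *find_cmd(const char *name)
{
	for (const struct cmd_struct_sketch *c = device_cmds; c->token; c++)
		if (strcmp(c->token, name) == 0)
			return c;
	return NULL;
}
```

A flags word lets later properties be added as new bits without changing every table entry.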
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>