This member can be fetched from eb::fs_info, and no caller really
depends on that member to determine whether an eb is a dummy; eb::flags
already tells us that.
The kernel doesn't have such a member either.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
Valgrind reports the following error for fsck/012:
adding new tree backref on start 4206592 len 4096 parent 0 root 5
==100735== Syscall param pwrite64(buf) points to uninitialised byte(s)
==100735== at 0x49F303A: pwrite (in /usr/lib/libpthread-2.31.so)
==100735== by 0x1A5C85: write_extent_to_disk (extent_io.c:815)
==100735== by 0x1B2507: write_and_map_eb (disk-io.c:512)
==100735== by 0x1B26A7: write_tree_block (disk-io.c:545)
==100735== by 0x1D4822: __commit_transaction (transaction.c:148)
==100735== by 0x1D4AA2: btrfs_commit_transaction (transaction.c:213)
==100735== by 0x16360D: fixup_extent_refs (main.c:7662)
==100735== by 0x16449F: check_extent_refs (main.c:8033)
==100735== by 0x166199: check_chunks_and_extents (main.c:8786)
==100735== by 0x166441: do_check_chunks_and_extents (main.c:8842)
==100735== by 0x169D13: cmd_check (main.c:10324)
==100735== by 0x11CDC6: cmd_execute (commands.h:125)
==100735== Address 0x4e8aeb0 is 128 bytes inside a block of size 4,224 alloc'd
==100735== at 0x483BB65: calloc (vg_replace_malloc.c:762)
==100735== by 0x1A54C5: __alloc_extent_buffer (extent_io.c:609)
==100735== by 0x1A5AD1: alloc_extent_buffer (extent_io.c:752)
==100735== by 0x1B1A0A: btrfs_find_create_tree_block (disk-io.c:222)
==100735== by 0x1BD4A2: btrfs_alloc_free_block (extent-tree.c:2538)
==100735== by 0x1A8CE3: __btrfs_cow_block (ctree.c:322)
==100735== by 0x1A91C6: btrfs_cow_block (ctree.c:415)
==100735== by 0x1AB16C: btrfs_search_slot (ctree.c:1185)
==100735== by 0x160BBC: delete_extent_records (main.c:6652)
==100735== by 0x16343F: fixup_extent_refs (main.c:7629)
==100735== by 0x16449F: check_extent_refs (main.c:8033)
==100735== by 0x166199: check_chunks_and_extents (main.c:8786)
==100735==
[CAUSE]
For a newly allocated extent buffer, we don't initialize its content.
This is not a major concern at all.
In the above report, the reported range is inside the unused part of
the extent buffer, so it causes no problem.
A regular btrfs_cow_block() covers all the used ranges of an extent
buffer.
[FIX]
Still, since the kernel initializes the extent buffer with zeros, it won't
hurt to do the extra initialization to make valgrind happy.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
For a fuzzed image, `btrfs check` will segfault at open_ctree() stage:
$ btrfs check --mode=lowmem issue_207.raw
Opening filesystem to check...
extent_io.c:665: free_extent_buffer_internal: BUG_ON `eb->refs < 0` triggered, value 1
btrfs(+0x6bf67)[0x56431d278f67]
btrfs(+0x6c16e)[0x56431d27916e]
btrfs(alloc_extent_buffer+0x45)[0x56431d279db5]
btrfs(read_tree_block+0x59)[0x56431d2848f9]
btrfs(btrfs_setup_all_roots+0x29c)[0x56431d28535c]
btrfs(+0x78903)[0x56431d285903]
btrfs(open_ctree_fs_info+0x90)[0x56431d285b60]
btrfs(+0x45a01)[0x56431d252a01]
btrfs(main+0x94)[0x56431d2220c4]
/usr/lib/libc.so.6(__libc_start_main+0xf3)[0x7f6e28519153]
btrfs(_start+0x2e)[0x56431d22235e]
[CAUSE]
The fuzzed image has a strange log root bytenr:
log_root 61440
log_root_transid 0
In fact, the log_root seems to be fuzzed, as its transid is 0, which is
invalid.
Note that range [61440, 77824) covers the physical offset of the primary
super block.
The bug is caused by the following sequence:
1. A cached tree block [64K, 68K) is created by open_ctree()
__open_ctree_fd()
|- btrfs_setup_chunk_tree_and_device_map()
   |- btrfs_read_sys_array()
      |- sb = btrfs_find_create_tree_block()
      |- free_extent_buffer(sb)
This creates an extent buffer [64K, 68K) in fs_info->extent_cache, then
drops the refcount of that eb back to 0 without freeing it.
2. Try to read the corrupted log root
__open_ctree_fd()
|- btrfs_setup_chunk_tree_and_device_map()
|- btrfs_setup_all_roots()
   |- find_and_setup_log_root()
      |- read_tree_block()
         |- btrfs_find_create_tree_block()
            |- alloc_extent_buffer()
The final alloc_extent_buffer() tries to free the cached eb
[64K, 68K), since it doesn't match the current search.
Since that cached eb has already been released (refcount == 0), the
extra free_extent_buffer() triggers the BUG_ON() above.
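The sequence above can be reduced to a toy model. This is a hedged sketch, not btrfs-progs code: struct toy_eb, toy_free_eb() and toy_alloc_eb() are hypothetical simplifications, and toy_free_eb() returns -1 where the real free_extent_buffer() would hit the BUG_ON().

```c
#include <stddef.h>

/*
 * Hypothetical, heavily simplified extent buffer. refs == 0 means the eb
 * was released by its last user but still sits in the cache.
 */
struct toy_eb {
	unsigned long long start;
	unsigned int len;
	int refs;
};

/* Drop one reference; -1 stands in for BUG_ON(eb->refs < 0). */
static int toy_free_eb(struct toy_eb *eb)
{
	if (eb->refs <= 0)
		return -1;
	eb->refs--;
	return 0;
}

static int toy_overlaps(const struct toy_eb *eb, unsigned long long start,
			unsigned int len)
{
	return start < eb->start + eb->len && eb->start < start + len;
}

/*
 * Mimics step 2: a lookup for [start, start + len) finds an overlapping
 * but non-matching cached eb and releases it once more.
 */
static int toy_alloc_eb(struct toy_eb *cached, unsigned long long start,
			unsigned int len)
{
	if (cached && toy_overlaps(cached, start, len) && cached->start != start)
		return toy_free_eb(cached);
	return 0;
}
```

Passing NULL as the cached eb models the fix: if reading the sys chunk array leaves nothing cached for [64K, 68K), there is no released eb to drop a second time.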
[FIX]
Fix it in a more comprehensive way: instead of simply verifying
log_root_transid, don't pollute the eb cache when reading the sys chunk
array.
This way we won't have a cached eb for [64K, 68K), and will error out at
the logical mapping phase.
Issue: #207
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Create a directory for all sources that can be used by anything not
related to a relevant kernel part: all common functions, helpers, and
utilities that do not fit any other specific category.
The traditional location would probably be lib/, with all things
statically linked to the main binaries, but we have libbtrfs and
libbtrfsutil, so this would be confusing.
Signed-off-by: David Sterba <dsterba@suse.com>
Add the const qualifier to the following parameters:
- @eb of memcmp_extent_buffer()
- @eb of read_extent_buffer()
This backports kernel commit 1cbb1f454e53 ("btrfs: struct-funcs,
constify readers") to btrfs-progs.
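The point of the change is that read-only accessors can be called through a const pointer. A minimal sketch, assuming a toy struct in place of the real struct extent_buffer; the toy_ names are hypothetical and the real functions do more bookkeeping:

```c
#include <string.h>

/* Hypothetical stand-in for struct extent_buffer, with far fewer fields. */
struct toy_extent_buffer {
	unsigned long len;
	char data[4096];
};

/* Readers only copy or compare bytes out of @eb, so @eb can be const. */
static void toy_read_extent_buffer(const struct toy_extent_buffer *eb,
				   void *dst, unsigned long start,
				   unsigned long len)
{
	memcpy(dst, eb->data + start, len);
}

static int toy_memcmp_extent_buffer(const struct toy_extent_buffer *eb,
				    const void *ptrv, unsigned long start,
				    unsigned long len)
{
	return memcmp(eb->data + start, ptrv, len);
}
```

Callers that only hold a const pointer can now use both readers without casting the qualifier away.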
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
These functions are in preparation for adding the free space tree repair
code, since it needs to be able to deal with bitmap-based FSTs. This
patch adds the extent_buffer_bitmap_set and extent_buffer_bitmap_clear
functions. Since in userspace we don't have to deal with page mappings,
their implementation is vastly simplified: they simply set (or clear)
each bit in the passed range.
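The per-bit approach can be sketched on a plain byte array. This is an illustration under assumptions: toy_bitmap_set() and toy_bitmap_clear() are hypothetical, take a bit range rather than the real functions' parameters, and use little-endian bit order (bit n lives in byte n / 8 at position n % 8).

```c
#include <limits.h>

/* Set every bit in [start_bit, start_bit + nbits), one bit at a time. */
static void toy_bitmap_set(unsigned char *map, unsigned long start_bit,
			   unsigned long nbits)
{
	for (unsigned long i = start_bit; i < start_bit + nbits; i++)
		map[i / CHAR_BIT] |= (unsigned char)(1U << (i % CHAR_BIT));
}

/* Clear every bit in [start_bit, start_bit + nbits), one bit at a time. */
static void toy_bitmap_clear(unsigned char *map, unsigned long start_bit,
			     unsigned long nbits)
{
	for (unsigned long i = start_bit; i < start_bit + nbits; i++)
		map[i / CHAR_BIT] &= (unsigned char)~(1U << (i % CHAR_BIT));
}
```

Setting bits 3..9 of a zeroed two-byte map yields 0xF8 0x03; a kernel-style implementation would instead handle whole bytes in the middle of the range for speed.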
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Another BUG_ON() during fuzz/003:
====== RUN MAYFAIL btrfs check --init-csum-tree tests/fuzz-tests/images/bko-161821.raw.restored
[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
parent transid verify failed on 4198400 wanted 14 found 1114126
parent transid verify failed on 4198400 wanted 14 found 1114126
Ignoring transid failure
owner ref check failed [4198400 4096]
repair deleting extent record: key [4198400,169,0]
adding new tree backref on start 4198400 len 4096 parent 0 root 5
Repaired extent references for 4198400
ref mismatch on [4222976 4096] extent item 1, found 0
backref 4222976 root 7 not referenced back 0x5617f8ecf780
incorrect global backref count on 4222976 found 1 wanted 0
backpointer mismatch on [4222976 4096]
owner ref check failed [4222976 4096]
repair deleting extent record: key [4222976,169,0]
Repaired extent references for 4222976
[3/7] checking free space cache
[4/7] checking fs roots
parent transid verify failed on 4198400 wanted 14 found 1114126
Ignoring transid failure
Wrong generation of child node/leaf, wanted: 1114126, have: 14
root 5 missing its root dir, recreating
parent transid verify failed on 4198400 wanted 14 found 1114126
Ignoring transid failure
ERROR: child eb corrupted: parent bytenr=4222976 item=0 parent level=1 child level=2
ERROR: errors found in fs roots
extent buffer leak: start 4222976 len 4096
extent_io.c:611: free_extent_buffer_internal: BUG_ON `eb->flags & EXTENT_DIRTY` triggered, value 1
failed (ignored, ret=134): btrfs check --init-csum-tree tests/fuzz-tests/images/bko-161821.raw.restored
mayfail: returned code 134 (SIGABRT), not ignored
test failed for case 003-multi-check-unmounted
Since we're shifting to use btrfs_abort_transaction() in btrfs-progs,
leaked dirty ebs will become more and more common. Instead of
BUG_ON(), we only need to report them as a warning.
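A hedged sketch of "warn instead of BUG_ON()" for a leaked dirty eb. The flag value, struct toy_eb and the warning text are all hypothetical and do not reproduce btrfs-progs output:

```c
#include <stdio.h>

#define TOY_EXTENT_DIRTY (1U << 0)	/* hypothetical flag value */

struct toy_eb {
	unsigned long long start;
	unsigned int len;
	unsigned int flags;
};

/*
 * Called when an eb is finally released. A dirty eb here usually means a
 * transaction was aborted, so report it and carry on instead of aborting
 * the whole process. Returns 1 if a warning was printed, 0 otherwise.
 */
static int toy_warn_dirty_leak(const struct toy_eb *eb)
{
	if (!(eb->flags & TOY_EXTENT_DIRTY))
		return 0;
	fprintf(stderr, "WARNING: dirty eb leak: start %llu len %u\n",
		eb->start, eb->len);
	return 1;
}
```

With this shape, the fuzz/003 run above would print a warning for [4222976 4096] and continue to the normal error path instead of dying with SIGABRT.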
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Instead of using the internal struct extent_io_tree, use struct fs_info.
This not only unifies the interface between the kernel and btrfs-progs,
but also lets the later btrfs_print_tree() take fewer parameters.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
eb->lru is not initialized in __alloc_extent_buffer(), so the
following call chain could cause a NULL pointer dereference:
btrfs_clone_extent_buffer()
|- __alloc_extent_buffer()
   |- Now eb->lru is NULL (not initialized)
free_extent_buffer_final()
|- list_del_init(&eb->lru)
Thankfully, current btrfs-progs won't trigger such a bug, as the only
btrfs_clone_extent_buffer() user is paths_from_inode(), which is not
used by anyone.
(But due to the usefulness of that function for future offline scrub, I'd
like to keep this dead code.)
Anyway, initializing eb->lru in __alloc_extent_buffer() does no harm.
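Why initializing the list head matters can be shown with a minimal kernel-style list. This is a sketch, not the btrfs-progs list header: the list helpers are local re-implementations of the usual idea, and struct toy_eb and toy_alloc_eb() are hypothetical.

```c
#include <stdlib.h>

/* Minimal kernel-style doubly linked list (local re-implementation). */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *l)
{
	l->next = l;
	l->prev = l;
}

static void list_del_init(struct list_head *entry)
{
	/* Dereferences next/prev: crashes if they were left NULL. */
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	INIT_LIST_HEAD(entry);
}

struct toy_eb {
	struct list_head lru;
	unsigned long long start;
};

static struct toy_eb *toy_alloc_eb(unsigned long long start)
{
	/* calloc() leaves lru.next/prev all-zero (NULL on common platforms). */
	struct toy_eb *eb = calloc(1, sizeof(*eb));

	if (!eb)
		return NULL;
	eb->start = start;
	INIT_LIST_HEAD(&eb->lru);	/* a self-linked node is safe to delete */
	return eb;
}
```

Without the INIT_LIST_HEAD() call, list_del_init() on a freshly cloned eb would write through a NULL pointer.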
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In free_extent_buffer_final() we access eb->tree->cache_size in
BUG_ON(). However, eb->tree can be NULL if it's a cloned extent buffer.
Currently the cloned extent buffer is only used in backref.c, in the
paths_from_inode() function. Thankfully that function is not used yet
(but it could be quite useful for converting an inode number to a path,
so I'd like to keep it).
Anyway, check eb->tree before accessing its members.
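A minimal sketch of the guarded access, assuming hypothetical toy structs; assert() stands in for the BUG_ON() and the accounting is simplified:

```c
#include <assert.h>
#include <stddef.h>

struct toy_tree {
	unsigned long long cache_size;
};

struct toy_eb {
	struct toy_tree *tree;	/* NULL for a cloned extent buffer */
	unsigned int len;
};

/* Only cached ebs are accounted in tree->cache_size; clones are skipped. */
static void toy_free_eb_final(struct toy_eb *eb)
{
	if (eb->tree) {
		assert(eb->tree->cache_size >= eb->len);	/* BUG_ON() stand-in */
		eb->tree->cache_size -= eb->len;
	}
}
```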
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have the infrastructure to cache extent buffers but we don't actually
do the caching. As soon as the last reference is dropped, the buffer
is dropped. This patch keeps the extent buffers around until the max
cache size is reached (defaults to 25% of memory) and then it drops
the last 10% of the LRU to free up cache space for reallocation. The
cache size is configurable (for use by e.g. lowmem) when the cache is
initialized.
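As a rough illustration of that policy, here is a size-bounded LRU in which only unreferenced buffers are evicted, oldest first, once the limit is reached. The toy_ names are hypothetical, the limit is passed explicitly rather than derived from 25% of memory, and "drops the last 10%" is interpreted here as trimming until the cache is back under 90% of its limit.

```c
#include <stddef.h>

#define TOY_MAX_ENTRIES 64

/* Hypothetical cache entry: one extent buffer, possibly still referenced. */
struct toy_entry {
	unsigned long long start;
	unsigned int len;
	int refs;
};

/* e[0] is the least recently used entry, e[n - 1] the most recent one. */
struct toy_lru_cache {
	struct toy_entry e[TOY_MAX_ENTRIES];
	size_t n;
	unsigned long long size;	/* bytes currently cached */
	unsigned long long max_size;	/* configurable limit */
};

static void toy_remove_at(struct toy_lru_cache *c, size_t i)
{
	c->size -= c->e[i].len;
	for (size_t j = i + 1; j < c->n; j++)
		c->e[j - 1] = c->e[j];
	c->n--;
}

/* Evict unreferenced entries, oldest first, until size <= 90% of max. */
static void toy_trim(struct toy_lru_cache *c)
{
	unsigned long long target = c->max_size - c->max_size / 10;
	size_t i = 0;

	while (i < c->n && c->size > target) {
		if (c->e[i].refs == 0)
			toy_remove_at(c, i);	/* next entry shifts into slot i */
		else
			i++;			/* still in use, must be kept */
	}
}

/* Cache a released buffer as most recent; trim once the limit is hit. */
static int toy_insert(struct toy_lru_cache *c, unsigned long long start,
		      unsigned int len)
{
	if (c->n == TOY_MAX_ENTRIES)
		return -1;
	c->e[c->n++] = (struct toy_entry){ start, len, 0 };
	c->size += len;
	if (c->size >= c->max_size)
		toy_trim(c);
	return 0;
}
```

Skipping entries with refs > 0 keeps buffers that callers still hold, which is the property the later free_some_buffers() fixes in this history are about.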
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
[ update codingstyle, switch total_memory to bytes ]
Signed-off-by: David Sterba <dsterba@suse.com>
Just to keep the first parameter the same as in the kernel.
We can also save a few lines since the parameter is shorter now.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Copy le_test_bit() from the kernel and use that for the free space tree
bitmaps.
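For reference, a little-endian bit test looks like this. The sketch is modeled on the kernel helper of the same name but uses unsigned char instead of u8 so it stands alone; treat it as an approximation, not the copied code:

```c
#include <limits.h>

/*
 * Bit @nr lives in byte nr / 8 at position nr % 8, independent of host
 * endianness, which matches how on-disk free space tree bitmaps are laid out.
 */
static inline int le_test_bit(int nr, const unsigned char *addr)
{
	return 1U & (addr[nr / CHAR_BIT] >> (nr % CHAR_BIT));
}
```

For the bytes { 0x01, 0x80 }, bits 0 and 15 are set and every other bit is clear.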
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The kernel uses nodesize, and the values are always equal. We have to keep
leafsize in headers; similarly, the tree setting functions still take and
set leafsize, but it's effectively a no-op.
Signed-off-by: David Sterba <dsterba@suse.com>
The kerncompat.h header file is part of the libbtrfs API. Its min/max macros
cause conflicts when building projects that depend on libbtrfs. Moving those
macros to a btrfs-progs internal header file fixes the conflict.
Signed-off-by: Ondrej Kozina <okozina@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This reuses the existing code for checking the free space cache, we just
need to load the free space tree. While we do that, we check a couple of
invariants on the free space tree itself. This requires pulling in some
code from the kernel to exclude the super stripes.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch is generated from a coccinelle semantic patch:
@@
identifier t;
expression e;
statement s;
@@
-t = malloc(e);
+t = calloc(1, e);
(
if (!t) s
|
if (t == NULL) s
|
)
-memset(t, 0, e);
Signed-off-by: Silvio Fricke <silvio.fricke@gmail.com>
[squashed patches into one]
Signed-off-by: David Sterba <dsterba@suse.com>
Offline btrfs tools, like btrfs-image, loop infinitely when a device is
missing.
The reason is that a missing device's fd is set to -1, but before reading
we only validate the fd by checking whether it's 0.
So in that case, -1 passes the validation check, pread returns 0, and
we keep looping to read.
Change the validation check from "== 0" to "<= 0" to avoid this
problem.
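A sketch of a read loop with the stricter descriptor check. toy_read_full() is hypothetical, not the btrfs-progs reader; the EOF guard is an extra assumption added so that a short file cannot cause an endless loop either. Like the original "== 0" check, it also treats fd 0 as invalid.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read exactly @len bytes at @offset from @fd. A missing device carries
 * fd == -1, so reject every fd <= 0 before calling pread().
 */
static ssize_t toy_read_full(int fd, void *buf, size_t len, off_t offset)
{
	size_t done = 0;

	if (fd <= 0)
		return -EIO;

	while (done < len) {
		ssize_t ret = pread(fd, (char *)buf + done, len - done,
				    offset + (off_t)done);
		if (ret < 0)
			return -errno;
		if (ret == 0)		/* EOF: stop instead of looping forever */
			return -EIO;
		done += (size_t)ret;
	}
	return (ssize_t)done;
}
```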
Reported-by: Timothy Normand Miller <theosib@gmail.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Unlike in the kernel, these userland functions just test/set/clear a
member, so move them to the header to avoid the extra function call cost.
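A sketch of what header-only flag helpers can look like. The flag names, values and helper names are hypothetical; the commit does not list which functions were moved:

```c
/* Hypothetical flag bits; the real header defines its own values. */
#define TOY_EXTENT_DIRTY	(1U << 0)
#define TOY_EXTENT_UPTODATE	(1U << 1)

struct toy_eb {
	unsigned int flags;
};

/* static inline in a header allows the compiler to inline each call. */
static inline int toy_eb_test_flag(const struct toy_eb *eb, unsigned int flag)
{
	return (eb->flags & flag) != 0;
}

static inline void toy_eb_set_flag(struct toy_eb *eb, unsigned int flag)
{
	eb->flags |= flag;
}

static inline void toy_eb_clear_flag(struct toy_eb *eb, unsigned int flag)
{
	eb->flags &= ~flag;
}
```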
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Allow read_tree_block() and read_node_slot() to return an error pointer.
This should help callers get a more specific error number.
For existing callers, change the (!eb) check to
(!extent_buffer_uptodate(eb)) to keep compatibility, and for callers
missing the check, use PTR_ERR(eb) if possible.
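The error-pointer convention encodes a small negative errno in the top page of the address space. Below is a standalone re-implementation of the idea for illustration; btrfs-progs and the kernel ship their own ERR_PTR()/IS_ERR()/PTR_ERR(), and toy_read_tree_block() with its alignment rule is hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Values in the last 4095 addresses are errors, never real pointers. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct toy_eb {
	int uptodate;
};

static struct toy_eb toy_block = { 1 };

/* Hypothetical reader: returns a valid eb or an encoded -errno. */
static struct toy_eb *toy_read_tree_block(unsigned long long bytenr)
{
	if (bytenr % 4096)
		return ERR_PTR(-EINVAL);	/* e.g. misaligned bytenr */
	return &toy_block;
}
```

A caller tests IS_ERR(eb) first and only then dereferences eb, forwarding PTR_ERR(eb) otherwise.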
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
glibc 2.10+ (5+ years old) enables all the desired features
(_XOPEN_SOURCE 700, __XOPEN2K8, POSIX_C_SOURCE, DEFAULT_SOURCE) with a
single _GNU_SOURCE define in the makefile alone. For portability to
other libc implementations (e.g. dietlibc), _XOPEN_SOURCE=700 is also
defined.
This also resolves the Debian bug report filed by Michael Tautschnig,
"Inconsistent use of _XOPEN_SOURCE results in conflicting
declarations". Whilst I was not able to reproduce the results, the
reported fact is that _XOPEN_SOURCE set to 500 in one set of files
(e.g. cmds-filesystem.c) defines a different struct stat than
other files (cmds-replace.c).
This patch thus cleans up all feature defines and sets them at a
consistent level.
Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=747969
Signed-off-by: Dimitri John Ledkov <dimitri.j.ledkov@intel.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
coverity barked out a warning that btrfs-map-logical was storing but
ignoring errors from read_extent_from_disk(). So don't ignore 'em. I
made extent reading errors fatal to match the fatal errors from mapping
mirrors above.
And while we're at it have read_extent_from_disk() return -errno pread
errors instead of -EIO or -1 (-EPERM). The only other caller who tests
errors clobbers them with -EIO.
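A minimal sketch of returning the real pread() error instead of a fixed code. toy_read_extent() is hypothetical; the short-read case has no errno to report, so it falls back to -EIO as an assumption:

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 on success or a negative errno from pread(). */
static int toy_read_extent(int fd, void *buf, size_t len, off_t off)
{
	ssize_t ret = pread(fd, buf, len, off);

	if (ret < 0)
		return -errno;		/* e.g. -EBADF, -EIO, -EINVAL */
	if ((size_t)ret < len)
		return -EIO;		/* short read: no errno available */
	return 0;
}
```

For an invalid descriptor the caller now sees -EBADF rather than -1, which would otherwise be misread as -EPERM.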
Signed-off-by: Zach Brown <zab@zabbo.net>
Signed-off-by: David Sterba <dsterba@suse.cz>
This patch pulls back backref.c, adds a couple of helpers everywhere that it
needs, and cleans up backref.c to fit in btrfs-progs. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
[removed free_some_buffers after "do not reclaim extent buffer"]
Signed-off-by: David Sterba <dsterba@suse.cz>
We should kill free_some_buffers() to stop reclaiming extent buffers, or
we will hit the problem described below.
As of commit 53ee1bccf9, we no longer count a reference for tree->lru.
However, free_some_buffers() is still around and reclaims extent buffers
whose @refs == 1. This causes extent buffers to be reclaimed
unintentionally, so the following steps can happen:
1. A buffer at address A is reclaimed by free_some_buffers()
(address A is also free()ed)
2. Some code calls alloc_extent_buffer()
3. Address A is assigned to the newly allocated buffer
4. A buffer pointed to by A suddenly changes its content
This problem is also pointed out here and it has a reproducer:
https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg36703.html
This commit drops free_some_buffers() and related variables, and also
modifies extent_io_tree_cleanup() to catch non-freed buffers properly.
Signed-off-by: Naohiro Aota <naota@elisp.net>
Signed-off-by: David Sterba <dsterba@suse.cz>
free_some_buffers() should not free dirty extent buffers; they are left
to be committed.
Signed-off-by: Naohiro Aota <naota@elisp.net>
Signed-off-by: David Sterba <dsterba@suse.cz>
Now we set @refs to 2 when creating a new extent buffer and allocate the
needed free space, but we don't call free_extent_buffer() enough times to
drop the eb's references to zero so that the eb can finally be freed.
The problem is that we have already decreased the reference count of the
backrefs to zero, which releases the space occupied by the eb, and this
space can be allocated again for something else (another eb or disk).
Usually a crash (core dump) follows; I've hit a crash in rb_insert()
because another eb reused the space while the original one was still
floating around.
We should do the same thing as the kernel code does: initialize @refs to
1 instead of 2, which gets rid of the above problem.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
A user was reporting an issue with bad transid errors on his blocks. The thing
is that btrfs-progs ignores transid failures for things like restore and
fsck, so we can make a best effort to fix a user's file system. This way fsck
can put together a coherent view of the file system with stale blocks. If
everything else is OK in the mind of fsck, then we can re-COW these blocks to
fix the generation and the user can get their file system back. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
The 'prealloc' extent_state structure is leaked for the case when the 'desired
range' encapsulates/covers the 'extent range'.
Signed-off-by: chandan <chandan@linux.vnet.ibm.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
In files copied from the kernel, mark many functions as static,
and remove any resulting dead code.
Some functions are left unmarked if they aren't static in the
kernel tree.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
It should be 'clear', not 'set'.
Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This adds a 'btrfs-image -m' option, which lets us restore an image
built from a multi-disk btrfs onto several disks at once.
This aims to address the following case:
$ mkfs.btrfs -m raid0 sda sdb
$ btrfs-image sda image.file
$ btrfs-image -r image.file sdc
---------
Here we can only restore metadata onto sdc, and we can only mount sdc in
degraded mode, as we don't provide information about the other disk.
Moreover, since it was built as RAID0 and we have only one disk, after
mounting sdc we end up in read-only mode.
This is just annoying for people (like me) who try to restore an image
but then find they cannot make it work.
This option makes your life easier; just run
$ btrfs-image -m image.file sdc sdd
---------
and all the metadata is restored at the same offsets as on the originals
(of course, you need to provide enough disk space, at least the size of
the original disks).
This also works with raid5 and raid6 metadata images.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Btrfs can manage several devices in the same fs, so [offset, size] is not
sufficient to uniquely identify a device extent; we need the device id to
tell apart device extents that have the same offset and size but live on
different devices. So we added a member named objectid to the extent cache,
and introduced some functions to make the extent cache suitable for managing
device extents.
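A minimal sketch of a key comparison that takes the device id into account, assuming a hypothetical struct toy_cache_extent; the real extent cache compares ranges (overlap counts as a match) and does more work:

```c
/* Hypothetical cache key: device id plus the extent's offset and size. */
struct toy_cache_extent {
	unsigned long long objectid;	/* device id */
	unsigned long long start;
	unsigned long long size;
};

/*
 * Order by device id first, then by offset. Two extents with equal
 * [start, size] on different devices now compare as distinct keys.
 */
static int toy_cache_extent_cmp(const struct toy_cache_extent *a,
				const struct toy_cache_extent *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->start != b->start)
		return a->start < b->start ? -1 : 1;
	return 0;
}
```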
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
In fact, the code of many rb-tree insert/search/delete functions is similar,
so we can abstract it into common rb-tree functions and use them to
simplify the callers.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
In trying to track down a weird tree log problem I wanted to make sure that the
free space cache was actually valid, which we currently have no way of doing.
So this patch adds a bunch of support for the free space cache code and then a
checker to fsck. Basically we go through and if we can actually load the free
space cache then we will walk the extent tree and verify that the free space
cache exactly matches what is in the extent tree. Hopefully this will always be
correct; the only time it wouldn't be is if the extent tree is corrupt or we
have some sort of awful bug in the free space cache. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Please find attached a patch to make the new libbtrfs usable from
C++ (at least for the parts snapper will likely need).
Signed-off-by: Arvin Schnell <aschnell@suse.de>
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
It looks possible to hit the search_again label without using the
prealloc. A new prealloc is then allocated, leaking the current one.
Every use of prealloc sets it to NULL, so let's just allocate a new
prealloc when we don't already have one.
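The pattern can be sketched with an allocation counter. Everything here is hypothetical (struct toy_state, toy_alloc_state(), toy_set_extent_bit() and its @retries parameter that simulates jumping back to search_again); it only shows that the spare state is reused rather than leaked:

```c
#include <stdlib.h>

struct toy_state {
	unsigned long long start, end;
};

static int toy_allocs;	/* counts allocations so leaks are visible */

static struct toy_state *toy_alloc_state(void)
{
	toy_allocs++;
	return calloc(1, sizeof(struct toy_state));
}

/* @retries simulates how often the search restarts at search_again. */
static int toy_set_extent_bit(int retries)
{
	struct toy_state *prealloc = NULL;

search_again:
	if (!prealloc)			/* only allocate when we have none */
		prealloc = toy_alloc_state();
	if (!prealloc)
		return -1;
	if (retries-- > 0)
		goto search_again;	/* unused prealloc is kept, not leaked */

	/* The real code would insert prealloc into the tree here. */
	free(prealloc);
	return 0;
}
```

Three restarts still cost a single allocation; allocating unconditionally at the label would have leaked three states.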
Signed-off-by: Zach Brown <zab@redhat.com>
David Woodhouse originally contributed this code, and Chris Mason
changed it around to reflect the current design goals for raid56.
The original code expected all metadata and data writes to be full
stripes. This meant metadata block size == stripe size, and had a few
other restrictions.
This version allows metadata blocks smaller than the stripe size. It
implements both raid5 and raid6, although it does not have code to
rebuild from parity if one of the drives is missing or incorrect.
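For readers unfamiliar with raid5, the P stripe is the byte-wise XOR of the data stripes. The sketch below is illustrative only; toy_raid5_gen_p() is hypothetical, raid6's Q stripe (which needs Galois field arithmetic) is not shown, and, as noted above, this version of the code does not rebuild from parity.

```c
#include <stddef.h>

/*
 * P parity: p[i] = data[0][i] ^ data[1][i] ^ ... ^ data[ndata - 1][i].
 * XOR is its own inverse, which is what a later rebuild of one missing
 * stripe would rely on.
 */
static void toy_raid5_gen_p(unsigned char *p, unsigned char *const *data,
			    size_t ndata, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		unsigned char x = 0;

		for (size_t d = 0; d < ndata; d++)
			x ^= data[d][i];
		p[i] = x;
	}
}
```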
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This changes free_some_buffers (called each time we allocate an extent
buffer) to allow a higher hard limit on the number of extent buffers
in use.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
fsck needs to be able to open a damaged FS, which means open_ctree needs
to be able to return a damaged FS.
This adds a new open_ctree_fs_info which can be used to open any and all
roots that are valid. btrfs-debug-tree is changed to use it.
Signed-off-by: Chris Mason <chris.mason@oracle.com>