The libblkid scan method, which was introduced later, also
scans the devices under /proc/partitions, so we don't have to
scan them explicitly.
Remove the scan method BTRFS_SCAN_PROC.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
If we didn't find what we are looking for in /proc/partitions,
we're not going to find it by scanning every node under /dev, either.
But that's just what btrfs_scan_for_fsid() does.
Remove that fallback; at that point btrfs_scan_for_fsid() just calls
scan_for_btrfs(), so remove the wrapper & call it directly.
Side note: so, these paths always use /proc/partitions, not libblkid.
Userspace-initiated scans default to libblkid. I presume this is
part of the design, and intentional? Anyway, not changing it now!
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
We use the read extent buffer infrastructure to read the super block when we are
creating a btrfs-image. This works out fine most of the time except when the fs
has been balanced, then it fails to map the super block. So we could fix
btrfs-image to read in the super in a special way, but that's more code. So
instead just check in the eb reading code if we are reading the super and then
don't bother mapping the block, just read the actual offset. This fixed some
poor guy who was trying to btrfs-image his fs that had been balanced. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
David sent a quick patch that removed a BUG_ON(). I took a peek and
found that the function was already leaking an eb ref and only returned
0. So this fixes the leak and makes the function void and fixes up the
callers.
Accidentally-motivated-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Zach Brown <zab@zabbo.net>
Signed-off-by: David Sterba <dsterba@suse.cz>
The superblock checksum check in btrfs-progs is too strict for
super-recover: current btrfs-progs only reads the 1st superblock,
and if you need super-recover, the 1st superblock is possibly
already damaged.
The fix introduces a super_recover parameter for
btrfs_read_dev_super() and its callers to allow scanning the backup
superblocks if needed.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When encountering a corrupted fs root node, fsck hit the following messages:
Check tree block failed, want=29360128, have=0
Check tree block failed, want=29360128, have=0
Check tree block failed, want=29360128, have=0
Check tree block failed, want=29360128, have=0
Check tree block failed, want=29360128, have=0
read block failed check_tree_block
Checking filesystem on /dev/sda9
UUID: 0d295d80-bae2-45f2-a106-120dbfd0e173
checking extents
Segmentation fault (core dumped)
This is because in btrfs_setup_all_roots() we check the return
value of btrfs_read_fs_root() by testing whether it is a NULL
pointer. This is wrong, since btrfs_read_fs_root() returns
ERR_PTR(ret) on failure, not NULL. Fix it.
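A minimal sketch of the pattern, not the actual patch. The ERR_PTR/PTR_ERR/IS_ERR helpers below are simplified stand-ins written for this example (the real ones come from the kernel compatibility headers), and demo_read_root/demo_setup_root are hypothetical names:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel-style error pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct demo_root { int id; };

/* Hypothetical reader: returns an encoded errno on failure, never NULL. */
static struct demo_root *demo_read_root(int corrupted)
{
	static struct demo_root good = { .id = 5 };

	return corrupted ? ERR_PTR(-EIO) : &good;
}

/* Returns 0 on success or a negative errno. */
static int demo_setup_root(int corrupted, struct demo_root **out)
{
	struct demo_root *root = demo_read_root(corrupted);

	/* A NULL test would let the error pointer through and crash later. */
	if (IS_ERR(root))
		return (int)PTR_ERR(root);
	*out = root;
	return 0;
}
```

The point is only that an error pointer is non-NULL, so `if (!root)` never fires for it.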
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
This patch adds functionality (in qgroup-verify.c) to compute bytecounts in
subvolume quota groups. The original groups are read in and stored in memory
so that after we compute our own bytecounts, we can compare them with those
on disk. A print function is provided to do this comparison and show the
results on the console.
A 'qgroup check' pass is added to btrfsck. If any subvolume quota groups
differ from what we compute, the differences for them are printed. We also
provide an option '--qgroup-report' which will run only the quota check code
and print a report on all quota groups. Other than making it possible to
verify that our qgroup changes work correctly, this mode can also be used in
xfstests for automated checking after qgroup tests.
This patch does not address the following:
- compressed counts are identical to non compressed, because kernel doesn't
make the distinction yet. Adding the code to verify compressed counts
shouldn't be hard at all though once kernel can do this.
- It is only concerned with subvolume quota groups (like most of
btrfs-progs).
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
Fix double free of memory if btrfs_open_devices fails:
*** Error in `btrfs': double free or corruption (fasttop): 0x000000000066e020 ***
The crash happened because when an open failed on a device inside
btrfs_open_devices, it freed all memory by calling btrfs_close_devices,
but inside disk-io.c we then call btrfs_close_devices again.
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When a disk containing btrfs is overwritten with another FS, ext4
for example, the 2nd and 3rd copies of the btrfs SB are not overwritten.
btrfs_read_dev_super() would then look for a backup SB when the primary
SB isn't found. This causes the problem shown in the reproducer below.
In the kernel we avoid this by _not_ reading the backup SB implicitly;
this patch ports the same behavior to btrfs-progs.
reproducer:
mkfs.btrfs /dev/sde
mkfs.ext4 /dev/sde
mount /dev/sde /ext4
btrfs-convert /dev/sde (is successful (bug))
with this patch:
btrfs-convert /dev/sde
/dev/sde is mounted
Signed-off-by: Anand Jain <Anand.Jain@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
If we are cycling through all of the mirrors trying to find the best one we need
to make sure we set best_mirror to an actual mirror number and not 0. Otherwise
we could end up reading a mirror that wasn't the best and make everybody sad.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Currently, as of 8cae1840af when running
btrfs-convert I get a bus error.
The problem is that struct btrfs_key has __attribute__ ((__packed__))
so it is not aligned. Then, a pointer to its objectid field is taken,
cast to a void*, then eventually cast back to a u64* and
dereferenced. The problem is that the dereferenced u64* is not
necessarily aligned (i.e., not necessarily a valid u64*), resulting in
undefined behavior.
This patch adds a local u64 variable which would of course be properly
aligned and then uses a pointer to that.
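A small sketch of the idea, under the assumption that a generic helper receives the value through a void*. struct demo_key is a simplified stand-in (its field order is chosen to force a misaligned objectid) and the function names are hypothetical:

```c
#include <stdint.h>

/* Packed stand-in for a key: objectid sits at offset 1, so it is misaligned. */
struct demo_key {
	uint8_t type;
	uint64_t objectid;
	uint64_t offset;
} __attribute__((__packed__));

/* Generic consumer that expects a pointer to a properly aligned u64. */
static uint64_t demo_read_u64(const void *data)
{
	return *(const uint64_t *)data;
}

static uint64_t demo_lookup(const struct demo_key *key)
{
	/*
	 * Accessing the member through the packed struct type is safe;
	 * the compiler emits an unaligned-capable load for it.
	 */
	uint64_t objectid = key->objectid;

	/* Hand out a pointer to the aligned local, not to the packed member. */
	return demo_read_u64(&objectid);
}
```

Passing `&key->objectid` to demo_read_u64() instead would recreate the misaligned dereference described above.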
I did not modify the call from btrfs_fs_roots_compare_roots as that
uses struct btrfs_root which is a regular struct and would thus have
its members correctly aligned to begin with.
After patching this I realized Liu Bo had already written a similar
patch, but I think mine is cleaner, so I'm sending it anyway.
Signed-off-by: Ivan Jager <aij+@mrph.org>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
This patch makes btrfsck open the disk in exclusive mode,
so that mount will fail while btrfsck is running.
Signed-off-by: Anand Jain <Anand.Jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
The following steps could trigger btrfs segfault:
mkfs -t btrfs -m raid5 -d raid5 /dev/loop{0..3}
losetup -d /dev/loop2
btrfs check /dev/loop0
The reason is that read_tree_block() returns NULL and
add_root_to_pending() dereferences it without checking it first.
Also replace a BUG_ON with proper error checking.
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Internally, btrfs_header_chunk_tree_uuid() calculates an unsigned
long, but casts it to a pointer, while all callers cast it to unsigned
long again.
From btrfs commit b308bc2f05a86e728bd035e21a4974acd05f4d1e
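For illustration only, a sketch of an accessor that returns the field offset as an unsigned long directly, so no pointer round trip is needed. struct demo_header is a simplified stand-in for the on-disk header and the accessor takes no arguments here, which is an assumption of this example rather than the real signature:

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the start of the on-disk tree block header. */
struct demo_header {
	uint8_t csum[32];
	uint8_t fsid[16];
	uint64_t bytenr;
	uint64_t flags;
	uint8_t chunk_tree_uuid[16];
} __attribute__((__packed__));

/* Return the offset as a number; callers no longer cast a fake pointer back. */
static inline unsigned long demo_header_chunk_tree_uuid(void)
{
	return offsetof(struct demo_header, chunk_tree_uuid);
}
```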
Signed-off-by: Ross Kirk <ross.kirk@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Unfortunately you can't run --init-extent-tree if you can't actually read the
extent root. Fix this by allowing partial starts with no extent root and then
have fsck only check to see if the extent root is uptodate _after_ the check to
see if we are init'ing the extent tree. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
So I needed to add a flag to not try to read block groups when doing
--init-extent-tree since we could hang there, but that meant adding a whole
other 0/1 type flag to open_ctree_fs_info. So instead I've converted it all
over to using a flags setting and added the flag that I needed. This has been
tested with xfstests and make test. Thanks,
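A rough sketch of what replacing several 0/1 arguments with one flags word looks like. The DEMO_* names and the helper are hypothetical and only illustrate the bitmask idea:

```c
/* One bit per option; new options no longer change the function signature. */
enum demo_ctree_flags {
	DEMO_OPEN_WRITES          = 1 << 0,
	DEMO_OPEN_PARTIAL         = 1 << 1,
	DEMO_OPEN_NO_BLOCK_GROUPS = 1 << 2,
};

/* Returns 1 if block groups should be read for this set of flags. */
static int demo_should_read_block_groups(unsigned int flags)
{
	return !(flags & DEMO_OPEN_NO_BLOCK_GROUPS);
}
```

A caller would pass something like `DEMO_OPEN_WRITES | DEMO_OPEN_NO_BLOCK_GROUPS` instead of a growing list of int parameters.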
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
In some cases the tree root is so hosed we can't get anything useful out of it.
So add the -b option to btrfsck to make us look for the most recent backup tree
root to use for repair. Then we can hopefully get ourselves into a working
state. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
mkfs -r wasn't creating chunks properly, making it very difficult to
allocate space for anything except tiny filesystems.
This changes it around to use more of the generic infrastructure, and
to do actual logical->physical block number translation.
It also allocates space to the files in smaller extents (max 1MB), which
keeps the allocator from trying to allocate an extent bigger than a
single chunk.
It doesn't quite support multi-device mkfs -r yet, but is much closer.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
A user was reporting an issue with bad transid errors on his blocks. The thing
is that btrfs-progs will ignore transid failures for things like restore and
fsck so we can do a best effort to fix a user's file system. So fsck can put
together a coherent view of the file system with stale blocks. So if everything
else is ok in the mind of fsck then we can recow these blocks to fix the
generation and the user can get their file system back. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Internally, btrfs_header_fsid() calculates an unsigned long, but casts
it to a pointer, while all callers cast it to unsigned long again.
Committed to btrfs as fba6aa75654394fccf2530041e9451414c28084f
Fix line length issues and match changes to kernelspace
Signed-off-by: Ross Kirk <ross.kirk@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Remove unused parameter, 'eb'. Unused since introduction in
7777e63b42
Signed-off-by: Ross Kirk <ross.kirk@gmail.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Until now, if the first superblock of one of the devices is corrupt,
btrfs will fail to mount. Luckily, btrfs has at least two superblocks
on every disk.
In theory, if silent corruption happens while we are writing the
superblocks to disk, we must still hold at least one good superblock.
One side effect is that the user must guarantee that the disk is
a btrfs disk. Otherwise, this tool may destroy another fs. (This is also
the reason why btrfs only uses the first superblock on every disk to mount.)
This little program will try to repair bad superblocks from the
good superblock with the max generation.
There will be five kinds of return values:
0: all supers are valid, no need to recover
1: usage or syntax error
2: all bad superblocks recovered successfully
3: failed to recover bad superblocks
4: recovery of bad superblocks aborted
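The selection rule described above (use the good superblock with the max generation as the source) could be sketched like this. The struct and function names are hypothetical, and validity is reduced to a single flag for the example:

```c
#include <stdint.h>

struct demo_super_copy {
	int valid;		/* nonzero if magic and checksum looked fine */
	uint64_t generation;
};

/* Returns the index of the valid copy with the highest generation, or -1. */
static int demo_pick_recovery_source(const struct demo_super_copy *copies, int nr)
{
	int best = -1;
	int i;

	for (i = 0; i < nr; i++) {
		if (!copies[i].valid)
			continue;
		if (best < 0 || copies[i].generation > copies[best].generation)
			best = i;
	}
	return best;
}
```

A result of -1 would correspond to the case where nothing can be recovered because no good copy exists.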
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
If some superblocks are fatally damaged, running the ioctl will fail;
in this case, we should avoid running the ioctl.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
As a result of a successful call to btrfs_read_sys_array(), the 'ret'
variable is already set to 0. Hence the function would return 0 even
if the call to read_tree_block() fails.
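A minimal sketch of the bug class, with hypothetical stand-in functions: once an earlier step has succeeded, ret is 0, so a later failure path has to set it explicitly before returning.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two steps; names are illustrative only. */
static int demo_read_sys_array(void)
{
	return 0;
}

static void *demo_read_tree_block(int fail)
{
	static int block;

	return fail ? NULL : &block;
}

static int demo_setup_chunk_root(int fail_read)
{
	void *node;
	int ret;

	ret = demo_read_sys_array();
	if (ret)
		return ret;

	node = demo_read_tree_block(fail_read);
	if (!node) {
		/* Without this assignment, ret would still be 0 from above. */
		ret = -EIO;
		return ret;
	}
	return 0;
}
```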
Signed-off-by: chandan <chandan@linux.vnet.ibm.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
In files copied from the kernel, mark many functions as static,
and remove any resulting dead code.
Some functions are left unmarked if they aren't static in the
kernel tree.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Port of commit b3b4aa7 to userspace.
The tree root parameter has not been used since commit
5f39d397dfbe140a14edecd4e73c34ce23c4f9ee ("Btrfs: Create extent_buffer
interface for large blocksizes")
This gets userspace a tad closer to kernelspace by removing
this unused parameter that was all over the codebase...
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Most of the time, open_ctree_*() uses the first superblock
(BTRFS_SUPER_INFO_OFFSET). btrfs-convert does not, however, so
we should pass the correct sb_bytenr to btrfs_scan_fs_devices() rather
than always using BTRFS_SUPER_INFO_OFFSET. This patch fixes the following
regression:
mkfs.ext2 <dev>
btrfs-convert <dev>
warning, device 1 is missing
Check tree block failed, want=2670592, have=0
read block failed check_tree_block
Couldn't read chunk root
Segmentation fault (core dumped)
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
btrfs_scan_for_fsid uses only one of its 3 arguments, run_ioctl,
so remove the other two.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Some code still uses cpu_to_lexx instead of the
BTRFS_SETGET_STACK_FUNCS declared in ctree.h.
Also add some BTRFS_SETGET_STACK_FUNCS for btrfs_header and
btrfs_super.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This adds a 'btrfs-image -m' option, which lets us restore an image that
was built from a multi-disk btrfs onto several disks at once.
This aims to address the following case:
$ mkfs.btrfs -m raid0 sda sdb
$ btrfs-image sda image.file
$ btrfs-image -r image.file sdc
---------
Here we can only restore metadata onto sdc, and we can only mount sdc
in degraded mode because we don't provide information about the
other disk. And since it was built as RAID0 and we have only one disk,
mounting sdc puts us into readonly mode.
This is just annoying for people (like me) who try to restore an image
only to find they cannot make it work.
So this'll make your life easier, just type
$ btrfs-image -m image.file sdc sdd
---------
and you get all of the metadata restored, at the same offsets as
the originals (of course, you need to offer enough disk space, at least the
size of the original disks).
Besides, this also works with raid5 and raid6 metadata images.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Otherwise we will access illegal addresses while searching on fs_uuid list.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Add chunk-recover program to check or rebuild chunk tree when the system
chunk array or chunk tree is broken.
Because the system chunk array and the chunk tree are so important, if one of
them is broken, the whole btrfs is broken even if the other data are OK.
But we have some hints (fsid, checksum...) that let us salvage the old metadata.
So this function first scans the whole file system and collects the
needed data (chunk/block group/dev extent), then checks the references
between them. If the references are OK, the chunk tree can be rebuilt and,
with luck, the file system will be mountable.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
The fs/file roots are not extents, so it is better to use an rb-tree
to manage them. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
In fact, the code of many rb-tree insert/search/delete functions is similar,
so we can abstract them, and implement common functions for rb-tree, and then
simplify them.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Some commands (such as btrfs-convert) access the devices again after we close
the ctree, so it is better not to free the device objects when the ctree
is closed; otherwise we would need to re-allocate the memory for the devices.
We needn't worry about memory leaks, because all the memory is freed when the
tasks die.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
As we know, the file descriptor 0 is a special number, so we shouldn't
use it to initialize the file descriptor of the devices, or we might
close this special file descriptor by mistake when we close the devices.
"-1" is a better choice.
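A small sketch of why -1 is the safer initial value. struct demo_device and both helpers are hypothetical and stand in for the real device bookkeeping:

```c
#include <unistd.h>

struct demo_device {
	int fd;
};

static void demo_device_init(struct demo_device *dev)
{
	/* -1 is never a valid descriptor, while 0 is normally stdin. */
	dev->fd = -1;
}

/* Returns 1 if a descriptor was closed, 0 if there was nothing to close. */
static int demo_device_close(struct demo_device *dev)
{
	if (dev->fd < 0)
		return 0;
	close(dev->fd);
	dev->fd = -1;
	return 1;
}
```

With a zero-initialized fd, the same close path would call close(0) on a device that was never opened.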
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
The tree log bug I introduced could create inconsistent file extent entries in
the file system tree and in some worst cases even create multiple extent entries
for the same entry. To fix this we need to do a few things
1) Keep track of extent items that overlap and then pick the one that covers the
largest area and delete the rest of the items.
2) Keep track of file extent items that land in extent items but don't match
disk_bytenr/disk_num_bytes exactly. Once we find these we need to figure out
who is the right ref and then fix all of the other refs to agree.
Each of these cases requires a complete rescan of all of the extents, so
unfortunately if you hit this particular problem the fsck is going to take quite
a while since it will likely rescan all the trees 2 or 3 times. With this patch
the broken file system a user sent me is fixed and a broken file system that was
created by my reproducer is also fixed. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Only the first byte of the wanted csum is printed:
checksum verify failed on 65536 found DA97CF61 wanted 6B
checksum verify failed on 65536 found DA97CF61 wanted 6BC3870D
Also add leading zeros to the format.
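A hedged sketch of the two points, assuming a 4-byte checksum: load all four bytes rather than only the first, and print with a zero-padded width of eight. Both helper names are hypothetical, and the value is kept in host byte order here for simplicity:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Load the whole 32-bit checksum from the raw bytes, not just raw[0]. */
static uint32_t demo_load_csum(const uint8_t *raw)
{
	uint32_t csum;

	memcpy(&csum, raw, sizeof(csum));
	return csum;
}

/* Write the checksum as exactly eight hex digits, zero-padded. */
static int demo_format_csum(char *buf, size_t len, uint32_t csum)
{
	return snprintf(buf, len, "%08X", (unsigned int)csum);
}
```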
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
All we need for restore to work is the chunk root, the tree root and the fs root
we want to restore from. So to do this we need to make a few adjustments
1) Make open_ctree_fs_info fail completely if it can't read the chunk tree.
There is no sense in continuing if we can't read the chunk tree since we won't
be able to translate logical to physical blocks.
2) Use open_ctree_fs_info in restore, and if we didn't load a tree root or
fs root go ahead and try to set those up manually ourselves.
This is related to work I did last year on restore, but it uses the
open_ctree_fs_info instead of my open coded open_ctree. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
I've been working on btrfs-image and I kept seeing these leaks pop up on
valgrind so I'm just fixing them. We don't properly cleanup the device cache,
the chunk tree mapping cache, or the space infos on close. With this patch
valgrind doesn't complain about any memory leaks running btrfs-image. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
We just free the log root after we set it up when we open a ctree in the tools.
This isn't nice: it causes double frees, leaks ebs, and makes btrfs-image
segfault. So fix this to be correct, and fix the cleanup if the buffer is
not uptodate. With this fix I no longer segfault trying to do btrfs-image on a
file system with a log tree. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Allocate fs_info::super_copy dynamically with the full BTRFS_SUPER_INFO_SIZE
and use it directly for saving the superblock to disk.
This fixes incorrect superblock checksum after mkfs.
Signed-off-by: David Sterba <dsterba@suse.cz>
It seems highly unlikely that posix_fadvise could fail,
and even if it does, it was only advisory. Still, if
it does, we could issue a notice to the user.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Free the memory allocated to "multi" before the error
exit in read_whole_eb(). Set it to NULL after we free
it in the loop to avoid any potential double-free.
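The shape of the fix can be sketched as follows. This is not the real read_whole_eb(); demo_read_all and its parameters are hypothetical, and malloc() stands in for whatever allocates "multi":

```c
#include <errno.h>
#include <stdlib.h>

/* Simulates a read loop; fail_at selects the iteration that hits an error. */
static int demo_read_all(int nr_chunks, int fail_at)
{
	void *multi = NULL;
	int i;

	for (i = 0; i < nr_chunks; i++) {
		multi = malloc(64);
		if (!multi)
			return -ENOMEM;

		if (i == fail_at) {
			/* Error exit: release the mapping before returning. */
			free(multi);
			return -EIO;
		}

		free(multi);
		/* Reset so that any later free(multi) is a harmless no-op. */
		multi = NULL;
	}

	free(multi);	/* multi is NULL here, so this cannot double-free */
	return 0;
}
```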
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Instead of doing a BUG_ON() if we fail to find the last fs root just return
an error so the callers can deal with it how they like. Also we need to
actually return an error if we can't find the latest root so that the error
handling works. With this btrfsck was able to deal with a file system that
was missing a root item but still had extents that referred back to the
root. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>