extent-tree.c: In function 'btrfs_free_block_groups':
extent-tree.c:3190:12: warning: cast to pointer from integer of
different size [-Wint-to-pointer-cast]
Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Extend mkfs options to specify optional or potentially backwards
incompatible features.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Check whether any involved device is already busy running a
scrub. This would cause damaged status messages and the state
"aborted" without the explanation that a scrub was already
running. Therefore check it first, prevent it and give some
feedback to the user if scrub is already running.
Note that if scrub is started with a block device as the
parameter, only that particular block device is checked. It
is a normal mode of operation to start scrub on multiple
single devices, so there is no reason to prevent this.
Here is an example:
/mnt2 is the mountpoint of a filesystem.
/dev/sdk and /dev/sdl are the block devices for that filesystem.
case 1:
btrfs scrub start /mnt2
btrfs scrub start /mnt2
-> complain
case 2:
btrfs scrub start /dev/sdk
btrfs scrub start /dev/sdk
-> complain
case 3:
btrfs scrub start /dev/sdk
btrfs scrub start /dev/sdl
-> don't complain
case 4:
btrfs scrub start /dev/sdk
btrfs scrub start /mnt2
-> complain
case 5:
btrfs scrub start /mnt2
btrfs scrub start /dev/sdk
-> complain if the scrub on /dev/sdk is still running.
-> don't complain if the scrub on /dev/sdk is finished, the
status messages will be fine.
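The five cases above reduce to one rule: complain when any device the new
request would scrub still has a scrub running. A minimal sketch of that rule,
assuming devices are modelled as bits in a mask. The names
scrub_would_conflict, DEV_SDK, DEV_SDL and DEV_ALL are hypothetical and are
not taken from the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device bits for the two-device example filesystem. */
#define DEV_SDK (1u << 0)
#define DEV_SDL (1u << 1)
#define DEV_ALL (DEV_SDK | DEV_SDL)	/* what "scrub start /mnt2" covers */

/*
 * running:   devices that currently have a scrub in progress
 * requested: devices the new "scrub start" would touch
 * Returns true if the request has to be refused with a message.
 */
static bool scrub_would_conflict(uint32_t running, uint32_t requested)
{
	return (running & requested) != 0;
}
```

With this model, case 3 is (DEV_SDK, DEV_SDL) and does not conflict, while
case 4 is (DEV_SDK, DEV_ALL) and does.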
Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Current way of specifying the path to match is not very comfortable, but
the feature itself is very useful. Let's save the short option -m for a
more user friendly syntax and keep a long option --path-regex with the
current syntax.
CC: Peter Stuge <peter@stuge.se>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
We were unconditionally executing our regular expression, even though we may not
have one, so check to make sure mreg is not null before calling regexec.
Thanks,
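A minimal sketch of the guard described above, assuming the compiled
expression is passed around as a pointer that is NULL when no pattern was
given. The helper name path_is_wanted is hypothetical; only regexec() is a
real POSIX call:

```c
#include <regex.h>
#include <stddef.h>

/*
 * Returns 1 if the path should be restored, 0 if it is filtered out.
 * mreg may be NULL when no regular expression was supplied; in that
 * case every path is accepted and regexec() is never called.
 */
static int path_is_wanted(const regex_t *mreg, const char *path)
{
	if (!mreg)
		return 1;
	return regexec(mreg, path, 0, NULL, 0) == 0;
}
```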
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
The option -m is used to specify the regex string. -c is used to
specify case insensitive matching. -i was already taken.
In order to restore only a single folder somewhere in the btrfs
tree, it is unfortunately necessary to construct a slightly
nontrivial regex, e.g.:
restore -m '^/(|home(|/username(|/Desktop(|/.*))))$' /dev/sdb2 /output
This is needed in order to match each directory along the way to the
Desktop directory, as well as all contents below the Desktop directory.
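To see why every level of the path has to be matched, here is a small sketch
that runs the pattern from the example against a few paths. It assumes the
pattern is compiled as a POSIX extended regular expression. Empty alternatives
such as "(|home...)" are accepted by glibc but are not guaranteed by POSIX, so
a compile failure is reported separately. The helper name
matches_desktop_pattern is hypothetical:

```c
#include <regex.h>
#include <stddef.h>

/* The pattern from the example above, unchanged. */
static const char *desktop_pattern =
	"^/(|home(|/username(|/Desktop(|/.*))))$";

/*
 * Returns 1 if path matches the pattern, 0 if it does not, and -1 if
 * the pattern could not be compiled by this regex implementation.
 */
static int matches_desktop_pattern(const char *path)
{
	regex_t re;
	int ret;

	if (regcomp(&re, desktop_pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	ret = regexec(&re, path, 0, NULL, 0) == 0;
	regfree(&re);
	return ret;
}
```

"/", "/home" and "/home/username" all match, so the walk can descend to
Desktop, while a sibling such as "/home/other" is rejected.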
Signed-off-by: Peter Stuge <peter@stuge.se>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Pass up return value of walk_down_tree, so the caller can handle it.
This also fixes a segfault when read_tree_block fails with NULL returned.
Signed-off-by: Lin Ming <mlin@kernel.org>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
filesystem show was missing from the SYNOPSIS section.
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This commit adds a command line option to enable sending streams
which make use of the new end-cmd semantic if multiple snapshots are
sent back-to-back. The goal is to use the <end cmd> as an indication
to stop reading the input stream. So far, the receiver could only
use EOF to recognize the end.
If the new command line option '-e' is set, this commit requires a
kernel which is able to support the new flags in the send ioctl. New
bits in the flags of the send ioctl will be set which cause EINVAL
on old kernels. However, if the option '-e' is not set, it works
with old and new kernels without any errors or any changed behavior.
This used to be the encoding (with 2 snapshots in this example):
<stream header> + <sequence of commands> + <end cmd> +
<stream header> + <sequence of commands> + <end cmd> + EOF
The new format (if the two new flags are used) is this one:
<stream header> + <sequence of commands> +
<sequence of commands> + <end cmd>
Note that the currently existing receivers treat <end cmd> only as
an indication that a new <stream header> is following. This means,
you can just skip the sequence <end cmd> <stream header> without
losing compatibility. As long as an EOF is following, the currently
existing receivers handle the new format (if the two new flags are
used) exactly as the old one.
Also note that the kernel interface was changed in a way that is
backward compatible to old btrfs-progs tools. You set one or two bits
in the flags field of the ioctl to enable the new behavior. Old tools
set these flags to zero, thus getting exactly the same as they got
with older kernels. And this is exactly what happens if the new '-e'
option is not set: the new bits in the flags are not set and thus
old kernels and new kernels are both supported.
So what is the benefit of this change? The goal is to be able to use
a single stream (one TCP connection) to multiplex a request/response
handshake plus Btrfs send streams, all in the same stream. In this
case you cannot evaluate an EOF condition as an end of the Btrfs send
stream. You need something else, and the <end cmd> is just perfect
for this purpose.
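A toy sketch of a receiver that stops at the end command instead of waiting
for EOF, so that whatever follows on the same connection is left untouched.
The command tags and the function toy_receive are hypothetical and do not
reflect the real send stream encoding:

```c
#include <stddef.h>

/* Toy command tags; these are NOT the real send-stream encoding. */
enum toy_cmd { TOY_CMD_DATA, TOY_CMD_END, TOY_HANDSHAKE };

/*
 * Consume commands until the end command is seen. Returns the number of
 * items consumed (including the end command), or 0 if the input ran out
 * before an end command appeared.
 */
static size_t toy_receive(const enum toy_cmd *in, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (in[i] == TOY_CMD_END)
			return i + 1;
	}
	return 0;
}
```

Anything after the returned position, such as the next handshake message,
stays available to the caller.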
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
btrfsck gets hardlinked to btrfs during the build, but the
install phase simply copies them both to the destination without
preserving the link.
Just force-link btrfsck in the destination again during install
so that the installed btrfsck is a link as well.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
btrfs_init_path was initially used when the path objects were on the
stack. Now all the work is done by btrfs_alloc_path and btrfs_init_path
isn't required.
This patch removes it, and just uses kmem_cache_zalloc to zero out the object.
[Eric Sandeen: port kernel commit e00f730 to userspace]
(Note, the rest of userspace has an on-stack path, so the actual
function remains for now).
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
The code path should not reach there. Remove it.
[Eric Sandeen: port kernel commit 3fed40c to userspace]
Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
extent_ref_type() contains inconsequential differences between
kernelspace and userspace, and has since the initial commits
to each. Just make userspace look like kernelspace.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
div_factor has been implemented twice; clean it up.
I also move the helpers into an independent file named math.h because they
are common math functions.
[Eric Sandeen: port kernel commit 3fed40c to userspace]
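For reference, a userspace sketch of what such helpers can look like. It is
modelled on the helper named above, but it is an illustration and not the
patch itself: plain 64-bit division stands in for the kernel's do_div(), and
the second helper, div_factor_fine, is an assumption about what else lives in
math.h:

```c
#include <stdint.h>

/* Scale num by factor/10, e.g. factor 7 yields 70% of num. */
static inline uint64_t div_factor(uint64_t num, int factor)
{
	if (factor == 10)
		return num;
	num *= factor;
	return num / 10;
}

/* Same idea with a finer granularity: scale num by factor/100. */
static inline uint64_t div_factor_fine(uint64_t num, int factor)
{
	if (factor == 100)
		return num;
	num *= factor;
	return num / 100;
}
```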
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Remove some commented-out & #if 0'd code:
* close_blocks()
* btrfs_drop_snapshot()
* btrfs_realloc_node()
* btrfs_find_dead_roots()
There are still some #if 0'd functions in there, but I'm hedging
on those for now, they have been copied to cmds-check.c and I want
to see if they can be brought back into ctree.c eventually.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
cmds-check.c contains the only caller of btrfs_fsck_reinit_root;
moving it to the caller's source file gets ctree.c a little
closer to kernelspace, although it does require exporting
add_root_to_dirty_list(), which is not done in kernelspace.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Otherwise we can exceed the array bound of path->slots[].
[Eric Sandeen: port kernel commit a05a9bb to userspace]
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Files with only #include directives are boring. :)
This is just a leftover after the move to the btrfs tool.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
check_extent_refs is pinning down all the corrupt tree blocks it finds,
but it is incorrectly casting these to an extent_record first.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
If a device could not be opened in volumes.c:read_one_dev(), a
btrfs_device instance was allocated and added to the list of
devices of the fs - however this device instance had its fd,
name and label fields not initialized. This is problematic in
disk-io.c:close_all_devices() as it tried to sync, fadvise and
close the (invalid) fd of the device, and kfree() its name and
label, which pointed to random memory locations.
Thread 1 (Thread 0x7f0a3d2d1740 (LWP 23585)):
#0 __GI___libc_free (mem=0xa5a5a5a5a5a5a5a5) at malloc.c:2970
#1 0x000000000042054b in close_all_devices (fs_info=0x1e92bf0) at disk-io.c:1276
#2 0x0000000000421dcd in close_ctree (root=<optimized out>) at disk-io.c:1336
#3 0x0000000000418cfa in cmd_check (argc=<optimized out>, argv=<optimized out>) at cmds-check.c:4171
#4 0x0000000000403ed4 in main (argc=2, argv=0x7fff9a583d28) at btrfs.c:295
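One way to make the teardown path safe for a device that was never opened is
to put every field into a recognisable "unset" state at allocation time. The
sketch below illustrates that idea only; struct toy_device and both functions
are hypothetical and are not the actual fix:

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical cut-down device record; the real struct has more fields. */
struct toy_device {
	int fd;
	char *name;
	char *label;
};

/* Put every field into a state that teardown can recognise as "unset". */
static void toy_device_init(struct toy_device *dev)
{
	dev->fd = -1;		/* -1, not 0: 0 is a valid descriptor */
	dev->name = NULL;
	dev->label = NULL;
}

/* Returns 1 if a descriptor was closed, 0 if the device was never opened. */
static int toy_device_teardown(struct toy_device *dev)
{
	int closed = 0;

	if (dev->fd >= 0) {
		close(dev->fd);
		dev->fd = -1;
		closed = 1;
	}
	free(dev->name);	/* free(NULL) is a no-op */
	free(dev->label);
	dev->name = NULL;
	dev->label = NULL;
	return closed;
}
```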
v2: Added Liu Bo's review mention.
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This adds a 'btrfs-image -m' option, which lets us restore an image that
was built from a multi-disk btrfs onto several disks at once.
This aims to address the following case,
$ mkfs.btrfs -m raid0 sda sdb
$ btrfs-image sda image.file
$ btrfs-image -r image.file sdc
---------
Here we can only restore the metadata onto sdc, and we can only mount sdc
in degraded mode because no information about the other disk is provided.
And since the filesystem was built as RAID0 and we have only one disk,
mounting sdc ends up in read-only mode.
This is just annoying for people (like me) who try to restore an image
only to find they cannot make it work.
So this'll make your life easier, just type
$ btrfs-image -m image.file sdc sdd
---------
and all of the metadata is restored at the same offsets as on the
originals (of course, the target disks must be large enough, at least the
size of the original disks).
Besides, this also works with raid5 and raid6 metadata images.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Otherwise we will access illegal addresses while searching on fs_uuid list.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
A device can be added to the device list without getting a name, so we may
access illegal addresses while opening devices by their name.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
With skinny metadata, key.offset stores the level rather than the extent length.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
From the bytenr of an extent buffer record we can calculate the index of its
stripe, and we also know which device, and which offset on it, the record was
read from. Together these give the relationship between the device extents and
the stripes in the chunk, and with that relationship we can recover
raid0/raid10/raid5/raid6 metadata chunks.
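As an illustration of the first step, here is a sketch of the stripe index
calculation for the simplest striped layout. It assumes plain RAID0 striping
with a fixed stripe length; raid10, raid5 and raid6 need extra handling that
is not shown. The function name and parameters are hypothetical:

```c
#include <stdint.h>

/*
 * For a RAID0-style chunk, return which stripe holds the byte at logical
 * address bytenr. chunk_start is the chunk's logical start, stripe_len
 * the size of one stripe unit, num_stripes the number of devices striped
 * over. Callers must pass bytenr >= chunk_start, stripe_len > 0 and
 * num_stripes > 0.
 */
static unsigned int raid0_stripe_index(uint64_t bytenr, uint64_t chunk_start,
				       uint64_t stripe_len,
				       unsigned int num_stripes)
{
	uint64_t stripe_nr = (bytenr - chunk_start) / stripe_len;

	return (unsigned int)(stripe_nr % num_stripes);
}
```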
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Add chunk rebuild for RAID1/SINGLE/DUP to chunk-recover command.
Before this patch chunk-recover can only scan and reuse the old chunk
data to recover. With this patch, chunk-recover can use the reference
between chunk/block group/dev extent to rebuild the whole chunk tree
even when old chunks are not available.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Add chunk-recover program to check or rebuild chunk tree when the system
chunk array or chunk tree is broken.
Due to the importance of the system chunk array and chunk tree, if one of
them is broken, the whole btrfs will be broken even if the other data are OK.
But we have some hints (fsid, checksum...) to salvage the old metadata.
So this function will first scan the whole file system and collect the
needed data (chunk/block group/dev extent), and check the references
between them. If the references are OK, the chunk tree can be rebuilt and,
with luck, the file system will be mountable.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This patch adds the function to check correspondence between block group,
chunk and device extent.
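A simplified sketch of what such a correspondence check can look like. The
record types and the function toy_chunk_refs_ok are hypothetical and much
smaller than the real structures. The sketch only assumes that a block group
covers the same logical range as its chunk and that each stripe has one
device extent pointing back at the chunk:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified records; the real structures hold more. */
struct toy_chunk       { uint64_t offset, length; unsigned int num_stripes; };
struct toy_block_group { uint64_t start, length; };
struct toy_dev_extent  { uint64_t devid, chunk_offset, length; };

/*
 * A chunk is considered consistent when a block group covers exactly the
 * same logical range and every stripe has a device extent that points
 * back at the chunk.
 */
static bool toy_chunk_refs_ok(const struct toy_chunk *chunk,
			      const struct toy_block_group *bg,
			      const struct toy_dev_extent *exts, size_t n_exts)
{
	size_t i;

	if (bg->start != chunk->offset || bg->length != chunk->length)
		return false;
	if (n_exts != chunk->num_stripes)
		return false;
	for (i = 0; i < n_exts; i++) {
		if (exts[i].chunk_offset != chunk->offset)
			return false;
	}
	return true;
}
```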
Original-signed-off-by: Cheng Yang <chenyang.fnst@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
As we know, btrfs can manage several devices in the same fs, so [offset, size]
is not sufficient to uniquely identify a device extent. We need the device id
to tell apart device extents that have the same offset and size but are not on
the same device. So we added a member variable named objectid to the extent
cache, and introduced some functions to make the extent cache suitable for
managing device extents.
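A sketch of how the extra key field changes the ordering of cache entries.
The struct and the comparison function are hypothetical and reduced to the
key fields; the real extent cache may compare ranges differently:

```c
#include <stdint.h>

/* Hypothetical cut-down cache entry; only the key fields are shown. */
struct toy_cache_extent {
	uint64_t objectid;	/* device id when caching device extents */
	uint64_t start;
	uint64_t size;
};

/* Order entries by device first, then by start, then by size. */
static int toy_cache_extent_cmp(const struct toy_cache_extent *a,
				const struct toy_cache_extent *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->start != b->start)
		return a->start < b->start ? -1 : 1;
	if (a->size != b->size)
		return a->size < b->size ? -1 : 1;
	return 0;
}
```

Two device extents with the same [offset, size] on different devices now
compare as different entries instead of colliding.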
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Because the fs/file roots are not extents, it is better to use an rb-tree
to manage them. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
In fact, the code of many rb-tree insert/search/delete functions is similar,
so we can abstract them, and implement common functions for rb-tree, and then
simplify them.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Some commands (such as btrfs-convert) access the devices again after we close
the ctree, so it is better not to free the device objects when the ctree is
closed; otherwise we would need to re-allocate the memory for the devices. We
needn't worry about leaking memory, because all of it will be freed when the
task exits.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
As we know, the file descriptor 0 is a special number, so we shouldn't
use it to initialize the file descriptor of the devices, or we might
close this special file descriptor by mistake when we close the devices.
"-1" is a better choice.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
When making a btrfs filesystem, we first write the root leaf to the
specified field, and then we recow the root. If we don't recow,
some trees are not in the correct block group.
Steps to reproduce:
dd if=/dev/zero of=test.img bs=1M count=100
mkfs.btrfs -f test.img
btrfs-debug-tree test.img
extent tree key (EXTENT_TREE ROOT_ITEM 0)
leaf 4210688 items 10 free space 3349 generation 4 owner 2
fs uuid 2e08fd93-f24d-4f44-a226-e2116fcd544f
chunk uuid dc482988-6246-46ce-9329-68bcf6d3683c
item 0 key (0 BLOCK_GROUP_ITEM 4194304) itemoff 3971 itemsize 24
block group used 12288 chunk_objectid 256 flags 2
[..snip..]
item 3 key (1138688 EXTENT_ITEM 4096) itemoff 3827 itemsize 42
extent refs 1 gen 1 flags 2
tree block key (0 UNKNOWN.0 0) level 0
item 4 key (1138688 TREE_BLOCK_REF 7) itemoff 3827 itemsize 0
tree block backref
[..snip..]
checksum tree key (CSUM_TREE ROOT_ITEM 0)
leaf 1138688 items 0 free space 3995 generation 1 owner 7
fs uuid 2e08fd93-f24d-4f44-a226-e2116fcd544f
chunk uuid dc482988-6246-46ce-9329-68bcf6d3683c
In the above example, the csum root leaf ends up in the system block group,
which is wrong; the csum root leaf should be in the metadata block group.
Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
I noticed that I was getting these errors on a bigger file system with more
snapshots that had been removed. This check is bogus since we won't inc
rec->found_ref if we don't find a REF_KEY _and_ a DIR_ITEM, so we only have to
worry about there being no references to a root if it actually has a root item.
If it doesn't then it's just referenced by things that will go nowhere anyway.
With this patch fsck no longer incorrectly complains about this file system
image I have. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
A user reported that fsck was complaining about unresolved refs for some
snapshots. You can reproduce this by doing
mkfs.btrfs /dev/sdb
mount /dev/sdb /mnt
btrfs subvol snap /mnt/ /mnt/a
btrfs subvol snap /mnt/ /mnt/b
btrfs subvol del /mnt/a
umount /mnt
btrfsck /dev/sdb
and you'd get this
unresolved ref root 258 dir 256 index 2 namelen 1 name a error 600
because snapshot b has a dir item that points to a. Except we encode in our
root ref the dirid of the ref holder, and if it doesn't match we just give it
back an empty directory since we can't hardlink directories. This makes the
check in btrfsck bogus, when we delete a we remove the ref key for it so any
lookups into /mnt/b/a will just give a blank directory as it's supposed to. Fix
this by only saying the backref is reachable if there is both a DIR_ITEM and a
REF_KEY for the given root. With this patch I no longer see errors when running
this reproducer. Thanks,
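The rule can be written down as a one-line predicate. The struct and function
below are hypothetical stand-ins for the bookkeeping fsck does per backref:

```c
#include <stdbool.h>

/* Hypothetical cut-down backref record. */
struct toy_root_backref {
	bool found_dir_item;	/* a DIR_ITEM names the root */
	bool found_ref_key;	/* a matching REF_KEY exists */
};

/* A backref only counts as reachable when both halves were found. */
static bool toy_backref_reachable(const struct toy_root_backref *b)
{
	return b->found_dir_item && b->found_ref_key;
}
```

In the reproducer, the dir item for "a" inside snapshot b has no REF_KEY
after the delete, so it is no longer treated as a reachable reference.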
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
There is a problem where if we find a backref extent record first that doesn't
match an extent item we will delete some of the duplicates but not others. In
order to deal with this we need to make sure we only pay attention to duplicates
that actually have duplicate extent items. If an extent_rec has a duplicate but
the record itself doesn't have an associated extent item we promote the
duplicate to the extent record and just discard the original extent_rec since it
was just added by the backref. We copy the backref onto the promoted extent
record and then continue processing. This allowed me to fix a file system that
previously was not able to be fixed by fsck. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
The allocator looks for these hints when moving on to another block group,
which makes it reset the block group it looks at even though we've already
searched that block group and didn't find any space to allocate. Fix this by
just letting the allocator determine whether the block group is good enough.
This also fixes a problem where if we couldn't find space in the block group
we were given we'd just error out instead of moving on to the next block
group. Previously I couldn't fix some file systems that were relatively full,
but with this patch I can now run fsck on them with no allocation errors.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This fixes two bugs with the free space cache checker. First is we apparently
always use root->sectorsize for our unit in the kernel so we have to do that in
progs otherwise bitmaps turn out to not look right if we have leafsize !=
sectorsize. The second is a small issue if we had skinny metadata extents set,
we wouldn't advance last properly because we unconditionally use key.offset
instead of root->leafsize. Thanks,
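A sketch of the second fix: computing where an extent ends from its key. The
two key type values are the ones defined by the btrfs on-disk format. The
function toy_extent_end and its parameters are hypothetical:

```c
#include <stdint.h>

/* Key types as defined by the btrfs on-disk format. */
#define BTRFS_EXTENT_ITEM_KEY	168
#define BTRFS_METADATA_ITEM_KEY	169

/*
 * Return the end of the extent described by a key. For a skinny metadata
 * item the key offset holds the tree level, so the length has to come
 * from the leaf size instead.
 */
static uint64_t toy_extent_end(uint64_t objectid, uint8_t type,
			       uint64_t offset, uint32_t leafsize)
{
	if (type == BTRFS_METADATA_ITEM_KEY)
		return objectid + leafsize;
	return objectid + offset;
}
```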
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
In some cases the extent tree can just be so gone there is no point in trying to
figure out how to put it back together. So add a --init-extent-tree mode which
will zero out the extent tree and then re-add extents for all of the blocks we
find. This will also undo any balance that was going on at the time of the
crash, this is needed because the reloc tree seems to confuse fsck at the
moment. With this patch I can put back together a user's file system that was
completely gone. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Left out a newline in the generation check printf.
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
I noticed a slight problem with btrfs-image. Because it was building a chunk tree
by setting the physical offset of the stripes to the same as the logical offset,
the super block was now mapped into the file system differently than it was
before. This isn't a huge deal except that we also carry along the free space
cache with us, which is set up with the idea that super at physical X is at
logical Y. So this would make the free space checker
in fsck freak out because it would see that the cache says that the super block
is free space, and that the area where it thought the super block was located is
in fact used. In the mount case we'd end up overwriting real metadata with
backup super blocks. So we need to maintain the physical offsets in our
stripes. This is a huge pain because we store the logical bytenrs of all of our
metadata. This patch scans the entire image looking for chunk tree blocks and
builds an in memory chunk tree so we can write logical blocks to their physical
offsets. With this patch we no longer have the problems I described above.
Thanks,
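A sketch of the lookup that an in-memory chunk tree enables, assuming
single-stripe chunks kept in a flat array. The struct toy_chunk_map and the
function toy_logical_to_physical are hypothetical; the real code has to deal
with multiple stripes and a proper tree:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-stripe chunk mapping entry. */
struct toy_chunk_map {
	uint64_t logical;	/* start of the chunk in logical space */
	uint64_t length;
	uint64_t physical;	/* start of its stripe on the device */
};

/*
 * Translate a logical address to a physical one. Returns 0 and stores
 * the result in *physical on success, -1 if no chunk covers the address.
 */
static int toy_logical_to_physical(const struct toy_chunk_map *maps, size_t n,
				   uint64_t logical, uint64_t *physical)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (logical >= maps[i].logical &&
		    logical - maps[i].logical < maps[i].length) {
			*physical = maps[i].physical +
				    (logical - maps[i].logical);
			return 0;
		}
	}
	return -1;
}
```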
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>