A user had a fs where the objectid of an orphan item was not the actual orphan
item objectid. This screwed up fsck because the block had keys in the wrong
order, and the fs scanning code also freaks out because we have an inode with
nlink 0 and no orphan item. So this patch is pretty big, but it is all related.
1) Deal with bad key ordering. We can easily fix this up, so fix the checking
stuff to tell us exactly what it found when it said there was a problem. Then
if it's bad key ordering we can reorder the keys and restart the scan.
2) Deal with bad keys. If we find an orphan item with the wrong objectid it's
likely to screw with stuff, so keep track of these sorts of things with a
bad_item list and just run through and delete any objects that don't make sense.
So far we just do this for orphan items but we could extend this as new stuff
pops up.
3) Deal with missing orphan items. This is easy: if we have a file with i_nlink
set to 0 and no orphan item we can just add an orphan item.
4) Add the infrastructure to corrupt actual key values. Needed this to create a
test image to verify I was fixing things properly.
This patch fixes the corrupt image I'm adding and passes the other 'make test'
tests. Thanks,
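As a rough illustration of point 1, the sketch below detects and repairs bad
key ordering on a plain array of keys. All demo_* names are hypothetical and
are not the helpers added by this patch. Keys are compared by objectid, then
type, then offset. A real leaf also carries item data that has to move with
each key, which is left out here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a btrfs key: ordered by objectid, type, offset. */
struct demo_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Three-way comparison returning <0, 0 or >0, like memcmp(). */
static int demo_comp_keys(const struct demo_key *a, const struct demo_key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/*
 * Return the index of the first key that sorts before its predecessor,
 * or -1 if no inversion is found. Duplicate keys are not reported.
 */
static int demo_first_bad_key(const struct demo_key *keys, int nr)
{
	assert(nr >= 0);
	for (int i = 1; i < nr; i++)
		if (demo_comp_keys(&keys[i - 1], &keys[i]) > 0)
			return i;
	return -1;
}

/* Adapter so that qsort() can call the typed comparison. */
static int demo_qsort_cmp(const void *a, const void *b)
{
	return demo_comp_keys(a, b);
}

/* Reorder the keys in place; the caller would then restart its scan. */
static void demo_fix_key_order(struct demo_key *keys, int nr)
{
	assert(nr >= 0);
	qsort(keys, (size_t)nr, sizeof(*keys), demo_qsort_cmp);
}
```

Reporting the index of the offending key, rather than a bare "problem found",
is what lets the caller decide whether the block can be fixed up.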
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
When we re-init the extent root we make it completely empty, so when we reset a
pending balance we will fail to find refs for any blocks we may cow, which will
result in errors and we will exit out. We need to reset the balance first so
the normal cow stuff doesn't freak out and then we can re-init the extent tree.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Previously, open_file_or_dir() would also open a block device successfully.
Tighten the checks to make sure we are really opening a file or a directory.
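A minimal sketch of such a check, assuming it is done with stat() before the
open. The helper name demo_is_file_or_dir() is hypothetical and is not the
code in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Return 1 if path is a regular file or a directory, 0 if it is something
 * else (block device, char device, fifo, ...), or -errno if stat() fails.
 */
static int demo_is_file_or_dir(const char *path)
{
	struct stat st;

	assert(path != NULL);
	if (stat(path, &st) < 0)
		return -errno;
	return S_ISREG(st.st_mode) || S_ISDIR(st.st_mode);
}
```

A caller would refuse to go on when the result is not 1, so that a block
device path is rejected up front.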
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
The error msg:
"ERROR: defrag range ioctl not supported in this kernel,
please try without any options."
should only show up when a range defrag fails,
not when a non-range defrag fails.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
btrfstune operates on unmounted devices <device>,
not mount points <mnt>. Fix it.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Steps to reproduce:
# mkfs.btrfs -f /dev/sda
# mount /dev/sda /mnt
# btrfs subvolume create /mnt/foo
# umount /mnt
# mount -o subvol=foo /dev/sda /mnt
# btrfs sub snapshot -r /mnt /mnt/snap
# btrfs send /mnt/snap > /dev/null
We will fail to send '/mnt/snap'. This is because btrfs send tries to
open '/mnt/snap' by the btrfs internal subvolume path 'foo/snap' rather
than by the path relative to the mount point, which returns 'no such
file or directory'. This is not right, fix it.
Reported-by: Thomas Scheiblauer <tom@sharkbay.at>
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
I sometimes get a segfault in cmd_scrub_status(). This is because
free_history() forgot to check whether the pointer is valid. Fix it.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
I hit a problem where I could not start a scrub while trying to track down
superblock generation mismatch problems.
We check in userspace whether a scrub operation has already been started, so
a scrub cannot be started if the record file itself is damaged. Adding an
option to skip that check solves this.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
btrfsck reports backref error after running init-csum-tree
btrfsck --init-csum-tree /dev/sdc
btrfsck /dev/sdc
::
ref mismatch on [29474816 16384] extent item 1, found 0
Backref 29474816 root 7 not referenced back 0x1101d30
Incorrect global backref count on 29474816 found 1 wanted 0
backpointer mismatch on [29474816 16384]
owner ref check failed [29474816 16384]
Errors found in extent allocation tree or chunk allocation
::
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Originally, multiple devices were scanned one by one.
Now one thread per device is used for scanning.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Decide the raid0/5/6 data stripes' order using checksums.
For one chunk, fetch each 64k logical stripe
1. search its checksum in the csum tree
2. read the physical stripe data on each device
3. calc the data checksums
4. if one checksum matches the value from the csum tree,
then the logical stripe resides in that device,
the stripe order index can be calculated.
5. if more than one checksum matches,
then use the successive csum in the tree to compare again.
6. if equal stripes are encountered, just fetch next stripe.
7. if the order of some devices is still not decided, then they
can not be recovered.
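The matching in steps 4 and 5 can be sketched roughly as below. This is only
an illustration under assumptions: the demo_* names and return codes are
hypothetical, and the checksums are taken as already computed 32-bit values,
one per device, for the same 64k stripe.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NO_MATCH  (-1)	/* no device holds this logical stripe */
#define DEMO_AMBIGUOUS (-2)	/* several devices match, compare the next csum */

/*
 * Compare the checksum found in the csum tree against the checksums computed
 * from the stripe data read on each device. Returns the index of the single
 * matching device, DEMO_NO_MATCH, or DEMO_AMBIGUOUS.
 */
static int demo_match_stripe(uint32_t wanted, const uint32_t *dev_csums,
			     int num_devs)
{
	int found = DEMO_NO_MATCH;

	assert(num_devs > 0);
	for (int i = 0; i < num_devs; i++) {
		if (dev_csums[i] != wanted)
			continue;
		if (found != DEMO_NO_MATCH)
			return DEMO_AMBIGUOUS;
		found = i;
	}
	return found;
}
```

On DEMO_AMBIGUOUS the caller would retry with the successive csum from the
tree, as step 5 describes.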
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
If no chunks need to be recovered, skip the recovery work so that
the user is not bothered by the "ask_user" prompt.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
When reading block groups we search for the corresponding chunk of each one. However, at this
time some chunks have not been built yet (data chunks raid0/raid10/raid56). Don't bug_on here,
we will try to rebuild these chunks later.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
getmntent should be used together with the other *mntent functions
(setmntent/endmntent), even though fopen/fclose works.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
btrfs_scan_kernel() does a getmntent() but never releases the
file descriptor it gets back from that.
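For reference, a minimal sketch of the usual setmntent/getmntent/endmntent
pattern on glibc. The function demo_count_mounts() and its arguments are
hypothetical and not part of btrfs-progs. The point is only that every
successful setmntent() is paired with an endmntent().

```c
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/*
 * Count the entries of the given filesystem type in a mount table file.
 * Returns -1 if the table cannot be opened.
 */
static int demo_count_mounts(const char *table, const char *fstype)
{
	struct mntent *mnt;
	FILE *f;
	int count = 0;

	assert(table != NULL && fstype != NULL);
	f = setmntent(table, "r");
	if (!f)
		return -1;
	while ((mnt = getmntent(f)) != NULL)
		if (strcmp(mnt->mnt_type, fstype) == 0)
			count++;
	endmntent(f);	/* closes the stream; skipping this leaks the descriptor */
	return count;
}
```

A call such as demo_count_mounts("/proc/self/mounts", "btrfs") would be a
typical use on Linux.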
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64711
Reported-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Currently we set @refs to 2 when creating a new extent buffer and allocate the
needed free space at the same time, but we don't call free_extent_buffer()
enough times to drop the eb's references to zero so that the eb can finally be
freed. The problem is that once we have decreased the reference count of the
backrefs to zero, the space occupied by the eb is released and can be allocated
again for something else (another eb or disk), and a crash (core dump) usually
follows. I've hit a crash in rb_insert() because another eb re-used the space
while the original one was still floating around.
We should do the same thing as the kernel code does: initialize @refs to 1
instead of 2, which gets rid of the above problem.
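To make the counting concrete, here is a small sketch of a reference count
that starts at 1. The struct and the demo_eb_* helpers are hypothetical
stand-ins, not the extent buffer code itself. With an initial count of 1,
one put per get plus one final put is enough to free the object.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an extent buffer; only the refcount matters. */
struct demo_eb {
	int refs;
};

/* A new buffer starts with a single reference owned by the creator. */
static struct demo_eb *demo_eb_alloc(void)
{
	struct demo_eb *eb = calloc(1, sizeof(*eb));

	if (eb)
		eb->refs = 1;
	return eb;
}

/* Take an additional reference. */
static void demo_eb_get(struct demo_eb *eb)
{
	assert(eb->refs > 0);
	eb->refs++;
}

/* Drop one reference; returns 1 if the buffer was freed, 0 otherwise. */
static int demo_eb_put(struct demo_eb *eb)
{
	assert(eb->refs > 0);
	if (--eb->refs > 0)
		return 0;
	free(eb);
	return 1;
}
```

If demo_eb_alloc() started at 2 instead, the same sequence of calls would
leave the buffer allocated with one stale reference.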
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
When allocating the chunk root node we should use nodesize rather than sectorsize,
otherwise this causes a regression when another nodesize is chosen (for example, the 16k size now).
Reported-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Internally, btrfs_header_chunk_tree_uuid() calculates an unsigned
long, but casts it to a pointer, while all callers cast it to unsigned
long again.
From btrfs commit b308bc2f05a86e728bd035e21a4974acd05f4d1e
Signed-off-by: Ross Kirk <ross.kirk@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
We use 37 as the allocation size to hold the uuid_unparse output; this
defines BTRFS_UUID_UNPARSE_SIZE for it.
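Where the 37 comes from: the text form of a UUID is 32 hex digits plus 4
dashes, and one more byte is needed for the terminating NUL. The sketch below
shows this with a hand-written formatter so that it does not depend on
libuuid. The demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_UUID_SIZE		16
#define DEMO_UUID_UNPARSE_SIZE	37	/* 32 hex digits + 4 dashes + NUL */

/* Format a 16-byte UUID as xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx. */
static void demo_uuid_unparse(const unsigned char uuid[DEMO_UUID_SIZE],
			      char out[DEMO_UUID_UNPARSE_SIZE])
{
	int n = snprintf(out, DEMO_UUID_UNPARSE_SIZE,
		"%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
		"%02x%02x%02x%02x%02x%02x",
		uuid[0], uuid[1], uuid[2], uuid[3], uuid[4], uuid[5],
		uuid[6], uuid[7], uuid[8], uuid[9], uuid[10], uuid[11],
		uuid[12], uuid[13], uuid[14], uuid[15]);

	/* 36 characters are written, the 37th byte is the NUL. */
	assert(n == DEMO_UUID_UNPARSE_SIZE - 1);
	(void)n;
}
```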
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
get_label prints the label at the moment. Change this so that
the label is returned and printing is done by the caller.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Steps to reproduce:
# mkfs.btrfs -f <dev>
# mount <dev> <mnt>
# mkdir <mnt>/backup
# btrfs sub create <mnt>/subv
# btrfs sub snapshot -r <mnt>/subv <mnt>/snap1
# btrfs sub snapshot -r <mnt>/subv <mnt>/snap2
# btrfs send <mnt>/snap2 -p <mnt>/snap1 -f sent_file
# btrfs receive -f sent_file <mnt>/backup
The above steps make btrfs receive fail with "ERROR: can not find
parent subvolume". This is because we try to find the parent subvolume by
RECEIVED_SUBVOL_KEY, which returns ENOENT if the parent snapshot has not
been sent or has been deleted. Actually, we can try harder to find out
whether the parent subvolume exists by searching for the uuid key.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Reviewed-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
When creating a fs on a loop device, mkfs checks that the same file
is not already mounted, but if the backing file of another loop dev does
not exist, mkfs fails. This fixes a bug during openSUSE installation.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
As we know, a new fs doesn't have a space cache, so we set the cache generation
of the super block to -1ULL, which is not equal to the fs generation. But the
check program didn't consider this case and output the following message
cache and super generation don't match, space cache will be invalidated
directly, which would baffle users. So we should avoid outputting this
message. This patch fixes the problem.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Unfortunately you can't run --init-extent-tree if you can't actually read the
extent root. Fix this by allowing partial starts with no extent root and then
have fsck only check to see if the extent root is uptodate _after_ the check to
see if we are init'ing the extent tree. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
With the design revamp around filesystem show, the fsid filter
by label wasn't planned, but apparently it is
necessary. This patch fixes it.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Commit "btrfs-progs: separate command and implementation of
chunk-recover code" moved contents of this file to chunk-recover.c but
failed to remove the file cmds-chunk.c
Reported-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Running btrfsck on a btrfs with snapshots that are in the process of
being dropped triggers a prompt on "ref mismatch".
However, we do not want this kind of prompt, since a remount
will continue the drop.
Here the prompt is meaningless.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
If a given filesystem is mounted more than once, btrfs fi show will
print dups. This adds a quick and dirty hash table of fsids it
has already printed and makes sure we don't print any fsid more than
once.
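A sketch of what such a quick and dirty table could look like, assuming
16-byte fsids and a fixed number of buckets with chaining. The names, the
bucket count and the hash function are all illustrative and not taken from
the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_FSID_SIZE	16
#define DEMO_BUCKETS	64

/* One chained entry per fsid that has already been printed. */
struct demo_seen {
	unsigned char fsid[DEMO_FSID_SIZE];
	struct demo_seen *next;
};

static struct demo_seen *demo_table[DEMO_BUCKETS];

/* Simple multiplicative hash over the fsid bytes. */
static unsigned int demo_hash(const unsigned char *fsid)
{
	unsigned int h = 0;

	for (int i = 0; i < DEMO_FSID_SIZE; i++)
		h = h * 31 + fsid[i];
	return h % DEMO_BUCKETS;
}

/*
 * Returns 1 if the fsid was seen before, 0 if it was newly recorded,
 * -1 if the entry could not be allocated.
 */
static int demo_seen_fsid(const unsigned char *fsid)
{
	struct demo_seen *e;
	unsigned int slot;

	assert(fsid != NULL);
	slot = demo_hash(fsid);
	for (e = demo_table[slot]; e; e = e->next)
		if (memcmp(e->fsid, fsid, DEMO_FSID_SIZE) == 0)
			return 1;

	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	memcpy(e->fsid, fsid, DEMO_FSID_SIZE);
	e->next = demo_table[slot];
	demo_table[slot] = e;
	return 0;
}
```

The printing loop would skip any filesystem for which demo_seen_fsid()
returns 1.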
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This fixes the regression introduced with the patch
btrfs-progs: avoid write to the disk before sure to create fs
What happened is that the patch missed the check to see if the
user has set the option before pushing the defaults.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
The feature has been introduced in kernel 3.7 and enabling it by
default is desired.
All features enabled by default are marked as such in
'mkfs.btrfs -O list-all' output.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This adds a way of disabling features that are on by default in case they
are not wanted, e.g. due to lack of support in the kernel in use.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This fixes static compile target of btrfs-progs.
Signed-off-by: Emil Karlson <jekarlson@gmail.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
16KB is faster and leads to less metadata fragmentation in almost all
workloads. It does slightly increase lock contention on the root nodes
in some workloads, but that is best dealt with by adding more subvolumes
(for now).
This uses 16KB or the page size, whichever is bigger. If you're doing a
mixed block group mkfs, it uses the sectorsize instead.
Since the kernel refuses to mount a mixed block group FS where the
metadata leaf size doesn't match the data sectorsize, this also adds a
similar check during mkfs.
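The selection rule described above can be sketched as follows. The function
names and the constant are hypothetical, the real mkfs code may be structured
differently.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_DEFAULT_NODESIZE	(16u * 1024u)

/*
 * Pick the default metadata node size: sectorsize for mixed block groups,
 * otherwise 16KB or the page size, whichever is bigger.
 */
static uint32_t demo_pick_nodesize(uint32_t page_size, uint32_t sectorsize,
				   int mixed)
{
	assert(page_size > 0 && sectorsize > 0);
	if (mixed)
		return sectorsize;
	return page_size > DEMO_DEFAULT_NODESIZE ? page_size
						 : DEMO_DEFAULT_NODESIZE;
}

/* For mixed block groups the leaf size has to match the data sectorsize. */
static int demo_mixed_sizes_ok(uint32_t nodesize, uint32_t sectorsize)
{
	return nodesize == sectorsize;
}
```

With 4KB pages this yields 16KB, with 64KB pages it yields 64KB, and a mixed
block group mkfs with a 4KB sectorsize stays at 4KB.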
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
We intentionally fall through these case statements;
just annotate it to be clear.
Resolves-Coverity-CID: 1054884
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
We intentionally fall through these case statements;
just annotate it to be clear.
Resolves-Coverity-CID: 1054887
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Even if it's "definitely" btrfs at this point,
btrfs_scan_one_device could fail for other reasons.
Check the return value, warn if it fails, and skip
the device register.
Resolves-Coverity-CID: 1125925
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
e0a04278 removed a bunch of dead code but left one little
bit; reinit is always 0, so btrfs_read_block_groups is
never called from here.
Resolves-Coverity-CID: 1125926
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
get_df returns a negative error number, but then
we pass it to strerror, which wants a positive value...
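A minimal sketch of the fix, assuming the usual convention of returning 0 on
success and -errno on failure. The wrapper name demo_strerror() is
hypothetical. The negative return value has to be negated before it is
handed to strerror().

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Translate a 0 / -errno style return value into a message.
 * strerror() expects the positive errno value, so flip the sign.
 */
static const char *demo_strerror(int ret)
{
	assert(ret <= 0);
	return strerror(-ret);
}
```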
Resolves-Coverity-CID: 1125929
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
open can fail, of course.
Resolves-Coverity-CID: 1125925
Resolves-Coverity-CID: 1125930
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>