The search header is usually accessed in an unaligned way, which could
trigger errors (SIGBUS) on architectures that do not support that.
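
A minimal sketch of the usual remedy, assuming a hypothetical
sketch_search_header layout (not the real ioctl structure): the header is
copied out of the byte buffer with memcpy() instead of being read through a
cast pointer, so the access is valid at any offset.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified header; field names are illustrative only. */
struct sketch_search_header {
	uint64_t transid;
	uint64_t objectid;
	uint32_t type;
	uint32_t len;
};

/*
 * Copy a header out of a byte buffer at an arbitrary offset.
 * memcpy() makes no alignment assumption, unlike a pointer cast
 * followed by a direct member access.
 */
static struct sketch_search_header
sketch_read_header(const unsigned char *buf, size_t off)
{
	struct sketch_search_header sh;

	memcpy(&sh, buf + off, sizeof(sh));
	return sh;
}
```

The copy costs a few bytes of stack per header, which is usually acceptable
for an ioctl result walk.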
Signed-off-by: David Sterba <dsterba@suse.com>
Nodesize is what the kernel uses, and the two values are always equal. We
have to keep leafsize in the headers; similarly, the tree setting functions
still take and set leafsize, but it's effectively a no-op.
Signed-off-by: David Sterba <dsterba@suse.com>
Currently open_ctree_fs_info() won't return anything if the chunk tree
root is corrupted.
This makes some tools, like btrfs-find-root, unable to find any older
chunk tree root, even though it is possible to use system_chunk_array in
the super block.
At least two users on the mailing list have reported such heavy chunk
tree corruption.
Although we have 'btrfs rescue chunk-recover', it is too time
consuming and sometimes not able to cope with a specific filesystem
corruption.
This patch adds a new open ctree flag,
OPEN_CTREE_IGNORE_CHUNK_TREE_ERROR, allowing fs_info to be returned from
open_ctree_fs_info() even if there is no valid tree root in it.
It also adds a new close_ctree() variant, close_ctree_fs_info(), to handle
a possible fs_info without any root.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[ adjusted error messages ]
Signed-off-by: David Sterba <dsterba@suse.com>
The BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE bit is supposed to be in the
COMPAT_RO_SUPP bitmask.
Reported-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This reuses the existing code for checking the free space cache, we just
need to load the free space tree. While we do that, we check a couple of
invariants on the free space tree itself. This requires pulling in some
code from the kernel to exclude the super stripes.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
To start, let's tell btrfs-progs to read the free space root and how to
print the on-disk format of the free space tree. However, we're not
adding the FREE_SPACE_TREE read-only compat bit to the set of supported
bits because progs doesn't know how to keep the free space tree
consistent.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The sequence, transid and reserved fields of the inode were written to disk
with uninitialized values. This patch fixes it.
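
One way to avoid this class of bug, sketched with a hypothetical cut-down
inode item (the real structure has more fields, and the actual fix may set
the fields individually): zero the whole structure before filling in the
fields that are known.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical cut-down inode item; names follow the message above. */
struct sketch_inode_item {
	uint64_t generation;
	uint64_t transid;
	uint64_t size;
	uint64_t sequence;
	uint64_t reserved[4];
};

/* Zero everything first so unset fields never carry stack garbage. */
static void sketch_init_inode_item(struct sketch_inode_item *item,
				   uint64_t generation, uint64_t size)
{
	memset(item, 0, sizeof(*item));
	item->generation = generation;
	item->size = size;
}
```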
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
s/generation/sequence/
for BTRFS_SETGET_STACK_FUNCS(stack_inode_sequence, ...)
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
As convert implements its own extent allocation, avoid such metadata problem
there too.
Reported-by: Chris Murphy <lists@colorremedies.com>
Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently there is no way for a user to know the minimum size a
device of a btrfs filesystem can be resized to. Sometimes the value of
total allocated space (sum of all allocated chunks/device extents), which
can be parsed from 'btrfs filesystem show' and 'btrfs filesystem usage',
works as the minimum size, but sometimes it does not, namely when device
extents have to be relocated to holes (unallocated space) within the new
size of the device (the total allocated space sum).
This change adds the ability to reliably compute such a minimum value and
extends 'btrfs filesystem resize' with the following syntax to get such
a value:
btrfs filesystem resize [devid:]get_min_size
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function will be used to free an empty chunk.
This provides the basis for later temp chunk cleanup.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently btrfs only supports CRC32 as the checksum algorithm.
But the btrfs_csum_sizes array has an extra 0 at the tail, so
csum_type 1 is still considered a supported csum type.
Fix it by removing the trailing 0.
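
The effect can be sketched as follows, assuming the support check is a
bounds check against the table length (the sketch_ names are hypothetical
and only mirror the array described above):

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Before: the trailing 0 makes the table two entries long. */
static const int sketch_csum_sizes_old[] = { 4, 0 };
/* After: only CRC32 (type 0, 4 bytes) is listed. */
static const int sketch_csum_sizes_new[] = { 4 };

/* A csum type is accepted when it indexes into the size table. */
static int sketch_csum_type_ok(uint16_t csum_type, size_t table_len)
{
	return csum_type < table_len;
}
```

With the old table, type 1 passes the check and would map to a checksum size
of 0; with the new table it is rejected.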
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
This function is used to change fsid and chunk_tree_uuid of a node/leaf.
The function does it without transaction protection.
This is the basis of offline uuid change.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Now open_ctree will exit if it finds the superblock is marked
CHANGING_FSID, unless the IGNORE_FSID open ctree flag is given.
The kernel will do the same thing later.
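
A rough sketch of the check described above. The bit values and the sketch_
names are placeholders for illustration, not the definitions used in ctree.h
or disk-io.h:

```c
#include <stdint.h>

/* Placeholder bit values, chosen for this sketch only. */
#define SKETCH_SUPER_FLAG_CHANGING_FSID	(1ULL << 35)
#define SKETCH_OPEN_IGNORE_FSID		(1U << 0)

/* Return 0 if the open may proceed, -1 if it must be refused. */
static int sketch_check_changing_fsid(uint64_t super_flags,
				      unsigned int open_flags)
{
	if (!(super_flags & SKETCH_SUPER_FLAG_CHANGING_FSID))
		return 0;	/* no fsid change in progress */
	if (open_flags & SKETCH_OPEN_IGNORE_FSID)
		return 0;	/* caller explicitly asked to continue */
	return -1;
}
```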
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[removed the chunk tree flag, reworded the error message]
Signed-off-by: David Sterba <dsterba@suse.cz>
Add the super flag to inform the kernel not to mount a filesystem whose fsid
change is in progress.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[removed the chunk tree flag]
Signed-off-by: David Sterba <dsterba@suse.cz>
We have this check in the kernel but not in userspace, which makes fsck
fail when we wouldn't have a problem in the kernel. This was meant to
catch this case because it really isn't good, unfortunately it will
require a design change to fix in the kernel so in the meantime add this
check so we can be sure our tests only catch real problems. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
This provides the basis for later qgroup related changes.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
ctree.h of btrfs-progs contains wrong flags for btrfs_qgroup_status.
Update it with the ones from the kernel.
Also, introduce the inline functions btrfs_qgroup_(level/subvid) to get
the level/subvolid of a qgroup, replacing the old open-coded bit
operations.
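
A sketch of what such helpers look like, assuming the usual qgroupid layout
with the level in the top 16 bits and the subvolume id in the low 48 bits.
The sketch_ names are stand-ins for the helpers named above:

```c
#include <stdint.h>

/* Assumed layout: level in the top 16 bits, subvolume id in the low 48. */
#define SKETCH_QGROUP_LEVEL_SHIFT 48

static inline uint64_t sketch_qgroup_level(uint64_t qgroupid)
{
	return qgroupid >> SKETCH_QGROUP_LEVEL_SHIFT;
}

static inline uint64_t sketch_qgroup_subvid(uint64_t qgroupid)
{
	return qgroupid & ((1ULL << SKETCH_QGROUP_LEVEL_SHIFT) - 1);
}
```

For example, the qgroup written as 1/257 would be stored as
(1ULL << 48) | 257 under this assumption.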
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Add a new open ctree flag, OPEN_CTREE_SUPPRESS_CHECK_BLOCK_ERRORS, to
suppress tree block csum error output.
This provides the basis for the new btrfs-find-root and other enhancements
to the output of the btrfs offline tools.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[renamed vars and funcs, added comments]
Signed-off-by: David Sterba <dsterba@suse.cz>
We hold a transaction open for the entirety of fixing extent refs. This works
out ok most of the time but we can be tight on space and run out of space when
fixing things. To get around this just push down the transaction starting dance
into the functions that actually fix things. This keeps us from ending up with
ENOSPC because we pinned everything and allows the code to be a bit simpler.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
The METADUMP super flag makes us skip doing the chunk tree reading which isn't
helpful for the new restore since we have a valid chunk tree. But we still want
to have a way for the kernel to know that this is a metadump restore so it
doesn't do things like verify data checksums. We also want to skip some of the
device extent checks in fsck since those will obviously not match. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Add btrfs_get_extent() and btrfs_punch_hole() for btrfs-progs.
btrfs_get_extent() acts much like the kernel one and returns the first
extent that covers the given range.
The difference is that the progs btrfs_get_extent() can't handle the
no-holes feature, which means the caller should handle it carefully.
btrfs_punch_hole() punches a hole in the given range of the given inode,
however it differs from the kernel one since it won't zero any page or drop
any extents if there is any extent in the hole range.
These functions are mainly used by the later I_ERR_FILE_EXTENT_DISCOUNT
repair function.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Before this patch, when an extent's data ref pointed to an invalid key in
the fs tree, which happens if a leaf/node of the fs tree is corrupted, btrfsck
couldn't do any repair and just exited.
In fact, such a problem can be handled in the fs tree repair routines: rebuild
the inode item (if missing) and add back the extent data (with some
assumptions).
So this patch records such data extent refs for the later fs tree recovery
routine.
TODO:
Restore orphan data extent refs into btrfs_root is not the best
method. It's best to directly restore it into inode_record, however
current extent tree and fs tree can't cooperate together, so use
btrfs_root as a temporary storage until inode_cache is built.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Add a basic inode item rebuild function for I_ERR_NO_INODE_ITEM.
The main use case is to repair a btrfs whose fs root has a corrupted leaf,
but it already works for the case where the corrupted fs root leaf/node
contains no inode extent_data.
The repair needs 3 elements for inode rebuild:
1. inode number
This is quite easy, the existing inode_record code will detect it quite
well.
2. inode type
This is the tricky part. The only reliable method is to recover it from
the parent's dir_index/item.
The remaining method will search for a regular file extent for FILE
type or a child's backref for DIR (todo).
The fallback will be FILE.
The inode name (inode_ref) will be recovered by the nlink repair function.
This is just a basic implementation, some advanced recovery can be
improved later with btrfs-progs infrastructure changes.
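
The order of preference for the inode type described above can be sketched
as a small decision function. All names here are hypothetical, and the DIR
branch corresponds to the part marked as todo:

```c
enum sketch_itype { SKETCH_UNKNOWN, SKETCH_FILE, SKETCH_DIR };

/*
 * Pick an inode type for a rebuilt inode item:
 * 1) trust the parent's dir_index/item if it gave us a type,
 * 2) otherwise a regular file extent implies FILE,
 * 3) otherwise a child's backref implies DIR (todo in the patch),
 * 4) otherwise fall back to FILE.
 */
static enum sketch_itype sketch_guess_type(enum sketch_itype from_parent,
					   int has_file_extent,
					   int has_child_backref)
{
	if (from_parent != SKETCH_UNKNOWN)
		return from_parent;
	if (has_file_extent)
		return SKETCH_FILE;
	if (has_child_backref)
		return SKETCH_DIR;
	return SKETCH_FILE;
}
```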
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
With the previous btrfs inode operations patches, now we can use
btrfs_mkdir() to create the 'lost+found' dir to do some data salvage in
btrfsck.
This patch along with previous ones will make data salvage easier.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Add btrfs_unlink() and btrfs_add_link() functions in inode.c,
for the incoming btrfs_mkdir() and later inode operations functions.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Import lookup/del_inode_ref() function in inode-item.c, as base functions
for the incoming btrfs_add_link() and btrfs_unlink() functions.
Also modify btrfs_insert_inode_ref() and split_leaf() making them able
to deal with EXTENT_IREF incompat flag.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Import btrfs_insert/del/lookup_extref() functions from the kernel for the
incoming btrfs_add_link() and btrfs_unlink() functions.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Enhance the command "btrfs filesystem df" to show space usage information
for one or more mount points. It also shows an estimate of the space available,
on the basis of the current one used.
Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
[code moved under #if 0 instead of deletion]
Signed-off-by: David Sterba <dsterba@suse.cz>
We may run across dir indexes that are corrupt in such a way that it makes them
useless, such as having a bad location key or a bad name. In this case we can
just delete dir indexes that don't show up properly and then re-create what we
need. When we delete dir indexes however we need to restart scanning the fs
tree as we could have created bogus inode recs if the location key was bad, so
set it up so that if we had to delete a dir index we go ahead and free up our
inode recs and return -EAGAIN to check_fs_roots so it knows to restart the loop.
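
The restart pattern can be sketched like this. The functions are simplified
stand-ins, not the real check_fs_roots(); a counter takes the place of the
corrupt dir indexes:

```c
#include <errno.h>

/*
 * Hypothetical single pass: "deletes" one corrupt dir index if any
 * remain and asks the caller to start over, since cached inode recs
 * may have been built from the bad location key.
 */
static int sketch_check_pass(int *bad_dir_indexes)
{
	if (*bad_dir_indexes > 0) {
		(*bad_dir_indexes)--;
		return -EAGAIN;
	}
	return 0;
}

/* Repeat the scan until a pass completes without deleting anything. */
static int sketch_check_fs_roots(int bad_dir_indexes, int *passes)
{
	int ret;

	*passes = 0;
	do {
		(*passes)++;
		ret = sketch_check_pass(&bad_dir_indexes);
	} while (ret == -EAGAIN);
	return ret;
}
```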
Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
This patch pulls back backref.c, adds a couple of helpers everywhere that it
needs, and cleans up backref.c to fit in btrfs-progs. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
[removed free_some_buffers after "do not reclaim extent buffer"]
Signed-off-by: David Sterba <dsterba@suse.cz>
Kernels >= 3.15 export the global block reserve as a space info presented
by 'btrfs fi df' but would display 'unknown' instead of some meaningful
string.
Signed-off-by: David Sterba <dsterba@suse.cz>
The three flags of @btrfs_path:
	btrfs_path {
		unsigned int keep_locks:1;
		unsigned int skip_locking:1;
		unsigned int leave_spinning:1;
	}
have little meaning, because the userspace @btrfs_search_slot()
is free of locking and no other routines will decide their behavior
on these. So just remove them.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
The chunk-recover.c/BTRFS_NUM_MIRRORS in the userspace means
the same thing as ctree.h/BTRFS_MAX_MIRRORS in the kernelspace,
so to stay consistent with the kernelspace, just make this movement
in the userspace:
chunk-recover.c/BTRFS_NUM_MIRRORS
===>
ctree.h/BTRFS_MAX_MIRRORS
This provides convenience for future use.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
This patch adds functionality (in qgroup-verify.c) to compute bytecounts in
subvolume quota groups. The original groups are read in and stored in memory
so that after we compute our own bytecounts, we can compare them with those
on disk. A print function is provided to do this comparison and show the
results on the console.
A 'qgroup check' pass is added to btrfsck. If any subvolume quota groups
differ from what we compute, the differences for them are printed. We also
provide an option '--qgroup-report' which will run only the quota check code
and print a report on all quota groups. Other than making it possible to
verify that our qgroup changes work correctly, this mode can also be used in
xfstests for automated checking after qgroup tests.
This patch does not address the following:
- compressed counts are identical to non compressed, because kernel doesn't
make the distinction yet. Adding the code to verify compressed counts
shouldn't be hard at all though once kernel can do this.
- It is only concerned with subvolume quota groups (like most of
btrfs-progs).
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
The following kernel commit changed the definition of the inline function
btrfs_file_extent_inline_len():
commit 514ac8ad8793a097c0c9d89202c642479d6dfa34
Author: Chris Mason <clm@fb.com>
Date: Fri Jan 3 21:07:00 2014 -0800
Btrfs: don't use ram_bytes for uncompressed inline items
If we truncate an uncompressed inline item, ram_bytes isn't updated to reflect
the new size. The fix uses the size directly from the item header when
reading uncompressed inlines, and also fixes truncate to update the
size as it goes.
Not having this new definition implies that the restore tool might misbehave when
restoring files with an inline extent that got truncated on a kernel older than
release 3.14.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
This adds the flag to ctree.h, adds the feature option to mkfs to turn it on and
fixes fsck so it doesn't complain about missing hole extents in files when this
flag is set.
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
A user had a fs where the objectid of an orphan item was not the actual orphan
item objectid. This screwed up fsck because the block has keys in the wrong
order, also the fs scanning stuff will freak out because we have an inode with
nlink 0 and no orphan item. So this patch is pretty big but is all related.
1) Deal with bad key ordering. We can easily fix this up, so fix the checking
stuff to tell us exactly what it found when it said there was a problem. Then
if it's bad key ordering we can reorder the keys and restart the scan.
2) Deal with bad keys. If we find an orphan item with the wrong objectid it's
likely to screw with stuff, so keep track of these sort of things with a
bad_item list and just run through and delete any objects that don't make sense.
So far we just do this for orphan items but we could extend this as new stuff
pops up.
3) Deal with missing orphan items. This is easy, if we have a file with i_nlink
set to 0 and no orphan item we can just add an orphan item.
4) Add the infrastructure to corrupt actual key values. Needed this to create a
test image to verify I was fixing things properly.
This patch fixes the corrupt image I'm adding and passes the other make test
tests. Thanks,
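
For point 1, a sketch of detecting bad key ordering, assuming keys compare
by objectid, then type, then offset. The structure and function names are
illustrative, not the ones used in fsck:

```c
#include <stdint.h>

/* Simplified in-memory key; compared as (objectid, type, offset). */
struct sketch_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

static int sketch_comp_keys(const struct sketch_key *a,
			    const struct sketch_key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/*
 * Return the first slot whose key is not strictly greater than the
 * key before it, or -1 if the block is properly ordered.
 */
static int sketch_first_bad_slot(const struct sketch_key *keys, int nr)
{
	int i;

	for (i = 1; i < nr; i++)
		if (sketch_comp_keys(&keys[i - 1], &keys[i]) >= 0)
			return i;
	return -1;
}
```

Knowing the exact slot is what lets the repair code reorder the keys and
restart the scan instead of only reporting that something is wrong.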
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
When reading block groups we search for the corresponding chunk. However, at this
time some chunks have not been built yet (data chunks raid0/raid10/raid56). Don't BUG_ON here,
we will try to rebuild these chunks later.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Internally, btrfs_header_chunk_tree_uuid() calculates an unsigned
long, but casts it to a pointer, while all callers cast it to unsigned
long again.
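
A sketch of the resulting shape, using a simplified stand-in for the on-disk
header (only the leading fields are shown, and the real helper takes an
extent buffer argument): the helper returns the offset as an unsigned long,
so neither side needs a cast.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the on-disk header; leading fields only. */
struct sketch_header {
	uint8_t csum[32];
	uint8_t fsid[16];
	uint64_t bytenr;
	uint64_t flags;
	uint8_t chunk_tree_uuid[16];
};

/* Return the field offset directly as an unsigned long. */
static inline unsigned long sketch_header_chunk_tree_uuid(void)
{
	return offsetof(struct sketch_header, chunk_tree_uuid);
}
```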
From btrfs commit b308bc2f05a86e728bd035e21a4974acd05f4d1e
Signed-off-by: Ross Kirk <ross.kirk@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
mkfs -r wasn't creating chunks properly, making it very difficult to
allocate space for anything except tiny filesystems.
This changes it around to use more of the generic infrastructure, and
to do actual logical->physical block number translation.
It also allocates space to the files in smaller extents (max 1MB), which
keeps the allocator from trying to allocate an extent bigger than a
single chunk.
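
The extent splitting can be sketched as below, assuming the 1MB cap
mentioned above. The names are hypothetical and the real code also has to
allocate and write each extent:

```c
#include <stdint.h>

/* Cap taken from the message above: at most 1MB per extent. */
#define SKETCH_MAX_EXTENT_SIZE (1024ULL * 1024ULL)

/* Length of the next extent for a file with `remaining` bytes left. */
static uint64_t sketch_next_extent_len(uint64_t remaining)
{
	return remaining < SKETCH_MAX_EXTENT_SIZE ?
		remaining : SKETCH_MAX_EXTENT_SIZE;
}

/* Number of extents a file of `file_size` bytes is split into. */
static uint64_t sketch_count_extents(uint64_t file_size)
{
	uint64_t count = 0;

	while (file_size > 0) {
		file_size -= sketch_next_extent_len(file_size);
		count++;
	}
	return count;
}
```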
It doesn't quite support multi-device mkfs -r yet, but is much closer.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
The current show_qgroups() just shows a little information, and it is hard to
add functions that users may need in the future, so I restructured it to make
it easy to add new functions.
In order to improve the extensibility of show_qgroups(), I added some important
structures:
struct qgroup_lookup {
	struct rb_root root;
}
/*
 * store qgroup's information
 */
struct btrfs_qgroup {
	struct rb_node rb_node;
	u64 qgroupid;
	u64 generation;
	u64 rfer;
	u64 rfer_cmpr;
	u64 excl_cmpr;
	u64 flags;
	u64 max_rfer;
	u64 max_excl;
	u64 rsv_rfer;
	u64 rsv_excl;
	struct list_head qgroups;
	struct list_head members;
}
/*
 * glue structure to represent the relations
 * between qgroups
 */
struct btrfs_qgroup_list {
	struct list_head next_qgroups;
	struct list_head next_member;
	struct btrfs_qgroup *qgroup;
	struct btrfs_qgroup *member;
}
The above 3 structures are used to manage all the information
of qgroups.
struct {
	char *name;
	char *column_name;
	int need_print;
} btrfs_qgroup_columns[]
We define an array to manage all the columns that can be
output, and use a member variable (->need_print) to control
the output of the corresponding column. Some columns are output
by default, but we can change that according to the requirements
of the users.
For example:
if outputting the max referenced size of a qgroup is needed, the function
'btrfs_qgroup_setup_column()' will be called, and the parameter 'BTRFS_QGROUP_MAX_RFER'
(to be extended in the future) will be passed to the function. After the function is done,
the max referenced size of each qgroup will be output when showing qgroups.
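
The column mechanism can be sketched as follows. The column set, the default
values and all sketch_ names are assumptions for illustration; only the
name/column_name/need_print shape comes from the description above:

```c
/* Hypothetical column ids; the real enum has more entries. */
enum sketch_column {
	SKETCH_COL_QGROUPID,
	SKETCH_COL_RFER,
	SKETCH_COL_EXCL,
	SKETCH_COL_MAX_RFER,
	SKETCH_COL_COUNT
};

static struct {
	const char *name;
	const char *column_name;
	int need_print;
} sketch_columns[SKETCH_COL_COUNT] = {
	[SKETCH_COL_QGROUPID] = { "qgroupid", "qgroupid", 1 },
	[SKETCH_COL_RFER]     = { "rfer", "rfer", 1 },
	[SKETCH_COL_EXCL]     = { "excl", "excl", 1 },
	[SKETCH_COL_MAX_RFER] = { "max_rfer", "max_rfer", 0 },
};

/* Turn on output of one column, as a setup_column() call would. */
static void sketch_setup_column(enum sketch_column col)
{
	sketch_columns[col].need_print = 1;
}

/* Count the columns that would currently be printed. */
static int sketch_printed_columns(void)
{
	int i, n = 0;

	for (i = 0; i < SKETCH_COL_COUNT; i++)
		if (sketch_columns[i].need_print)
			n++;
	return n;
}
```

Adding a new column then only means adding one enum value and one table
entry; the printing loop itself does not change.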
Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
A user was reporting an issue with bad transid errors on his blocks. The thing
is that btrfs-progs will ignore transid failures for things like restore and
fsck so we can do a best effort to fix a user's file system. So fsck can put
together a coherent view of the file system with stale blocks. So if everything
else is ok in the mind of fsck then we can recow these blocks to fix the
generation and the user can get their file system back. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This is preparatory work for the btrfs fi show command
fixes, so that we have a function get_df to get the fs sizes.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Internally, btrfs_header_fsid() calculates an unsigned long, but casts
it to a pointer, while all callers cast it to unsigned long again.
Committed to btrfs as fba6aa75654394fccf2530041e9451414c28084f
Fix line length issues and match changes to kernelspace
Signed-off-by: Ross Kirk <ross.kirk@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>