Currently there is not way for a user to know what is the minimum size a
device of a btrfs filesystem can be resized to. Sometimes the value of
total allocated space (sum of all allocated chunks/device extents), which
can be parsed from 'btrfs filesystem show' and 'btrfs filesystem usage',
works as the minimum size, but sometimes it does not, namely when device
extents have to relocated to holes (unallocated space) within the new
size of the device (the total allocated space sum).
This change adds the ability to reliably compute such minimum value and
extents 'btrfs filesystem resize' with the following syntax to get such
value:
btrfs filesystem resize [devid:]get_min_size
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function will be used to free a empty chunk.
This provides the basis for later temp chunk cleanup.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Current btrfs only support CRC32 as checksum algorithm.
But in btrfs_csum_sizes array, we have an extra 0 at tail, causing
csum_type 1 can still be considered as supported csum type.
Fix it by removing the tailing 0.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
This function is used to change fsid and chunk_tree_uuid of a node/leaf.
The function does it without transaction protection.
This is the basis of offline uuid change.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Now open_ctree will exit if it found the superblock is marked
CHANGING_FSID, except given IGNORE_FSID open ctree flags.
Kernel will do the same thing later.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[removed the chunk tree flag, reworded the error message]
Signed-off-by: David Sterba <dsterba@suse.cz>
Add the super flag to inform kernel not to mount a filesystem wich fsid
change is in progress.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[removed the chunk tree flag]
Signed-off-by: David Sterba <dsterba@suse.cz>
We have this check in the kernel but not in userspace, which makes fsck
fail when we wouldn't have a problem in the kernel. This was meant to
catch this case because it really isn't good, unfortunately it will
require a design change to fix in the kernel so in the meantime add this
check so we can be sure our tests only catch real problems. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
This provides the basis for later qgroup related changes.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Ctree.h of btrfs-progs contains wrong flags for btrfs_qgroup_status.
Update it with the one in kernel.
Also, introduce the inline function btrfs_qgroup_(level/subvid) to get
the level/subvolid of qgroup, to replace the old open-coded bit
operations.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Add new open ctree flag OPEN_CTREE_SUPPRESS_CHECK_BLOCK_ERRORS to
suppress tree block csum error output.
Provides the basis for new btrfs-find-root and other enhancement on
btrfs offline tools output.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[renamed vars and funcs, added comments]
Signed-off-by: David Sterba <dsterba@suse.cz>
We hold a transaction open for the entirety of fixing extent refs. This works
out ok most of the time but we can be tight on space and run out of space when
fixing things. To get around this just push down the transaction starting dance
into the functions that actually fix things. This keeps us from ending up with
ENOSPC because we pinned everything and allows the code to be a bit simpler.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
The METADUMP super flag makes us skip doing the chunk tree reading which isn't
helpful for the new restore since we have a valid chunk tree. But we still want
to have a way for the kernel to know that this is a metadump restore so it
doesn't do things like verify data checksums. We also want to skip some of the
device extent checks in fsck since those will obviously not match. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Add btrfs_get_extent() and btrfs_punch_hole() for btrfs-progs.
Btrfs_get_extent() will act much like kernel one, return the first
extent that covers the given range.
The difference will be that progs btrfs_get_extent() can't handle
no-holes feature, which means caller should handle it carefully.
Btrfs_punch_hole() will punch a hole in given range of given inode,
however it differs from kernel one since it won't zero any page or drop
any extents if there is any extent in the hole range.
These functions are mainly used for later I_ERR_FILE_EXTENT_DISCOUNT
repair function.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Before this patch, when a extent's data ref points to a invalid key in
fs tree, this happens if a leaf/node of fs tree is corrupted, btrfsck
can't do any repair and just exit.
In fact, such problem can be handled in fs tree repair routines, rebuild
the inode item(if missing) and add back the extent data (with some
assumption).
So this patch records such data extent refs for later fs tree recovery
routine.
TODO:
Restore orphan data extent refs into btrfs_root is not the best
method. It's best to directly restore it into inode_record, however
current extent tree and fs tree can't cooperate together, so use
btrfs_root as a temporary storage until inode_cache is built.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Add a basic inode item rebuild function for I_ERR_NO_INODE_ITEM.
The main use case is to repair btrfs which fs root has corrupted leaf,
but it is already working for case if the corrupteed fs root leaf/node
contains no inode extent_data.
The repair needs 3 elements for inode rebuild:
1. inode number
This is quite easy, existing inode_record codes will detect it quite
well.
2. inode type
This is the trick part. The only reliable method is to recovery it from
parent's dir_index/item.
The remaining method will search for regular file extent for FILE
type or child's backref for DIR(todo).
Fallback will be FILE.
Inode name(inode_ref) will be recoverd by nlink repair function.
This is just a fundamental implement, some advanced recovery can be
improved later with btrfs-progs infrastructure change.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
With the previous btrfs inode operations patches, now we can use
btrfs_mkdir() to create the 'lost+found' dir to do some data salvage in
btrfsck.
This patch along with previous ones will make data salvage easier.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Add btrfs_unlink() and btrfs_add_link() functions in inode.c,
for the incoming btrfs_mkdir() and later inode operations functions.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Import lookup/del_inode_ref() function in inode-item.c, as base functions
for the incoming btrfs_add_link() and btrfs_unlink() functions.
Also modify btrfs_insert_inode_ref() and split_leaf() making them able
to deal with EXTENT_IREF incompat flag.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Import btrfs_insert/del/lookup_extref() functions form kernel for the
incoming btrfs_add_link() and btrfs_unlink() functions.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Enhance the command "btrfs filesystem df" to show space usage information
for a mount point(s). It shows also an estimation of the space available,
on the basis of the current one used.
Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
[code moved under #if 0 instead of deletion]
Signed-off-by: David Sterba <dsterba@suse.cz>
We may run across dir indexes that are corrupt in such a way that it makes them
useless, such as having a bad location key or a bad name. In this case we can
just delete dir indexes that don't show up properly and then re-create what we
need. When we delete dir indexes however we need to restart scanning the fs
tree as we could have greated bogus inode recs if the location key was bad, so
set it up so that if we had to delete an dir index we go ahead and free up our
inode recs and return -EAGAIN to check_fs_roots so it knows to restart the loop.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
This patch pulls back backref.c, adds a couple of helpers everywhere that it
needs, and cleans up backref.c to fit in btrfs-progs. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
[removed free_some_buffers after "do not reclaim extent buffer"]
Signed-off-by: David Sterba <dsterba@suse.cz>
Kernels >= 3.15 export the global block reserve as a space info presented
by 'btrfs fi df' but would display 'unknown' instead of some meaningful
string.
Signed-off-by: David Sterba <dsterba@suse.cz>
The three flags of @btrfs_path:
btrfs_path {
unsigned int keep_locks:1;
unsigned int skip_locking:1;
unsigned int leave_spinning:1;
}
have little meaning, because the userspace @btrfs_search_slot()
is free of locking and no other routines will decide their behavior
on these. So just remove them.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
The chunk-recover.c/BTRFS_NUM_MIRRORS in the userspace means
the same thing as ctree.h/BTRFS_MAX_MIRRORS in the kernelspace,
so to stay consistent with the kernelspace, just make this movement
in the userspace:
chunk-recover.c/BTRFS_NUM_MIRRORS
===>
ctree.h/BTRFS_MAX_MIRRORS
This provides convenience for future use.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
This patch adds functionality (in qgroup-verify.c) to compute bytecounts in
subvolume quota groups. The original groups are read in and stored in memory
so that after we compute our own bytecounts, we can compare them with those
on disk. A print function is provided to do this comparison and show the
results on the console.
A 'qgroup check' pass is added to btrfsck. If any subvolume quota groups
differ from what we compute, the differences for them are printed. We also
provide an option '--qgroup-report' which will run only the quota check code
and print a report on all quota groups. Other than making it possible to
verify that our qgroup changes work correctly, this mode can also be used in
xfstests for automated checking after qgroup tests.
This patch does not address the following:
- compressed counts are identical to non compressed, because kernel doesn't
make the distinction yet. Adding the code to verify compressed counts
shouldn't be hard at all though once kernel can do this.
- It is only concerned with subvolume quota groups (like most of
btrfs-progs).
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
The following kernel commit changed the definition of the inline function
btrfs_file_extent_inline_len():
commit 514ac8ad8793a097c0c9d89202c642479d6dfa34
Author: Chris Mason <clm@fb.com>
Date: Fri Jan 3 21:07:00 2014 -0800
Btrfs: don't use ram_bytes for uncompressed inline items
If we truncate an uncompressed inline item, ram_bytes isn't updated to reflect
the new size. The fixe uses the size directly from the item header when
reading uncompressed inlines, and also fixes truncate to update the
size as it goes.
Not having this new definition implies that the restore tool might misbehave when
restoring files with an inline extent that got truncated on a kernel older than
release 3.14.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
This adds the flag to ctree.h, adds the feature option to mkfs to turn it on and
fixes fsck so it doesn't complain about missing hole extents in files when this
flag is set.
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
A user had a fs where the objectid of an orphan item was not the actual orphan
item objectid. This screwed up fsck because the block has keys in the wrong
order, also the fs scanning stuff will freak out because we have an inode with
nlink 0 and no orphan item. So this patch is pretty big but is all related.
1) Deal with bad key ordering. We can easily fix this up, so fix the checking
stuff to tell us exactly what it found when it said there was a problem. Then
if it's bad key ordering we can reorder the keys and restart the scan.
2) Deal with bad keys. If we find an orphan item with the wrong objectid it's
likely to screw with stuff, so keep track of these sort of things with a
bad_item list and just run through and delete any objects that don't make sense.
So far we just do this for orphan items but we could extend this as new stuff
pops up.
3) Deal with missing orphan items. This is easy, if we have a file with i_nlink
set to 0 and no orphan item we can just add an orphan item.
4) Add the infrastructure to corrupt actual key values. Needed this to create a
test image to verify I was fixing things properly.
This patch fixes the corrupt image I'm adding and passes the other make test
tests. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
When reading block groups we will searching it's corresponding chunk, however, at this
time, some chunks has not been built(data chunks raid0/raid10/raid56), don't bug_on here,
we will try to rebuild these chunks later.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Internally, btrfs_header_chunk_tree_uuid() calculates an unsigned
long, but casts it to a pointer, while all callers cast it to unsigned
long again.
From btrfs commit b308bc2f05a86e728bd035e21a4974acd05f4d1e
Signed-off-by: Ross Kirk <ross.kirk@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
mkfs -r wasn't creating chunks properly, making it very difficult to
allocate space for anything except tiny filesystems.
This changes it around to use more of the generic infrastructure, and
to do actual logical->physical block number translation.
It also allocates space to the files in smaller extents (max 1MB), which
keeps the allocator from trying to allocate an extent bigger than a
single chunk.
It doesn't quite support multi-device mkfs -r yet, but is much closer.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
The current show_qgroups() just shows a little information, and it is hard to
add some functions which the users need in the future, so i restructure it, make
it easy to add new functions.
In order to improve the scalability of show_qgroups(), i add some important
structures:
struct qgroup_lookup {
struct rb_root root;
}
/*
*store qgroup's information
*/
struct btrfs_qgroup {
struct rb_node rb_node;
u64 qgroupid;
u64 generation;
u64 rfer;
u64 rfer_cmpr;
u64 excl_cmpr;
u64 flags;
u64 max_rfer;
u64 max_excl;
u64 rsv_rfer;
u64 rsv_excl;
struct list_head qgroups;
struct list_head members;
}
/*
*glue structure to represent the relations
*between qgroups
*/
struct btrfs_qgroup_list {
struct list_head next_qgroups;
struct list_head next_member;
struct btrfs_qgroup *qgroup;
struct btrfs_qgroup *member;
}
The above 3 structures are used to manage all the information
of qgroups.
struct {
char *name;
char *column_name;
int need_print;
} btrfs_qgroup_columns[]
We define a arrary to manage all the columns that can be
outputed, and use a member variant(->need_print) to control
the output of the relative column. Some columns are outputed
by default. But we can change it according to the requirement
of the users.
For example:
if outputing max referenced size of qgroup is needed,the function
'btrfs_qgroup_setup_column()' will be called, and the parameter 'BTRFS_QGROUP_MAX_RFER'
(extend in the future) will be passsed to the function. After the function is done,
when showing qgroups, max referenced size of qgroup will be output.
Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
A user was reporting an issue with bad transid errors on his blocks. The thing
is that btrfs-progs will ignore transid failures for things like restore and
fsck so we can do a best effort to fix a users file system. So fsck can put
together a coherent view of the file system with stale blocks. So if everything
else is ok in the mind of fsck then we can recow these blocks to fix the
generation and the user can get their file system back. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This is a prepatory work for the btrfs fi show command
fixes. So that we have a function get_df to get the fs sizes
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Internally, btrfs_header_fsid() calculates an unsigned long, but casts
it to a pointer, while all callers cast it to unsigned long again.
Committed to btrfs as fba6aa75654394fccf2530041e9451414c28084f
Fix line length issues and match changes to kernelspace
Signed-off-by: Ross Kirk <ross.kirk@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Remove unused parameter, 'eb'. Unused since introduction in
7777e63b42
Signed-off-by: Ross Kirk <ross.kirk@gmail.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Remove unused eb parameter from btrfs_item_nr, unused since introduced
in 7777e63b42
Signed-off-by: Ross Kirk <ross.kirk@gmail.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
I found that mkfs.btrfs aborts when assigned multi volumes contain
a small volume:
# parted /dev/sdf p
Model: LSI MegaRAID SAS RMB (scsi)
Disk /dev/sdf: 72.8GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number Start End Size Type File system Flags
1 32.3kB 72.4GB 72.4GB primary
2 72.4GB 72.8GB 461MB primary
# ./mkfs.btrfs -f /dev/sdf1 /dev/sdf2
:
SMALL VOLUME: forcing mixed metadata/data groups
adding device /dev/sdf2 id 2
mkfs.btrfs: volumes.c:852: btrfs_alloc_chunk: Assertion `!(ret)' failed.
Aborted (core dumped)
This failure of btrfs_alloc_chunk was caused by following steps:
1) since there is only small space in the small device, mkfs was
going to allocate a chunk from free space as much as available.
So mkfs called btrfs_alloc_chunk with
size = device->total_bytes - device->used_bytes.
2) (According to the comment in source code, to avoid overwriting
superblock,) btrfs_alloc_chunk starts taking chunks at an offset
of 1MB. It means that the layout of a disk will be like:
[[1MB at beginning for sb][allocated chunks]* ... free space ... ]
and you can see that the available free space for allocation is:
avail = device->total_bytes - device->used_bytes - 1MB.
3) Therefore there is only free space 1MB less than requested. damn.
>From further investigations I also found that this issue is easily
reproduced by using -A, --alloc-start option:
# truncate --size=1G testfile
# ./mkfs.btrfs -A900M -f testfile
:
mkfs.btrfs: volumes.c:852: btrfs_alloc_chunk: Assertion `!(ret)' failed.
Aborted (core dumped)
In this case there is only 100MB for allocation but btrfs_alloc_chunk
was going to allocate more than the 100MB.
The root cause of both of above troubles is a same simple bug:
btrfs_chunk_alloc does not calculate available bytes properly even
though it researches how many devices have enough room to have a
chunk to be allocated.
So this patch introduces new function btrfs_device_avail_bytes()
which returns available bytes for allocation in specified device.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This silences a sparse warning:
warning: constant 0x4D5F53665248425F is so big it is long
from
commit 52162700bb
Author: Zach Brown <zab@redhat.com>
Date: Thu Jan 17 11:54:47 2013 -0800
btrfs-progs: treat super.magic as an le64
High fives, past me!
Signed-off-by: Zach Brown <zab@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
In files copied from the kernel, mark many functions as static,
and remove any resulting dead code.
Some functions are left unmarked if they aren't static in the
kernel tree.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Port of commit 12534832 to userspace:
commit 12534832cb7b0abc7369298246e8b7af03b863ca
Author: Josef Bacik <josef@redhat.com>
Date: Thu Dec 17 21:32:27 2009 +0000
Btrfs: make set/get functions for the super compat_ro flags use compat_ro
Our set/get functions for compat_ro_flags actually look at compat_flags. This
will mess any attempt to use compat flags up. The fix is obvious. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Port kernel commit 1bec1aed to userspace.
use __le64 instead of u64 in on-disk structure definition.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Port of commit b3b4aa7 to userspace.
parameter tree root it's not used since commit
5f39d397dfbe140a14edecd4e73c34ce23c4f9ee ("Btrfs: Create extent_buffer
interface for large blocksizes")
This gets userspace a tad closer to kernelspace by removing
this unused parameter that was all over the codebase...
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Some codes still use the cpu_to_lexx instead of the
BTRFS_SETGET_STACK_FUNCS declared in ctree.h.
Also added some BTRFS_SETGET_STACK_FUNCS for btrfs_header and
btrfs_super.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This commit adds UUID tree lookup methods that make use of the search
ioctl. The code is based on the kernel code.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Support printing these things.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Remove some commented-out & #if 0'd code:
* close_blocks()
* btrfs_drop_snapshot()
* btrfs_realloc_node()
* btrfs_find_dead_roots()
There are still some #if 0'd functions in there, but I'm hedging
on those for now, they have been copied to cmds-check.c and I want
to see if they can be brough back into ctree.c eventually.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
cmds-check.c contains the only caller of btrfs_fsck_reinit_root;
moving it to the caller's source file gets ctree.c a little
closer to kernelspace, although it does require exporting
add_root_to_dirty_list(), which is not done in kernelspace.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This adds a 'btrfs-image -m' option, which let us restore an image that
is built from a btrfs of multiple disks onto several disks altogether.
This aims to address the following case,
$ mkfs.btrfs -m raid0 sda sdb
$ btrfs-image sda image.file
$ btrfs-image -r image.file sdc
---------
so we can only restore metadata onto sdc, and another thing is we can
only mount sdc with degraded mode as we don't provide informations of
another disk. And, it's built as RAID0 and we have only one disk,
so after mount sdc we'll get into readonly mode.
This is just annoying for people(like me) who're trying to restore image
but turn to find they cannot make it work.
So this'll make your life easier, just tap
$ btrfs-image -m image.file sdc sdd
---------
then you get everything about metadata done, the same offset with that of
the originals(of course, you need offer enough disk size, at least the disk
size of the original disks).
Besides, this also works with raid5 and raid6 metadata image.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Because the fs/file roots are not extents, so it is better to use rb-tree
to manage them. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
In some cases the extent tree can just be so gone there is no point in trying to
figure out how to put it back together. So add a --init-extent-tree mode which
will zero out the extent tree and then re-add extents for all of the blocks we
find. This will also undo any balance that was going on at the time of the
crash, this is needed because the reloc tree seems to confuse fsck at the
moment. With this patch I can put back together a users file system that was
completely gone. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
The tree log bug I introduced could create inconsistent file extent entries in
the file system tree and in some worst cases even create multiple extent entries
for the same entry. To fix this we need to do a few things
1) Keep track of extent items that overlap and then pick the one that covers the
largest area and delete the rest of the items.
2) Keep track of file extent items that land in extent items but don't match
disk_bytenr/disk_num_bytes exactly. Once we find these we need to figure out
who is the right ref and then fix all of the other refs to agree.
Each of these cases require a complete rescan of all of the extents, so
unfortunately if you hit this particular problem the fsck is going to take quite
a while since it will likely rescan all the trees 2 or 3 times. With this patch
the broken file system a user sent me is fixed and a broken file system that was
created by my reproducer is also fixed. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
In trying to track down a weird tree log problem I wanted to make sure that the
free space cache was actually valid, which we currently have no way of doing.
So this patch adds a bunch of support for the free space cache code and then a
checker to fsck. Basically we go through and if we can actually load the free
space cache then we will walk the extent tree and verify that the free space
cache exactly matches what is in the extent tree. Hopefully this will always be
correct, the only time it wouldn't is if the extent tree is corrupt or we have
some sort of awful bug in the free space cache. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
This fixes up the progs to properly deal with skinny metadata. This adds the -x
option to mkfs and btrfstune for enabling the skinny metadata option. This also
makes changes to fsck so it can properly deal with the skinny metadata entries.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
This tool draws per-chunk pngs representing the allocation map. A black
or colored dot means the block is allocated.
The output is written to a subdirectory, together with an index.html to be
viewed in a browser.
There are options to control whether color should be used and which block
group types should be printed.
To build, you need to have libpng and libgd installed. It is not part of
the 'all' target, so please build it explicitely with make btrfs-fragments.
A (rather untypical) example can be seen at
http://sensille.com/fragments
Please regard this as a first scratch version and feel free to improve it :)
Signed-off-by: Arne Jansen <sensille@gmx.net>
Allocate fs_info::super_copy dynamically of full BTRFS_SUPER_INFO_SIZE
and use it directly for saving superblock to disk.
This fixes incorrect superblock checksum after mkfs.
Signed-off-by: David Sterba <dsterba@suse.cz>
External software wanting to use the functionality provided by the btrfs
send ioctl has a hard time doing so without replicating tons of work. Of
particular interest are functions like btrfs_read_and_process_send_stream()
and subvol_uuid_search(). As that functionality requires a bit more than
just send-stream.c and send-utils.c we have to pull in some other parts of
the progs package.
This patch adds code to the Makefile and headers to create a library,
libbtrfs which the btrfs command now links to.
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
The super block magic is a le64 whose value looks like an unterminated
string in memory. The lack of null termination leads to clumsy use of
string functions and causes static analysis tools to warn that the
string will be unterminated.
So let's just treat it as the le64 that it is. Endian wrappers are used
on the constant so that they're compiled into run-time constants.
Signed-off-by: Zach Brown <zab@redhat.com>
David Woodhouse originally contributed this code, and Chris Mason
changed it around to reflect the current design goals for raid56.
The original code expected all metadata and data writes to be full
stripes. This meant metadata block size == stripe size, and had a few
other restrictions.
This version allows metadata blocks smaller than the stripe size. It
implements both raid5 and raid6, although it does not have code to
rebuild from parity if one of the drives is missing or incorrect.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This patch syncs the extended inode ref definitions from kernels ctree.h and
adds support in btrfs-debug-tree for visualizing the state of extended refs
on disk.
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
This is the user mode part of the device replace patch series.
The command group "btrfs replace" is added with three commands:
- btrfs replace start srcdev|srcdevid targetdev [-Bfr] mount_point
- btrfs replace status mount_point [-1]
- btrfs replace cancel mount_point
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
"btrfs device stats" is used to retrieve and print the device stats.
"btrfs device stats -z" is used to atomically retrieve, reset and
print the stats.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
gcc optimizes out the memcpy calls at -O2 and -Os.
Replacing memcpy with memmove does't work - gcc treats memmove
the same way it treats memcpy.
This patch brings in {get|put}_unaligned_le{16|32|64} (using the
packed struct method), and uses them in the failing get/set calls.
On architectures where unaligned accesses are cheap, these unaligned
macros should be optimized out by the compiler.
Signed-off-by: Ben Peddell <klightspeed@killerwolves.net>
There are some unaligned accesses in progs that cause malfunction or
crashes on ARM.
This patch fixes the ones we stumbled upon.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Using mkfs.btrfs like:
mkfs.btrfs -l 131072 /dev/sda
will return no error, but after mount it, the dmesg will report:
BTRFS: couldn't mount because metadata blocksize (131072) was too large
The leafsize and nodesize are equal at present, so we just use one function
"check_leaf_or_node_size" to limit leaf and node size below BTRFS_MAX_METADATA_BLOCKSIZE.
Signed-off-by: Robin Dong <sanbai@taobao.com>
Reviewed-by: David Sterba <dave@jikos.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
We want 'btrfs subvolume list' only to list readonly subvolumes, this patch set
introduces a new option 'r' to implement it.
You can use the command like that:
btrfs subvolume list -r <path>
Original-Signed-off-by: Zhou Bo <zhoub-fnst@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Now we check if the root item contains otime and uuid or not by comparing
->generation_v2 and ->generation of the btrfs_root_item structure, it is
wrong because it is possbile that ->generation may equal to the first
variant of the next item. We fix this problem by check the size of btrfs_root_item,
if it is larger than the original one, the new btrfs_root_item contains otime
and uuid. we needn't worry the case that the new filesystem is mounted on the
old kernel. because the otime and uuid are not changed on the old kernel, we can
get the correct result even on the kernel.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Commit bab2c565 accidentally broke 'subvol get-default' command by
removing almost all of the underlying code. Bring it back with some
fixes and improvements.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
When we discover bad blocks in the extent allocation tree, repair can
now discard them and recreate the references from the rest of the trees.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
During btrfsck --repair, we make an index of extents that have incorrect
reference counts. Once we've collect the whole index, we go through
and modify the extent allocation tree to reflect the correct results.
Changing the extent allocation tree may free blocks, and so it may
end up removing a block that had a missing reference structure. The
fsck code may then circle back around and add the reference back.
The result is an extent that isn't actually used, but is recorded in the
extent allocation tree.
This commit adds a hook called as extents are freed. The hook searches
the index of incorrect references and updates it to reflect the freeing
of the extent.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This will effectively delete all of your crcs, but at least you'll
be able to mount the FS with nodatasum.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This also includes a new --repair btrfsck option. For now it can
only fix errors in the extent allocation tree.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
- scrub structs added
- ioctls for scrub
- BTRFS_FSID_SIZE moved
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Hugo Mills <hugo@carfax.org.uk>
[Btrfs-Progs][V2] Update for lzo support
- Add incompat flag, otherwise btrfs-progs will report error
when operating on btrfs filesystems mounted with lzo option.
- Update man page.
- Allow to turn on lzo compression for defrag operation:
# btrfs filesystem defragment -c[zlib, lzo] <file>
Note: "-c zlib" will fail, because that's how getopt() works
for optional arguments.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
So alot of crazy people (I'm looking at you Meego) want to use btrfs on phones
and such with small devices. Unfortunately the way we split out metadata/data
chunks it makes space usage inefficient for volumes that are smaller than
1gigabyte. So add a -M option for mixing metadata+data, and default to this
mixed mode if the filesystem is less than or equal to 1 gigabyte. I've tested
this with xfstests on a 100mb filesystem and everything is a-ok.
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This patch updates the super field to add the cache_generation member. It also
makes us set it to -1 on mkfs so any new filesystem will get the space cache
stuff turned on. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
I forgot to add BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL to the incompat flags in
btrfs-progs. This adds it so that our tools don't freak out when touching a fs
with the default subvolume changed. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
btrfs-subvol find-new <subvol> <id> will search through a given subvol
and print out all the files with extents newer than a given id.
Care must be taken to make sure any pending delalloc is on disk before
running this because that won't show up in the output.
This commit introduces a new kind of back reference for btrfs metadata.
Once a filesystem has been mounted with this commit, IT WILL NO LONGER
BE MOUNTABLE BY OLDER KERNELS.
The new back ref provides information about pointer's key, level and in which
tree the pointer lives. This information allow us to find the pointer by
searching the tree. The shortcoming of the new back ref is that it only works
for pointers in tree blocks referenced by their owner trees.
This is mostly a problem for snapshots, where resolving one of these fuzzy back
references would be O(number_of_snapshots) and quite slow. The solution used
here is to use the fuzzy back references in the common case where a given tree
block is only referenced by one root, and use the full back references when
multiple roots have a reference
This patch makes btrfsck check more things, including
directory items, file extents, checksumming, inode link
counts etc.
The code for these checks is similar to the code verifies
extent back references. The main difference is that
shared tree blocks are treated specially. The partial
checking results(unresolved references and/or errors)
of shared sub-trees are cached. This avoids scanning
the shared blocks several times. Thank you,
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>