This adds the flag to ctree.h, adds the feature option to mkfs to turn it on and
fixes fsck so it doesn't complain about missing hole extents in files when this
flag is set.
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
A user had a fs where the objectid of an orphan item was not the actual orphan
item objectid. This screwed up fsck because the block has keys in the wrong
order, also the fs scanning stuff will freak out because we have an inode with
nlink 0 and no orphan item. So this patch is pretty big but is all related.
1) Deal with bad key ordering. We can easily fix this up, so fix the checking
stuff to tell us exactly what it found when it said there was a problem. Then
if it's bad key ordering we can reorder the keys and restart the scan.
2) Deal with bad keys. If we find an orphan item with the wrong objectid it's
likely to screw with stuff, so keep track of these sort of things with a
bad_item list and just run through and delete any objects that don't make sense.
So far we just do this for orphan items but we could extend this as new stuff
pops up.
3) Deal with missing orphan items. This is easy, if we have a file with i_nlink
set to 0 and no orphan item we can just add an orphan item.
4) Add the infrastructure to corrupt actual key values. Needed this to create a
test image to verify I was fixing things properly.
This patch fixes the corrupt image I'm adding and passes the other make test
tests. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
When reading block groups we will searching it's corresponding chunk, however, at this
time, some chunks has not been built(data chunks raid0/raid10/raid56), don't bug_on here,
we will try to rebuild these chunks later.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Internally, btrfs_header_chunk_tree_uuid() calculates an unsigned
long, but casts it to a pointer, while all callers cast it to unsigned
long again.
From btrfs commit b308bc2f05a86e728bd035e21a4974acd05f4d1e
Signed-off-by: Ross Kirk <ross.kirk@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
mkfs -r wasn't creating chunks properly, making it very difficult to
allocate space for anything except tiny filesystems.
This changes it around to use more of the generic infrastructure, and
to do actual logical->physical block number translation.
It also allocates space to the files in smaller extents (max 1MB), which
keeps the allocator from trying to allocate an extent bigger than a
single chunk.
It doesn't quite support multi-device mkfs -r yet, but is much closer.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
The current show_qgroups() just shows a little information, and it is hard to
add some functions which the users need in the future, so i restructure it, make
it easy to add new functions.
In order to improve the scalability of show_qgroups(), i add some important
structures:
struct qgroup_lookup {
struct rb_root root;
}
/*
*store qgroup's information
*/
struct btrfs_qgroup {
struct rb_node rb_node;
u64 qgroupid;
u64 generation;
u64 rfer;
u64 rfer_cmpr;
u64 excl_cmpr;
u64 flags;
u64 max_rfer;
u64 max_excl;
u64 rsv_rfer;
u64 rsv_excl;
struct list_head qgroups;
struct list_head members;
}
/*
*glue structure to represent the relations
*between qgroups
*/
struct btrfs_qgroup_list {
struct list_head next_qgroups;
struct list_head next_member;
struct btrfs_qgroup *qgroup;
struct btrfs_qgroup *member;
}
The above 3 structures are used to manage all the information
of qgroups.
struct {
char *name;
char *column_name;
int need_print;
} btrfs_qgroup_columns[]
We define a arrary to manage all the columns that can be
outputed, and use a member variant(->need_print) to control
the output of the relative column. Some columns are outputed
by default. But we can change it according to the requirement
of the users.
For example:
if outputing max referenced size of qgroup is needed,the function
'btrfs_qgroup_setup_column()' will be called, and the parameter 'BTRFS_QGROUP_MAX_RFER'
(extend in the future) will be passsed to the function. After the function is done,
when showing qgroups, max referenced size of qgroup will be output.
Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
A user was reporting an issue with bad transid errors on his blocks. The thing
is that btrfs-progs will ignore transid failures for things like restore and
fsck so we can do a best effort to fix a users file system. So fsck can put
together a coherent view of the file system with stale blocks. So if everything
else is ok in the mind of fsck then we can recow these blocks to fix the
generation and the user can get their file system back. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This is a prepatory work for the btrfs fi show command
fixes. So that we have a function get_df to get the fs sizes
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Internally, btrfs_header_fsid() calculates an unsigned long, but casts
it to a pointer, while all callers cast it to unsigned long again.
Committed to btrfs as fba6aa75654394fccf2530041e9451414c28084f
Fix line length issues and match changes to kernelspace
Signed-off-by: Ross Kirk <ross.kirk@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Remove unused parameter, 'eb'. Unused since introduction in
7777e63b42
Signed-off-by: Ross Kirk <ross.kirk@gmail.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Remove unused eb parameter from btrfs_item_nr, unused since introduced
in 7777e63b42
Signed-off-by: Ross Kirk <ross.kirk@gmail.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
I found that mkfs.btrfs aborts when assigned multi volumes contain
a small volume:
# parted /dev/sdf p
Model: LSI MegaRAID SAS RMB (scsi)
Disk /dev/sdf: 72.8GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number Start End Size Type File system Flags
1 32.3kB 72.4GB 72.4GB primary
2 72.4GB 72.8GB 461MB primary
# ./mkfs.btrfs -f /dev/sdf1 /dev/sdf2
:
SMALL VOLUME: forcing mixed metadata/data groups
adding device /dev/sdf2 id 2
mkfs.btrfs: volumes.c:852: btrfs_alloc_chunk: Assertion `!(ret)' failed.
Aborted (core dumped)
This failure of btrfs_alloc_chunk was caused by following steps:
1) since there is only small space in the small device, mkfs was
going to allocate a chunk from free space as much as available.
So mkfs called btrfs_alloc_chunk with
size = device->total_bytes - device->used_bytes.
2) (According to the comment in source code, to avoid overwriting
superblock,) btrfs_alloc_chunk starts taking chunks at an offset
of 1MB. It means that the layout of a disk will be like:
[[1MB at beginning for sb][allocated chunks]* ... free space ... ]
and you can see that the available free space for allocation is:
avail = device->total_bytes - device->used_bytes - 1MB.
3) Therefore there is only free space 1MB less than requested. damn.
>From further investigations I also found that this issue is easily
reproduced by using -A, --alloc-start option:
# truncate --size=1G testfile
# ./mkfs.btrfs -A900M -f testfile
:
mkfs.btrfs: volumes.c:852: btrfs_alloc_chunk: Assertion `!(ret)' failed.
Aborted (core dumped)
In this case there is only 100MB for allocation but btrfs_alloc_chunk
was going to allocate more than the 100MB.
The root cause of both of above troubles is a same simple bug:
btrfs_chunk_alloc does not calculate available bytes properly even
though it researches how many devices have enough room to have a
chunk to be allocated.
So this patch introduces new function btrfs_device_avail_bytes()
which returns available bytes for allocation in specified device.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This silences a sparse warning:
warning: constant 0x4D5F53665248425F is so big it is long
from
commit 52162700bb
Author: Zach Brown <zab@redhat.com>
Date: Thu Jan 17 11:54:47 2013 -0800
btrfs-progs: treat super.magic as an le64
High fives, past me!
Signed-off-by: Zach Brown <zab@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
In files copied from the kernel, mark many functions as static,
and remove any resulting dead code.
Some functions are left unmarked if they aren't static in the
kernel tree.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Port of commit 12534832 to userspace:
commit 12534832cb7b0abc7369298246e8b7af03b863ca
Author: Josef Bacik <josef@redhat.com>
Date: Thu Dec 17 21:32:27 2009 +0000
Btrfs: make set/get functions for the super compat_ro flags use compat_ro
Our set/get functions for compat_ro_flags actually look at compat_flags. This
will mess any attempt to use compat flags up. The fix is obvious. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Port kernel commit 1bec1aed to userspace.
use __le64 instead of u64 in on-disk structure definition.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Port of commit b3b4aa7 to userspace.
parameter tree root it's not used since commit
5f39d397dfbe140a14edecd4e73c34ce23c4f9ee ("Btrfs: Create extent_buffer
interface for large blocksizes")
This gets userspace a tad closer to kernelspace by removing
this unused parameter that was all over the codebase...
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Some codes still use the cpu_to_lexx instead of the
BTRFS_SETGET_STACK_FUNCS declared in ctree.h.
Also added some BTRFS_SETGET_STACK_FUNCS for btrfs_header and
btrfs_super.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This commit adds UUID tree lookup methods that make use of the search
ioctl. The code is based on the kernel code.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Support printing these things.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Remove some commented-out & #if 0'd code:
* close_blocks()
* btrfs_drop_snapshot()
* btrfs_realloc_node()
* btrfs_find_dead_roots()
There are still some #if 0'd functions in there, but I'm hedging
on those for now, they have been copied to cmds-check.c and I want
to see if they can be brough back into ctree.c eventually.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
cmds-check.c contains the only caller of btrfs_fsck_reinit_root;
moving it to the caller's source file gets ctree.c a little
closer to kernelspace, although it does require exporting
add_root_to_dirty_list(), which is not done in kernelspace.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This adds a 'btrfs-image -m' option, which let us restore an image that
is built from a btrfs of multiple disks onto several disks altogether.
This aims to address the following case,
$ mkfs.btrfs -m raid0 sda sdb
$ btrfs-image sda image.file
$ btrfs-image -r image.file sdc
---------
so we can only restore metadata onto sdc, and another thing is we can
only mount sdc with degraded mode as we don't provide informations of
another disk. And, it's built as RAID0 and we have only one disk,
so after mount sdc we'll get into readonly mode.
This is just annoying for people(like me) who're trying to restore image
but turn to find they cannot make it work.
So this'll make your life easier, just tap
$ btrfs-image -m image.file sdc sdd
---------
then you get everything about metadata done, the same offset with that of
the originals(of course, you need offer enough disk size, at least the disk
size of the original disks).
Besides, this also works with raid5 and raid6 metadata image.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Because the fs/file roots are not extents, so it is better to use rb-tree
to manage them. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
In some cases the extent tree can just be so gone there is no point in trying to
figure out how to put it back together. So add a --init-extent-tree mode which
will zero out the extent tree and then re-add extents for all of the blocks we
find. This will also undo any balance that was going on at the time of the
crash, this is needed because the reloc tree seems to confuse fsck at the
moment. With this patch I can put back together a users file system that was
completely gone. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
The tree log bug I introduced could create inconsistent file extent entries in
the file system tree and in some worst cases even create multiple extent entries
for the same entry. To fix this we need to do a few things
1) Keep track of extent items that overlap and then pick the one that covers the
largest area and delete the rest of the items.
2) Keep track of file extent items that land in extent items but don't match
disk_bytenr/disk_num_bytes exactly. Once we find these we need to figure out
who is the right ref and then fix all of the other refs to agree.
Each of these cases require a complete rescan of all of the extents, so
unfortunately if you hit this particular problem the fsck is going to take quite
a while since it will likely rescan all the trees 2 or 3 times. With this patch
the broken file system a user sent me is fixed and a broken file system that was
created by my reproducer is also fixed. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
In trying to track down a weird tree log problem I wanted to make sure that the
free space cache was actually valid, which we currently have no way of doing.
So this patch adds a bunch of support for the free space cache code and then a
checker to fsck. Basically we go through and if we can actually load the free
space cache then we will walk the extent tree and verify that the free space
cache exactly matches what is in the extent tree. Hopefully this will always be
correct, the only time it wouldn't is if the extent tree is corrupt or we have
some sort of awful bug in the free space cache. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
This fixes up the progs to properly deal with skinny metadata. This adds the -x
option to mkfs and btrfstune for enabling the skinny metadata option. This also
makes changes to fsck so it can properly deal with the skinny metadata entries.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
This tool draws per-chunk pngs representing the allocation map. A black
or colored dot means the block is allocated.
The output is written to a subdirectory, together with an index.html to be
viewed in a browser.
There are options to control whether color should be used and which block
group types should be printed.
To build, you need to have libpng and libgd installed. It is not part of
the 'all' target, so please build it explicitely with make btrfs-fragments.
A (rather untypical) example can be seen at
http://sensille.com/fragments
Please regard this as a first scratch version and feel free to improve it :)
Signed-off-by: Arne Jansen <sensille@gmx.net>
Allocate fs_info::super_copy dynamically of full BTRFS_SUPER_INFO_SIZE
and use it directly for saving superblock to disk.
This fixes incorrect superblock checksum after mkfs.
Signed-off-by: David Sterba <dsterba@suse.cz>
External software wanting to use the functionality provided by the btrfs
send ioctl has a hard time doing so without replicating tons of work. Of
particular interest are functions like btrfs_read_and_process_send_stream()
and subvol_uuid_search(). As that functionality requires a bit more than
just send-stream.c and send-utils.c we have to pull in some other parts of
the progs package.
This patch adds code to the Makefile and headers to create a library,
libbtrfs which the btrfs command now links to.
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
The super block magic is a le64 whose value looks like an unterminated
string in memory. The lack of null termination leads to clumsy use of
string functions and causes static analysis tools to warn that the
string will be unterminated.
So let's just treat it as the le64 that it is. Endian wrappers are used
on the constant so that they're compiled into run-time constants.
Signed-off-by: Zach Brown <zab@redhat.com>
David Woodhouse originally contributed this code, and Chris Mason
changed it around to reflect the current design goals for raid56.
The original code expected all metadata and data writes to be full
stripes. This meant metadata block size == stripe size, and had a few
other restrictions.
This version allows metadata blocks smaller than the stripe size. It
implements both raid5 and raid6, although it does not have code to
rebuild from parity if one of the drives is missing or incorrect.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This patch syncs the extended inode ref definitions from kernels ctree.h and
adds support in btrfs-debug-tree for visualizing the state of extended refs
on disk.
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
This is the user mode part of the device replace patch series.
The command group "btrfs replace" is added with three commands:
- btrfs replace start srcdev|srcdevid targetdev [-Bfr] mount_point
- btrfs replace status mount_point [-1]
- btrfs replace cancel mount_point
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
"btrfs device stats" is used to retrieve and print the device stats.
"btrfs device stats -z" is used to atomically retrieve, reset and
print the stats.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
gcc optimizes out the memcpy calls at -O2 and -Os.
Replacing memcpy with memmove does't work - gcc treats memmove
the same way it treats memcpy.
This patch brings in {get|put}_unaligned_le{16|32|64} (using the
packed struct method), and uses them in the failing get/set calls.
On architectures where unaligned accesses are cheap, these unaligned
macros should be optimized out by the compiler.
Signed-off-by: Ben Peddell <klightspeed@killerwolves.net>
There are some unaligned accesses in progs that cause malfunction or
crashes on ARM.
This patch fixes the ones we stumbled upon.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Using mkfs.btrfs like:
mkfs.btrfs -l 131072 /dev/sda
will return no error, but after mount it, the dmesg will report:
BTRFS: couldn't mount because metadata blocksize (131072) was too large
The leafsize and nodesize are equal at present, so we just use one function
"check_leaf_or_node_size" to limit leaf and node size below BTRFS_MAX_METADATA_BLOCKSIZE.
Signed-off-by: Robin Dong <sanbai@taobao.com>
Reviewed-by: David Sterba <dave@jikos.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
We want 'btrfs subvolume list' only to list readonly subvolumes, this patch set
introduces a new option 'r' to implement it.
You can use the command like that:
btrfs subvolume list -r <path>
Original-Signed-off-by: Zhou Bo <zhoub-fnst@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Now we check if the root item contains otime and uuid or not by comparing
->generation_v2 and ->generation of the btrfs_root_item structure, it is
wrong because it is possbile that ->generation may equal to the first
variant of the next item. We fix this problem by check the size of btrfs_root_item,
if it is larger than the original one, the new btrfs_root_item contains otime
and uuid. we needn't worry the case that the new filesystem is mounted on the
old kernel. because the otime and uuid are not changed on the old kernel, we can
get the correct result even on the kernel.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Commit bab2c565 accidentally broke 'subvol get-default' command by
removing almost all of the underlying code. Bring it back with some
fixes and improvements.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
When we discover bad blocks in the extent allocation tree, repair can
now discard them and recreate the references from the rest of the trees.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
During btrfsck --repair, we make an index of extents that have incorrect
reference counts. Once we've collect the whole index, we go through
and modify the extent allocation tree to reflect the correct results.
Changing the extent allocation tree may free blocks, and so it may
end up removing a block that had a missing reference structure. The
fsck code may then circle back around and add the reference back.
The result is an extent that isn't actually used, but is recorded in the
extent allocation tree.
This commit adds a hook called as extents are freed. The hook searches
the index of incorrect references and updates it to reflect the freeing
of the extent.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This will effectively delete all of your crcs, but at least you'll
be able to mount the FS with nodatasum.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This also includes a new --repair btrfsck option. For now it can
only fix errors in the extent allocation tree.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
- scrub structs added
- ioctls for scrub
- BTRFS_FSID_SIZE moved
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Hugo Mills <hugo@carfax.org.uk>
[Btrfs-Progs][V2] Update for lzo support
- Add incompat flag, otherwise btrfs-progs will report error
when operating on btrfs filesystems mounted with lzo option.
- Update man page.
- Allow to turn on lzo compression for defrag operation:
# btrfs filesystem defragment -c[zlib, lzo] <file>
Note: "-c zlib" will fail, because that's how getopt() works
for optional arguments.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
So alot of crazy people (I'm looking at you Meego) want to use btrfs on phones
and such with small devices. Unfortunately the way we split out metadata/data
chunks it makes space usage inefficient for volumes that are smaller than
1gigabyte. So add a -M option for mixing metadata+data, and default to this
mixed mode if the filesystem is less than or equal to 1 gigabyte. I've tested
this with xfstests on a 100mb filesystem and everything is a-ok.
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This patch updates the super field to add the cache_generation member. It also
makes us set it to -1 on mkfs so any new filesystem will get the space cache
stuff turned on. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
I forgot to add BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL to the incompat flags in
btrfs-progs. This adds it so that our tools don't freak out when touching a fs
with the default subvolume changed. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
btrfs-subvol find-new <subvol> <id> will search through a given subvol
and print out all the files with extents newer than a given id.
Care must be taken to make sure any pending delalloc is on disk before
running this because that won't show up in the output.
This commit introduces a new kind of back reference for btrfs metadata.
Once a filesystem has been mounted with this commit, IT WILL NO LONGER
BE MOUNTABLE BY OLDER KERNELS.
The new back ref provides information about pointer's key, level and in which
tree the pointer lives. This information allow us to find the pointer by
searching the tree. The shortcoming of the new back ref is that it only works
for pointers in tree blocks referenced by their owner trees.
This is mostly a problem for snapshots, where resolving one of these fuzzy back
references would be O(number_of_snapshots) and quite slow. The solution used
here is to use the fuzzy back references in the common case where a given tree
block is only referenced by one root, and use the full back references when
multiple roots have a reference
This patch makes btrfsck check more things, including
directory items, file extents, checksumming, inode link
counts etc.
The code for these checks is similar to the code verifies
extent back references. The main difference is that
shared tree blocks are treated specially. The partial
checking results(unresolved references and/or errors)
of shared sub-trees are cached. This avoids scanning
the shared blocks several times. Thank you,
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
This patch updates the ext3 to btrfs converter for the new
disk format. This mainly involves changing the convert's
data relocation and free space management code. This patch
also ports some functions from kernel module to btrfs-progs.
Thank you,
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
This adds a sequence number to the btrfs inode that is increased on
every update. NFS will be able to use that to detect when an inode has
changed, without relying on inaccurate time fields.
While we're here, this also:
Puts reserved space into the super block and inode
Adds a log root transid to the super so we can pick the newest super
based on the fsync log as well as the main transaction ID. For now
the log root transid is always zero, but that'll get fixed.
Adds a starting offset to the dev_item. This will let us do better
alignment calculations if we know the start of a partition on the disk.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Btrfs stores checksums for each data block. Until now, they have
been stored in the subvolume trees, indexed by the inode that is
referencing the data block. This means that when we read the inode,
we've probably read in at least some checksums as well.
But, this has a few problems:
* The checksums are indexed by logical offset in the file. When
compression is on, this means we have to do the expensive checksumming
on the uncompressed data. It would be faster if we could checksum
the compressed data instead.
* If we implement encryption, we'll be checksumming the plain text and
storing that on disk. This is significantly less secure.
* For either compression or encryption, we have to get the plain text
back before we can verify the checksum as correct. This makes the raid
layer balancing and extent moving much more expensive.
* It makes the front end caching code more complex, as we have touch
the subvolume and inodes as we cache extents.
* There is potentitally one copy of the checksum in each subvolume
referencing an extent.
The solution used here is to store the extent checksums in a dedicated
tree. This allows us to index the checksums by phyiscal extent
start and length. It means:
* The checksum is against the data stored on disk, after any compression
or encryption is done.
* The checksum is stored in a central location, and can be verified without
following back references, or reading inodes.
This makes compression significantly faster by reducing the amount of
data that needs to be checksummed. It will also allow much faster
raid management code in general.
The checksums are indexed by a key with a fixed objectid (a magic value
in ctree.h) and offset set to the starting byte of the extent. This
allows us to copy the checksum items into the fsync log tree directly (or
any other tree), without having to invent a second format for them.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This is the btrfs-progs version of the patch to add the ability to have
different csum algorithims. Note I didn't change the image maker since it
seemed a bit more complicated than just changing some stuff around so I will let
Yan take care of that.
Everything else was converted and for now a mkfs just
sets the type to be BTRFS_CSUM_TYPE_CRC32.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
This patch adds btrfs image tool. The image tool is
a debugging tool that creates/restores btrfs metadump
image.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
This patch does the following:
1) Update device management code to match the kernel code.
2) Allocator fixes.
3) Add a program called btrfstune to set/clear the SEEDING
super block flags.
This patch adds transaction IDs to root tree pointers.
Transaction IDs in tree pointers are compared with the
generation numbers in block headers when reading root
blocks of trees. This can detect some types of IO errors.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
The offset field in struct btrfs_extent_ref records the position
inside file that file extent is referenced by. In the new back
reference system, tree leaves holding reference to file extent
are recorded explicitly. We can quickly scan these tree leaves, so the
offset field is not required.
This patch also makes the back reference system check the objectid
when extents are being deleted
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
This patch makes the back reference system to explicit record the
location of parent node for all types of extents. The location of
parent node is placed into the offset field of backref key. Every
time a tree block is balanced, the back references for the affected
lower level extents are updated.
This is a utility option for the resizer, it makes sure to allocate
at offset bytes in the disk or higher. It ensures the resizer will have
something to move when testing it.
This patch improves converter's allocator and fixes a bug in data relocation
function. The new allocator caches free blocks as Btrfs's default allocator.
In testing here, the user CPU time reduced to half of the original when
checksum and small file packing was disabled. This patch also enlarges the
size of block groups created by the converter.
The main changes in this patch are adding chunk handing and data relocation
ability. In the last step of conversion, the converter relocates data in system
chunk and move chunk tree into system chunk. In the rollback process, the
converter remove chunk tree from system chunk and copy data back.
Regards
YZ
---
Block headers now store the chunk tree uuid
Chunk items records the device uuid for each stripes
Device extent items record better back refs to the chunk tree
Block groups record better back refs to the chunk tree
The chunk tree format has also changed. The objectid of BTRFS_CHUNK_ITEM_KEY
used to be the logical offset of the chunk. Now it is a chunk tree id,
with the logical offset being stored in the offset field of the key.
This allows a single chunk tree to record multiple logical address spaces,
upping the number of bytes indexed by a chunk tree from 2^64 to
2^128.
The mkfs code bootstraps the filesystem on a single device. Once
the raid block groups are setup, it needs to recow all of the blocks so
that each tree is properly allocated.
The first problem is that these SETGET macros lose typing information,
and therefore can't see the 'packed' attribute and therefore take
unaligned access SIGBUS signals on sparc64 when trying to derefernce
the member.
The next problem is a similar issue in btrfs_name_hash(). This gets
passed things like &key.offset which is a member of a packed
structure, losing this packed'ness information btrfs_name_hash()
performs a potentially unaligned memory access, again resulting in a
SIGBUS.
This adds support for keeping track of the number of blocks used by
root_item's. This makes it so that mkfs lays down the "default" subvol with
the correct block accounting in place. Thank you,