I found that mkfs.btrfs aborts when assigned multi volumes contain
a small volume:
# parted /dev/sdf p
Model: LSI MegaRAID SAS RMB (scsi)
Disk /dev/sdf: 72.8GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number Start End Size Type File system Flags
1 32.3kB 72.4GB 72.4GB primary
2 72.4GB 72.8GB 461MB primary
# ./mkfs.btrfs -f /dev/sdf1 /dev/sdf2
:
SMALL VOLUME: forcing mixed metadata/data groups
adding device /dev/sdf2 id 2
mkfs.btrfs: volumes.c:852: btrfs_alloc_chunk: Assertion `!(ret)' failed.
Aborted (core dumped)
This failure of btrfs_alloc_chunk was caused by following steps:
1) since there is only small space in the small device, mkfs was
going to allocate a chunk from free space as much as available.
So mkfs called btrfs_alloc_chunk with
size = device->total_bytes - device->used_bytes.
2) (According to the comment in source code, to avoid overwriting
superblock,) btrfs_alloc_chunk starts taking chunks at an offset
of 1MB. It means that the layout of a disk will be like:
[[1MB at beginning for sb][allocated chunks]* ... free space ... ]
and you can see that the available free space for allocation is:
avail = device->total_bytes - device->used_bytes - 1MB.
3) Therefore there is only free space 1MB less than requested. damn.
>From further investigations I also found that this issue is easily
reproduced by using -A, --alloc-start option:
# truncate --size=1G testfile
# ./mkfs.btrfs -A900M -f testfile
:
mkfs.btrfs: volumes.c:852: btrfs_alloc_chunk: Assertion `!(ret)' failed.
Aborted (core dumped)
In this case there is only 100MB for allocation but btrfs_alloc_chunk
was going to allocate more than the 100MB.
The root cause of both of above troubles is a same simple bug:
btrfs_chunk_alloc does not calculate available bytes properly even
though it researches how many devices have enough room to have a
chunk to be allocated.
So this patch introduces new function btrfs_device_avail_bytes()
which returns available bytes for allocation in specified device.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This silences a sparse warning:
warning: constant 0x4D5F53665248425F is so big it is long
from
commit 52162700bb
Author: Zach Brown <zab@redhat.com>
Date: Thu Jan 17 11:54:47 2013 -0800
btrfs-progs: treat super.magic as an le64
High fives, past me!
Signed-off-by: Zach Brown <zab@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
In files copied from the kernel, mark many functions as static,
and remove any resulting dead code.
Some functions are left unmarked if they aren't static in the
kernel tree.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Port of commit 12534832 to userspace:
commit 12534832cb7b0abc7369298246e8b7af03b863ca
Author: Josef Bacik <josef@redhat.com>
Date: Thu Dec 17 21:32:27 2009 +0000
Btrfs: make set/get functions for the super compat_ro flags use compat_ro
Our set/get functions for compat_ro_flags actually look at compat_flags. This
will mess any attempt to use compat flags up. The fix is obvious. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Port kernel commit 1bec1aed to userspace.
use __le64 instead of u64 in on-disk structure definition.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Port of commit b3b4aa7 to userspace.
parameter tree root it's not used since commit
5f39d397dfbe140a14edecd4e73c34ce23c4f9ee ("Btrfs: Create extent_buffer
interface for large blocksizes")
This gets userspace a tad closer to kernelspace by removing
this unused parameter that was all over the codebase...
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Some codes still use the cpu_to_lexx instead of the
BTRFS_SETGET_STACK_FUNCS declared in ctree.h.
Also added some BTRFS_SETGET_STACK_FUNCS for btrfs_header and
btrfs_super.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This commit adds UUID tree lookup methods that make use of the search
ioctl. The code is based on the kernel code.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Support printing these things.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Remove some commented-out & #if 0'd code:
* close_blocks()
* btrfs_drop_snapshot()
* btrfs_realloc_node()
* btrfs_find_dead_roots()
There are still some #if 0'd functions in there, but I'm hedging
on those for now, they have been copied to cmds-check.c and I want
to see if they can be brough back into ctree.c eventually.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
cmds-check.c contains the only caller of btrfs_fsck_reinit_root;
moving it to the caller's source file gets ctree.c a little
closer to kernelspace, although it does require exporting
add_root_to_dirty_list(), which is not done in kernelspace.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This adds a 'btrfs-image -m' option, which let us restore an image that
is built from a btrfs of multiple disks onto several disks altogether.
This aims to address the following case,
$ mkfs.btrfs -m raid0 sda sdb
$ btrfs-image sda image.file
$ btrfs-image -r image.file sdc
---------
so we can only restore metadata onto sdc, and another thing is we can
only mount sdc with degraded mode as we don't provide informations of
another disk. And, it's built as RAID0 and we have only one disk,
so after mount sdc we'll get into readonly mode.
This is just annoying for people(like me) who're trying to restore image
but turn to find they cannot make it work.
So this'll make your life easier, just tap
$ btrfs-image -m image.file sdc sdd
---------
then you get everything about metadata done, the same offset with that of
the originals(of course, you need offer enough disk size, at least the disk
size of the original disks).
Besides, this also works with raid5 and raid6 metadata image.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Because the fs/file roots are not extents, so it is better to use rb-tree
to manage them. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
In some cases the extent tree can just be so gone there is no point in trying to
figure out how to put it back together. So add a --init-extent-tree mode which
will zero out the extent tree and then re-add extents for all of the blocks we
find. This will also undo any balance that was going on at the time of the
crash, this is needed because the reloc tree seems to confuse fsck at the
moment. With this patch I can put back together a users file system that was
completely gone. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
The tree log bug I introduced could create inconsistent file extent entries in
the file system tree and in some worst cases even create multiple extent entries
for the same entry. To fix this we need to do a few things
1) Keep track of extent items that overlap and then pick the one that covers the
largest area and delete the rest of the items.
2) Keep track of file extent items that land in extent items but don't match
disk_bytenr/disk_num_bytes exactly. Once we find these we need to figure out
who is the right ref and then fix all of the other refs to agree.
Each of these cases require a complete rescan of all of the extents, so
unfortunately if you hit this particular problem the fsck is going to take quite
a while since it will likely rescan all the trees 2 or 3 times. With this patch
the broken file system a user sent me is fixed and a broken file system that was
created by my reproducer is also fixed. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
In trying to track down a weird tree log problem I wanted to make sure that the
free space cache was actually valid, which we currently have no way of doing.
So this patch adds a bunch of support for the free space cache code and then a
checker to fsck. Basically we go through and if we can actually load the free
space cache then we will walk the extent tree and verify that the free space
cache exactly matches what is in the extent tree. Hopefully this will always be
correct, the only time it wouldn't is if the extent tree is corrupt or we have
some sort of awful bug in the free space cache. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
This fixes up the progs to properly deal with skinny metadata. This adds the -x
option to mkfs and btrfstune for enabling the skinny metadata option. This also
makes changes to fsck so it can properly deal with the skinny metadata entries.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
This tool draws per-chunk pngs representing the allocation map. A black
or colored dot means the block is allocated.
The output is written to a subdirectory, together with an index.html to be
viewed in a browser.
There are options to control whether color should be used and which block
group types should be printed.
To build, you need to have libpng and libgd installed. It is not part of
the 'all' target, so please build it explicitely with make btrfs-fragments.
A (rather untypical) example can be seen at
http://sensille.com/fragments
Please regard this as a first scratch version and feel free to improve it :)
Signed-off-by: Arne Jansen <sensille@gmx.net>
Allocate fs_info::super_copy dynamically of full BTRFS_SUPER_INFO_SIZE
and use it directly for saving superblock to disk.
This fixes incorrect superblock checksum after mkfs.
Signed-off-by: David Sterba <dsterba@suse.cz>
External software wanting to use the functionality provided by the btrfs
send ioctl has a hard time doing so without replicating tons of work. Of
particular interest are functions like btrfs_read_and_process_send_stream()
and subvol_uuid_search(). As that functionality requires a bit more than
just send-stream.c and send-utils.c we have to pull in some other parts of
the progs package.
This patch adds code to the Makefile and headers to create a library,
libbtrfs which the btrfs command now links to.
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: David Sterba <dsterba@suse.cz>
The super block magic is a le64 whose value looks like an unterminated
string in memory. The lack of null termination leads to clumsy use of
string functions and causes static analysis tools to warn that the
string will be unterminated.
So let's just treat it as the le64 that it is. Endian wrappers are used
on the constant so that they're compiled into run-time constants.
Signed-off-by: Zach Brown <zab@redhat.com>
David Woodhouse originally contributed this code, and Chris Mason
changed it around to reflect the current design goals for raid56.
The original code expected all metadata and data writes to be full
stripes. This meant metadata block size == stripe size, and had a few
other restrictions.
This version allows metadata blocks smaller than the stripe size. It
implements both raid5 and raid6, although it does not have code to
rebuild from parity if one of the drives is missing or incorrect.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This patch syncs the extended inode ref definitions from kernels ctree.h and
adds support in btrfs-debug-tree for visualizing the state of extended refs
on disk.
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
This is the user mode part of the device replace patch series.
The command group "btrfs replace" is added with three commands:
- btrfs replace start srcdev|srcdevid targetdev [-Bfr] mount_point
- btrfs replace status mount_point [-1]
- btrfs replace cancel mount_point
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
"btrfs device stats" is used to retrieve and print the device stats.
"btrfs device stats -z" is used to atomically retrieve, reset and
print the stats.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
gcc optimizes out the memcpy calls at -O2 and -Os.
Replacing memcpy with memmove does't work - gcc treats memmove
the same way it treats memcpy.
This patch brings in {get|put}_unaligned_le{16|32|64} (using the
packed struct method), and uses them in the failing get/set calls.
On architectures where unaligned accesses are cheap, these unaligned
macros should be optimized out by the compiler.
Signed-off-by: Ben Peddell <klightspeed@killerwolves.net>
There are some unaligned accesses in progs that cause malfunction or
crashes on ARM.
This patch fixes the ones we stumbled upon.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Using mkfs.btrfs like:
mkfs.btrfs -l 131072 /dev/sda
will return no error, but after mount it, the dmesg will report:
BTRFS: couldn't mount because metadata blocksize (131072) was too large
The leafsize and nodesize are equal at present, so we just use one function
"check_leaf_or_node_size" to limit leaf and node size below BTRFS_MAX_METADATA_BLOCKSIZE.
Signed-off-by: Robin Dong <sanbai@taobao.com>
Reviewed-by: David Sterba <dave@jikos.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
We want 'btrfs subvolume list' only to list readonly subvolumes, this patch set
introduces a new option 'r' to implement it.
You can use the command like that:
btrfs subvolume list -r <path>
Original-Signed-off-by: Zhou Bo <zhoub-fnst@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Now we check if the root item contains otime and uuid or not by comparing
->generation_v2 and ->generation of the btrfs_root_item structure, it is
wrong because it is possbile that ->generation may equal to the first
variant of the next item. We fix this problem by check the size of btrfs_root_item,
if it is larger than the original one, the new btrfs_root_item contains otime
and uuid. we needn't worry the case that the new filesystem is mounted on the
old kernel. because the otime and uuid are not changed on the old kernel, we can
get the correct result even on the kernel.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Commit bab2c565 accidentally broke 'subvol get-default' command by
removing almost all of the underlying code. Bring it back with some
fixes and improvements.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
When we discover bad blocks in the extent allocation tree, repair can
now discard them and recreate the references from the rest of the trees.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
During btrfsck --repair, we make an index of extents that have incorrect
reference counts. Once we've collect the whole index, we go through
and modify the extent allocation tree to reflect the correct results.
Changing the extent allocation tree may free blocks, and so it may
end up removing a block that had a missing reference structure. The
fsck code may then circle back around and add the reference back.
The result is an extent that isn't actually used, but is recorded in the
extent allocation tree.
This commit adds a hook called as extents are freed. The hook searches
the index of incorrect references and updates it to reflect the freeing
of the extent.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This will effectively delete all of your crcs, but at least you'll
be able to mount the FS with nodatasum.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This also includes a new --repair btrfsck option. For now it can
only fix errors in the extent allocation tree.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
- scrub structs added
- ioctls for scrub
- BTRFS_FSID_SIZE moved
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Hugo Mills <hugo@carfax.org.uk>