Valgrind reports memleak in btrfs_scan_one_device() about allocating
btrfs_device but on btrfs_close_devices() they are not reclaimed.
Although not a bug since after btrfs_close_devices() btrfs will exit so
memory will be reclaimed by system anyway, it's better to fix it anyway.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
this patch will make btrfsck operations to open disk in exclusive mode,
so that mount will fail when btrfsck is running
Signed-off-by: Anand Jain <Anand.Jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
free(3) already checks the pointer for NULL, no need to do it
on your own. This patch make the change globally.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
When reading block groups we will searching it's corresponding chunk, however, at this
time, some chunks has not been built(data chunks raid0/raid10/raid56), don't bug_on here,
we will try to rebuild these chunks later.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
I found that mkfs.btrfs aborts when assigned multi volumes contain
a small volume:
# parted /dev/sdf p
Model: LSI MegaRAID SAS RMB (scsi)
Disk /dev/sdf: 72.8GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number Start End Size Type File system Flags
1 32.3kB 72.4GB 72.4GB primary
2 72.4GB 72.8GB 461MB primary
# ./mkfs.btrfs -f /dev/sdf1 /dev/sdf2
:
SMALL VOLUME: forcing mixed metadata/data groups
adding device /dev/sdf2 id 2
mkfs.btrfs: volumes.c:852: btrfs_alloc_chunk: Assertion `!(ret)' failed.
Aborted (core dumped)
This failure of btrfs_alloc_chunk was caused by following steps:
1) since there is only small space in the small device, mkfs was
going to allocate a chunk from free space as much as available.
So mkfs called btrfs_alloc_chunk with
size = device->total_bytes - device->used_bytes.
2) (According to the comment in source code, to avoid overwriting
superblock,) btrfs_alloc_chunk starts taking chunks at an offset
of 1MB. It means that the layout of a disk will be like:
[[1MB at beginning for sb][allocated chunks]* ... free space ... ]
and you can see that the available free space for allocation is:
avail = device->total_bytes - device->used_bytes - 1MB.
3) Therefore there is only free space 1MB less than requested. damn.
>From further investigations I also found that this issue is easily
reproduced by using -A, --alloc-start option:
# truncate --size=1G testfile
# ./mkfs.btrfs -A900M -f testfile
:
mkfs.btrfs: volumes.c:852: btrfs_alloc_chunk: Assertion `!(ret)' failed.
Aborted (core dumped)
In this case there is only 100MB for allocation but btrfs_alloc_chunk
was going to allocate more than the 100MB.
The root cause of both of above troubles is a same simple bug:
btrfs_chunk_alloc does not calculate available bytes properly even
though it researches how many devices have enough room to have a
chunk to be allocated.
So this patch introduces new function btrfs_device_avail_bytes()
which returns available bytes for allocation in specified device.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
sparse hates variable length array definitions on the stack:
btrfs-show-super.c:155:21: warning: Variable length array is used.
And it's right to. They're a fragile construct that doesn't handle bad
input well at all.
Signed-off-by: Zach Brown <zab@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This fixes all the instances of warnings that symbols declared in blocks
shadow symbols with the same name in surrounding scopes:
cmds-device.c:341:22: warning: symbol 'path' shadows an earlier one
cmds-device.c:285:14: originally declared here
I just renamed or removed the risky shadow symbols instead of pulling
their blocks out into functions.
Signed-off-by: Zach Brown <zab@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
In files copied from the kernel, mark many functions as static,
and remove any resulting dead code.
Some functions are left unmarked if they aren't static in the
kernel tree.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Port of commit b3b4aa7 to userspace.
parameter tree root it's not used since commit
5f39d397dfbe140a14edecd4e73c34ce23c4f9ee ("Btrfs: Create extent_buffer
interface for large blocksizes")
This gets userspace a tad closer to kernelspace by removing
this unused parameter that was all over the codebase...
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
If the chunk tree search failed in volumes.c:btrfs_read_chunk_tree()
return immediately, rather than looping and use the invalid contents
of the path structure, causing weird errors/crash at run time.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
After reading all device items from the chunk tree, don't
exit the loop and then navigate down the tree again to find
the chunk items. Instead just read all device items and
chunk items with a single tree search. This is possible
because all device items are found before any chunk item in
the chunks tree.
This is a port of the corresponding kernel patch to keep both
kernel and btrfs-progs identical:
https://patchwork.kernel.org/patch/2835529/
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
When allocating a btrfs_device structure, device_list_add()
in volumes.c was not checking if the call to duplicate the
label string succeeded or not.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
The btrfs_read_sys_array function uses 3 variants to read data from
super block.
But the three variants are related to each other, so the patch removes
unneeded extra variants and make code a little simpler.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Some codes still use the cpu_to_lexx instead of the
BTRFS_SETGET_STACK_FUNCS declared in ctree.h.
Also added some BTRFS_SETGET_STACK_FUNCS for btrfs_header and
btrfs_super.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
div_factor has been implemented for two times, cleanup it.
And I move them into a independent file named math.h because they are
common math functions.
[Eric Sandeen: port kernel commit 3fed40c to userspace]
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
If a device could not be opened in volumes.c:read_one_dev(), a
btrfs_device instance was allocated and added to the list of
devices of the fs - however this device instance had its fd,
name and label fields not initialized. This is problematic in
disk-io.c:close_all_devices() as it tried to sync, fadvise and
close the (invalid) fd of the device, and kfree() its name and
label, which pointed to random memory locations.
Thread 1 (Thread 0x7f0a3d2d1740 (LWP 23585)):
#0 __GI___libc_free (mem=0xa5a5a5a5a5a5a5a5) at malloc.c:2970
#1 0x000000000042054b in close_all_devices (fs_info=0x1e92bf0) at disk-io.c:1276
#2 0x0000000000421dcd in close_ctree (root=<optimized out>) at disk-io.c:1336
#3 0x0000000000418cfa in cmd_check (argc=<optimized out>, argv=<optimized out>) at cmds-check.c:4171
#4 0x0000000000403ed4 in main (argc=2, argv=0x7fff9a583d28) at btrfs.c:295
v2: Added Liu Bo's review mention.
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This adds a 'btrfs-image -m' option, which let us restore an image that
is built from a btrfs of multiple disks onto several disks altogether.
This aims to address the following case,
$ mkfs.btrfs -m raid0 sda sdb
$ btrfs-image sda image.file
$ btrfs-image -r image.file sdc
---------
so we can only restore metadata onto sdc, and another thing is we can
only mount sdc with degraded mode as we don't provide informations of
another disk. And, it's built as RAID0 and we have only one disk,
so after mount sdc we'll get into readonly mode.
This is just annoying for people(like me) who're trying to restore image
but turn to find they cannot make it work.
So this'll make your life easier, just tap
$ btrfs-image -m image.file sdc sdd
---------
then you get everything about metadata done, the same offset with that of
the originals(of course, you need offer enough disk size, at least the disk
size of the original disks).
Besides, this also works with raid5 and raid6 metadata image.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
A device can be added to the device list without getting a name, so we may
access to illegal addresses while opening devices with their name.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Add chunk rebuild for RAID1/SINGLE/DUP to chunk-recover command.
Before this patch chunk-recover can only scan and reuse the old chunk
data to recover. With this patch, chunk-recover can use the reference
between chunk/block group/dev extent to rebuild the whole chunk tree
even when old chunks are not available.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Add chunk-recover program to check or rebuild chunk tree when the system
chunk array or chunk tree is broken.
Due to the importance of the system chunk array and chunk tree, if one of
them is broken, the whole btrfs will be broken even other data are OK.
But we have some hint(fsid, checksum...) to salvage the old metadata.
So this function will first scan the whole file system and collect the
needed data(chunk/block group/dev extent), and check for the references
between them. If the references are OK, the chunk tree can be rebuilt and
luckily the file system will be mountable.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
As we know, btrfs can manage several devices in the same fs, so [offset, size]
is not sufficient for unique identification of an device extent, we need the
device id to identify the device extents which have the same offset and size,
but are not in the same device. So, we added a member variant named objectid
into the extent cache, and introduced some functions to make the extent cache
be suitable to manage the device extent.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
In fact, the code of many rb-tree insert/search/delete functions is similar,
so we can abstract them, and implement common functions for rb-tree, and then
simplify them.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Some commands(such as btrfs-convert) access the devices again after we close
the ctree, so it is better that we don't free the devices objects when the ctree
is closed, or we need re-allocate the memory for the devices. We needn't worry
the memory leak problem, because all the memory will be freed after the taskes
die.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
As we know, the file descriptor 0 is a special number, so we shouldn't
use it to initialize the file descriptor of the devices, or we might
close this special file descriptor by mistake when we close the devices.
"-1" is a better choice.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Allocate fs_info::super_copy dynamically of full BTRFS_SUPER_INFO_SIZE
and use it directly for saving superblock to disk.
This fixes incorrect superblock checksum after mkfs.
Signed-off-by: David Sterba <dsterba@suse.cz>
It seems highly unlikely that posix_fadvise could fail,
and even if it does, it was only advisory. Still, if
it does, we could issue a notice to the user.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
__btrfs_map_block() can possibly do the goto again: loop after
having allocated & freed the "multi" pointer. There are then
a couple error conditions where it will attempt to again kfree
the now non-NULL multi pointer. So before retrying, reset
multi to NULL after we free it.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Path allocation failure already has its own return, remember to free the
path when the error label is taken.
Signed-off-by: Zach Brown <zab@redhat.com>
struct btrfs_super is about 3.5k but a few writing paths were writing it
out as the full 4k BTRFS_SUPER_INFO_SIZE, leaking a few hundred bytes
after the super_block onto disk. In practice this meant the memory
after super_copy in struct btrfs_fs_info and whatever came after it in
the heap.
Signed-off-by: Zach Brown <zab@redhat.com>
David Woodhouse originally contributed this code, and Chris Mason
changed it around to reflect the current design goals for raid56.
The original code expected all metadata and data writes to be full
stripes. This meant metadata block size == stripe size, and had a few
other restrictions.
This version allows metadata blocks smaller than the stripe size. It
implements both raid5 and raid6, although it does not have code to
rebuild from parity if one of the drives is missing or incorrect.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
This is the user mode part of the device replace patch series.
The command group "btrfs replace" is added with three commands:
- btrfs replace start srcdev|srcdevid targetdev [-Bfr] mount_point
- btrfs replace status mount_point [-1]
- btrfs replace cancel mount_point
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
There are some unaligned accesses in progs that cause malfunction or
crashes on ARM.
This patch fixes the ones we stumbled upon.
Signed-off-by: Arne Jansen <sensille@gmx.net>
This is mostly disabled, but it is step one in handling
corrupted block groups in the extent allocation tree.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
When we're using multipath or raid0, it is possible
that btrfs dev scan will find one of the component devices
instead of the proper virtual device the kernel creates.
We want to make sure the kernel scans the virtual devices last,
since it always remembers the last device it finds with a given fsid.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The btrfs-progs raid10 code has been silently reading the wrong
raid10 block forever. We didn't notice because it was always fixed
up by the retry code.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
gcc-4.6 (as shipped in Fedora) turns on -Wunused-but-set-variable by
default, which breaks the build when combined with -Wall, e.g.:
debug-tree.c: In function ‘print_extent_leaf’:
debug-tree.c:45:13: error: variable ‘last_len’ set but not used [-Werror=unused-but-set-variable]
debug-tree.c:44:13: error: variable ‘last’ set but not used [-Werror=unused-but-set-variable]
debug-tree.c:41:21: error: variable ‘item’ set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors
This patch fixes the errors by removing the unused variables.
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Hugo Mills <hugo@carfax.org.uk>
Changes from V1 to V2:
- support extended attributes
- move btrfs_alloc_data_chunk function to volumes.c
- fix an execution error when additional useless parameters are specified
- fix traverse_directory function so that the insertion functions for the common items are invoked in a single point
The extended attributes is implemented through llistxattr and getxattr function calls.
Thanks
Signed-off-by: Donggeun Kim <dg77.kim@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
When a device is missing, the btrfs tools need to be able to read alternate
copies from the remaining devices. This creates placeholder devices
that always return -EIO so the tools can limp along.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This patch updates the ext3 to btrfs converter for the new
disk format. This mainly involves changing the convert's
data relocation and free space management code. This patch
also ports some functions from kernel module to btrfs-progs.
Thank you,
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
This adds a sequence number to the btrfs inode that is increased on
every update. NFS will be able to use that to detect when an inode has
changed, without relying on inaccurate time fields.
While we're here, this also:
Puts reserved space into the super block and inode
Adds a log root transid to the super so we can pick the newest super
based on the fsync log as well as the main transaction ID. For now
the log root transid is always zero, but that'll get fixed.
Adds a starting offset to the dev_item. This will let us do better
alignment calculations if we know the start of a partition on the disk.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This patch updates btrfs-progs for superblock duplication.
Note: I didn't make this patch as complete as the one for
kernel since updating the converter requires changing the
code again. Thank you,
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>