Tree blocks are always nodesize. As readahead is only an optimization,
exact size is not required and is only advisory.
Signed-off-by: David Sterba <dsterba@suse.com>
Metadata blocks are always nodesize. When reading the
superblock::sys_array, the actual size of data is fixed to 4k and
smaller than nodesize, but otherwise everything works as before.
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a new header, kernel-lib/raid56.h, for later raid56 works.
It contains 2 functions, from original btrfs-progs code:
void raid6_gen_syndrome(int disks, size_t bytes, void **ptrs);
int raid5_gen_result(int nr_devs, size_t stripe_len, int dest, void **data);
Will be expanded later and some part of it(RAID6 recover part) may keep
sync with kernel later.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[ unify gpl header, rename header macro ]
Signed-off-by: David Sterba <dsterba@suse.com>
The only reasom read_tree_block() needs a btrfs_root parameter is to get
its node/sector size.
And long ago, I have already introduced a compactible interface,
read_tree_block_fs_info() to pass btrfs_fs_info instead of btrfs_root.
Since we have cleaned up all root->sector/node/stripesize users, we
should be OK to refactor read_tree_block() function.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Since we have cached block sizes in fs_info, there is no need to specify
these sizes in btrfs_setup_root() function.
And refactor all root->sector/node/stripesize users in disk-io.c.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Leafsize is deprecated for a long time, and kernel has already updated
ctree.h to rename sb->leafsize to sb->__unused_leafsize.
This patch will remove normal users of leafsize:
1) Remove leafsize member from btrfs_root structure
Now only root->nodesize and root->sectorisze.
No longer root->leafsize.
2) Remove @leafsize parameter from btrfs_setup_root() function
Since no root->leafsize, no need for @leafsize parameter.
The remaining user of leafsize will be:
1) btrfs inspect-internal dump-super
Reformat the "leafsize" output to "leafsize (deprecated)" and
use le32_to_cpu() to do the cast manually.
2) mkfs
We still need to set sb->__unused_leafsize to nodesize.
Do the manual cast too.
3) convert
Same as mkfs, these two superblock setup should be merged later
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Introduce a new macro, BTRFS_SB_OFFSET() to calculate backup superblock
offset, this is handy if one wants to initialize static array at
declaration time.
Suggested-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Large numbers like (1024 * 1024 * 1024) may cost reader/reviewer to
waste one second to convert to 1G.
Introduce kernel include/linux/sizes.h to replace any intermediate
number larger than 4096 (not including 4096) to SZ_*.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce new function raid5_gen_result() to calculate parity or data
stripe.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If this flag is passed to open_ctree(), we'll clear the
FREE_SPACE_TREE_VALID compat_ro bit. The kernel will then reconstruct
the free space tree the next time the filesystem is mounted.
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This allows us to put raid5 codes into that file other than creating a
new raid5.c.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The filesystem existence on a device is manifested by the signature,
during the mkfs process we write it first and then create other
structures. Such filesystem is not valid and should not be registered
during device scan nor listed among devices from blkid.
This patch will introduce two staged creation. In the first phase, the
signature is wrong, but recognized as a partially created filesystem (by
open or scan helpers). Once we successfully create and write everything,
we fixup the signature. At this point automated scanning should find
a valid filesystem on all devices.
We can also rely on the partially created filesystem to do better error
handling during creation. We can just bail out and do not need to clean
up.
The partial signature is '!BHRfS_M', can be shown by
btrfs inspect-internal dump-super -F image
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
As we're passing a set of flags, the enum type is not appropriate.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add new btrfsck option, '--chunk-root', to specify chunk root bytenr.
And allow open_ctree_fs_info() function accept chunk_root_bytenr to
override the bytenr in superblock. This will be mainly used when chunk
tree corruption.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since open_ctree_fs_info() now may return a fs_info even without any
roots, modify functions like read_tree_block() to operate with such
fs_info.
This provides the basis for btrfs-find-root to operate on chunk tree
with corrupted fs.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[ coding style adjustments, unified declarations ]
Signed-off-by: David Sterba <dsterba@suse.com>
Current open_ctree_fs_info() won't return anything if chunk tree root is
corrupted.
This makes some function, like btrfs-find-root, unable to find any older
chunk tree root, even it is possible to use system_chunk_array in super
block.
And at least two users in mail list has reported such heavily chunk
corruption.
Although we have 'btrfs rescue chunk-recovery' but it's too time
consuming and sometimes not able to cope with a specific filesystem
corruption.
This patch adds a new open ctree flag,
OPEN_CTREE_IGNORE_CHUNK_TREE_ERROR, allowing fs_info to be returned from
open_ctree_fs_info() even there is no valid tree root in it.
Also adds a new close_ctree() variant, close_ctree_fs_info() to handle
possible fs_info without any root.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[ adjusted error messages ]
Signed-off-by: David Sterba <dsterba@suse.com>
Add includes that let the header files compile or add explicit include
of kerncompat if the uXX types are used.
Signed-off-by: David Sterba <dsterba@suse.cz>
Export write_tree_block() function and allow it write extent without
transaction.
This provides the basis for later uuid change function.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Now open_ctree will exit if it found the superblock is marked
CHANGING_FSID, except given IGNORE_FSID open ctree flags.
Kernel will do the same thing later.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[removed the chunk tree flag, reworded the error message]
Signed-off-by: David Sterba <dsterba@suse.cz>
Add new flag CHUNK_ONLY and internal used only flag __RETURN_CHUNK.
CHUNK_ONLY will imply __RETURN_CHUNK, SUPPRESS_ERROR and PARTIAL, which
will allow the fs to be opened with only chunk tree OK.
This will improve the usability for btrfs-find-root.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Add new open ctree flag OPEN_CTREE_SUPPRESS_CHECK_BLOCK_ERRORS to
suppress tree block csum error output.
Provides the basis for new btrfs-find-root and other enhancement on
btrfs offline tools output.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[renamed vars and funcs, added comments]
Signed-off-by: David Sterba <dsterba@suse.cz>
When we go to fixup the dev items after a restore we scan all existing devices.
If you happen to be a btrfs developer you could possibly open up some random
device that you didn't just restore onto, which gives you weird errors and makes
you super cranky and waste a day trying to figure out what is failing. This
will make it so that we use the fd we've already opened for opening our ctree.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Change the immediate number in btrfs_open_ctree_flags to bit shift.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
As of now commands mentioned below (with in [..]) are calling call register-device
ioctl BTRFS_IOC_SCAN_DEV for all the devices in the system.
Some issues with it:
BTRFS_IOC_SCAN_DEV: ioctl is a write operation, we don't want command like
btrfs-debug-tree threads to do that..
eg:
----
$ cat /proc/fs/btrfs/devlist | egrep fsid | wc -l
0
$ btrfs-debug-tree /dev/sde (num_device > 1)
$ cat /proc/fs/btrfs/devlist | egrep fsid | wc -l
5
----
btrfs_scan_fs_devices() ends up calling this ioctl only when num_device > 1.
That's inconsistency with in feature/bug.
We don't have to register _all_ the btrfs devices (again) in the system
without user consent.
Why its inconsistent:
function btrfs_scan_fs_devices() calls btrfs_scan_lblkid only when
num_devices is > 1, which in turn calls BTRFS_IOC_SCAN_DEV ioctl, if
conditions are met.
But main issue is we have too many consumers of btrfs_scan_fs_devices()
the names below with in [] is the cli leading to this function.
open_ctree_broken() [btrfs-find-root]
recover_prepare() [btrfs rescue super-recover]
__open_ctree_fd
(updates always except when flag OPEN_CTREE_RECOVER_SUPER is set and
flag OPEN_CTREE_RECOVER_SUPER is set only by 'btrfs rescue super-
recover' but still this thread sneaks through the open_ctree function
to call register-device-ioctl as show below).
open_ctree_fs_info
[btrfs-debug-tree]
[btrfs-image -r]
[btrfs check]
open_fs
[btrfs restore]
open_ctree
[calc-size]
[btrfs-corrupt-block]
[btrfs-image] (create)
[btrfs-map-logical]
[btrfs-select-super]
[btrfstune]
[btrfs-zero-log]
[tester]
[mkfs]
[quick-test.c]
[btrfs label set unmounted]
[btrfs get label unmounted]
[btrfs rescue super-recover]
open_ctree_fd
[btrfs-convert]
Fix:
In an effort to make register-device consistent, all calls to
btrfs_scan_fs_devices() will have 5th parameter set to 0. that means
we don't need 5th parameter at all. And with this function not calling
the register ioctl at all, finally we will have following two cli to call
the ioctl BTRFS_IOC_SCAN_DEV.
btrfs dev scan and
mkfs.btrfs
Threads needing to update kernel about a device would have to use
btrfs_register_one_device() separately.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
David sent a quick patch that removed a BUG_ON(). I took a peek and
found that the function was already leaking an eb ref and only returned
0. So this fixes the leak and makes the function void and fixes up the
callers.
Accidentally-motivated-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Zach Brown <zab@zabbo.net>
Signed-off-by: David Sterba <dsterba@suse.cz>
Btrfs-progs superblock checksum check is somewhat too restricted for
super-recover, since current btrfs-progs will only read the 1st
superblock and if you need super-recover the 1st superblock is
possibly already damaged.
The fix is introducing super_recover parameter for
btrfs_read_dev_super() and callers to allow scan backup superblocks if
needed.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
this patch will make btrfsck operations to open disk in exclusive mode,
so that mount will fail when btrfsck is running
Signed-off-by: Anand Jain <Anand.Jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
So I needed to add a flag to not try to read block groups when doing
--init-extent-tree since we could hang there, but that meant adding a whole
other 0/1 type flag to open_ctree_fs_info. So instead I've converted it all
over to using a flags setting and added the flag that I needed. This has been
tested with xfstests and make test. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
In some cases the tree root is so hosed we can't get anything useful out of it.
So add the -b option to btrfsck to make us look for the most recent backup tree
root to use for repair. Then we can hopefully get ourselves into a working
state. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
mkfs -r wasn't creating chunks properly, making it very difficult to
allocate space for anything except tiny filesystems.
This changes it around to use more of the generic infrastructure, and
to do actual logical->physical block number translation.
It also allocates space to the files in smaller extents (max 1MB), which
keeps the allocator from trying to allocate an extent bigger than a
single chunk.
It doesn't quite support multi-device mkfs -r yet, but is much closer.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>