root->highest_inode is not accurate at the time of creating a lost+found
and it fails because the highest_inode+1 is already present. This could be
because of fixes after highest_inode is set. Instead, search
for the highest inode in the tree and use it for lost+found.
This makes root->highest_inode unnecessary and hence deleted.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Size of btrfs_path can be reduced by 32 bytes as we don't use the locks
array, down to 112 from 144 bytes.
Signed-off-by: David Sterba <dsterba@suse.com>
The number of distinct key types is not that big that we could waste one
for something new we want to store in the tree.
Similar to the temporary items, we'll introduce a new name for an
existing key value and use the objectid for further extension. The
victim is the BTRFS_DEV_STATS_KEY (248).
The device stats are an example of a permanent item.
[ kernel patch 50c2d5abe64c1726b48d292a2ab04f60e8238933 ]
Signed-off-by: David Sterba <dsterba@suse.com>
The number of distinct key types is not that big that we could waste one
for something new we want to store in the tree. We'll introduce a new
name for an existing key value and use the objectid for further
extension. The victim is the BTRFS_BALANCE_ITEM_KEY (248).
The nature of the balance status item is a good example of the temporary
item. It exists from beginning of the balance, keeps the status until it
finishes.
[ kernel patch 0bbbccb17fea86818e1a058faf5903aefd20b31a ]
Signed-off-by: David Sterba <dsterba@suse.com>
Btrfs-progs header lacks quite a lot inode flags.
Copy them from kernel for later use.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs-progs can't mount space_cache=v2 filesystems read-write, which is
why the compat bit wasn't added to the supported mask in the first
place. Remove it.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The filesystem existence on a device is manifested by the signature,
during the mkfs process we write it first and then create other
structures. Such filesystem is not valid and should not be registered
during device scan nor listed among devices from blkid.
This patch will introduce two staged creation. In the first phase, the
signature is wrong, but recognized as a partially created filesystem (by
open or scan helpers). Once we successfully create and write everything,
we fixup the signature. At this point automated scanning should find
a valid filesystem on all devices.
We can also rely on the partially created filesystem to do better error
handling during creation. We can just bail out and do not need to clean
up.
The partial signature is '!BHRfS_M', can be shown by
btrfs inspect-internal dump-super -F image
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently the superblock is created first, with a valid signaure, but
the rest of the filesystem is missing. When the creation process is
interrupted, the filesystem still might be considered as valid.
To prevent that, create the filesytem with an invalid signature that
would be still recognized during the mkfs process, and finalize at the
end.
Signed-off-by: David Sterba <dsterba@suse.com>
The new convert treats the convert image as a normal file, without any
special flags and permissions.
This is different from original code:
1) Permission changed from 0400 to 0600
2) Inode lacks READONLY flag
This makes we can read-write mount the ext2 image and cause rollback
failure.
Follow old code behavior, use 0400 permission and add back READONLY
flag to fix it.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a new function check_data_extent_item() to check if the
corresponding data backref exists in extent tree.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that we can verify all qgroups, we can write the corrected qgroups out
to disk when '--repair' is specified. The qgroup status item is also updated
to clear any out-of-date state. The repair_ functions were modeled after the
inode repair code in cmds-check.c.
I also renamed the 'scan' member of qgroup_status_item to 'rescan' in order
to keep consistency with the kernel.
Testing this was easy, I just reproduced qgroup inconsistencies via the
usual routes and had btrfsck fix them.
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Cleanup all the old btrfs-convert facilities, including:
1) btrfs_convert_operations->alloc/free/test_extents*
No need to do non-standard extent allocation.
After init_btrfs() everything can be done by normal routine.
Now only 4 functions are needed in btrfs_convert_operations.
1) open_fs
2) read_used_space
3) copy_inodes
4) close_fs
2) fs_info->extent_ops
Same as above.
3) Old init_btrfs(), create_image(), create_file_image_range()
Replaced with newer and cleaner one.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Before this patch, btrfs-convert only rely on large enough initial
system/metadata chunk size to ensure no newer system/meta chunk will be
created.
But that's not safe enough. So add two new members in fs_info,
avoid_sys/meta_chunk_alloc flags to prevent any newer system or meta
chunks to be created before init_btrfs_v2().
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Btrfs_record_file_extent() has some small problems like:
1) Can't handle overlapping extents
2) May create extent larger than BTRFS_MAX_EXTENT_SIZE
So enhance it using previously added facilites.
This is used for later btrfs-convert, as for new convert, we create
saved image first, then copy inode.
Which will also cause extent overlapping.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a new function, btrfs_search_overlap_extent() to find the first
overlapping extent.
It's useful for later btrfs-convert rework.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The search header is usually accessed in an unaligned way, we could
trigger errors (SIGBUS) on architectures that do not support that.
Signed-off-by: David Sterba <dsterba@suse.com>
Nodesize is used in kernel, the values are always equal. We have to keep
leafsize in headers, similarly the tree setting functions still take and
set leafsize, but it's effectively a no-op.
Signed-off-by: David Sterba <dsterba@suse.com>
Current open_ctree_fs_info() won't return anything if chunk tree root is
corrupted.
This makes some function, like btrfs-find-root, unable to find any older
chunk tree root, even it is possible to use system_chunk_array in super
block.
And at least two users in mail list has reported such heavily chunk
corruption.
Although we have 'btrfs rescue chunk-recovery' but it's too time
consuming and sometimes not able to cope with a specific filesystem
corruption.
This patch adds a new open ctree flag,
OPEN_CTREE_IGNORE_CHUNK_TREE_ERROR, allowing fs_info to be returned from
open_ctree_fs_info() even there is no valid tree root in it.
Also adds a new close_ctree() variant, close_ctree_fs_info() to handle
possible fs_info without any root.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[ adjusted error messages ]
Signed-off-by: David Sterba <dsterba@suse.com>
The BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE bit is supposed to be in the
COMPAT_RO_SUPP bitmask.
Reported-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This reuses the existing code for checking the free space cache, we just
need to load the free space tree. While we do that, we check a couple of
invariants on the free space tree itself. This requires pulling in some
code from the kernel to exclude the super stripes.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
To start, let's tell btrfs-progs to read the free space root and how to
print the on-disk format of the free space tree. However, we're not
adding the FREE_SPACE_TREE read-only compat bit to the set of supported
bits because progs doesn't know how to keep the free space tree
consistent.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The sequence, transid and reserved fields of inode were writen to disk
with uninitizlized value, this patch fixes it.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
s/generation/sequence/
for BTRFS_SETGET_STACK_FUNCS(stack_inode_sequence, ...)
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
As convert implement its own alloc extent, avoid such metadata problem
too.
Reported-by: Chris Murphy <lists@colorremedies.com>
Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently there is not way for a user to know what is the minimum size a
device of a btrfs filesystem can be resized to. Sometimes the value of
total allocated space (sum of all allocated chunks/device extents), which
can be parsed from 'btrfs filesystem show' and 'btrfs filesystem usage',
works as the minimum size, but sometimes it does not, namely when device
extents have to relocated to holes (unallocated space) within the new
size of the device (the total allocated space sum).
This change adds the ability to reliably compute such minimum value and
extents 'btrfs filesystem resize' with the following syntax to get such
value:
btrfs filesystem resize [devid:]get_min_size
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function will be used to free a empty chunk.
This provides the basis for later temp chunk cleanup.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Current btrfs only support CRC32 as checksum algorithm.
But in btrfs_csum_sizes array, we have an extra 0 at tail, causing
csum_type 1 can still be considered as supported csum type.
Fix it by removing the tailing 0.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
This function is used to change fsid and chunk_tree_uuid of a node/leaf.
The function does it without transaction protection.
This is the basis of offline uuid change.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Now open_ctree will exit if it found the superblock is marked
CHANGING_FSID, except given IGNORE_FSID open ctree flags.
Kernel will do the same thing later.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[removed the chunk tree flag, reworded the error message]
Signed-off-by: David Sterba <dsterba@suse.cz>
Add the super flag to inform kernel not to mount a filesystem wich fsid
change is in progress.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[removed the chunk tree flag]
Signed-off-by: David Sterba <dsterba@suse.cz>
We have this check in the kernel but not in userspace, which makes fsck
fail when we wouldn't have a problem in the kernel. This was meant to
catch this case because it really isn't good, unfortunately it will
require a design change to fix in the kernel so in the meantime add this
check so we can be sure our tests only catch real problems. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
This provides the basis for later qgroup related changes.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Ctree.h of btrfs-progs contains wrong flags for btrfs_qgroup_status.
Update it with the one in kernel.
Also, introduce the inline function btrfs_qgroup_(level/subvid) to get
the level/subvolid of qgroup, to replace the old open-coded bit
operations.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Add new open ctree flag OPEN_CTREE_SUPPRESS_CHECK_BLOCK_ERRORS to
suppress tree block csum error output.
Provides the basis for new btrfs-find-root and other enhancement on
btrfs offline tools output.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[renamed vars and funcs, added comments]
Signed-off-by: David Sterba <dsterba@suse.cz>
We hold a transaction open for the entirety of fixing extent refs. This works
out ok most of the time but we can be tight on space and run out of space when
fixing things. To get around this just push down the transaction starting dance
into the functions that actually fix things. This keeps us from ending up with
ENOSPC because we pinned everything and allows the code to be a bit simpler.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
The METADUMP super flag makes us skip doing the chunk tree reading which isn't
helpful for the new restore since we have a valid chunk tree. But we still want
to have a way for the kernel to know that this is a metadump restore so it
doesn't do things like verify data checksums. We also want to skip some of the
device extent checks in fsck since those will obviously not match. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>