root->highest_inode may not be accurate when lost+found is created, and
creation fails because inode highest_inode+1 is already present. This
could be caused by fixes applied after highest_inode is set. Instead,
search for the highest inode in the tree and use it for lost+found.
This makes root->highest_inode unnecessary, so it is removed.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If this flag is passed to open_ctree(), we'll clear the
FREE_SPACE_TREE_VALID compat_ro bit. The kernel will then reconstruct
the free space tree the next time the filesystem is mounted.
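As a rough illustration, clearing the bit is a single mask operation on the compat_ro flags. The helper and macro names below are hypothetical, and the bit positions are assumptions made for this sketch rather than values taken from this commit.

```c
#include <stdint.h>

/* Assumed bit positions, hypothetical names used only in this sketch. */
#define SK_COMPAT_RO_FREE_SPACE_TREE        (1ULL << 0)
#define SK_COMPAT_RO_FREE_SPACE_TREE_VALID  (1ULL << 1)

/* Return the flags with the VALID bit cleared; every other bit is kept. */
static inline uint64_t sk_clear_fst_valid(uint64_t compat_ro_flags)
{
	return compat_ro_flags & ~SK_COMPAT_RO_FREE_SPACE_TREE_VALID;
}
```

The FREE_SPACE_TREE bit itself stays set, so the filesystem is still known to use the tree and only its validity marker is dropped.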
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It seems like a bad idea to use a library name (lblkid) within a generic
function name. The currently used scanning library is an implementation
detail, and this detail should be hidden from the rest of the code.
Signed-off-by: Karel Zak <kzak@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
If blocksize is 0, it passes the IS_ALIGNED check but fails later as the
length of ebs will be zero.
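A minimal sketch of why zero slips through, assuming the alignment test compares blocksize against a power-of-two sector size. The macro has the usual shape of IS_ALIGNED; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Usual shape of the alignment macro; "a" must be a power of two. */
#define IS_ALIGNED(x, a) (((x) & ((a) - 1)) == 0)

/* Hypothetical helper: reject a zero blocksize before testing alignment. */
static inline int sk_blocksize_is_valid(uint32_t blocksize, uint32_t sectorsize)
{
	/* Precondition of this sketch: sectorsize is a non-zero power of two. */
	assert(sectorsize != 0 && (sectorsize & (sectorsize - 1)) == 0);

	if (blocksize == 0)
		return 0;
	return IS_ALIGNED(blocksize, sectorsize);
}
```

Zero has no low bits set, so IS_ALIGNED(0, 4096) is true; the explicit zero test is what closes the gap.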
Reported-by: Lukas Lueg <lukas.lueg@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=169311
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_read_dev_super() only returns 0 or -1, which doesn't really help:
the caller won't know whether the failure is caused by a bad superblock
or by a superblock that is out of range.
Return -errno if pread64() returns -1, return -EOF if none or only part
of the super is read out, and otherwise return what check_super()
returned.
This way the caller can get -EIO to catch really corrupted superblocks.
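The distinction between the failure modes might look roughly like the sketch below. The function name, the size constant and the code used for a short read (ENOENT, picked here purely as a stand-in) are assumptions, not the actual implementation.

```c
#define _XOPEN_SOURCE 700	/* for pread() under strict -std=c99 */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#define SK_SUPER_SIZE 4096	/* assumed size of one superblock copy */

/*
 * Hypothetical reader: returns 0 on a full read, -errno if pread() itself
 * fails, and a distinct negative code if the read comes up short.
 */
static inline int sk_read_super(int fd, void *buf, off_t offset)
{
	ssize_t ret = pread(fd, buf, SK_SUPER_SIZE, offset);

	if (ret < 0)
		return -errno;		/* real I/O or descriptor error */
	if (ret < SK_SUPER_SIZE)
		return -ENOENT;		/* short read: super is out of range */
	return 0;			/* a content check would follow here */
}
```

A caller can then tell a superblock that could not be read apart from one that was read but failed validation.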
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Although we have enhanced read_tree_block() in many different respects,
it still lacks an early bytenr/blocksize alignment check.
The lack of such a check can lead to strange use-after-free bugs,
because alloc_extent_buffer() will free overlapping extent buffers and
allocate a new eb for use.
So we should not allow an invalid bytenr/blocksize to even be passed to
btrfs_find_create_tree_block().
This patch adds such a check so we won't trigger the use-after-free
bug.
Reported-by: Lukas Lueg <lukas.lueg@gmail.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The existence of a filesystem on a device is manifested by the
signature. During the mkfs process we write it first and then create the
other structures. Until that is finished, such a filesystem is not valid
and should neither be registered during device scan nor listed among
devices by blkid.
This patch introduces two-stage creation. In the first phase, the
signature is wrong, but recognized as a partially created filesystem (by
open or scan helpers). Once we successfully create and write everything,
we fix up the signature. At this point automated scanning should find
a valid filesystem on all devices.
We can also rely on the partially created filesystem to do better error
handling during creation. We can just bail out and do not need to clean
up.
The partial signature is '!BHRfS_M'; it can be shown by:
  btrfs inspect-internal dump-super -F image
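A small sketch of how a scan helper could tell the two stages apart. The partial signature is the one quoted above; the valid signature string and all names in the code are assumptions made for illustration.

```c
#include <string.h>

#define SK_MAGIC_LEN      8
#define SK_MAGIC_VALID    "_BHRfS_M"	/* assumed final signature */
#define SK_MAGIC_PARTIAL  "!BHRfS_M"	/* partial signature from the text */

enum sk_sig_state { SK_SIG_NONE, SK_SIG_PARTIAL, SK_SIG_VALID };

/* Classify the 8 signature bytes read from a superblock. */
static inline enum sk_sig_state sk_classify_signature(const unsigned char *magic)
{
	if (memcmp(magic, SK_MAGIC_VALID, SK_MAGIC_LEN) == 0)
		return SK_SIG_VALID;
	if (memcmp(magic, SK_MAGIC_PARTIAL, SK_MAGIC_LEN) == 0)
		return SK_SIG_PARTIAL;
	return SK_SIG_NONE;
}
```

A device scan would register only SK_SIG_VALID, while mkfs error paths can rely on SK_SIG_PARTIAL being ignored.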
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
As we're passing a set of flags, the enum type is not appropriate.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This will not cause unaligned access as the checksum is at the beginning
of btrfs_header and thus aligned to a page, but for clarity use the
helper.
Signed-off-by: David Sterba <dsterba@suse.com>
stripesize should ideally be set to the value of sectorsize. However,
previous versions of btrfs-progs/mkfs.btrfs set stripesize to a
value of 4096. On machines with PAGE_SIZE other than 4096, this could
lead to the following scenario:
- /dev/loop0, /dev/loop1 and /dev/loop2 are mounted as a single
  filesystem. The filesystem was created by an older version of
  mkfs.btrfs which set stripesize to 4k.
- losetup -a
  /dev/loop0: [0030]:19477 (/root/disk-imgs/file-0.img)
  /dev/loop1: [0030]:16577 (/root/disk-imgs/file-1.img)
  /dev/loop2: [64770]:3423229 (/root/disk-imgs/file-2.img)
- /etc/mtab lists only /dev/loop0
- losetup /dev/loop4 /root/disk-imgs/file-1.img
The new mkfs.btrfs invoked as 'mkfs.btrfs -f /dev/loop4' succeeds even
though /dev/loop1 has already been mounted and has
/root/disk-imgs/file-1.img as its backing file.
The above behaviour occurs because the check_super() function returns an
error code (the stripesize of 4096 does not match sectorsize) and hence
the check_mounted_where() function treats /dev/loop1 as a disk
containing a filesystem other than Btrfs.
Hence, as a workaround, this commit allows 4096 as a valid stripesize.
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We stat the filesystem path before trying to open it, so there's no point
in passing O_CREAT ("btrfs-progs: add stat check in open_ctree_fs_info").
Signed-off-by: David Sterba <dsterba@suse.com>
In btrfs, it's possible to have an empty leaf, but an empty node is not
possible.
Add a check for empty nodes when validating tree blocks.
Suggested-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This adds validation checks for super_total_bytes, super_bytes_used and
super_stripesize.
Since these checks are made after the superblock has passed checksum
verification, this also adds a notice of "superblock checksum matches
but..".
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Reported-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
[ adjusted message wording ]
Signed-off-by: David Sterba <dsterba@suse.com>
The flag OPEN_CTREE_RECOVER_SUPER is set when we are going to recover
any bad superblock copy; the current code doesn't match that.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nodesize is used in the kernel; the values are always equal. We have to
keep leafsize in headers. Similarly, the tree setting functions still
take and set leafsize, but it's effectively a no-op.
Signed-off-by: David Sterba <dsterba@suse.com>
Currently, open_ctree_fs_info will open whatever path you pass it and try
to interpret it as a BTRFS filesystem. While this is not necessarily
dangerous (except possibly if done on a character device), it does
result in some rather cryptic and nonsensical error messages when
trying to run certain commands in ways they weren't intended to be run.
Add a check using stat(2) to verify that the path we've been passed is
in fact a regular file or a block device, or a symlink pointing to a
regular file or block device.
This causes the following commands to provide a helpful error message
when run on a FIFO, directory, character device, or socket:
* btrfs check
* btrfs restore
* btrfs-image
* btrfs-find-root
* btrfs inspect-internal dump-tree
stat(2) is used instead of lstat(2), as stat(2) follows symlinks just
like open(2) does, which means we check the same inode that open(2)
opens, and thus don't need special handling for symlinks.
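The check described above can be sketched as follows. The helper name and its return convention are hypothetical; only stat(2), S_ISREG() and S_ISBLK() are standard.

```c
#define _POSIX_C_SOURCE 200809L	/* for stat() under strict -std=c99 */
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 1 if path resolves to a regular file or a
 * block device, 0 if it resolves to anything else, -1 if stat(2) fails.
 * stat(2) follows symlinks, so a symlink is judged by its target.
 */
static inline int sk_path_is_reg_or_block(const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return -1;
	return (S_ISREG(st.st_mode) || S_ISBLK(st.st_mode)) ? 1 : 0;
}
```

A caller would print a clear error and bail out on 0 or -1 instead of handing the path to open(2).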
Signed-off-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a new btrfsck option, '--chunk-root', to specify the chunk root
bytenr. Also allow the open_ctree_fs_info() function to accept
chunk_root_bytenr to override the bytenr in the superblock. This will
mainly be used when the chunk tree is corrupted.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since open_ctree_fs_info() may now return an fs_info even without any
roots, modify functions like read_tree_block() to operate with such an
fs_info.
This provides the basis for btrfs-find-root to operate on the chunk tree
of a corrupted fs.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[ coding style adjustments, unified declarations ]
Signed-off-by: David Sterba <dsterba@suse.com>
Currently open_ctree_fs_info() won't return anything if the chunk tree
root is corrupted.
This makes some tools, like btrfs-find-root, unable to find any older
chunk tree root, even though it is possible to use system_chunk_array in
the super block.
At least two users on the mailing list have reported such heavy chunk
corruption.
Although we have 'btrfs rescue chunk-recovery', it's too time
consuming and sometimes not able to cope with a specific filesystem
corruption.
This patch adds a new open ctree flag,
OPEN_CTREE_IGNORE_CHUNK_TREE_ERROR, allowing fs_info to be returned from
open_ctree_fs_info() even if there is no valid tree root in it.
It also adds a new close_ctree() variant, close_ctree_fs_info(), to
handle a possible fs_info without any root.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[ adjusted error messages ]
Signed-off-by: David Sterba <dsterba@suse.com>
To start, let's tell btrfs-progs to read the free space root and how to
print the on-disk format of the free space tree. However, we're not
adding the FREE_SPACE_TREE read-only compat bit to the set of supported
bits because progs doesn't know how to keep the free space tree
consistent.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Enhance chunk validation:
1) Num_stripes
   We already have such a check, but only for the super block sys chunk
   array.
   Now check all on-disk chunks.
2) Chunk logical
   It should be aligned to sector size.
   This behavior should be *DOUBLE CHECKED* for 64K sector size on
   platforms like PPC64 or AArch64.
   Maybe we can find some hidden bugs.
3) Chunk length
   Same as chunk logical, it should be aligned to sector size.
4) Stripe length
   It should be a power of 2.
5) Chunk type
   Any bit outside of TYPE_MASK | PROFILE_MASK is invalid.
With all these much stricter rules, several fuzzed images reported on
the mailing list should no longer cause btrfsck errors.
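The five rules can be sketched as one predicate. The struct, the mask values and the "num_stripes must be non-zero" reading of rule 1 are assumptions made for this example, not the on-disk definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask values, chosen only for this sketch. */
#define SK_TYPE_MASK     0x007ULL
#define SK_PROFILE_MASK  0x1f8ULL

struct sk_chunk {
	uint64_t logical;
	uint64_t length;
	uint64_t stripe_len;
	uint64_t type;
	uint16_t num_stripes;
};

static inline int sk_is_power_of_2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Return 1 if the chunk passes all five checks, 0 otherwise. */
static inline int sk_chunk_is_sane(const struct sk_chunk *c, uint32_t sectorsize)
{
	uint64_t mask = (uint64_t)sectorsize - 1;

	assert(sk_is_power_of_2(sectorsize));	/* precondition of the sketch */

	if (c->num_stripes == 0)		/* 1) num_stripes (assumed rule) */
		return 0;
	if (c->logical & mask)			/* 2) logical aligned to sector */
		return 0;
	if (c->length & mask)			/* 3) length aligned to sector */
		return 0;
	if (!sk_is_power_of_2(c->stripe_len))	/* 4) stripe length power of 2 */
		return 0;
	if (c->type & ~(SK_TYPE_MASK | SK_PROFILE_MASK))	/* 5) unknown bits */
		return 0;
	return 1;
}
```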
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch is generated from a coccinelle semantic patch:
@@
identifier t;
expression e;
statement s;
@@
-t = malloc(e);
+t = calloc(1, e);
(
if (!t) s
|
if (t == NULL) s
|
)
-memset(t, 0, e);
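For illustration, this is the kind of rewrite the semantic patch performs on a call site. The struct and function names are made up for this sketch.

```c
#include <stdlib.h>
#include <string.h>

struct sk_item {
	int id;
	char name[16];
};

/* Before: allocate, check, then zero the memory by hand. */
static inline struct sk_item *sk_item_alloc_old(void)
{
	struct sk_item *t = malloc(sizeof(*t));

	if (!t)
		return NULL;
	memset(t, 0, sizeof(*t));
	return t;
}

/* After: calloc() already returns zeroed memory, so memset() is dropped. */
static inline struct sk_item *sk_item_alloc_new(void)
{
	struct sk_item *t = calloc(1, sizeof(*t));

	if (!t)
		return NULL;
	return t;
}
```

Both variants hand back a zero-filled object; the second simply has one call fewer to get wrong.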
Signed-off-by: Silvio Fricke <silvio.fricke@gmail.com>
[squashed patches into one]
Signed-off-by: David Sterba <dsterba@suse.com>
Now btrfs-progs has much stricter superblock checks, based on the
kernel superblock checks.
This should prevent crashes or invalid memory access on crafted or
fuzzed images.
Based on kernel commit c926093ec516f5d316ecdf8c1be11f577ac71b85.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[added reference to kernel and comments]
Signed-off-by: David Sterba <dsterba@suse.com>
Before this patch, btrfs-progs only read sizeof(struct
btrfs_super_block) bytes and stored them in super_copy.
This makes a checksum check of the superblock impossible. Change it to
read the whole superblock.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
fsck-tests.sh failed and showed the following message on my node:
# ./fsck-tests.sh
[TEST] 001-bad-file-extent-bytenr
disk-io.c:1444: write_dev_supers: Assertion `ret != BTRFS_SUPER_INFO_SIZE` failed.
/root/btrfsprogs/btrfs-image(write_all_supers+0x2d2)[0x41031c]
/root/btrfsprogs/btrfs-image(write_ctree_super+0xc5)[0x41042e]
/root/btrfsprogs/btrfs-image(btrfs_commit_transaction+0x208)[0x410976]
/root/btrfsprogs/btrfs-image[0x438780]
/root/btrfsprogs/btrfs-image(main+0x3d5)[0x438c5c]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x335e01ecdd]
/root/btrfsprogs/btrfs-image[0x4074e9]
failed to restore image /root/btrfsprogs/tests/fsck-tests/001-bad-file-extent-bytenr/default_case.img
#
# cat fsck-tests-results.txt
=== Entering /root/btrfsprogs/tests/fsck-tests/001-bad-file-extent-bytenr
restoring image default_case.img
failed to restore image /root/btrfsprogs/tests/fsck-tests/001-bad-file-extent-bytenr/default_case.img
#
Reason:
I ran the above test on an NFS mountpoint, which doesn't have enough
space to write all superblocks to the image file and doesn't support
sparse files.
So write_dev_supers() failed to write the sb and printed the above
message.
It took me quite some time to find out what happened; we can save that
time by printing exact information in the write-sb-fail case.
After patch:
# ./fsck-tests.sh
[TEST] 001-bad-file-extent-bytenr
WARNING: Write sb failed: File too large
disk-io.c:1492: write_all_supers: Assertion `ret` failed.
...
#
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Offline btrfs tools, like btrfs-image, will loop infinitely when there
is a missing device.
The reason is that for a missing device, its fd will be set to -1, but
before reading we only check the fd's validity by checking whether it
is 0.
So in that case, -1 will pass the validity check, cause pread to
return 0, and make the read loop forever.
Just change the validity check from "== 0" to "<= 0" to avoid this
problem.
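The difference between the two checks, reduced to two hypothetical predicates (in this sketch, 0 means "never opened" and -1 means "missing device"):

```c
/* Old check: only fd == 0 counted as unusable, so -1 slipped through. */
static inline int sk_fd_unusable_old(int fd)
{
	return fd == 0;
}

/* New check: both "never opened" (0) and "missing" (-1) are rejected. */
static inline int sk_fd_unusable_new(int fd)
{
	return fd <= 0;
}
```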
Reported-by: Timothy Normand Miller <theosib@gmail.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We forgot to free raid_map for raid56's map_bio.
This patch adds it.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
As the chunk tree is only stored in the super block, a chunk tree commit
doesn't need to go through the tree root update.
Otherwise a BUG_ON will be triggered.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add the following tree block checks to avoid memory corruption on a
hostile image:
1) Check level.
   A block with level >= BTRFS_MAX_LEVEL won't be read out.
2) Nritems.
   For nr_items > max_nritems, the tree block won't be read out.
   Max nritems is calculated in a simple way:
   For a node, it's straightforward, just (nodesize - header size) /
   sizeof(btrfs_key_ptr).
   For a leaf, (nodesize - header size) / sizeof(btrfs_item), as btrfs
   supports zero item size.
This fixes 3 kernel bugs: BZ#97171, BZ#97191, BZ#97271.
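The bound on nritems described above can be sketched as below. The byte sizes and the maximum level are assumptions about the packed on-disk structures, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed packed on-disk sizes in bytes, for illustration only. */
#define SK_HEADER_SIZE   101	/* struct btrfs_header */
#define SK_KEY_PTR_SIZE   33	/* struct btrfs_key_ptr */
#define SK_ITEM_SIZE      25	/* struct btrfs_item */
#define SK_MAX_LEVEL       8	/* assumed value of BTRFS_MAX_LEVEL */

/* Upper bound on nritems: key pointers for a node, items for a leaf. */
static inline uint32_t sk_max_nritems(uint32_t nodesize, int level)
{
	uint32_t payload;

	assert(nodesize > SK_HEADER_SIZE);
	payload = nodesize - SK_HEADER_SIZE;
	return level > 0 ? payload / SK_KEY_PTR_SIZE : payload / SK_ITEM_SIZE;
}

/* Return 1 if level and nritems are plausible for this nodesize. */
static inline int sk_tree_block_header_ok(uint32_t nodesize, int level,
					  uint32_t nritems)
{
	if (level < 0 || level >= SK_MAX_LEVEL)
		return 0;
	return nritems <= sk_max_nritems(nodesize, level);
}
```

Under these assumed sizes, a 16K node holds at most 493 key pointers and a 16K leaf at most 651 items.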
Reported-by: Lukas Lueg <lukas.lueg@gmail.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Export the write_tree_block() function and allow it to write an extent
without a transaction.
This provides the basis for the later uuid change function.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Now open_ctree will exit if it finds that the superblock is marked
CHANGING_FSID, unless the IGNORE_FSID open ctree flag is given.
The kernel will do the same thing later.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[removed the chunk tree flag, reworded the error message]
Signed-off-by: David Sterba <dsterba@suse.cz>
Add a new flag CHUNK_ONLY and an internal-only flag __RETURN_CHUNK.
CHUNK_ONLY implies __RETURN_CHUNK, SUPPRESS_ERROR and PARTIAL, which
allows the fs to be opened with only the chunk tree OK.
This improves the usability of btrfs-find-root.
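One way to express "implies" is to define the public flag as the union of the bits it needs. The names and bit values below are hypothetical; the real flag values are not given in this message.

```c
/* Hypothetical bit values, for illustration only. */
#define SK_PARTIAL         (1U << 0)
#define SK_SUPPRESS_ERROR  (1U << 1)
#define SK_RETURN_CHUNK    (1U << 2)	/* internal use only */

/* The public flag is simply the union of the bits it implies. */
#define SK_CHUNK_ONLY \
	(SK_RETURN_CHUNK | SK_SUPPRESS_ERROR | SK_PARTIAL)

/* Return 1 if every bit of "wanted" is set in "flags". */
static inline int sk_flags_include(unsigned int flags, unsigned int wanted)
{
	return (flags & wanted) == wanted;
}
```

With this layout a caller passes only SK_CHUNK_ONLY, and the open path can test each implied bit on its own.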
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>