btrfs-progs

mirror of https://github.com/kdave/btrfs-progs synced 2025-02-20 03:36:50 +00:00

Author	SHA1	Message	Date
Qu Wenruo	48248693cd	btrfs-progs: check/lowmem: Reset path in repair mode to avoid incorrect item from being passed to lowmem check. In lowmem mode, we check fs roots and free space cache by iterating each root item and inode item, using btrfs_next_item() and a path pointing to the root tree. However in repair mode, check_fs_root() can modify the fs root, thus CoWs the tree root, and the old path in check_fs It could lead to strange behavior, e.g. after repairing a fs tree, the path can point to a fs tree. Since no ROOT_ITEM exists in fs tree, all remaining trees are skipped in repair mode. This bug exists from the early time of lowmem mode repair, and is only exposed by recent free space inode check code. (Fs tree inodes are passed to free space inode check, causing false alerts and repair failure). Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-06-05 18:02:05 +02:00
Qu Wenruo	085445e793	btrfs-progs: Cleanup BTRFS_COMPAT_EXTENT_TREE_V0 BTRFS_COMPAT_EXTENT_TREE_V0 was introduced for a short time in the kernel, over 10 years ago. Nowadays there should be no user of that feature, and the kernel removed this support in June 2018. There is no need for btrfs-progs to support it. This patch removes EXTENT_TREE_V0 related code and replaces those BUG_ON() calls with a more graceful error message. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-06-05 18:00:07 +02:00
Su Yue	c6f903fa04	btrfs-progs: fix invalid memory write in get_fs_info() As the link reported, btrfs fi sh may crash while a device is removing. valgrind reported: ====================================================================== ... ==883== Invalid write of size 8 ==883== at 0x13C99A: get_device_info (in /usr/bin/btrfs) ==883== by 0x13D715: get_fs_info (in /usr/bin/btrfs) ==883== by 0x153B5F: ??? (in /usr/bin/btrfs) ==883== by 0x11B0C1: main (in /usr/bin/btrfs) ==883== Address 0x4d8c7a0 is 0 bytes after a block of size 12,288 alloc'd ==883== at 0x483877F: malloc (vg_replace_malloc.c:299) ==883== by 0x13D861: get_fs_info (in /usr/bin/btrfs) ==883== by 0x153B5F: ??? (in /usr/bin/btrfs) ==883== by 0x11B0C1: main (in /usr/bin/btrfs) ==883== ==883== Invalid write of size 8 ==883== at 0x13C99D: get_device_info (in /usr/bin/btrfs) ==883== by 0x13D715: get_fs_info (in /usr/bin/btrfs) ==883== by 0x153B5F: ??? (in /usr/bin/btrfs) ==883== by 0x11B0C1: main (in /usr/bin/btrfs) ==883== Address 0x4d8c7a8 is 8 bytes after a block of size 12,288 alloc'd ==883== at 0x483877F: malloc (vg_replace_malloc.c:299) ==883== by 0x13D861: get_fs_info (in /usr/bin/btrfs) ==883== by 0x153B5F: ??? (in /usr/bin/btrfs) ==883== by 0x11B0C1: main (in /usr/bin/btrfs) ==883== ==883== Syscall param ioctl(generic) points to unaddressable byte(s) ==883== at 0x4CA9CBB: ioctl (in /usr/lib/libc-2.29.so) ==883== by 0x13C9AB: get_device_info (in /usr/bin/btrfs) ==883== by 0x13D715: get_fs_info (in /usr/bin/btrfs) ==883== by 0x153B5F: ??? (in /usr/bin/btrfs) ==883== by 0x11B0C1: main (in /usr/bin/btrfs) ==883== Address 0x4d8c7a0 is 0 bytes after a block of size 12,288 alloc'd ==883== at 0x483877F: malloc (vg_replace_malloc.c:299) ==883== by 0x13D861: get_fs_info (in /usr/bin/btrfs) ==883== by 0x153B5F: ??? 
(in /usr/bin/btrfs) ==883== by 0x11B0C1: main (in /usr/bin/btrfs) ==883== --883-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting --883-- si_code=1; Faulting address: 0x284D8C7B8; sp: 0x1002eb5e50 valgrind: the 'impossible' happened: Killed by fatal signal host stacktrace: ==883== at 0x5805261C: get_bszB_as_is (m_mallocfree.c:303) ==883== by 0x5805261C: get_bszB (m_mallocfree.c:315) ==883== by 0x5805261C: vgPlain_arena_malloc (m_mallocfree.c:1799) ==883== by 0x58005AD2: vgMemCheck_new_block (mc_malloc_wrappers.c:372) ==883== by 0x58005AD2: vgMemCheck_malloc (mc_malloc_wrappers.c:407) ==883== by 0x580A7373: do_client_request (scheduler.c:1925) ==883== by 0x580A7373: vgPlain_scheduler (scheduler.c:1488) ==883== by 0x580F57A0: thread_wrapper (syswrap-linux.c:103) ==883== by 0x580F57A0: run_a_thread_NORETURN (syswrap-linux.c:156) sched status: running_tid=1 Thread 1: status = VgTs_Runnable (lwpid 883) ==883== at 0x483877F: malloc (vg_replace_malloc.c:299) ==883== by 0x1534AA: ??? (in /usr/bin/btrfs) ==883== by 0x153C49: ??? (in /usr/bin/btrfs) ==883== by 0x11B0C1: main (in /usr/bin/btrfs) client stack range: [0x1FFEFFA000 0x1FFF000FFF] client SP: 0x1FFEFFDCE0 valgrind stack range: [0x1002DB6000 0x1002EB5FFF] top usage: 7520 of 1048576 ====================================================================== The above log says that invalid write to allocated @di_args happened in get_device_info() called in get_fs_info(). The size of @di_args is allocated according by fi_args->num_devices. And fi_args->num_devices is the number of dev_items in chunk_tree. However, in the loop to get devices info, btrfs-progs calls ioctl BTRFS_IOC_DEV_INFO which just finds device in fs_info->fs_devices->devices. Let's look at kernel side. In btrfs_rm_device(), btrfs_rm_dev_item() causes removal of related dev_items in chunk_tree. Do something. Then delete the device from device->fs_devices. So the case is: Userspace kernel get_fs_info() btrfs_rm_device() ... 
btrfs_rm_dev_item() determine fi_args->num_devices and fi_args->max_id by searching chunk_tree. malloc() ... Loop(Crashed): call get_device_info() by devid from 1 to fi_args->max_id. mutex_lock(&fs_devices->device_list_mutex); list_del_rcu(&device->dev_list); ... In the loop of get_device_info(), get_device_info() can still get info of the device being removed since it's still in fs_info->fs_devices->devices. Then the increment of the iterator value @ndev causes an invalid out-of-bounds access. Solve it by adding a check of @ndev while looping. Reported-by: Peter Hjalmarsson <kanelxake@gmail.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=1711787 Signed-off-by: Su Yue <Damenly_Su@gmx.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-06-05 17:53:05 +02:00
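The bounds check described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the actual btrfs-progs code: `struct dev_info`, `dev_lookup_fn`, `lookup_all` and `collect_devices` are hypothetical names, and the lookup callback stands in for the BTRFS_IOC_DEV_INFO ioctl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-device record filled by the ioctl. */
struct dev_info {
	uint64_t devid;
};

/* Hypothetical lookup: returns 0 if the devid currently exists, -1 otherwise. */
typedef int (*dev_lookup_fn)(uint64_t devid, struct dev_info *out);

/* Demo lookup for this sketch only: pretends every probed devid exists. */
int lookup_all(uint64_t devid, struct dev_info *out)
{
	out->devid = devid;
	return 0;
}

/*
 * Probe devids 1..max_id and store the hits in out[], which has room for
 * num_devices entries. Returns the number of entries stored, or -1 if more
 * devices show up than were counted when out[] was sized (the race the
 * commit message describes).
 */
int collect_devices(dev_lookup_fn lookup, uint64_t max_id,
		    size_t num_devices, struct dev_info *out)
{
	size_t ndev = 0;

	assert(out != NULL);
	for (uint64_t devid = 1; devid <= max_id; devid++) {
		struct dev_info tmp;

		if (lookup(devid, &tmp) != 0)
			continue;
		/* The check on ndev: never write past the allocation. */
		if (ndev >= num_devices)
			return -1;
		out[ndev++] = tmp;
	}
	return (int)ndev;
}
```

With a buffer sized for two devices and three devices visible, `collect_devices()` reports an error instead of writing a third entry past the end of the array.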
Qu Wenruo	d490933d14	btrfs-progs: Enable crc32c optimization probe for convert and mkfs Although modern hardware is fast enough and crc32c calculation is not a hotspot, doing such optimization won't hurt anyway. Issue: #175 Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-05-27 16:39:51 +02:00
Qu Wenruo	c98155f07a	btrfs-progs: Output extent tree leaf if we failed to find a backref There is a bug report of a BUG_ON() caused by __free_extent() failing to look up a backref extent: Failed to find [1429288337408, 168, 16384] btrfs unable to find ref byte nr 1429288583168 parent 0 root 2 owner 0 offset 0 convert/source-ext2.c:834: ext2_copy_inodes: BUG_ON ret triggered, value -5 ./btrfs-convert[0x410941] ./btrfs-convert(main+0x1fdc)[0x40d3b8] /lib64/libc.so.6(__libc_start_main+0xf3)[0x7f93bb7d2f33] ./btrfs-convert(_start+0x2e)[0x40a96e] It's still unclear how this bug can be triggered, but adding such debug output will provide more info for us to debug. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-05-27 16:11:43 +02:00
David Sterba	485da9d52d	btrfs-progs: tests: unmount testing mount point recursively The test misc-tests/035-receive-common-mount-point-prefix does another mount inside TEST_MNT but current 'make test-clean' will not properly undo the nested mount and this will break subsequent tests. The recursive unmount can handle that. Signed-off-by: David Sterba <dsterba@suse.com>	2019-05-27 16:08:23 +02:00
Qu Wenruo	50e3858869	btrfs-progs: convert: Workaround delayed ref bug by limiting the size of a transaction In convert we use trans->block_reserved >= 4096 as a threshold to commit transaction, where block_reserved is the number of new tree blocks allocated inside a transaction. The problem is that we still have a hidden bug in the delayed ref implementation in btrfs-progs: when we have a large enough transaction, delayed ref may fail to find certain tree blocks in the extent tree and cause a transaction abort. This fix works around it by committing the transaction at a much lower threshold. The old 4096 means 4096 new tree blocks; when using the default (16K) nodesize, that's 64M, which can contain over 12k inlined data extents or csums for around 60G, or over 800K file extents. The new threshold limits the size of new tree blocks to 2M, aligning with the chunk preallocator threshold, and reducing the possibility of hitting that delayed ref bug. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-05-27 16:04:02 +02:00
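The threshold arithmetic in the commit above can be checked with a small sketch. The helper names `reserved_bytes` and `should_commit` are hypothetical, and the byte-based comparison is an assumption about how such a limit might be expressed, not the actual code in convert.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of new tree blocks held by a transaction: blocks * nodesize. */
uint64_t reserved_bytes(uint64_t blocks, uint64_t nodesize)
{
	assert(nodesize > 0);
	return blocks * nodesize;
}

/*
 * Hypothetical commit check: commit once the new tree blocks of the
 * transaction reach limit_bytes. Returns 1 to commit, 0 to continue.
 */
int should_commit(uint64_t blocks, uint64_t nodesize, uint64_t limit_bytes)
{
	return reserved_bytes(blocks, nodesize) >= limit_bytes;
}
```

With the default 16K nodesize, 4096 blocks come to 64M, while a 2M limit is reached after 128 blocks.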
Sergei Trofimovich	8cd7e198ad	btrfs-progs: build: apply LDFLAGS to libbtrfsutil.so libbtrfs.so already has user's LDFLAGS applied. The change also applies those to libbtrfsutil.so. A separate variable is used for that though it currently only copies LDFLAGS. This is to make it obvious that libbtrfsutils is a standalone library. Reported-by: Michał Górny Bug: https://bugs.gentoo.org/686284 Pull-request: #172 Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Signed-off-by: David Sterba <dsterba@suse.com>	2019-05-27 15:57:35 +02:00
Joshua Watt	66cb960705	btrfs-progs: build: Pass CFLAGS and LDFLAGS to Python Adds Make variables EXTRA_PYTHON_CFLAGS and EXTRA_PYTHON_LDFLAGS which can be used to pass CFLAGS and LDFLAGS respectively when building the Python library. This is required to support reproducible builds, as there are often compiler and linker flags that must be passed in order to generate reproducible output (e.g. -fdebug-prefix-map) Pull-request: #176 Signed-off-by: Joshua Watt <JPEWhacker@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-05-27 15:49:03 +02:00
David Sterba	43013422db	Btrfs progs v5.1 Signed-off-by: David Sterba <dsterba@suse.com>	2019-05-17 19:58:58 +02:00
David Sterba	79313d3152	btrfs-progs: update CHANGES for v5.1 Signed-off-by: David Sterba <dsterba@suse.com>	2019-05-17 19:56:36 +02:00
David Sterba	a8779ec9c8	btrfs-progs: CI: enable fuzz tests With recent fixes the fuzz tests pass, enable them for the continuous integration. Signed-off-by: David Sterba <dsterba@suse.com>	2019-05-17 13:18:01 +02:00
David Sterba	936eaf9a36	btrfs-progs: tests: disable misc-tests/035-receive-common-mount-point-prefix The fix was reverted, skip the test so the testsuite can proceed. Signed-off-by: David Sterba <dsterba@suse.com>	2019-05-17 13:02:22 +02:00
Qu Wenruo	f4be6432c9	btrfs-progs: tests: detecting compressed extent without csum Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-05-17 12:32:38 +02:00
Qu Wenruo	d73427b76d	btrfs-progs: check/original: Add checks for compressed extent without csum There is one report of compressed extent happens in btrfs, but has no csum and then leads to possible decompress error screwing up kernel memory. Although it's a kernel bug, and won't cause problem until compressed data get corrupted, let's catch such problem in advance. This patch will catch any unexpected compressed extent with: 1) 0 or less than expected csum 2) nodatasum flag set in the inode item This is for original mode. Reported-by: James Harvey <jamespharvey20@gmail.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-05-17 12:32:38 +02:00
Qu Wenruo	cc7a35f642	btrfs-progs: check/lowmem: Add checks for compressed extent without csum There is one report of compressed extent happens in btrfs, but has no csum and then leads to possible decompress error screwing up kernel memory. Although it's a kernel bug, and won't cause problem until compressed data get corrupted, let's catch such problem in advance. This patch will catch any unexpected compressed extent with: 1) missing csum 2) nodatasum flag set in the inode item This is for lowmem mode. Reported-by: James Harvey <jamespharvey20@gmail.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-05-17 12:32:38 +02:00
David Sterba	50b9312f3a	btrfs-progs: tests: update test number of bad-free-space-cache-inode-mode There was a duplication, increase the number for patch as they were merged. Signed-off-by: David Sterba <dsterba@suse.com>	2019-05-17 12:32:38 +02:00
David Sterba	639d949f9f	btrfs-progs: tests: stream dump and max_error counts The --dump option of receive must also respect the --max-errors parameter. Signed-off-by: David Sterba <dsterba@suse.com>	2019-05-17 12:32:38 +02:00
Alexander Kovtunenko	53e8014369	btrfs-progs: receive: set up max_errors count The command $ printf 'btrfs-stream\0\0\0\0\0' \| btrfs receive --dump can loop as the stream is not valid, but the maximum error limit is not set properly for --dump. The command line parameter -E applies here too, so it's still possible to dump a partially damaged stream. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200085 Author: Alexander Kovtunenko <akovtunenko@slice.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-05-17 12:32:38 +02:00
Anand Jain	f2d5990d31	btrfs-progs: scan: pass blkid_get_cache error code blkid_get_cache() returns error code which is -errno. So we can use them directly. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-05-17 12:32:38 +02:00
Filipe Manana	f54d891ac8	Btrfs-progs: receive, add debug information to write and clone commands Currently, when operating in a more verbose mode (-vv), the receive command does not mention any write or clone commands, unlike other commands. This change adds debug messages for the write and clone operations, that do not include data but only offsets and lengths, as this is actually very useful to debug a send stream and I use it frequently. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-05-17 12:32:38 +02:00
Jeff Mahoney	50faf73f22	btrfs-progs: check: fixup_extent_flags needs to deal with non-skinny metadata When repairing a file system created by a very old kernel, I ran into issues fixing up the extent flags since fixup_extent_flags assumed that a METADATA_ITEM would be present if the record was for metadata. Since METADATA_ITEMs don't exist without skinny metadata, we need to fall back to EXTENT_ITEMs. This also falls back to EXTENT_ITEMs even with skinny metadata enabled as other parts of the tools do. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-05-17 12:32:38 +02:00
Qu Wenruo	5672a69639	btrfs-progs: Handle error properly in btrfs_commit_transaction() [BUG] When running fuzz-tests/003 and fuzz-tests/009, btrfs-progs will crash due to BUG_ON(). [CAUSE] We abused BUG_ON() in btrfs_commit_transaction(), which is one of the most error prone function for fuzzed images. Currently to cleanup the aborted transaction, we only need to clean up the only per-transaction data: delayed refs. This patch will introduce a new function, btrfs_destroy_delayed_refs() to cleanup delayed refs when we failed to commit transaction. With that function, we will gently destroy per-trans delayed ref, and remove the BUG_ON()s in btrfs_commit_transaction(). Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-05-13 15:54:47 +02:00
Qu Wenruo	45e58a1acf	btrfs-progs: Refactor btrfs_finish_extent_commit() This patch will refactor btrfs_finish_extent_commit(): - Make it return void There is no failure pattern for btrfs_finish_extent_commit(), thus it always return 0. And the caller doesn't care about the return value. So no need to return int. - Remove @root and @unpin parameters @root is only used to extract fs_info, which can be extracted from transaction handler already. @unpin is always fs_info->pinned_extents. All these parameters can be extracted from @trans, no need to pass them. The function signature now matches the kernel counterpart. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-05-13 15:52:46 +02:00
Qu Wenruo	27a5b9ddc3	btrfs-progs: Remove the dead branch in btrfs_run_delayed_refs() cleanup_ref_head() will only return 0 or 1, no way to return a negative value. So remove the dead branch. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-05-13 15:52:24 +02:00
David Sterba	3fe16a10da	btrfs-progs: tests: fix misc/026 to run on NFS The temporary files are not accessible if the testsuite is hosted on NFS, pre-create them and allow writes. Signed-off-by: David Sterba <dsterba@suse.com>	2019-05-02 19:14:13 +02:00
Omar Sandoval	cba6bae15d	libbtrfsutil: don't close fd on error in btrfs_util_subvolume_id_fd() The caller owns the fd passed to btrfs_util_subvolume_id_fd(), so we shouldn't close it on error. Fix it, add a regression test, and bump the library patch version. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>	2019-04-26 18:23:27 +02:00
David Sterba	5ca28fc25f	Merge branch 'pull/qu/v2' into devel from Qu Wenruo (https://github.com/adam900710/btrfs-progs/tree/for_devel ) The branch passes all selftests, except: - misc/035 Known bug as the fix is reverted. - fuzz/003 - fuzz/009 Not a regression, as stable tags also trigger them. BUG_ON() in commit_transaction gets triggered due to ENOSPC. These two bugs will be addressed soon, but not in this pull. This pull request includes the following features: Core change: - check --repair * Flush/FUA support to avoid breaking metadata CoW Now a btrfs-progs crash or an aborted transaction won't cause a new transid error. Fixes and Enhancement: - generic * Try best copy when reading tree blocks. * Skip unnecessary retry when one tree block copy fails. * Avoid bad tree blocks populating the tree block cache. * Don't BUG_ON() when failed to flush/write super blocks - check * File extents repair no longer relies on data in extent tree. * New ability to check and repair free space cache invalid inode mode. * Update backup roots when committing a transaction. - Misc * fs_info <-> root parameters cleanup for btrfs_check_leaf/node()	2019-04-26 18:10:32 +02:00
Qu Wenruo	4ab95eb8b0	Revert "btrfs-progs: Do metadata preallocation as long as we're not modifying extent tree" Commit `7a12d8470e` ("btrfs-progs: Do metadata preallocation as long as we're not modifying extent tree") tries to fix #123, however due to the fact that chunk tree also has root->ref_cows set, we will call do_chunk_alloc() until call stack explodes. So revert that offending patch until we have a much better comment on root->ref_cows and find a better solution to this problem. Signed-off-by: Qu Wenruo <wqu@suse.com>	2019-04-16 09:04:43 +08:00
Qu Wenruo	6ab19825b0	btrfs-progs: Don't BUG_ON() when write_dev_supers() fails [BUG] Since commit "btrfs-progs: disk-io: Flush to ensure super block write is FUA" mkfs-tests/017 will fail like: ====== RUN MUSTFAIL /home/adam/btrfs-progs/mkfs.btrfs -K -f /dev/mapper/btrfs-progs-thin-vol ERROR: failed to write super block for devid 1: flush error: Input/output error disk-io.c:1810: write_all_supers: BUG_ON `ret` triggered, value -5 /home/adam/btrfs-progs/mkfs.btrfs(+0x1e5c1)[0x557a2c83e5c1] /home/adam/btrfs-progs/mkfs.btrfs(+0x1e65f)[0x557a2c83e65f] /home/adam/btrfs-progs/mkfs.btrfs(write_all_supers+0x1ce)[0x557a2c843a8a] /home/adam/btrfs-progs/mkfs.btrfs(write_ctree_super+0x12d)[0x557a2c843be2] /home/adam/btrfs-progs/mkfs.btrfs(btrfs_commit_transaction+0x250)[0x557a2c887c56] /home/adam/btrfs-progs/mkfs.btrfs(+0xc0b1)[0x557a2c82c0b1] /home/adam/btrfs-progs/mkfs.btrfs(main+0x1049)[0x557a2c82e929] /usr/lib/libc.so.6(__libc_start_main+0xf3)[0x7f6689e99223] /home/adam/btrfs-progs/mkfs.btrfs(_start+0x2e)[0x557a2c82b86e] failed (expected): /home/adam/btrfs-progs/mkfs.btrfs -K -f /dev/mapper/btrfs-progs-thin-vol [CAUSE] Just one BUG_ON() in write_all_supers(). [FIX] Just remove the BUG_ON(). Callers of write_all_supers() are already checking the return value. Also since write_all_supers() can return error, make write_ctree_super() callers, btrfs_commit_transaction() and close_ctree_fs_info() to handle the error correctly. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com>	2019-04-16 09:04:25 +08:00
Qu Wenruo	e227e81d99	btrfs-progs: disk-io: Flush to ensure super block write is FUA [BUG] There are tons of reports of btrfs-progs screwing up the fs, the most recent one is "btrfs check --clear-space-cache v1" triggered BUG_ON() and then leaving the fs with transid mismatch problem. [CAUSE] In kernel, we have block layer handing the flush work, even on devices without FUA support (like most SATA device using default libata settings), kernel handles FUA write by flushing the device, then normal write, and finish it with another flush. The pre-flush, write, post-flush works pretty well to implement FUA write. However in btrfs-progs we just use pwrite(), there is nothing keeping the write order. So even for basic v1 free space cache clearing, we have different vision on the write sequence from kernel bio layer (by dm-log-writes) and user space pwrite() calls. In btrfs-progs, with extra debug output in write_tree_block() and write_dev_supers(), we can see btrfs-progs follows the right write sequence: Opening filesystem to check... 
Checking filesystem on /dev/mapper/log UUID: 3feb3c8b-4eb3-42f3-8e9c-0af22dd58ecf write tree block start=1708130304 gen=39 write tree block start=1708146688 gen=39 write tree block start=1708163072 gen=39 write super devid=1 gen=39 write tree block start=1708179456 gen=40 write tree block start=1708195840 gen=40 write super devid=1 gen=40 write tree block start=1708130304 gen=41 write tree block start=1708146688 gen=41 write tree block start=1708228608 gen=41 write super devid=1 gen=41 write tree block start=1708163072 gen=42 write tree block start=1708179456 gen=42 write super devid=1 gen=42 write tree block start=1708130304 gen=43 write tree block start=1708146688 gen=43 write super devid=1 gen=43 Free space cache cleared But from dm-log-writes, the bio sequence is a different story: replaying 1742: sector 131072, size 4096, flags 0(NONE) replaying 1743: sector 128, size 4096, flags 0(NONE) <<< Only one sb write replaying 1744: sector 2828480, size 4096, flags 0(NONE) replaying 1745: sector 2828488, size 4096, flags 0(NONE) replaying 1746: sector 2828496, size 4096, flags 0(NONE) replaying 1787: sector 2304120, size 4096, flags 0(NONE) ...... replaying 1790: sector 2304144, size 4096, flags 0(NONE) replaying 1791: sector 2304152, size 4096, flags 0(NONE) replaying 1792: sector 0, size 0, flags 8(MARK) During the free space cache clearing, we committed 3 transaction but dm-log-write only caught one super block write. This means all the 3 writes were merged into the last super block write. And the super block write was the 2nd write, before all tree block writes, completely screwing up the metadata CoW protection. No wonder crashed btrfs-progs can make things worse. [FIX] Fix this super serious problem by implementing pre and post flush for the primary super block in btrfs-progs. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com>	2019-04-16 09:04:25 +08:00
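The pre-flush, write, post-flush sequence described in the commit above might look roughly like this in userspace. It is a minimal sketch under stated assumptions: `write_super_fua` is a hypothetical name, and using fsync() for both flushes is an assumption about how the ordering could be enforced, not a copy of the btrfs-progs implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Emulate a FUA write: flush everything written so far, write the super
 * block, then flush again so the super block itself is durable.
 * Returns 0 on success or a negative errno value.
 */
int write_super_fua(int fd, const void *buf, size_t len, off_t offset)
{
	ssize_t written;

	assert(buf != NULL);

	/* Pre-flush: earlier tree block writes must reach the device first. */
	if (fsync(fd) < 0)
		return -errno;

	written = pwrite(fd, buf, len, offset);
	if (written < 0)
		return -errno;
	if ((size_t)written != len)
		return -EIO;	/* treat a short write as an I/O error */

	/* Post-flush: make the super block write itself durable. */
	if (fsync(fd) < 0)
		return -errno;
	return 0;
}
```

The point of the two flushes is ordering: without the first one, the super block could reach the device before the tree blocks it points to.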
Qu Wenruo	2644f80611	btrfs-progs: disk-io: Make super block write error easier to read When we failed to write super blocks, we just output something like: WARNING: failed to write sb: I/O error Or WARNING: failed to write all sb data There is no info about which device failed and there are two different error message for the same write error. This patch will change it to something more detailed: ERROR: failed to write super block for devid 1: write error: I/O error This provides the basis for later super block flush error handling. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com>	2019-04-16 09:04:25 +08:00
Qu Wenruo	2437a88079	btrfs-progs: tests/fsck: Add test image for free space cache mode repair The image has one free space cache inode with invalid mode (0). item 9 key (256 INODE_ITEM 0) itemoff 13702 itemsize 160 generation 30 transid 30 size 65536 nbytes 1507328 block group 0 mode 0 links 1 uid 0 gid 0 rdev 0 sequence 23 flags 0x1b(NODATASUM\|NODATACOW\|NOCOMPRESS\|PREALLOC) atime 0.0 (1970-01-01 08:00:00) ctime 1553491158.189771625 (2019-03-25 13:19:18) mtime 0.0 (1970-01-01 08:00:00) otime 0.0 (1970-01-01 08:00:00) Both lowmem and original mode should be able to detect and fix it. The extracted test image is pretty big (1G extracted), as kernel won't cache small chunks. Even with SSD, such test may still take some seconds just extracting the image. Signed-off-by: Qu Wenruo <wqu@suse.com>	2019-04-16 09:04:25 +08:00
Qu Wenruo	427990ad74	btrfs-progs: check/original: Check and repair free space cache inode item Just like lowmem mode, also check and repair free space cache inode item. And since we don't really have a good timing/function to check free space cache inodes, we use the same common mode check_repair_free_space_inode() when iterating root tree. Signed-off-by: Qu Wenruo <wqu@suse.com>	2019-04-16 09:04:25 +08:00
Qu Wenruo	77fe19ba16	btrfs-progs: check/lowmem: Check and repair free space cache inode mode Unlike inodes in fs roots, we don't really check the inode items in root tree, in fact we just skip everything other than ROOT_ITEM and ROOT_REF. This lets invalid inode items sneak into the root tree. For example: item 9 key (256 INODE_ITEM 0) itemoff 13702 itemsize 160 generation 30 transid 30 size 65536 nbytes 1507328 block group 0 mode 0 links 1 uid 0 gid 0 rdev 0 ^ Should be 100600 sequence 23 flags 0x1b(NODATASUM\|NODATACOW\|NOCOMPRESS\|PREALLOC) atime 0.0 (1970-01-01 08:00:00) ctime 1553491158.189771625 (2019-03-25 13:19:18) mtime 0.0 (1970-01-01 08:00:00) otime 0.0 (1970-01-01 08:00:00) There is a report of such a problem on the mailing list. This patch will check and repair inode items of free space cache inodes in lowmem mode. Since free space cache inodes don't have INODE_REF but still have 1 link, we can't use check_inode_item() directly. Instead we only check the inode mode, as that's the important part. The check and repair function: check_repair_free_space_inode() is also exported for original mode. Signed-off-by: Qu Wenruo <wqu@suse.com>	2019-04-16 09:04:25 +08:00
Qu Wenruo	11fd6cff82	btrfs-progs: check/original: Repair invalid inode mode in root tree This patch will reuse the mode independent repair_imode() function, to repair invalid inode mode. Signed-off-by: Qu Wenruo <wqu@suse.com>	2019-04-16 09:04:25 +08:00
Qu Wenruo	64ca43a8ad	btrfs-progs: check/lowmem: Repair invalid inode mode in root tree In root tree, we only have 2 types of inodes: - ROOT_TREE_DIR inode Its mode is fixed to 40755 - free space cache inodes Its mode is fixed to 100600 This patch will add the ability to repair such inodes to lowmem mode. For fs/subvolume tree error, at least we haven't seen such corruption yet, so we don't need to rush to fix corruption in fs trees yet. The repair function, reset_imode() and repair_imode_common() can be reused by later original mode patch, so it's placed in check/mode-common.c. Signed-off-by: Qu Wenruo <wqu@suse.com>	2019-04-16 09:04:25 +08:00
Qu Wenruo	23f1e9a13f	btrfs-progs: check/original: Add inode mode check Just like lowmem mode, check inode mode, specially for S_IFMT bits and beyond. Please note that, this check only applies to inodes in fs/subvol trees. It doesn't apply to free space cache inodes. Reported-by: Thorsten Hirsch <t.hirsch@web.de> Signed-off-by: Qu Wenruo <wqu@suse.com>	2019-04-16 09:04:25 +08:00
Qu Wenruo	c06c5eef88	btrfs-progs: check/lowmem: Add inode mode check There is one report about invalid free space cache inode mode. Normally free space cache inode should have mode 100600 (regular file, no uid/gid/sticky bit, rw------ bit). But in that report, we have free space cache inode mode as 0. So at least btrfs check should report invalid inode mode. This patch will at least make btrfs check lowmem mode to detect this problem. Please note that, this check only applies to inodes in fs/subvol trees. It doesn't apply to free space cache inodes. Reported-by: Thorsten Hirsch <t.hirsch@web.de> Signed-off-by: Qu Wenruo <wqu@suse.com>	2019-04-16 09:04:25 +08:00
Qu Wenruo	66d610010b	btrfs-progs: disk-io: Try to find a best copy when reading tree blocks [BUG] If the first copy of a tree block has a bad key order, but the second copy is completely good, then "btrfs ins dump-tree -b <bytenr>" fails to print anything past the bad key: leaf 29786112 items 47 free space 983 generation 20 owner EXTENT_TREE leaf 29786112 flags 0x1(WRITTEN) backref revision 1 fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6 chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c [snip] item 9 key (20975616 METADATA_ITEM 0) itemoff 3543 itemsize 33 refs 1 gen 16 flags TREE_BLOCK tree block skinny level 0 tree block backref root CHUNK_TREE item 10 key (29360128 BLOCK_GROUP_ITEM 33554432) itemoff 3519 itemsize 24 block group used 94208 chunk_objectid 256 flags METADATA\|DUP ERROR: leaf 29786112 slot 11 pointer invalid, offset 1245184 size 0 leaf data limit 3995 ERROR: skip remaining slots While kernel can locate the good copy and acts just like nothing happened. [CAUSE] btrfs-progs uses read_tree_block() to try each copy. But it only uses less strict check_tree_block(), which has less sanity check than btrfs_check_node/leaf(). Some error like bad key order is ignored to allow btrfs check to fix it. This leads to above problem. [FIX] Introduce a new member, @candidate_mirror in read_tree_block(), which records the copy passes check_tree_block() but fails btrfs_check_leaf/node() as last chance. Only if no better copy found, then use @candidate_mirror. So btrfs-progs can act just like kernel to use best copy. Link: https://bugzilla.kernel.org/show_bug.cgi?id=202691 Reported-by: Yoon Jungyeon <jungyeon@gatech.edu> [Inspired by that image, not to fix any bug of that bugzilla] Signed-off-by: Qu Wenruo <wqu@suse.com>	2019-04-16 09:04:25 +08:00
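The candidate-mirror idea from the commit above can be sketched as a selection over per-copy check results. This is only an illustration: `enum copy_state` and `pick_mirror` are hypothetical, with COPY_LOOSE_OK standing for a copy that passes check_tree_block() but fails btrfs_check_leaf/node(), and COPY_STRICT_OK for a copy that passes both.

```c
#include <assert.h>

/* Hypothetical result of checking one copy of a tree block. */
enum copy_state {
	COPY_BAD,	/* fails even the loose check */
	COPY_LOOSE_OK,	/* passes the loose check, fails the strict one */
	COPY_STRICT_OK	/* passes both checks */
};

/*
 * Pick the mirror to use, numbering mirrors from 1. A fully good copy wins
 * immediately; otherwise the first loosely-good copy is kept as a last
 * chance. Returns 0 if no copy is usable at all.
 */
int pick_mirror(const enum copy_state *copies, int num_copies)
{
	int candidate_mirror = 0;

	assert(num_copies >= 0);
	for (int mirror = 1; mirror <= num_copies; mirror++) {
		enum copy_state st = copies[mirror - 1];

		if (st == COPY_STRICT_OK)
			return mirror;
		if (st == COPY_LOOSE_OK && candidate_mirror == 0)
			candidate_mirror = mirror;
	}
	return candidate_mirror;
}
```

Keeping the loosely-good copy as a fallback preserves the ability of btrfs check to read, and later repair, a block whose only copies have a bad key order.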
Qu Wenruo	f76136a8d0	btrfs-progs: Move btrfs_num_copies() call out of the loop in read_tree_block() btrfs_num_copies really only needs to be called once, so move it out of the verification loop in read_tree_block(). Signed-off-by: Qu Wenruo <wqu@suse.com>	2019-04-16 09:04:25 +08:00
Qu Wenruo	e8ae577030	btrfs-progs: Use mirror_num start from 1 to avoid unnecessary retry [BUG] If the first copy of a tree block is corrupted but the other copy is good, btrfs-progs will report the error twice: checksum verify failed on 30556160 found 42A2DA71 wanted 00000000 checksum verify failed on 30556160 found 42A2DA71 wanted 00000000 While the kernel only reports it once, just as expected: BTRFS warning (device dm-3): dm-3 checksum verify failed on 30556160 wanted 0 found 42A2DA71 level 0 [CAUSE] We use mirror_num = 0 in read_tree_block() of btrfs-progs. At first glance it's pretty OK, but mirror num 0 in btrfs means ANY good copy. Real mirror num starts from 1. In the context of read_tree_block(), since it's read_tree_block() that does all the checks, mirror num 0 just means the first copy. So if the first copy is corrupted, btrfs-progs will try mirror num 1 next, which is just the same as mirror num 0. After reporting the same error on the same copy, btrfs-progs will finally try mirror num 2, and get the good copy. [FIX] The fix is way simpler than all the above analysis, just start from mirror num 1. Signed-off-by: Qu Wenruo <wqu@suse.com>	2019-04-16 09:04:25 +08:00
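The duplicate read described in the commit above can be reproduced with a toy model. The helpers `resolve_copy` and `reads_of_first_copy` are hypothetical; the mapping of mirror 0 to the first copy follows the commit message's description of read_tree_block() and is otherwise an assumption of this sketch.

```c
#include <assert.h>

/* In this sketch, mirror 0 ("any copy") resolves to the first copy. */
int resolve_copy(int mirror_num)
{
	return mirror_num == 0 ? 1 : mirror_num;
}

/*
 * Try mirrors start..num_copies in order until good_copy is reached and
 * count how many times the first physical copy gets read along the way.
 */
int reads_of_first_copy(int start, int num_copies, int good_copy)
{
	int hits = 0;

	assert(start >= 0);
	for (int mirror = start; mirror <= num_copies; mirror++) {
		int copy = resolve_copy(mirror);

		if (copy == 1)
			hits++;
		if (copy == good_copy)
			break;
	}
	return hits;
}
```

Starting from 0 with two copies and the second one good, the first copy is read twice (once as mirror 0, once as mirror 1); starting from 1 it is read once.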
Qu Wenruo	db51d8d8f6	btrfs-progs: Use @fs_info to replace @root for btrfs_check_leaf/node() Signed-off-by: Qu Wenruo <wqu@suse.com>	2019-04-16 09:04:25 +08:00
Qu Wenruo	83aeb251f7	btrfs-progs: Free bad extent buffer as soon as possible [BUG] With the new support for multiple -b parameters, we could hit this bug on a 16K node sized btrfs: $ ./btrfs inspect dump-tree -b 1024 -b 2048 -b 4096 -b 8192 zimg btrfs-progs v4.20.2 ERROR: tree block bytenr 1024 is not aligned to sectorsize 4096 ERROR: tree block bytenr 2048 is not aligned to sectorsize 4096 Couldn't map the block 4096 Invalid mapping for 4096-20480, got 13631488-22020096 Couldn't map the block 4096 bad tree block 4096, bytenr mismatch, want=4096, have=0 ERROR: failed to read tree block 4096 extent_io.c:665: free_extent_buffer_internal: BUG_ON `eb->refs < 0` triggered, value 1 ./btrfs[0x426e57] ./btrfs(free_extent_buffer+0xe)[0x427701] ./btrfs(alloc_extent_buffer+0x3f)[0x427872] ./btrfs(btrfs_find_create_tree_block+0xf)[0x415b3c] ./btrfs(read_tree_block+0x5c)[0x4171b5] ./btrfs(cmd_inspect_dump_tree+0x587)[0x46fb75] ./btrfs(handle_command_group+0x44)[0x40df89] ./btrfs(cmd_inspect+0x15)[0x44b569] ./btrfs(main+0x8b)[0x40e032] /lib64/libc.so.6(__libc_start_main+0xeb)[0x7f2001a54b7b] ./btrfs(_start+0x2a)[0x40dd1a] Aborted (core dumped) This is not limited to the multiple -b parameter support of ins dump-tree; it also applies to possibly overlapping bad tree blocks. [CAUSE] Btrfs delays extent freeing to improve performance. However, for the "-b 4096 -b 8192" case, the first -b 4096 will leave an extent buffer start=4096 len=16384 refs=0 in the cached extent tree. Then the incoming -b 8192 will hit the cache and reuse the cached extent buffer. Since the cached extent buffer doesn't match the bytenr, its refs won't get increased, and we're going to free that eb again. Since the bad cached eb already has a ref count of 0, calling free_extent_buffer() on it again will trigger the assert. [FIX] So for bad extent buffers we failed to read, just delete them immediately. This will free them from the extent buffer cache, so later extent buffer allocations will not hit the stale one, preventing the bug from happening. Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Qu Wenruo <wqu@suse.com>	2019-04-16 09:04:25 +08:00
Qu Wenruo	fc4d433437	btrfs-progs: Update backup roots when writing super blocks The code is mostly ported from the kernel with minimal changes. Since btrfs-progs doesn't support replaying the log, some of the code is unnecessary for btrfs-progs, but to keep the code the same, that unnecessary code is kept as is. Now "btrfs check --repair" will update backup roots correctly. Signed-off-by: Qu Wenruo <wqu@suse.com>	2019-04-16 09:04:25 +08:00
Su Yue	84d433d861	btrfs-progs: fsck-test: enable lowmem repair for case 001 Lowmem can repair after commit 'btrfs-progs: lowmem: move nbytes check before isize check', so add the beacon file. Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com> Signed-off-by: Qu Wenruo <wqu@suse.com>	2019-04-16 09:04:25 +08:00
Lu Fengqi	8a3aefc78d	btrfs-progs: tests: add case for inode lose one file extent The missing extent leads to a gap between adjacent extents. fsck should detect the gap correctly and repair it by punching a hole. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: Qu Wenruo <wqu@suse.com>	2019-04-16 09:04:25 +08:00
Su Yanjun	cedbfc2561	btrfs-progs: check: Delete file extent item with unaligned disk bytenr For test case fsck-tests/001-bad-file-extent-bytenr, we have an obviously hand crafted image with unaligned file extent: item 7 key (257 EXTENT_DATA 0) itemoff 3453 itemsize 53 generation 6 type 1 (regular) extent data disk byte 755944791 nr 1048576 extent data offset 0 nr 1048576 ram 1048576 extent compression 0 (none) disk bytenr 755944791 is obviously unaligned (not even). For such obviously corrupted file extent, we should just delete the file extent. Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com> [Update commit message and comment] Signed-off-by: Qu Wenruo <wqu@suse.com>	2019-04-16 09:04:25 +08:00
Su Yanjun	3b35deeadd	btrfs-progs: check: fix wrong @offset used in find_possible_backrefs() Function find_possible_backrefs() is used to locate the file extents referring to a data extent. For a data extent backref, its btrfs_extent_data_ref structure has the following members: - root Which root refers to this data extent - objectid Which inode refers to this data extent - offset Search hint. Its value is @file_offset - @extent_offset. @file_offset, by contrast, is directly recorded in the (INO EXTENT_DATA FILE_OFFSET) key. So when searching for the file extents that refer to this data extent, we can't use btrfs_extent_data_ref::offset as the search key::offset. We must search from file offset 0, and iterate all file extents until we hit a file extent that matches the data backref. Thankfully such time-consuming behavior is not triggered frequently; it only gets called for repair, so it shouldn't affect the normal check routine. Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com> [Update commit message] Signed-off-by: Qu Wenruo <wqu@suse.com>	2019-04-16 09:04:25 +08:00
Su Yanjun	b6a0d97cba	Revert "btrfs-progs: Record orphan data extent ref to corresponding root." Commit `0ddf63c09f` ("btrfs-progs: Record orphan data extent ref to corresponding root.") introduces the ability to record a file extent even when all other related info is lost (data backref, inode item). However, this patch only records such info without doing any proper repair; furthermore, it could even record invalid file extents, and the report part only happens after all checks are done. Since we will later introduce proper file extent repair functionality, we can revert that patch. Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com> [Update commit message, solve merge conflicts] Signed-off-by: Qu Wenruo <wqu@suse.com>	2019-04-16 09:03:51 +08:00
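
The search strategy described in commit 3b35deeadd (the find_possible_backrefs() fix) can be illustrated with a minimal sketch. This is not the btrfs-progs implementation: the structs `sketch_file_extent` and `sketch_data_ref` and the function `sketch_find_possible_backrefs()` are hypothetical, simplified stand-ins, and the matching rule (same disk bytenr, and file offset minus extent offset equal to the backref's hint) is an assumption drawn from the commit message. It only shows why the backref's offset cannot be used as the key offset and why every file extent of the inode is scanned from file offset 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one (INO EXTENT_DATA FILE_OFFSET) item. */
struct sketch_file_extent {
	uint64_t file_offset;   /* key offset: start of the extent in the file */
	uint64_t disk_bytenr;   /* start of the referenced data extent on disk */
	uint64_t extent_offset; /* offset into that data extent */
};

/* Hypothetical stand-in for a data backref (root and objectid omitted). */
struct sketch_data_ref {
	uint64_t disk_bytenr; /* data extent this backref belongs to */
	uint64_t offset;      /* search hint: file_offset - extent_offset */
};

/*
 * Scan all file extents of one inode, starting from file offset 0, and
 * collect the indexes of those matching the backref. Returns the number
 * of matches found; at most max_out indexes are stored in out[].
 */
static size_t sketch_find_possible_backrefs(
		const struct sketch_file_extent *extents, size_t nr,
		const struct sketch_data_ref *ref,
		size_t *out, size_t max_out)
{
	size_t found = 0;

	assert(extents != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		const struct sketch_file_extent *fe = &extents[i];

		if (fe->disk_bytenr != ref->disk_bytenr)
			continue;
		/* The hint is not the key offset; recompute it per item. */
		if (fe->file_offset - fe->extent_offset != ref->offset)
			continue;
		if (found < max_out)
			out[found] = i;
		found++;
	}
	return found;
}
```

For example, a file extent at file offset 8192 with extent offset 4096 matches a backref whose offset is 4096, although no item with key offset 4096 exists; a lookup keyed on the hint would miss it.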

4785 Commits