Commit Graph

23 Commits

Author SHA1 Message Date
Josef Bacik
fac1fae3ef btrfs-progs: rename extent buffer flags to EXTENT_BUFFER_*
We have been overloading the extent_state flags for use on the extent
buffers as well.  When we sync extent-io-tree.[ch] this will become
impossible, so rename these flags to EXTENT_BUFFER_* and use those
definitions instead of the extent_state definitions.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-28 18:57:44 +01:00
Josef Bacik
83cc5a5489 btrfs-progs: delete state_private code
We used to store random private things into extent_states, but we
haven't done this for a while and there are no users of this code,
simply delete it.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-28 18:57:44 +01:00
Josef Bacik
20d88c17e7 btrfs-progs: move extent cache code directly into btrfs_fs_info
We have some extra features in the btrfs-progs copy of the
extent_io_tree that don't exist in the kernel.  In order to make syncing
easier simply move this functionality into btrfs_fs_info, that way we
can sync in the new extent_io_tree code and not have to worry about
breaking anything.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-28 18:57:44 +01:00
Josef Bacik
ccee633f3a btrfs-progs: move dirty eb tracking to it's own io_tree
btrfs-progs has a cache tree embedded in the extent_io_tree in order to
track extent buffers.  We use the extent_io_tree part to track dirty,
and the cache tree to keep the extent buffers in.  When we sync
extent-io-tree.[ch] we'll lose this ability, so separate out the dirty
tracking into its own extent_io_tree.  Subsequent patches will adjust
the extent buffer lookup so it doesn't use the custom extent_io_tree
thing.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-28 18:57:43 +01:00
Josef Bacik
af30cf2e3e btrfs-progs: make the find extent buffer helpers take fs_info
This is a cleanup patch to make syncing the btrfs kernel code into
btrfs-progs easier.  In btrfs-progs we have an extra cache in the
extent_io_tree that's exclusively used for the extent buffer tracking.
In order to untangle this dependency start passing around the fs_info to
search for extent_buffers, and then have the helpers use the appropriate
structure to find the extent buffer.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-28 18:57:43 +01:00
Qu Wenruo
2aa4085bf7 btrfs-progs: properly handle degraded raid56 reads
[BUG]
For a degraded RAID5, btrfs check will fail to even read the chunk root:

  # mkfs.btrfs -f -m raid5 -d raid5 $dev1 $dev2 $dev3
  # wipefs -fa $dev1
  # btrfs check $dev2
  Opening filesystem to check...
  warning, device 1 is missing
  bad tree block 22036480, bytenr mismatch, want=22036480, have=0
  ERROR: cannot read chunk root
  ERROR: cannot open file system

[CAUSE]
Although read_tree_block() function from btrfs-progs is properly
iterating the mirrors (mirror 1 is reading from the disk directly,
mirror 2 will be rebuild from parity), the raid56 recovery path is not
handling the read error correctly.

The existing code will try to read the full stripe, but any read failure
(including missing device) will immediately cause an error:

	for (i = 0; i < num_stripes; i++) {
		ret = btrfs_pread(multi->stripes[i].dev->fd, pointers[i],
				  BTRFS_STRIPE_LEN, multi->stripes[i].physical,
				  fs_info->zoned);
		if (ret < BTRFS_STRIPE_LEN) {
			ret = -EIO;
			goto out;
		}
	}

[FIX]
To make failed_a/failed_b calculation much easier, and properly handle
too many missing devices, here this patch will introduce a new bitmap
based solution.

The new @failed_stripe_bitmap will represent all the failed stripes.

So the initial read will mark all the missing devices in the
@failed_stripe_bitmap, and later operations will all operate on that
bitmap.

Only before we call raid56_recov(), we convert the bitmap to the old
failed_a/failed_b interface and continue.

Now btrfs check can handle above case properly:

  # btrfs check $dev2
  Opening filesystem to check...
  warning, device 1 is missing
  Checking filesystem on /dev/test/scratch2
  UUID: 8b2e1cb4-f35b-4856-9b11-262d39d8458b
  [1/7] checking root items
  [2/7] checking extents
  [3/7] checking free space tree
  [4/7] checking fs roots
  [5/7] checking only csums items (without verifying data)
  [6/7] checking root refs
  [7/7] checking quota groups skipped (not enabled on this FS)
  found 147456 bytes used, no error found
  total csum bytes: 0
  total tree bytes: 147456
  total fs tree bytes: 32768
  total extent tree bytes: 16384
  btree space waste bytes: 139871
  file data blocks allocated: 0
   referenced 0

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-24 17:29:12 +01:00
David Sterba
c2be0e2ce0 btrfs-progs: use template for out of memory error messages
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:09 +02:00
Qu Wenruo
08bb354a1c btrfs-progs: properly handle write error when writing back tree blocks
[BUG]
If we emulate a write error during commit transaction, by setting the
block device read-only, then we can easily have the following crash
using "btrfs check --clear-space-cache v2":

  Opening filesystem to check...
  Checking filesystem on /dev/test/scratch1
  UUID: 5945915b-37f1-4bfa-9f64-684b318b8f73
  Clear free space cache v2
  Error writing to device 1
  kernel-shared/transaction.c:156: __commit_transaction: BUG_ON `ret` triggered, value 1
  ./btrfs(+0x570c9)[0x562ec894f0c9]
  ./btrfs(+0x57167)[0x562ec894f167]
  ./btrfs(__commit_transaction+0x13b)[0x562ec894f7f2]
  ./btrfs(btrfs_commit_transaction+0x214)[0x562ec894fa64]
  ./btrfs(btrfs_clear_free_space_tree+0x177)[0x562ec8941ae6]
  ./btrfs(+0xc8958)[0x562ec89c0958]
  ./btrfs(+0xc9d53)[0x562ec89c1d53]
  ./btrfs(+0x17ec7)[0x562ec890fec7]
  ./btrfs(main+0x12f)[0x562ec8910908]
  /usr/lib/libc.so.6(+0x232d0)[0x7ff917ee82d0]
  /usr/lib/libc.so.6(__libc_start_main+0x8a)[0x7ff917ee838a]
  ./btrfs(_start+0x25)[0x562ec890fdc5]
  Aborted (core dumped)

[CAUSE]
The call trace has shown it's a BUG_ON(), and it's from
__commit_transaction(), which is writing tree blocks back.

[FIX]
The fix is pretty simple, just return error.

In fact we even have an error value check in btrfs_commit_transaction()
just after __commit_transaction() call (although not catching the return
value from it).

And since we're here, also call btrfs_abort_transaction() to prevent
newer transactions from being started.

Now we won't have a full crash:

  Opening filesystem to check...
  Checking filesystem on /dev/test/scratch1
  UUID: 5945915b-37f1-4bfa-9f64-684b318b8f73
  Clear free space cache v2
  Error writing to device 1
  ERROR: failed to write bytenr 30425088 length 16384: Operation not permitted
  ERROR: failed to write tree block 30425088: Operation not permitted
  ERROR: failed to clear free space cache v2: -1
  extent buffer leak: start 30720000 len 16384

Reported-by: Christoph Anton Mitterer <calestyo@scientia.org>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:08 +02:00
Qu Wenruo
75800c2fee btrfs-progs: remove duplicated leaked extent buffer report
[BUG]
When transaction is aborted halfway, we can have extent buffer leaked,
and in that case, the same leaked extent buffer can be reported for
multiple times:

  ERROR: failed to clear free space cache v2: -1
  extent buffer leak: start 30441472 len 16384
  WARNING: dirty eb leak (aborted trans): start 30441472 len 16384
  extent buffer leak: start 30720000 len 16384
  extent buffer leak: start 30425088 len 16384
  extent buffer leak: start 30425088 len 16384 << Duplicated
  WARNING: dirty eb leak (aborted trans): start 30425088 len 16384

Note that 30425088 line is reported twice (not accounting the "dirty eb
leak" line).

[CAUSE]
When we detected a leaked eb, we call free_extent_buffer_nocache(), but
free_extent_buffer_nocache() can only remove the eb when its reduced
refs is 0.

If the eb has refs 2, it will need two free_extent_buffer_nocache()
calls to remove it from the cache.

[FIX]
Just reset the eb->refs to 1 so that free_extent_buffer_nocache() can
remove it from cache for sure.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:08 +02:00
Qu Wenruo
811ae819e3 btrfs-progs: remove unused function extent_io_tree_init_cache_max()
The function was introduced by commit a5ce5d2198 ("btrfs-progs:
extent-cache: actually cache extent buffers") but never got utilized.
Thus we can just remove it.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:08 +02:00
Qu Wenruo
2060120201 btrfs-progs: fix a BUG_ON() condition for write_data_to_disk()
The BUG_ON() condition in write_data_to_disk() is no longer correct.

Now write_raid56_with_parity() will return the bytes written of last
stripe.

Thus a success writeback can trigger the BUG_ON(ret).

Fix the condition to (ret < 0).

Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-08-16 15:18:12 +02:00
Qu Wenruo
fc6925bfd3 btrfs-progs: avoid repeated data write for metadata
[BUG]
Shinichiro reported that "mkfs.btrfs -m DUP" is doing repeated write
into the device.
For non-zoned device this is not a big deal, but for zoned device this
is critical, as zoned device doesn't support overwrite at all.

[CAUSE]
The problem is related to write_and_map_eb() call, since commit
2a93728391 ("btrfs-progs: use write_data_to_disk() to replace
write_extent_to_disk()"), we call write_data_to_disk() for metadata
write back.

But the problem is, write_data_to_disk() will call btrfs_map_block()
with rw = WRITE.

By that btrfs_map_block() will always return all stripes, while in
write_data_to_disk() we also iterate through each mirror of the range.

This results above repeated writeback.

[FIX]
Fix this problem by completely remove @mirror argument
from write_data_to_disk().
With extra comments to explicitly show that function will write to
all mirrors.

Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: 2a93728391 ("btrfs-progs: use write_data_to_disk() to replace write_extent_to_disk()")
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-08-16 15:18:12 +02:00
Qu Wenruo
4e9e978783 btrfs-progs: allow read_data_from_disk() to rebuild RAID56 using P/Q
This new ability is added by:

- Allow btrfs_map_block() to return the chunk type
  This makes later work much easier

- Only reset stripe offset inside btrfs_map_block() when needed
  Currently if @raid_map is not NULL, btrfs_map_block() will consider
  this call is for WRITE and will reset stripe offset.

  This is no longer the case, as for RAID56 read with mirror_num 1/0,
  we will still call btrfs_map_block() with non-NULL raid_map.

  Add a small check to make sure we won't reset stripe offset for
  mirror 1/0 read.

- Add new helper read_raid56() to handle rebuild
  We will read the full stripe (including all data and P/Q stripes)
  do the rebuild, then only copy the refered part to the caller.

  There is a catch for RAID6, we have no way to exhaust all combination,
  so the current repair will assume the mirror = 0 data is corrupted,
  then try to find a missing device.

  But if no missing device can be found, it will assume P is corrupted.
  This is just a guess, and can to totally wrong, but we have no better
  idea.

Now btrfs-progs have full read ability for RAID56.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-25 19:08:30 +02:00
Qu Wenruo
a99bece1cd btrfs-progs: remove extent_buffer::fd and extent_buffer::dev_bytes
Those two members are a shortcut for non-RAID56 profiles.

But we should not use such shortcut, and move all our logical address
read/write to the unified read_data_from_disk()/write_data_to_disk().

With previous refactors, now we're safe to remove them.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-25 19:08:30 +02:00
Qu Wenruo
3ff9d35257 btrfs-progs: use read_data_from_disk() to replace read_extent_from_disk() and replace read_extent_data()
The function read_extent_from_disk() is only a wrapper to read tree
block.

And read_extent_data() is just a while loop to eliminate short read
caused by stripe boundary.

In fact, a lot of call sites of read_extent_data() are either reading
metadata (thus no possible short read) or doing extra loop by
themselves.

This patch will replace those two functions with read_data_from_disk(),
making it the only entrance for data/metadata read.
And update read_data_from_disk() to return the read bytes, so caller can
do a simple while loop.

For the few callers of read_extent_data(), open-code a small while loop
for them.

This will allow later RAID56 read repair using P/Q much easier.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-25 19:08:30 +02:00
Qu Wenruo
2a93728391 btrfs-progs: use write_data_to_disk() to replace write_extent_to_disk()
Function write_extent_to_disk() is just writing the content of a tree
block to disk.

It can not handle RAID56, and its work is the same as
write_data_to_disk().

Thus we can replace write_extent_to_disk() with write_data_to_disk()
easily.

There is only one special call site in write_raid56_with_parity(), which
can easily be replace with btrfs_pwrite() directly.

This reduce the write entrance, and make later eb::fd removal easier.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-25 19:08:29 +02:00
Naohiro Aota
ae0dfb246d btrfs-progs: introduce btrfs_pread wrapper for pread
Wrap pread with btrfs_pread as well.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-20 18:59:23 +02:00
Naohiro Aota
c821e5545f btrfs-progs: introduce btrfs_pwrite wrapper for pwrite
Wrap pwrite with btrfs_pwrite(). It simply calls pwrite() on non-zoned
btrfs (opened without O_DIRECT). On zoned mode (opened with O_DIRECT),
it allocates an aligned bounce buffer, copies the contents and uses it
for direct-IO writing.

Writes in device_zero_blocks() and btrfs_wipe_existing_sb() are a little
tricky. We don't have fs_info on our hands, so use zinfo to determine it
is a zoned device or not.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-20 18:59:23 +02:00
David Sterba
c3ee6a8a09 btrfs-progs: unify GPL header comments
Add the GPL v2 header to files where it was missing and is not from an
external source, update to the most recent version with the address.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-07 13:58:44 +02:00
David Sterba
b1f374dd1d btrfs-progs: switch %Lu to %llu format
The %Lu format is not standard and we use %llu everywhere else, so
switch the remaining cases.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-06-19 22:07:49 +02:00
David Sterba
0144bcb713 btrfs-progs: move volumes.c to kernel-shared/
Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-31 17:01:06 +02:00
David Sterba
abb670f883 btrfs-progs: move ctree.c to kernel-shared/
Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-31 17:01:05 +02:00
David Sterba
4e49bd703d btrfs-progs: move extent_io.c to kernel-shared/
Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-31 17:01:04 +02:00