btrfs-progs: docs: update kernel changelogs 5.18 - 6.3

Signed-off-by: David Sterba <dsterba@suse.com>
This commit is contained in:
David Sterba 2023-04-12 00:58:35 +02:00
parent 0da68b0065
commit c62a19a067

View File

@ -6,21 +6,274 @@ Summary of kernel changes for each version.
6.x
---
6.0 (incomplete)
^^^^^^^^^^^^^^^^
6.0 (Oct 2022)
^^^^^^^^^^^^^^
* Send protocol version 2
- sysfs updates:
6.1 (incomplete)
^^^^^^^^^^^^^^^^
- export chunk size, in debug mode add tunable for setting its size
- show zoned among features (was only in debug mode)
- show commit stats (number, last/max/total duration)
- mixed_backref and big_metadata sysfs feature files removed, they've
been default for sufficiently long time, there are no known users and
mixed_backref could be confused with mixed_groups
* scrub: fix superblock errors immediately and don't leave it up to the next commit
* send: experimental support for fs-verity (send v3)
* sysfs: export discards stats and tunables
* sysfs: export qgroup global information about status
* sysfs: allow to skip quota recalculation for subvolumes ('drop_subtree_threshold')
* nowait mode for async writes
* check super block after filesystem thaw (detect accidental changes from outside)
- send protocol updated to version 2
- new commands:
- ability write larger data chunks than 64K
- send raw compressed extents (uses the encoded data ioctls), ie. no
decompression on send side, no compression needed on receive side
if supported
- send 'otime' (inode creation time) among other timestamps
- send file attributes (a.k.a file flags and xflags)
- this is first version bump, backward compatibility on send and
receive side is provided
- there are still some known and wanted commands that will be
implemented in the near future, another version bump will be needed,
however we want to minimize that to avoid causing usability issues
- print checksum type and implementation at mount time
- don't print some messages at mount (mentioned as people asked about
it), we want to print messages namely for new features so let's make
some space for that:
- big metadata - this has been supported for a long time and is not a feature
that's worth mentioning
- skinny metadata - same reason, set by default by mkfs
Performance improvements:
- reduced amount of reserved metadata for delayed items
- when inserted items can be batched into one leaf
- when deleting batched directory index items
- when deleting delayed items used for deletion
- overall improved count of files/sec, decreased subvolume lock
contention
- metadata item access bounds checker micro-optimized, with a few
percent of improved runtime for metadata-heavy operations
- increase direct io limit for read to 256 sectors, improved throughput
by 3x on sample workload
Notable fixes:
- raid56
- reduce parity writes, skip sectors of stripe when there are no data updates
- restore reading from stripe cache instead of triggering new read
- refuse to replay log with unknown incompat read-only feature bit set
- tree-checker verifies if extent items don't overlap
- check that subvolume is writable when changing xattrs from security
namespace
- fix space cache corruption and potential double allocations; this is
a rare bug but can be serious once it happens, stable backports and
analysis tool will be provided
- zoned:
- fix page locking when COW fails in the middle of allocation
- improved tracking of active zones, ZNS drives may limit the number
and there are ENOSPC errors due to that limit and not actual lack of
space
- adjust maximum extent size for zone append so it does not cause late
ENOSPC due to underreservation
- mirror reading error messages show the mirror number
- don't fallback to buffered IO for NOWAIT direct IO writes, we don't
have the NOWAIT semantics for buffered io yet
- send, fix sending link commands for existing file paths when there are
deleted and created hardlinks for same files
- repair all mirrors for profiles with more than 1 copy (raid1c34)
- fix repair of compressed extents, unify where error detection and
repair happen
6.1 (Dec 2022)
^^^^^^^^^^^^^^
Performance:
- outstanding FIEMAP speed improvements:
- algorithmic change how extents are enumerated leads to orders of
magnitude speed boost (uncached and cached)
- extent sharing check speedup (2.2x uncached, 3x cached)
- add more cancellation points, allowing to interrupt seeking in files
with large number of extents
- more efficient hole and data seeking (4x uncached, 1.3x cached)
- sample results:
256M, 32K extents: 4s -> 29ms (~150x)
512M, 64K extents: 30s -> 59ms (~550x)
1G, 128K extents: 225s -> 120ms (~1800x)
- improved inode logging, especially for directories (on dbench workload
throughput +25%, max latency -21%)
- improved buffered IO, remove redundant extent state tracking, lowering
memory consumption and avoiding rb tree traversal
- add sysfs tunable to let qgroup temporarily skip exact accounting when
deleting snapshot, leading to a speedup but requiring a rescan after
that, will be used by snapper
- support io_uring and buffered writes, until now it was just for direct
IO, with the no-wait semantics implemented in the buffered write path
it now works and leads to speed improvement in IOPS (2x), throughput
(2.2x), latency (depends, 2x to 150x)
- small performance improvements when dropping and searching for extent
maps as well as when flushing delalloc in COW mode (throughput +5MB/s)
User visible changes:
- new incompatible feature block-group-tree adding a dedicated tree for
tracking block groups, this allows a much faster load during mount and
avoids seeking unlike when it's scattered in the extent tree items
- this reduces mount time for many-terabyte sized filesystems
- conversion tool will be provided so existing filesystem can also be
updated in place
- to reduce test matrix and feature combinations requires no-holes
and free-space-tree (mkfs defaults since 5.15)
- improved reporting of super block corruption detected by scrub
- scrub also tries to repair super block and does not wait until next
commit
- discard stats and tunables are exported in sysfs
(/sys/fs/btrfs/FSID/discard)
- qgroup status is exported in sysfs (/sys/sys/fs/btrfs/FSID/qgroups/)
- verify that super block was not modified when thawing filesystem
Fixes:
- FIEMAP fixes:
- fix extent sharing status, does not depend on the cached status where merged
- flush delalloc so compressed extents are reported correctly
- fix alignment of VMA for memory mapped files on THP
- send: fix failures when processing inodes with no links (orphan files
and directories)
- handle more corner cases for read-only compat feature verification
- fix crash on raid0 filesystems created with <5.4 mkfs.btrfs that could
lead to division by zero
Core:
- preliminary support for fs-verity in send
- more effective memory use in scrub for subpage where sector is smaller
than page
- block group caching progress logic has been removed, load is now
synchronous
- add no-wait semantics to several functions (tree search, nocow,
flushing, buffered write
6.2 (Feb 2023)
^^^^^^^^^^^^^^
User visible features:
- raid56 reliability vs performance trade off:
- fix destructive RMW for raid5 data (raid6 still needs work) - do full RMW
cycle for writes and verify all checksums before overwrite, this should
prevent rewriting potentially corrupted data without notice
- stripes are cached in memory which should reduce the performance impact but
still can hurt some workloads
- checksums are verified after repair again
- this is the last option without introducing additional features (write
intent bitmap, journal, another tree), the RMW cycle was supposed to be
avoided by the original implementation exactly for performance reasons but
that caused all the reliability problems
- discard=async by default for devices that support it
- implement emergency flush reserve to avoid almost all unnecessary transaction
aborts due to ENOSPC in cases where there are too many delayed refs or
delayed allocation
- skip block group synchronization if there's no change in used bytes, can
reduce transaction commit count for some workloads
- print more specific errors to system log when device scan ioctl fails
Performance improvements:
- fiemap and lseek:
- overall speedup due to skipping unnecessary or duplicate searches (-40% run time)
- cache some data structures and sharedness of extents (-30% run time)
- send:
- faster backref resolution when finding clones
- cached leaf to root mapping for faster backref walking
- improved clone/sharing detection
- overall run time improvements (-70%)
Fixes:
- fix compat ro feature check at read-write remount
- handle case when read-repair happens with ongoing device replace
- reset defrag ioctl buffer on memory allocation error
- fix potential crash in quota when rescan races with disable
- fix qgroup accounting warning when rescan can be started at time with
temporarily disabled accounting
- don't cache a single-device filesystem device to avoid cases when a
loop device is reformatted and the entry gets stale
- limit number of send clones by maximum memory allocated
6.3 (? 2023)
^^^^^^^^^^^^
Features:
- block group allocation class heuristics:
- pack files by size (up to 128k, up to 8M, more) to avoid
fragmentation in block groups, assuming that file size and life time
is correlated, in particular this may help during balance
- with tracepoints and extensible in the future
- sysfs export of per-device fsid in DEV_INFO ioctl to distinguish seeding
devices, needed for testing
- print sysfs stats for the allocation classes
Performance:
- send: cache directory utimes and only emit the command when necessary
- speedup up to 10x
- smaller final stream produced (no redundant utimes commands issued),
- compatibility not affected
- fiemap:
- skip backref checks for shared leaves
- speedup 3x on sample filesystem with all leaves shared (e.g. on
snapshots)
- micro optimized b-tree key lookup, speedup in metadata operations
(sample benchmark: fs_mark +10% of files/sec)
Core changes:
- change where checksumming is done in the io path
- checksum and read repair does verification at lower layer
- cascaded cleanups and simplifications
Fixes:
- sysfs: make sure that a run-time change of a feature is correctly
tracked by the feature files
- scrub: better reporting of tree block errors
- fix calculation of unusable block group space reporting bogus values
due to 32/64b division
- fix unnecessary increment of read error stat on write error
- scan block devices in non-exclusive mode to avoid temporary mkfs
failures
- fix fast checksum detection, this affects filesystems with non-crc32c
checksum, calculation would not be offloaded to worker threads (since 5.4)
- restore thread_pool mount option behaviour for endio workers, the
new value for maximum active threads would not be set to the actual
work queues (since 6.0)
5.x
---
@ -419,6 +672,90 @@ Core:
* error handling improvements
* for other changes see the [https://git.kernel.org/linus/d601e58c5f2901783428bc1181e83ff783592b6b pull request]
5.18 (May 2022)
^^^^^^^^^^^^^^^
- encoded read/write ioctls, allows user space to read or write raw data
directly to extents (now compressed, encrypted in the future), will be
used by send/receive v2 where it saves processing time
- zoned mode now works with metadata DUP (the mkfs.btrfs default)
- allow reflinks/deduplication from two different mounts of the same
filesystem
- error message header updates:
- print error state: transaction abort, other error, log tree errors
- print transient filesystem state: remount, device replace, ignored
checksum verifications
- tree-checker: verify the transaction id of the to-be-written dirty
extent buffer
- fsync speedups
- directory logging speedups (up to -90% run time)
- avoid logging all directory changes during renames (up to -60% run
time)
- avoid inode logging during rename and link when possible (up to -60%
run time)
- prepare extents to be logged before locking a log tree path
(throughput +7%)
- stop copying old file extents when doing a full fsync ()
- improved logging of old extents after truncate
- remove balance v1 ioctl, superseded by v2 in 2012
Core, fixes:
- continued extent tree v2 preparatory work
- disable features that won't work yet
- add wrappers and abstractions for new tree roots
- prevent deleting subvolume with active swapfile
- remove device count in superblock and its item in one transaction so
they cant't get out of sync
- for subpage, force the free space v2 mount to avoid a warning and
make it easy to switch a filesystem on different page size systems
- export sysfs status of exclusive operation 'balance paused', so the
user space tools can recognize it and allow adding a device with
paused balance
5.19 (Jul 2022)
^^^^^^^^^^^^^^^
Features:
- subpage:
- support on PAGE_SIZE > 4K (previously only 64K)
- make it work with raid56
- prevent remount with v1 space cache
- repair super block num_devices automatically if it does not match
the number of device items
- defrag can convert inline extents to regular extents, up to now inline
files were skipped but the setting of mount option max_inline could
affect the decision logic
- zoned:
- minimal accepted zone size is explicitly set to 4MiB
- make zone reclaim less aggressive and don't reclaim if there are
enough free zones
- add per-profile sysfs tunable of the reclaim threshold
- allow automatic block group reclaim for non-zoned filesystems, with
sysfs tunables
- tree-checker: new check, compare extent buffer owner against owner
rootid
Performance:
- avoid blocking on space reservation when doing nowait direct io
writes, (+7% throughput for reads and writes)
- NOCOW write throughput improvement due to refined locking (+3%)
- send: reduce pressure to page cache by dropping extent pages right
after they're processed
4.x
---