btrfs-progs: docs: update kernel changelogs 5.18 - 6.3

Signed-off-by: David Sterba <dsterba@suse.com>
2025-04-11 03:31:17 +00:00 · 2023-04-12 00:58:35 +02:00 · 2023-04-12 00:58:35 +02:00 · c62a19a067
commit c62a19a067
parent 0da68b0065
1 changed files with 349 additions and 12 deletions
--- a/Documentation/Kernel-by-version.rst
+++ b/Documentation/Kernel-by-version.rst
@ -6,21 +6,274 @@ Summary of kernel changes for each version.
 6.x
 ---

-6.0 (incomplete)
-^^^^^^^^^^^^^^^^
+6.0 (Oct 2022)
+^^^^^^^^^^^^^^

-* Send protocol version 2
+- sysfs updates:

-6.1 (incomplete)
-^^^^^^^^^^^^^^^^
+  - export chunk size, in debug mode add tunable for setting its size
+  - show zoned among features (was only in debug mode)
+  - show commit stats (number, last/max/total duration)
+  - mixed_backref and big_metadata sysfs feature files removed, they've
+    been default for sufficiently long time, there are no known users and
+    mixed_backref could be confused with mixed_groups

-* scrub: fix superblock errors immediately and don't leave it up to the next commit
-* send: experimental support for fs-verity (send v3)
-* sysfs: export discards stats and tunables
-* sysfs: export qgroup global information about status
-* sysfs: allow to skip quota recalculation for subvolumes ('drop_subtree_threshold')
-* nowait mode for async writes
-* check super block after filesystem thaw (detect accidental changes from outside)
+- send protocol updated to version 2
+
+  - new commands:
+
+    - ability write larger data chunks than 64K
+    - send raw compressed extents (uses the encoded data ioctls), ie. no
+      decompression on send side, no compression needed on receive side
+      if supported
+    - send 'otime' (inode creation time) among other timestamps
+    - send file attributes (a.k.a file flags and xflags)
+  - this is first version bump, backward compatibility on send and
+    receive side is provided
+  - there are still some known and wanted commands that will be
+    implemented in the near future, another version bump will be needed,
+    however we want to minimize that to avoid causing usability issues
+
+- print checksum type and implementation at mount time
+- don't print some messages at mount (mentioned as people asked about
+  it), we want to print messages namely for new features so let's make
+  some space for that:
+
+  - big metadata - this has been supported for a long time and is not a feature
+    that's worth mentioning
+  - skinny metadata - same reason, set by default by mkfs
+
+Performance improvements:
+
+- reduced amount of reserved metadata for delayed items
+
+  - when inserted items can be batched into one leaf
+  - when deleting batched directory index items
+  - when deleting delayed items used for deletion
+  - overall improved count of files/sec, decreased subvolume lock
+    contention
+
+- metadata item access bounds checker micro-optimized, with a few
+  percent of improved runtime for metadata-heavy operations
+- increase direct io limit for read to 256 sectors, improved throughput
+  by 3x on sample workload
+
+Notable fixes:
+
+- raid56
+
+  - reduce parity writes, skip sectors of stripe when there are no data updates
+  - restore reading from stripe cache instead of triggering new read
+
+- refuse to replay log with unknown incompat read-only feature bit set
+- tree-checker verifies if extent items don't overlap
+- check that subvolume is writable when changing xattrs from security
+  namespace
+- fix space cache corruption and potential double allocations; this is
+  a rare bug but can be serious once it happens, stable backports and
+  analysis tool will be provided
+
+- zoned:
+
+  - fix page locking when COW fails in the middle of allocation
+  - improved tracking of active zones, ZNS drives may limit the number
+    and there are ENOSPC errors due to that limit and not actual lack of
+    space
+  - adjust maximum extent size for zone append so it does not cause late
+    ENOSPC due to underreservation
+
+- mirror reading error messages show the mirror number
+- don't fallback to buffered IO for NOWAIT direct IO writes, we don't
+  have the NOWAIT semantics for buffered io yet
+- send, fix sending link commands for existing file paths when there are
+  deleted and created hardlinks for same files
+- repair all mirrors for profiles with more than 1 copy (raid1c34)
+- fix repair of compressed extents, unify where error detection and
+  repair happen
+
+6.1 (Dec 2022)
+^^^^^^^^^^^^^^
+
+Performance:
+
+- outstanding FIEMAP speed improvements:
+
+  - algorithmic change how extents are enumerated leads to orders of
+    magnitude speed boost (uncached and cached)
+  - extent sharing check speedup (2.2x uncached, 3x cached)
+  - add more cancellation points, allowing to interrupt seeking in files
+    with large number of extents
+  - more efficient hole and data seeking (4x uncached, 1.3x cached)
+  - sample results:
+    256M, 32K extents:   4s ->  29ms  (~150x)
+    512M, 64K extents:  30s ->  59ms  (~550x)
+    1G,  128K extents: 225s -> 120ms (~1800x)
+
+- improved inode logging, especially for directories (on dbench workload
+  throughput +25%, max latency -21%)
+- improved buffered IO, remove redundant extent state tracking, lowering
+  memory consumption and avoiding rb tree traversal
+- add sysfs tunable to let qgroup temporarily skip exact accounting when
+  deleting snapshot, leading to a speedup but requiring a rescan after
+  that, will be used by snapper
+- support io_uring and buffered writes, until now it was just for direct
+  IO, with the no-wait semantics implemented in the buffered write path
+  it now works and leads to speed improvement in IOPS (2x), throughput
+  (2.2x), latency (depends, 2x to 150x)
+- small performance improvements when dropping and searching for extent
+  maps as well as when flushing delalloc in COW mode (throughput +5MB/s)
+
+User visible changes:
+
+- new incompatible feature block-group-tree adding a dedicated tree for
+  tracking block groups, this allows a much faster load during mount and
+  avoids seeking unlike when it's scattered in the extent tree items
+
+  - this reduces mount time for many-terabyte sized filesystems
+  - conversion tool will be provided so existing filesystem can also be
+    updated in place
+  - to reduce test matrix and feature combinations requires no-holes
+    and free-space-tree (mkfs defaults since 5.15)
+
+- improved reporting of super block corruption detected by scrub
+- scrub also tries to repair super block and does not wait until next
+  commit
+- discard stats and tunables are exported in sysfs
+  (/sys/fs/btrfs/FSID/discard)
+- qgroup status is exported in sysfs (/sys/sys/fs/btrfs/FSID/qgroups/)
+- verify that super block was not modified when thawing filesystem
+
+Fixes:
+
+- FIEMAP fixes:
+
+  - fix extent sharing status, does not depend on the cached status where merged
+  - flush delalloc so compressed extents are reported correctly
+
+- fix alignment of VMA for memory mapped files on THP
+- send: fix failures when processing inodes with no links (orphan files
+  and directories)
+- handle more corner cases for read-only compat feature verification
+- fix crash on raid0 filesystems created with <5.4 mkfs.btrfs that could
+  lead to division by zero
+
+Core:
+
+- preliminary support for fs-verity in send
+- more effective memory use in scrub for subpage where sector is smaller
+  than page
+- block group caching progress logic has been removed, load is now
+  synchronous
+- add no-wait semantics to several functions (tree search, nocow,
+  flushing, buffered write
+
+6.2 (Feb 2023)
+^^^^^^^^^^^^^^
+
+User visible features:
+
+- raid56 reliability vs performance trade off:
+
+  - fix destructive RMW for raid5 data (raid6 still needs work) - do full RMW
+    cycle for writes and verify all checksums before overwrite, this should
+    prevent rewriting potentially corrupted data without notice
+  - stripes are cached in memory which should reduce the performance impact but
+    still can hurt some workloads
+  - checksums are verified after repair again
+  - this is the last option without introducing additional features (write
+    intent bitmap, journal, another tree), the RMW cycle was supposed to be
+    avoided by the original implementation exactly for performance reasons but
+    that caused all the reliability problems
+
+- discard=async by default for devices that support it
+- implement emergency flush reserve to avoid almost all unnecessary transaction
+  aborts due to ENOSPC in cases where there are too many delayed refs or
+  delayed allocation
+- skip block group synchronization if there's no change in used bytes, can
+  reduce transaction commit count for some workloads
+- print more specific errors to system log when device scan ioctl fails
+
+Performance improvements:
+
+- fiemap and lseek:
+
+  - overall speedup due to skipping unnecessary or duplicate searches (-40% run time)
+  - cache some data structures and sharedness of extents (-30% run time)
+
+- send:
+
+  - faster backref resolution when finding clones
+  - cached leaf to root mapping for faster backref walking
+  - improved clone/sharing detection
+  - overall run time improvements (-70%)
+
+Fixes:
+
+- fix compat ro feature check at read-write remount
+- handle case when read-repair happens with ongoing device replace
+- reset defrag ioctl buffer on memory allocation error
+- fix potential crash in quota when rescan races with disable
+- fix qgroup accounting warning when rescan can be started at time with
+  temporarily disabled accounting
+- don't cache a single-device filesystem device to avoid cases when a
+  loop device is reformatted and the entry gets stale
+- limit number of send clones by maximum memory allocated
+
+6.3 (? 2023)
+^^^^^^^^^^^^
+
+Features:
+
+- block group allocation class heuristics:
+
+  - pack files by size (up to 128k, up to 8M, more) to avoid
+    fragmentation in block groups, assuming that file size and life time
+    is correlated, in particular this may help during balance
+  - with tracepoints and extensible in the future
+
+- sysfs export of per-device fsid in DEV_INFO ioctl to distinguish seeding
+  devices, needed for testing
+- print sysfs stats for the allocation classes
+
+Performance:
+
+- send: cache directory utimes and only emit the command when necessary
+
+  - speedup up to 10x
+  - smaller final stream produced (no redundant utimes commands issued),
+  - compatibility not affected
+
+- fiemap:
+
+  - skip backref checks for shared leaves
+  - speedup 3x on sample filesystem with all leaves shared (e.g. on
+    snapshots)
+
+- micro optimized b-tree key lookup, speedup in metadata operations
+  (sample benchmark: fs_mark +10% of files/sec)
+
+Core changes:
+
+- change where checksumming is done in the io path
+
+  - checksum and read repair does verification at lower layer
+  - cascaded cleanups and simplifications
+
+Fixes:
+
+- sysfs: make sure that a run-time change of a feature is correctly
+  tracked by the feature files
+- scrub: better reporting of tree block errors
+- fix calculation of unusable block group space reporting bogus values
+  due to 32/64b division
+- fix unnecessary increment of read error stat on write error
+- scan block devices in non-exclusive mode to avoid temporary mkfs
+  failures
+- fix fast checksum detection, this affects filesystems with non-crc32c
+  checksum, calculation would not be offloaded to worker threads (since 5.4)
+- restore thread_pool mount option behaviour for endio workers, the
+  new value for maximum active threads would not be set to the actual
+  work queues (since 6.0)

 5.x
 ---
@ -419,6 +672,90 @@ Core:
 * error handling improvements
 * for other changes see the [https://git.kernel.org/linus/d601e58c5f2901783428bc1181e83ff783592b6b pull request]

+5.18 (May 2022)
+^^^^^^^^^^^^^^^
+
+- encoded read/write ioctls, allows user space to read or write raw data
+  directly to extents (now compressed, encrypted in the future), will be
+  used by send/receive v2 where it saves processing time
+- zoned mode now works with metadata DUP (the mkfs.btrfs default)
+- allow reflinks/deduplication from two different mounts of the same
+  filesystem
+- error message header updates:
+
+  - print error state: transaction abort, other error, log tree errors
+  - print transient filesystem state: remount, device replace, ignored
+    checksum verifications
+
+- tree-checker: verify the transaction id of the to-be-written dirty
+  extent buffer
+- fsync speedups
+
+  - directory logging speedups (up to -90% run time)
+  - avoid logging all directory changes during renames (up to -60% run
+    time)
+  - avoid inode logging during rename and link when possible (up to -60%
+    run time)
+  - prepare extents to be logged before locking a log tree path
+    (throughput +7%)
+  - stop copying old file extents when doing a full fsync ()
+  - improved logging of old extents after truncate
+
+- remove balance v1 ioctl, superseded by v2 in 2012
+
+Core, fixes:
+
+- continued extent tree v2 preparatory work
+
+  - disable features that won't work yet
+  - add wrappers and abstractions for new tree roots
+
+- prevent deleting subvolume with active swapfile
+- remove device count in superblock and its item in one transaction so
+  they cant't get out of sync
+- for subpage, force the free space v2 mount to avoid a warning and
+  make it easy to switch a filesystem on different page size systems
+- export sysfs status of exclusive operation 'balance paused', so the
+  user space tools can recognize it and allow adding a device with
+  paused balance
+
+5.19 (Jul 2022)
+^^^^^^^^^^^^^^^
+
+Features:
+
+- subpage:
+
+  - support on PAGE_SIZE > 4K (previously only 64K)
+  - make it work with raid56
+  - prevent remount with v1 space cache
+
+- repair super block num_devices automatically if it does not match
+  the number of device items
+- defrag can convert inline extents to regular extents, up to now inline
+  files were skipped but the setting of mount option max_inline could
+  affect the decision logic
+
+- zoned:
+
+  - minimal accepted zone size is explicitly set to 4MiB
+  - make zone reclaim less aggressive and don't reclaim if there are
+    enough free zones
+  - add per-profile sysfs tunable of the reclaim threshold
+
+- allow automatic block group reclaim for non-zoned filesystems, with
+  sysfs tunables
+- tree-checker: new check, compare extent buffer owner against owner
+  rootid
+
+Performance:
+
+- avoid blocking on space reservation when doing nowait direct io
+  writes, (+7% throughput for reads and writes)
+- NOCOW write throughput improvement due to refined locking (+3%)
+- send: reduce pressure to page cache by dropping extent pages right
+  after they're processed
+
 4.x
 ---