538 lines
19 KiB
ReStructuredText
538 lines
19 KiB
ReStructuredText
Development notes
|
||
=================
|
||
|
||
Collection of various notes about development practices, how-to's or
|
||
checklists.
|
||
|
||
Adding a new ioctl, extending an existing one
|
||
---------------------------------------------
|
||
|
||
- add code to `strace <https://github.com/strace/strace>`__ so the ioctl calls
|
||
are parsed into a human readable form. Most of the ioctls are already
|
||
`implemented <https://github.com/strace/strace/blob/master/src/btrfs.c>`__ and
|
||
can be used a reference.
|
||
|
||
Tracepoints
|
||
-----------
|
||
|
||
The tracepoint message format should be compact and consistent, so please stick
|
||
to the following format:
|
||
|
||
- *key=value* no spaces around *=*
|
||
- separated by spaces, not commas
|
||
- named values: print value and string, like "%llu(%s)", no space between,
|
||
string in parens
|
||
- avoid abbreviating key values too much, (e.g. use 'size' not 'sz')
|
||
- hexadecimal values are always preceded by *0x* (use "0x%llx")
|
||
- use *struct btrfs_inode* for inode types, not plain *struct inode*
|
||
- inode number type is *u64*, use *btrfs_ino* if needed
|
||
- key can be printed as *[%llu,%u,%llu]*
|
||
- enum types need to be exported as *TRACE_DEFINE_ENUM*
|
||
|
||
**Example:**
|
||
|
||
event: *btrfs__chunk*
|
||
|
||
string: ``"root=%llu(%s) offset=%llu size=%llu num_stripes=%d sub_stripes=%d type=%s"``
|
||
|
||
|
||
Error messages, verbosity
|
||
-------------------------
|
||
|
||
- use *btrfs_\** helpers (btrfs_err, btrfs_info, ...), they print a filesystem
|
||
identification like ``BTRFS info (device sdb): ...``
|
||
- first letter in the string is lowercase
|
||
- message contents
|
||
|
||
- be descriptive
|
||
- keep the text length reasonable (fits one line without wrapping)
|
||
- no typos
|
||
- print values that refer to what happened (inode number, subvolume
|
||
id, path, offset, ...)
|
||
- print error value from the last call
|
||
- no *"\\n"* at the end of the string
|
||
- no *".*'' at the end of text
|
||
- un-indent the string so it fits under 80 columns
|
||
- don't split long strings, overflow 80 is ok in this case (we want
|
||
to be able to search for the error messages in the sources easily)
|
||
|
||
- value representation
|
||
|
||
- decimal: offsets, length, ordinary numbers
|
||
- hexadecimal: checksums
|
||
- hexadecimal + string: bitmasks (e.g. raid profiles, flags)
|
||
- intervals of integers:
|
||
|
||
- closed interval (end values inclusive): [0, 4096]
|
||
- half-open (right value excluded): [0, 4096)
|
||
- half-open (left value excluded): (0, 4096] -- that one may look
|
||
strange but is used in several places
|
||
|
||
Message level
|
||
-------------
|
||
|
||
- btrfs_err -- such messages have high visibility so use them for serious
|
||
problems that need user attention
|
||
- btrfs_warn -- conditions that are not too serious but can point to potential
|
||
problems, the system should be still in a good state
|
||
- btrfs_info -- use for informative messages that are useful to see what's
|
||
happening in the filesystem or might help debugging problems in the future
|
||
and are worth keeping in the logs
|
||
|
||
Error handling and transaction abort
|
||
------------------------------------
|
||
|
||
Please keep all transaction abort exactly at the place where they happen and do
|
||
not merge them to one. This pattern should be used everywhere and is important
|
||
when debugging because we can pinpoint the line in the code from the syslog
|
||
message and do not have to guess which way it got to the merged call.
|
||
|
||
Error handling and return values
|
||
--------------------------------
|
||
|
||
Functions are supposed to handle all errors of the callees and clean up the
|
||
local context before returning. If a function does not need to pass errors to
|
||
the caller it can be switched to return *void*. Before doing so make sure that:
|
||
|
||
- the function does not call any BUG/BUG_ON
|
||
- all callees properly handle errors and do not call BUG/BUG_ON in place of
|
||
error handling
|
||
- the whole call chain starting from the function satisfies the above
|
||
|
||
Handling unexpected conditions
|
||
------------------------------
|
||
|
||
This is different than error handling. An unexpected condition happens when the
|
||
code invariants/assumptions do not hold and there's no way to recover from the
|
||
situation. This means that returning an error to the caller can't be done and
|
||
continuing would only propagate the logic error further. The reasons for that
|
||
bug can be two fold: internal (a genuine bug) or external (e.g. memory bitflip,
|
||
memory corrupted by other subsystem). In this case it is allowed to use the
|
||
nuclear option and do BUG_ON, that is otherwise highly discouraged.
|
||
|
||
There are several ways how to react to the unexpected conditions:
|
||
|
||
- btrfs_abort_transaction()
|
||
|
||
The recommended way if and only if we can not recover from the situation and
|
||
have a transaction handle.
|
||
|
||
This would cause the filesystem to be flipped read-only to prevent further
|
||
corruption.
|
||
|
||
Additionally call trace would be dumpped for the first btrfs_abort_transaction()
|
||
call site.
|
||
|
||
- ASSERT()
|
||
|
||
Conditionally compiled in and crashes when the condition is false.
|
||
|
||
This should only be utilized for debugging purposes, acts as a fail-fast
|
||
option for developers, thus should not be utilized for error handling.
|
||
|
||
It's recommended only for very basic (thus sometimes unnecessary) requirements.
|
||
Such usage should be easy to locate, have no complex call chain.
|
||
E.g. to rule out invalid function parameter combinations.
|
||
|
||
Should not be utilized on any data/metadata reads from disks, as they can be
|
||
invalid. For sanity check examples of on-disk metadata, please refer to
|
||
`Tree checker`.
|
||
|
||
- WARN_ON
|
||
|
||
Unconditional and noisy checks, but still allow the code to continue.
|
||
|
||
Should only be utilized if a call trace is critical for debugging.
|
||
|
||
Not recommended if:
|
||
|
||
* The call site is unique or can be easily located
|
||
|
||
In that case, an error message is recommended.
|
||
|
||
* The call site would eventually lead to a btrfs_abort_transaction() call
|
||
|
||
btrfs_abort_transaction() call would dump the stack anyway.
|
||
If the call trace is critical, it's recommended to move the
|
||
btrfs_abort_transaction() call closer to the place where the error happens.
|
||
|
||
- BUG_ON
|
||
|
||
Should not be utilized, and is incrementally removed or replaced in the code.
|
||
|
||
Error injection using eBPF
|
||
--------------------------
|
||
|
||
Functions can be annotated to enable error injection using the eBPF scripts.
|
||
See e.g. ``disk-io.c:open_ctree``. For btrfs-specific injection, the annotation
|
||
is ALLOW_ERROR_INJECTION, but beware that this only overrides the return value
|
||
and this can leak memory or other resources. For error injection to generic
|
||
helpers (e.g. memory allocation), you can use something like
|
||
``bcc/tools/inject.py kmalloc btrfs_alloc_device() -P 0.001``
|
||
|
||
Resources:
|
||
|
||
- `eBPF <https://ebpf.io/>`_
|
||
- `BCC tools <https://github.com/iovisor/bcc>`_
|
||
|
||
Warnings and issues found by static checkers and similar tools
|
||
--------------------------------------------------------------
|
||
|
||
There are tools to automatically identify known issues in code and report them
|
||
as problems to be fixed, but not all such reports are valid or relevant in the
|
||
context of the code base. The fix should really fix the code, not just the
|
||
tool's warning. Such patches will be rejected with explanation first time and
|
||
ignored when sent repeatedly. Patches fixing real problems with a good
|
||
explanation are welcome. If you're not sure about sending such patch, please
|
||
ask the https://kernelnewbies.org/KernelJanitors for help.
|
||
|
||
Do not blindly report issues caught by:
|
||
|
||
- checkpatch.pl -- the script is good for catching some coding style but this
|
||
whole wiki page exists to be explicit what we want, not necessarily what
|
||
checkpatch wants
|
||
- clang static analyzer -- lots of the reports are not real problems and may
|
||
depend on a condition that's not recognized by the checker
|
||
- gcc -Wunused -- any of the -Wunused-\* options can report a valid issue but
|
||
it must be viewed in wider context and not just removed to get rid of the
|
||
warning
|
||
- codespell -- fixing typos is fine but should be done in batches and over
|
||
whole code base
|
||
|
||
Hints:
|
||
|
||
- if you find an issue, look in the whole code base if there are more instances
|
||
same or following a similar pattern
|
||
- look into git history of the changed code, why it got there and when, it may
|
||
help to understand if it's a bug or e.g. a stale code
|
||
|
||
Coding style preferences
|
||
------------------------
|
||
|
||
Before applying recommendations from this sections, please make sure you're
|
||
familiar with the `kernel coding style guide
|
||
<https://www.kernel.org/doc/html/latest/process/coding-style.html%7Cgeneric>`__.
|
||
|
||
The purpose of coding style is to maintain unified and consistent look & feel
|
||
of the patches and code, keeping distractions to minimum which decreases
|
||
cognitive load and allows focus on the important things. Coding style is not
|
||
only where to put white space or curly brackets but also coding patterns with
|
||
meaning that is established and understood in the developer group. The code in
|
||
linux kernel is maintained for a long period of time and maintainability is of
|
||
crucial importance. This means it does take time to write good code, with
|
||
attention to detail. Once written the code could stay unchanged for years but
|
||
will be read many times. `Read more
|
||
<https://simpleprogrammer.com/maintaining-code/>`__.
|
||
|
||
Patches
|
||
^^^^^^^
|
||
|
||
- for patch subject use "btrfs:" followed by a lowercase
|
||
- read the patch again and fix all typos and grammar
|
||
- size units should use short form K/M/G/T or IEC form KiB/MiB/GiB
|
||
- don't write references to parameters to subject (like removing @pointer)
|
||
- do not end subject with a dot '.'
|
||
- parts of btrfs that could have a subject prefix to point to a specific subsystem
|
||
|
||
- scrub, tests, integrity-checker, tree-checker, discard, locking, sysfs,
|
||
raid56, qgroup, compression, send, ioctl
|
||
|
||
- additional information
|
||
|
||
- if there's a stack trace relevant for the patch, add it ther (lockdep,
|
||
crash, warning)
|
||
- steps to reproduce a bug (that will also get turned to a proper fstests
|
||
case)
|
||
- sample output before/after if it could have impact on userspace
|
||
- `pahole <https://linux.die.net/man/1/pahole>`_ output if structure is being reorganized and optimized
|
||
|
||
Function declarations
|
||
^^^^^^^^^^^^^^^^^^^^^
|
||
|
||
- avoid prototypes for static functions, order them in new code in a way that
|
||
does not need it
|
||
|
||
- but don't move static functions just to get rid of the prototype
|
||
|
||
- exported functions have btrfs\_ prefix
|
||
- do not use functions with double underscore, there's only a few valid uses of
|
||
that, namely when *\__function* is doing the core of the work with looser
|
||
checks, no locks or more parameters than *function*
|
||
- function type and name are on the same line, if this is too long, the
|
||
parameters continue on the next line (indented)
|
||
- 'static inline' functions should be small (measured by their resulting binary
|
||
size)
|
||
- conditional definitions should follow the style below, where the full
|
||
function body is in .c
|
||
|
||
.. code-block:: c
|
||
|
||
#ifdef CONFIG_BTRFS_DEBUG
|
||
void btrfs_assert_everything_is_fine(void *ptr);
|
||
#else
|
||
void btrfs_assert_everything_is_fine(void *ptr) { }
|
||
#endif
|
||
|
||
Headers
|
||
^^^^^^^
|
||
|
||
- leave one newline before #endif in headers
|
||
|
||
Comments
|
||
^^^^^^^^
|
||
|
||
- function comment goes to the .c file, not the header
|
||
|
||
- kdoc style is recommended but the exact syntax is not mandated and
|
||
we're using only subset of the formatting
|
||
- the first line (mandatory): contains a brief description of what
|
||
the function does and should provide a summary information
|
||
|
||
- do write in the imperative style "Iterate all pages and clear
|
||
some bits"
|
||
- don't write "This function is a helper to ...", "This is used
|
||
to ..."
|
||
|
||
- parameter description (optional):
|
||
|
||
- each line describes the parameter
|
||
- the list needs to be in the same order as for the function
|
||
- the list needs to be complete
|
||
- trivial parameters don't need to be explained, e.g. fs_info is
|
||
clear so the description could be 'the filesystem'
|
||
- context of the parameters matters a lot in some cases and
|
||
cannot be inferred from the name, then it should be documented
|
||
|
||
.. code-block:: c
|
||
|
||
/*
|
||
* Look for blocks in the given offset.
|
||
*
|
||
* @fs_info: trivial parameters should be in the list but with some short description
|
||
* @offset: describe the context of the argument, e.g. offset to page or inode ...
|
||
*
|
||
* Long description comes here if necessary.
|
||
*
|
||
* Return value semantics if it's not obvious
|
||
*/
|
||
|
||
- comment new enum/define values, brief description or pointers to the code
|
||
that uses them
|
||
- comment new data structures, their purpose and context of use
|
||
- do not put struct member comments on the same line, put it on the line
|
||
before and do not trim/abbreviate the text
|
||
- comment text that fits on one line can use the */\* text \*/* format, slight
|
||
overflow of 80 chars is OK
|
||
|
||
Misc
|
||
^^^^
|
||
|
||
- fix spelling, grammar and formatting of comments that get moved or changed
|
||
- fix coding style in code that's only moved
|
||
- one newline between functions
|
||
|
||
Locking
|
||
^^^^^^^
|
||
|
||
- do not use ``spin_is_locked`` but ``lockdep_assert_held``
|
||
- do not use ``assert_spin_locked`` without reading it's semantics (it does
|
||
not check if the caller hold the lock)
|
||
- use ``lockdep_assert_held`` and its friends for lock assertions
|
||
- add lock assertions to functions called deep in the call chain
|
||
|
||
Code
|
||
^^^^
|
||
|
||
- default function return value is *int ret*, temporary return values should
|
||
be named like *ret2* etc
|
||
- structure initializers should use *{ 0 }*
|
||
- do not use *short* type if possible, if it fits to char/u8 use it instead,
|
||
or plain int
|
||
|
||
Output
|
||
^^^^^^
|
||
|
||
- when dumping a lot of data after an error, consider what will remain visible
|
||
last
|
||
|
||
- in case of ``btrfs_print_leaf``, print the specific error message after
|
||
that
|
||
|
||
Expressions, operators
|
||
^^^^^^^^^^^^^^^^^^^^^^
|
||
|
||
- spaces around binary operators, no spaces for unary operators
|
||
- extra braces around expressions that might be harder to understand wrt
|
||
precedence are fine (logical and/or, shifts with other operations)
|
||
|
||
- *a \* b + c*, *(a << b) + c*, *(a % b) + c*
|
||
|
||
- 64bit division is not allowed, either avoid it completely, or use bit
|
||
shifts or use div_u64 helpers
|
||
- do not use chained assignments: no *a = b = c;*
|
||
|
||
Variable declarations, parameters
|
||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||
|
||
- declaration block in a function should do only simple initializations
|
||
(pointers, constants); nothing that would require error handling or has
|
||
non-trivial side effects
|
||
- use *const* extensively
|
||
- add temporary variable to store a value if it's used multiple times in the
|
||
function, or if reading the value needs to chase a long pointer chain
|
||
|
||
Kernel config options
|
||
---------------------
|
||
|
||
Compile-time config options for kernel that can help debugging, testing. They
|
||
usually take a hit on performance or resources (memory) so they should be
|
||
selected wisely. The options in **bold** should be safe to use by default for
|
||
debugging builds.
|
||
|
||
Please refer to the option documentation for further details.
|
||
|
||
- devices for testing
|
||
|
||
- **CONFIG_BLK_DEV_LOOP** - enable loop device
|
||
- for fstests: **DM_FLAKEY**, **CONFIG_FAIL_MAKE_REQUEST**
|
||
- **CONFIG_SCSI_DEBUG** - fake scsi block device
|
||
|
||
- memory
|
||
|
||
- **CONFIG_SLUB_DEBUG** - boot with slub_debug
|
||
- CONFIG_DEBUG_PAGEALLOC + CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT (on
|
||
newer kernels)
|
||
- CONFIG_SCHED_STACK_END_CHECK
|
||
- CONFIG_PAGE_POISONING
|
||
- CONFIG_HAVE_DEBUG_KMEMLEAK
|
||
- CONFIG_FAILSLAB -- fault injection to kmalloc
|
||
- CONFIG_DEBUG_LIST
|
||
- CONFIG_BUG_ON_DATA_CORRUPTION
|
||
|
||
- btrfs
|
||
|
||
- **CONFIG_BTRFS_DEBUG**
|
||
- **CONFIG_BTRFS_ASSERT**
|
||
- **CONFIG_BTRFS_FS_RUN_SANITY_TESTS** -- basic tests on module load
|
||
- **CONFIG_BTRFS_FS_CHECK_INTEGRITY** -- block integrity checker
|
||
enabled by mount options
|
||
- **CONFIG_BTRFS_FS_REF_VERIFY** -- additional checks for block
|
||
references
|
||
|
||
- locking
|
||
|
||
- CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_MUTEXES
|
||
- CONFIG_DEBUG_LOCK_ALLOC
|
||
- CONFIG_PROVE_LOCKING, CONFIG_LOCKDEP
|
||
- CONFIG_LOCK_STAT
|
||
- CONFIG_PROVE_RCU
|
||
- CONFIG_DEBUG_ATOMIC_SLEEP
|
||
|
||
- sanity checks
|
||
|
||
- CONFIG_DEBUG_STACK_USAGE, CONFIG_HAVE_DEBUG_STACKOVERFLOW,
|
||
CONFIG_DEBUG_STACKOVERFLOW
|
||
- CONFIG_STACKTRACE
|
||
- CONFIG_KASAN -- address sanitizer
|
||
- CONFIG_UBSAN -- undefined behaviour sanitizer
|
||
- CONFIG_KCSAN -- concurrency checker
|
||
|
||
- verbose reporting
|
||
|
||
- CONFIG_DEBUG_BUGVERBOSE
|
||
|
||
- tracing
|
||
|
||
- CONFIG_TRACING etc
|
||
|
||
BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!
|
||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||
|
||
Not a bug. The lockdep structures can get in some cases full and cannot
|
||
properly track locks anymore. There's only a workaround to increase the kernel
|
||
config value of CONFIG_LOCKDEP_CHAINS_BITS, default is
|
||
16, 18 tends to work, increase if needed.
|
||
|
||
fstests setup
|
||
-------------
|
||
|
||
The `fstests <https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/>`_
|
||
suite has very few "hard" requirements and will succeed without
|
||
actually running many tests. In order to ensure full test coverage, your test
|
||
environment should provide the settings from the following sections. Please
|
||
note that newly added tests silently add new dependencies, so you should always
|
||
review results after an update.
|
||
|
||
|
||
Kernel config options for complete test coverage
|
||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||
|
||
- ``CONFIG_FAULT_INJECTION=y``
|
||
- ``CONFIG_FAULT_INJECTION_DEBUG_FS=y``
|
||
- ``CONFIG_FAIL_MAKE_REQUEST=y``
|
||
- ``CONFIG_DM_FLAKEY=m`` or ``y``
|
||
- ``CONFIG_DM_THIN_PROVISIONING=m`` or ``y``
|
||
- ``CONFIG_DM_SNAPSHOT=m`` or ``y``
|
||
- ``CONFIG_DM_DELAY=m`` or ``y``
|
||
- ``CONFIG_DM_ERROR=m`` or ``y``
|
||
- ``CONFIG_DM_LOG_WRITES=m`` or ``y``
|
||
- ``CONFIG_DM_DUST=m`` or ``y``
|
||
- ``CONFIG_DM_ZERO=m`` or ``y``
|
||
- ``CONFIG_BLK_DEV_LOOP=m`` or ``y``
|
||
- ``CONFIG_EXT4_FS=m`` or ``y``
|
||
- ``CONFIG_SCSI_DEBUG=m``
|
||
- ``CONFIG_BLK_DEV_ZONED=y`` for zoned mode test coverage
|
||
- ``CONFIG_IO_URING==y``
|
||
|
||
|
||
Kernel config options for better bug reports
|
||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||
|
||
See the list in the section above for more options.
|
||
|
||
|
||
User space utilities and development library dependencies
|
||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||
|
||
- acl
|
||
- attr
|
||
- btrfsprogs
|
||
- dbench
|
||
- dmsetup (device-mapper)
|
||
- duperemove
|
||
- e2fsprogs
|
||
- fio
|
||
- fsverity-utils
|
||
- libacl
|
||
- libaio
|
||
- libattr
|
||
- libcap-progs
|
||
- libuuid
|
||
- lvm2
|
||
- openssl
|
||
- parted
|
||
- psmisc (killall)
|
||
- xfsprogs >= 4.3.1 (``xfs_io -c reflink`` is required)
|
||
|
||
Note: This list may be incomplete.
|
||
|
||
Storage environment
|
||
^^^^^^^^^^^^^^^^^^^
|
||
|
||
- At least 4 identically sized partitions/disks/virtual disks, specified using
|
||
``$SCRATCH_DEV_POOL``, some tests may require 8 such partitions
|
||
- some tests need at least 10G of free space, as determined by ``df``, i.e.
|
||
the size of the device may need to be larger
|
||
- some tests require ``$LOGWRITES_DEV``, yet another separate block device,
|
||
for power fail testing
|
||
- for testing trim and discard, the devices must be capable of that (physical
|
||
or virtual)
|
||
|
||
Other requirements
|
||
^^^^^^^^^^^^^^^^^^
|
||
|
||
- An ``fsgqa`` user and group must exist.
|
||
- An ``fsgqa2`` user and group must exist.
|
||
- The user ``nobody`` must exist.
|
||
- An ``123456-fsgqa`` user and group must exist.
|