mirror of
https://github.com/kdave/btrfs-progs
synced 2025-01-29 00:42:45 +00:00
487 lines
17 KiB
ReStructuredText
487 lines
17 KiB
ReStructuredText
|
Development notes
|
|||
|
=================
|
|||
|
|
|||
|
Collection of various notes about development practices, how-to's or
|
|||
|
checklists.
|
|||
|
|
|||
|
Adding a new ioctl, extending an existing one
|
|||
|
---------------------------------------------
|
|||
|
|
|||
|
- add code to `strace <https://github.com/strace/strace>`__ so the ioctl calls
|
|||
|
are parsed into a human readable form. Most of the ioctls are already
|
|||
|
`implemented <https://github.com/strace/strace/blob/master/btrfs.c>`__ and
|
|||
|
can be used a reference.
|
|||
|
|
|||
|
Tracepoints
|
|||
|
-----------
|
|||
|
|
|||
|
The tracepoint message format should be compact and consistent, so please stick
|
|||
|
to the following format:
|
|||
|
|
|||
|
- *key=value* no spaces around *=*
|
|||
|
- separated by spaces, not commas
|
|||
|
- named values: print value and string, like "%llu(%s)", no space between,
|
|||
|
string in parens
|
|||
|
- avoid abbreviating key values too much, (e.g. use 'size' not 'sz')
|
|||
|
- hexadecimal values are always preceded by *0x* (use "0x%llx")
|
|||
|
- use *struct btrfs_inode* for inode types, not plain *struct inode*
|
|||
|
- inode number type is *u64*, use *btrfs_ino* if needed
|
|||
|
- key can be printed as *[%llu,%u,%llu]*
|
|||
|
- enum types need to be exported as *TRACE_DEFINE_ENUM*
|
|||
|
|
|||
|
**Example:**
|
|||
|
|
|||
|
event: *btrfs__chunk*
|
|||
|
|
|||
|
string: ``"root=%llu(%s) offset=%llu size=%llu num_stripes=%d sub_stripes=%d type=%s"``
|
|||
|
|
|||
|
|
|||
|
Error messages, verbosity
|
|||
|
-------------------------
|
|||
|
|
|||
|
- use *btrfs_\** helpers (btrfs_err, btrfs_info, ...), they print a filesystem
|
|||
|
identification like ``BTRFS info (device sdb): ...``
|
|||
|
- first letter in the string is lowercase
|
|||
|
- message contents
|
|||
|
|
|||
|
- be descriptive
|
|||
|
- keep the text length reasonable (fits one line without wrapping)
|
|||
|
- no typos
|
|||
|
- print values that refer to what happened (inode number, subvolume
|
|||
|
id, path, offset, ...)
|
|||
|
- print error value from the last call
|
|||
|
- no *"\\n"* at the end of the string
|
|||
|
- no *".*'' at the end of text
|
|||
|
- un-indent the string so it fits under 80 columns
|
|||
|
- don't split long strings, overflow 80 is ok in this case (we want
|
|||
|
to be able to search for the error messages in the sources easily)
|
|||
|
|
|||
|
- value representation
|
|||
|
|
|||
|
- decimal: offsets, length, ordinary numbers
|
|||
|
- hexadecimal: checksums
|
|||
|
- hexadecimal + string: bitmasks (e.g. raid profiles, flags)
|
|||
|
- intervals of integers:
|
|||
|
|
|||
|
- closed interval (end values inclusive): [0, 4096]
|
|||
|
- half-open (right value excluded): [0, 4096)
|
|||
|
- half-open (left value excluded): (0, 4096] -- that one may look
|
|||
|
strange but is used in several places
|
|||
|
|
|||
|
Message level
|
|||
|
-------------
|
|||
|
|
|||
|
- btrfs_err -- such messages have high visibility so use them for serious
|
|||
|
problems that need user attention
|
|||
|
- btrfs_warn -- conditions that are not too serious but can point to potential
|
|||
|
problems, the system should be still in a good state
|
|||
|
- btrfs_info -- use for informative messages that are useful to see what's
|
|||
|
happening in the filesystem or might help debugging problems in the future
|
|||
|
and are worth keeping in the logs
|
|||
|
|
|||
|
Error handling and transaction abort
|
|||
|
------------------------------------
|
|||
|
|
|||
|
Please keep all transaction abort exactly at the place where they happen and do
|
|||
|
not merge them to one. This pattern should be used everywhere and is important
|
|||
|
when debugging because we can pinpoint the line in the code from the syslog
|
|||
|
message and do not have to guess which way it got to the merged call.
|
|||
|
|
|||
|
Error handling and return values
|
|||
|
--------------------------------
|
|||
|
|
|||
|
Functions are supposed to handle all errors of the callees and clean up the
|
|||
|
local context before returning. If a function does not need to pass errors to
|
|||
|
the caller it can be switched to return *void*. Before doing so make sure that:
|
|||
|
|
|||
|
- the function does not call any BUG/BUG_ON
|
|||
|
- all callees properly handle errors and do not call BUG/BUG_ON in place of
|
|||
|
error handling
|
|||
|
- the whole call chain starting from the function satisfies the above
|
|||
|
|
|||
|
Handling unexpected conditions
|
|||
|
------------------------------
|
|||
|
|
|||
|
This is different than error handling. An unexpected condition happens when the
|
|||
|
code invariants/assumptions do not hold and there's no way to recover from the
|
|||
|
situation. This means that returning an error to the caller can't be done and
|
|||
|
continuing would only propagate the logic error further. The reasons for that
|
|||
|
bug can be two fold: internal (a genuine bug) or external (e.g. memory bitflip,
|
|||
|
memory corrupted by other subsystem). In this case it is allowed to use the
|
|||
|
nuclear option and do BUG_ON, that is otherwise highly discouraged.
|
|||
|
|
|||
|
There are several ways how to react to the unexpected conditions:
|
|||
|
|
|||
|
- ASSERT -- conditionally compiled in and crashes when the condition is false,
|
|||
|
this is supposed to catch 'must never happen' at the time of development,
|
|||
|
code must not continue
|
|||
|
- WARN_ON -- light check that is visible in the log and allows the code to
|
|||
|
continue but the reasons must be investigated
|
|||
|
- BUG_ON -- last resort, checks condition that 'must never happen' and
|
|||
|
continuing would cause more harm than the instant crash; code should always
|
|||
|
try to avoid using it, but there are cases when sanity and invariant checks
|
|||
|
are done in advance
|
|||
|
|
|||
|
Error injection using eBPF
|
|||
|
--------------------------
|
|||
|
|
|||
|
Functions can be annotated to enable error injection using the eBPF scripts.
|
|||
|
See e.g. ``disk-io.c:open_ctree``. For btrfs-specific injection, the annotation
|
|||
|
is ALLOW_ERROR_INJECTION, but beware that this only overrides the return value
|
|||
|
and this can leak memory or other resources. For error injection to generic
|
|||
|
helpers (e.g. memory allocation), you can use something like
|
|||
|
``bcc/tools/inject.py kmalloc btrfs_alloc_device() -P 0.001``
|
|||
|
|
|||
|
Resources:
|
|||
|
|
|||
|
- eBPF
|
|||
|
- BCC tools
|
|||
|
|
|||
|
Warnings and issues found by static checkers and similar tools
|
|||
|
--------------------------------------------------------------
|
|||
|
|
|||
|
There are tools to automatically identify known issues in code and report them
|
|||
|
as problems to be fixed, but not all such reports are valid or relevant in the
|
|||
|
context of the code base. The fix should really fix the code, not just the
|
|||
|
tool's warning. Such patches will be rejected with explanation first time and
|
|||
|
ignored when sent repeatedly. Patches fixing real problems with a good
|
|||
|
explanation are welcome. If you're not sure about sending such patch, please
|
|||
|
ask the https://kernelnewbies.org/KernelJanitors for help.
|
|||
|
|
|||
|
Do not blindly report issues caught by:
|
|||
|
|
|||
|
- checkpatch.pl -- the script is good for catching some coding style but this
|
|||
|
whole wiki page exists to be explicit what we want, not necessarily what
|
|||
|
checkpatch wants
|
|||
|
- clang static analyzer -- lots of the reports are not real problems and may
|
|||
|
depend on a condition that's not recognized by the checker
|
|||
|
- gcc -Wunused -- any of the -Wunused-\* options can report a valid issue but
|
|||
|
it must be viewed in wider context and not just removed to get rid of the
|
|||
|
warning
|
|||
|
- codespell -- fixing typos is fine but should be done in batches and over
|
|||
|
whole code base
|
|||
|
|
|||
|
Hints:
|
|||
|
|
|||
|
- if you find an issue, look in the whole code base if there are more instances
|
|||
|
same or following a similar pattern
|
|||
|
- look into git history of the changed code, why it got there and when, it may
|
|||
|
help to understand if it's a bug or e.g. a stale code
|
|||
|
|
|||
|
Coding style preferences
|
|||
|
------------------------
|
|||
|
|
|||
|
Before applying recommendations from this sections, please make sure you're
|
|||
|
familiar with the `kernel coding style guide
|
|||
|
<https://www.kernel.org/doc/html/latest/process/coding-style.html%7Cgeneric>`__.
|
|||
|
|
|||
|
The purpose of coding style is to maintain unified and consistent look & feel
|
|||
|
of the patches and code, keeping distractions to minimum which decreases
|
|||
|
cognitive load and allows focus on the important things. Coding style is not
|
|||
|
only where to put white space or curly brackets but also coding patterns with
|
|||
|
meaning that is established and understood in the developer group. The code in
|
|||
|
linux kernel is maintained for a long period of time and maintainability is of
|
|||
|
crucial importance. This means it does take time to write good code, with
|
|||
|
attention to detail. Once written the code could stay unchanged for years but
|
|||
|
will be read many times. `Read more
|
|||
|
<https://simpleprogrammer.com/maintaining-code/>`__.
|
|||
|
|
|||
|
Patches
|
|||
|
^^^^^^^
|
|||
|
|
|||
|
- for patch subject use "btrfs:" followed by a lowercase
|
|||
|
- read the patch again and fix all typos and grammar
|
|||
|
- size units should use short form K/M/G/T or IEC form KiB/MiB/GiB
|
|||
|
- don't write references to parameters to subject (like removing @pointer)
|
|||
|
- do not end subject with a dot '.'
|
|||
|
- parts of btrfs that could have a subject prefix to point to a specific subsystem
|
|||
|
|
|||
|
- scrub, tests, integrity-checker, tree-checker, discard, locking, sysfs,
|
|||
|
raid56, qgroup, compression, send, ioctl
|
|||
|
|
|||
|
- additional information
|
|||
|
|
|||
|
- if there's a stack trace relevant for the patch, add it ther (lockdep,
|
|||
|
crash, warning)
|
|||
|
- steps to reproduce a bug (that will also get turned to a proper fstests
|
|||
|
case)
|
|||
|
- sample output before/after if it could have impact on userspace
|
|||
|
- *pahole* output if structure is being reorganized and optimized
|
|||
|
|
|||
|
Function declarations
|
|||
|
^^^^^^^^^^^^^^^^^^^^^
|
|||
|
|
|||
|
- avoid prototypes for static functions, order them in new code in a way that
|
|||
|
does not need it
|
|||
|
|
|||
|
- but don't move static functions just to get rid of the prototype
|
|||
|
|
|||
|
- exported functions have btrfs\_ prefix
|
|||
|
- do not use functions with double underscore, there's only a few valid uses of
|
|||
|
that, namely when *\__function* is doing the core of the work with looser
|
|||
|
checks, no locks or more parameters than *function*
|
|||
|
- function type and name are on the same line, if this is too long, the
|
|||
|
parameters continue on the next line (indented)
|
|||
|
- 'static inline' functions should be small (measured by their resulting binary
|
|||
|
size)
|
|||
|
- conditional definitions should follow the style below, where the full
|
|||
|
function body is in .c
|
|||
|
|
|||
|
.. code-block:: c
|
|||
|
|
|||
|
#ifdef CONFIG_BTRFS_DEBUG
|
|||
|
void btrfs_assert_everything_is_fine(void *ptr);
|
|||
|
#else
|
|||
|
void btrfs_assert_everything_is_fine(void *ptr) { }
|
|||
|
#endif
|
|||
|
|
|||
|
Headers
|
|||
|
^^^^^^^
|
|||
|
|
|||
|
- leave one newline before #endif in headers
|
|||
|
|
|||
|
Comments
|
|||
|
^^^^^^^^
|
|||
|
|
|||
|
- function comment goes to the .c file, not the header
|
|||
|
|
|||
|
- kdoc style is recommended but the exact syntax is not mandated and
|
|||
|
we're using only subset of the formatting
|
|||
|
- the first line (mandatory): contains a brief description of what
|
|||
|
the function does and should provide a summary information
|
|||
|
|
|||
|
- do write in the imperative style "Iterate all pages and clear
|
|||
|
some bits"
|
|||
|
- don't write "This function is a helper to ...", "This is used
|
|||
|
to ..."
|
|||
|
|
|||
|
- parameter description (optional):
|
|||
|
|
|||
|
- each line describes the parameter
|
|||
|
- the list needs to be in the same order as for the function
|
|||
|
- the list needs to be complete
|
|||
|
- trivial parameters don't need to be explained, e.g. fs_info is
|
|||
|
clear so the description could be 'the filesystem'
|
|||
|
- context of the parameters matters a lot in some cases and
|
|||
|
cannot be inferred from the name, then it should be documented
|
|||
|
|
|||
|
.. code-block:: c
|
|||
|
|
|||
|
/*
|
|||
|
* Look for blocks in the given offset.
|
|||
|
*
|
|||
|
* @fs_info: trivial parameters should be in the list but with some short description
|
|||
|
* @offset: describe the context of the argument, e.g. offset to page or inode ...
|
|||
|
*
|
|||
|
* Long description comes here if necessary.
|
|||
|
*
|
|||
|
* Return value semantics if it's not obvious
|
|||
|
*/
|
|||
|
|
|||
|
- comment new enum/define values, brief description or pointers to the code
|
|||
|
that uses them
|
|||
|
- comment new data structures, their purpose and context of use
|
|||
|
- do not put struct member comments on the same line, put it on the line
|
|||
|
before and do not trim/abbreviate the text
|
|||
|
- comment text that fits on one line can use the */\* text \*/* format, slight
|
|||
|
overflow of 80 chars is OK
|
|||
|
|
|||
|
Misc
|
|||
|
^^^^
|
|||
|
|
|||
|
- fix spelling, grammar and formatting of comments that get moved or changed
|
|||
|
- fix coding style in code that's only moved
|
|||
|
- one newline between functions
|
|||
|
|
|||
|
Locking
|
|||
|
^^^^^^^
|
|||
|
|
|||
|
- do not use ``spin_is_locked`` but ``lockdep_assert_held``
|
|||
|
- do not use ``assert_spin_locked`` without reading it's semantics (it does
|
|||
|
not check if the caller hold the lock)
|
|||
|
- use ``lockdep_assert_held`` and its friends for lock assertions
|
|||
|
- add lock assertions to functions called deep in the call chain
|
|||
|
|
|||
|
Code
|
|||
|
^^^^
|
|||
|
|
|||
|
- default function return value is *int ret*, temporary return values should
|
|||
|
be named like *ret2* etc
|
|||
|
- structure initializers should use *{ 0 }*
|
|||
|
- do not use *short* type if possible, if it fits to char/u8 use it instead,
|
|||
|
or plain int
|
|||
|
|
|||
|
Output
|
|||
|
^^^^^^
|
|||
|
|
|||
|
- when dumping a lot of data after an error, consider what will remain visible
|
|||
|
last
|
|||
|
|
|||
|
- in case of ``btrfs_print_leaf``, print the specific error message after
|
|||
|
that
|
|||
|
|
|||
|
Expressions, operators
|
|||
|
^^^^^^^^^^^^^^^^^^^^^^
|
|||
|
|
|||
|
- spaces around binary operators, no spaces for unary operators
|
|||
|
- extra braces around expressions that might be harder to understand wrt
|
|||
|
precedence are fine (logical and/or, shifts with other operations)
|
|||
|
|
|||
|
- *a \* b + c*, *(a << b) + c*, *(a % b) + c*
|
|||
|
|
|||
|
- 64bit division is not allowed, either avoid it completely, or use bit
|
|||
|
shifts or use div_u64 helpers
|
|||
|
- do not use chained assignments: no *a = b = c;*
|
|||
|
|
|||
|
Variable declarations, parameters
|
|||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|||
|
|
|||
|
- declaration block in a function should do only simple initializations
|
|||
|
(pointers, constants); nothing that would require error handling or has
|
|||
|
non-trivial side effects
|
|||
|
- use *const* extensively
|
|||
|
- add temporary variable to store a value if it's used multiple times in the
|
|||
|
function, or if reading the value needs to chase a long pointer chain
|
|||
|
|
|||
|
Kernel config options
|
|||
|
---------------------
|
|||
|
|
|||
|
Compile-time config options for kernel that can help debugging, testing. They
|
|||
|
usually take a hit on performance or resources (memory) so they should be
|
|||
|
selected wisely. The options in **bold** should be safe to use by default for
|
|||
|
debugging builds.
|
|||
|
|
|||
|
Please refer to the option documentation for further details.
|
|||
|
|
|||
|
- devices for testing
|
|||
|
|
|||
|
- **CONFIG_BLK_DEV_LOOP** - enable loop device
|
|||
|
- for fstests: **DM_FLAKEY**, **CONFIG_FAIL_MAKE_REQUEST**
|
|||
|
- **CONFIG_SCSI_DEBUG** - fake scsi block device
|
|||
|
|
|||
|
- memory
|
|||
|
|
|||
|
- **CONFIG_SLUB_DEBUG** - boot with slub_debug
|
|||
|
- CONFIG_DEBUG_PAGEALLOC + CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT (on
|
|||
|
newer kernels)
|
|||
|
- CONFIG_SCHED_STACK_END_CHECK
|
|||
|
- CONFIG_PAGE_POISONING
|
|||
|
- CONFIG_HAVE_DEBUG_KMEMLEAK
|
|||
|
- CONFIG_FAILSLAB -- fault injection to kmalloc
|
|||
|
- CONFIG_DEBUG_LIST
|
|||
|
- CONFIG_BUG_ON_DATA_CORRUPTION
|
|||
|
|
|||
|
- btrfs
|
|||
|
|
|||
|
- **CONFIG_BTRFS_DEBUG**
|
|||
|
- **CONFIG_BTRFS_ASSERT**
|
|||
|
- **CONFIG_BTRFS_FS_RUN_SANITY_TESTS** -- basic tests on module load
|
|||
|
- **CONFIG_BTRFS_FS_CHECK_INTEGRITY** -- block integrity checker
|
|||
|
enabled by mount options
|
|||
|
- **CONFIG_BTRFS_FS_REF_VERIFY** -- additional checks for block
|
|||
|
references
|
|||
|
|
|||
|
- locking
|
|||
|
|
|||
|
- CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_MUTEXES
|
|||
|
- CONFIG_DEBUG_LOCK_ALLOC
|
|||
|
- CONFIG_PROVE_LOCKING, CONFIG_LOCKDEP
|
|||
|
- CONFIG_LOCK_STAT
|
|||
|
- CONFIG_PROVE_RCU
|
|||
|
- CONFIG_DEBUG_ATOMIC_SLEEP
|
|||
|
|
|||
|
- sanity checks
|
|||
|
|
|||
|
- CONFIG_DEBUG_STACK_USAGE, CONFIG_HAVE_DEBUG_STACKOVERFLOW,
|
|||
|
CONFIG_DEBUG_STACKOVERFLOW
|
|||
|
- CONFIG_STACKTRACE
|
|||
|
- CONFIG_KASAN -- address sanitizer
|
|||
|
- CONFIG_UBSAN -- undefined behaviour sanitizer
|
|||
|
- CONFIG_KCSAN -- concurrency checker
|
|||
|
|
|||
|
- verbose reporting
|
|||
|
|
|||
|
- CONFIG_DEBUG_BUGVERBOSE
|
|||
|
|
|||
|
- tracing
|
|||
|
|
|||
|
- CONFIG_TRACING etc
|
|||
|
|
|||
|
BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!
|
|||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|||
|
|
|||
|
Not a bug. Increase the config value of LOCKDEP_CHAINS_BITS, default is
|
|||
|
16, 18 tends to work, increase if needed.
|
|||
|
|
|||
|
fstests setup
|
|||
|
-------------
|
|||
|
|
|||
|
The fstests suite has very few "hard" requirements and will succeed without
|
|||
|
actually running many tests. In order to ensure full test coverage, your test
|
|||
|
environment should provide the settings from the following sections. Please
|
|||
|
note that newly added tests silently add new dependencies, so you should always
|
|||
|
review results after an update.
|
|||
|
|
|||
|
|
|||
|
Kernel config options for complete test coverage
|
|||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|||
|
|
|||
|
- ``CONFIG_FAULT_INJECTION=y``
|
|||
|
- ``CONFIG_FAULT_INJECTION_DEBUG_FS=y``
|
|||
|
- ``CONFIG_FAIL_MAKE_REQUEST=y``
|
|||
|
- ``CONFIG_DM_FLAKEY=m`` or ``y``
|
|||
|
- ``CONFIG_DM_THIN_PROVISIONING=m`` or ``y``
|
|||
|
- ``CONFIG_DM_SNAPSHOT=m`` or ``y``
|
|||
|
- ``CONFIG_DM_DELAY=m`` or ``y``
|
|||
|
- ``CONFIG_DM_ERROR=m`` or ``y``
|
|||
|
- ``CONFIG_DM_LOG_WRITES=m`` or ``y``
|
|||
|
- ``CONFIG_DM_DUST=m`` or ``y``
|
|||
|
- ``CONFIG_BLK_DEV_LOOP=m`` or ``y``
|
|||
|
- ``CONFIG_EXT4_FS=m`` or ``y``
|
|||
|
- ``CONFIG_SCSI_DEBUG=m``
|
|||
|
- ``CONFIG_BLK_DEV_ZONED=y`` for zoned mode test coverage
|
|||
|
|
|||
|
|
|||
|
Kernel config options for better bug reports
|
|||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|||
|
|
|||
|
See the list in the section above for more options.
|
|||
|
|
|||
|
|
|||
|
User space utilities and development library dependencies
|
|||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|||
|
|
|||
|
- fio
|
|||
|
- dmsetup (device-mapper)
|
|||
|
- lvm
|
|||
|
- xfsprogs >= 4.3.1 (``xfs_io -c reflink`` is required)
|
|||
|
- btrfsprogs
|
|||
|
- dbench
|
|||
|
- openssl
|
|||
|
- libacl
|
|||
|
- libattr
|
|||
|
- libaio
|
|||
|
- libuuid
|
|||
|
- libcap-progs
|
|||
|
- duperemove
|
|||
|
|
|||
|
Note: This list may be incomplete.
|
|||
|
|
|||
|
Storage environment
|
|||
|
^^^^^^^^^^^^^^^^^^^
|
|||
|
|
|||
|
- At least 4 identically sized partitions/disks/virtual disks, specified using
|
|||
|
``$SCRATCH_DEV_POOL``, some tests may require 6 such partitions
|
|||
|
- some tests need at least 10G of free space, as determined by ``df``, i.e.
|
|||
|
the size of the device may need to be larger
|
|||
|
- some tests require ``$LOGWRITES_DEV``, yet another separate block device,
|
|||
|
for power fail testing
|
|||
|
- for testing trim and discard, the devices must be capable of that (physical
|
|||
|
or virtual)
|
|||
|
|
|||
|
Other requirements
|
|||
|
^^^^^^^^^^^^^^^^^^
|
|||
|
|
|||
|
- An ``fsgqa`` user and group must exist.
|
|||
|
- An ``123456-fsgqa`` user and group must exist.
|