haproxy/doc/internals/api/mt_list.txt
Willy Tarreau 4e65fc66f6 MAJOR: import: update mt_list to support exponential back-off (try #2)
This is the second attempt at importing the updated mt_list code (commit
59459ea3). The previous one was attempted with commit c618ed5ff4 ("MAJOR:
import: update mt_list to support exponential back-off") but revealed
problems with QUIC connections and was reverted.

The problem that was faced was that elements deleted inside an iterator
were no longer reset, and that if they were to be recycled in this form,
they could appear as busy to the next user. This was trivially reproduced
with this:

  $ cat quic-repro.cfg
  global
          stats socket /tmp/sock1 level admin
          stats timeout 1h
          limited-quic

  frontend stats
          mode http
          bind quic4@:8443 ssl crt rsa+dh2048.pem alpn h3
          timeout client 5s
          stats uri /

  $ ./haproxy -db -f quic-repro.cfg  &

  $ h2load -c 10 -n 100000 --npn h3 https://127.0.0.1:8443/
  => hang

This was purely an API issue caused by the simplified usage of the macros
for the iterator. The original version had two backups (one full element
and one pointer) that the user had to take care of, while the new one only
uses one that is transparent for the user. But during removal, the element
still has to be unlocked if it's going to be reused.

All of this sparked discussions with Fred and Aurélien regarding the still
unclear state of locking. It was found that the lock API does too much at
once and is lacking granularity. The new version offers a much more fine-
grained control allowing to selectively lock/unlock an element, a link,
the rest of the list etc.

It was also found that plenty of places just want to free the current
element, or delete it to do anything with it, hence don't need to reset
its pointers (e.g. event_hdl). Finally it appeared obvious that the
root cause of the problem was the unclear usage of the list iterators
themselves because one does not necessarily expect the element to be
presented locked when not needed, which makes the unlock easy to overlook
during reviews.

The updated version of the list presents explicit lock status in the
macro name (_LOCKED or _UNLOCKED suffixes). When using the _LOCKED
suffix, the caller is expected to unlock the element if it intends to
reuse it. At least the status is advertised. The _UNLOCKED variant,
instead, always unlocks it before starting the loop block. This means
it's not necessary to think about unlocking it, though it's obviously
not usable with everything. A few _UNLOCKED were used at obvious places
(i.e. where the element is deleted and freed without any prior check).

Interestingly, the tests performed last year on QUIC forwarding, that
resulted in limited traffic for the original version and higher bit
rate for the new one couldn't be reproduced because since then the QUIC
stack has gained in efficiency, and the 100 Gbps barrier is now reached
with or without the mt_list update. However the unit tests definitely
show a huge difference, particularly on EPYC platforms where the EBO
provides tremendous CPU savings.

Overall, the following changes are visible from the application code:

  - mt_list_for_each_entry_safe() + 1 back elem + 1 back ptr
    => MT_LIST_FOR_EACH_ENTRY_LOCKED() or MT_LIST_FOR_EACH_ENTRY_UNLOCKED()
       + 1 back elem

  - MT_LIST_DELETE_SAFE() no longer needed in MT_LIST_FOR_EACH_ENTRY_UNLOCKED()
      => just manually set iterator to NULL however.
    For MT_LIST_FOR_EACH_ENTRY_LOCKED()
      => mt_list_unlock_self() (if element going to be reused) + NULL

  - MT_LIST_LOCK_ELT => mt_list_lock_full()
  - MT_LIST_UNLOCK_ELT => mt_list_unlock_full()

  - l = MT_LIST_APPEND_LOCKED(h, e);  MT_LIST_UNLOCK_ELT();
    => l=mt_list_lock_prev(h); mt_list_lock_elem(e); mt_list_unlock_full(e, l)
2024-07-09 16:46:38 +02:00


MT_LIST: multi-thread aware doubly-linked lists

Abstract
--------

mt_lists are a form of doubly-linked lists that support thread-safe standard
list operations such as insert / append / delete / pop, as well as a safe
iterator that supports deletion and concurrent use.

Principles
----------

The lists are designed to minimize contention in environments where elements
may be concurrently manipulated at different locations. The principle is to
act on the links between the elements instead of the elements themselves. This
is achieved by temporarily "cutting" these links, which effectively consists in
replacing the ends of the links with special pointers serving as a lock, called
MT_LIST_BUSY. An element is considered locked when both its next and prev
pointers are equal to this MT_LIST_BUSY pointer. A link is locked when both of
its ends are equal to this MT_LIST_BUSY pointer, i.e. the next pointer of the
element at the source of the link and the prev pointer of the element the link
points to. It's worth noting that a locked link by definition no longer exists
since neither end knows where it was pointing to, unless a backup of it was
made prior to locking it.

The next and prev pointers are replaced by the list manipulation functions
using atomic exchange. This means that the caller knows if the element it tries
to replace was already locked or if it owns it. In order to replace a link,
both ends of the link must be owned by the thread willing to replace it.
Similarly when adding or removing an element, both ends of the element must be
owned by the thread trying to manipulate the element.
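
The ownership rule above can be illustrated with a minimal sketch based on C11
atomics. It is not the HAProxy implementation: the names "struct node",
NODE_BUSY, try_lock_next() and unlock_next() are hypothetical, the sentinel
value is an assumption, and the sketch assumes the neighbour's prev pointer
really points back to the starting element.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical element type, only for illustration. */
struct node {
    _Atomic(struct node *) next;
    _Atomic(struct node *) prev;
};

/* Sentinel playing the role of MT_LIST_BUSY; the value is an assumption. */
#define NODE_BUSY ((struct node *)1)

/* Try to own the link leaving <a> through its next pointer. Returns the
 * element the link pointed to, or NULL if one end was already locked, in
 * which case the end that was taken is restored.
 */
static struct node *try_lock_next(struct node *a)
{
    struct node *b = atomic_exchange(&a->next, NODE_BUSY);
    struct node *p;

    if (b == NODE_BUSY)
        return NULL;               /* a->next is owned by someone else */

    p = atomic_exchange(&b->prev, NODE_BUSY);
    if (p == NODE_BUSY) {
        atomic_store(&a->next, b); /* roll back the first end */
        return NULL;
    }
    return b;                      /* both ends owned: the link is locked */
}

/* Restore a link previously locked by try_lock_next(). */
static void unlock_next(struct node *a, struct node *b)
{
    assert(atomic_load(&a->next) == NODE_BUSY);
    atomic_store(&b->prev, a);
    atomic_store(&a->next, b);
}
```

The value returned by the exchange is what tells the caller whether it now owns
the pointer or collided with another owner, and it doubles as the backup needed
to restore the link later.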

Appending or inserting elements comes in two flavors: the standard one which
considers that the element is already owned by the thread and ignores its
contents; this is the most common usage for a link that was just allocated or
extracted from a list. The second flavor doesn't trust the thread's ownership
of the element and tries to own it prior to adding the element; this may be
used when this element is a shared one that needs to be placed into a list.

Removing an element always consists in owning the two links surrounding it,
hence owning the 4 pointers.

Scanning the list consists in locking the element to (re)start from, locking
the link used to jump to the next element, then locking that element and
unlocking the previous one. All types of concurrency issues are supported
there, including elements disappearing while trying to lock them. It is
perfectly possible to have multiple threads scan the same list at the same
time, and it's usually efficient. However, if those threads face a single
contention point (e.g. pause on a locked element), they may then restart
working from the same point all at the same time and compete for the same links
and elements for each step, which will become less efficient. It does still
work correctly, though.

There's currently no support for shared locking (e.g. rwlocks); elements and
links are always exclusively locked. Since locks are attempted in a sequence,
this creates a nested lock pattern which could theoretically cause deadlocks
if adjacent elements were locked in parallel. This situation is handled using
a rollback mechanism: if any thread fails to lock any element or pointer, it
detects the conflict with another thread and entirely rolls back its operations
in order to let the other thread complete. This rollback is what aims at
guaranteeing forward progress. There is, however, a non-zero risk that both
threads spend their time rolling back and trying again. This is covered using
exponential back-off that may grow to large enough values to let a thread lock
all the pointers it needs to complete an operation. Other mechanisms could be
implemented in the future such as rotating priorities or random lock numbers
to let both threads know which one must roll back and which one may continue.
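
The rollback and back-off idea can be sketched as follows. This is only an
illustration under stated assumptions: SLOT_BUSY, backoff_wait() and
lock_pair() are hypothetical names, the busy-wait loop stands in for whatever
CPU relaxation the real code uses, and the bounded number of attempts exists
only to keep the sketch testable (the description above implies retrying until
success).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define SLOT_BUSY ((void *)1)   /* stand-in for MT_LIST_BUSY (assumption) */

/* Busy-wait for roughly <loops> iterations. */
static void backoff_wait(unsigned int loops)
{
    for (volatile unsigned int i = 0; i < loops; i++)
        ;
}

/* Try to own two pointers at once. On conflict, release whatever was taken,
 * wait, double the waiting time and retry, up to <max_tries> attempts.
 * Returns 1 with the previous values in <v1>/<v2> on success, 0 on failure.
 */
static int lock_pair(_Atomic(void *) *p1, _Atomic(void *) *p2,
                     void **v1, void **v2, unsigned int max_tries)
{
    unsigned int delay = 1;

    assert(p1 != p2);
    while (max_tries--) {
        void *a = atomic_exchange(p1, SLOT_BUSY);

        if (a != SLOT_BUSY) {
            void *b = atomic_exchange(p2, SLOT_BUSY);

            if (b != SLOT_BUSY) {
                *v1 = a;
                *v2 = b;
                return 1;
            }
            atomic_store(p1, a);   /* roll back so the other owner can finish */
        }
        backoff_wait(delay);
        if (delay < (1u << 16))
            delay <<= 1;           /* exponential growth, capped */
    }
    return 0;
}
```

Releasing the first pointer before waiting is what prevents two threads from
each holding one half of what the other needs; the growing delay makes it
increasingly likely that one of them finds both pointers free.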

Due to certain operations applying to the type of an element (iterator, element
retrieval), some parts do require macros. In order to avoid keeping too
confusing an API, all operations are made accessible via macros. However, in
order to ease maintenance and improve error reporting when facing unexpected
arguments, all the code parts that were compatible have been implemented as
inlinable functions instead. And in order to help with performance profiling,
it is possible to prevent the compiler from inlining all the functions that
may loop. As a rule of thumb, operations which only exist as macros do modify
one or more of their arguments.

All exposed functions are called "mt_list_something()", all exposed macros are
called "MT_LIST_SOMETHING()", possibly mapping 1-to-1 to the equivalent
function, and the list element type is called "mt_list".

Operations
----------

mt_list_append(el1, el2)
    Adds el2 before el1, which means that if el1 is the list's head, el2 will
    effectively be appended to the end of the list.

  before:
                                                                     +---+
                                                                     |el2|
                                                                     +---+
                                                                       V
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

  after:
         +---+     +---+     +---+     +---+     +---+     +---+     +---+
      #=>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<===>|el2|<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+     +---+  #
      #=====================================================================#

mt_list_try_append(el1, el2)
    Tries to add el2 before el1, which means that if el1 is the list's head,
    el2 will effectively be appended to the end of the list. el2 will only be
    added if it's deleted (loops over itself). The operation will return zero if
    this is not the case (el2 is not empty anymore) or non-zero on success.

  before:
                                                                  #=========#
                                                                  #  +---+  #
                                                                  #=>|el2|<=#
                                                                     +---+
                                                                       V
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

  after:
         +---+     +---+     +---+     +---+     +---+     +---+     +---+
      #=>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<===>|el2|<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+     +---+  #
      #=====================================================================#

mt_list_insert(el1, el2)
    Adds el2 after el1, which means that if el1 is the list's head, el2 will
    effectively be inserted at the beginning of the list.

  before:
                   +---+
                   |el2|
                   +---+
                     V
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

  after:
         +---+     +---+     +---+     +---+     +---+     +---+     +---+
      #=>|el1|<===>|el2|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+     +---+  #
      #=====================================================================#

mt_list_try_insert(el1, el2)
    Tries to add el2 after el1, which means that if el1 is the list's head,
    el2 will effectively be inserted at the beginning of the list. el2 will only
    be added if it's deleted (loops over itself). The operation will return zero
    if this is not the case (el2 is not empty anymore) or non-zero on success.

  before:
                #=========#
                #  +---+  #
                #=>|el2|<=#
                   +---+
                     V
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

  after:
         +---+     +---+     +---+     +---+     +---+     +---+     +---+
      #=>|el1|<===>|el2|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+     +---+  #
      #=====================================================================#

mt_list_delete(el1)
    Removes el1 from the list, and marks it as deleted, wherever it is. If
    the element was already not part of a list anymore, 0 is returned,
    otherwise non-zero is returned if the operation could be performed.

  before:
         +---+     +---+     +---+     +---+     +---+     +---+     +---+
      #=>| A |<===>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+     +---+  #
      #=====================================================================#

  after:
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>| A |<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

         +---+
      #=>|el1|<=#
      #  +---+  #
      #=========#
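
The return values and resulting layouts of the operations above can be
summarized with a single-threaded model. It performs no locking at all and is
not how the functions are implemented; "struct lnk" and the lnk_*() helpers are
hypothetical names that only reproduce the documented before/after states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plain list element; no concurrency protection. */
struct lnk {
    struct lnk *next, *prev;
};

/* A deleted element loops over itself. */
static void lnk_init(struct lnk *e)
{
    e->next = e->prev = e;
}

/* Model of mt_list_append(): place <el2> before <el1>. */
static void lnk_append(struct lnk *el1, struct lnk *el2)
{
    assert(el1 && el2);
    el2->prev = el1->prev;
    el2->next = el1;
    el1->prev->next = el2;
    el1->prev = el2;
}

/* Model of mt_list_insert(): place <el2> after <el1>. */
static void lnk_insert(struct lnk *el1, struct lnk *el2)
{
    assert(el1 && el2);
    el2->next = el1->next;
    el2->prev = el1;
    el1->next->prev = el2;
    el1->next = el2;
}

/* Model of mt_list_try_append(): only act if <el2> loops over itself. */
static int lnk_try_append(struct lnk *el1, struct lnk *el2)
{
    if (el2->next != el2)
        return 0;
    lnk_append(el1, el2);
    return 1;
}

/* Model of mt_list_delete(): returns 0 if <el> was not in a list. */
static int lnk_delete(struct lnk *el)
{
    if (el->next == el)
        return 0;
    el->prev->next = el->next;
    el->next->prev = el->prev;
    lnk_init(el);
    return 1;
}
```

With a head, an appended element "a" and an inserted element "b", the list
reads head, b, a; a second lnk_delete() on the same element returns 0 because
the first call left it looping over itself.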

mt_list_behead(l)
    Detaches a list of elements from its head with the aim of reusing them to
    do anything else. The head will be turned to an empty list, and the list
    will be partially looped: the first element's prev will point to the last
    one, and the last element's next will be NULL. The pointer to the first
    element is returned, or NULL if the list was empty. This is essentially
    used when recycling lists of unused elements, or to grab a lot of elements
    at once for local processing. It is safe to be run concurrently with the
    insert/append operations performed at the list's head, but not against
    modifications performed at any other place, such as a delete operation.

  before:
         +---+     +---+     +---+     +---+     +---+     +---+     +---+
      #=>| L |<===>| A |<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+     +---+  #
      #=====================================================================#

  after:
         +---+       +---+     +---+     +---+     +---+     +---+     +---+
      #=>| L |<=# ,--| A |<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<-.
      #  +---+  # |  +---+     +---+     +---+     +---+     +---+     +---+  |
      #=========# `-----------------------------------------------------------'
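
The "partially looped" result can be made concrete with a single-threaded
model. This is a sketch of the documented end state only, with hypothetical
names (struct lnk, lnk_behead(), lnk_chain_len()) and no locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plain list element; no concurrency protection. */
struct lnk {
    struct lnk *next, *prev;
};

/* Model of the documented result of mt_list_behead(). */
static struct lnk *lnk_behead(struct lnk *head)
{
    struct lnk *first, *last;

    assert(head != NULL);
    first = head->next;
    last = head->prev;

    if (first == head)
        return NULL;                 /* the list was empty */

    head->next = head->prev = head;  /* the head becomes an empty list */
    first->prev = last;              /* first element points to the last one */
    last->next = NULL;               /* the chain is NULL-terminated */
    return first;
}

/* Walk a detached chain through its next pointers and count its elements. */
static int lnk_chain_len(const struct lnk *first)
{
    int n = 0;

    for (; first; first = first->next)
        n++;
    return n;
}
```

The NULL terminator lets the caller walk the detached elements with a plain
loop, while first->prev gives direct access to the last element without a scan.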

mt_list_pop(l)
    Removes the list's first element, returns it deleted. If the list was empty,
    NULL is returned. When combined with mt_list_append() this can be used to
    implement MPMC queues for example. A macro MT_LIST_POP() is provided for a
    more convenient use; instead of returning the list element, it will return
    the structure holding the element, taking care of preserving the NULL.

  before:
         +---+     +---+     +---+     +---+     +---+     +---+     +---+
      #=>| L |<===>| A |<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+     +---+  #
      #=====================================================================#

  after:
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>| L |<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

         +---+
      #=>| A |<=#
      #  +---+  #
      #=========#
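
"Preserving the NULL" matters when the list member is not the first field of
the holding structure, because a naive offset subtraction would turn NULL into
a bogus non-NULL pointer. The sketch below shows one way to do the conversion;
ELEM_OR_NULL(), elem_or_null(), lnk_pop() and "struct task" are hypothetical
names, and the pop itself is a single-threaded model rather than the real
function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plain list element; no concurrency protection. */
struct lnk {
    struct lnk *next, *prev;
};

/* Hypothetical payload whose list member is not its first field. */
struct task {
    int id;
    struct lnk list;
};

/* Subtract <off> from <ptr> unless it is NULL. Using a function makes sure
 * the pointer expression is evaluated only once.
 */
static inline void *elem_or_null(void *ptr, size_t off)
{
    return ptr ? (char *)ptr - off : NULL;
}

/* Convert a pointer to <member> into a pointer to the enclosing <type>,
 * keeping NULL as NULL.
 */
#define ELEM_OR_NULL(ptr, type, member) \
    ((type *)elem_or_null((ptr), offsetof(type, member)))

/* Single-threaded model of a pop: detach and return the first element. */
static struct lnk *lnk_pop(struct lnk *head)
{
    struct lnk *first;

    assert(head != NULL);
    first = head->next;
    if (first == head)
        return NULL;
    head->next = first->next;
    first->next->prev = head;
    first->next = first->prev = first;  /* returned in the deleted state */
    return first;
}
```

A caller can then write ELEM_OR_NULL(lnk_pop(&head), struct task, list) and
test the result against NULL directly.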

_mt_list_lock_next(elt)
    Locks the link that starts at the next pointer of the designated element.
    The link is replaced by two locked pointers, and a pointer to the next
    element is returned. The link must then be unlocked using
    _mt_list_unlock_next() passing it this pointer, or mt_list_unlock_link().
    This function is not intended to be used by applications, and makes certain
    assumptions about the state of the list pertaining to its use in iterators.

  before:
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>|elt|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

  after:
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>|elt|x   x| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

  Return
  value: &B

_mt_list_unlock_next(elt, back)
    Unlocks the link that starts at the next pointer of the designated element
    and is supposed to end at <back>. This function is not intended to be used
    by applications, and makes certain assumptions about the state of the list
    pertaining to its use in iterators.

  before:     back
                  \
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>|elt|x   x| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

  after:
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>|elt|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

_mt_list_lock_prev(elt)
    Locks the link that starts at the prev pointer of the designated element.
    The link is replaced by two locked pointers, and a pointer to the prev
    element is returned. The link must then be unlocked using
    _mt_list_unlock_prev() passing it this pointer, or mt_list_unlock_link().
    This function is not intended to be used by applications, and makes certain
    assumptions about the state of the list pertaining to its use in iterators.

  before:
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>| A |<===>|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

  after:
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>| A |x   x|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

  Return
  value: &A

_mt_list_unlock_prev(elt, back)
    Unlocks the link that starts at the prev pointer of the designated element
    and is supposed to end at <back>. This function is not intended to be used
    by applications, and makes certain assumptions about the state of the list
    pertaining to its use in iterators.

  before:      back
              /
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>| A |x   x|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

  after:
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>| A |<===>|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

mt_list_lock_next(elt)
    Cuts the list after the specified element. The link is replaced by two
    locked pointers, and is returned as a list element. The list must then
    be unlocked using mt_list_unlock_link() or mt_list_unlock_full() applied
    to the returned list element.

  before:
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>|elt|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

  after:
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>|elt|x   x| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

  Return elt     B
  value:    <===>

mt_list_lock_prev(elt)
    Cuts the list before the specified element. The link is replaced by two
    locked pointers, and is returned as a list element. The list must then
    be unlocked using mt_list_unlock_link() or mt_list_unlock_full() applied
    to the returned list element.

  before:
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>| A |<===>|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

  after:
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>| A |x   x|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

  Return   A     elt
  value:    <===>

mt_list_lock_elem(elt)
    Locks the element only. Both of its pointers are replaced by two locked
    pointers, and the previous ones are returned as a list element. It's not
    possible to remove such an element from a list since neighbors are not
    locked. The sole purpose of this operation is to prevent another thread
    from visiting this element during an operation. The element must then be
    unlocked using mt_list_unlock_elem() applied to the returned element.

  before:
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>| A |<===>|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

  after:
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>| A |=>  x|elt|x  <=| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

  Return   A     C
  value:    <===>

mt_list_unlock_elem(elt, ends)
    Unlocks the element only by restoring its backed up contents from <ends>,
    as returned by a previous call to mt_list_lock_elem(elt). The ends of the
    links are not affected, only the element is touched. This is intended to
    terminate a critical section started by a call to mt_list_lock_elem(). It
    may also be used on a fully locked element processed by mt_list_lock_full()
    in which case it will leave the list still locked.

  before:
           A     C
      ends: <===>
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>| A |=>  x|elt|x  <=| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

  after:
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>| A |<===>|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

  before:
           A     C
      ends: <===>
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>| A |x   x|elt|x   x| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

  after:
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>| A |x  <=|elt|=>  x| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

mt_list_unlock_self(elt)
    Unlocks the element only by resetting it (i.e. making it loop over itself).
    This is useful in the locked variant of iterators when the element is to be
    removed from the list and first needs to be unlocked because it's shared
    with other operations (such as a concurrent attempt to delete it from a
    list), or simply in case it is to be recycled in a usable state. The ends
    of the links are not affected, only the element is touched. This is
    normally only used from within locked iterators, which perform a full lock
    (both links are locked).

  before:
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>| A |x   x|elt|x   x| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

  after:
         +---+       +---+     +---+     +---+     +---+     +---+
      #=>|elt|<=# #=>| A |x   x| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+  # #  +---+     +---+     +---+     +---+     +---+  #
      #=========# #=================================================#

mt_list_lock_full(elt)
    Locks both the element and its surrounding links. The extremities of the
    previous links are returned as a single list element (which corresponds to
    the element's contents before locking). The list must then be unlocked
    using mt_list_unlock_full() to reconnect the element to the list and unlock
    both, or mt_list_unlock_link() to effectively remove the element.

  before:
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>| A |<===>|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

  after:
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>| A |x   x|elt|x   x| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#

  Return   A               C
  value:    <=============>

mt_list_unlock_link(ends)
    Connects two ends in a list together, effectively unlocking the list if it
    was locked. It takes a list head which contains a pointer to the prev and
    next elements to connect together. It normally is a copy of a previous link
    returned by functions such as mt_list_lock_next(), mt_list_lock_prev(), or
    mt_list_lock_full(). If applied after mt_list_lock_full(), it will result
    in the list being reconnected without the element, which remains locked,
    effectively deleting it. Note that this is not meant to be used from within
    iterators, as the iterator will automatically and safely reconnect ends
    after each iteration.

  before:
           A     C
      ends: <===>
         +---+     +---+     +---+     +---+     +---+
      #=>| A |x   x| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+  #
      #=================================================#

  after:
         +---+     +---+     +---+     +---+     +---+
      #=>| A |<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+  #
      #=================================================#

mt_list_unlock_full(elt, ends)
    Connects the specified element to the elements pointed to by the specified
    <ends>, which is a backup copy of the previous list member of the element
    prior to locking it using mt_list_lock_full() or mt_list_lock_elem(). This
    is normally used to unlock an element and a list, but may also be used to
    manually insert an element into an opened list (which should still be
    locked). The element's list member is technically assigned a copy of <ends>
    and both sides point to the element. This must not be used inside an
    iterator as it would also unlock the list itself and make the loop visit
    nodes in an unknown state.

  before:
                   +---+
             elt: x|elt|x
                   +---+
           A               C
      ends: <=============>
         +---+               +---+     +---+     +---+     +---+
      #=>| A |x             x| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+               +---+     +---+     +---+     +---+  #
      #===========================================================#

  after:
         +---+     +---+     +---+     +---+     +---+     +---+
      #=>| A |<===>|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=#
      #  +---+     +---+     +---+     +---+     +---+     +---+  #
      #===========================================================#
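
The relationship between the full lock and its possible unlock paths can be
summarized with a single-threaded model. It only mirrors the diagrams above:
"struct lnk", LNK_BUSY and the lnk_*() helpers are hypothetical names, plain
assignments replace the atomic exchanges, and no retry logic is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plain list element; no concurrency protection. */
struct lnk {
    struct lnk *next, *prev;
};

#define LNK_BUSY ((struct lnk *)1)   /* stand-in for MT_LIST_BUSY */

/* Model of mt_list_lock_full(): cut both links around <elt> and return the
 * neighbours that were found, as a list element.
 */
static struct lnk lnk_lock_full(struct lnk *elt)
{
    struct lnk ends = *elt;          /* ends.prev = A, ends.next = C */

    assert(elt->next != LNK_BUSY && elt->prev != LNK_BUSY);
    ends.prev->next = LNK_BUSY;
    ends.next->prev = LNK_BUSY;
    elt->prev = elt->next = LNK_BUSY;
    return ends;
}

/* Model of mt_list_unlock_link(): reconnect A and C directly. */
static void lnk_unlock_link(struct lnk ends)
{
    ends.prev->next = ends.next;
    ends.next->prev = ends.prev;
}

/* Model of mt_list_unlock_full(): reconnect A, <elt> and C. */
static void lnk_unlock_full(struct lnk *elt, struct lnk ends)
{
    *elt = ends;
    ends.prev->next = elt;
    ends.next->prev = elt;
}

/* Model of mt_list_unlock_self(): make <elt> loop over itself. */
static void lnk_unlock_self(struct lnk *elt)
{
    elt->next = elt->prev = elt;
}
```

Locking then calling lnk_unlock_full() restores the original list, while
locking then calling lnk_unlock_link() followed by lnk_unlock_self() removes
the element and leaves it in the reusable, self-looped state.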

MT_LIST_FOR_EACH_ENTRY_LOCKED(item, list_head, member, back)
    Iterates <item> through a list of items of type "typeof(*item)" which are
    linked via a "struct mt_list" member named <member>. A pointer to the head
    of the list is passed in <list_head>. <back> is a temporary struct mt_list,
    used internally. It contains a copy of the contents of the current item's
    list member before locking it. This macro is implemented using two nested
    loops, each defined as a separate macro for easier inspection. The inner
    loop will run for each element in the list, and the outer loop will run
    only once to do some cleanup and unlocking when the end of the list is
    reached or the user breaks from the inner loop. It is safe to break from
    this macro as the cleanup will be performed anyway, but it is strictly
    forbidden to branch (goto or return) from the loop because skipping the
    cleanup will lead to undefined behavior. During the scan of the list, the
    item has both of its links locked, so concurrent operations on the list are
    safe. However the thread holding the list locked must be careful not to
    perform other locking operations. In order to remove the current element,
    setting <item> to NULL is sufficient to make the inner loop not try to
    re-attach it. It is recommended to reinitialize it though if it is expected
    to be reused, so as not to leave its pointers locked. The same applies if
    other threads are trying to concurrently operate on the element.

    From within the loop, the list looks like this:

      MT_LIST_FOR_EACH_ENTRY_LOCKED(item, lh, list, back) {
        //                  A               C
        //             back: <=============>
        //                     item->list
        //    +---+     +---+     +-V-+     +---+     +---+     +---+
        // #=>|lh |<===>| A |x   x|   |x   x| C |<===>| D |<===>| E |<=#
        // #  +---+     +---+     +---+     +---+     +---+     +---+  #
        // #===========================================================#
      }

    This means that only the current item as well as its two neighbors are
    locked. It is thus possible to act on any other part of the list in
    parallel (other threads might have begun slightly earlier). However if
    a thread is too slow to proceed, other threads may quickly reach its
    position, and all of them will then wait on the same element, slowing
    down the progress.
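
The reason why "break" is safe while "goto" and "return" are not follows from
the two-nested-loops construct. The sketch below reproduces only that
construct on a plain integer range; SCAN_WITH_CLEANUP() and scan_until() are
hypothetical names and the cleanup is a simple counter instead of the real
unlocking work.

```c
#include <assert.h>

/* Hypothetical macro: the outer loop runs exactly once and its iteration
 * expression performs the cleanup; the inner loop does the actual scan.
 */
#define SCAN_WITH_CLEANUP(i, n, cleanup)                 \
    for (int once_ = 1; once_; once_ = 0, (cleanup))     \
        for ((i) = 0; (i) < (n); (i)++)

/* Visit indexes below <n>, stopping early at <stop>. <*cleanups> counts how
 * many times the cleanup expression ran.
 */
static int scan_until(int n, int stop, int *cleanups)
{
    int visited = 0;
    int i;

    assert(cleanups != 0);
    SCAN_WITH_CLEANUP(i, n, (*cleanups)++) {
        if (i == stop)
            break;      /* leaves the inner loop only: cleanup still runs */
        visited++;
    }
    return visited;
}
```

A "break" in the body only terminates the inner loop, so the outer loop's
iteration expression still executes once. A "return" or "goto" placed in the
body would leave both loops at once and skip it, which is the situation the
description above forbids.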

MT_LIST_FOR_EACH_ENTRY_UNLOCKED(item, list_head, member, back)
    Iterates <item> through a list of items of type "typeof(*item)" which are
    linked via a "struct mt_list" member named <member>. A pointer to the head
    of the list is passed in <list_head>. <back> is a temporary struct mt_list,
    used internally. It contains a copy of the contents of the current item's
    list member before resetting it. This macro is implemented using two nested
    loops, each defined as a separate macro for easier inspection. The inner
    loop will run for each element in the list, and the outer loop will run
    only once to do some cleanup and unlocking when the end of the list is
    reached or the user breaks from the inner loop. It is safe to break from
    this macro as the cleanup will be performed anyway, but it is strictly
    forbidden to branch (goto or return) from the loop because skipping the
    cleanup will lead to undefined behavior. During the scan of the list, the
    item has both of its neighbours locked, with both of its ends pointing to
    itself. Thus, concurrent walks on the list are safe, but not direct
    accesses to the element. In order to remove the current element, setting
    <item> to NULL is sufficient to make the inner loop not try to re-attach
    it. There is no need to reinitialize it since it is already done. If <item>
    is left non-NULL, the element will be re-attached to the list. This version
    is meant as a more user-friendly method to walk over a list in which it is
    known by design that elements are not directly accessed (e.g. a pure MPMC
    queue). The typical pattern which corresponds to this case is when the
    first operation in the iterator's body is a call to unlock the iterator,
    which is then no longer needed (though harmless).

    From within the loop, the list looks like this:

      MT_LIST_FOR_EACH_ENTRY_UNLOCKED(item, lh, list, back) {
        //                     back:  A     C
        // item->list                  <===>
        //   +-V-+      +---+     +---+     +---+     +---+     +---+
        // #>|   |<# #=>|lh |<===>| A |x   x| C |<===>| D |<===>| E |<=#
        // # +---+ # #  +---+     +---+     +---+     +---+     +---+  #
        // #=======# #=================================================#
      }

    This means that only the current item's neighbors are locked. It is thus
    possible to act on any other part of the list in parallel (other threads
    might have begun slightly earlier) but not on the element. However if a
    thread is too slow to proceed, other threads may quickly reach its
    position, and all of them will then wait on the same element, slowing down
    the progress.

Examples
--------

The example below collects up to 50 jobs from a shared list that are compatible
with the current thread, and moves them to a local list for later processing.
The same pointers are used for both lists and placed in an anonymous union.

    struct job {
        union {
            struct list list;
            struct mt_list mt_list;
        };
        unsigned long thread_mask; /* 1 bit per eligible thread */
        /* struct-specific stuff below */
        ...
    };

    extern struct mt_list global_job_queue;
    extern struct list local_job_queue;

    struct mt_list back;
    struct job *item;
    int budget = 50;

    /* collect up to 50 shared items */
    MT_LIST_FOR_EACH_ENTRY_LOCKED(item, &global_job_queue, mt_list, back) {
        if (!(item->thread_mask & current_thread_bit))
            continue; /* job not eligible for this thread */
        LIST_APPEND(&local_job_queue, &item->list);
        item = NULL;
        if (!--budget)
            break;
    }

    /* process extracted items */
    LIST_FOR_EACH(item, &local_job_queue, list) {
        ...
    }