From c618ed5ff41ce29454e784c610b23bad0ea21f4f Mon Sep 17 00:00:00 2001
From: Willy Tarreau
Date: Wed, 13 Sep 2023 08:53:48 +0200
Subject: [PATCH] MAJOR: import: update mt_list to support exponential back-off

The new mt_list code supports exponential back-off on conflict, which
is important for use cases where there is contention on a large number
of threads. The API evolved a little bit and required some updates:

  - mt_list_for_each_entry_safe() is now in upper case to explicitly
    show that it is a macro, and only uses the back element, doesn't
    require a secondary pointer for deletes anymore.

  - MT_LIST_DELETE_SAFE() doesn't exist anymore, instead one just has
    to set the list iterator to NULL so that it is not re-inserted
    into the list and the list is spliced there. One must be careful
    because it was usually performed before freeing the element. Now
    instead the element must be nulled before the continue/break.

  - MT_LIST_LOCK_ELT() and MT_LIST_UNLOCK_ELT() have always been
    unclear. They were replaced by mt_list_cut_around() and
    mt_list_connect_elem() which more explicitly detach the element
    and reconnect it into the list.

  - MT_LIST_APPEND_LOCKED() was only in haproxy so it was left as-is
    in list.h. It may however possibly benefit from being upstreamed.

This required tiny adaptations to event_hdl.c and quic_sock.c. The test
case was updated and the API doc added. Note that in order to keep
include files small, the struct mt_list definition remains in list-t.h
(part of the internal API) and was ifdef'd out in mt_list.h.

A test on QUIC with both quictls 1.1.1 and wolfssl 5.6.3 on ARM64 with
80 threads shows a drastic reduction of CPU usage thanks to this and
the refined memory barriers.
Please note that the CPU usage on OpenSSL 3.0.9 is significantly higher due to the excessive use of atomic ops by openssl, but 3.1 is only slightly above 1.1.1 though: - before: 35 Gbps, 3.5 Mpps, 7800% CPU - after: 41 Gbps, 4.2 Mpps, 2900% CPU --- doc/internals/api/mt_list.txt | 435 ++++++++++++++ include/haproxy/list.h | 687 ++-------------------- include/import/mt_list.h | 1040 +++++++++++++++++++++++++++++++++ src/event_hdl.c | 81 ++- src/hlua_fcn.c | 4 +- src/quic_sock.c | 6 +- src/server.c | 4 +- tests/unit/test-list.c | 42 +- 8 files changed, 1584 insertions(+), 715 deletions(-) create mode 100644 doc/internals/api/mt_list.txt create mode 100644 include/import/mt_list.h diff --git a/doc/internals/api/mt_list.txt b/doc/internals/api/mt_list.txt new file mode 100644 index 0000000000..6bce37f94f --- /dev/null +++ b/doc/internals/api/mt_list.txt @@ -0,0 +1,435 @@ +MT_LIST: multi-thread aware doubly-linked lists + +Abstract +-------- + +mt_lists are a form of doubly-linked lists that support thread-safe standard +list operations such as insert / append / delete / pop, as well as a safe +iterator that supports deletion and concurrent use. + +Principles +---------- + +The lists are designed to minimize contention in environments where elements +may be concurrently manipulated at different locations. The principle is to +act on the links between the elements instead of the elements themselves. This +is achieved by temporarily "cutting" these links, which effectively consists in +replacing the ends of the links with special pointers serving as a lock, called +MT_LIST_BUSY. An element is considered locked when either its next or prev +pointer is equal to this MT_LIST_BUSY pointer (or both). + +The next and prev pointers are replaced by the list manipulation functions +using atomic exchange. This means that the caller knows if the element it tries +to replace was already locked or if it owns it. 
In order to replace a link, +both ends of the link must be owned by the thread willing to replace it. +Similarly when adding or removing an element, both ends of the element must be +owned by the thread trying to manipulate the element. + +Appending or inserting elements comes in two flavors: the standard one which +considers that the element is already owned by the thread and ignores its +contents; this is the most common usage for a link that was just allocated or +extracted from a list. The second flavor doesn't trust the thread's ownership +of the element and tries to own it prior to adding the element; this may be +used when this element is a shared one that needs to be placed into a list. + +Removing an element always consists in owning the two links surrounding it, +hence owning the 4 pointers. + +Scanning the list consists in locking the element to (re)start from, locking +the link used to jump to the next element, then locking that element and +unlocking the previous one. All types of concurrency issues are supported +there, including elements disappearing while trying to lock them. It is +perfectly possible to have multiple threads scan the same list at the same +time, and it's usually efficient. However, if those threads face a single +contention point (e.g. pause on a locked element), they may then restart +working from the same point all at the same time and compete for the same links +and elements for each step, which will become less efficient. Still, it does +work fine. + +There's currently no support for shared locking (e.g. rwlocks); elements and +links are always exclusively locked. Since locks are attempted in a sequence, +this creates a nested lock pattern which could theoretically cause deadlocks +if adjacent elements were locked in parallel. 
This situation is handled using +a rollback mechanism: if any thread fails to lock any element or pointer, it +detects the conflict with another thread and entirely rolls back its operations +in order to let the other thread complete. This rollback is what aims at +guaranteeing forward progress. There is, however, a non-zero risk that both +threads spend their time rolling back and trying again. This is covered using +exponential back-off that may grow to large enough values to let a thread lock +all the pointers it needs to complete an operation. Other mechanisms could be +implemented in the future such as rotating priorities or random lock numbers +to let both threads know which one must roll back and which one may continue. + +Due to certain operations applying to the type of an element (iterator, element +retrieval), some parts do require macros. In order to avoid keeping too +confusing an API, all operations are made accessible via macros. However, in +order to ease maintenance and improve error reporting when facing unexpected +arguments, all the code parts that were compatible have been implemented as +inlinable functions instead. And in order to help with performance profiling, +it is possible to prevent the compiler from inlining all the functions that +may loop. As a rule of thumb, operations which only exist as macros do modify +one or more of their arguments. + +All exposed functions are called "mt_list_something()", all exposed macros are +called "MT_LIST_SOMETHING()", possibly mapping 1-to-1 to the equivalent +function, and the list element type is called "mt_list". + + +Operations +---------- + +mt_list_append(el1, el2) + Adds el2 before el1, which means that if el1 is the list's head, el2 will + effectively be appended to the end of the list. 
+ + before: + +---+ + |el2| + +---+ + V + +---+ +---+ +---+ +---+ +---+ +---+ + #=>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# + # +---+ +---+ +---+ +---+ +---+ +---+ # + #===========================================================# + + after: + +---+ +---+ +---+ +---+ +---+ +---+ +---+ + #=>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<===>|el2|<=# + # +---+ +---+ +---+ +---+ +---+ +---+ +---+ # + #=====================================================================# + + +mt_list_try_append(el1, el2) + Tries to add el2 before el1, which means that if el1 is the list's head, + el2 will effectively be appended to the end of the list. el2 will only be + added if it's deleted (loops over itself). The operation will return zero if + this is not the case (el2 is not empty anymore) or non-zero on success. + + before: + #=========# + # +---+ # + #=>|el2|<=# + +---+ + V + +---+ +---+ +---+ +---+ +---+ +---+ + #=>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# + # +---+ +---+ +---+ +---+ +---+ +---+ # + #===========================================================# + + after: + +---+ +---+ +---+ +---+ +---+ +---+ +---+ + #=>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<===>|el2|<=# + # +---+ +---+ +---+ +---+ +---+ +---+ +---+ # + #=====================================================================# + + +mt_list_insert(el1, el2) + Adds el2 after el1, which means that if el1 is the list's head, el2 will + effectively be inserted at the beginning of the list. 
+ + before: + +---+ + |el2| + +---+ + V + +---+ +---+ +---+ +---+ +---+ +---+ + #=>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# + # +---+ +---+ +---+ +---+ +---+ +---+ # + #===========================================================# + + after: + +---+ +---+ +---+ +---+ +---+ +---+ +---+ + #=>|el1|<===>|el2|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# + # +---+ +---+ +---+ +---+ +---+ +---+ +---+ # + #=====================================================================# + + +mt_list_try_insert(el1, el2) + Tries to add el2 after el1, which means that if el1 is the list's head, + el2 will effectively be inserted at the beginning of the list. el2 will only + be added if it's deleted (loops over itself). The operation will return zero + if this is not the case (el2 is not empty anymore) or non-zero on success. + + before: + #=========# + # +---+ # + #=>|el2|<=# + +---+ + V + +---+ +---+ +---+ +---+ +---+ +---+ + #=>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# + # +---+ +---+ +---+ +---+ +---+ +---+ # + #===========================================================# + + after: + +---+ +---+ +---+ +---+ +---+ +---+ +---+ + #=>|el1|<===>|el2|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# + # +---+ +---+ +---+ +---+ +---+ +---+ +---+ # + #=====================================================================# + + +mt_list_delete(el1) + Removes el1 from the list, and marks it as deleted, wherever it is. If + the element was already not part of a list anymore, 0 is returned, + otherwise non-zero is returned if the operation could be performed. 
+ + before: + +---+ +---+ +---+ +---+ +---+ +---+ +---+ + #=>| A |<===>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# + # +---+ +---+ +---+ +---+ +---+ +---+ +---+ # + #=====================================================================# + + after: + +---+ +---+ +---+ +---+ +---+ +---+ + #=>| A |<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# + # +---+ +---+ +---+ +---+ +---+ +---+ # + #===========================================================# + + +---+ + #=>|el1|<=# + # +---+ # + #=========# + + +mt_list_behead(l) + Detaches a list of elements from its head with the aim of reusing them to + do anything else. The head will be turned to an empty list, and the list + will be partially looped: the first element's prev will point to the last + one, and the last element's next will be NULL. The pointer to the first + element is returned, or NULL if the list was empty. This is essentially + used when recycling lists of unused elements, or to grab a lot of elements + at once for local processing. It is safe to be run concurrently with the + insert/append operations performed at the list's head, but not against + modifications performed at any other place, such as delete operation. + + before: + +---+ +---+ +---+ +---+ +---+ +---+ +---+ + #=>| L |<===>| A |<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# + # +---+ +---+ +---+ +---+ +---+ +---+ +---+ # + #=====================================================================# + + after: + +---+ +---+ +---+ +---+ +---+ +---+ +---+ + #=>| L |<=# ,--| A |<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<-. + # +---+ # | +---+ +---+ +---+ +---+ +---+ +---+ | + #=========# `-----------------------------------------------------------' + + +mt_list_pop(l) + Removes the list's first element, returns it deleted. If the list was empty, + NULL is returned. When combined with mt_list_append() this can be used to + implement MPMC queues for example. 
A macro MT_LIST_POP() is provided for a + more convenient use; instead of returning the list element, it will return + the structure holding the element, taking care of preserving the NULL. + + before: + +---+ +---+ +---+ +---+ +---+ +---+ +---+ + #=>| L |<===>| A |<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# + # +---+ +---+ +---+ +---+ +---+ +---+ +---+ # + #=====================================================================# + + after: + +---+ +---+ +---+ +---+ +---+ +---+ + #=>| L |<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# + # +---+ +---+ +---+ +---+ +---+ +---+ # + #===========================================================# + + +---+ + #=>| A |<=# + # +---+ # + #=========# + + +mt_list_cut_after(elt) + Cuts the list after the specified element. The link is replaced by two + locked pointers, and is returned as a list element. The list must then + be unlocked using mt_list_reconnect() or mt_list_connect_elem() applied + to the returned list element. + + before: + +---+ +---+ +---+ +---+ +---+ +---+ + #=>|elt|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# + # +---+ +---+ +---+ +---+ +---+ +---+ # + #===========================================================# + + after: + +---+ +---+ +---+ +---+ +---+ +---+ + #=>|elt|x x| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# + # +---+ +---+ +---+ +---+ +---+ +---+ # + #===========================================================# + + Return elt B + value: <===> + + +mt_list_cut_before(elt) + Cuts the list before the specified element. The link is replaced by two + locked pointers, and is returned as a list element. The list must then + be unlocked using mt_list_reconnect() or mt_list_connect_elem() applied + to the returned list element. 
+ + before: + +---+ +---+ +---+ +---+ +---+ +---+ + #=>| A |<===>|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=# + # +---+ +---+ +---+ +---+ +---+ +---+ # + #===========================================================# + + after: + +---+ +---+ +---+ +---+ +---+ +---+ + #=>| A |x x|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=# + # +---+ +---+ +---+ +---+ +---+ +---+ # + #===========================================================# + + Return A elt + value: <===> + + +mt_list_cut_around(elt) + Cuts the list both before and after the specified element. Both the list + and the element's pointers are locked. The extremities of the previous + links are returned as a single list element (which corresponds to the + element's contents before locking). The list must then be unlocked using + mt_list_connect_elem() to reconnect the element to the list and unlock + both, or mt_list_reconnect() to effectively remove the element. + + before: + +---+ +---+ +---+ +---+ +---+ +---+ + #=>| A |<===>|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=# + # +---+ +---+ +---+ +---+ +---+ +---+ # + #===========================================================# + + after: + +---+ +---+ +---+ +---+ +---+ +---+ + #=>| A |x x|elt|x x| C |<===>| D |<===>| E |<===>| F |<=# + # +---+ +---+ +---+ +---+ +---+ +---+ # + #===========================================================# + + Return A C + value: <=============> + + +mt_list_reconnect(ends) + Connects both ends of the specified locked list as if they were linked + together, and unlocks the list. This can complete an element removal + operation that was started using mt_list_cut_around(), or can restore a + list that was temporarily locked by mt_list_cut_{after,before}(). 
+ + before: + A C + Ends: <===> + + +---+ +---+ +---+ +---+ +---+ + #=>| A |x x| C |<===>| D |<===>| E |<===>| F |<=# + # +---+ +---+ +---+ +---+ +---+ # + #=================================================# + + after: + +---+ +---+ +---+ +---+ +---+ + #=>| A |<===>| C |<===>| D |<===>| E |<===>| F |<=# + # +---+ +---+ +---+ +---+ +---+ # + #=================================================# + + +mt_list_connect_elem(elt,ends) + Connects the specified element to the elements pointed to by the specified + ends. This can be used to insert an element into a previously locked and + cut list, or to restore a list as it was before mt_list_cut_around(elt). + The element's list part is effectively replaced by the contents of the + ends. + + before: + +---+ + elt: x|elt|x + +---+ + A C + ends: <=============> + + +---+ +---+ +---+ +---+ +---+ + #=>| A |x x| C |<===>| D |<===>| E |<===>| F |<=# + # +---+ +---+ +---+ +---+ +---+ # + #===========================================================# + + after: + +---+ +---+ +---+ +---+ +---+ +---+ + #=>| A |<===>|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=# + # +---+ +---+ +---+ +---+ +---+ +---+ # + #===========================================================# + + +MT_LIST_FOR_EACH_ENTRY_SAFE(item, list_head, member, back) + Iterates <item> through a list of items of type "typeof(*item)" which are + linked via a "struct mt_list" member named <member>. A pointer to the head + of the list is passed in <list_head>. <back> is a temporary struct mt_list, + used internally. It contains a copy of the contents of the current item's + list member before locking it. This macro is implemented using two nested + loops, each defined as a separate macro for easier inspection. The inner + loop will run for each element in the list, and the outer loop will run + only once to do some cleanup and unlocking when the end of the list is + reached or the user breaks from the inner loop. 
It is safe to break from this macro + as the cleanup will be performed anyway, but it is strictly forbidden to + branch (goto or return) from the loop because skipping the cleanup will + lead to undefined behavior. During the scan of the list, the item is locked + thus disconnected and the list locked around it, so concurrent operations + on the list are safe. However the thread holding the list locked must be + careful not to perform other locking operations. In order to remove the + current element, setting <item> to NULL is sufficient to make the inner + loop not try to re-attach it. It is recommended to reinitialize it though + if it is expected to be reused, so as not to leave its pointers locked. + + From within the loop, the list looks like this: + + MT_LIST_FOR_EACH_ENTRY_SAFE(item, lh, list, back) { + // A C + // back: <=============> + // item->list + // +---+ +---+ +-V-+ +---+ +---+ +---+ + // #=>|lh |<===>| A |x x| |x x| C |<===>| D |<===>| E |<=# + // # +---+ +---+ +---+ +---+ +---+ +---+ # + // #===========================================================# + } + + This means that only the current item as well as its two neighbors are + locked. It is thus possible to act on any other part of the list in + parallel (other threads might have begun slightly earlier). However if + a thread is too slow to proceed, other threads may quickly reach its + position, and all of them will then wait on the same element, slowing + down the progress. + +Examples +-------- + +The example below collects up to 50 jobs from a shared list that are compatible +with the current thread, and moves them to a local list for later processing. +The same pointers are used for both lists and placed in an anonymous union. + + struct job { + union { + struct list list; + struct mt_list mt_list; + }; + unsigned long thread_mask; /* 1 bit per eligible thread */ + /* struct-specific stuff below */ + ... 
+ }; + + extern struct mt_list global_job_queue; + extern struct list local_job_queue; + + struct mt_list back; + struct job *item; + int budget = 50; + + /* collect up to 50 shared items */ + MT_LIST_FOR_EACH_ENTRY_SAFE(item, &global_job_queue, mt_list, back) { + if (!(item->thread_mask & current_thread_bit)) + continue; /* job not eligible for this thread */ + LIST_APPEND(&local_job_queue, &item->list); + item = NULL; + if (!--budget) + break; + } + + /* process extracted items */ + LIST_FOR_EACH(item, &local_job_queue, list) { + ... + } diff --git a/include/haproxy/list.h b/include/haproxy/list.h index 368e6d76b8..9992195672 100644 --- a/include/haproxy/list.h +++ b/include/haproxy/list.h @@ -24,6 +24,7 @@ #include #include +#include /* First undefine some macros which happen to also be defined on OpenBSD, * in sys/queue.h, used by sys/event.h @@ -229,658 +230,6 @@ &item->member != (list_head); \ item = back, back = LIST_ELEM(back->member.p, typeof(back), member)) - -/* - * Locked version of list manipulation macros. - * It is OK to use those concurrently from multiple threads, as long as the - * list is only used with the locked variants. - */ -#define MT_LIST_BUSY ((struct mt_list *)1) - -/* - * Add an item at the beginning of a list. - * Returns 1 if we added the item, 0 otherwise (because it was already in a - * list). 
- */ -#define MT_LIST_TRY_INSERT(_lh, _el) \ - ({ \ - int _ret = 0; \ - struct mt_list *lh = (_lh), *el = (_el); \ - for (;;__ha_cpu_relax()) { \ - struct mt_list *n, *n2; \ - struct mt_list *p, *p2; \ - n = _HA_ATOMIC_XCHG(&(lh)->next, MT_LIST_BUSY); \ - if (n == MT_LIST_BUSY) \ - continue; \ - p = _HA_ATOMIC_XCHG(&n->prev, MT_LIST_BUSY); \ - if (p == MT_LIST_BUSY) { \ - (lh)->next = n; \ - __ha_barrier_store(); \ - continue; \ - } \ - n2 = _HA_ATOMIC_XCHG(&el->next, MT_LIST_BUSY); \ - if (n2 != el) { /* element already linked */ \ - if (n2 != MT_LIST_BUSY) \ - el->next = n2; \ - n->prev = p; \ - __ha_barrier_store(); \ - lh->next = n; \ - __ha_barrier_store(); \ - if (n2 == MT_LIST_BUSY) \ - continue; \ - break; \ - } \ - p2 = _HA_ATOMIC_XCHG(&el->prev, MT_LIST_BUSY); \ - if (p2 != el) { \ - if (p2 != MT_LIST_BUSY) \ - el->prev = p2; \ - n->prev = p; \ - el->next = el; \ - __ha_barrier_store(); \ - lh->next = n; \ - __ha_barrier_store(); \ - if (p2 == MT_LIST_BUSY) \ - continue; \ - break; \ - } \ - (el)->next = n; \ - (el)->prev = p; \ - __ha_barrier_store(); \ - n->prev = (el); \ - __ha_barrier_store(); \ - p->next = (el); \ - __ha_barrier_store(); \ - _ret = 1; \ - break; \ - } \ - (_ret); \ - }) - -/* - * Add an item at the end of a list. - * Returns 1 if we added the item, 0 otherwise (because it was already in a - * list). 
- */ -#define MT_LIST_TRY_APPEND(_lh, _el) \ - ({ \ - int _ret = 0; \ - struct mt_list *lh = (_lh), *el = (_el); \ - for (;;__ha_cpu_relax()) { \ - struct mt_list *n, *n2; \ - struct mt_list *p, *p2; \ - p = _HA_ATOMIC_XCHG(&(lh)->prev, MT_LIST_BUSY); \ - if (p == MT_LIST_BUSY) \ - continue; \ - n = _HA_ATOMIC_XCHG(&p->next, MT_LIST_BUSY); \ - if (n == MT_LIST_BUSY) { \ - (lh)->prev = p; \ - __ha_barrier_store(); \ - continue; \ - } \ - p2 = _HA_ATOMIC_XCHG(&el->prev, MT_LIST_BUSY); \ - if (p2 != el) { \ - if (p2 != MT_LIST_BUSY) \ - el->prev = p2; \ - p->next = n; \ - __ha_barrier_store(); \ - lh->prev = p; \ - __ha_barrier_store(); \ - if (p2 == MT_LIST_BUSY) \ - continue; \ - break; \ - } \ - n2 = _HA_ATOMIC_XCHG(&el->next, MT_LIST_BUSY); \ - if (n2 != el) { /* element already linked */ \ - if (n2 != MT_LIST_BUSY) \ - el->next = n2; \ - p->next = n; \ - el->prev = el; \ - __ha_barrier_store(); \ - lh->prev = p; \ - __ha_barrier_store(); \ - if (n2 == MT_LIST_BUSY) \ - continue; \ - break; \ - } \ - (el)->next = n; \ - (el)->prev = p; \ - __ha_barrier_store(); \ - p->next = (el); \ - __ha_barrier_store(); \ - n->prev = (el); \ - __ha_barrier_store(); \ - _ret = 1; \ - break; \ - } \ - (_ret); \ - }) - -/* - * Add an item at the beginning of a list. - * It is assumed the element can't already be in a list, so it isn't checked. 
- */ -#define MT_LIST_INSERT(_lh, _el) \ - ({ \ - int _ret = 0; \ - struct mt_list *lh = (_lh), *el = (_el); \ - for (;;__ha_cpu_relax()) { \ - struct mt_list *n; \ - struct mt_list *p; \ - n = _HA_ATOMIC_XCHG(&(lh)->next, MT_LIST_BUSY); \ - if (n == MT_LIST_BUSY) \ - continue; \ - p = _HA_ATOMIC_XCHG(&n->prev, MT_LIST_BUSY); \ - if (p == MT_LIST_BUSY) { \ - (lh)->next = n; \ - __ha_barrier_store(); \ - continue; \ - } \ - (el)->next = n; \ - (el)->prev = p; \ - __ha_barrier_store(); \ - n->prev = (el); \ - __ha_barrier_store(); \ - p->next = (el); \ - __ha_barrier_store(); \ - _ret = 1; \ - break; \ - } \ - (_ret); \ - }) - -/* - * Add an item at the end of a list. - * It is assumed the element can't already be in a list, so it isn't checked - */ -#define MT_LIST_APPEND(_lh, _el) \ - ({ \ - int _ret = 0; \ - struct mt_list *lh = (_lh), *el = (_el); \ - for (;;__ha_cpu_relax()) { \ - struct mt_list *n; \ - struct mt_list *p; \ - p = _HA_ATOMIC_XCHG(&(lh)->prev, MT_LIST_BUSY); \ - if (p == MT_LIST_BUSY) \ - continue; \ - n = _HA_ATOMIC_XCHG(&p->next, MT_LIST_BUSY); \ - if (n == MT_LIST_BUSY) { \ - (lh)->prev = p; \ - __ha_barrier_store(); \ - continue; \ - } \ - (el)->next = n; \ - (el)->prev = p; \ - __ha_barrier_store(); \ - p->next = (el); \ - __ha_barrier_store(); \ - n->prev = (el); \ - __ha_barrier_store(); \ - _ret = 1; \ - break; \ - } \ - (_ret); \ - }) - -/* - * Add an item at the end of a list. - * It is assumed the element can't already be in a list, so it isn't checked - * Item will be added in busy/locked state, so that it is already - * referenced in the list but no other thread can use it until we're ready. - * - * This returns a struct mt_list, that will be needed at unlock time. 
- * (using MT_LIST_UNLOCK_ELT) - */ -#define MT_LIST_APPEND_LOCKED(_lh, _el) \ - ({ \ - struct mt_list np; \ - struct mt_list *lh = (_lh), *el = (_el); \ - (el)->next = MT_LIST_BUSY; \ - (el)->prev = MT_LIST_BUSY; \ - for (;;__ha_cpu_relax()) { \ - struct mt_list *n; \ - struct mt_list *p; \ - p = _HA_ATOMIC_XCHG(&(lh)->prev, MT_LIST_BUSY); \ - if (p == MT_LIST_BUSY) \ - continue; \ - n = _HA_ATOMIC_XCHG(&p->next, MT_LIST_BUSY); \ - if (n == MT_LIST_BUSY) { \ - (lh)->prev = p; \ - __ha_barrier_store(); \ - continue; \ - } \ - np.prev = p; \ - np.next = n; \ - break; \ - } \ - (np); \ - }) - -/* - * Detach a list from its head. A pointer to the first element is returned - * and the list is closed. If the list was empty, NULL is returned. This may - * exclusively be used with lists modified by MT_LIST_TRY_INSERT/MT_LIST_TRY_APPEND. This - * is incompatible with MT_LIST_DELETE run concurrently. - * If there's at least one element, the next of the last element will always - * be NULL. - */ -#define MT_LIST_BEHEAD(_lh) ({ \ - struct mt_list *lh = (_lh); \ - struct mt_list *_n; \ - struct mt_list *_p; \ - for (;;__ha_cpu_relax()) { \ - _p = _HA_ATOMIC_XCHG(&(lh)->prev, MT_LIST_BUSY); \ - if (_p == MT_LIST_BUSY) \ - continue; \ - if (_p == (lh)) { \ - (lh)->prev = _p; \ - __ha_barrier_store(); \ - _n = NULL; \ - break; \ - } \ - _n = _HA_ATOMIC_XCHG(&(lh)->next, MT_LIST_BUSY); \ - if (_n == MT_LIST_BUSY) { \ - (lh)->prev = _p; \ - __ha_barrier_store(); \ - continue; \ - } \ - if (_n == (lh)) { \ - (lh)->next = _n; \ - (lh)->prev = _p; \ - __ha_barrier_store(); \ - _n = NULL; \ - break; \ - } \ - (lh)->next = (lh); \ - (lh)->prev = (lh); \ - __ha_barrier_store(); \ - _n->prev = _p; \ - __ha_barrier_store(); \ - _p->next = NULL; \ - __ha_barrier_store(); \ - break; \ - } \ - (_n); \ -}) - - -/* Remove an item from a list. - * Returns 1 if we removed the item, 0 otherwise (because it was in no list). 
- */ -#define MT_LIST_DELETE(_el) \ - ({ \ - int _ret = 0; \ - struct mt_list *el = (_el); \ - for (;;__ha_cpu_relax()) { \ - struct mt_list *n, *n2; \ - struct mt_list *p, *p2 = NULL; \ - n = _HA_ATOMIC_XCHG(&(el)->next, MT_LIST_BUSY); \ - if (n == MT_LIST_BUSY) \ - continue; \ - p = _HA_ATOMIC_XCHG(&(el)->prev, MT_LIST_BUSY); \ - if (p == MT_LIST_BUSY) { \ - (el)->next = n; \ - __ha_barrier_store(); \ - continue; \ - } \ - if (p != (el)) { \ - p2 = _HA_ATOMIC_XCHG(&p->next, MT_LIST_BUSY); \ - if (p2 == MT_LIST_BUSY) { \ - (el)->prev = p; \ - (el)->next = n; \ - __ha_barrier_store(); \ - continue; \ - } \ - } \ - if (n != (el)) { \ - n2 = _HA_ATOMIC_XCHG(&n->prev, MT_LIST_BUSY); \ - if (n2 == MT_LIST_BUSY) { \ - if (p2 != NULL) \ - p->next = p2; \ - (el)->prev = p; \ - (el)->next = n; \ - __ha_barrier_store(); \ - continue; \ - } \ - } \ - n->prev = p; \ - p->next = n; \ - if (p != (el) && n != (el)) \ - _ret = 1; \ - __ha_barrier_store(); \ - (el)->prev = (el); \ - (el)->next = (el); \ - __ha_barrier_store(); \ - break; \ - } \ - (_ret); \ - }) - - -/* Remove the first element from the list, and return it */ -#define MT_LIST_POP(_lh, pt, el) \ - ({ \ - void *_ret; \ - struct mt_list *lh = (_lh); \ - for (;;__ha_cpu_relax()) { \ - struct mt_list *n, *n2; \ - struct mt_list *p, *p2; \ - n = _HA_ATOMIC_XCHG(&(lh)->next, MT_LIST_BUSY); \ - if (n == MT_LIST_BUSY) \ - continue; \ - if (n == (lh)) { \ - (lh)->next = lh; \ - __ha_barrier_store(); \ - _ret = NULL; \ - break; \ - } \ - p = _HA_ATOMIC_XCHG(&n->prev, MT_LIST_BUSY); \ - if (p == MT_LIST_BUSY) { \ - (lh)->next = n; \ - __ha_barrier_store(); \ - continue; \ - } \ - n2 = _HA_ATOMIC_XCHG(&n->next, MT_LIST_BUSY); \ - if (n2 == MT_LIST_BUSY) { \ - n->prev = p; \ - __ha_barrier_store(); \ - (lh)->next = n; \ - __ha_barrier_store(); \ - continue; \ - } \ - p2 = _HA_ATOMIC_XCHG(&n2->prev, MT_LIST_BUSY); \ - if (p2 == MT_LIST_BUSY) { \ - n->next = n2; \ - n->prev = p; \ - __ha_barrier_store(); \ - (lh)->next = n; \ - 
__ha_barrier_store(); \ - continue; \ - } \ - (lh)->next = n2; \ - (n2)->prev = (lh); \ - __ha_barrier_store(); \ - (n)->prev = (n); \ - (n)->next = (n); \ - __ha_barrier_store(); \ - _ret = MT_LIST_ELEM(n, pt, el); \ - break; \ - } \ - (_ret); \ - }) - -#define MT_LIST_HEAD(a) ((void *)(&(a))) - -#define MT_LIST_INIT(l) ((l)->next = (l)->prev = (l)) - -#define MT_LIST_HEAD_INIT(l) { &l, &l } -/* returns a pointer of type <pt> to a structure containing a list head called - * <el> at address <lh>. Note that <lh> can be the result of a function or macro - * since it's used only once. - * Example: MT_LIST_ELEM(cur_node->args.next, struct node *, args) - */ -#define MT_LIST_ELEM(lh, pt, el) ((pt)(((const char *)(lh)) - ((size_t)&((pt)NULL)->el))) - -/* checks if the list head is empty or not */ -#define MT_LIST_ISEMPTY(lh) ((lh)->next == (lh)) - -/* returns a pointer of type <pt> to a structure following the element - * which contains list head <lh>, which is known as element <el> in - * struct pt. - * Example: MT_LIST_NEXT(args, struct node *, list) - */ -#define MT_LIST_NEXT(lh, pt, el) (MT_LIST_ELEM((lh)->next, pt, el)) - - -/* returns a pointer of type <pt> to a structure preceding the element - * which contains list head <lh>, which is known as element <el> in - * struct pt. - */ -#undef MT_LIST_PREV -#define MT_LIST_PREV(lh, pt, el) (MT_LIST_ELEM((lh)->prev, pt, el)) - -/* checks if the list element was added to a list or not. This only - * works when detached elements are reinitialized (using LIST_DEL_INIT) - */ -#define MT_LIST_INLIST(el) ((el)->next != (el)) - -/* Lock an element in the list, to be sure it won't be removed nor - * accessed by another thread while the lock is held. - * Locking behavior is inspired from MT_LIST_DELETE macro, - * thus this macro can safely be used concurrently with MT_LIST_DELETE. - * This returns a struct mt_list, that will be needed at unlock time. 
- * (using MT_LIST_UNLOCK_ELT) - */ -#define MT_LIST_LOCK_ELT(_el) \ - ({ \ - struct mt_list ret; \ - struct mt_list *el = (_el); \ - for (;;__ha_cpu_relax()) { \ - struct mt_list *n, *n2; \ - struct mt_list *p, *p2 = NULL; \ - n = _HA_ATOMIC_XCHG(&(el)->next, MT_LIST_BUSY); \ - if (n == MT_LIST_BUSY) \ - continue; \ - p = _HA_ATOMIC_XCHG(&(el)->prev, MT_LIST_BUSY); \ - if (p == MT_LIST_BUSY) { \ - (el)->next = n; \ - __ha_barrier_store(); \ - continue; \ - } \ - if (p != (el)) { \ - p2 = _HA_ATOMIC_XCHG(&p->next, MT_LIST_BUSY);\ - if (p2 == MT_LIST_BUSY) { \ - (el)->prev = p; \ - (el)->next = n; \ - __ha_barrier_store(); \ - continue; \ - } \ - } \ - if (n != (el)) { \ - n2 = _HA_ATOMIC_XCHG(&n->prev, MT_LIST_BUSY);\ - if (n2 == MT_LIST_BUSY) { \ - if (p2 != NULL) \ - p->next = p2; \ - (el)->prev = p; \ - (el)->next = n; \ - __ha_barrier_store(); \ - continue; \ - } \ - } \ - ret.next = n; \ - ret.prev = p; \ - break; \ - } \ - ret; \ - }) - -/* Unlock an element previously locked by MT_LIST_LOCK_ELT. "np" is the - * struct mt_list returned by MT_LIST_LOCK_ELT(). 
- */ -#define MT_LIST_UNLOCK_ELT(_el, np) \ - do { \ - struct mt_list *n = (np).next, *p = (np).prev; \ - struct mt_list *el = (_el); \ - (el)->next = n; \ - (el)->prev = p; \ - if (n != (el)) \ - n->prev = (el); \ - if (p != (el)) \ - p->next = (el); \ - } while (0) - -/* Internal macroes for the foreach macroes */ -#define _MT_LIST_UNLOCK_NEXT(el, np) \ - do { \ - struct mt_list *n = (np); \ - (el)->next = n; \ - if (n != (el)) \ - n->prev = (el); \ - } while (0) - -/* Internal macroes for the foreach macroes */ -#define _MT_LIST_UNLOCK_PREV(el, np) \ - do { \ - struct mt_list *p = (np); \ - (el)->prev = p; \ - if (p != (el)) \ - p->next = (el); \ - } while (0) - -#define _MT_LIST_LOCK_NEXT(el) \ - ({ \ - struct mt_list *n = NULL; \ - for (;;__ha_cpu_relax()) { \ - struct mt_list *n2; \ - n = _HA_ATOMIC_XCHG(&((el)->next), MT_LIST_BUSY); \ - if (n == MT_LIST_BUSY) \ - continue; \ - if (n != (el)) { \ - n2 = _HA_ATOMIC_XCHG(&n->prev, MT_LIST_BUSY);\ - if (n2 == MT_LIST_BUSY) { \ - (el)->next = n; \ - __ha_barrier_store(); \ - continue; \ - } \ - } \ - break; \ - } \ - n; \ - }) - -#define _MT_LIST_LOCK_PREV(el) \ - ({ \ - struct mt_list *p = NULL; \ - for (;;__ha_cpu_relax()) { \ - struct mt_list *p2; \ - p = _HA_ATOMIC_XCHG(&((el)->prev), MT_LIST_BUSY); \ - if (p == MT_LIST_BUSY) \ - continue; \ - if (p != (el)) { \ - p2 = _HA_ATOMIC_XCHG(&p->next, MT_LIST_BUSY);\ - if (p2 == MT_LIST_BUSY) { \ - (el)->prev = p; \ - __ha_barrier_store(); \ - continue; \ - } \ - } \ - break; \ - } \ - p; \ - }) - -#define _MT_LIST_RELINK_DELETED(elt2) \ - do { \ - struct mt_list *n = elt2.next, *p = elt2.prev; \ - ALREADY_CHECKED(p); \ - n->prev = p; \ - p->next = n; \ - } while (0); - -/* Equivalent of MT_LIST_DELETE(), to be used when parsing the list with mt_list_entry_for_each_safe(). 
- * It should be the element currently parsed (tmpelt1) - */ -#define MT_LIST_DELETE_SAFE(_el) \ - do { \ - struct mt_list *el = (_el); \ - (el)->prev = (el); \ - (el)->next = (el); \ - (_el) = NULL; \ - } while (0) - -/* Safe as MT_LIST_DELETE_SAFE, but it won't reinit the element */ -#define MT_LIST_DELETE_SAFE_NOINIT(_el) \ - do { \ - (_el) = NULL; \ - } while (0) - -/* Iterates through a list of items of type "typeof(*item)" which are - * linked via a "struct mt_list" member named . A pointer to the head - * of the list is passed in . - * - * is a temporary struct mt_list *, and is a temporary - * struct mt_list, used internally, both are needed for MT_LIST_DELETE_SAFE. - * - * This macro is implemented using a nested loop. The inner loop will run for - * each element in the list, and the upper loop will run only once to do some - * cleanup when the end of the list is reached or user breaks from inner loop. - * It's safe to break from this macro as the cleanup will be performed anyway, - * but it is strictly forbidden to goto from the loop because skipping the - * cleanup will lead to undefined behavior. - * - * In order to remove the current element, please use MT_LIST_DELETE_SAFE. - * - * Example: - * mt_list_for_each_entry_safe(item, list_head, list_member, elt1, elt2) { - * ... 
- * } - */ -#define mt_list_for_each_entry_safe(item, list_head, member, tmpelt, tmpelt2) \ - for ((tmpelt) = NULL; (tmpelt) != MT_LIST_BUSY; ({ \ - /* post loop cleanup: \ - * gets executed only once to perform cleanup \ - * after child loop has finished \ - */ \ - if (tmpelt) { \ - /* last elem still exists, unlocking it */ \ - if (tmpelt2.prev) \ - MT_LIST_UNLOCK_ELT(tmpelt, tmpelt2); \ - else { \ - /* special case: child loop did not run \ - * so tmpelt2.prev == NULL \ - * (empty list) \ - */ \ - _MT_LIST_UNLOCK_NEXT(tmpelt, tmpelt2.next); \ - } \ - } else { \ - /* last elem was deleted by user, relink required: \ - * prev->next = next \ - * next->prev = prev \ - */ \ - _MT_LIST_RELINK_DELETED(tmpelt2); \ - } \ - /* break parent loop \ - * (this loop runs exactly one time) \ - */ \ - (tmpelt) = MT_LIST_BUSY; \ - })) \ - for ((tmpelt) = (list_head), (tmpelt2).prev = NULL, (tmpelt2).next = _MT_LIST_LOCK_NEXT(tmpelt); ({ \ - /* this gets executed before each user body loop */ \ - (item) = MT_LIST_ELEM((tmpelt2.next), typeof(item), member); \ - if (&item->member != (list_head)) { \ - /* did not reach end of list \ - * (back to list_head == end of list reached) \ - */ \ - if (tmpelt2.prev != &item->member) \ - tmpelt2.next = _MT_LIST_LOCK_NEXT(&item->member); \ - else { \ - /* FIXME: is this even supposed to happen?? \ - * I'm not understanding how \ - * tmpelt2.prev could be equal to &item->member. 
\ - * running 'test_list' multiple times with 8 \ - * concurrent threads: this never gets reached \ - */ \ - tmpelt2.next = tmpelt; \ - } \ - if (tmpelt != NULL) { \ - /* if tmpelt was not deleted by user */ \ - if (tmpelt2.prev) { \ - /* not executed on first run \ - * (tmpelt2.prev == NULL on first run) \ - */ \ - _MT_LIST_UNLOCK_PREV(tmpelt, tmpelt2.prev); \ - /* unlock_prev will implicitly relink: \ - * elt->prev = prev \ - * prev->next = elt \ - */ \ - } \ - tmpelt2.prev = tmpelt; \ - } \ - (tmpelt) = &item->member; \ - } \ - /* else: end of list reached (loop stop cond) */ \ - }), \ - &item->member != (list_head);) - static __inline struct list *mt_list_to_list(struct mt_list *list) { union { @@ -904,4 +253,38 @@ static __inline struct mt_list *list_to_mt_list(struct list *list) } +/* + * Add an item at the end of a list. + * It is assumed the element can't already be in a list, so it isn't checked + * Item will be added in busy/locked state, so that it is already + * referenced in the list but no other thread can use it until we're ready. + * + * This returns a struct mt_list, that will be needed at unlock time. 
+ * (using mt_list_connect_elem()) + */ +#define MT_LIST_APPEND_LOCKED(_lh, _el) \ + ({ \ + struct mt_list np; \ + struct mt_list *lh = (_lh), *el = (_el); \ + (el)->next = MT_LIST_BUSY; \ + (el)->prev = MT_LIST_BUSY; \ + for (;;__ha_cpu_relax()) { \ + struct mt_list *n; \ + struct mt_list *p; \ + p = _HA_ATOMIC_XCHG(&(lh)->prev, MT_LIST_BUSY); \ + if (p == MT_LIST_BUSY) \ + continue; \ + n = _HA_ATOMIC_XCHG(&p->next, MT_LIST_BUSY); \ + if (n == MT_LIST_BUSY) { \ + (lh)->prev = p; \ + __ha_barrier_store(); \ + continue; \ + } \ + np.prev = p; \ + np.next = n; \ + break; \ + } \ + (np); \ + }) + #endif /* _HAPROXY_LIST_H */ diff --git a/include/import/mt_list.h b/include/import/mt_list.h new file mode 100644 index 0000000000..5111410457 --- /dev/null +++ b/include/import/mt_list.h @@ -0,0 +1,1040 @@ +/* + * include/mt_list.h + * + * Multi-thread aware circular lists. + * + * Copyright (C) 2018-2023 Willy Tarreau + * Copyright (C) 2018-2023 Olivier Houchard + * + * Permission is hereby granted, free of charge, to any person obtaining + * a copy of this software and associated documentation files (the + * "Software"), to deal in the Software without restriction, including + * without limitation the rights to use, copy, modify, merge, publish, + * distribute, sublicense, and/or sell copies of the Software, and to + * permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice shall be + * included in all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES + * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT + * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, + * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + */ + +#ifndef _MT_LIST_H +#define _MT_LIST_H + +#include +#include + +/* set NOINLINE to forcefully disable user functions inlining */ +#if defined(NOINLINE) +#define MT_INLINE __attribute__((noinline)) +#else +#define MT_INLINE inline +#endif + +// Note: already defined in list-t.h +#ifndef _HAPROXY_LIST_T_H +/* A list element, it's both a head or any element. Both pointers always point + * to a valid list element (possibly itself for a detached element or an empty + * list head), or are equal to MT_LIST_BUSY for a locked pointer indicating + * that the target element is about to be modified. + */ +struct mt_list { + struct mt_list *next; + struct mt_list *prev; +}; +#endif + +/* This is the value of the locked list pointer. It is assigned to an mt_list's + * ->next or ->prev pointer to lock the link to the other element while this + * element is being inspected or modified. + */ +#define MT_LIST_BUSY ((struct mt_list *)1) + +/* This is used to pre-initialize an mt_list element during its declaration. + * The argument is the name of the variable being declared and being assigned + * this value. Example: + * + * struct mt_list pool_head = MT_LIST_HEAD_INIT(pool_head); + */ +#define MT_LIST_HEAD_INIT(l) { .next = &l, .prev = &l } + + +/* Returns a pointer of type <t> to the structure containing a member of type + * mt_list called <m> that is accessible at address <a>. Note that <a> may be + * the result of a function or macro since it's used only once. 
Example: + * + * return MT_LIST_ELEM(cur_node->args.next, struct node *, args) + */ +#define MT_LIST_ELEM(a, t, m) ((t)(((const char *)(a)) - ((size_t)&((t)NULL)->m))) + + +/* Returns a pointer of type <t> to a structure following the element which + * contains the list element at address <a>, which is known as member <m> in + * struct t*. Example: + * + * return MT_LIST_NEXT(args, struct node *, list); + */ +#define MT_LIST_NEXT(a, t, m) (MT_LIST_ELEM((a)->next, t, m)) + + +/* Returns a pointer of type <t> to a structure preceding the element which + * contains the list element at address <a>, which is known as member <m> in + * struct t*. Example: + * + * return MT_LIST_PREV(args, struct node *, list); + */ +#define MT_LIST_PREV(a, t, m) (MT_LIST_ELEM((a)->prev, t, m)) + + +/* This is used to prevent the compiler from knowing the origin of the + * variable, and sometimes avoid being confused about possible null-derefs + * that it sometimes believes are possible after pointer casts. + */ +#define MT_ALREADY_CHECKED(p) do { asm("" : "=rm"(p) : "0"(p)); } while (0) + + +/* Returns a pointer of type <t> to the structure containing a member of type + * mt_list called <m> that comes from the first element in list <lh>, that is + * atomically detached. If the list is empty, NULL is returned instead. + * Example: + * + * while ((conn = MT_LIST_POP(queue, struct conn *, list))) ... + */ +#define MT_LIST_POP(lh, t, m) \ + ({ \ + struct mt_list *_n = mt_list_pop(lh); \ + (_n ? MT_LIST_ELEM(_n, t, m) : NULL); \ + }) + +/* Iterates <item> through a list of items of type "typeof(*item)" which are + * linked via a "struct mt_list" member named <member>. A pointer to the head + * of the list is passed in <list_head>. + * + * <back> is a temporary struct mt_list, used internally to store the current + * element's ends while it is locked. + * + * This macro is implemented using two nested loops, each defined as a separate + * macro for easier inspection. 
The inner loop will run for each element in the + * list, and the outer loop will run only once to do some cleanup when the end + * of the list is reached or user breaks from inner loop. It's safe to break + * from this macro as the cleanup will be performed anyway, but it is strictly + * forbidden to branch (goto or return) from the loop because skipping the + * cleanup will lead to undefined behavior. + * + * The current element is detached from the list while being visited, with its + * extremities locked, and re-attached when switching to the next item. As such + * in order to delete the current item, it's sufficient to set it to NULL to + * prevent the inner loop from attaching it back. In this case it's recommended + * to re-init the item before reusing it in order to clear the locks. + * + * Example: + * MT_LIST_FOR_EACH_ENTRY_SAFE(item, list_head, list_member, back) { + * ... + * } + */ +#define MT_LIST_FOR_EACH_ENTRY_SAFE(item, list_head, member, back) \ + _MT_LIST_FOR_EACH_ENTRY_OUTER(item, list_head, member, back) \ + _MT_LIST_FOR_EACH_ENTRY_INNER(item, list_head, member, back) + + +/* The macros below directly map to their function equivalent. They are + * provided for ease of use. Please refer to the equivalent functions + * for their description. 
+ */ +#define MT_LIST_INIT(e) (mt_list_init(e)) +#define MT_LIST_ISEMPTY(e) (mt_list_isempty(e)) +#define MT_LIST_INLIST(e) (mt_list_inlist(e)) +#define MT_LIST_TRY_INSERT(l, e) (mt_list_try_insert(l, e)) +#define MT_LIST_TRY_APPEND(l, e) (mt_list_try_append(l, e)) +#define MT_LIST_BEHEAD(l) (mt_list_behead(l)) +#define MT_LIST_INSERT(l, e) (mt_list_insert(l, e)) +#define MT_LIST_APPEND(l, e) (mt_list_append(l, e)) +#define MT_LIST_DELETE(e) (mt_list_delete(e)) +#define MT_LIST_CUT_AFTER(el) (mt_list_cut_after(el)) +#define MT_LIST_CUT_BEFORE(el) (mt_list_cut_before(el)) +#define MT_LIST_CUT_AROUND(el) (mt_list_cut_around(el)) +#define MT_LIST_RECONNECT(ends) (mt_list_reconnect(ends)) +#define MT_LIST_CONNECT_ELEM(el, ends) (mt_list_connect_elem(el, ends)) + + +/* This is a Xorshift-based thread-local PRNG aimed at reducing the risk of + * resonance between competing threads during exponential back-off. Threads + * quickly become out of sync and use completely different values. + */ +static __thread unsigned int _prng_state = 0xEDCBA987; +static inline unsigned int mt_list_prng() +{ + unsigned int x = _prng_state; + + x ^= x << 13; + x ^= x >> 17; + x ^= x << 5; + return _prng_state = x; +} + +static inline unsigned int mt_list_wait(unsigned factor) +{ + //return ((uint64_t)factor * mt_list_prng() + factor) >> 32; + return mt_list_prng() & factor; +} + +/* This function relaxes the CPU during contention. It is meant to be + * architecture-specific and may even be OS-specific, and always exists in a + * generic version. It should return a non-null integer value that can be used + * as a boolean in while() loops. The argument indicates the maximum number of + * loops to be performed before returning. 
+ */ +static inline __attribute__((always_inline)) unsigned long mt_list_cpu_relax(unsigned long loop) +{ + /* limit maximum wait time for unlucky threads */ + loop = mt_list_wait(loop); + + for (loop &= 0x7fffff; loop >= 32; loop--) { +#if defined(__x86_64__) + /* This is a PAUSE instruction on x86_64 */ + asm volatile("rep;nop\n"); +#elif defined(__aarch64__) + /* This was shown to improve fairness on modern ARMv8 + * such as Cortex A72 or Neoverse N1. + */ + asm volatile("isb"); +#else + /* Generic implementation */ + asm volatile(""); +#endif + } + /* faster ending */ + while (loop--) + asm volatile(""); + return 1; +} + + +/* Initialize list element <el>. It will point to itself, matching a list head + * or a detached list element. The list element is returned. + */ +static inline struct mt_list *mt_list_init(struct mt_list *el) +{ + el->next = el->prev = el; + return el; +} + + +/* Returns true if the list element <el> corresponds to an empty list head or a + * detached element, false otherwise. Only the <next> member is checked. + */ +static inline long mt_list_isempty(const struct mt_list *el) +{ + return el->next == el; +} + + +/* Returns true if the list element <el> corresponds to a non-empty list head or + * to an element that is part of a list, false otherwise. Only the <next> member + * is checked. + */ +static inline long mt_list_inlist(const struct mt_list *el) +{ + return el->next != el; +} + + +/* Adds element <el> at the beginning of list <lh>, which means that element <el> + * is added immediately after element <lh> (nothing strictly requires that <lh> + * is effectively the list's head, any valid element will work). Returns + * non-zero if the element was added, otherwise zero (because the element was + * already part of a list). 
+ */ +static MT_INLINE long mt_list_try_insert(struct mt_list *lh, struct mt_list *el) +{ + struct mt_list *n, *n2; + struct mt_list *p, *p2; + unsigned long loops = 0; + long ret = 0; + + /* Note that the first element checked is the most likely to face + * contention, particularly on the list's head/tail. That's why we + * perform a prior load there: if the element is being modified by + * another thread, requesting a read-only access only leaves the + * other thread's cache line in shared mode, which will impact it + * less than if we attempted a change that would invalidate it. + */ + for (;; mt_list_cpu_relax(loops = loops * 8 + 7)) { + if (__atomic_load_n(&lh->next, __ATOMIC_RELAXED) == MT_LIST_BUSY) + continue; + + n = __atomic_exchange_n(&lh->next, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (n == MT_LIST_BUSY) + continue; + + p = __atomic_exchange_n(&n->prev, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (p == MT_LIST_BUSY) { + lh->next = n; + __atomic_thread_fence(__ATOMIC_RELEASE); + continue; + } + + n2 = __atomic_exchange_n(&el->next, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (n2 != el) { + /* This element was already attached elsewhere */ + if (n2 != MT_LIST_BUSY) + el->next = n2; + n->prev = p; + __atomic_thread_fence(__ATOMIC_RELEASE); + + lh->next = n; + __atomic_thread_fence(__ATOMIC_RELEASE); + + if (n2 == MT_LIST_BUSY) + continue; + break; + } + + p2 = __atomic_exchange_n(&el->prev, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (p2 != el) { + /* This element was already attached elsewhere */ + if (p2 != MT_LIST_BUSY) + el->prev = p2; + n->prev = p; + el->next = el; + __atomic_thread_fence(__ATOMIC_RELEASE); + + lh->next = n; + __atomic_thread_fence(__ATOMIC_RELEASE); + + if (p2 == MT_LIST_BUSY) + continue; + break; + } + + el->next = n; + el->prev = p; + __atomic_thread_fence(__ATOMIC_RELEASE); + + n->prev = el; + __atomic_thread_fence(__ATOMIC_RELEASE); + + p->next = el; + __atomic_thread_fence(__ATOMIC_RELEASE); + + ret = 1; + break; + } + return ret; +} + + +/* Adds 
element <el> at the end of list <lh>, which means that element <el> is + * added immediately before element <lh> (nothing strictly requires that <lh> + * is effectively the list's head, any valid element will work). Returns non- + * zero if the element was added, otherwise zero (because the element was + * already part of a list). + */ +static MT_INLINE long mt_list_try_append(struct mt_list *lh, struct mt_list *el) +{ + struct mt_list *n, *n2; + struct mt_list *p, *p2; + unsigned long loops = 0; + long ret = 0; + + /* Note that the first element checked is the most likely to face + * contention, particularly on the list's head/tail. That's why we + * perform a prior load there: if the element is being modified by + * another thread, requesting a read-only access only leaves the + * other thread's cache line in shared mode, which will impact it + * less than if we attempted a change that would invalidate it. + */ + for (;; mt_list_cpu_relax(loops = loops * 8 + 7)) { + if (__atomic_load_n(&lh->prev, __ATOMIC_RELAXED) == MT_LIST_BUSY) + continue; + + p = __atomic_exchange_n(&lh->prev, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (p == MT_LIST_BUSY) + continue; + + n = __atomic_exchange_n(&p->next, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (n == MT_LIST_BUSY) { + lh->prev = p; + __atomic_thread_fence(__ATOMIC_RELEASE); + continue; + } + + p2 = __atomic_exchange_n(&el->prev, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (p2 != el) { + /* This element was already attached elsewhere */ + if (p2 != MT_LIST_BUSY) + el->prev = p2; + p->next = n; + __atomic_thread_fence(__ATOMIC_RELEASE); + + lh->prev = p; + __atomic_thread_fence(__ATOMIC_RELEASE); + + if (p2 == MT_LIST_BUSY) + continue; + break; + } + + n2 = __atomic_exchange_n(&el->next, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (n2 != el) { + /* This element was already attached elsewhere */ + if (n2 != MT_LIST_BUSY) + el->next = n2; + p->next = n; + el->prev = el; + __atomic_thread_fence(__ATOMIC_RELEASE); + + lh->prev = p; + __atomic_thread_fence(__ATOMIC_RELEASE); 
+ + if (n2 == MT_LIST_BUSY) + continue; + break; + } + + el->next = n; + el->prev = p; + __atomic_thread_fence(__ATOMIC_RELEASE); + + p->next = el; + __atomic_thread_fence(__ATOMIC_RELEASE); + + n->prev = el; + __atomic_thread_fence(__ATOMIC_RELEASE); + + ret = 1; + break; + } + return ret; +} + + +/* Detaches a list from its head. A pointer to the first element is returned + * and the list is closed. If the list was empty, NULL is returned. This may + * exclusively be used with lists manipulated using mt_list_try_insert() and + * mt_list_try_append(). This is incompatible with mt_list_delete() run + * concurrently. If there's at least one element, the next of the last element + * will always be NULL. + */ +static MT_INLINE struct mt_list *mt_list_behead(struct mt_list *lh) +{ + struct mt_list *n; + struct mt_list *p; + unsigned long loops = 0; + + for (;; mt_list_cpu_relax(loops = loops * 8 + 7)) { + p = __atomic_exchange_n(&lh->prev, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (p == MT_LIST_BUSY) + continue; + if (p == lh) { + lh->prev = p; + __atomic_thread_fence(__ATOMIC_RELEASE); + n = NULL; + break; + } + + n = __atomic_exchange_n(&lh->next, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (n == MT_LIST_BUSY) { + lh->prev = p; + __atomic_thread_fence(__ATOMIC_RELEASE); + continue; + } + if (n == lh) { + lh->next = n; + lh->prev = p; + __atomic_thread_fence(__ATOMIC_RELEASE); + n = NULL; + break; + } + + lh->next = lh->prev = lh; + __atomic_thread_fence(__ATOMIC_RELEASE); + + n->prev = p; + __atomic_thread_fence(__ATOMIC_RELEASE); + + p->next = NULL; + __atomic_thread_fence(__ATOMIC_RELEASE); + break; + } + return n; +} + + +/* Adds element <el> at the beginning of list <lh>, which means that element <el> + * is added immediately after element <lh> (nothing strictly requires that <lh> + * is effectively the list's head, any valid element will work). It is + * assumed that the element cannot already be part of a list so it isn't + * checked for this. 
+ */ +static MT_INLINE void mt_list_insert(struct mt_list *lh, struct mt_list *el) +{ + struct mt_list *n; + struct mt_list *p; + unsigned long loops = 0; + + for (;; mt_list_cpu_relax(loops = loops * 8 + 7)) { + n = __atomic_exchange_n(&lh->next, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (n == MT_LIST_BUSY) + continue; + + p = __atomic_exchange_n(&n->prev, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (p == MT_LIST_BUSY) { + lh->next = n; + __atomic_thread_fence(__ATOMIC_RELEASE); + continue; + } + + el->next = n; + el->prev = p; + __atomic_thread_fence(__ATOMIC_RELEASE); + + n->prev = el; + __atomic_thread_fence(__ATOMIC_RELEASE); + + p->next = el; + __atomic_thread_fence(__ATOMIC_RELEASE); + break; + } +} + + +/* Adds element <el> at the end of list <lh>, which means that element <el> is + * added immediately before element <lh> (nothing strictly requires that <lh> is + * effectively the list's head, any valid element will work). It is assumed + * that the element cannot already be part of a list so it isn't checked for + * this. + */ +static MT_INLINE void mt_list_append(struct mt_list *lh, struct mt_list *el) +{ + struct mt_list *n; + struct mt_list *p; + unsigned long loops = 0; + + for (;; mt_list_cpu_relax(loops = loops * 8 + 7)) { + p = __atomic_exchange_n(&lh->prev, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (p == MT_LIST_BUSY) + continue; + + n = __atomic_exchange_n(&p->next, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (n == MT_LIST_BUSY) { + lh->prev = p; + __atomic_thread_fence(__ATOMIC_RELEASE); + continue; + } + + el->next = n; + el->prev = p; + __atomic_thread_fence(__ATOMIC_RELEASE); + + p->next = el; + __atomic_thread_fence(__ATOMIC_RELEASE); + + n->prev = el; + __atomic_thread_fence(__ATOMIC_RELEASE); + break; + } +} + + +/* Removes element <el> from the list it belongs to. The function returns + * non-zero if the element could be removed, otherwise zero if the element + * could not be removed, because it was already not in a list anymore. 
+ */ +static MT_INLINE long mt_list_delete(struct mt_list *el) +{ + struct mt_list *n, *n2; + struct mt_list *p, *p2; + unsigned long loops = 0; + long ret = 0; + + for (;; mt_list_cpu_relax(loops = loops * 8 + 7)) { + p2 = NULL; + n = __atomic_exchange_n(&el->next, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (n == MT_LIST_BUSY) + continue; + + p = __atomic_exchange_n(&el->prev, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (p == MT_LIST_BUSY) { + el->next = n; + __atomic_thread_fence(__ATOMIC_RELEASE); + continue; + } + + if (p != el) { + p2 = __atomic_exchange_n(&p->next, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (p2 == MT_LIST_BUSY) { + el->prev = p; + el->next = n; + __atomic_thread_fence(__ATOMIC_RELEASE); + continue; + } + } + + if (n != el) { + n2 = __atomic_exchange_n(&n->prev, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (n2 == MT_LIST_BUSY) { + if (p2 != NULL) + p->next = p2; + el->prev = p; + el->next = n; + __atomic_thread_fence(__ATOMIC_RELEASE); + continue; + } + } + + n->prev = p; + p->next = n; + __atomic_thread_fence(__ATOMIC_RELEASE); + + el->prev = el->next = el; + __atomic_thread_fence(__ATOMIC_RELEASE); + + if (p != el && n != el) + ret = 1; + break; + } + return ret; +} + + +/* Removes the first element from the list <lh>, and returns it in detached + * form. If the list is already empty, NULL is returned instead. 
+ */ +static MT_INLINE struct mt_list *mt_list_pop(struct mt_list *lh) +{ + struct mt_list *n, *n2; + struct mt_list *p, *p2; + unsigned long loops = 0; + + for (;; mt_list_cpu_relax(loops = loops * 8 + 7)) { + if (__atomic_load_n(&lh->next, __ATOMIC_RELAXED) == MT_LIST_BUSY) + continue; + + n = __atomic_exchange_n(&lh->next, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (n == MT_LIST_BUSY) + continue; + + if (n == lh) { + /* list is empty */ + lh->next = lh; + __atomic_thread_fence(__ATOMIC_RELEASE); + n = NULL; + break; + } + + p = __atomic_exchange_n(&n->prev, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (p == MT_LIST_BUSY) { + lh->next = n; + __atomic_thread_fence(__ATOMIC_RELEASE); + continue; + } + + n2 = __atomic_exchange_n(&n->next, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (n2 == MT_LIST_BUSY) { + n->prev = p; + __atomic_thread_fence(__ATOMIC_RELEASE); + + lh->next = n; + __atomic_thread_fence(__ATOMIC_RELEASE); + continue; + } + + p2 = __atomic_exchange_n(&n2->prev, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (p2 == MT_LIST_BUSY) { + n->next = n2; + n->prev = p; + __atomic_thread_fence(__ATOMIC_RELEASE); + + lh->next = n; + __atomic_thread_fence(__ATOMIC_RELEASE); + continue; + } + + lh->next = n2; + n2->prev = lh; + __atomic_thread_fence(__ATOMIC_RELEASE); + + n->prev = n->next = n; + __atomic_thread_fence(__ATOMIC_RELEASE); + + /* return n */ + break; + } + return n; +} + + +/* Opens the list just after <lh> which usually is the list's head, but not + * necessarily. The link between <lh> and its next element is cut and replaced + * with an MT_LIST_BUSY lock. The ends of the removed link are returned as an + * mt_list entry. The operation can be cancelled using mt_list_reconnect() on + * the returned value, which will restore the link and unlock the list, or + * using mt_list_connect_elem() which will replace the link with another + * element and also unlock the list, effectively resulting in inserting that + * element after <lh>. 
Example: + * + * struct mt_list *list_insert(struct mt_list *list) + * { + * struct mt_list tmp = mt_list_cut_after(list); + * struct mt_list *el = alloc_element_to_insert(); + * if (el) + * mt_list_connect_elem(el, tmp); + * else + * mt_list_reconnect(tmp); + * return el; + * } + */ +static MT_INLINE struct mt_list mt_list_cut_after(struct mt_list *lh) +{ + struct mt_list el; + unsigned long loops = 0; + + for (;; mt_list_cpu_relax(loops = loops * 8 + 7)) { + el.next = __atomic_exchange_n(&lh->next, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (el.next == MT_LIST_BUSY) + continue; + + el.prev = __atomic_exchange_n(&el.next->prev, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (el.prev == MT_LIST_BUSY) { + lh->next = el.next; + __atomic_thread_fence(__ATOMIC_RELEASE); + continue; + } + break; + } + return el; +} + + +/* Opens the list just before <lh> which usually is the list's head, but not + * necessarily. The link between <lh> and its prev element is cut and replaced + * with an MT_LIST_BUSY lock. The ends of the removed link are returned as an + * mt_list entry. The operation can be cancelled using mt_list_reconnect() on + * the returned value, which will restore the link and unlock the list, or + * using mt_list_connect_elem() which will replace the link with another + * element and also unlock the list, effectively resulting in inserting that + * element before <lh>. 
Example: + * + * struct mt_list *list_append(struct mt_list *list) + * { + * struct mt_list tmp = mt_list_cut_before(list); + * struct mt_list *el = alloc_element_to_insert(); + * if (el) + * mt_list_connect_elem(el, tmp); + * else + * mt_list_reconnect(tmp); + * return el; + * } + */ +static MT_INLINE struct mt_list mt_list_cut_before(struct mt_list *lh) +{ + struct mt_list el; + unsigned long loops = 0; + + for (;; mt_list_cpu_relax(loops = loops * 8 + 7)) { + el.prev = __atomic_exchange_n(&lh->prev, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (el.prev == MT_LIST_BUSY) + continue; + + el.next = __atomic_exchange_n(&el.prev->next, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (el.next == MT_LIST_BUSY) { + lh->prev = el.prev; + __atomic_thread_fence(__ATOMIC_RELEASE); + continue; + } + break; + } + return el; +} + + +/* Opens the list around element <el>. Both the links between <el> and its prev + * element and between <el> and its next element are cut and replaced with an + * MT_LIST_BUSY lock. The element itself also has its ends replaced with a + * lock, and the ends of the element are returned as an mt_list entry. This + * results in the element being detached from the list and both the element and + * the list being locked. The operation can be terminated by calling + * mt_list_reconnect() on the returned value, which will unlock the list and + * effectively result in the removal of the element from the list, or by + * calling mt_list_connect_elem() to reinstall the element at its place in the + * list, effectively consisting in a temporary lock of this element. Example: + * + * struct mt_list *grow_shrink_remove(struct mt_list *el, size_t new_size) + * { + * struct mt_list tmp = mt_list_cut_around(el); + * struct mt_list *new = new_size ? realloc(el, new_size) : NULL; + * if (new_size) { + * mt_list_connect_elem(new ? 
new : el, tmp); + * } else { + * free(el); + * mt_list_reconnect(tmp); + * } + * return new; + * } + */ +static MT_INLINE struct mt_list mt_list_cut_around(struct mt_list *el) +{ + struct mt_list *n2; + struct mt_list *p2; + struct mt_list ret; + unsigned long loops = 0; + + for (;; mt_list_cpu_relax(loops = loops * 8 + 7)) { + p2 = NULL; + if (__atomic_load_n(&el->next, __ATOMIC_RELAXED) == MT_LIST_BUSY) + continue; + + ret.next = __atomic_exchange_n(&el->next, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (ret.next == MT_LIST_BUSY) + continue; + + ret.prev = __atomic_exchange_n(&el->prev, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (ret.prev == MT_LIST_BUSY) { + el->next = ret.next; + __atomic_thread_fence(__ATOMIC_RELEASE); + continue; + } + + if (ret.prev != el) { + p2 = __atomic_exchange_n(&ret.prev->next, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (p2 == MT_LIST_BUSY) { + *el = ret; + __atomic_thread_fence(__ATOMIC_RELEASE); + continue; + } + } + + if (ret.next != el) { + n2 = __atomic_exchange_n(&ret.next->prev, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (n2 == MT_LIST_BUSY) { + if (p2 != NULL) + ret.prev->next = p2; + *el = ret; + __atomic_thread_fence(__ATOMIC_RELEASE); + continue; + } + } + break; + } + return ret; +} + +/* Reconnects two elements in a list. This is used to complete an element + * removal or just to unlock a list previously locked with mt_list_cut_after(), + * mt_list_cut_before(), or mt_list_cut_around(). The link element returned by + * these functions just needs to be passed to this one. See examples above. + */ +static inline void mt_list_reconnect(struct mt_list ends) +{ + ends.next->prev = ends.prev; + ends.prev->next = ends.next; +} + + +/* Connects element <el> at both ends <ends> of a list which is still locked + * hence has the link between these endpoints cut. This automatically unlocks + * both the element and the list, and effectively results in inserting or + * appending the element to that list if the ends were just after or just + * before the list's head. 
It may also be used to unlock a previously locked + * element since locking an element consists in cutting the links around it. + * The element doesn't need to be previously initialized as it gets blindly + * overwritten with <ends>. See examples above. + */ +static inline void mt_list_connect_elem(struct mt_list *el, struct mt_list ends) +{ + *el = ends; + __atomic_thread_fence(__ATOMIC_RELEASE); + + if (__builtin_expect(ends.next != el, 1)) + ends.next->prev = el; + if (__builtin_expect(ends.prev != el, 1)) + ends.prev->next = el; +} + + +/***************************************************************************** + * The macros and functions below are only used by the iterators. These must * + * not be used for other purposes! * + *****************************************************************************/ + + +/* Unlocks element <el> from the backup copy of previous next pointer <back>. + * It supports the special case where the list was empty and the element locked + * while looping over itself (we don't need/want to overwrite ->prev in this + * case). + */ +static inline void _mt_list_unlock_next(struct mt_list *el, struct mt_list *back) +{ + el->next = back; + __atomic_thread_fence(__ATOMIC_RELEASE); + + if (back != el) + back->prev = el; +} + + +/* Unlocks element <el> from the backup copy of previous prev pointer <back>. + * <back> cannot be equal to <el> here because if the list is empty, the list's + * head is not locked for prev and the caller has NULL in back.prev, thus does + * not call this function. + */ +static inline void _mt_list_unlock_prev(struct mt_list *el, struct mt_list *back) +{ + el->prev = back; + __atomic_thread_fence(__ATOMIC_RELEASE); + + back->next = el; +} + + +/* Locks the link designated by element <el>'s next pointer and returns its + * previous value. If the element does not loop over itself (empty list head), + * its reciprocal prev pointer is locked as well. This check is necessary + * because we don't want to lock the head twice. 
+ */ +static MT_INLINE struct mt_list *_mt_list_lock_next(struct mt_list *el) +{ + struct mt_list *n, *n2; + unsigned long loops = 0; + + for (;; mt_list_cpu_relax(loops = loops * 8 + 7)) { + if (__atomic_load_n(&el->next, __ATOMIC_RELAXED) == MT_LIST_BUSY) + continue; + n = __atomic_exchange_n(&el->next, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (n == MT_LIST_BUSY) + continue; + + if (n != el) { + n2 = __atomic_exchange_n(&n->prev, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (n2 == MT_LIST_BUSY) { + el->next = n; + __atomic_thread_fence(__ATOMIC_RELEASE); + continue; + } + } + break; + } + return n; +} + + +/* Locks the link designated by element <el>'s prev pointer and returns its + * previous value. The element cannot loop over itself because the caller will + * only lock the prev pointer on an non-empty list. + */ +static MT_INLINE struct mt_list *_mt_list_lock_prev(struct mt_list *el) +{ + struct mt_list *p, *p2; + unsigned long loops = 0; + + for (;; mt_list_cpu_relax(loops = loops * 8 + 7)) { + if (__atomic_load_n(&el->prev, __ATOMIC_RELAXED) == MT_LIST_BUSY) + continue; + p = __atomic_exchange_n(&el->prev, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (p == MT_LIST_BUSY) + continue; + + p2 = __atomic_exchange_n(&p->next, MT_LIST_BUSY, __ATOMIC_RELAXED); + if (p2 == MT_LIST_BUSY) { + el->prev = p; + __atomic_thread_fence(__ATOMIC_RELEASE); + continue; + } + break; + } + return p; +} + + +/* Outer loop of MT_LIST_FOR_EACH_ENTRY_SAFE(). Do not use directly! + * This loop is only used to unlock the last item after the end of the inner + * loop is reached or if we break out of it. + * + * Trick: item starts with the impossible and unused value MT_LIST_BUSY that is + * detected as the looping condition to force to enter the loop. The inner loop + * will first replace it, making the compiler notice that this condition cannot + * happen after the first iteration, and making it implement exactly one round + * and no more. 
+ */ +#define _MT_LIST_FOR_EACH_ENTRY_OUTER(item, lh, lm, back) \ + for (/* init-expr: preset for one iteration */ \ + (back).prev = NULL, \ + (back).next = _mt_list_lock_next(lh), \ + (item) = (void*)MT_LIST_BUSY; \ + /* condition-expr: only one iteration */ \ + (void*)(item) == (void*)MT_LIST_BUSY; \ + /* loop-expr */ \ + ({ \ + /* post loop cleanup: \ + * gets executed only once to perform cleanup \ + * after child loop has finished, or a break happened \ + */ \ + if (item != NULL) { \ + /* last visited item still exists or is the list's head \ + * so we have to unlock it. back.prev may be null if \ + * the list is empty and the inner loop did not run. \ + */ \ + if (back.prev) \ + _mt_list_unlock_prev(&item->lm, back.prev); \ + _mt_list_unlock_next(&item->lm, back.next); \ + } else { \ + /* last item was deleted by user, relink is required: \ + * prev->next = next \ + * next->prev = prev \ + * Note that gcc may believe that back.prev may be null \ + * which is not possible by construction. \ + */ \ + MT_ALREADY_CHECKED(back.prev); \ + mt_list_reconnect(back); \ + } \ + }) \ + ) + + +/* Inner loop of MT_LIST_FOR_EACH_ENTRY_SAFE(). Do not use directly! + * This loop iterates over all list elements and unlocks the previously visited + * element. It stops when reaching the list's head, without unlocking the last + * element, which is left to the outer loop to deal with, just like when hitting + * a break. In order to preserve the locking, the loop takes care of always + * locking the next element before unlocking the previous one. During the first + * iteration, the prev element might be NULL since the head is singly-locked. 
+ */ +#define _MT_LIST_FOR_EACH_ENTRY_INNER(item, lh, lm, back) \ + for (/* init-expr */ \ + item = MT_LIST_ELEM(lh, typeof(item), lm); \ + /* cond-expr (thus executed before the body of the loop) */ \ + (back.next != lh) && ({ \ + struct mt_list *__tmp_next = back.next; \ + /* did not reach end of list yet */ \ + back.next = _mt_list_lock_next(back.next); \ + if (item != NULL) { \ + /* previous item was not deleted, we must unlock it */ \ + if (back.prev) { \ + /* not executed on first run \ + * (back.prev == NULL on first run) \ + */ \ + _mt_list_unlock_prev(&item->lm, back.prev); \ + /* unlock_prev will implicitly relink: \ + * item->lm.prev = prev \ + * prev->next = &item->lm \ + */ \ + } \ + back.prev = &item->lm; \ + } \ + (item) = MT_LIST_ELEM(__tmp_next, typeof(item), lm); \ + 1; /* end of list not reached, we must execute */ \ + }); \ + /* empty loop-expr */ \ + ) + +#endif /* _MT_LIST_H */ diff --git a/src/event_hdl.c b/src/event_hdl.c index 7641486cf9..a18a6dcd69 100644 --- a/src/event_hdl.c +++ b/src/event_hdl.c @@ -63,14 +63,13 @@ static void _event_hdl_sub_list_destroy(event_hdl_sub_list *sub_list); static void event_hdl_deinit(struct sig_handler *sh) { event_hdl_sub_list *cur_list; - struct mt_list *elt1, elt2; + struct mt_list back; /* destroy all known subscription lists */ - mt_list_for_each_entry_safe(cur_list, &known_event_hdl_sub_list, known, elt1, elt2) { - /* remove cur elem from list */ - MT_LIST_DELETE_SAFE(elt1); - /* then destroy it */ + MT_LIST_FOR_EACH_ENTRY_SAFE(cur_list, &known_event_hdl_sub_list, known, back) { + /* remove cur elem from list and free it */ _event_hdl_sub_list_destroy(cur_list); + cur_list = NULL; } } @@ -288,11 +287,11 @@ static inline struct event_hdl_sub_type _event_hdl_getsub_async(struct event_hdl struct mt_list lock; struct event_hdl_sub_type type = EVENT_HDL_SUB_NONE; - lock = MT_LIST_LOCK_ELT(&cur_sub->mt_list); + lock = mt_list_cut_around(&cur_sub->mt_list); if (lock.next != &cur_sub->mt_list) type = 
_event_hdl_getsub(cur_sub); // else already removed - MT_LIST_UNLOCK_ELT(&cur_sub->mt_list, lock); + mt_list_connect_elem(&cur_sub->mt_list, lock); return type; } @@ -309,11 +308,11 @@ static inline int _event_hdl_resub_async(struct event_hdl_sub *cur_sub, struct e int status = 0; struct mt_list lock; - lock = MT_LIST_LOCK_ELT(&cur_sub->mt_list); + lock = mt_list_cut_around(&cur_sub->mt_list); if (lock.next != &cur_sub->mt_list) status = _event_hdl_resub(cur_sub, type); // else already removed - MT_LIST_UNLOCK_ELT(&cur_sub->mt_list, lock); + mt_list_connect_elem(&cur_sub->mt_list, lock); return status; } @@ -338,7 +337,7 @@ static inline void _event_hdl_unsubscribe(struct event_hdl_sub *del_sub) event_hdl_task_wakeup(del_sub->hdl.async_task); /* unlock END EVENT (we're done, the task is now free to consume it) */ - MT_LIST_UNLOCK_ELT(&del_sub->async_end->mt_list, lock); + mt_list_connect_elem(&del_sub->async_end->mt_list, lock); /* we don't free sub here * freeing will be performed by async task so it can safely rely @@ -408,8 +407,7 @@ static void event_hdl_unsubscribe_sync(const struct event_hdl_sub_mgmt *mgmt) return; /* already removed from sync ctx */ /* assuming that publish sync code will notice that mgmt->this is NULL - * and will perform the list removal using MT_LIST_DELETE_SAFE and - * _event_hdl_unsubscribe() + * and will perform the list removal and _event_hdl_unsubscribe() * while still owning the lock */ ((struct event_hdl_sub_mgmt *)mgmt)->this = NULL; @@ -436,7 +434,7 @@ struct event_hdl_sub *event_hdl_subscribe_ptr(event_hdl_sub_list *sub_list, struct event_hdl_sub_type e_type, struct event_hdl hdl) { struct event_hdl_sub *new_sub = NULL; - struct mt_list *elt1, elt2; + struct mt_list back; struct event_hdl_async_task_default_ctx *task_ctx = NULL; struct mt_list lock; @@ -512,14 +510,14 @@ struct event_hdl_sub *event_hdl_subscribe_ptr(event_hdl_sub_list *sub_list, /* ready for registration */ MT_LIST_INIT(&new_sub->mt_list); - lock = 
MT_LIST_LOCK_ELT(&sub_list->known); + lock = mt_list_cut_around(&sub_list->known); /* check if such identified hdl is not already registered */ if (hdl.id) { struct event_hdl_sub *cur_sub; uint8_t found = 0; - mt_list_for_each_entry_safe(cur_sub, &sub_list->head, mt_list, elt1, elt2) { + MT_LIST_FOR_EACH_ENTRY_SAFE(cur_sub, &sub_list->head, mt_list, back) { if (hdl.id == cur_sub->hdl.id) { /* we found matching registered hdl */ found = 1; @@ -528,7 +526,7 @@ struct event_hdl_sub *event_hdl_subscribe_ptr(event_hdl_sub_list *sub_list, } if (found) { /* error already registered */ - MT_LIST_UNLOCK_ELT(&sub_list->known, lock); + mt_list_connect_elem(&sub_list->known, lock); event_hdl_report_hdl_state(ha_alert, &hdl, "SUB", "could not subscribe: subscription with this id already exists"); goto cleanup; } @@ -540,7 +538,7 @@ struct event_hdl_sub *event_hdl_subscribe_ptr(event_hdl_sub_list *sub_list, * it is a memory/IO error since it should not be long before haproxy * enters the deinit() function anyway */ - MT_LIST_UNLOCK_ELT(&sub_list->known, lock); + mt_list_connect_elem(&sub_list->known, lock); goto cleanup; } @@ -564,7 +562,7 @@ struct event_hdl_sub *event_hdl_subscribe_ptr(event_hdl_sub_list *sub_list, MT_LIST_APPEND(&sub_list->head, &new_sub->mt_list); } - MT_LIST_UNLOCK_ELT(&sub_list->known, lock); + mt_list_connect_elem(&sub_list->known, lock); return new_sub; @@ -618,11 +616,11 @@ void event_hdl_pause(struct event_hdl_sub *cur_sub) { struct mt_list lock; - lock = MT_LIST_LOCK_ELT(&cur_sub->mt_list); + lock = mt_list_cut_around(&cur_sub->mt_list); if (lock.next != &cur_sub->mt_list) _event_hdl_pause(cur_sub); // else already removed - MT_LIST_UNLOCK_ELT(&cur_sub->mt_list, lock); + mt_list_connect_elem(&cur_sub->mt_list, lock); } void _event_hdl_resume(struct event_hdl_sub *cur_sub) @@ -634,11 +632,11 @@ void event_hdl_resume(struct event_hdl_sub *cur_sub) { struct mt_list lock; - lock = MT_LIST_LOCK_ELT(&cur_sub->mt_list); + lock = 
mt_list_cut_around(&cur_sub->mt_list); if (lock.next != &cur_sub->mt_list) _event_hdl_resume(cur_sub); // else already removed - MT_LIST_UNLOCK_ELT(&cur_sub->mt_list, lock); + mt_list_connect_elem(&cur_sub->mt_list, lock); } void event_hdl_unsubscribe(struct event_hdl_sub *del_sub) @@ -667,17 +665,17 @@ int event_hdl_lookup_unsubscribe(event_hdl_sub_list *sub_list, uint64_t lookup_id) { struct event_hdl_sub *del_sub = NULL; - struct mt_list *elt1, elt2; + struct mt_list back; int found = 0; if (!sub_list) sub_list = &global_event_hdl_sub_list; /* fall back to global list */ - mt_list_for_each_entry_safe(del_sub, &sub_list->head, mt_list, elt1, elt2) { + MT_LIST_FOR_EACH_ENTRY_SAFE(del_sub, &sub_list->head, mt_list, back) { if (lookup_id == del_sub->hdl.id) { /* we found matching registered hdl */ - MT_LIST_DELETE_SAFE(elt1); _event_hdl_unsubscribe(del_sub); + del_sub = NULL; found = 1; break; /* id is unique, stop searching */ } @@ -689,13 +687,13 @@ int event_hdl_lookup_resubscribe(event_hdl_sub_list *sub_list, uint64_t lookup_id, struct event_hdl_sub_type type) { struct event_hdl_sub *cur_sub = NULL; - struct mt_list *elt1, elt2; + struct mt_list back; int status = 0; if (!sub_list) sub_list = &global_event_hdl_sub_list; /* fall back to global list */ - mt_list_for_each_entry_safe(cur_sub, &sub_list->head, mt_list, elt1, elt2) { + MT_LIST_FOR_EACH_ENTRY_SAFE(cur_sub, &sub_list->head, mt_list, back) { if (lookup_id == cur_sub->hdl.id) { /* we found matching registered hdl */ status = _event_hdl_resub(cur_sub, type); @@ -709,13 +707,13 @@ int event_hdl_lookup_pause(event_hdl_sub_list *sub_list, uint64_t lookup_id) { struct event_hdl_sub *cur_sub = NULL; - struct mt_list *elt1, elt2; + struct mt_list back; int found = 0; if (!sub_list) sub_list = &global_event_hdl_sub_list; /* fall back to global list */ - mt_list_for_each_entry_safe(cur_sub, &sub_list->head, mt_list, elt1, elt2) { + MT_LIST_FOR_EACH_ENTRY_SAFE(cur_sub, &sub_list->head, mt_list, back) { if 
(lookup_id == cur_sub->hdl.id) { /* we found matching registered hdl */ _event_hdl_pause(cur_sub); @@ -730,13 +728,13 @@ int event_hdl_lookup_resume(event_hdl_sub_list *sub_list, uint64_t lookup_id) { struct event_hdl_sub *cur_sub = NULL; - struct mt_list *elt1, elt2; + struct mt_list back; int found = 0; if (!sub_list) sub_list = &global_event_hdl_sub_list; /* fall back to global list */ - mt_list_for_each_entry_safe(cur_sub, &sub_list->head, mt_list, elt1, elt2) { + MT_LIST_FOR_EACH_ENTRY_SAFE(cur_sub, &sub_list->head, mt_list, back) { if (lookup_id == cur_sub->hdl.id) { /* we found matching registered hdl */ _event_hdl_resume(cur_sub); @@ -751,13 +749,13 @@ struct event_hdl_sub *event_hdl_lookup_take(event_hdl_sub_list *sub_list, uint64_t lookup_id) { struct event_hdl_sub *cur_sub = NULL; - struct mt_list *elt1, elt2; + struct mt_list back; uint8_t found = 0; if (!sub_list) sub_list = &global_event_hdl_sub_list; /* fall back to global list */ - mt_list_for_each_entry_safe(cur_sub, &sub_list->head, mt_list, elt1, elt2) { + MT_LIST_FOR_EACH_ENTRY_SAFE(cur_sub, &sub_list->head, mt_list, back) { if (lookup_id == cur_sub->hdl.id) { /* we found matching registered hdl */ event_hdl_take(cur_sub); @@ -776,11 +774,11 @@ static int _event_hdl_publish(event_hdl_sub_list *sub_list, struct event_hdl_sub const struct event_hdl_cb_data *data) { struct event_hdl_sub *cur_sub; - struct mt_list *elt1, elt2; + struct mt_list back; struct event_hdl_async_event_data *async_data = NULL; /* reuse async data for multiple async hdls */ int error = 0; - mt_list_for_each_entry_safe(cur_sub, &sub_list->head, mt_list, elt1, elt2) { + MT_LIST_FOR_EACH_ENTRY_SAFE(cur_sub, &sub_list->head, mt_list, back) { /* notify each function that has subscribed to sub_family.type, unless paused */ if ((cur_sub->sub.family == e_type.family) && ((cur_sub->sub.subtype & e_type.subtype) == e_type.subtype) && @@ -810,10 +808,10 @@ static int _event_hdl_publish(event_hdl_sub_list *sub_list, struct event_hdl_sub 
if (!sub_mgmt.this) { /* user has performed hdl unsub * we must remove it from the list + * then free it. */ - MT_LIST_DELETE_SAFE(elt1); - /* then free it */ _event_hdl_unsubscribe(cur_sub); + cur_sub = NULL; } } else { /* async mode: here we need to prepare event data @@ -941,13 +939,12 @@ void event_hdl_sub_list_init(event_hdl_sub_list *sub_list) static void _event_hdl_sub_list_destroy(event_hdl_sub_list *sub_list) { struct event_hdl_sub *cur_sub; - struct mt_list *elt1, elt2; + struct mt_list back; - mt_list_for_each_entry_safe(cur_sub, &sub_list->head, mt_list, elt1, elt2) { - /* remove cur elem from list */ - MT_LIST_DELETE_SAFE(elt1); - /* then free it */ + MT_LIST_FOR_EACH_ENTRY_SAFE(cur_sub, &sub_list->head, mt_list, back) { + /* remove cur elem from list and free it */ _event_hdl_unsubscribe(cur_sub); + cur_sub = NULL; } } diff --git a/src/hlua_fcn.c b/src/hlua_fcn.c index 9937082138..a2471876a2 100644 --- a/src/hlua_fcn.c +++ b/src/hlua_fcn.c @@ -555,7 +555,7 @@ static int hlua_queue_push(lua_State *L) { struct hlua_queue *queue = hlua_check_queue(L, 1); struct hlua_queue_item *item; - struct mt_list *elt1, elt2; + struct mt_list back; struct hlua_queue_wait *waiter; if (lua_gettop(L) != 2 || lua_isnoneornil(L, 2)) { @@ -581,7 +581,7 @@ static int hlua_queue_push(lua_State *L) MT_LIST_APPEND(&queue->list, &item->list); /* notify tasks waiting on queue:pop_wait() (if any) */ - mt_list_for_each_entry_safe(waiter, &queue->wait_tasks, entry, elt1, elt2) { + MT_LIST_FOR_EACH_ENTRY_SAFE(waiter, &queue->wait_tasks, entry, back) { task_wakeup(waiter->task, TASK_WOKEN_MSG); } diff --git a/src/quic_sock.c b/src/quic_sock.c index 1cbb71f9e2..d6a77a9e5b 100644 --- a/src/quic_sock.c +++ b/src/quic_sock.c @@ -939,15 +939,15 @@ void quic_accept_push_qc(struct quic_conn *qc) struct task *quic_accept_run(struct task *t, void *ctx, unsigned int i) { struct li_per_thread *lthr; - struct mt_list *elt1, elt2; + struct mt_list back; struct quic_accept_queue *queue = 
&quic_accept_queues[tid]; - mt_list_for_each_entry_safe(lthr, &queue->listeners, quic_accept.list, elt1, elt2) { + MT_LIST_FOR_EACH_ENTRY_SAFE(lthr, &queue->listeners, quic_accept.list, back) { listener_accept(lthr->li); if (!MT_LIST_ISEMPTY(&lthr->quic_accept.conns)) tasklet_wakeup((struct tasklet*)t); else - MT_LIST_DELETE_SAFE(elt1); + lthr = NULL; /* delete it */ } return NULL; diff --git a/src/server.c b/src/server.c index 3673340d15..2d5713c0d5 100644 --- a/src/server.c +++ b/src/server.c @@ -1586,11 +1586,11 @@ static int srv_parse_weight(char **args, int *cur_arg, struct proxy *px, struct void srv_shutdown_streams(struct server *srv, int why) { struct stream *stream; - struct mt_list *elt1, elt2; + struct mt_list back; int thr; for (thr = 0; thr < global.nbthread; thr++) - mt_list_for_each_entry_safe(stream, &srv->per_thr[thr].streams, by_srv, elt1, elt2) + MT_LIST_FOR_EACH_ENTRY_SAFE(stream, &srv->per_thr[thr].streams, by_srv, back) if (stream->srv_conn == srv) stream_shutdown(stream, why); } diff --git a/tests/unit/test-list.c b/tests/unit/test-list.c index 9e6ac38386..ff97b6f36e 100644 --- a/tests/unit/test-list.c +++ b/tests/unit/test-list.c @@ -2,11 +2,10 @@ #include #include #define USE_THREAD -#include +#include -/* Stress test the mt_lists. - * Compile from the haproxy directory with : - * cc -I../../include test-list.c -pthread -O2 -o test-list +/* Stress test for mt_lists. Compile this way: + * cc -O2 -o test-list test-list.c -I../include -pthread * The only argument it takes is the number of threads to be used. * ./test-list 4 */ @@ -19,17 +18,33 @@ struct pouet_lol { struct mt_list list_elt; }; +/* Fixed RNG sequence to ease reproduction of measurements (will be offset by + * the thread number). 
+ */ +__thread uint32_t rnd32_state = 2463534242U; + +/* Xorshift RNG from http://www.jstatsoft.org/v08/i14/paper */ +static inline uint32_t rnd32() +{ + rnd32_state ^= rnd32_state << 13; + rnd32_state ^= rnd32_state >> 17; + rnd32_state ^= rnd32_state << 5; + return rnd32_state; +} + void *thread(void *pouet) { struct pouet_lol *lol; - struct mt_list *elt1, elt2; + struct mt_list elt2; tid = (uintptr_t)pouet; int i = 0; + rnd32_state += tid; + for (int i = 0; i < MAX_ACTION; i++) { struct pouet_lol *lol; - struct mt_list *elt1, elt2; - switch (random() % 4) { + struct mt_list elt2; + switch (rnd32() % 4) { case 0: lol = malloc(sizeof(*lol)); MT_LIST_INIT(&lol->list_elt); @@ -47,15 +62,12 @@ void *thread(void *pouet) free(lol); break; case 3: - - mt_list_for_each_entry_safe(lol, &pouet_list, list_elt, elt1, elt2) - -{ - if (random() % 2) { - MT_LIST_DELETE_SAFE(elt1); + MT_LIST_FOR_EACH_ENTRY_SAFE(lol, &pouet_list, list_elt, elt2) { + if (rnd32() % 2) { free(lol); + lol = NULL; } - if (random() % 2) { + if (rnd32() % 2) { break; } } @@ -63,6 +75,8 @@ void *thread(void *pouet) default: break; } + if ((i) / (MAX_ACTION/10) != (i+1) / (MAX_ACTION/10)) + printf("%u: %d\n", tid, i+1); } }