haproxy public development tree
Go to file
Willy Tarreau 4d6c594998 BUG/MEDIUM: task: close a possible data race condition on a tasklet's list link
In issue #958 Ashley Penney reported intermittent crashes on AWS's ARM
nodes which would not happen on x86 nodes. After investigation it turned
out that the Neoverse N1 CPU cores used in the Graviton2 CPU are much
more aggressive than the usual Cortex A53/A72/A55 or any x86 regarding
memory ordering.

The issue that was triggered there is that if a tasklet_wakeup() call
is made on a tasklet scheduled to run on a foreign thread and that
tasklet is just being dequeued to be processed, there can be a race at
two places:
  - if MT_LIST_TRY_ADDQ() happens between MT_LIST_BEHEAD() and
    LIST_SPLICE_END_DETACHED() if the tasklet is alone in the list,
    because the emptiness tests matches ;

  - if MT_LIST_TRY_ADDQ() happens during LIST_DEL_INIT() in
    run_tasks_from_lists(), then depending on how LIST_DEL_INIT() ends
    up being implemented, it may even corrupt the adjacent nodes while
    they're being reused for the in-tree storage.

This issue was introduced in 2.2 when support for waking up remote
tasklets was added. Initially the attachment of a tasklet to a list
was enough to know its status and this used to be stable information.
Now it's not sufficient to rely on this anymore, thus we need to use
a different information.

This patch solves this by adding a new task flag, TASK_IN_LIST, which
is atomically set before attaching a tasklet to a list, and is only
removed after the tasklet is detached from a list. It is checked
by tasklet_wakeup_on() so that it may only be done while the tasklet
is out of any list, and is cleared during the state switch when calling
the tasklet. Note that the flag is not set for pure tasks as it's not
needed.

However this introduces a new special case: the function
tasklet_remove_from_tasklet_list() needs to keep both states in sync
and cannot check both the state and the attachment to a list at the
same time. This function is already limited to being used by the thread
owning the tasklet, so in this case the test remains reliable. However,
just like its predecessors, this function is wrong by design and it
should probably be replaced with a stricter one, a lazy one, or be
totally removed (it's only used in checks to avoid calling a possibly
scheduled event, and when freeing a tasklet). Regardless, for now the
function exists so the flag is removed only if the deletion could be
done, which covers all cases we're interested in regarding the insertion.
This removal is safe against a concurrent tasklet_wakeup_on() since
MT_LIST_DEL() guarantees the atomic test, and will ultimately clear
the flag only if the task could be deleted, so the flag will always
reflect the last state.

This should be carefully be backported as far as 2.2 after some
observation period. This patch depends on previous patch
"MINOR: task: remove __tasklet_remove_from_tasklet_list()".
2020-11-30 18:17:59 +01:00
.github CI: github actions: enable 51degrees feature 2020-11-26 19:08:15 +01:00
contrib CONTRIB: release-estimator: Add release estimating tool 2020-10-24 12:27:17 +02:00
doc DOC: config: Move req.hdrs and req.hdrs_bin in L7 samples fetches section 2020-11-27 10:30:23 +01:00
examples CLEANUP: assorted typo fixes in the code and comments 2020-06-26 11:27:28 +02:00
include BUG/MEDIUM: task: close a possible data race condition on a tasklet's list link 2020-11-30 18:17:59 +01:00
reg-tests BUG/MAJOR: tcpcheck: Allocate input and output buffers from the buffer pool 2020-11-27 10:29:41 +01:00
scripts CI: travis-ci: replace not defined SSL_LIB, SSL_INC for BotringSSL builds 2020-10-11 21:12:33 +02:00
src BUG/MEDIUM: task: close a possible data race condition on a tasklet's list link 2020-11-30 18:17:59 +01:00
tests MEDIUM: config: remove the deprecated and dangerous global "debug" directive 2020-10-09 19:18:45 +02:00
.cirrus.yml CI: cirrus-ci: exclude slow reg-tests 2020-07-04 06:58:14 +02:00
.gitattributes MINOR: Commit .gitattributes 2020-09-05 16:21:59 +02:00
.gitignore CLEANUP: Update .gitignore 2020-09-12 13:11:24 +02:00
.travis.yml CI: travis-ci: remove builds migrated to GH actions 2020-11-21 05:40:27 +01:00
BRANCHES DOC: assorted typo fixes in the documentation 2020-03-09 14:45:58 +01:00
CHANGELOG [RELEASE] Released version 2.4-dev1 2020-11-21 16:00:40 +01:00
CONTRIBUTING DOC: Use gender neutral language 2020-07-26 22:35:43 +02:00
INSTALL DOC: mention in INSTALL that it's development again 2020-11-05 17:19:13 +01:00
LICENSE
MAINTAINERS REORG: include: split hathreads into haproxy/thread.h and haproxy/thread-t.h 2020-06-11 10:18:56 +02:00
Makefile BUILD: Show the value of DEBUG= in haproxy -vv 2020-11-21 18:27:33 +01:00
README DOC: create a BRANCHES file to explain the life cycle 2019-06-15 22:00:14 +02:00
ROADMAP DOC: update the outdated ROADMAP file 2019-06-15 21:59:54 +02:00
SUBVERS
VERDATE [RELEASE] Released version 2.4-dev1 2020-11-21 16:00:40 +01:00
VERSION [RELEASE] Released version 2.4-dev1 2020-11-21 16:00:40 +01:00

The HAProxy documentation has been split into a number of different files for
ease of use.

Please refer to the following files depending on what you're looking for :

  - INSTALL for instructions on how to build and install HAProxy
  - BRANCHES to understand the project's life cycle and what version to use
  - LICENSE for the project's license
  - CONTRIBUTING for the process to follow to submit contributions

The more detailed documentation is located into the doc/ directory :

  - doc/intro.txt for a quick introduction on HAProxy
  - doc/configuration.txt for the configuration's reference manual
  - doc/lua.txt for the Lua's reference manual
  - doc/SPOE.txt for how to use the SPOE engine
  - doc/network-namespaces.txt for how to use network namespaces under Linux
  - doc/management.txt for the management guide
  - doc/regression-testing.txt for how to use the regression testing suite
  - doc/peers.txt for the peers protocol reference
  - doc/coding-style.txt for how to adopt HAProxy's coding style
  - doc/internals for developer-specific documentation (not all up to date)