haproxy public development tree
Willy Tarreau dfe79251da BUG/MEDIUM: stick-table: limit the time spent purging old entries
An interesting case was reported with threads and moderately sized
stick-tables. Sometimes the watchdog would trigger during the purge.
It turns out that the stick tables were sized in the tens of thousands of
entries, which is the same order of magnitude as the possible number of
connections,
and that threads were used over distinct NUMA nodes. While at first
glance nothing looks problematic there, actually there is a risk that
a thread trying to purge the table faces 100% of entries still in use
by a connection with (ts->ref_cnt > 0), and ends up scanning the whole
table, while other threads on the other NUMA node are causing the
cache lines to bounce back and forth and considerably slow down its
progress to the point of possibly spending hundreds of milliseconds
there, multiplied by the number of queued threads all failing on the
same point.

Interestingly, smaller tables would not trigger it because the scan
would be faster, and larger ones would not trigger it because plenty
of entries would be idle!

The most efficient solution is to increase the table size to be large
enough for this never to happen, but this is not reliable. We could
have a parallel list of idle entries but that would significantly
increase the storage and processing cost only to improve a few rare
corner cases.

This patch takes a more pragmatic approach: it will not visit more than
twice the number of nodes to be deleted, which means that it accepts
failing up to 50% of the time. Given that very
small batches are programmed each time (1/256 of the table size), this
means the operation will finish quickly (128 times faster than now),
and will reduce the inter-thread contention. If this needs to be
reconsidered, it will probably mean that the batch size needs to be
fixed differently.
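
The bounded scan described above can be sketched roughly as follows. This is
only an illustration, not the code from the patch: it assumes a flat array
instead of the tree the real stick-table uses, and the names struct entry,
in_table, purge_bounded, to_batch and max_search are hypothetical. Only the
ref_cnt test and the "visit at most twice the batch" rule come from the
description.

```c
#include <stddef.h>

/* Hypothetical, simplified entry; the real table stores tree nodes. */
struct entry {
    int ref_cnt;   /* > 0 while a connection still references the entry */
    int in_table;  /* 1 while the entry is still stored in the table */
};

/* Try to purge up to <to_batch> idle entries, scanning from index 0.
 * The scan gives up after visiting 2 * to_batch entries, so a table
 * whose entries are all in use costs a bounded number of visits
 * instead of a full scan. Returns the number of entries purged.
 */
static int purge_bounded(struct entry *tab, size_t size, int to_batch)
{
    int max_search = to_batch * 2; /* tolerate at most 50% misses */
    int batched = 0;
    size_t i;

    for (i = 0; i < size && batched < to_batch; i++) {
        if (--max_search < 0)
            break;              /* visit budget exhausted, stop early */
        if (!tab[i].in_table || tab[i].ref_cnt > 0)
            continue;           /* already gone or still in use: a miss */
        tab[i].in_table = 0;    /* idle entry: purge it */
        batched++;
    }
    return batched;
}
```

With a batch of 1/256 of the table size, such a loop touches at most 1/128
of the entries per call, which is where the "128 times faster" figure comes
from in the worst case where every entry is still referenced.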

This needs to be backported to stable releases which extensively use
threads, typically 2.0.

Kudos to Nenad Merdanovic for figuring out the root cause triggering this!
2020-11-03 18:02:42 +01:00
.github CI: github actions: update h2spec to 2.6.0 2020-10-27 13:13:23 +01:00
contrib CONTRIB: release-estimator: Add release estimating tool 2020-10-24 12:27:17 +02:00
doc [RELEASE] Released version 2.3-dev9 2020-10-31 13:17:06 +01:00
examples CLEANUP: assorted typo fixes in the code and comments 2020-06-26 11:27:28 +02:00
include MINOR: debug: don't count free(NULL) in memstats 2020-11-03 16:46:48 +01:00
reg-tests MINOR: cache: Add Expires header value parsing 2020-10-30 11:08:38 +01:00
scripts CI: travis-ci: replace not defined SSL_LIB, SSL_INC for BotringSSL builds 2020-10-11 21:12:33 +02:00
src BUG/MEDIUM: stick-table: limit the time spent purging old entries 2020-11-03 18:02:42 +01:00
tests MEDIUM: config: remove the deprecated and dangerous global "debug" directive 2020-10-09 19:18:45 +02:00
.cirrus.yml CI: cirrus-ci: exclude slow reg-tests 2020-07-04 06:58:14 +02:00
.gitattributes MINOR: Commit .gitattributes 2020-09-05 16:21:59 +02:00
.gitignore CLEANUP: Update .gitignore 2020-09-12 13:11:24 +02:00
.travis.yml CI: travis-ci: switch to Ubuntu 20.04 2020-10-24 11:31:56 +02:00
BRANCHES DOC: assorted typo fixes in the documentation 2020-03-09 14:45:58 +01:00
CHANGELOG [RELEASE] Released version 2.3-dev9 2020-10-31 13:17:06 +01:00
CONTRIBUTING DOC: Use gender neutral language 2020-07-26 22:35:43 +02:00
INSTALL BUILD: makefile: Update feature flags for NetBSD 2020-10-09 09:53:56 +02:00
LICENSE
MAINTAINERS REORG: include: split hathreads into haproxy/thread.h and haproxy/thread-t.h 2020-06-11 10:18:56 +02:00
Makefile BUILD: makefile: add entries to build common debugging tools 2020-10-22 05:17:08 +02:00
README DOC: create a BRANCHES file to explain the life cycle 2019-06-15 22:00:14 +02:00
ROADMAP
SUBVERS
VERDATE [RELEASE] Released version 2.3-dev9 2020-10-31 13:17:06 +01:00
VERSION [RELEASE] Released version 2.3-dev9 2020-10-31 13:17:06 +01:00

The HAProxy documentation has been split into a number of different files for
ease of use.

Please refer to the following files depending on what you're looking for:

  - INSTALL for instructions on how to build and install HAProxy
  - BRANCHES to understand the project's life cycle and what version to use
  - LICENSE for the project's license
  - CONTRIBUTING for the process to follow to submit contributions

The more detailed documentation is located in the doc/ directory:

  - doc/intro.txt for a quick introduction on HAProxy
  - doc/configuration.txt for the configuration's reference manual
  - doc/lua.txt for the Lua reference manual
  - doc/SPOE.txt for how to use the SPOE engine
  - doc/network-namespaces.txt for how to use network namespaces under Linux
  - doc/management.txt for the management guide
  - doc/regression-testing.txt for how to use the regression testing suite
  - doc/peers.txt for the peers protocol reference
  - doc/coding-style.txt for how to adopt HAProxy's coding style
  - doc/internals for developer-specific documentation (not all up to date)