BUG/MEDIUM: stick-table: limit the time spent purging old entries

An interesting case was reported with threads and moderately sized
stick-tables. Sometimes the watchdog would trigger during the purge.
It turns out that the stick tables were sized in the 10s of K entries
which is the order of magnitude of the possible number of connections,
and that threads were used over distinct NUMA nodes. While at first
glance nothing looks problematic there, actually there is a risk that
a thread trying to purge the table faces 100% of entries still in use
by a connection with (ts->ref_cnt > 0), and ends up scanning the whole
table, while other threads on the other NUMA node are causing the
cache lines to bounce back and forth and considerably slow down its
progress to the point of possibly spending hundreds of milliseconds
there, multiplied by the number of queued threads all failing on the
same point.

Interestingly, smaller tables would not trigger it because the scan
would be faster, and larger ones would not trigger it because plenty
of entries would be idle!

The most efficient solution is to increase the table size to be large
enough for this never to happen, but this is not reliable. We could
have a parallel list of idle entries but that would significantly
increase the storage and processing cost only to improve a few rare
corner cases.

This patch takes a more pragmatic approach, it considers that it will
not visit more than twice the number of nodes to be deleted, which
means that it accepts to fail up to 50% of the time. Given that very
small batches are programmed each time (1/256 of the table size), this
means the operation will finish quickly (128 times faster than now),
and will reduce the inter-thread contention. If this needs to be
reconsidered, it will probably mean that the batch size needs to be
fixed differently.

This needs to be backported to stable releases which extensively use
threads, typically 2.0.

Kudos to Nenad Merdanovic for figuring the root cause triggering this!
This commit is contained in:
Willy Tarreau 2020-11-03 17:47:41 +01:00
parent e6ee820c07
commit dfe79251da

View File

@ -165,12 +165,15 @@ static struct stksess *__stksess_init(struct stktable *t, struct stksess * ts)
/*
* Trash oldest <to_batch> sticky sessions from table <t>
* Returns number of trashed sticky sessions.
* Returns number of trashed sticky sessions. It may actually trash less
* than expected if finding these requires too long a search time (e.g.
* most of them have ts->ref_cnt>0).
*/
int __stktable_trash_oldest(struct stktable *t, int to_batch)
{
struct stksess *ts;
struct eb32_node *eb;
int max_search = to_batch * 2; // no more than 50% misses
int batched = 0;
int looped = 0;
@ -192,6 +195,9 @@ int __stktable_trash_oldest(struct stktable *t, int to_batch)
break;
}
if (--max_search < 0)
break;
/* timer looks expired, detach it from the queue */
ts = eb32_entry(eb, struct stksess, exp);
eb = eb32_next(eb);