BUG/MEDIUM: stick-table: limit the time spent purging old entries

An interesting case was reported with threads and moderately sized stick-tables. Sometimes the watchdog would trigger during the purge. It turns out that the stick tables were sized in the tens of thousands of entries, which is the order of magnitude of the possible number of connections, and that threads were used over distinct NUMA nodes.

While at first glance nothing looks problematic there, there is actually a risk that a thread trying to purge the table finds 100% of the entries still in use by a connection (ts->ref_cnt > 0) and ends up scanning the whole table, while threads on the other NUMA node cause the cache lines to bounce back and forth. This slows its progress considerably, to the point of possibly spending hundreds of milliseconds there, multiplied by the number of queued threads all failing at the same point. Interestingly, smaller tables would not trigger it because the scan would be faster, and larger ones would not trigger it because plenty of entries would be idle!

The most efficient solution is to increase the table size so that this never happens, but this is not reliable. We could have a parallel list of idle entries, but that would significantly increase the storage and processing cost only to improve a few rare corner cases.

This patch takes a more pragmatic approach: the purge will not visit more than twice the number of nodes to be deleted, which means that it accepts failing up to 50% of the time. Given that very small batches are programmed each time (1/256 of the table size), the operation will finish quickly (up to 128 times faster than a full-table scan) and will reduce the inter-thread contention. If this needs to be reconsidered, it will probably mean that the batch size needs to be determined differently.

This needs to be backported to stable releases which extensively use threads, typically 2.0.

Kudos to Nenad Merdanovic for figuring out the root cause triggering this!
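
The bounded search described in this message can be illustrated with a minimal, self-contained sketch. It is not the HAProxy code: the real function walks an ebtree of struct stksess under the table lock, whereas this sketch assumes a plain array sorted oldest first. The names `struct entry`, `trash_oldest_bounded` and the `visited` output parameter are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Hypothetical, simplified entry; the real struct stksess lives in an ebtree. */
struct entry {
	int ref_cnt;   /* > 0 means still referenced by a connection */
	int removed;   /* set once the entry has been trashed */
};

/* Walk entries[] (assumed sorted oldest first) and trash up to <to_batch>
 * unreferenced ones, giving up after visiting 2 * <to_batch> entries.
 * The number of entries actually visited is stored in <*visited>.
 * Returns the number of trashed entries, which may be less than <to_batch>.
 */
static int trash_oldest_bounded(struct entry *entries, size_t count,
                                int to_batch, int *visited)
{
	int max_search = to_batch * 2; /* no more than 50% misses */
	int batched = 0;
	size_t i;

	*visited = 0;
	for (i = 0; i < count && batched < to_batch; i++) {
		if (--max_search < 0)
			break; /* search budget exhausted, stop early */

		(*visited)++;
		if (entries[i].removed || entries[i].ref_cnt > 0)
			continue; /* in use: skipped, but still consumes budget */

		entries[i].removed = 1;
		batched++;
	}
	return batched;
}
```

With a 1024-entry table and a batch of 1024/256 = 4, the worst case (every entry still referenced) visits 8 entries instead of 1024, which is where the factor of 128 comes from.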
2024-12-18 01:14:38 +00:00 · 2020-11-03 17:47:41 +01:00 · 2020-11-03 17:47:41 +01:00 · dfe79251da
commit dfe79251da
parent e6ee820c07
1 changed files with 7 additions and 1 deletions
--- a/src/stick_table.c
+++ b/src/stick_table.c
@@ -165,12 +165,15 @@ static struct stksess *__stksess_init(struct stktable *t, struct stksess * ts)

 /*
 * Trash oldest <to_batch> sticky sessions from table <t>
- * Returns number of trashed sticky sessions.
+ * Returns number of trashed sticky sessions. It may actually trash less
+ * than expected if finding these requires too long a search time (e.g.
+ * most of them have ts->ref_cnt>0).
 */
 int __stktable_trash_oldest(struct stktable *t, int to_batch)
 {
 	struct stksess *ts;
 	struct eb32_node *eb;
+	int max_search = to_batch * 2; // no more than 50% misses
 	int batched = 0;
 	int looped = 0;

@@ -192,6 +195,9 @@ int __stktable_trash_oldest(struct stktable *t, int to_batch)
 				break;
 		}

+		if (--max_search < 0)
+			break;
+
 		/* timer looks expired, detach it from the queue */
 		ts = eb32_entry(eb, struct stksess, exp);
 		eb = eb32_next(eb);