mirror of
http://git.haproxy.org/git/haproxy.git/
synced 2025-02-17 11:06:54 +00:00
BUG/MEDIUM: stick-table: do not leave entries in end of window during purge
At some moments expired stick table records stop being removed. This happens when the internal time wraps around the 32-bit limit, or every 49.7 days. What precisely happens is that some elements that are collected close to the end of the time window (2^32 - table's "expire" setting) might have been updated and will be requeued further, at the beginning of the next window. Here, three bad situations happen: - the incorrect integer-based comparison that is not aware of wrapping will result in the scan to restart from the freshly requeued element, skipping all those at the end of the window. The net effect of this is that at each wakeup of the expiration task, only one element from the end of the window will be expired, and other ones will remain there for a very long time, especially if they have to wait for all the predecessors to be picked one at a time after slow wakeups due to a long expiration ; this is what was observed in issue #2034 making the table fill up and appear as not expiring at all, and it seems that issue #2024 reports the same problem at the same moment (since such issues happen for everyone roughly at the same time when the clock doesn't drift too much). - the elements that were placed at the beginning of the next window are skipped as well for as long as there are refreshed entries at the end of the previous window, so these ones participate to filling the table as well. This is cause by the restart from the current, updated node that is generally placed after most other less recently updated elements. - once the last element at the end of the window is picked, suddenly there is a large amount of expired entries at the beginning of the next window that all have to be requeued. If the expiration delay is large, the number can be big and it can take a long time, which can very likely explain the periodic crashes reported in issue #2025. Limiting the batch size as done in commitdfe79251d
("BUG/MEDIUM: stick-table: limit the time spent purging old entries") would make sense for process_table_expire() as well. This patch addresses the incorrect tree scan algorithm to make sure that: - there's always a next element to compare against, even when dealing with the last one in the tree, the first one must be used ; - time comparisons used to decide whether to restart from the current element use tick_is_lt() as it is the only case where we know the current element will be placed before any other one (since the tree respects insertion ordering for duplicates) In order to reproduce the issue, it was found that injecting traffic on a random key that spans over half of the size of a table whose expiration is set to 15s while the date is going to wrap in 20s does exhibit an increase of the table's size 5s after startup, when entries start to be pushed to the next window. It's more effective when a second load generator constantly hammers a same key to be certain that none of them is ready to expire. This doesn't happen anymore after this patch. This fix needs to be backported to all stable versions. The bug has been there for as long as the stick tables were introduced in 1.4-dev7 with commit3bd697e07
("[MEDIUM] Add stick table (persistence) management functions and types"). A cleanup could consists in deduplicating that code by having process_table_expire() call __stktable_trash_oldest(), with that one improved to support an optional time check.
This commit is contained in:
parent
d85227fca2
commit
593802128c
@ -257,7 +257,18 @@ int __stktable_trash_oldest(struct stktable *t, int to_batch)
|
||||
ts->exp.key = ts->expire;
|
||||
eb32_insert(&t->exps, &ts->exp);
|
||||
|
||||
if (!eb || eb->key > ts->exp.key)
|
||||
/* the update might have jumped beyond the next element,
|
||||
* possibly causing a wrapping. We need to check whether
|
||||
* the next element should be used instead. If the next
|
||||
* element doesn't exist it means we're on the right
|
||||
* side and have to check the first one then. If it
|
||||
* exists and is closer, we must use it, otherwise we
|
||||
* use the current one.
|
||||
*/
|
||||
if (!eb)
|
||||
eb = eb32_first(&t->exps);
|
||||
|
||||
if (!eb || tick_is_lt(ts->exp.key, eb->key))
|
||||
eb = &ts->exp;
|
||||
|
||||
continue;
|
||||
@ -699,7 +710,18 @@ struct task *process_table_expire(struct task *task, void *context, unsigned int
|
||||
ts->exp.key = ts->expire;
|
||||
eb32_insert(&t->exps, &ts->exp);
|
||||
|
||||
if (!eb || eb->key > ts->exp.key)
|
||||
/* the update might have jumped beyond the next element,
|
||||
* possibly causing a wrapping. We need to check whether
|
||||
* the next element should be used instead. If the next
|
||||
* element doesn't exist it means we're on the right
|
||||
* side and have to check the first one then. If it
|
||||
* exists and is closer, we must use it, otherwise we
|
||||
* use the current one.
|
||||
*/
|
||||
if (!eb)
|
||||
eb = eb32_first(&t->exps);
|
||||
|
||||
if (!eb || tick_is_lt(ts->exp.key, eb->key))
|
||||
eb = &ts->exp;
|
||||
continue;
|
||||
}
|
||||
|
Loading…
Reference in New Issue
Block a user