BUG/MEDIUM: server: don't kill all idle conns when there are not enough

In srv_cleanup_idle_connections(), we compute how many idle connections
are in excess compared to the average need. But we may actually be missing
some, for example if a certain number were recently closed and the average
of used connections didn't change much since the previous period. In this
case exceed_conns can become negative. There was no special case for this
in the code, and calculating the per-thread share of connections to kill
based on this value resulted in the special value -1 being passed to
srv_migrate_conns_to_remove(), which for this function means "kill all of
them", as used in srv_cleanup_connections() for example.

This causes large variations in idle connection counts on servers and
CPU spikes each time the cleanup task runs. These were much more visible
with SSL, as closing and re-establishing these connections costs CPU and
also takes time, which reduces the reuse ratio and hence increases the
number of connections during reconnection.

In this patch we simply skip the killing loop when this condition is met.
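
The failure mode and the guard can be illustrated with a minimal standalone
sketch. This is not the HAProxy code: conns_to_remove() is a hypothetical
stand-in for srv_migrate_conns_to_remove() that only counts connections, and
the share formula in thread_share() is an assumption chosen to show how a
negative excess can land exactly on -1.

```c
/* Hypothetical stand-in for srv_migrate_conns_to_remove(): returns how many
 * of the <idle> connections of a thread would be killed. As in the commit
 * message, the special value -1 means "kill all of them".
 */
static int conns_to_remove(int idle, int toremove_nb)
{
	if (toremove_nb == -1 || toremove_nb > idle)
		return idle;
	return toremove_nb < 0 ? 0 : toremove_nb;
}

/* Assumed per-thread share formula (illustration only): the thread gets a
 * part of the excess proportional to its own idle count, plus one.
 */
static int thread_share(int exceed_conns, int thr_idle, int total_idle)
{
	return (exceed_conns * thr_idle) / total_idle + 1;
}

/* Returns the number of connections killed on one thread. When <guarded>
 * is non-zero, the killing step is skipped if there is no excess, which is
 * the behaviour introduced by this patch.
 */
static int cleanup_one_thread(int exceed_conns, int thr_idle,
                              int total_idle, int guarded)
{
	if (guarded && exceed_conns <= 0)
		return 0;
	return conns_to_remove(thr_idle,
	                       thread_share(exceed_conns, thr_idle, total_idle));
}
```

Under these assumptions, an excess of -4 with 5 idle connections on the thread
out of 10 in total gives a share of (-4 * 5) / 10 + 1 = -1, so the unguarded
variant kills all 5 connections while the guarded one kills none.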

No backport is needed, this is purely 2.2.
Willy Tarreau 2020-07-02 19:05:30 +02:00
parent b39a3754d9
commit 18ed789ae2

@@ -5288,6 +5288,9 @@ struct task *srv_cleanup_idle_connections(struct task *task, void *context, unsi
 	srv->max_used_conns = srv->curr_used_conns;
+	if (exceed_conns <= 0)
+		goto remove;
 	/* check all threads starting with ours */
 	for (i = tid;;) {
 		int max_conn;