BUG/MEDIUM: listeners: always pause a listener on out-of-resource condition

A corner case was opened in the listener_accept() code by commit 3f0d02bbc2
("MAJOR: listener: do not hold the listener lock in listener_accept()"). The
issue arises when one listener (or a group of them) manages to eat all of the
proxy's or the process's maxconn, and another listener tries to accept a new
socket. The atomic increment then detects the excess connection count and the
function immediately aborts without pausing the listener, so the call is
immediately performed again. This doesn't happen when the test is run on a
single listener, because that listener gets limited when it crosses the limit.
But with 2 or more listeners we don't have this luxury.
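
To make the failure mode concrete, here is a minimal sketch of the pre-fix
pattern (hypothetical code, not the actual listener_accept() logic; feconn,
MAXCONN and try_reserve_slot() are invented for illustration): when the shared
counter is already saturated by another listener, the accept path declines the
connection and returns without pausing itself, so the poller reports the same
readiness event again on the very next loop.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define MAXCONN 100

    static atomic_int feconn;                 /* shared frontend connection count */

    /* Pre-fix pattern: reserve a slot, or bail out WITHOUT pausing the listener. */
    static bool try_reserve_slot(void)
    {
        int count = atomic_load(&feconn);

        do {
            if (count >= MAXCONN)
                return false;                 /* declined, but nothing gets paused */
        } while (!atomic_compare_exchange_weak(&feconn, &count, count + 1));

        return true;
    }

    int main(void)
    {
        atomic_store(&feconn, MAXCONN);       /* pretend another listener ate every slot */

        /* The listener stays enabled, so the poller wakes it up again and
         * again for the same pending connection: a busy loop burning CPU.
         */
        for (int wakeup = 0; wakeup < 3; wakeup++) {
            if (!try_reserve_slot())
                printf("wakeup %d: frontend full, declined without pausing\n", wakeup);
        }
        return 0;
    }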

The solution consists in limiting the listener as soon as we have to
decline accepting an incoming connection. This means that the listener
will not be marked full yet if it reaches the exact connection count, but
this is not a problem in practice since all other listeners will only be
marked full after their first failed attempt. Thus, from now on, a listener
is only marked full once it has already failed to take an incoming
connection.
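
Continuing the sketch above (reusing feconn and MAXCONN; struct sketch_listener
and pause_listener() are invented stand-ins for the real listener_full() /
limit_listener() machinery), the post-fix pattern pauses the listener inside
the decline branch itself and drops the post-loop "did we just fill it?" check:

    struct sketch_listener {
        bool paused;                          /* rough stand-in for the real listener states */
    };

    static void pause_listener(struct sketch_listener *l)
    {
        l->paused = true;                     /* stand-in for disabling polling on the fd */
    }

    /* Post-fix pattern: the moment we have to decline, we also pause. */
    static bool try_reserve_slot_fixed(struct sketch_listener *l)
    {
        int count = atomic_load(&feconn);

        do {
            if (count >= MAXCONN) {
                pause_listener(l);            /* no more wakeups until a slot frees up */
                return false;
            }
        } while (!atomic_compare_exchange_weak(&feconn, &count, count + 1));

        /* No "did we just fill it?" check after the loop anymore: the
         * listener is only paused once it has actually failed an accept.
         */
        return true;
    }

The trade-off, as noted above, is that a listener reaching exactly the limit is
not marked full until its next attempt, which is harmless since it will be
paused on that very attempt.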

This bug was definitely responsible for the occasional, unreproducible
reports of high CPU usage showing epoll_wait() returning immediately
without an incoming connection being accepted, as in bug #129.

This fix must be backported to 1.9 and 1.8.
Willy Tarreau 2019-11-15 10:20:07 +01:00
parent af7ea814f9
commit 93604edb65
1 changed file with 7 additions and 18 deletions

@@ -727,57 +727,46 @@ void listener_accept(int fd)
 	 */
 	do {
 		count = l->nbconn;
-		if (l->maxconn && count >= l->maxconn) {
+		if (unlikely(l->maxconn && count >= l->maxconn)) {
 			/* the listener was marked full or another
 			 * thread is going to do it.
 			 */
 			next_conn = 0;
+			listener_full(l);
 			goto end;
 		}
 		next_conn = count + 1;
 	} while (!_HA_ATOMIC_CAS(&l->nbconn, (int *)(&count), next_conn));
 
-	if (l->maxconn && next_conn == l->maxconn) {
-		/* we filled it, mark it full */
-		listener_full(l);
-	}
-
 	if (p) {
 		do {
 			count = p->feconn;
-			if (count >= p->maxconn) {
+			if (unlikely(count >= p->maxconn)) {
 				/* the frontend was marked full or another
 				 * thread is going to do it.
 				 */
 				next_feconn = 0;
+				limit_listener(l, &p->listener_queue);
 				goto end;
 			}
 			next_feconn = count + 1;
 		} while (!_HA_ATOMIC_CAS(&p->feconn, &count, next_feconn));
-
-		if (unlikely(next_feconn == p->maxconn)) {
-			/* we just filled it */
-			limit_listener(l, &p->listener_queue);
-		}
 	}
 
 	if (!(l->options & LI_O_UNLIMITED)) {
 		do {
 			count = actconn;
-			if (count >= global.maxconn) {
+			if (unlikely(count >= global.maxconn)) {
 				/* the process was marked full or another
 				 * thread is going to do it.
 				 */
 				next_actconn = 0;
+				limit_listener(l, &global_listener_queue);
+				task_schedule(global_listener_queue_task, tick_add(now_ms, 1000)); /* try again in 1 second */
 				goto end;
 			}
 			next_actconn = count + 1;
 		} while (!_HA_ATOMIC_CAS(&actconn, (int *)(&count), next_actconn));
-
-		if (unlikely(next_actconn == global.maxconn)) {
-			limit_listener(l, &global_listener_queue);
-			task_schedule(global_listener_queue_task, tick_add(now_ms, 1000)); /* try again in 1 second */
-		}
 	}
 
 	/* with sockpair@ we don't want to do an accept */