BUG/MEDIUM: listener/thread: fix a race when pausing a listener

There exists a race in the listener code where a thread might disable
receipt on a listener's FD then turn it to LI_PAUSED while at the same
time another one faces EAGAIN on accept() and enables it again via
fd_cant_recv(). The result is that the FD is in LI_PAUSED state with
its polling still enabled. listener_accept() does not do anything then
and doesn't disable the FD either, resulting in a thread eating all the
CPU as reported in issue #358. A solution would be to take the listener's
lock to perform the fd_cant_recv() call and do it only if the FD is still
in LI_READY state, but this would be totally overkill while in practice
the issue only happens during shutdown.

Instead what is done here is that when leaving we recheck the state and
disable polling if the listener is not in LI_READY state, which never
happens except when being limited. In the worst case there could be one
extra check per thread for the time required to converge, which is
absolutely nothing.

This fix was successfully tested, and should be backported to all
versions using the lock-free listeners, which means all those containing
commit 3f0d02bb ("MAJOR: listener: do not hold the listener lock in
listener_accept()"), hence 2.1, 2.0, 1.9.7+, 1.8.20+.
This commit is contained in:
Willy Tarreau 2019-12-05 07:40:32 +01:00
parent 920b035238
commit 4c044e274c

View File

@ -1058,6 +1058,9 @@ void listener_accept(int fd)
(!p->fe_sps_lim || freq_ctr_remain(&p->fe_sess_per_sec, p->fe_sps_lim, 0) > 0))
dequeue_all_listeners(&p->listener_queue);
}
if (l->state != LI_READY)
fd_stop_recv(l->fd);
}
/* Notify the listener that a connection initiated from it was released. This