From b4734c2bd7981846142ecabe3490bbf356993b08 Mon Sep 17 00:00:00 2001
From: Willy Tarreau
Date: Tue, 9 Apr 2024 08:03:10 +0200
Subject: [PATCH] BUG/MINOR: sock: handle a weird condition with connect()

As reported on GitHub issue #2491, there's a very strange situation where
epoll_wait() appears to report EPOLLERR only (and not IN/OUT/HUP etc. as
normally happens with EPOLLERR), and when connect() is called again to
check the state of the ongoing connection, it returns EALREADY, basically
saying "no news, please wait". This obviously triggers a wakeup loop.

For now it has remained impossible to reproduce this issue outside of the
reporter's environment, but it's definitely a situation that is impossible
to get out of.

The workaround here is to address the lowest-level cause we can act on,
which is to avoid returning to wait if EPOLLERR was reported. Indeed, in
this case we know it will loop, so we must definitely take the error into
account. We only do that after connect() asks us to wait, so that a
properly established connection with a queued error at the end of an
exchange will not be diverted and will be handled as usual.

This should be backported to approximately all versions, at least as far
back as 2.4, where the reporter observed it.

Thanks to @donnyxray for their useful captures isolating the problem.
---
 src/sock.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/sock.c b/src/sock.c
index 2d7d920c6..f1dfea404 100644
--- a/src/sock.c
+++ b/src/sock.c
@@ -818,6 +818,13 @@ int sock_conn_check(struct connection *conn)
 	return 0;
 
 wait:
+	/* we may arrive here due to connect() misleadingly reporting EALREADY
+	 * in some corner cases while the system disagrees and reports an error
+	 * on the FD.
+	 */
+	if (fdtab[fd].state & FD_POLL_ERR)
+		goto out_error;
+
 	fd_cant_send(fd);
 	fd_want_send(fd);
 	return 0;
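The decision the patch adds can be illustrated outside HAProxy with a minimal sketch. This is not HAProxy's code. HAProxy tests `fdtab[fd].state & FD_POLL_ERR`, while this sketch assumes the raw epoll event mask is available and checks `EPOLLERR` directly. The names `conn_check_decide()`, `conn_check_fd()` and the `CONN_CHECK_*` values are hypothetical. The decision step is kept free of syscalls so that it can be exercised without a real socket.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>

/* Hypothetical outcomes of re-checking a pending non-blocking connect(). */
enum conn_check {
	CONN_CHECK_FAILED = -1,
	CONN_CHECK_WAIT   = 0,
	CONN_CHECK_READY  = 1,
};

/* Pure decision step: <ret> is connect()'s return value, <err> the errno it
 * left (ignored when ret == 0), <events> the epoll event mask reported for
 * the FD.
 */
enum conn_check conn_check_decide(int ret, int err, uint32_t events)
{
	/* connected, either just now or earlier */
	if (ret == 0 || err == EISCONN)
		return CONN_CHECK_READY;

	if (err == EALREADY || err == EINPROGRESS) {
		/* connect() says "still in progress", but if the poller
		 * flagged an error, waiting again would only produce another
		 * identical wakeup: treat it as a failure instead.
		 */
		if (events & EPOLLERR)
			return CONN_CHECK_FAILED;
		return CONN_CHECK_WAIT;
	}

	/* any other errno is a real connection failure */
	return CONN_CHECK_FAILED;
}

/* Re-issue connect() on <fd> to <addr> and classify the result. */
enum conn_check conn_check_fd(int fd, const struct sockaddr *addr,
                              socklen_t addrlen, uint32_t events)
{
	int ret = connect(fd, addr, addrlen);

	return conn_check_decide(ret, ret == 0 ? 0 : errno, events);
}
```

As in the patch, the error flag is only considered once connect() has asked to wait, so an established connection (EISCONN) is still reported as ready even if an error is also pending. The connect(2) man page also documents a common alternative to calling connect() again: read the pending error with `getsockopt(fd, SOL_SOCKET, SO_ERROR, ...)`.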