BUG/MAJOR: raw_sock: must check error code on hangup

In raw_sock, we already check for FD_POLL_HUP after a short recv()
to avoid a useless syscall and detect the end of stream. However,
we fail to check for FD_POLL_ERR here, which causes major issues
as some errors might be delivered and ignored if they are delivered
at the same time as a HUP, and there is no data to send to detect
them on the other direction.

Since the connections flags do not have the CO_FL_ERROR flag, the
polling is not disabled on the socket and the pollers immediately
call the conn_fd_handler() again, resulting in CPU spikes for as
long as the timeouts allow them.

Note that this patch alone fixes the issue but a few patches will
follow to strengthen this fragile area.

Big thanks to Bryan Berry who reported the issue with significant
amounts of detailed traces that helped rule out many other initially
suspected causes and to finally reproduce the issue in the lab.
This commit is contained in:
Willy Tarreau 2012-12-07 00:01:33 +01:00
parent ee2663b1cd
commit debdc4b657

View File

@ -282,6 +282,16 @@ static int raw_sock_to_buf(struct connection *conn, struct buffer *buf, int coun
read0:
conn_sock_read0(conn);
conn->flags &= ~CO_FL_WAIT_L4_CONN;
/* Now a final check for a possible asynchronous low-level error
* report. This can happen when a connection receives a reset
* after a shutdown, both POLL_HUP and POLL_ERR are queued, and
* we might have come from there by just checking POLL_HUP instead
* of recv()'s return value 0, so we have no way to tell there was
* an error without checking.
*/
if (unlikely(fdtab[conn->t.sock.fd].ev & FD_POLL_ERR))
conn->flags |= CO_FL_ERROR;
return done;
}