BUG/MAJOR: raw_sock: must check error code on hangup

In raw_sock, we already check for FD_POLL_HUP after a short recv() to avoid a useless syscall and detect the end of stream. However, we fail to check for FD_POLL_ERR here, which causes major issues: an error delivered at the same time as a HUP is ignored, and there is no data to send that would let us detect it in the other direction.

Since the connection's flags do not have CO_FL_ERROR set, polling is not disabled on the socket and the pollers immediately call conn_fd_handler() again, resulting in CPU spikes for as long as the timeouts allow.

Note that this patch alone fixes the issue, but a few patches will follow to strengthen this fragile area.

Big thanks to Bryan Berry, who reported the issue with significant amounts of detailed traces that helped rule out many other initially suspected causes and finally reproduce the issue in the lab.
2024-12-14 15:34:35 +00:00 · 2012-12-07 00:01:33 +01:00 · 2012-12-07 00:01:33 +01:00 · debdc4b657
commit debdc4b657
parent ee2663b1cd
1 changed file with 10 additions and 0 deletions
--- a/src/raw_sock.c
+++ b/src/raw_sock.c
@@ -282,6 +282,16 @@ static int raw_sock_to_buf(struct connection *conn, struct buffer *buf, int coun
 read0:
 	conn_sock_read0(conn);
 	conn->flags &= ~CO_FL_WAIT_L4_CONN;
+
+	/* Now a final check for a possible asynchronous low-level error
+	 * report. This can happen when a connection receives a reset
+	 * after a shutdown, both POLL_HUP and POLL_ERR are queued, and
+	 * we might have come from there by just checking POLL_HUP instead
+	 * of recv()'s return value 0, so we have no way to tell there was
+	 * an error without checking.
+	 */
+	if (unlikely(fdtab[conn->t.sock.fd].ev & FD_POLL_ERR))
+		conn->flags |= CO_FL_ERROR;
 	return done;
 }
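
The idea behind the patch can be illustrated outside of HAProxy with a minimal sketch. It is not the project's code: it uses the standard POSIX poll() event bits (POLLHUP, POLLERR) instead of FD_POLL_HUP and FD_POLL_ERR, and the DEMO_FL_* flags and demo_flags_from_revents() are hypothetical names introduced only for this example. The point it shows is that a hangup and an error may be reported in the same event mask, so the error bit has to be tested even when the hangup alone is enough to conclude the stream has ended.

```c
#include <poll.h>

/* Hypothetical connection flags, standing in for the real CO_FL_* flags. */
#define DEMO_FL_READ0 0x1u  /* end of stream seen on the read side */
#define DEMO_FL_ERROR 0x2u  /* low-level error reported on the socket */

/* Translate a poll() revents mask into the demo connection flags.
 * Both bits are tested independently: returning right after seeing
 * POLLHUP would silently drop an error queued with the hangup.
 */
static unsigned int demo_flags_from_revents(short revents)
{
	unsigned int flags = 0;

	if (revents & POLLHUP)
		flags |= DEMO_FL_READ0;

	/* Final check, done even if the hangup was already handled. */
	if (revents & POLLERR)
		flags |= DEMO_FL_ERROR;

	return flags;
}
```

With this helper, a mask of POLLHUP | POLLERR yields both demo flags, so a caller that stops polling once DEMO_FL_ERROR is set would not be woken up again for the same unreported error.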