BUG/MEDIUM: checks: ignore late resets after valid responses

Reinout Verkerk from Trilex reported that servers started flapping after a haproxy upgrade. Haproxy checks a simple agent which returns an HTTP response. If the request packet is lost and the agent sends its response and closes before reading the HTTP request, the server emits a TCP RST once the request finally reaches it. Since checks were ported to use connections, the connection's error flag is then processed as a failure after the success has already been recorded, which leads to the absurd situation where a server that returned a correct response is reported as down. To fix this, ignore the connection's error flag if a successful check has already been reported. Reinout verified that a patched server no longer exhibits the problem.
2024-12-13 15:04:42 +00:00 · 2012-12-30 01:44:24 +01:00
commit c5c61fcf45
parent 9568d7108f
1 changed file with 2 additions and 1 deletion
--- a/src/checks.c
+++ b/src/checks.c
@@ -1185,7 +1185,8 @@ static int wake_srv_chk(struct connection *conn)
 	if (unlikely(conn->flags & CO_FL_ERROR)) {
 		/* Note that we might as well have been woken up by a handshake handler */
-		s->result |= SRV_CHK_FAILED;
+		if (s->result == SRV_CHK_UNKNOWN)
+			s->result |= SRV_CHK_FAILED;
 		__conn_data_stop_both(conn);
 		task_wakeup(s->check.task, TASK_WOKEN_IO);
 	}
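
The effect of the hunk above can be illustrated with a small standalone model. This is only a sketch: the `chk_model` struct, the `MODEL_*` constants and both helper functions are hypothetical stand-ins written for this illustration, and their numeric values are assumptions rather than copies of haproxy's definitions. It only mirrors the condition changed by the patch, namely that an error flag seen after a recorded success should not turn the result into a failure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for SRV_CHK_* and CO_FL_ERROR; values are assumed. */
enum { MODEL_CHK_UNKNOWN = 0x0, MODEL_CHK_FAILED = 0x1, MODEL_CHK_PASSED = 0x2 };
#define MODEL_FL_ERROR 0x1u

/* Minimal model of the state examined by the wake-up callback. */
struct chk_model {
	unsigned int conn_flags; /* connection flags, may carry MODEL_FL_ERROR */
	int result;              /* check result bits */
};

/* Behaviour before the patch: any connection error marks the check failed,
 * even when a valid response was already recorded. */
static void model_wake_before(struct chk_model *c)
{
	assert(c != NULL);
	if (c->conn_flags & MODEL_FL_ERROR)
		c->result |= MODEL_CHK_FAILED;
}

/* Behaviour after the patch: the error only counts while no result has been
 * recorded yet, so a late reset after a success is ignored. */
static void model_wake_after(struct chk_model *c)
{
	assert(c != NULL);
	if (c->conn_flags & MODEL_FL_ERROR) {
		if (c->result == MODEL_CHK_UNKNOWN)
			c->result |= MODEL_CHK_FAILED;
	}
}

/* Returns non-zero when the failed bit is set in the result. */
static int model_is_failed(const struct chk_model *c)
{
	assert(c != NULL);
	return (c->result & MODEL_CHK_FAILED) != 0;
}
```

In this model, a check that starts with `result = MODEL_CHK_PASSED` and `conn_flags = MODEL_FL_ERROR` ends up failed under `model_wake_before()` but stays passed under `model_wake_after()`, while an error arriving with `result = MODEL_CHK_UNKNOWN` is still reported as a failure in both cases.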