BUG/MEDIUM: checks: ignore late resets after valid responses

Reinout Verkerk from Trilex reported an issue with servers recently
flapping after an haproxy upgrade. Haproxy checks a simple agent
returning an HTTP response. The issue is that if the request packet
is lost but the simple agent responds and closes without waiting to
read the HTTP request, the server will emit a TCP RST once the
retransmitted request finally reaches it.

Since checks were ported to use connections, the connection's error
flag is reported as a failure even after a success, which leads to the
absurd case where the server is declared down despite having returned
a correct response.

To fix this, ignore the connection's error flag if a successful check
has already been reported. Reinout confirmed that a patched server no
longer exhibits the problem.
Willy Tarreau 2012-12-30 01:44:24 +01:00
parent 9568d7108f
commit c5c61fcf45

@@ -1185,7 +1185,8 @@ static int wake_srv_chk(struct connection *conn)
 	if (unlikely(conn->flags & CO_FL_ERROR)) {
 		/* Note that we might as well have been woken up by a handshake handler */
-		s->result |= SRV_CHK_FAILED;
+		if (s->result == SRV_CHK_UNKNOWN)
+			s->result |= SRV_CHK_FAILED;
 		__conn_data_stop_both(conn);
 		task_wakeup(s->check.task, TASK_WOKEN_IO);
 	}
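
The rule this change introduces can be sketched in isolation. The
snippet below is only an illustration, not haproxy code: the numeric
flag values are stand-ins chosen for the example, and
result_after_wake() is a hypothetical helper that mimics what the
patched branch of wake_srv_chk() does to the check result.

```c
/* Stand-in values for illustration only; they are assumptions and are
 * not copied from haproxy's headers.
 */
#define SRV_CHK_UNKNOWN 0x0000  /* no result recorded yet */
#define SRV_CHK_FAILED  0x0001  /* the check failed */
#define SRV_CHK_PASSED  0x0002  /* the check succeeded */

#define CO_FL_ERROR     0x0001  /* stand-in for the connection error flag */

/* Hypothetical helper: returns the check result after a wake-up,
 * applying the rule from the patch. A connection error only turns
 * the check into a failure when no result has been recorded yet.
 */
static int result_after_wake(int result, int conn_flags)
{
	if (conn_flags & CO_FL_ERROR) {
		/* a late error must not override an already reported result */
		if (result == SRV_CHK_UNKNOWN)
			result |= SRV_CHK_FAILED;
	}
	return result;
}
```

With these stand-in values, a late RST arriving after a valid response
leaves the result at SRV_CHK_PASSED, while an error seen before any
response still yields SRV_CHK_FAILED.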