BUG/MEDIUM: checks: mark the check as stopped after a connect error

Health checks still use the connection's fd to know whether a check is
running (this needs to change). When a health check immediately fails
during connect() because of a lack of local resources (e.g. ports), we
failed to unset the fd, so each time process_chk() was woken up after
such an error, it believed a check was still running and closed the fd
again instead of starting a new check. This could result in other
connections being closed because they had been assigned the same fd
value.
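
The fd reuse hazard described above can be reproduced in isolation. The
sketch below is not HAProxy code: struct check_conn, open_unrelated_fd(),
failed_connect(), wakeup() and victim_survives() are hypothetical names
made up for illustration, and /dev/null stands in for a real socket. It
relies only on the POSIX rule that open() returns the lowest unused
descriptor, and assumes a single-threaded process.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for the check's connection; only the fd matters. */
struct check_conn {
    int fd; /* -1 means "no check is running" */
};

/* Hypothetical helper: any descriptor will do, /dev/null replaces a socket. */
static int open_unrelated_fd(void)
{
    return open("/dev/null", O_RDONLY);
}

/* Simulates a connect attempt that fails after the fd was allocated.
 * With reset_fd == 0 the stale value is kept (the bug); with
 * reset_fd != 0 it is reset to -1 (the fix). */
static void failed_connect(struct check_conn *c, int reset_fd)
{
    c->fd = open_unrelated_fd();
    close(c->fd);              /* the fd number is now free for reuse */
    if (reset_fd)
        c->fd = -1;
}

/* Simulates the next wakeup: a non-negative fd is taken to mean that a
 * check is still running, so it gets closed. Returns 1 if it closed one. */
static int wakeup(struct check_conn *c)
{
    if (c->fd < 0)
        return 0;
    close(c->fd);
    c->fd = -1;
    return 1;
}

/* Returns 1 if an unrelated descriptor opened after the failed connect
 * is still open once the wakeup has run, 0 if it was closed by mistake. */
static int victim_survives(int reset_fd)
{
    struct check_conn c = { .fd = -1 };
    int victim, alive;

    failed_connect(&c, reset_fd);
    victim = open_unrelated_fd();  /* lowest free fd: the one just released */
    assert(victim >= 0);
    wakeup(&c);
    alive = fcntl(victim, F_GETFD) != -1;
    if (alive)
        close(victim);
    return alive;
}
```

Calling victim_survives(0) models the behaviour before this patch, where
the unrelated descriptor is closed by the wakeup, while victim_survives(1)
models the fixed behaviour, where it is left alone.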

The bug is only marked medium because when this happens, the system
is already in a bad state.

A comment was added above tcp_connect_server() to clarify that the
fd is *not* valid on error.
Willy Tarreau 2012-11-23 08:51:32 +01:00
parent ad15d127a7
commit 6b0a850503
2 changed files with 5 additions and 0 deletions

@@ -1359,6 +1359,8 @@ static struct task *process_chk(struct task *t)
}
/* here, we have seen a failure */
conn->t.sock.fd = -1; /* report that no check is running anymore */
if (s->health > s->rise) {
s->health--; /* still good */
s->counters.failed_checks++;

@@ -233,6 +233,9 @@ int tcp_bind_socket(int fd, int flags, struct sockaddr_storage *local, struct so
* - SN_ERR_RESOURCE if a system resource is lacking (eg: fd limits, ports, ...)
* - SN_ERR_INTERNAL for any other purely internal errors
* Additionally, in the case of SN_ERR_RESOURCE, an emergency log will be emitted.
*
* The connection's fd is inserted only when SN_ERR_NONE is returned, otherwise
* it's invalid and the caller has nothing to do.
*/
int tcp_connect_server(struct connection *conn, int data)