mirror of
http://git.haproxy.org/git/haproxy.git/
synced 2024-12-16 16:34:42 +00:00
BUG/MEDIUM: ssl: check a connection's status before computing a handshake
As spotted in issue #822, we're having a problem with error detection in the SSL layer. The problem is that on an overwhelmed machine, accepted connections can start to pile up, each of them requiring a slow handshake, and during all this time if the client aborts, the handshake will still be calculated.

The error controls are properly placed, it's just that the SSL layer reads records exactly of the advertised size, without having the ability to encounter a pending connection error. As such, when injecting many TLS connections to a listener with a huge backlog, it's fairly possible to meet this situation:

  12:50:48.236056 accept4(8, {sa_family=AF_INET, sin_port=htons(62794), sin_addr=inet_addr("127.0.0.1")}, [128->16], SOCK_NONBLOCK) = 1109
  12:50:48.236071 setsockopt(1109, SOL_TCP, TCP_NODELAY, [1], 4) = 0
  (process other connections' handshakes)
  12:50:48.257270 getsockopt(1109, SOL_SOCKET, SO_ERROR, [ECONNRESET], [4]) = 0
  (proof that error was detectable there but this code was added for the PoC)
  12:50:48.257297 recvfrom(1109, "\26\3\1\2\0", 5, 0, NULL, NULL) = 5
  12:50:48.257310 recvfrom(1109, "\1\0\1\3"..., 512, 0, NULL, NULL) = 512
  (handshake calculation taking 700us)
  12:50:48.258004 sendto(1109, "\26\3\3\0z"..., 1421, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EPIPE (Broken pipe)
  12:50:48.258036 close(1109) = 0

The situation was amplified by the multi-queue accept code, as it resulted in many incoming connections being accepted long before they could be handled. Prior to this they would have been accepted and the handshake immediately started, which would have resulted in most of the connections waiting in the system's accept queue, and dying there when the client aborted, thus the error would have been detected before even trying to pass them to the handshake code.
As a result, with a listener running on a very large backlog, it's possible to quickly accept tens of thousands of connections and waste time slowly running their handshakes while they get replaced by other ones.

This patch adds an SO_ERROR check on the connection's FD before starting the handshake. This is not pretty as it requires accessing the FD, but it does the job.

Some improvements should be made over the long term so that the transport layers can report extra information with their ->rcv_buf() call, or at the very least, implement a ->get_conn_status() function to report various flags such as shutr, shutw, error at various stages, allowing an upper layer to inquire about the relevance of engaging in a long operation if it's known the connection is not usable anymore. An even simpler step could probably consist in implementing this in the control layer.

This patch is simple enough to be backported as far as 2.0.

Many thanks to @ngaugler for his numerous tests with detailed feedback.
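The SO_ERROR check described above can be tried in isolation. The following is a minimal sketch, not HAProxy code: the helper name sock_is_dead() is hypothetical, and it only assumes a POSIX socket API. It treats a failing getsockopt() the same way as a non-zero pending error, as the patch does.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper, for illustration only: returns 1 when the socket
 * is known to be unusable, either because an error is pending on it or
 * because its status could not be queried at all, and 0 otherwise.
 * Note that reading SO_ERROR also clears the pending error on the socket.
 */
static int sock_is_dead(int fd)
{
	int skerr = 0;
	socklen_t lskerr = sizeof(skerr);

	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &skerr, &lskerr) < 0)
		return 1;

	return skerr != 0;
}
```

A caller would test sock_is_dead(fd) right before committing to an expensive operation such as a TLS handshake, and give up on the connection when it returns 1. Since the check consumes the pending error, the caller has to act on the result immediately rather than expect a later read to report it again.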
This commit is contained in:
parent
8695ce0bae
commit
0630038e77
@@ -5345,6 +5345,9 @@ static int ssl_sock_handshake(struct connection *conn, unsigned int flag)
	struct ssl_counters *counters_px = NULL;
	struct listener *li;
	struct server *srv;
	socklen_t lskerr;
	int skerr;


	if (!conn_ctrl_ready(conn))
		return 0;
@@ -5372,6 +5375,21 @@ static int ssl_sock_handshake(struct connection *conn, unsigned int flag)
	if (!conn->xprt_ctx)
		goto out_error;

	/* don't start calculating a handshake on a dead connection */
	if (conn->flags & (CO_FL_ERROR | CO_FL_SOCK_RD_SH | CO_FL_SOCK_WR_SH))
		goto out_error;

	/* FIXME/WT: for now we don't have a clear way to inspect the connection
	 * status from the lower layers, so let's check the FD directly. Ideally
	 * the xprt layers should provide some status indicating their knowledge
	 * of shutdowns or error.
	 */
	skerr = 0;
	lskerr = sizeof(skerr);
	if ((getsockopt(conn->handle.fd, SOL_SOCKET, SO_ERROR, &skerr, &lskerr) < 0) ||
	    skerr != 0)
		goto out_error;

#ifdef SSL_READ_EARLY_DATA_SUCCESS
	/*
	 * Check if we have early data. If we do, we have to read them