Mirror of http://git.haproxy.org/git/haproxy.git/ (synced 2024-12-17 08:54:41 +00:00)
BUG/MEDIUM: stream-int: completely detach connection on connect error
Tim Butler reported a troubling issue affecting all versions since 1.5.
When a connection error occurs and a retry is performed on the same
server, the server connection first goes into the turn-around state
(SI_ST_TAR) for one second. During this time, the client may speak and
try to push some data. The tests in place confirm that the stream
interface is in a state <= SI_ST_EST and that a connection exists, so
all ingredients are present to try to perform a send() to forward data.
The send() cannot be performed since the connection's control layer is
marked as not ready, but the polling flags are changed, and due to the
remaining error flag present on the connection, the polling on the FD
is disabled in both directions. But if this FD was reassigned to another
connection in the meantime, it is this FD which is disabled, and it
causes a timeout on another connection.

A configuration that reproduces the issue looks like this:

   listen test
        bind :8003
        server s1 127.0.0.1:8001    # this one should be closed

   listen victim
        bind :8002
        server s1 127.0.0.1:8000    # this one should respond slowly (~50ms)

Two parallel injections should be run with short time-outs (100ms). After
some time, some dead connections will appear in listener "victim" due to
their I/Os being disabled by some of the failed transfers on the "test"
instance. These will only be flushed on time-out. A dead connection
looks like this:

  > show sess 0x7dcb70
  0x7dcb70: [07/Aug/2016:08:58:40.120151] id=3771 proto=tcpv4 source=127.0.0.1:34682
    flags=0xce, conn_retries=3, srv_conn=0x7da020, pend_pos=(nil)
    frontend=victim (id=3 mode=tcp), listener=? (id=1) addr=127.0.0.1:8002
    backend=victim (id=3 mode=tcp) addr=127.0.0.1:37736
    server=s1 (id=1) addr=127.0.0.1:8000
    task=0x7dcaf8 (state=0x08 nice=0 calls=2 exp=<NEVER> age=30s)
    si[0]=0x7dcd68 (state=EST flags=0x08 endp0=CONN:0x7e2410 exp=<NEVER>, et=0x000)
    si[1]=0x7dcd88 (state=EST flags=0x18 endp1=CONN:0x7e0cd0 exp=<NEVER>, et=0x000)
    co0=0x7e2410 ctrl=tcpv4 xprt=RAW data=STRM target=LISTENER:0x7d9ea8
        flags=0x0020b306 fd=122 fd.state=25 fd.cache=0 updt=0
    co1=0x7e0cd0 ctrl=tcpv4 xprt=RAW data=STRM target=SERVER:0x7da020
        flags=0x0020b306 fd=93 fd.state=20 fd.cache=0 updt=0
    req=0x7dcb80 (f=0x848000 an=0x0 pipe=0 tofwd=-1 total=129)
        an_exp=<NEVER> rex=<NEVER> wex=<NEVER>
        buf=0x7893c0 data=0x7893d4 o=0 p=0 req.next=0 i=0 size=0
    res=0x7dcbc0 (f=0x80008000 an=0x0 pipe=0 tofwd=-1 total=0)
        an_exp=<NEVER> rex=<NEVER> wex=<NEVER>
        buf=0x7893c0 data=0x7893d4 o=0 p=0 rsp.next=0 i=0 size=0

The fix for this issue is to completely detach the connection upon
error instead of only performing a forced close. This fix should be
backported to 1.6 and 1.5.

Special thanks to Tim, who did all the troubleshooting work and provided
a lot of traces which made it possible to find the root cause of this
problem.
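The descriptor-reuse hazard described in the message can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not HAProxy code: `struct toy_conn`, `toy_force_close()`, `toy_detach()` and `demo_stale_fd_aliases()` are hypothetical names invented here, and `/dev/null` merely stands in for a socket. It relies only on the POSIX rule that `open()` returns the lowest-numbered free descriptor.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for a connection; NOT HAProxy's struct connection. */
struct toy_conn {
	int fd;                 /* -1 when no descriptor is attached */
};

/* Forced close only: the descriptor is closed but its number is kept. */
static void toy_force_close(struct toy_conn *c)
{
	close(c->fd);
}

/* Full detach: close the descriptor and forget its number. */
static void toy_detach(struct toy_conn *c)
{
	if (c->fd >= 0)
		close(c->fd);
	c->fd = -1;
}

/* Returns 1 if acting on c->fd would touch the descriptor owned by other. */
static int toy_aliases(const struct toy_conn *c, const struct toy_conn *other)
{
	return c->fd >= 0 && c->fd == other->fd;
}

/* Fail one connection, open another, report whether the stale number
 * kept by the failed connection now designates the new one.
 */
static int demo_stale_fd_aliases(int detach)
{
	struct toy_conn failed = { .fd = open("/dev/null", O_RDONLY) };
	struct toy_conn victim = { .fd = -1 };
	int aliased;

	assert(failed.fd >= 0);
	if (detach)
		toy_detach(&failed);
	else
		toy_force_close(&failed);

	/* POSIX hands out the lowest free number: the one just closed. */
	victim.fd = open("/dev/null", O_RDONLY);
	assert(victim.fd >= 0);

	aliased = toy_aliases(&failed, &victim);
	close(victim.fd);
	return aliased;
}
```

In a single-threaded run, `demo_stale_fd_aliases(0)` reports aliasing because the stale number is reused, while `demo_stale_fd_aliases(1)` does not. Any later operation keyed on the stale number, such as disabling polling, would land on the unrelated connection.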
parent 8dfe863fbf
commit 568743a21f
@@ -555,7 +555,6 @@ static int sess_update_st_con_tcp(struct stream *s)
 	struct stream_interface *si = &s->si[1];
 	struct channel *req = &s->req;
 	struct channel *rep = &s->res;
-	struct connection *srv_conn = __objt_conn(si->end);
 
 	/* If we got an error, or if nothing happened and the connection timed
 	 * out, we must give up. The CER state handler will take care of retry
@@ -575,7 +574,7 @@ static int sess_update_st_con_tcp(struct stream *s)
 		si->exp   = TICK_ETERNITY;
 		si->state = SI_ST_CER;
 
-		conn_force_close(srv_conn);
+		si_release_endpoint(si);
 
 		if (si->err_type)
 			return 0;
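The effect of releasing the endpoint rather than only force-closing it can be modelled with a toy guard. This is a hedged sketch, not the actual stream-interface code: `struct toy_si`, `toy_may_try_send()`, `toy_force_close_only()` and `toy_release_endpoint()` are hypothetical names, and the enum keeps only the relative ordering TAR < CER < EST that the commit message implies. The guard mirrors the check described there: a state at or below "established" and an attached endpoint.

```c
#include <assert.h>
#include <stddef.h>

/* Toy states; only the relative order TAR < CER < EST matters here. */
enum toy_state { TOY_ST_TAR, TOY_ST_CER, TOY_ST_EST, TOY_ST_CLO };

/* Hypothetical endpoint carrying a leftover error flag. */
struct toy_endpoint {
	int error;
};

/* Hypothetical stream interface; NOT HAProxy's struct stream_interface. */
struct toy_si {
	enum toy_state state;
	struct toy_endpoint *end;   /* NULL once the endpoint is released */
};

/* The guard from the commit message: state <= EST and a connection exists. */
static int toy_may_try_send(const struct toy_si *si)
{
	return si->state <= TOY_ST_EST && si->end != NULL;
}

/* Old behaviour in this model: flag the error, keep the endpoint attached. */
static void toy_force_close_only(struct toy_si *si)
{
	assert(si->end != NULL);
	si->end->error = 1;
}

/* New behaviour in this model: the endpoint is detached entirely. */
static void toy_release_endpoint(struct toy_si *si)
{
	si->end = NULL;
}

/* Connect error, then turn-around: would a send still be attempted? */
static int demo_send_attempted(int release)
{
	struct toy_endpoint ep = { 0 };
	struct toy_si si = { TOY_ST_CER, &ep };

	if (release)
		toy_release_endpoint(&si);
	else
		toy_force_close_only(&si);

	si.state = TOY_ST_TAR;      /* retry on the same server */
	return toy_may_try_send(&si);
}
```

With the endpoint still attached, the guard passes during the turn-around state and a send is attempted on the dead connection. Once the endpoint is released, the guard fails and the stale descriptor is never touched.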