haproxy

mirror of http://git.haproxy.org/git/haproxy.git/ synced 2025-05-01 15:28:00 +00:00

Author	SHA1	Message	Date
Thierry FOURNIER	65f34c6367	MINOR: lua: txn: create class TXN associated with the transaction. This class of functions gives access to everything associated with the transaction, such as HTTP headers, HAProxy internal fetches, etc. This patch adds the skeleton of this class. The class will be enhanced later.	2015-02-28 23:12:34 +01:00
Thierry FOURNIER	bc4c1ac6ad	MEDIUM: http/tcp: permit to resume http and tcp custom actions Later, the processing of some actions will need to be interrupted and resumed. This patch makes it possible to resume actions. The actions that need to run in resume mode are not yet available; they will arrive soon with the Lua patches, so the code added by this patch is untestable for the moment. The "tcp_exec_req_rules" list cannot be resumed because it is called by the unresumable function "accept_session".	2015-02-28 23:12:33 +01:00
Thierry FOURNIER	f41a809dc9	MINOR: sample: add private argument to the struct sample_fetch This private argument is added to prepare the integration of the Lua fetches.	2015-02-28 23:12:31 +01:00
Thierry FOURNIER	b83862dd74	MEDIUM: channel: wake up any request analyzer on response activity This behavior already exists for the "WAIT_HTTP" analyzer; this patch just extends the system to any analyzer that should be woken up on response activity.	2015-02-28 23:12:31 +01:00
Thierry FOURNIER	2e05a8c742	MEDIUM: task: call session analyzers if the task is woken by a message. When a task used to receive a message from another one, its analysers were not called if there was no I/O activity.	2015-02-28 23:12:30 +01:00
Willy Tarreau	a24adf0795	MAJOR: session: only wake up as many sessions as available buffers permit We've already experimented with three wake up algorithms when releasing buffers : the first naive one used to wake up far too many sessions, causing many of them not to get any buffer. The second approach which was still in use prior to this patch consisted in waking up either 1 or 2 sessions depending on the number of FDs we had released. And this was still inaccurate. The third one tried to cover the accuracy issues of the second and took into consideration the number of FDs the sessions would be willing to use, but most of the time we ended up waking up too many of them for nothing, or deadlocking by lack of buffers. This patch completely removes the need to allocate two buffers at once. Instead it splits allocations into critical and non-critical ones and implements a reserve in the pool for this. The deadlock situation happens when all buffers are be allocated for requests pending in a maxconn-limited server queue, because then there's no more way to allocate buffers for responses, and these responses are critical to release the servers's connection in order to release the pending requests. In fact maxconn on a server creates a dependence between sessions and particularly between oldest session's responses and latest session's requests. Thus, it is mandatory to get a free buffer for a response in order to release a server connection which will permit to release a request buffer. Since we definitely have non-symmetrical buffers, we need to implement this logic in the buffer allocation mechanism. What this commit does is implement a reserve of buffers which can only be allocated for responses and that will never be allocated for requests. This is made possible by the requester indicating how much margin it wants to leave after the allocation succeeds. 
Thus it is a cooperative allocation mechanism : the requester (process_session() in general) prefers not to get a buffer in order to respect other's need for response buffers. The session management code always knows if a buffer will be used for requests or responses, so that is not difficult : - either there's an applet on the initiator side and we really need the request buffer (since currently the applet is called in the context of the session) - or we have a connection and we really need the response buffer (in order to support building and sending an error message back) This reserve ensures that we don't take all allocatable buffers for requests waiting in a queue. The downside is that all the extra buffers are really allocated to ensure they can be allocated. But with small values it is not an issue. With this change, we don't observe any more deadlocks even when running with maxconn 1 on a server under severely constrained memory conditions. The code becomes a bit tricky, it relies on the scheduler's run queue to estimate how many sessions are already expected to run so that it doesn't wake up everyone with too few resources. A better solution would probably consist in having two queues, one for urgent requests and one for normal requests. A failed allocation for a session dealing with an error, a connection event, or the need for a response (or request when there's an applet on the left) would go to the urgent request queue, while other requests would go to the other queue. Urgent requests would be served from 1 entry in the pool, while the regular ones would be served only according to the reserve. Despite not yet having this, it works remarkably well. This mechanism is quite efficient, we don't perform too many wake up calls anymore. For 1 million sessions elapsed during massive memory contention, we observe about 4.5M calls to process_session() compared to 4.0M without memory constraints. 
Previously we used to observe up to 16M calls, which rougly means 12M failures. During a test run under high memory constraints (limit enforced to 27 MB instead of the 58 MB normally needed), performance used to drop by 53% prior to this patch. Now with this patch instead it increases by about 1.5%. The best effect of this change is that by limiting the memory usage to about 2/3 to 3/4 of what is needed by default, it's possible to increase performance by up to about 18% mainly due to the fact that pools are reused more often and remain hot in the CPU cache (observed on regular HTTP traffic with 20k objects, buffers.limit = maxconn/10, buffers.reserve = limit/2). Below is an example of scenario which used to cause a deadlock previously : - connection is received - two buffers are allocated in process_session() then released - one is allocated when receiving an HTTP request - the second buffer is allocated then released in process_session() for request parsing then connection establishment. - poll() says we can send, so the request buffer is sent and released - process session gets notified that the connection is now established and allocates two buffers then releases them - all other sessions do the same till one cannot get the request buffer without hitting the margin - and now the server responds. stream_interface allocates the response buffer and manages to get it since it's higher priority being for a response. - but process_session() cannot allocate the request buffer anymore => We could end up with all buffers used by responses so that none may be allocated for a request in process_session(). When the applet processing leaves the session context, the test will have to be changed so that we always allocate a response buffer regardless of the left side (eg: H2->H1 gateway). 
A final improvement would consist in being able to only retry the failed I/O operation without waking up a task, but to date all experiments to achieve this have proven not to be reliable enough.	2014-12-24 23:47:33 +01:00
Willy Tarreau	10fc09e872	MAJOR: session: only allocate buffers when needed A session doesn't need buffers all the time, especially when they're empty. With this patch, we don't allocate buffers anymore when the session is initialized, we only allocate them in two cases : - during process_session() - during I/O operations During process_session(), we try hard to allocate both buffers at once so that we know for sure that a started operation can complete. Indeed, a previous version of this patch used to allocate one buffer at a time, but it can result in a deadlock when all buffers are allocated for requests for example, and there's no buffer left to emit error responses. Here, if any of the buffers cannot be allocated, the whole operation is cancelled and the session is added at the tail of the buffer wait queue. At the end of process_session(), a call to session_release_buffers() is done so that we can offer unused buffers to other sessions waiting for them. For I/O operations, we only need to allocate a buffer on the Rx path. For this, we only allocate a single buffer but ensure that at least two are available to avoid the deadlock situation. In case buffers are not available, SI_FL_WAIT_ROOM is set on the stream interface and the session is queued. Unused buffers resulting either from a successful send() or from an unused read buffer are offered to pending sessions during the ->wake() callback.	2014-12-24 23:47:33 +01:00
Willy Tarreau	bf883e0aa7	MAJOR: session: implement a wait-queue for sessions who need a buffer When a session_alloc_buffers() fails to allocate one or two buffers, it subscribes the session to buffer_wq, and waits for another session to release buffers. It's then removed from the queue and woken up with TASK_WAKE_RES, and can attempt its allocation again. We decide to try to wake as many waiters as we release buffers so that if we release 2 and two waiters need only one, they both have their chance. We must never come to the situation where we don't wake enough tasks up. It's common to release buffers after the completion of an I/O callback, which can happen even if the I/O could not be performed due to half a failure on memory allocation. In this situation, we don't want to move out of the wait queue the session that was just added, otherwise it will never get any buffer. Thus, we only force ourselves out of the queue when freeing the session. Note: at the moment, since session_alloc_buffers() is not used, no task is subscribed to the wait queue.	2014-12-24 23:47:33 +01:00
Willy Tarreau	656859d478	MEDIUM: session: implement a basic atomic buffer allocator This patch introduces session_alloc_recv_buffer(), session_alloc_buffers() and session_release_buffers() whose purpose will be to allocate missing buffers and release unneeded ones around the process_session() and during I/O operations. I/O callbacks only need a single buffer for recv operations and none for send. However we still want to ensure that we don't pick the last buffer. That's what session_alloc_recv_buffer() is for. This allocator is atomic in that it always ensures we can get 2 buffers or fails. Here, if any of the buffers is not ready and cannot be allocated, the operation is cancelled. The purpose is to guarantee that we don't enter into the deadlock where all buffers are allocated by the same side of all sessions. A queue will have to be implemented for failed allocations. For now they're just reported as failures.	2014-12-24 23:47:32 +01:00
Willy Tarreau	909e267be0	MINOR: session: group buffer allocations together We'll soon want to release buffers together upon failure so we need to allocate them after the channels. Let's change this now. There's no impact on the behaviour, only the error path is unrolled slightly differently. The same was done in peers.	2014-12-24 23:47:32 +01:00
Willy Tarreau	7dfca9daec	MINOR: buffer: only use b_free to release buffers We don't call pool_free2(pool2_buffers) anymore, we only call b_free() to do the job. This ensures that we can start to centralize the releasing of buffers.	2014-12-24 23:47:32 +01:00
Willy Tarreau	696a2910a0	MINOR: buffer: move buffer initialization after channel initialization It's not clean to initialize the buffer before the channel since it dereferences one pointer in the channel. Also we'll want to let the channel pre-initialize the buffer, so let's ensure that the channel is always initialized prior to the buffers.	2014-12-24 23:47:32 +01:00
Willy Tarreau	e583ea583a	MEDIUM: buffer: use b_alloc() to allocate and initialize a buffer b_alloc() now allocates a buffer and initializes it to the size specified in the pool minus the size of the struct buffer itself. This ensures that callers do not need to care about buffer details anymore. Also this never applies memory poisoning, which is slow and useless on buffers.	2014-12-24 23:47:32 +01:00
Willy Tarreau	474cf54a97	MINOR: buffer: reset a buffer in b_reset() and not channel_init() We'll soon need to be able to switch buffers without touching the channel, so let's move buffer initialization out of channel_init(). We had the same in compression.c.	2014-12-24 23:47:31 +01:00
Willy Tarreau	3b24641745	BUG/MAJOR: sessions: unlink session from list on out of memory Since embryonic sessions were introduced in 1.5-dev12 with commit `2542b53` ("MAJOR: session: introduce embryonic sessions"), a major bug remained present. If haproxy cannot allocate memory during session_complete() (for example, no more buffers), it will not unlink the new session from the sessions list. This will cause memory corruptions if the memory area from the session is reused for anything else, and may also cause bogus output on "show sess" on the CLI. This fix must be backported to 1.5.	2014-11-25 22:09:05 +01:00
KOVACS Krisztian	b3e54fe387	MAJOR: namespace: add Linux network namespace support This patch makes it possible to create binds and servers in separate namespaces. This can be used to proxy between multiple completely independent virtual networks (with possibly overlapping IP addresses) and a non-namespace-aware proxy implementation that supports the proxy protocol (v2). The setup is something like this: net1 on VLAN 1 (namespace 1) -\ net2 on VLAN 2 (namespace 2) -- haproxy ==== proxy (namespace 0) net3 on VLAN 3 (namespace 3) -/ The proxy is configured to make server connections through haproxy and sending the expected source/target addresses to haproxy using the proxy protocol. The network namespace setup on the haproxy node is something like this: = 8< = $ cat setup.sh ip netns add 1 ip link add link eth1 type vlan id 1 ip link set eth1.1 netns 1 ip netns exec 1 ip addr add 192.168.91.2/24 dev eth1.1 ip netns exec 1 ip link set eth1.$id up ... = 8< = = 8< = $ cat haproxy.cfg frontend clients bind 127.0.0.1:50022 namespace 1 transparent default_backend scb backend server mode tcp server server1 192.168.122.4:2222 namespace 2 send-proxy-v2 = 8< = A bind line creates the listener in the specified namespace, and connections originating from that listener also have their network namespace set to that of the listener. A server line either forces the connection to be made in a specified namespace or may use the namespace from the client-side connection if that was set. For more documentation please read the documentation included in the patch itself. Signed-off-by: KOVACS Tamas <ktamas@balabit.com> Signed-off-by: Sarkozi Laszlo <laszlo.sarkozi@balabit.com> Signed-off-by: KOVACS Krisztian <hidden@balabit.com>	2014-11-21 07:51:57 +01:00
Willy Tarreau	3a5e060bf6	MINOR: session: release a few other pools when stopping We currently release all pools when a proxy is stopped, except the connection, pendconn, and pipe pools. Releasing these as well can further reduce the memory usage of old processes: even though the connection struct is quite small, there are a lot of them and they can contribute to memory fragmentation. The pipe pool is very small and limited, and not exported so it's not done here.	2014-11-13 16:56:12 +01:00
Willy Tarreau	e12704bfc7	MINOR: session: export the function 'smp_fetch_sc_stkctr' This one is sometimes useful outside of this file.	2014-07-15 19:09:56 +02:00
Willy Tarreau	b5975defba	MINOR: stick-table: make stktable_fetch_key() indicate why it failed stktable_fetch_key() does not indicate whether it returns NULL because the input sample was not found or because it's unstable. It causes trouble with track-sc* rules. Just like with sample_fetch_string(), we want it to be able to give more information to the caller about what it found. Thus, now we use the pointer to a sample passed by the caller, and fill it with the information we have about the sample. That way, even if we return NULL, the caller has the ability to check whether a sample was found and if it is still changing or not.	2014-06-25 17:17:53 +02:00
Willy Tarreau	6f0a7bac28	BUG/MAJOR: session: revert all the crappy client-side timeout changes This is the 3rd regression caused by the changes below. The latest to date was reported by Finn Arne Gangstad. If a server responds with no content-length and the client's FIN is never received, either we leak the client-side FD or we spin at 100% CPU if timeout client-fin is set. Enough is enough. The amount of tricks needed to cover these side-effects starts to look like used toilet paper stacked over a chocolate cake. I don't want to eat that cake anymore! All this to avoid reporting a server-side timeout when a client stops uploading data and haproxy expires faster than the server... A lot of "ifs" resulting in a technically valid log that doesn't always please users, and whose alternative causes that many issues for all others users. So let's revert this crap merged since 1.5-dev25 : Revert "CLEANUP: http: don't clear CF_READ_NOEXP twice" This reverts commit `1592d1e72a`. Revert "BUG/MEDIUM: http: clear CF_READ_NOEXP when preparing a new transaction" This reverts commit `77d29029af`. Revert "BUG/MEDIUM: session: don't clear CF_READ_NOEXP if analysers are not called" This reverts commit `0943757a21`. Revert "BUG/MEDIUM: http: disable server-side expiration until client has sent the body" This reverts commit `3bed5e9337`. Revert "BUG/MEDIUM: http: correctly report request body timeouts" This reverts commit `b9edf8fbec`. Revert "BUG/MEDIUM: http/session: disable client-side expiration only after body" This reverts commit `b1982e27aa`. If a cleaner AND SAFER way to do something equivalent in 1.6-dev, we might consider backporting it to 1.5, but given the vicious bugs that have surfaced since, I doubt it will happen any time soon. Fortunately, that crap never made it into 1.4 so no backport is needed.	2014-06-23 15:47:00 +02:00
Willy Tarreau	4bfc580dd3	MEDIUM: session: maintain per-backend and per-server time statistics Using the last rate counters, we now compute the queue, connect, response and total times per server and per backend with a 95% accuracy over the last 1024 samples. The operation is cheap so we don't need to condition it.	2014-06-17 17:15:56 +02:00
Willy Tarreau	33a14e515b	MEDIUM: session: redispatch earlier when possible As discussed with Dmitry Sivachenko, if a server farm has more than one active server, uses a guaranteed non-deterministic algorithm (round robin), and a connection was initiated from a non-persistent connection, there's no point insisting on reconnecting to the same server after a connect failure; it is better to redispatch upon the very first retry instead of insisting on the same server multiple times.	2014-06-13 17:53:55 +02:00
Willy Tarreau	db6d012270	MEDIUM: session: don't apply the retry delay when redispatching The retry delay is only useful when sticking to a same server. During a redispatch, it's useless and counter-productive if we're sure to switch to another server, which is almost guaranteed when there's more than one server and the balancing algorithm is round robin, so better not pass via the turn-around state in this case. It could be done as well for leastconn, but there's a risk of always killing the delay after the recovery of a server in a farm where it's almost guaranteed to take most incoming traffic. So better only kill the delay when using round robin.	2014-06-13 17:48:45 +02:00
Willy Tarreau	b02906659b	MEDIUM: session: allow shorter retry delay if timeout connect is small As discussed with Dmitry Sivachenko, the default 1-second connect retry delay can be large for situations where the connect timeout is much smaller, because it means that an active connection reject will take more time to be retried than a silent drop, and that does not make sense. This patch changes this so that the retry delay is the minimum of 1 second and the connect timeout. That way people running with sub-second connect timeout will benefit from the shorter reconnect.	2014-06-13 17:04:44 +02:00
Willy Tarreau	892337c8e1	MAJOR: server: use states instead of flags to store the server state Servers used to have 3 flags to store a state, now they have 4 states instead. This avoids lots of confusion for the 4 remaining undefined states. The encoding from the previous to the new states can be represented this way : SRV_STF_RUNNING \| SRV_STF_GOINGDOWN \| \| SRV_STF_WARMINGUP \| \| \| 0 x x SRV_ST_STOPPED 1 0 0 SRV_ST_RUNNING 1 0 1 SRV_ST_STARTING 1 1 x SRV_ST_STOPPING Note that the case where all bits were set used to exist and was randomly dealt with. For example, the task was not stopped, the throttle value was still updated and reported in the stats and in the http_server_state header. It was the same if the server was stopped by the agent or for maintenance. It's worth noting that the internal function names are still quite confusing.	2014-05-22 11:27:00 +02:00
Willy Tarreau	c93cd16b6c	REORG/MEDIUM: server: split server state and flags in two different variables Till now, the server's state and flags were all saved as a single bit field. It causes some difficulties because we'd like to have an enum for the state and separate flags. This commit starts by splitting them in two distinct fields. The first one is srv->state (with its counter-part srv->prev_state) which are now enums, but which still contain bits (SRV_STF_*). The flags now lie in their own field (srv->flags). The function srv_is_usable() was updated to use the enum as input, since it already used to deal only with the state. Note that currently, the maintenance mode is still in the state for simplicity, but it must move as well.	2014-05-22 11:27:00 +02:00
Willy Tarreau	0943757a21	BUG/MEDIUM: session: don't clear CF_READ_NOEXP if analysers are not called As more or less suspected, commit `b1982e2` ("BUG/MEDIUM: http/session: disable client-side expiration only after body") was hazardous. It introduced a regression causing client side timeout to expire during connection retries if it's lower than the time needed to cover the amount of retries, so clients get a 408 when the connection to the server fails to establish fast enough. The reason is that the CF_READ_NOEXP flag is set after the MSG_DONE state is reached, which protects the timeout from being re-armed, then during the retries, process_session() clears the flag without calling the analyser (since there's no activity for it), so the timeouts are rearmed. Ideally, these one-shot flags should be per-analyser, and the analyser which sets them would be responsible for clearing them, or they would automatically be cleared when switching to another analyser. Unfortunately this is not really possible currently. What can be done however is to only clear them in the following situations : - we're going to call analysers - analysers have all been unsubscribed This method seems reliable enough and approaches the ideal case well enough. No backport is needed, this bug was introduced in 1.5-dev25.	2014-05-21 16:58:17 +02:00
Willy Tarreau	05cdd9655d	MEDIUM: session: implement half-closed timeouts (client-fin and server-fin) Long-lived sessions are often subject to half-closed sessions resulting in a lot of sessions appearing in FIN_WAIT state in the system tables, and no way for haproxy to get rid of them. This typically happens because clients suddenly disconnect without sending any packet (eg: FIN or RST was lost in the path), and while the server detects this using an applicative heart beat, haproxy does not close the connection. This patch adds two new timeouts : "timeout client-fin" and "timeout server-fin". The former allows one to override the client-facing timeout when a FIN has been received or sent. The latter does the same for server-facing connections, which is less useful.	2014-05-10 15:14:05 +02:00
Willy Tarreau	b4f98098aa	BUG/MAJOR: session: recover the correct connection pointer in half-initialized sessions John-Paul Bader reported a nasty segv which happens after a few hours when SSL is enabled under a high load. Fortunately he could catch a stack trace, systematically looking like this one : (gdb) bt full level = 6 conn = (struct connection ) 0x0 err_msg = <value optimized out> s = (struct session ) 0x80337f800 conn = <value optimized out> flags = 41997063 new_updt = <value optimized out> old_updt = 1 e = <value optimized out> status = 0 fd = 53999616 nbfd = 279 wait_time = <value optimized out> updt_idx = <value optimized out> en = <value optimized out> eo = <value optimized out> count = 78 sr = <value optimized out> sw = <value optimized out> rn = <value optimized out> wn = <value optimized out> The variable "flags" in conn_fd_handler() holds a copy of connection->flags when entering the function. These flags indicate 41997063 = 0x0280d307 : - {SOCK,DATA,CURR}_RD_ENA=1 => it's a handshake, waiting for reading - {SOCK,DATA,CURR}_WR_ENA=0 => no need for writing - CTRL_READY=1 => FD is still allocated - XPRT_READY=1 => transport layer is initialized - ADDR_FROM_SET=1, ADDR_TO_SET=0 => clearly it's a frontend connection - INIT_DATA=1, WAKE_DATA=1 => processing a handshake (ssl I guess) - {DATA,SOCK}_{RD,WR}_SH=0 => no shutdown - ERROR=0, CONNECTED=0 => handshake not completed yet - WAIT_L4_CONN=0 => normal - WAIT_L6_CONN=1 => waiting for an L6 handshake to complete - SSL_WAIT_HS=1 => the pending handshake is an SSL handshake So this is a handshake is in progress. And the only way to reach line 88 is for the handshake to complete without error. So we know for sure that ssl_sock_handshake() was called and completed the handshake then removed the CO_FL_SSL_WAIT_HS flag from the connection. With these flags, ssl_sock_handshake() does only call SSL_do_handshake() and retruns. So that means that the problem is necessarily in data->init(). 
The fd is wrong as reported but is simply mis-decoded as it's the lower half of the last function pointer. What happens in practice is that there's an issue with the way we deal with embryonic sessions during their conversion to regular sessions. Since they have no stream interface at the beginning, the pointer to the connection is temporarily stored into s->target. Then during their conversion, the first stream interface is properly initialized and the connection is attached to it, then s->target is set to NULL. The problem is that if anything fails in session_complete(), the session is left in this intermediate state where s->target is NULL, and kill_mini_session() is called afterwards to perform the cleanup. It needs the connection, that it finds in s->target which is NULL, dereferences it and dies. The only reasons for dying here are a problem on the TCP connection when doing the setsockopt(TCP_NODELAY) or a memory allocation issue. This patch implements a solution consisting in restoring s->target in session_complete() on the error path. That way embryonic sessions that were valid before calling it are still valid after. The bug was introduced in 1.5-dev20 by commit `f8a49ea` ("MEDIUM: session: attach incoming connection to target on embryonic sessions"). No backport is needed. Special thanks to John for his numerous tests and traces.	2014-05-08 22:46:32 +02:00
Willy Tarreau	b1982e27aa	BUG/MEDIUM: http/session: disable client-side expiration only after body For a very long time, back in the v1.3 days, we used to rely on a trick to avoid expiring the client side while transferring a payload to the server. The problem was that if a client was able to quickly fill the buffers, and these buffers took some time to reach the server, the client should not expire while not sending anything. In order to cover this situation, the client-side timeout was disabled once the connection to the server was OK, since it implied that we would at least expire on the server if required. But there is a drawback to this : if a client stops uploading data before the end, its timeout is not enforced and we only expire on the server's timeout, so the logs report a 504. Since 1.4, we have message body analysers which ensure that we know whether all the expected data was received or not (HTTP_MSG_DATA or HTTP_MSG_DONE). So we can fix this problem by disabling the client-side or server-side timeout at the end of the transfer for the respective side instead of having it unconditionally in session.c during all the transfer. With this, the logs now report the correct side for the timeout. Note that this patch is not enough, because another issue remains : the HTTP body forwarders do not abort upon timeout, they simply rely on the generic handling from session.c. So for now, the session is still aborted when reaching the server timeout, but the culprit is properly reported. A subsequent patch will address this specific point. This bug was tagged MEDIUM because of the changes performed. The issue it fixes is minor however. After some cooling down, it may be backported to 1.4. It was reported by and discussed with Rachel Chavez and Patrick Hemmer on the mailing list.	2014-05-07 14:21:47 +02:00
Willy Tarreau	644c101e2d	BUG/MAJOR: http: connection setup may stall on balance url_param On the mailing list, seri0528@naver.com reported an issue when using balance url_param or balance uri. The request would sometimes stall forever. Cyril Bonté managed to reproduce it with the configuration below : listen test :80 mode http balance url_param q hash-type consistent server s demo.1wt.eu:80 and found it appeared with this commit : `80a92c0` ("BUG/MEDIUM: http: don't start to forward request data before the connect"). The bug is subtle but real. The problem is that the HTTP request forwarding analyzer refrains from starting to parse the request body when some LB algorithms might need the body contents, in order to preserve the data pointer and avoid moving things around during analysis in case a redispatch is later needed. And in order to detect that the connection establishes, it watches the response channel's CF_READ_ATTACHED flag. The problem is that a request analyzer is not subscribed to a response channel, so it will only see changes when woken for other (generally correlated) reasons, such as the fact that part of the request could be sent. And since the CF_READ_ATTACHED flag is cleared once leaving process_session(), it is important not to miss it. It simply happens that sometimes the server starts to respond in a sequence that validates the connection in the middle of process_session(), that it is detected after the analysers, and that the newly assigned CF_READ_ATTACHED is not used to detect that the request analysers need to be called again, then the flag is lost. The CF_WAKE_WRITE flag doesn't work either because it's cleared upon entry into process_session(), ie if we spend more than one call not connecting. Thus we need a new flag to tell the connection initiator that we are specifically interested in being notified about connection establishment. This new flag is CF_WAKE_CONNECT. 
It is set by the requester, and is cleared once the connection succeeds, where CF_WAKE_ONCE is set instead, causing the request analysers to be scanned again. For future versions, some better options will have to be considered : - let all analysers subscribe to both request and response events ; - let analysers subscribe to stream interface events (reduces number of useless calls) - change CF_WAKE_WRITE's semantics to persist across calls to process_session(), but that is different from validating a connection establishment (eg: no data sent, or no data to send) The bug was introduced in 1.5-dev23, no backport is needed.	2014-04-30 20:02:02 +02:00
Willy Tarreau	f51658dac4	MEDIUM: config: relax use_backend check to make the condition optional Since it became possible to use log-format expressions in use_backend, having a mandatory condition becomes annoying because configurations are full of "if TRUE". Let's relax the check to accept no condition like many other keywords (eg: redirect).	2014-04-23 01:21:56 +02:00
Willy Tarreau	b9a551e6aa	BUG/MINOR: stats: last session was not always set Cyril Bonté reported that the "lastsess" field of a stats-only backend was never updated. In fact the same is true for any applet and anything not a server. Also, lastsess was not updated for a server reusing its connection for a new request. Since the goal of this field is to report recent activity, it's better to ensure that all accesses are reported. The call has been moved to the code validating the session establishment instead, since everything passes there.	2014-04-23 00:35:17 +02:00
Willy Tarreau	5a8f947f4f	CLEANUP: http: rename http_process_request_body() This function does not process anything, it just waits for the beginning of the request body. Let's rename it http_wait_for_request_body().	2014-04-22 23:15:27 +02:00
Thierry FOURNIER	d988f21589	BUG/MAJOR: session: fix a possible crash with src_tracked Since commit `4d4149c` ("MEDIUM: counters: support passing the counter number as a fetch argument"), the sample fetch sc_tracked(num) became equivalent to sc[0-9]_tracked, by using the same smp_fetch_sc_tracked() function. This was theoretically made possible after the series of changes starting with commit `a65536ca` ("MINOR: counters: provide a generic function to retrieve a stkctr for sc* and src."). Unfortunately, while all other functions were changed to use the generic primitive smp_fetch_sc_stkctr(), smp_fetch_sc_tracked() was forgotten and is not able to differentiate between sc_tracked, src_tracked and sc[0-9]_tracked. The resulting mess is that if sc_tracked is used, the counter number is assumed to be 47 because that's what remains after subtracting "0" from char "_". Fix this by simply relying on the generic function as should have been done. The bug was introduced in 1.5-dev20. No backport is needed.	2014-04-15 11:09:49 +02:00
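The counter number 47 mentioned in commit d988f21589 above can be reproduced with a few lines of C. This is a minimal sketch, not HAProxy's code: `naive_counter_index()` is a hypothetical helper that assumes the counter digit is the third character of the keyword, which holds for `sc1_tracked` but not for `sc_tracked`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not from HAProxy): derive a counter index from the
 * third character of a keyword such as "sc1_tracked". It blindly assumes
 * that character is a digit. */
static inline int naive_counter_index(const char *kw)
{
    assert(strlen(kw) >= 3);   /* need at least "scX" */
    return kw[2] - '0';        /* '1' - '0' == 1, but '_' - '0' == 47 in ASCII */
}
```

With ASCII, `'_'` is 95 and `'0'` is 48, so `sc_tracked` yields 47, far outside any valid counter range. Going through a single generic lookup, as the commit does, avoids having each caller parse the keyword on its own.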
Thierry FOURNIER	74c219dc04	BUG/MEDIUM: stick-table: fix IPv4-to-IPv6 conversion in src_* fetches The function addr_to_stktable_key doesn't consider the expected type of key. If the stick table key is based on IPv6 addresses and the input is IPv4, the returned key is an IPv4 address with a length of 4 bytes, while a 16-byte key is expected. This patch considers the expected key type and tries to convert IPv4 to IPv6 and IPv6 to IPv4 accordingly. This fixes the bug reported by Apollon Oikonomopoulos. This bug was introduced somewhere in the 1.5-dev process.	2014-04-14 18:22:57 +02:00
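The key-size mismatch described in commit 74c219dc04 above can be illustrated with a small conversion pair. This is a sketch under an assumption: it uses the IPv4-mapped IPv6 form (`::ffff:a.b.c.d`) as the conversion rule, and the function names are hypothetical, not HAProxy's.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: expand a 4-byte IPv4 address into a 16-byte
 * IPv4-mapped IPv6 address (::ffff:a.b.c.d). */
static inline void v4_to_v6_mapped(const uint8_t v4[4], uint8_t v6[16])
{
    memset(v6, 0, 10);       /* first 80 bits are zero */
    v6[10] = 0xff;           /* next 16 bits are all ones */
    v6[11] = 0xff;
    memcpy(v6 + 12, v4, 4);  /* last 32 bits carry the IPv4 address */
}

/* Hypothetical helper: extract the IPv4 address from an IPv4-mapped IPv6
 * address. Returns 1 on success, 0 if the address is not IPv4-mapped. */
static inline int v6_mapped_to_v4(const uint8_t v6[16], uint8_t v4[4])
{
    static const uint8_t prefix[12] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff};

    if (memcmp(v6, prefix, 12) != 0)
        return 0;
    memcpy(v4, v6 + 12, 4);
    return 1;
}
```

The point is that the output length follows the table's key type (4 or 16 bytes), not the input's address family, and that the IPv6-to-IPv4 direction can fail when the address has no IPv4 equivalent.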
Willy Tarreau	6a0b6bd648	BUG/MAJOR: counters: check for null-deref when looking up an alternate table Constructions such as sc0_get_gpc0(foo) allow to look up the same key as the current key but in an alternate table. A check was missing to ensure we already have a key, resulting in a crash if this lookup is performed before the associated track-sc rule. This bug was reported on the mailing list by Neil@iamafreeman and narrowed down further by Lukas Tribus and Thierry Fournier. This bug was introduced in 1.5-dev20 by commit "0f791d4 MEDIUM: counters: support looking up a key in an alternate table".	2014-04-09 13:32:11 +02:00
Bertrand Jacquin	702d44f2ff	MEDIUM: proxy: support use_backend with dynamic names We have a use case where we look up a customer ID in an HTTP header and direct it to the corresponding server. This can easily be done using ACLs and use_backend rules, but the configuration becomes painful to maintain when the number of customers grows to a few tens or even several hundred. We realized it would be nice if we could make the use_backend resolve its name at run time instead of config parsing time, and use a similar expression as http-request add-header to decide on the proper backend to use. This permits the use of prefixes or even complex names in backend expressions. If no name matches, then the default backend is used. Doing so allowed us to get rid of all the use_backend rules. Since there are some config checks on the use_backend rules to see if the referenced backend exists, we want to keep them to detect config errors in normal config. So this patch does not modify the default behaviour and proceeds this way : - if the backend name in the use_backend directive parses as a log format rule, it's used as-is and is resolved at run time ; - otherwise it's a static name which must be valid at config time. There was the possibility of doing this with the use-server directive instead of use_backend, but it seems like use_backend is more suited to this task, as it can be used for other purposes. For example, it becomes easy to serve a customer-specific proxy.pac file based on the customer ID by abusing the errorfile primitive : use_backend bk_cust_%[hdr(X-Cust-Id)] if { hdr(X-Cust-Id) -m found } default_backend bk_err_404 backend bk_cust_1 errorfile 200 /etc/haproxy/static/proxy.pac.cust1 Signed-off-by: Bertrand Jacquin <bjacquin@exosec.fr>	2014-03-31 10:18:30 +02:00
Thierry FOURNIER	a47a94fb13	MINOR: session: don't always assume there's a listener For outgoing connections initiated from an applet, there might not be any listener. It's the case with peers, which resort to a hack consisting in making the session's listener point to the peer. This listener is only used for statistics now so it's much easier to check for its presence now.	2014-03-28 13:16:32 +01:00
Willy Tarreau	7519560767	MINOR: http: release compression context only in http_end_txn() Currently there are two places where the compression context is released, one in session_free() and another one in http_end_txn_clean_session(). Both of them call http_end_txn(), either directly or via http_reset_txn(), and this function is made for this exact purpose. So let's centralize the call there instead.	2014-03-14 19:26:20 +01:00
Bhaskar Maddala	a20cb85eba	MINOR: stats: Enhancement to stats page to provide information of last session time. Summary: Track and report last session time on the stats page for each server in every backend, as well as the backend. This attempts to address the requirement in the ROADMAP - add a last activity date for each server (req/resp) that will be displayed in the stats. It will be useful with soft stop. The stats page reports this as time elapsed since last session. This change does not adequately address the requirement for long running session (websocket, RDP... etc).	2014-02-08 01:19:58 +01:00
Willy Tarreau	a23ee3a2ea	MINOR: session: clean up the connection free code Use conn_free() instead of pool_free2(conn...). This makes the code more auditable.	2014-02-05 00:18:47 +01:00
Willy Tarreau	818dca5098	BUG/MEDIUM: listener: improve detection of non-working accept4() On ARM, glibc does not implement accept4() and simply returns ENOSYS which was not caught as a reason to fall back to accept(), resulting in a spinning process since poll() would call again. Let's change the error detection mechanism to save the broken status of the syscall into a local variable that is used to fall back to the legacy accept(). In addition to this, since the code was becoming a bit messy, the accept4() was removed, so now the fallback code and the legacy code are the same. This will also increase bug report accuracy if needed. This is 1.5-specific, no backport is needed.	2014-01-31 19:40:19 +01:00
Willy Tarreau	cc08d2c9ff	MEDIUM: counters: stop relying on session flags at all Till now, we had one flag per stick counter to indicate if it was tracked in a backend or in a frontend. We just had to add another flag per stick-counter to indicate if it relies on contents or just connection. These flags are quite painful to maintain and tend to easily conflict with other flags if their number is changed. The correct solution consists in moving the flags to the stkctr struct itself, but currently this struct is made of 2 pointers, so adding a new entry there to store only two bits will cause at least 16 more bytes to be eaten per counter due to alignment issues, and we definitely don't want to waste tens to hundreds of bytes per session just for things that most users don't use. Since we only need to store two bits per counter, an intermediate solution consists in replacing the entry pointer with a composite value made of the original entry pointer and the two flags in the 2 unused lower bits. If later a need for other flags arises, we'll have to store them in the struct. A few inline functions have been added to abstract the retrieval and assignment of the pointers and flags, resulting in very few changes. That way there is no more dependence on the number of stick-counters and their position in the session flags.	2014-01-28 23:34:45 +01:00
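The "composite value" approach from commit cc08d2c9ff above, storing two flag bits in the unused low bits of an aligned pointer, can be sketched as follows. The names `tagged_pack()`, `tagged_ptr()` and `tagged_flags()` are hypothetical and are not the inline functions added by the commit; the sketch assumes the pointed-to object is at least 4-byte aligned.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_MASK ((uintptr_t)3)  /* the two lowest bits hold the flags */

/* Hypothetical helper: combine an aligned pointer and up to two flag bits
 * into a single word. */
static inline uintptr_t tagged_pack(void *ptr, unsigned flags)
{
    /* The scheme only works if the low bits of the pointer are free. */
    assert(((uintptr_t)ptr & TAG_MASK) == 0);
    return (uintptr_t)ptr | (flags & TAG_MASK);
}

/* Hypothetical helper: recover the original pointer by masking the flags. */
static inline void *tagged_ptr(uintptr_t v)
{
    return (void *)(v & ~TAG_MASK);
}

/* Hypothetical helper: recover the flag bits. */
static inline unsigned tagged_flags(uintptr_t v)
{
    return (unsigned)(v & TAG_MASK);
}
```

The trade-off matches the one stated in the commit message: the struct keeps its size, at the cost of every access going through small accessors, and only two flags fit before a real field becomes necessary.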
Willy Tarreau	e9101695ef	BUG/MEDIUM: counters: fix stick-table entry leak when using track-sc2 in connection In 1.5-dev19, commit `e25c917` ("MEDIUM: counters: add support for tracking a third counter") introduced the third track counter. However, there was a hard-coded test in the accept() error path to release only sc0 and sc1. So it seems that if tracking sc2 at the connection level and deciding to reject once the track-sc2 has been done, there could be some leaking of stick-table entries which remain marked used forever, thus which can never be purged nor expired. There's no memory leak though, it's just that entries are unexpirable forever. The simple solution consists in removing the test and always calling the inline function which iterates over all entries.	2014-01-28 23:32:50 +01:00
Willy Tarreau	1f0da2485e	BUG/MEDIUM: unique_id: HTTP request counter is not stable Patrick Hemmer reported that using unique_id_format and logs did not report the same unique ID counter since commit `9f09521` ("BUG/MEDIUM: unique_id: HTTP request counter must be unique!"). This is because the increment was done while producing the log message, so it was performed twice. A better solution consists in fetching a new value once per request and saving it in the request or session context for all of this request's life. It happens that sessions already have a unique ID field which is used for debugging and reporting errors, and which differs from the one sent in logs and unique_id header. So let's change this to reuse this field to have coherent IDs everywhere. As of now, a session gets a new unique ID once it is instantiated. This means that TCP sessions will also benefit from a unique ID that can be logged. And this ID is renewed for each extra HTTP request received on an existing session. Thus, all TCP sessions and HTTP requests will have distinct IDs that will be stable along all their life, and coherent between all places where they're used (logs, unique_id header, "show sess", "show errors"). This feature is 1.5-specific, no backport to 1.4 is needed.	2014-01-25 11:07:06 +01:00
Willy Tarreau	2b028dd828	OPTIM: session: put unlikely() around the freewheeling code The code which enables tunnel mode or TCP transfers is rarely used and at most once per session. Putting it in an unlikely() clause reduces the length of the hot path of process_session() which is already quite long, and also slightly reduces its overall size. Some measurements show a steady gain of about 0.2% thanks to this.	2013-12-31 23:56:46 +01:00
Willy Tarreau	9e5a3aacf4	MEDIUM: stream-int: make si_connect() return an established state when possible si_connect() used to only return SI_ST_CON. But it already detects connection reuse and is the function which avoids calling connect(). So it already knows the connection is valid and reused. Thus we make it return SI_ST_EST when a connection is reused. This means that connect_server() can return this state and sess_update_stream_int() as well. Thanks to this change, we don't need to leave process_session() in SI_ST_CON state to immediately enter it again to switch to SI_ST_EST. Implementing this removes one call to process_session() per request in keep-alive mode. We're now at 2 calls per request, which is the minimum (one for the request and another one for the response). The number of calls to http_wait_for_response() has also dropped from 2 to one. Tests indicate a performance gain of about 2.6% in request rate in keep-alive mode. There should be no gain in http-server-close() since we don't use this faster path.	2013-12-31 23:32:12 +01:00
Willy Tarreau	b44c873d61	MEDIUM: session: prepare to support earlier transitions to the established state At the moment it is possible in sess_prepare_conn_req() to switch to the established state when the target is an applet. But sess_update_stream_int() will soon also have the ability to set the established state via connect_server() when a connection is reused, leading to a synchronous connect. So prepare the code to handle this SI_ST_ASS -> SI_ST_EST transition, which really matches what's done in the lower layers.	2013-12-31 23:16:50 +01:00
Willy Tarreau	0e37f1c40e	MINOR: session: factor out the connect time measurement Currently there are 3 places in the code where t_connect is set after switching to state SI_ST_EST, and a fourth one will soon come. Since all these places lead to an immediate call to sess_establish() to complete the session establishment, better move that measurement there.	2013-12-31 23:06:46 +01:00


447 Commits