This class of functions permit to access to all the functions
associated with the transaction like http header, HAProxy internal
fetches, etc ...
This patch puts the skeleton of this class. The class will be
enhanced later.
Later, the processing of some actions needs to be interrupted and resumed
later. This patch permit to resume the actions. The actions that needs
to run with the resume mode are not yet avalaible. It will be soon with
Lua patches. So the code added by this patch is untestable for the moment.
The list of "tcp_exec_req_rules" cannot resme because is called by the
unresumable function "accept_session".
This behavior is already existing for the "WAIT_HTTP" analyzer,
this patch just extends the system to any analyzer that would
be waked up on response activity.
We've already experimented with three wake up algorithms when releasing
buffers : the first naive one used to wake up far too many sessions,
causing many of them not to get any buffer. The second approach which
was still in use prior to this patch consisted in waking up either 1
or 2 sessions depending on the number of FDs we had released. And this
was still inaccurate. The third one tried to cover the accuracy issues
of the second and took into consideration the number of FDs the sessions
would be willing to use, but most of the time we ended up waking up too
many of them for nothing, or deadlocking by lack of buffers.
This patch completely removes the need to allocate two buffers at once.
Instead it splits allocations into critical and non-critical ones and
implements a reserve in the pool for this. The deadlock situation happens
when all buffers are be allocated for requests pending in a maxconn-limited
server queue, because then there's no more way to allocate buffers for
responses, and these responses are critical to release the servers's
connection in order to release the pending requests. In fact maxconn on
a server creates a dependence between sessions and particularly between
oldest session's responses and latest session's requests. Thus, it is
mandatory to get a free buffer for a response in order to release a
server connection which will permit to release a request buffer.
Since we definitely have non-symmetrical buffers, we need to implement
this logic in the buffer allocation mechanism. What this commit does is
implement a reserve of buffers which can only be allocated for responses
and that will never be allocated for requests. This is made possible by
the requester indicating how much margin it wants to leave after the
allocation succeeds. Thus it is a cooperative allocation mechanism : the
requester (process_session() in general) prefers not to get a buffer in
order to respect other's need for response buffers. The session management
code always knows if a buffer will be used for requests or responses, so
that is not difficult :
- either there's an applet on the initiator side and we really need
the request buffer (since currently the applet is called in the
context of the session)
- or we have a connection and we really need the response buffer (in
order to support building and sending an error message back)
This reserve ensures that we don't take all allocatable buffers for
requests waiting in a queue. The downside is that all the extra buffers
are really allocated to ensure they can be allocated. But with small
values it is not an issue.
With this change, we don't observe any more deadlocks even when running
with maxconn 1 on a server under severely constrained memory conditions.
The code becomes a bit tricky, it relies on the scheduler's run queue to
estimate how many sessions are already expected to run so that it doesn't
wake up everyone with too few resources. A better solution would probably
consist in having two queues, one for urgent requests and one for normal
requests. A failed allocation for a session dealing with an error, a
connection event, or the need for a response (or request when there's an
applet on the left) would go to the urgent request queue, while other
requests would go to the other queue. Urgent requests would be served
from 1 entry in the pool, while the regular ones would be served only
according to the reserve. Despite not yet having this, it works
remarkably well.
This mechanism is quite efficient, we don't perform too many wake up calls
anymore. For 1 million sessions elapsed during massive memory contention,
we observe about 4.5M calls to process_session() compared to 4.0M without
memory constraints. Previously we used to observe up to 16M calls, which
rougly means 12M failures.
During a test run under high memory constraints (limit enforced to 27 MB
instead of the 58 MB normally needed), performance used to drop by 53% prior
to this patch. Now with this patch instead it *increases* by about 1.5%.
The best effect of this change is that by limiting the memory usage to about
2/3 to 3/4 of what is needed by default, it's possible to increase performance
by up to about 18% mainly due to the fact that pools are reused more often
and remain hot in the CPU cache (observed on regular HTTP traffic with 20k
objects, buffers.limit = maxconn/10, buffers.reserve = limit/2).
Below is an example of scenario which used to cause a deadlock previously :
- connection is received
- two buffers are allocated in process_session() then released
- one is allocated when receiving an HTTP request
- the second buffer is allocated then released in process_session()
for request parsing then connection establishment.
- poll() says we can send, so the request buffer is sent and released
- process session gets notified that the connection is now established
and allocates two buffers then releases them
- all other sessions do the same till one cannot get the request buffer
without hitting the margin
- and now the server responds. stream_interface allocates the response
buffer and manages to get it since it's higher priority being for a
response.
- but process_session() cannot allocate the request buffer anymore
=> We could end up with all buffers used by responses so that none may
be allocated for a request in process_session().
When the applet processing leaves the session context, the test will have
to be changed so that we always allocate a response buffer regardless of
the left side (eg: H2->H1 gateway). A final improvement would consists in
being able to only retry the failed I/O operation without waking up a
task, but to date all experiments to achieve this have proven not to be
reliable enough.
A session doesn't need buffers all the time, especially when they're
empty. With this patch, we don't allocate buffers anymore when the
session is initialized, we only allocate them in two cases :
- during process_session()
- during I/O operations
During process_session(), we try hard to allocate both buffers at once
so that we know for sure that a started operation can complete. Indeed,
a previous version of this patch used to allocate one buffer at a time,
but it can result in a deadlock when all buffers are allocated for
requests for example, and there's no buffer left to emit error responses.
Here, if any of the buffers cannot be allocated, the whole operation is
cancelled and the session is added at the tail of the buffer wait queue.
At the end of process_session(), a call to session_release_buffers() is
done so that we can offer unused buffers to other sessions waiting for
them.
For I/O operations, we only need to allocate a buffer on the Rx path.
For this, we only allocate a single buffer but ensure that at least two
are available to avoid the deadlock situation. In case buffers are not
available, SI_FL_WAIT_ROOM is set on the stream interface and the session
is queued. Unused buffers resulting either from a successful send() or
from an unused read buffer are offered to pending sessions during the
->wake() callback.
When a session_alloc_buffers() fails to allocate one or two buffers,
it subscribes the session to buffer_wq, and waits for another session
to release buffers. It's then removed from the queue and woken up with
TASK_WAKE_RES, and can attempt its allocation again.
We decide to try to wake as many waiters as we release buffers so
that if we release 2 and two waiters need only once, they both have
their chance. We must never come to the situation where we don't wake
enough tasks up.
It's common to release buffers after the completion of an I/O callback,
which can happen even if the I/O could not be performed due to half a
failure on memory allocation. In this situation, we don't want to move
out of the wait queue the session that was just added, otherwise it
will never get any buffer. Thus, we only force ourselves out of the
queue when freeing the session.
Note: at the moment, since session_alloc_buffers() is not used, no task
is subscribed to the wait queue.
This patch introduces session_alloc_recv_buffer(), session_alloc_buffers()
and session_release_buffers() whose purpose will be to allocate missing
buffers and release unneeded ones around the process_session() and during
I/O operations.
I/O callbacks only need a single buffer for recv operations and none
for send. However we still want to ensure that we don't pick the last
buffer. That's what session_alloc_recv_buffer() is for.
This allocator is atomic in that it always ensures we can get 2 buffers
or fails. Here, if any of the buffers is not ready and cannot be
allocated, the operation is cancelled. The purpose is to guarantee that
we don't enter into the deadlock where all buffers are allocated by the
same size of all sessions.
A queue will have to be implemented for failed allocations. For now
they're just reported as failures.
We'll soon want to release buffers together upon failure so we need to
allocate them after the channels. Let's change this now. There's no
impact on the behaviour, only the error path is unrolled slightly
differently. The same was done in peers.
We don't call pool_free2(pool2_buffers) anymore, we only call b_free()
to do the job. This ensures that we can start to centralize the releasing
of buffers.
It's not clean to initialize the buffer before the channel since it
dereferences one pointer in the channel. Also we'll want to let the
channel pre-initialize the buffer, so let's ensure that the channel
is always initialized prior to the buffers.
b_alloc() now allocates a buffer and initializes it to the size specified
in the pool minus the size of the struct buffer itself. This ensures that
callers do not need to care about buffer details anymore. Also this never
applies memory poisonning, which is slow and useless on buffers.
We'll soon need to be able to switch buffers without touching the
channel, so let's move buffer initialization out of channel_init().
We had the same in compressoin.c.
Since embryonic sessions were introduced in 1.5-dev12 with commit
2542b53 ("MAJOR: session: introduce embryonic sessions"), a major
bug remained present. If haproxy cannot allocate memory during
session_complete() (for example, no more buffers), it will not
unlink the new session from the sessions list. This will cause
memory corruptions if the memory area from the session is reused
for anything else, and may also cause bogus output on "show sess"
on the CLI.
This fix must be backported to 1.5.
This patch makes it possible to create binds and servers in separate
namespaces. This can be used to proxy between multiple completely independent
virtual networks (with possibly overlapping IP addresses) and a
non-namespace-aware proxy implementation that supports the proxy protocol (v2).
The setup is something like this:
net1 on VLAN 1 (namespace 1) -\
net2 on VLAN 2 (namespace 2) -- haproxy ==== proxy (namespace 0)
net3 on VLAN 3 (namespace 3) -/
The proxy is configured to make server connections through haproxy and sending
the expected source/target addresses to haproxy using the proxy protocol.
The network namespace setup on the haproxy node is something like this:
= 8< =
$ cat setup.sh
ip netns add 1
ip link add link eth1 type vlan id 1
ip link set eth1.1 netns 1
ip netns exec 1 ip addr add 192.168.91.2/24 dev eth1.1
ip netns exec 1 ip link set eth1.$id up
...
= 8< =
= 8< =
$ cat haproxy.cfg
frontend clients
bind 127.0.0.1:50022 namespace 1 transparent
default_backend scb
backend server
mode tcp
server server1 192.168.122.4:2222 namespace 2 send-proxy-v2
= 8< =
A bind line creates the listener in the specified namespace, and connections
originating from that listener also have their network namespace set to
that of the listener.
A server line either forces the connection to be made in a specified
namespace or may use the namespace from the client-side connection if that
was set.
For more documentation please read the documentation included in the patch
itself.
Signed-off-by: KOVACS Tamas <ktamas@balabit.com>
Signed-off-by: Sarkozi Laszlo <laszlo.sarkozi@balabit.com>
Signed-off-by: KOVACS Krisztian <hidden@balabit.com>
We currently release all pools when a proxy is stopped, except the
connection, pendconn, and pipe pools. Doing so can improve further
reduce memory usage of old processes, eventhough the connection struct
is quite small, but there are a lot and they can participate to memory
fragmentation. The pipe pool is very small and limited, and not exported
so it's not done here.
stktable_fetch_key() does not indicate whether it returns NULL because
the input sample was not found or because it's unstable. It causes trouble
with track-sc* rules. Just like with sample_fetch_string(), we want it to
be able to give more information to the caller about what it found. Thus,
now we use the pointer to a sample passed by the caller, and fill it with
the information we have about the sample. That way, even if we return NULL,
the caller has the ability to check whether a sample was found and if it is
still changing or not.
This is the 3rd regression caused by the changes below. The latest to
date was reported by Finn Arne Gangstad. If a server responds with no
content-length and the client's FIN is never received, either we leak
the client-side FD or we spin at 100% CPU if timeout client-fin is set.
Enough is enough. The amount of tricks needed to cover these side-effects
starts to look like used toilet paper stacked over a chocolate cake. I
don't want to eat that cake anymore!
All this to avoid reporting a server-side timeout when a client stops
uploading data and haproxy expires faster than the server... A lot of
"ifs" resulting in a technically valid log that doesn't always please
users, and whose alternative causes that many issues for all others
users.
So let's revert this crap merged since 1.5-dev25 :
Revert "CLEANUP: http: don't clear CF_READ_NOEXP twice"
This reverts commit 1592d1e72a.
Revert "BUG/MEDIUM: http: clear CF_READ_NOEXP when preparing a new transaction"
This reverts commit 77d29029af.
Revert "BUG/MEDIUM: session: don't clear CF_READ_NOEXP if analysers are not called"
This reverts commit 0943757a21.
Revert "BUG/MEDIUM: http: disable server-side expiration until client has sent the body"
This reverts commit 3bed5e9337.
Revert "BUG/MEDIUM: http: correctly report request body timeouts"
This reverts commit b9edf8fbec.
Revert "BUG/MEDIUM: http/session: disable client-side expiration only after body"
This reverts commit b1982e27aa.
If a cleaner AND SAFER way to do something equivalent in 1.6-dev, we *might*
consider backporting it to 1.5, but given the vicious bugs that have surfaced
since, I doubt it will happen any time soon.
Fortunately, that crap never made it into 1.4 so no backport is needed.
Using the last rate counters, we now compute the queue, connect, response
and total times per server and per backend with a 95% accuracy over the last
1024 samples. The operation is cheap so we don't need to condition it.
As discussed with Dmitry Sivachenko, is a server farm has more than one
active server, uses a guaranteed non-determinist algorithm (round robin),
and a connection was initiated from a non-persistent connection, there's
no point insisting to reconnect to the same server after a connect failure,
better redispatch upon the very first retry instead of insisting on the same
server multiple times.
The retry delay is only useful when sticking to a same server. During
a redispatch, it's useless and counter-productive if we're sure to
switch to another server, which is almost guaranteed when there's
more than one server and the balancing algorithm is round robin, so
better not pass via the turn-around state in this case. It could be
done as well for leastconn, but there's a risk of always killing the
delay after the recovery of a server in a farm where it's almost
guaranteed to take most incoming traffic. So better only kill the
delay when using round robin.
As discussed with Dmitry Sivachenko, the default 1-second connect retry
delay can be large for situations where the connect timeout is much smaller,
because it means that an active connection reject will take more time to be
retried than a silent drop, and that does not make sense.
This patch changes this so that the retry delay is the minimum of 1 second
and the connect timeout. That way people running with sub-second connect
timeout will benefit from the shorter reconnect.
Servers used to have 3 flags to store a state, now they have 4 states
instead. This avoids lots of confusion for the 4 remaining undefined
states.
The encoding from the previous to the new states can be represented
this way :
SRV_STF_RUNNING
| SRV_STF_GOINGDOWN
| | SRV_STF_WARMINGUP
| | |
0 x x SRV_ST_STOPPED
1 0 0 SRV_ST_RUNNING
1 0 1 SRV_ST_STARTING
1 1 x SRV_ST_STOPPING
Note that the case where all bits were set used to exist and was randomly
dealt with. For example, the task was not stopped, the throttle value was
still updated and reported in the stats and in the http_server_state header.
It was the same if the server was stopped by the agent or for maintenance.
It's worth noting that the internal function names are still quite confusing.
Till now, the server's state and flags were all saved as a single bit
field. It causes some difficulties because we'd like to have an enum
for the state and separate flags.
This commit starts by splitting them in two distinct fields. The first
one is srv->state (with its counter-part srv->prev_state) which are now
enums, but which still contain bits (SRV_STF_*).
The flags now lie in their own field (srv->flags).
The function srv_is_usable() was updated to use the enum as input, since
it already used to deal only with the state.
Note that currently, the maintenance mode is still in the state for
simplicity, but it must move as well.
As more or less suspected, commit b1982e2 ("BUG/MEDIUM: http/session:
disable client-side expiration only after body") was hazardous. It
introduced a regression causing client side timeout to expire during
connection retries if it's lower than the time needed to cover the
amount of retries, so clients get a 408 when the connection to the
server fails to establish fast enough.
The reason is that the CF_READ_NOEXP flag is set after the MSG_DONE state
is reached, which protects the timeout from being re-armed, then during
the retries, process_session() clears the flag without calling the analyser
(since there's no activity for it), so the timeouts are rearmed.
Ideally, these one-shot flags should be per-analyser, and the analyser
which sets them would be responsible for clearing them, or they would
automatically be cleared when switching to another analyser. Unfortunately
this is not really possible currently.
What can be done however is to only clear them in the following situations :
- we're going to call analysers
- analysers have all been unsubscribed
This method seems reliable enough and approaches the ideal case well enough.
No backport is needed, this bug was introduced in 1.5-dev25.
Long-lived sessions are often subject to half-closed sessions resulting in
a lot of sessions appearing in FIN_WAIT state in the system tables, and no
way for haproxy to get rid of them. This typically happens because clients
suddenly disconnect without sending any packet (eg: FIN or RST was lost in
the path), and while the server detects this using an applicative heart
beat, haproxy does not close the connection.
This patch adds two new timeouts : "timeout client-fin" and
"timeout server-fin". The former allows one to override the client-facing
timeout when a FIN has been received or sent. The latter does the same for
server-facing connections, which is less useful.
John-Paul Bader reported a nasty segv which happens after a few hours
when SSL is enabled under a high load. Fortunately he could catch a
stack trace, systematically looking like this one :
(gdb) bt full
level = 6
conn = (struct connection *) 0x0
err_msg = <value optimized out>
s = (struct session *) 0x80337f800
conn = <value optimized out>
flags = 41997063
new_updt = <value optimized out>
old_updt = 1
e = <value optimized out>
status = 0
fd = 53999616
nbfd = 279
wait_time = <value optimized out>
updt_idx = <value optimized out>
en = <value optimized out>
eo = <value optimized out>
count = 78
sr = <value optimized out>
sw = <value optimized out>
rn = <value optimized out>
wn = <value optimized out>
The variable "flags" in conn_fd_handler() holds a copy of connection->flags
when entering the function. These flags indicate 41997063 = 0x0280d307 :
- {SOCK,DATA,CURR}_RD_ENA=1 => it's a handshake, waiting for reading
- {SOCK,DATA,CURR}_WR_ENA=0 => no need for writing
- CTRL_READY=1 => FD is still allocated
- XPRT_READY=1 => transport layer is initialized
- ADDR_FROM_SET=1, ADDR_TO_SET=0 => clearly it's a frontend connection
- INIT_DATA=1, WAKE_DATA=1 => processing a handshake (ssl I guess)
- {DATA,SOCK}_{RD,WR}_SH=0 => no shutdown
- ERROR=0, CONNECTED=0 => handshake not completed yet
- WAIT_L4_CONN=0 => normal
- WAIT_L6_CONN=1 => waiting for an L6 handshake to complete
- SSL_WAIT_HS=1 => the pending handshake is an SSL handshake
So this is a handshake is in progress. And the only way to reach line 88
is for the handshake to complete without error. So we know for sure that
ssl_sock_handshake() was called and completed the handshake then removed
the CO_FL_SSL_WAIT_HS flag from the connection. With these flags,
ssl_sock_handshake() does only call SSL_do_handshake() and retruns. So
that means that the problem is necessarily in data->init().
The fd is wrong as reported but is simply mis-decoded as it's the lower
half of the last function pointer.
What happens in practice is that there's an issue with the way we deal
with embryonic sessions during their conversion to regular sessions.
Since they have no stream interface at the beginning, the pointer to
the connection is temporarily stored into s->target. Then during their
conversion, the first stream interface is properly initialized and the
connection is attached to it, then s->target is set to NULL.
The problem is that if anything fails in session_complete(), the
session is left in this intermediate state where s->target is NULL,
and kill_mini_session() is called afterwards to perform the cleanup.
It needs the connection, that it finds in s->target which is NULL,
dereferences it and dies. The only reasons for dying here are a problem
on the TCP connection when doing the setsockopt(TCP_NODELAY) or a
memory allocation issue.
This patch implements a solution consisting in restoring s->target in
session_complete() on the error path. That way embryonic sessions that
were valid before calling it are still valid after.
The bug was introduced in 1.5-dev20 by commit f8a49ea ("MEDIUM: session:
attach incoming connection to target on embryonic sessions"). No backport
is needed.
Special thanks to John for his numerous tests and traces.
For a very long time, back in the v1.3 days, we used to rely on a trick
to avoid expiring the client side while transferring a payload to the
server. The problem was that if a client was able to quickly fill the
buffers, and these buffers took some time to reach the server, the
client should not expire while not sending anything.
In order to cover this situation, the client-side timeout was disabled
once the connection to the server was OK, since it implied that we would
at least expire on the server if required.
But there is a drawback to this : if a client stops uploading data before
the end, its timeout is not enforced and we only expire on the server's
timeout, so the logs report a 504.
Since 1.4, we have message body analysers which ensure that we know whether
all the expected data was received or not (HTTP_MSG_DATA or HTTP_MSG_DONE).
So we can fix this problem by disabling the client-side or server-side
timeout at the end of the transfer for the respective side instead of
having it unconditionally in session.c during all the transfer.
With this, the logs now report the correct side for the timeout. Note that
this patch is not enough, because another issue remains : the HTTP body
forwarders do not abort upon timeout, they simply rely on the generic
handling from session.c. So for now, the session is still aborted when
reaching the server timeout, but the culprit is properly reported. A
subsequent patch will address this specific point.
This bug was tagged MEDIUM because of the changes performed. The issue
it fixes is minor however. After some cooling down, it may be backported
to 1.4.
It was reported by and discussed with Rachel Chavez and Patrick Hemmer
on the mailing list.
On the mailing list, seri0528@naver.com reported an issue when
using balance url_param or balance uri. The request would sometimes
stall forever.
Cyril Bont managed to reproduce it with the configuration below :
listen test :80
mode http
balance url_param q
hash-type consistent
server s demo.1wt.eu:80
and found it appeared with this commit : 80a92c0 ("BUG/MEDIUM: http:
don't start to forward request data before the connect").
The bug is subtle but real. The problem is that the HTTP request
forwarding analyzer refrains from starting to parse the request
body when some LB algorithms might need the body contents, in order
to preserve the data pointer and avoid moving things around during
analysis in case a redispatch is later needed. And in order to detect
that the connection establishes, it watches the response channel's
CF_READ_ATTACHED flag.
The problem is that a request analyzer is not subscribed to a response
channel, so it will only see changes when woken for other (generally
correlated) reasons, such as the fact that part of the request could
be sent. And since the CF_READ_ATTACHED flag is cleared once leaving
process_session(), it is important not to miss it. It simply happens
that sometimes the server starts to respond in a sequence that validates
the connection in the middle of process_session(), that it is detected
after the analysers, and that the newly assigned CF_READ_ATTACHED is
not used to detect that the request analysers need to be called again,
then the flag is lost.
The CF_WAKE_WRITE flag doesn't work either because it's cleared upon
entry into process_session(), ie if we spend more than one call not
connecting.
Thus we need a new flag to tell the connection initiator that we are
specifically interested in being notified about connection establishment.
This new flag is CF_WAKE_CONNECT. It is set by the requester, and is
cleared once the connection succeeds, where CF_WAKE_ONCE is set instead,
causing the request analysers to be scanned again.
For future versions, some better options will have to be considered :
- let all analysers subscribe to both request and response events ;
- let analysers subscribe to stream interface events (reduces number
of useless calls)
- change CF_WAKE_WRITE's semantics to persist across calls to
process_session(), but that is different from validating a
connection establishment (eg: no data sent, or no data to send)
The bug was introduced in 1.5-dev23, no backport is needed.
Since it became possible to use log-format expressions in use_backend,
having a mandatory condition becomes annoying because configurations
are full of "if TRUE". Let's relax the check to accept no condition
like many other keywords (eg: redirect).
Cyril Bont reported that the "lastsess" field of a stats-only backend
was never updated. In fact the same is true for any applet and anything
not a server. Also, lastsess was not updated for a server reusing its
connection for a new request.
Since the goal of this field is to report recent activity, it's better
to ensure that all accesses are reported. The call has been moved to
the code validating the session establishment instead, since everything
passes there.
Since commit 4d4149c ("MEDIUM: counters: support passing the counter
number as a fetch argument"), the sample fetch sc_tracked(num) became
equivalent to sc[0-9]_tracked, by using the same smp_fetch_sc_tracked()
function.
This was theorically made possible after the series of changes starting
with commit a65536ca ("MINOR: counters: provide a generic function to
retrieve a stkctr for sc* and src."). Unfortunately, while all other
functions were changed to use the generic primitive smp_fetch_sc_stkctr(),
smp_fetch_sc_tracked() was forgotten and is not able to differentiate
between sc_tracked, src_tracked and sc[0-9]_tracked. The resulting mess is
that if sc_tracked is used, the counter number is assumed to be 47 because
that's what remains after subtracting "0" from char "_".
Fix this by simply relying on the generic function as should have been
done. The bug was introduced in 1.5-dev20. No backport is needed.
The function addr_to_stktable_key doesn't consider the expected
type of key. If the stick table key is based on IPv6 addresses
and the input is IPv4, the returned key is IPv4 adddress and his
length is 4 bytes, while is expected 16 bytes key.
This patch considers the expected key and try to convert IPv4 to
IPv6 and IPv6 to IPv4 according with the expected key.
This fixes the bug reported by Apollon Oikonomopoulos.
This bug was introduced somewhere in the 1.5-dev process.
Constructions such as sc0_get_gpc0(foo) allow to look up the same key as
the current key but in an alternate table. A check was missing to ensure
we already have a key, resulting in a crash if this lookup is performed
before the associated track-sc rule.
This bug was reported on the mailing list by Neil@iamafreeman and
narrowed down further by Lukas Tribus and Thierry Fournier.
This bug was introduced in 1.5-dev20 by commit "0f791d4 MEDIUM: counters:
support looking up a key in an alternate table".
We have a use case where we look up a customer ID in an HTTP header
and direct it to the corresponding server. This can easily be done
using ACLs and use_backend rules, but the configuration becomes
painful to maintain when the number of customers grows to a few
tens or even a several hundreds.
We realized it would be nice if we could make the use_backend
resolve its name at run time instead of config parsing time, and
use a similar expression as http-request add-header to decide on
the proper backend to use. This permits the use of prefixes or
even complex names in backend expressions. If no name matches,
then the default backend is used. Doing so allowed us to get rid
of all the use_backend rules.
Since there are some config checks on the use_backend rules to see
if the referenced backend exists, we want to keep them to detect
config errors in normal config. So this patch does not modify the
default behaviour and proceeds this way :
- if the backend name in the use_backend directive parses as a log
format rule, it's used as-is and is resolved at run time ;
- otherwise it's a static name which must be valid at config time.
There was the possibility of doing this with the use-server directive
instead of use_backend, but it seems like use_backend is more suited
to this task, as it can be used for other purposes. For example, it
becomes easy to serve a customer-specific proxy.pac file based on the
customer ID by abusing the errorfile primitive :
use_backend bk_cust_%[hdr(X-Cust-Id)] if { hdr(X-Cust-Id) -m found }
default_backend bk_err_404
backend bk_cust_1
errorfile 200 /etc/haproxy/static/proxy.pac.cust1
Signed-off-by: Bertrand Jacquin <bjacquin@exosec.fr>
For outgoing connections initiated from an applet, there might not be
any listener. It's the case with peers, which resort to a hack consisting
in making the session's listener point to the peer. This listener is only
used for statistics now so it's much easier to check for its presence now.
Currently there are two places where the compression context is released,
one in session_free() and another one in http_end_txn_clean_session().
Both of them call http_end_txn(), either directly or via http_reset_txn(),
and this function is made for this exact purpose. So let's centralize the
call there instead.
Summary:
Track and report last session time on the stats page for each server
in every backend, as well as the backend.
This attempts to address the requirement in the ROADMAP
- add a last activity date for each server (req/resp) that will be
displayed in the stats. It will be useful with soft stop.
The stats page reports this as time elapsed since last session. This
change does not adequately address the requirement for long running
session (websocket, RDP... etc).
On ARM, glibc does not implement accept4() and simply returns ENOSYS
which was not caught as a reason to fall back to accept(), resulting
in a spinning process since poll() would call again.
Let's change the error detection mechanism to save the broken status
of the syscall into a local variable that is used to fall back to the
legacy accept().
In addition to this, since the code was becoming a bit messy, the
accept4() was removed, so now the fallback code and the legacy code
are the same. This will also increase bug report accuracy if needed.
This is 1.5-specific, no backport is needed.
Till now, we had one flag per stick counter to indicate if it was
tracked in a backend or in a frontend. We just had to add another
flag per stick-counter to indicate if it relies on contents or just
connection. These flags are quite painful to maintain and tend to
easily conflict with other flags if their number is changed.
The correct solution consists in moving the flags to the stkctr struct
itself, but currently this struct is made of 2 pointers, so adding a
new entry there to store only two bits will cause at least 16 more bytes
to be eaten per counter due to alignment issues, and we definitely don't
want to waste tens to hundreds of bytes per session just for things that
most users don't use.
Since we only need to store two bits per counter, an intermediate
solution consists in replacing the entry pointer with a composite
value made of the original entry pointer and the two flags in the
2 unused lower bits. If later a need for other flags arises, we'll
have to store them in the struct.
A few inline functions have been added to abstract the retrieval
and assignment of the pointers and flags, resulting in very few
changes. That way there is no more dependence on the number of
stick-counters and their position in the session flags.
In 1.5-dev19, commit e25c917 ("MEDIUM: counters: add support for tracking
a third counter") introduced the third track counter. However, there was
a hard-coded test in the accept() error path to release only sc0 and sc1.
So it seems that if tracking sc2 at the connection level and deciding to
reject once the track-sc2 has been done, there could be some leaking of
stick-table entries which remain marked used forever, thus which can never
be purged nor expired. There's no memory leak though, it's just that
entries are unexpirable forever.
The simple solution consists in removing the test and always calling
the inline function which iterates over all entries.
Patrick Hemmer reported that using unique_id_format and logs did not
report the same unique ID counter since commit 9f09521 ("BUG/MEDIUM:
unique_id: HTTP request counter must be unique!"). This is because
the increment was done while producing the log message, so it was
performed twice.
A better solution consists in fetching a new value once per request
and saving it in the request or session context for all of this
request's life.
It happens that sessions already have a unique ID field which is used
for debugging and reporting errors, and which differs from the one
sent in logs and unique_id header.
So let's change this to reuse this field to have coherent IDs everywhere.
As of now, a session gets a new unique ID once it is instanciated. This
means that TCP sessions will also benefit from a unique ID that can be
logged. And this ID is renewed for each extra HTTP request received on
an existing session. Thus, all TCP sessions and HTTP requests will have
distinct IDs that will be stable along all their life, and coherent
between all places where they're used (logs, unique_id header,
"show sess", "show errors").
This feature is 1.5-specific, no backport to 1.4 is needed.
The code which enables tunnel mode or TCP transfers is rarely used
and at most once per session. Putting it in an unlikely() clause
reduces the length of the hot path of process_session() which is
already quite long, and also slightly reduces its overall size.
Some measurements show a steady gain of about 0.2% thanks to this.
si_connect() used to only return SI_ST_CON. But it already detect the
connection reuse and is the function which avoids calling connect().
So it already knows the connection is valid and reuse. Thus we make it
return SI_ST_EST when a connection is reused. This means that
connect_server() can return this state and sess_update_stream_int()
as well.
Thanks to this change, we don't need to leave process_session() in
SI_ST_CON state to immediately enter it again to switch to SI_ST_EST.
Implementing this removes one call to process_session() per request
in keep-alive mode. We're now at 2 calls per request, which is the
minimum (one for the request and another one for the response). The
number of calls to http_wait_for_response() has also dropped from 2
to one.
Tests indicate a performance gain of about 2.6% in request rate in
keep-alive mode. There should be no gain in http-server-close() since
we don't use this faster path.
At the moment it is possible in sess_prepare_conn_req() to switch to the
established state when the target is an applet. But sess_update_stream_int()
will soon also have the ability to set the established state via
connect_server() when a connection is reused, leading to a synchronous
connect.
So prepare the code to handle this SI_ST_ASS -> SI_ST_EST transition, which
really matches what's done in the lower layers.
Currently there are 3 places in the code where t_connect is set after
switching to state SI_ST_EST, and a fourth one will soon come. Since
all these places lead to an immediate call to sess_establish() to
complete the session establishment, better move that measurement
there.