Some users want to add their own data types to stick tables. We don't
want to use a linked list here for performance reasons, so we need to
continue to use an indexed array. This patch allows one to reserve a
compile-time-defined number of extra data types by setting the new
macro STKTABLE_EXTRA_DATA_TYPES to anything greater than zero, keeping
in mind that anything larger will slightly inflate the memory consumed
by stick tables (not per entry though).
Then calling stktable_register_data_store() with the new keyword will
either register a new keyword or fail if the desired entry was already
taken or the keyword already registered.
Note that this patch does not dictate how the data will be used, it only
offers the possibility to create new keywords and have an index to
reference them in the config and in the tables. The caller will not be
able to use stktable_data_cast() and will have to explicitly cast the
stable pointers to the expected types. It can be used for experimentation
as well.
Daniel Dubovik reported an interesting bug showing that the request body
processing was still not 100% fixed. If a POST request contained short
enough data to be forwarded at once before trying to establish the
connection to the server, we had no way to correctly rewind the body.
The first visible case is that balancing on a header does not always work
on such POST requests since the header cannot be found. But there are even
nastier implications which are that http-send-name-header would apply to
the wrong location and possibly even affect part of the request's body
due to an incorrect rewinding.
There are two options to fix the problem :
- first one is to force the HTTP_MSG_F_WAIT_CONN flag on all hash-based
balancing algorithms and http-send-name-header, but there's always a
risk that any new algorithm forgets to set it ;
- the second option is to account for the amount of skipped data before
the connection establishes so that we always know the position of the
request's body relative to the buffer's origin.
The second option is much more reliable and fits very well in the spirit
of the past changes to fix forwarding. Indeed, at the moment we have
msg->sov which points to the start of the body before headers are forwarded
and which equals zero afterwards (so it still points to the start of the
body before forwarding data). A minor change consists in always making it
point to the start of the body even after data have been forwarded. It means
that it can get a negative value (so we need to change its type to signed)..
In order to avoid wrapping, we only do this as long as the other side of
the buffer is not connected yet.
Doing this definitely fixes the issues above for the requests. Since the
response cannot be rewound we don't need to perform any change there.
This bug was introduced/remained unfixed in 1.5-dev23 so the fix must be
backported to 1.5.
In order to fix the abstact socket pause mechanism during soft restarts,
we'll need to proceed differently depending on the socket protocol. The
pause_listener() function already supports some protocol-specific handling
for the TCP case.
This commit makes this cleaner by adding a new ->pause() function to the
protocol struct, which, if defined, may be used to pause a listener of a
given protocol.
For now, only TCP has been adapted, with the specific code moved from
pause_listener() to tcp_pause_listener().
With all the goodies supported by logformat, people find that the limit
of 1024 chars for log lines is too short. Some servers do not support
larger lines and can simply drop them, so changing the default value is
not always the best choice.
This patch takes a different approach. Log line length is specified per
log server on the "log" line, with a value between 80 and 65535. That
way it's possibly to satisfy all needs, even with some fat local servers
and small remote ones.
This value was set in log.h without any #ifndef around, so when one
wanted to change it, a patch was needed. Let's move it to defaults.h
with the usual #ifndef so that it's easier to change it.
This patch remove all references of standard regex in haproxy. The last
remaining references are only in the regex.[ch] files.
In the file src/checks.c, the original function uses a "pmatch" array.
In fact this array is unused. This patch remove it.
This patch adds two new actions to http-request and http-response rulesets :
- replace-header : replace a whole header line, suited for headers
which might contain commas
- replace-value : replace a single header value, suited for headers
defined as lists.
The match consists in a regex, and the replacement string takes a log-format
and supports back-references.
Using the last rate counters, we now compute the queue, connect, response
and total times per server and per backend with a 95% accuracy over the last
1024 samples. The operation is cheap so we don't need to condition it.
This is in order to simplify the PPv2 header parsing code to look more
like the one provided as an example in the spec. No code change was
performed beyond just merging the proxy_addr union into the proxy_hdr_v2
struct.
This patch adds support for captures with no header name. The purpose
is to allow extra captures to be defined and logged along with the
header captures.
When no static DH parameters are specified, this patch makes haproxy
use standardized (rfc 2409 / rfc 3526) DH parameters with prime lenghts
of 1024, 2048, 4096 or 8192 bits for DHE key exchange. The size of the
temporary/ephemeral DH key is computed as the minimum of the RSA/DSA server
key size and the value of a new option named tune.ssl.default-dh-param.
One important aspect of SSL performance tuning is the cache size,
but there's no metric to know whether it's large enough or not. This
commit introduces two counters, one for the cache lookups and another
one for cache misses. These counters are reported on "show info" on
the stats socket. This way, it suffices to see the cache misses
counter constantly grow to know that a larger cache could possibly
help.
It's commonly needed to know how many SSL asymmetric keys are computed
per second on either side (frontend or backend), and to know the SSL
session reuse ratio. Now we compute these values and report them in
"show info".
Agent will have the ability to return a weight without indicating an
up/down status. Currently this is not possible, so let's add a 5th
result CHK_RES_NEUTRAL for this purpose. It has been mapped to the
unused HCHK_STATUS_CHECKED which already serves as a neutral delimitor
between initiated checks and those returning a result.
Instead of enabling/disabling maintenance mode and drain mode separately
using 4 actions, we now offer 3 simplified actions :
- set state to READY
- set state to DRAIN
- set state to MAINT
They have the benefit of reporting the same state as displayed on the page,
and of doing the double-switch atomically eg when switching from drain to
maint.
Note that the old actions are still supported for users running scripts.
This patch adds support for a new "drain" mode. So now we have 3 admin
modes for a server :
- READY
- DRAIN
- MAINT
The drain mode disables load balancing but leaves the server up. It can
coexist with maint, except that maint has precedence. It is also inherited
from tracked servers, so just like maint, it's represented with 2 bits.
New functions were designed to set/clear each flag and to propagate the
changes to tracking servers when relevant, and to log the changes. Existing
functions srv_set_adm_maint() and srv_set_adm_ready() were replaced to make
use of the new functions.
Currently the drain mode is not yet used, however the whole logic was tested
with all combinations of set/clear of both flags in various orders to catch
all corner cases.
Servers used to have 3 flags to store a state, now they have 4 states
instead. This avoids lots of confusion for the 4 remaining undefined
states.
The encoding from the previous to the new states can be represented
this way :
SRV_STF_RUNNING
| SRV_STF_GOINGDOWN
| | SRV_STF_WARMINGUP
| | |
0 x x SRV_ST_STOPPED
1 0 0 SRV_ST_RUNNING
1 0 1 SRV_ST_STARTING
1 1 x SRV_ST_STOPPING
Note that the case where all bits were set used to exist and was randomly
dealt with. For example, the task was not stopped, the throttle value was
still updated and reported in the stats and in the http_server_state header.
It was the same if the server was stopped by the agent or for maintenance.
It's worth noting that the internal function names are still quite confusing.
Now we introduce srv->admin and srv->prev_admin which are bitfields
containing one bit per source of administrative status (maintenance only
for now). For the sake of backwards compatibility we implement a single
source (ADMF_FMAINT) but the code already checks any source (ADMF_MAINT)
where the STF_MAINTAIN bit was previously checked. This will later allow
us to add ADMF_IMAINT for maintenance mode inherited from tracked servers.
Along doing these changes, it appeared that some places will need to be
revisited when implementing the inherited bit, this concerns all those
modifying the ADMF_FMAINT bit (enable/disable actions on the CLI or stats
page), and the checks to report "via" on the stats page. But currently
the code is harmless.
Till now, the server's state and flags were all saved as a single bit
field. It causes some difficulties because we'd like to have an enum
for the state and separate flags.
This commit starts by splitting them in two distinct fields. The first
one is srv->state (with its counter-part srv->prev_state) which are now
enums, but which still contain bits (SRV_STF_*).
The flags now lie in their own field (srv->flags).
The function srv_is_usable() was updated to use the enum as input, since
it already used to deal only with the state.
Note that currently, the maintenance mode is still in the state for
simplicity, but it must move as well.
When run in daemon mode (i.e. with at least one forked process) and using
the epoll poller, sending USR1 (graceful shutdown) to the worker processes
can cause some workers to start running at 100% CPU. Precondition is having
an established HTTP keep-alive connection when the signal is received.
The cloned (during fork) listening sockets do not get closed in the parent
process, thus they do not get removed from the epoll set automatically
(see man 7 epoll). This can lead to the process receiving epoll events
that it doesn't feel responsible for, resulting in an endless loop around
epoll_wait() delivering these events.
The solution is to explicitly remove these file descriptors from the epoll
set. To not degrade performance, care was taken to only do this when
neccessary, i.e. when the file descriptor was cloned during fork.
Signed-off-by: Conrad Hoffmann <conrad@soundcloud.com>
[wt: a backport to 1.4 could be studied though chances to catch the bug are low]
This flag is only a copy of (srv->uweight == 0), so better get rid of
it to reduce some of the confusion that remains in the code, and use
a simple function to return this state based on this weight instead.
Long-lived sessions are often subject to half-closed sessions resulting in
a lot of sessions appearing in FIN_WAIT state in the system tables, and no
way for haproxy to get rid of them. This typically happens because clients
suddenly disconnect without sending any packet (eg: FIN or RST was lost in
the path), and while the server detects this using an applicative heart
beat, haproxy does not close the connection.
This patch adds two new timeouts : "timeout client-fin" and
"timeout server-fin". The former allows one to override the client-facing
timeout when a FIN has been received or sent. The latter does the same for
server-facing connections, which is less useful.
Some consistency checks cannot be performed between frontends, backends
and peers at the moment because there is no way to check for intersection
between processes bound to some processes when the number of processes is
higher than the number of bits in a word.
So first, let's limit the number of processes to the machine's word size.
This means nbproc will be limited to 32 on 32-bit machines and 64 on 64-bit
machines. This is far more than enough considering that configs rarely go
above 16 processes due to scalability and management issues, so 32 or 64
should be fine.
This way we'll ensure we can always build a mask of all the processes a
section is bound to.
This commit modifies the PROXY protocol V2 specification to support headers
longer than 255 bytes allowing for optional extensions. It implements the
PROXY protocol V2 which is a binary representation of V1. This will make
parsing more efficient for clients who will know in advance exactly how
many bytes to read. Also, it defines and implements some optional PROXY
protocol V2 extensions to send information about downstream SSL/TLS
connections. Support for PROXY protocol V1 remains unchanged.
On the mailing list, seri0528@naver.com reported an issue when
using balance url_param or balance uri. The request would sometimes
stall forever.
Cyril Bont managed to reproduce it with the configuration below :
listen test :80
mode http
balance url_param q
hash-type consistent
server s demo.1wt.eu:80
and found it appeared with this commit : 80a92c0 ("BUG/MEDIUM: http:
don't start to forward request data before the connect").
The bug is subtle but real. The problem is that the HTTP request
forwarding analyzer refrains from starting to parse the request
body when some LB algorithms might need the body contents, in order
to preserve the data pointer and avoid moving things around during
analysis in case a redispatch is later needed. And in order to detect
that the connection establishes, it watches the response channel's
CF_READ_ATTACHED flag.
The problem is that a request analyzer is not subscribed to a response
channel, so it will only see changes when woken for other (generally
correlated) reasons, such as the fact that part of the request could
be sent. And since the CF_READ_ATTACHED flag is cleared once leaving
process_session(), it is important not to miss it. It simply happens
that sometimes the server starts to respond in a sequence that validates
the connection in the middle of process_session(), that it is detected
after the analysers, and that the newly assigned CF_READ_ATTACHED is
not used to detect that the request analysers need to be called again,
then the flag is lost.
The CF_WAKE_WRITE flag doesn't work either because it's cleared upon
entry into process_session(), ie if we spend more than one call not
connecting.
Thus we need a new flag to tell the connection initiator that we are
specifically interested in being notified about connection establishment.
This new flag is CF_WAKE_CONNECT. It is set by the requester, and is
cleared once the connection succeeds, where CF_WAKE_ONCE is set instead,
causing the request analysers to be scanned again.
For future versions, some better options will have to be considered :
- let all analysers subscribe to both request and response events ;
- let analysers subscribe to stream interface events (reduces number
of useless calls)
- change CF_WAKE_WRITE's semantics to persist across calls to
process_session(), but that is different from validating a
connection establishment (eg: no data sent, or no data to send)
The bug was introduced in 1.5-dev23, no backport is needed.
Till now we used to return a pointer to a rule, but that makes it
complicated to later add support for registering new actions which
may fail. For example, the redirect may fail if the response is too
large to fit into the buffer.
So instead let's return a verdict. But we needed the pointer to the
last rule to get the address of a redirect and to get the realm used
by the auth page. So these pieces of code have moved into the function
and they produce a verdict.
Last fix did address the issue for inlined patterns, but it was not
enough because the flags are lost as well when updating patterns
dynamically over the CLI.
Also if the same file was used once with -i and another time without
-i, their references would have been merged and both would have used
the same matching method.
It's appear that the patterns have two types of flags. The first
ones are relative to the pattern matching, and the second are
relative to the pattern storage. The pattern matching flags are
the same for all the patterns of one expression. Now they are
stored in the expression. The storage flags are information
returned by the pattern mathing function. This information is
relative to each entry and is stored in the "struct pattern".
Now, the expression matching flags are forwarded to the parse
and index functions. These flags are stored during the
configuration parsing, and they are used during the parse and
index actions.
This issue was introduced in dev23 with the major pattern rework,
and is a continuation of commit a631fc8 ("BUG/MAJOR: patterns: -i
and -n are ignored for inlined patterns"). No backport is needed.
Using the previous callback, it's trivial to block the heartbeat attack,
first we control the message length, then we emit an SSL error if it is
out of bounds. A special log is emitted, indicating that a heartbleed
attack was stopped so that they are not confused with other failures.
That way, haproxy can protect itself even when running on an unpatched
SSL stack. Tests performed with openssl-1.0.1c indicate a total success.
Users have seen a huge increase in the rate of SSL handshake failures
starting from 2014/04/08 with the release of the Heartbleed OpenSSL
vulnerability (CVE-2014-0160). Haproxy can detect that a heartbeat
was received in the incoming handshake, and such heartbeats are not
supposed to be common, so let's log a different message when a
handshake error happens after a heartbeat is detected.
This patch only adds the new message and the new code.
The http_(res|req)_keywords_register() functions allow to register
new keywords.
You need to declare a keyword list:
struct http_req_action_kw_list test_kws = {
.scope = "testscope",
.kw = {
{ "test", parse_test },
{ NULL, NULL },
}
};
and a parsing function:
int parse_test(const char **args, int *cur_arg, struct proxy *px, struct http_req_rule *rule, char **err)
{
rule->action = HTTP_REQ_ACT_CUSTOM_STOP;
rule->action_ptr = action_function;
return 0;
}
http_req_keywords_register(&test_kws);
The HTTP_REQ_ACT_CUSTOM_STOP action stops evaluation of rules after
your rule, HTTP_REQ_ACT_CUSTOM_CONT permits the evaluation of rules
after your rule.
This patch allows manipulation of ACL and MAP content thanks to any
information available in a session: source IP address, HTTP request or
response header, etc...
It's an update "on the fly" of the content of the map/acls. This means
it does not resist to reload or restart of HAProxy.
Finn Arne Gangstad suggested that we should have the ability to break
keep-alive when the target server has reached its maxconn and that a
number of connections are present in the queue. After some discussion
around his proposed patch, the following solution was suggested : have
a per-proxy setting to fix a limit to the number of queued connections
on a server after which we break keep-alive. This ensures that even in
high latency networks where keep-alive is beneficial, we try to find a
different server.
This patch is partially based on his original proposal and implements
this configurable threshold.
There are still some pending issues in the gzip compressor, and fixing
them requires a better handling of intermediate parsing states.
Another issue to deal with is the rewinding of a buffer during a redispatch
when a load balancing algorithm involves L7 data because the exact amount of
data to rewind is not clear. At the moment, this is handled by unwinding all
pending data, which cannot work in responses due to pipelining.
Last, having a first analysis which parses the body and another one which
restarts from where the parsing was left is wrong. Right now it only works
because we never both parse and transform in the same direction. But that
is wrong anyway.
In order to address the first issue, we'll have to use msg->eoh + msg->eol
to find the end of headers, and we still need to store the information about
the forwarded header length somewhere (msg->sol might be reused for this).
msg->sov may only be used for the start of data and not for subsequent chunks
if possible. This first implies that we stop sharing it with header length,
and stop using msg->sol there. In fact we don't need it already as it is
always zero when reaching the HTTP_MSG_BODY state. It was only updated to
reflect a copy of msg->sov.
So now as a first step into that direction, this patch ensure that msg->sol
is never re-assigned after being set to zero and is not used anymore when
we're dealing with HTTP processing and forwarding. We'll later reuse it
differently but for now it's secured.
The patch does nothing magic, it only removes msg->sol everywhere it was
already zero and avoids setting it. In order to keep the sov-sol difference,
it now resets sov after forwarding data. In theory there's no problem here,
but the patch is still tagged major because that code is complex.
One of the issues we face when we need to either forward headers only
before compressing, or rewind the stream during a redispatch is to know
the proper length of the request headers. msg->eoh always has the total
length up to the last CRLF, and we never know whether the request ended
with a single LF or a standard CRLF. This makes it hard to rewind the
headers without explicitly checking the bytes in the buffer.
Instead of doing so, we now use msg->eol to carry the length of the last
CRLF (either 1 or 2). Since it is not modified at all after HTTP_MSG_BODY,
and was only left in an undefined state, it is safe to use at any moment.
Thus, the complete header length to forward or to rewind now is always
msg->eoh + msg->eol.
This is the continuation of previous patch. Now that full buffers are
not rejected anymore, let's wait for at least the advertised chunk or
body length to be present or the buffer to be full. When either
condition is met, the message processing can go forward.
Thus we don't need to use url_param_post_limit anymore, which was passed
in the configuration as an optionnal <max_wait> parameter after the
"check_post" value. This setting was necessary when the feature was
implemented because there was no support for parsing message bodies.
The argument is now silently ignored if set in the configuration.