This patchs rename the "regex_exec" to "regex_exec2". It add a new
"regex_exec", "regex_exec_match" and "regex_exec_match2" function. This
function can match regex and return array containing matching parts.
Otherwise, this function use the compiled method (JIT or PCRE or POSIX).
JIT require a subject with length. PCREPOSIX and native POSIX regex
require a null terminted subject. The regex_exec* function are splited
in two version. The first version take a null terminated string, but it
execute strlen() on the subject if it is compiled with JIT. The second
version (terminated by "2") take the subject and the length. This
version adds a null character in the subject if it is compiled with
PCREPOSIX or native POSIX functions.
The documentation of posix regex and pcreposix says that the function
returns 0 if the string matche otherwise it returns REG_NOMATCH. The
REG_NOMATCH macro take the value 1 with posix regex and the value 17
with the pcreposix. The documentaion of the native pcre API (used with
JIT) returns a negative number if no match, otherwise, it returns 0 or a
positive number.
This patch fix also the return codes of the regex_exec* functions. Now,
these function returns true if the string match, otherwise it returns
false.
When I renamed the modify-header action to replace-value, one of them
was mistakenly set to "replace-val" instead. Additionally, differentiation
of the two actions must be done on args[0][8] and not *args[8]. Thanks
Thierry for spotting...
This patch adds two new actions to http-request and http-response rulesets :
- replace-header : replace a whole header line, suited for headers
which might contain commas
- replace-value : replace a single header value, suited for headers
defined as lists.
The match consists in a regex, and the replacement string takes a log-format
and supports back-references.
The time statistics computed by previous patches are now reported in the
HTML stats in the tips related to the total sessions for backend and servers,
and as separate columns for the CSV stats.
Using the last rate counters, we now compute the queue, connect, response
and total times per server and per backend with a 95% accuracy over the last
1024 samples. The operation is cheap so we don't need to condition it.
While the current functions report average event counts per period, we are
also interested in average values per event. For this we use a different
method. The principle is to rely on a long tail which sums the new value
with a fraction of the previous value, resulting in a sliding window of
infinite length depending on the precision we're interested in.
The idea is that we always keep (N-1)/N of the sum and add the new sampled
value. The sum over N values can be computed with a simple program for a
constant value 1 at each iteration :
N
,---
\ N - 1 e - 1
> ( --------- )^x ~= N * -----
/ N e
'---
x = 1
Note: I'm not sure how to demonstrate this but at least this is easily
verified with a simple program, the sum equals N * 0.632120 for any N
moderately large (tens to hundreds).
Inserting a constant sample value V here simply results in :
sum = V * N * (e - 1) / e
But we don't want to integrate over a small period, but infinitely. Let's
cut the infinity in P periods of N values. Each period M is exactly the same
as period M-1 with a factor of ((N-1)/N)^N applied. A test shows that given a
large N :
N - 1 1
( ------- )^N ~= ---
N e
Our sum is now a sum of each factor times :
N*P P
,--- ,---
\ N - 1 e - 1 \ 1
> v ( --------- )^x ~= VN * ----- * > ---
/ N e / e^x
'--- '---
x = 1 x = 0
For P "large enough", in tests we get this :
P
,---
\ 1 e
> --- ~= -----
/ e^x e - 1
'---
x = 0
This simplifies the sum above :
N*P
,---
\ N - 1
> v ( --------- )^x = VN
/ N
'---
x = 1
So basically by summing values and applying the last result an (N-1)/N factor
we just get N times the values over the long term, so we can recover the
constant value V by dividing by N.
A value added at the entry of the sliding window of N values will thus be
reduced to 1/e or 36.7% after N terms have been added. After a second batch,
it will only be 1/e^2, or 13.5%, and so on. So practically speaking, each
old period of N values represents only a quickly fading ratio of the global
sum :
period ratio
1 36.7%
2 13.5%
3 4.98%
4 1.83%
5 0.67%
6 0.25%
7 0.09%
8 0.033%
9 0.012%
10 0.0045%
So after 10N samples, the initial value has already faded out by a factor of
22026, which is quite fast. If the sliding window is 1024 samples wide, it
means that a sample will only count for 1/22k of its initial value after 10k
samples went after it, which results in half of the value it would represent
using an arithmetic mean. The benefit of this method is that it's very cheap
in terms of computations when N is a power of two. This is very well suited
to record response times as large values will fade out faster than with an
arithmetic mean and will depend on sample count and not time.
Demonstrating all the above assumptions with maths instead of a program is
left as an exercise for the reader.
Now that we can quote unsafe string, it becomes possible to dump the health
check responses on the CSV page as well. The two new fields are "last_chk"
and "last_agt".
Indicate that the text cells in the CSV format may contain quotes to
escape ambiguous texts. We don't have this case right now since we limit
the output, but it may happen in the future.
qstr() and cstr() will be used to quote-encode strings. The first one
does it unconditionally. The second one is aimed at CSV files where the
quote-encoding is only needed when the field contains a quote or a comma.
This helper is similar to addr_to_str but
tries to convert the port rather than the address
of a struct sockaddr_storage.
This is in preparation for supporting
an external agent check.
Signed-off-by: Simon Horman <horms@verge.net.au>
The "accept-proxy" statement of bind lines was still limited to version
1 of the protocol, while send-proxy-v2 is now available on the server
lines. This patch adds support for parsing v2 of the protocol on incoming
connections. The v2 header is automatically recognized so there is no
need for a new option.
This is in order to simplify the PPv2 header parsing code to look more
like the one provided as an example in the spec. No code change was
performed beyond just merging the proxy_addr union into the proxy_hdr_v2
struct.
If haproxy receives a connection over a unix socket and forwards it to
another haproxy instance using proxy protocol v1, it sends an UNKNOWN
protocol, which is rejected by the other side. Make the receiver accept
the UNKNOWN protocol as per the spec, and only use the local connection's
address for this.
As discussed with Dmitry Sivachenko, is a server farm has more than one
active server, uses a guaranteed non-determinist algorithm (round robin),
and a connection was initiated from a non-persistent connection, there's
no point insisting to reconnect to the same server after a connect failure,
better redispatch upon the very first retry instead of insisting on the same
server multiple times.
The retry delay is only useful when sticking to a same server. During
a redispatch, it's useless and counter-productive if we're sure to
switch to another server, which is almost guaranteed when there's
more than one server and the balancing algorithm is round robin, so
better not pass via the turn-around state in this case. It could be
done as well for leastconn, but there's a risk of always killing the
delay after the recovery of a server in a farm where it's almost
guaranteed to take most incoming traffic. So better only kill the
delay when using round robin.
As discussed with Dmitry Sivachenko, the default 1-second connect retry
delay can be large for situations where the connect timeout is much smaller,
because it means that an active connection reject will take more time to be
retried than a silent drop, and that does not make sense.
This patch changes this so that the retry delay is the minimum of 1 second
and the connect timeout. That way people running with sub-second connect
timeout will benefit from the shorter reconnect.
This new directive captures the specified fetch expression, converts
it to text and puts it into the next capture slot. The capture slots
are shared with header captures so that it is possible to dump all
captures at once or selectively in logs and header processing.
The purpose is to permit logs to contain whatever payload is found in
a request, for example bytes at a fixed location or the SNI of forwarded
SSL traffic.
This patch adds support for captures with no header name. The purpose
is to allow extra captures to be defined and logged along with the
header captures.
Currently, all callers to sample_fetch_string() call it with SMP_OPT_FINAL.
Now we improve it to support the case where this option is not set, and to
make it return the original sample as-is. The purpose is to let the caller
check the SMP_F_MAY_CHANGE flag in the result and know that it should wait
to get complete contents. Currently this has no effect on existing code.
Similar to previous patches, HTTP header captures are performed when
a TCP frontend switches to an HTTP backend, but are not possible to
report. So let's relax the check to explicitly allow them to be present
in TCP frontends.
Log format is defined in the frontend, and some frontends may be chained to
an HTTP backend. Sometimes it's very convenient to be able to log the HTTP
status code of these HTTP backends. This status is definitely present in
the internal structures, it's just that we used to limit it to be used in
HTTP frontends. So let's simply relax the check to allow it to be used in
TCP frontends as well.
In OpenSSL, the name of a cipher using ephemeral diffie-hellman for key
exchange can start with EDH, but also DHE, EXP-EDH or EXP1024-DHE.
We work around this issue by using the cipher's description instead of
the cipher's name.
Hopefully the description is less likely to change in the future.
When no static DH parameters are specified, this patch makes haproxy
use standardized (rfc 2409 / rfc 3526) DH parameters with prime lenghts
of 1024, 2048, 4096 or 8192 bits for DHE key exchange. The size of the
temporary/ephemeral DH key is computed as the minimum of the RSA/DSA server
key size and the value of a new option named tune.ssl.default-dh-param.
Using the systemd daemon mode the parent doesn't exits but waits for
his childs without closing its listening sockets.
As linux 3.9 introduced a SO_REUSEPORT option (always enabled in
haproxy if available) this will give unhandled connections problems
after an haproxy reload with open connections.
The problem is that when on reload a new parent is started (-Ds
$oldchildspids), in haproxy.c main there's a call to start_proxies
that, without SO_REUSEPORT, should fail (as the old processes are
already listening) and so a SIGTOU is sent to old processes. On this
signal the old childs will call (in pause_listener) a shutdown() on
the listening fd. From my tests (if I understand it correctly) this
affects the in kernel file (so the listen is really disabled for all
the processes, also the parent).
Instead, with SO_REUSEPORT, the call to start_proxies doesn't fail and
so SIGTOU is never sent. Only SIGUSR1 is sent and the listen isn't
disabled for the parent but only the childs will stop listening (with
a call to close())
So, with SO_REUSEPORT, the old childs will close their listening
sockets but will wait for the current connections to finish or
timeout, and, as their parent has its listening socket open, the
kernel will schedule some connections on it. These connections will
never be accepted by the parent as it's in the waitpid loop.
This fix will close all the listeners on the parent before entering the
waitpid loop.
Signed-off-by: Simone Gotti <simone.gotti@gmail.com>
Richard Russo reported that the example code in the PP spec is wrong
now that we slightly changed the format to merge <ver> and <cmd>. Also
rename the field <ver_cmd> to avoid any ambiguity on the usage.
MySQL will in stop supporting pre-4.1 authentication packets in the future
and is already giving us a hard time regarding non-silencable warnings
which are logged on each health check. Warnings look like the following:
"[Warning] Client failed to provide its character set. 'latin1' will be used
as client character set."
This patch adds basic support for post-4.1 authentication by sending the proper
authentication packet with the character set, along with the QUIT command.
Commit b1982e2 ("BUG/MEDIUM: http/session: disable client-side expiration
only after body") was tricky and caused an issue which was fixed by commit
0943757 ("BUG/MEDIUM: session: don't clear CF_READ_NOEXP if analysers are
not called"). But that's not enough, another issue was introduced and further
emphasized by last fix.
The issue is that the CF_READ_NOEXP flag needs to be cleared when waiting
for a new request over that connection, otherwise we cannot expire anymore
an idle connection waiting for a new request.
This explains the neverending keepalives reported by at least 3 different
persons since dev24. No backport is needed.
As reported by Vincent Bernat and Ryan O'Hara, building haproxy with the
option above causes this :
src/dumpstats.c: In function 'stats_dump_sv_stats':
src/dumpstats.c:3059:4: error: format not a string literal and no format arguments [-Werror=format-security]
cc1: some warnings being treated as errors
make: *** [src/dumpstats.o] Error 1
With that option, gcc wants an argument after a string format even when
that string format is a const but not a litteral. It can be anything
invalid, for example an integer when a string is expected, it just
wants something. So feed it with something :-(
Dmitry Sivachenko reported that "uint" doesn't build on FreeBSD 10.
On Linux it's defined in sys/types.h and indicated as "old". Just
get rid of the very few occurrences.
Released version 1.5-dev26 with the following main changes :
- BUG/MEDIUM: polling: fix possible CPU hogging of worker processes after receiving SIGUSR1.
- BUG/MINOR: stats: fix a typo on a closing tag for a server tracking another one
- OPTIM: stats: avoid the calculation of a useless link on tracking servers in maintenance
- MINOR: fix a few memory usage errors
- CONTRIB: halog: Filter input lines by date and time through timestamp
- MINOR: ssl: SSL_CTX_set_options() and SSL_CTX_set_mode() take a long, not an int
- BUG/MEDIUM: regex: fix risk of buffer overrun in exp_replace()
- MINOR: acl: set "str" as default match for strings
- DOC: Add some precisions about acl default matching method
- MEDIUM: acl: strenghten the option parser to report invalid options
- BUG/MEDIUM: config: a stats-less config crashes in 1.5-dev25
- BUG/MINOR: checks: tcp-check must not stop on '\0' for binary checks
- MINOR: stats: improve alignment of color codes to save one line of header
- MINOR: checks: simplify and improve reporting of state changes when using log-health-checks
- MINOR: server: remove the SRV_DRAIN flag which can always be deduced
- MINOR: server: use functions to detect state changes and to update them
- MINOR: server: create srv_was_usable() from srv_is_usable() and use a pointer
- BUG/MINOR: stats: do not report "100%" in the thottle column when server is draining
- BUG/MAJOR: config: don't free valid regex memory
- BUG/MEDIUM: session: don't clear CF_READ_NOEXP if analysers are not called
- BUG/MINOR: stats: tracking servers may incorrectly report an inherited DRAIN status
- MEDIUM: proxy: make timeout parser a bit stricter
- REORG/MEDIUM: server: split server state and flags in two different variables
- REORG/MEDIUM: server: move the maintenance bits out of the server state
- MAJOR: server: use states instead of flags to store the server state
- REORG: checks: put the functions in the appropriate files !
- MEDIUM: server: properly support and propagate the maintenance status
- MEDIUM: server: allow multi-level server tracking
- CLEANUP: checks: rename the server_status_printf function
- MEDIUM: checks: simplify server up/down/nolb transitions
- MAJOR: checks: move health checks changes to set_server_check_status()
- MINOR: server: make the status reporting function support a reason
- MINOR: checks: simplify health check reporting functions
- MINOR: server: implement srv_set_stopped()
- MINOR: server: implement srv_set_running()
- MINOR: server: implement srv_set_stopping()
- MEDIUM: checks: simplify failure notification using srv_set_stopped()
- MEDIUM: checks: simplify success notification using srv_set_running()
- MEDIUM: checks: simplify stopping mode notification using srv_set_stopping()
- MEDIUM: stats: report a server's own state instead of the tracked one's
- MINOR: server: make use of srv_is_usable() instead of checking eweight
- MAJOR: checks: add support for a new "drain" administrative mode
- MINOR: stats: use the admin flags for soft enable/disable/stop/start on the web page
- MEDIUM: stats: introduce new actions to simplify admin status management
- MINOR: cli: introduce a new "set server" command
- MINOR: stats: report a distinct output for DOWN caused by agent
- MINOR: checks: support specific check reporting for the agent
- MINOR: checks: support a neutral check result
- BUG/MINOR: cli: "agent" was missing from the "enable"/"disable" help message
- MEDIUM: cli: add support for enabling/disabling health checks.
- MEDIUM: stats: report down caused by agent prior to reporting up
- MAJOR: agent: rework the response processing and support additional actions
- MINOR: stats: improve the stats web page to support more actions
- CONTRIB: halog: avoid calling time/localtime/mktime for each line
- DOC: document the workarouds for Google Chrome's bogus pre-connect
- MINOR: stats: report SSL key computations per second
- MINOR: stats: add counters for SSL cache lookups and misses
One important aspect of SSL performance tuning is the cache size,
but there's no metric to know whether it's large enough or not. This
commit introduces two counters, one for the cache lookups and another
one for cache misses. These counters are reported on "show info" on
the stats socket. This way, it suffices to see the cache misses
counter constantly grow to know that a larger cache could possibly
help.
It's commonly needed to know how many SSL asymmetric keys are computed
per second on either side (frontend or backend), and to know the SSL
session reuse ratio. Now we compute these values and report them in
"show info".
Currently exp_replace() (which is used in reqrep/reqirep) is
vulnerable to a buffer overrun. I have been able to reproduce it using
the attached configuration file and issuing the following command:
wget -O - -S -q http://localhost:8000/`perl -e 'print "a"x4000'`/cookie.php
Str was being checked only in in while (str) and it was possible to
read past that when more than one character was being accessed in the
loop.
WT:
Note that this bug is only marked MEDIUM because configurations
capable of triggering this bug are very unlikely to exist at all due
to the fact that most rewrites consist in static string additions
that largely fit into the reserved area (8kB by default).
This fix should also be backported to 1.4 and possibly even 1.3 since
it seems to have been present since 1.1 or so.
Config:
-------
global
maxconn 500
stats socket /tmp/haproxy.sock mode 600
defaults
timeout client 1000
timeout connect 5000
timeout server 5000
retries 1
option redispatch
listen stats
bind :8080
mode http
stats enable
stats uri /stats
stats show-legends
listen tcp_1
bind :8000
mode http
maxconn 400
balance roundrobin
reqrep ^([^\ :]*)\ /(.*)/(.*)\.php(.*) \1\ /\3.php?arg=\2\2\2\2\2\2\2\2\2\2\2\2\2\4
server srv1 127.0.0.1:9000 check port 9000 inter 1000 fall 1
server srv2 127.0.0.1:9001 check port 9001 inter 1000 fall 1
More and more people are complaining about the bugs experienced by
Chrome users due to the pre-connect feature and the fact that Chrome
does not monitor its connections and happily displays the error page
instead of re-opening a new connection. Since we can work around this
bug, let's document how to do it.
The last commit provides time-based filtering. Unfortunately, it wastes
90% of the time calling the expensive time()/localtime()/mktime()
functions.
This patch does 3 things :
- call time()/localtime() only once to initialize the correct
struct timeinfo ;
- call mktime() only when the time has changed regardless of
the current second.
- manually add the current second to the cached result.
Doing just this is enough to multiply the parsing speed by 8.
I wanted to make a graph with average answer time in nagios that takes only
the last 5 mn of the log. Filtering the log before using halog was too
slow, so I added that filter to halog.
The patch attached to this mail is a proposal to add a new option : -time
[min][:max]
The values are min timestamp and/or max timestamp of the lines to be used
for stats. The date and time of the log lines between '[' and ']' are
converted to timestamp and compared to these values.
Here is an exemple of usage :
cat /var/log/haproxy.log | ./halog -srv -H -q -time $(date --date '-5 min' +%s)
We now retrieve a lot of information from a single line of response, which
can be made up of various words delimited by spaces/tabs/commas. We try to
arrange all this and report whatever unusual we detect. The agent now supports :
- "up", "down", "stopped", "fail" for the operational states
- "ready", "drain", "maint" for the administrative states
- any "%" number for the weight
- an optional reason after a "#" that can be reported on the stats page
The line parser and processor should move to its own function so that
we can reuse the exact same one for http-based agent checks later.
When an agent is enabled and forces a down state, it's important to have
this exact information and to report the agent's status, so let's check
the agent before checking the health check.