If the acl keyword is a "fetch", the dedicated parsing function
"sample_parse_expr()" is used. Otherwise, the acl parsing function
"parse_acl_expr()" is extended to understand the syntax of a series
of converters placed after the "fetch" keyword.
Before this patch, each acl uses a "struct sample_fetch" and executes
it with the "<fetch>->process()" function. Now, the dedicated function
"sample_process()" is called.
These syntax are now avalaible:
acl bad req.hdr(host),lower -m str www
http-request redirect prefix /go-away if bad
acl bad hdr_beg(host),lower www
http-request redirect prefix /go-away if bad
Some errors were still reported as log-format instead of their respective
contexts (acl, request header, stick, ...). This is harmless and does not
require any backport.
When parsing track-sc* actions in tcp-request rules, we now automatically
compute the track-sc identifier number using %d when displaying an error
message. But the ID has become wrong since we introduced sc0, we continue
to report id+1 in error messages causing some confusion.
No backport is needed.
A very old bug resulting from some code refactoring causes
assign_server_address() to refrain from retrieving the destination
address from the client-side connection when transparent mode is
enabled and we're connecting to a server which has address 0.0.0.0.
The impact is low since such configurations are unlikely to ever
be encountered. The fix should be backported to older branches.
When a server tracks another one, its state on the stats page always reports
"via xx/yy". That's convenient to know what server to act on to change the
state. But it is also possible to force the tracking server itself into
maintenance mode and in this case we should not report "via xx/yy" because
the tracked server can't do anything to change the server's state, which
is confusing. In practice there is nothing wrong in leaving it as-is,
except that it's highly misleading when looking at the stats page.
Note that we only change the HTML output, not the CSV one. The states are
already different : "MAINT" vs "MAINT(via)" and we expect anyone coding a
monitoring system based on the CSV output to know the differences between
all possible states.
This is the continuation of previous fix bc16cd8 "BUG/MAJOR: fix haproxy
crash when using server tracking instead of checks", the soft-stop/start
states were not addressed by this fix.
Igor at owind reported a very recent bug (just present in latest snapshot).
Commit "4a741432 MEDIUM: Paramatise functions over the check of a server"
causes up/down to die with tracked servers due to a typo.
The following call in set_server_down causes the server to put itself
down recurseively because "check" is the current server's check, so once
fed to the function again, it will pass through the exact same path (note
we have the exact symmetry in set_server_up) :
for (srv = s->tracknext; srv; srv = srv->tracknext)
if (!(srv->state & SRV_MAINTAIN))
/* Only notify tracking servers that are not already in maintenance. */
set_server_down(check);
Instead we should stop the tracking server being visited in the loop :
for (srv = s->tracknext; srv; srv = srv->tracknext)
if (!(srv->state & SRV_MAINTAIN))
/* Only notify tracking servers that are not already in maintenance. */
set_server_down(&srv->check);
But that's not exactly enough because srv->check->server is only set when
checks are enabled, so ->server is NULL for tracking servers, still causing a
crash upon first iteration. The fix is easy and consists in always initializing
check->server when creating a new server, which is what was already done a few
patches later by 69d29f9 (MEDIUM: cfgparse: Factor out check initialisation).
With the fix above alone on top of current version or snapshot 20131122, the
problem disappears.
Thanks to Igor for testing and reporting the issue.
While analysing old bug (9d9179b) with Emeric, we first believed
that the fix was wrong and that there was a potential for learning
one extra character in the peers learning code for strings due to
the use of table->key_size instead of table->key_size-1. In fact it
cannot happen with a normally behaving sender because the key sizes
are compared when synchronizing the table.
But this unveiled a suboptimal handling of strings. It can be quite
common to see admins reload haproxy to increase some key sizes when
seeing that user agents or cookies get truncated, or conversely to
reduce them after seeing they take too much memory and are never full.
The problem is that this will get rid of the table's contents because
of the size mismatch. While this is understandable for properly
formatted data (eg: IP addresses, integers, SSLIDs...) it's too bad
for strings.
So instead, make an exception to accept string of incompatible lengths
and let the synchronization code truncate them to the appropriate size
just as if the keys were learned normally.
Thanks to this change, it is now possible to change the "len" parameter
of a string stick-table and restart without losing its contents.
Both of them are rare and are detected from the same flags source, so
let's detect errors in the handshake loop and remove two tests in the
fast path. This seems to improve overall performance by less than 0.5%
on connection-bound workloads.
Add a DRAIN sub-state for a server which
will be shown on the stats page instead of UP if
its effective weight is zero.
Also, log if a server enters or leaves the DRAIN state
as the result of an agent check.
Signed-off-by: Simon Horman <horms@verge.net.au>
The syntax of this new commands are:
enable agent <backend>/<server>
disable agent <backend>/<server>
These commands allow temporarily stopping and subsequently
re-starting an auxiliary agent check. The effect of this is as follows:
New checks are only initialised when the agent is in the enabled. Thus,
disable agent will prevent any new agent checks from begin initiated until
the agent re-enabled using enable agent.
When an agent is disabled the processing of an auxiliary agent check that
was initiated while the agent was set as enabled is as follows: All
results that would alter the weight, specifically "drain" or a weight
returned by the agent, are ignored. The processing of agent check is
otherwise unchanged.
The motivation for this feature is to allow the weight changing effects
of the agent checks to be paused to allow the weight of a server to be
configured using set weight without being overridden by the agent.
Signed-off-by: Simon Horman <horms@verge.net.au>
This is achieved by moving rise and fall from struct server to struct check.
After this move the behaviour of the primary check, server->check is
unchanged. However, the secondary agent check, server->agent now has
independent rise and fall values each of which are set to 1.
The result is that receiving "fail", "stopped" or "down" just once from the
agent will mark the server as down. And receiving a weight just once will
allow the server to be marked up if its primary check is in good health.
This opens up the scope to allow the rise and fall values of the agent
check to be configurable, however this has not been implemented at this
stage.
Signed-off-by: Simon Horman <horms@verge.net.au>
In the case where agent-port is used and the agent
check is a secondary check to not mark a server as down
if the agent becomes unavailable.
In this configuration the agent should only cause a server to be marked
as down if the agent returns "fail", "stopped" or "down".
Signed-off-by: Simon Horman <horms@verge.net.au>
Allow an auxiliary agent check to be run independently of the
regular a regular health check. This is enabled by the agent-check
server setting.
The agent-port, which specifies the TCP port to use for the agent's
connections, is required.
The agent-inter, which specifies the interval between agent checks and
timeout of agent checks, is optional. If not set the value for regular
checks is used.
e.g.
server web1_1 127.0.0.1:80 check agent-port 10000
If either the health or agent check determines that a server is down
then it is marked as being down, otherwise it is marked as being up.
An agent health check performed by opening a TCP socket and reading an
ASCII string. The string should have one of the following forms:
* An ASCII representation of an positive integer percentage.
e.g. "75%"
Values in this format will set the weight proportional to the initial
weight of a server as configured when haproxy starts.
* The string "drain".
This will cause the weight of a server to be set to 0, and thus it
will not accept any new connections other than those that are
accepted via persistence.
* The string "down", optionally followed by a description string.
Mark the server as down and log the description string as the reason.
* The string "stopped", optionally followed by a description string.
This currently has the same behaviour as "down".
* The string "fail", optionally followed by a description string.
This currently has the same behaviour as "down".
Signed-off-by: Simon Horman <horms@verge.net.au>
Remove option lb-agent-chk and thus the facility to configure
a stand-alone agent health check. This feature was added by
"MEDIUM: checks: Add agent health check". It will be replaced
by subsequent patches with a features to allow an agent check
to be run as either a secondary check, along with any of the existing
checks, or as part of an http check with the status returned
in an HTTP header.
This patch does not entirely revert "MEDIUM: checks: Add agent health
check". The infrastructure it provides to parse the results of an
agent health check remains and will be re-used by the planned features
that are mentioned above.
Signed-off-by: Simon Horman <horms@verge.net.au>
In the case where an agent check returns fail, stopped or down,
log this as info when logging the server status along with any
trailing message returned by the agent after fail, stopped or down.
Previously only the trailing message was logged as info and
if omitted no info was logged.
Signed-off-by: Simon Horman <horms@verge.net.au>
Send SIGINT to child processes when killed. This ensures that
the haproxy process managed by the systemd-wrapper is stopped
when "systemctl stop haproxy.service" is called.
The column used to report the throttle percentage when a server is in
slowstart is based on the time only. This is wrong, because server weights
in slowstart are updated at most once a second, so the reported value is
wrong at least fo rone second during each step, which means all the time
when using short delays (< 20s).
The second point is that it's disturbing to see a weight < 100% without
any throttle at the end of the period (during the last second), because
the effective weight has not yet been updated.
Instead, we now compute the exact ratio between eweight and uweight and
report it. It's always accurate and describes the value being used instead
of using only the date.
It can be backported to 1.4 though it's not particularly important.
A crash was reported by Igor at owind when changing a server's weight
on the CLI. Lukas Tribus could reproduce a related bug where setting
a server's weight would result in the new weight being multiplied by
the initial one. The two bugs are the same.
The incorrect weight calculation results in the total farm weight being
larger than what was initially allocated, causing the map index to be out
of bounds on some hashes. It's easy to reproduce using "balance url_param"
with a variable param, or with "balance static-rr".
It appears that the calculation is made at many places and is not always
right and not always wrong the same way. Thus, this patch introduces a
new function "server_recalc_eweight()" which is dedicated to this task
of computing ->eweight from many other elements including uweight and
current time (for slowstart), and all users now switch to use this
function.
The patch is a bit large but the code was not trivially fixable in a way
that could guarantee this situation would not occur anymore. The fix is
much more readable and has been verified to work with all algorithms,
with both consistent and map-based hashes, and even with static-rr.
Slowstart was tested as well, just like enable/disable server.
The same bug is very likely present in 1.4 as well, so the patch will
probably need to be backported eventhough it will not apply as-is.
Thanks to Lukas and Igor for the information they provided to reproduce it.
Commit 2e99390 (BUG/MEDIUM: checks: fix slowstart behaviour when server
tracking is in use) moved the slowstart task initialization within the
health check code and leaves it unset when checks are disabled. The
problem is that it's possible to trigger slowstart from the CLI by
issuing "disable server XXX / enable server XXX" even when checks are
disabled. The result is a crash when trying to wake up the slowstart
task of that server.
Move the task initialization earlier so that it is done even if the
checks are disabled.
This patch should be backported to 1.4 since the commit above was
backported there.
Commit c08057c does the align job for buffer_dump(), but it has not fixed the
issue that less than 8 characters are left in the last line as below:
Dumping contents from byte 0 to byte 119
0 1 2 3 4 5 6 7 8 9 a b c d e f
0000: 47 45 54 20 2f 69 6e 64 - 65 78 2e 68 74 6d 20 48 GET /index.htm H
0010: 54 54 50 2f 31 2e 30 0d - 0a 55 73 65 72 2d 41 67 TTP/1.0..User-Ag
...
0060: 6e 65 63 74 69 6f 6e 3a - 20 4b 65 65 70 2d 41 6c nection: Keep-Al
0070: 69 76 65 0d 0a 0d 0a ive....
The last line of the hex column is still overlapped by the text column. Since
there will be additional "- " for the output line which has no less than 8
characters, two additional spaces should be present when there is less than 8
characters in order to do alignment. The result after being fixed is as below:
Dumping contents from byte 0 to byte 119
0 1 2 3 4 5 6 7 8 9 a b c d e f
0000: 47 45 54 20 2f 69 6e 64 - 65 78 2e 68 74 6d 20 48 GET /index.htm H
0010: 54 54 50 2f 31 2e 30 0d - 0a 55 73 65 72 2d 41 67 TTP/1.0..User-Ag
...
0060: 6e 65 63 74 69 6f 6e 3a - 20 4b 65 65 70 2d 41 6c nection: Keep-Al
0070: 69 76 65 0d 0a 0d 0a ive....
Signed-off-by: Godbach <nylzhaowei@gmail.com>
Summary:
Added a document for hashing under internal docs explaining
hashing in haproxy along with the results of tests under the test
folder.
These documents together explain the motivation for adding
options for hashing algorithms with the option of enabling or
disabling of avalanche.
This is in preparation for associating a agent check
with a server which runs as well as the server's existing check.
Signed-off-by: Simon Horman <horms@verge.net.au>
Add state to struct check. This is currently used to store one bit,
CHK_RUNNING, which is set if a check is running and clear otherwise.
This bit was previously SRV_CHK_RUNNING of the state element of struct
server.
This is in preparation for associating a agent check
with a server which runs as well as the server's existing check.
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Paramatise the following functions over the check of a server
* set_server_down
* set_server_up
* srv_getinter
* server_status_printf
* set_server_check_status
* set_server_disabled
* set_server_enabled
Generally the server parameter of these functions has been removed.
Where it is still needed it is obtained using check->server.
This is in preparation for associating a agent check
with a server which runs as well as the server's existing check.
By paramatising these functions they may act on each of the checks
without further significant modification.
Explanation of the SSP_O_HCHK portion of this change:
* Prior to this patch SSP_O_HCHK serves a single purpose which
is to tell server_status_printf() weather it should print
the details of the check of a server or not.
With the paramatisation that this patch adds there are two cases.
1) Printing the details of the check in which case a
valid check parameter is needed.
2) Not printing the details of the check in which case
the contents check parameter are unused.
In case 1) we could pass SSP_O_HCHK and a valid check and;
In case 2) we could pass !SSP_O_HCHK and any value for check
including NULL.
If NULL is used for case 2) then SSP_O_HCHK becomes supurfulous
and as NULL is used for case 2) SSP_O_HCHK has been removed.
Signed-off-by: Simon Horman <horms@verge.net.au>
Move result element from struct server to struct check
This allows check results to be independent of the check's server.
This is in preparation for associating a agent check
with a server which runs as well as the server's existing check.
Signed-off-by: Simon Horman <horms@verge.net.au>
This is in preparation for associating a agent check
with a server which runs as well as the server's existing check.
The split has been made by:
* Moving elements of struct server's check element that will
be shared by both checks into a new check_common element
of struct server.
* Moving the remaining elements to a new struct check and
making struct server's check element a struct check.
* Adding a server element to struct check, a back-pointer
to the server element it is a member of.
- At this time the server could be obtained using
container_of, however, this will not be so easy
once a second struct check element is added to struct server
to accommodate an agent health check.
Signed-off-by: Simon Horman <horms@verge.net.au>
This was inadvertently added by "MEDIUM: checks: Add agent health check".
It appears to have never been used.
Signed-off-by: Simon Horman <horms@verge.net.au>
commit 39c63c5 "url32+src - like base32+src but whole url including parameters"
was missing the last argument "const char *kw", resulting in the build warning
below :
src/proto_http.c:10351:2: warning: initialization from incompatible pointer type [enabled by default]
src/proto_http.c:10351:2: warning: (near initialization for 'sample_fetch_keywords.kw[50].process') [enabled by default]
src/proto_http.c:10352:2: warning: initialization from incompatible pointer type [enabled by default]
src/proto_http.c:10352:2: warning: (near initialization for 'sample_fetch_keywords.kw[51].process') [enabled by default]
It's harmless since it's not needed there anyway.
Baptiste Assmann reported a bug affecting the "http-request redirect"
parser. It may randomly crash when reporting an error message if the
syntax is not OK. It happens that this is caused by the output error
message pointer which was not initialized to NULL.
This bug is 1.5-specific (introduced in dev17), no backport is needed.
I have a need to limit traffic to each url from each source address. much
like base32+src but the whole url including parameters (this came from
looking at the recent 'Haproxy rate limit per matching request' thread)
attached is patch that seems to do the job, its a copy and paste job of the
base32 functions
the url32 function seems to work too and using 2 machines to request the
same url locks me out of both if I abuse from either with the url32 key
function and only the one if I use url32_src.
Neil
The reqdeny/reqtarpit and http-request deny/tarpit were using
a copy-paste of the error handling code because originally the
req* actions used to maintain their own stats. This is not the
case anymore so we can use the same error blocks for both.
The http-request rulesets still has precedence over req* so no
functionality was changed.
The reqdeny/reqideny and reqtarpit/reqitarpit rules used to maintain
the stats counters themselves while http-request deny/tarpit and
rspdeny/rspideny used to centralize them at the point where the
error is processed.
Thus, let's do the same for reqdeny/reqtarpit so that the functions
which iterate over the rules do not have to deal with these counters
anymore.
When a connection is tarpitted, a denied req is counted once when the
action is applied, and then a failed req is counted when the tarpit
timeout expires. This is completely wrong as the tarpit is exactly
equivalent to a deny since it's a disguised deny.
So let's not increment the failed req anymore.
This fix may be backported to 1.4 which has the same issue.
Commit 986a9d2d12 moved the source address from the stream interface
to the session, but it did not set the flag on the connection to
report that the source address is known. Thus when logs are enabled,
we had a call to getpeername() which is redundant with the result
from accept(). This patch simply sets the flag.
This function was designed for haproxy while testing other functions
in the past. Initially it was not planned to be used given the not
very interesting numbers it showed on real URL data : it is not as
smooth as the other ones. But later tests showed that the other ones
are extremely sensible to the server count and the type of input data,
especially DJB2 which must not be used on numeric input. So in fact
this function is still a generally average performer and it can make
sense to merge it in the end, as it can provide an alternative to
sdbm+avalanche or djb2+avalanche for consistent hashing or when hashing
on numeric data such as a source IP address or a visitor identifier in
a URL parameter.
Summary:
Avalanche is supported not as a native hashing choice, but a modifier
on the hashing function. Note that this means that possible configs
written after 1.5-dev4 using "hash-type avalanche" will get an informative
error instead. But as discussed on the mailing list it seems nobody ever
used it anyway, so let's fix it before the final 1.5 release.
The default values were selected for backward compatibility with previous
releases, as discussed on the mailing list, which means that the consistent
hashing will still apply the avalanche hash by default when no explicit
algorithm is specified.
Examples
(default) hash-type map-based
Map based hashing using sdbm without avalanche
(default) hash-type consistent
Consistent hashing using sdbm with avalanche
Additional Examples:
(a) hash-type map-based sdbm
Same as default for map-based above
(b) hash-type map-based sdbm avalanche
Map based hashing using sdbm with avalanche
(c) hash-type map-based djb2
Map based hashing using djb2 without avalanche
(d) hash-type map-based djb2 avalanche
Map based hashing using djb2 with avalanche
(e) hash-type consistent sdbm avalanche
Same as default for consistent above
(f) hash-type consistent sdbm
Consistent hashing using sdbm without avalanche
(g) hash-type consistent djb2
Consistent hashing using djb2 without avalanche
(h) hash-type consistent djb2 avalanche
Consistent hashing using djb2 with avalanche
Summary:
In testing at tumblr, we found that using djb2 hashing instead of the
default sdbm hashing resulted is better workload distribution to our backends.
This commit implements a change, that allows the user to specify the hash
function they want to use. It does not limit itself to consistent hashing
scenarios.
The supported hash functions are sdbm (default), and djb2.
For a discussion of the feature and analysis, see mailing list thread
"Consistent hashing alternative to sdbm" :
http://marc.info/?l=haproxy&m=138213693909219
Note: This change does NOT make changes to new features, for instance,
applying an avalance hashing always being performed before applying
consistent hashing.