The types and minimal number of ACL keyword arguments are now stored in
their declaration. This will allow many more fantasies if some ACL use
several arguments or types.
Doing so required to rework all ACL keyword declarations to add two
parameters. So this was a good opportunity for a general cleanup and
to sort all entries in alphabetical order.
We still have two pending issues :
- parse_acl_expr() checks for errors but has no way to report them to
the user ;
- the types of some arguments are still not resolved and kept as strings
(eg: ARGT_FE/BE/TAB) for compatibility reasons, which must be resolved
in acl_find_targets()
The ACL parser now uses the argument parser to build a typed argument list.
Right now arguments are all strings and only one argument is supported since
this is what ACLs currently support.
This change introduces the buffer's base pointer, which is the limit between
incoming and outgoing data. It's the point where the parsing should start
from. A number of computations have already been greatly simplified, but
more simplifications are expected to come from the removal of buf->r.
The changes appear good and have revealed occasional improper use of some
pointers. It is possible that this patch has introduced bugs or revealed
some, although preliminary testings tend to indicate that everything still
works as it should.
We don't have buf->l anymore. We have buf->i for pending data and
the total length is retrieved by adding buf->o. Some computation
already become simpler.
Despite extreme care, bugs are not excluded.
It's worth noting that msg->err_pos as set by HTTP request/response
analysers becomes relative to pending data and not to the beginning
of the buffer. This has not been completed yet so differences might
occur when outgoing data are left in the buffer.
The wrong byte was checked for the session_id length in the payload. This
used to work when the session ID was absent because zero was found there,
but when a session ID is present, there is 1/256 chance that the inspected
data contains 0x20 (the actual session ID length), so it fails.
Thanks to Emmanuel Bézagu for reporting this bug.
This bug does not need backporting, it is 1.5 specific.
Now strings and data blocks are stored in the temp_pattern's chunk
and matched against this one.
The rdp_cookie currently makes extensive use of acl_fetch_rdp_cookie()
and will be a good candidate for the initial rework so that ACLs use
the patterns framework and not the other way around.
IPv4 and IPv6 addresses are now stored into temp_pattern instead of
the dirty hack consisting into storing them into the consumer's target
address.
Some refactoring should now be possible since the methods used to fetch
source and destination addresses are similar between patterns and ACLs.
All ACL fetches which return integer value now store the result into
the temporary pattern struct. All ACL matches which rely on integer
also get their value there.
Note: the pattern data types are not set right now.
Server Name Indication (SNI) is a TLS extension which makes a client
present the name of the server it is connecting to in the client hello.
It allows a transparent proxy to take a decision based on the beginning
of an SSL/TLS stream without deciphering it.
The new ACL "req_ssl_sni" matches the name extracted from the TLS
handshake against a list of names which may be loaded from a file if
needed.
This patch introduces hdr_len, path_len and url_len for matching these
respective parts lengths against integers. This can be used to detect
abuse or empty headers.
*_dom is mostly used for matching Host headers, and host headers may
include port numbers. To avoid having to create multiple rules with
and without :<port-number> in hdr_dom rules, change the *_dom
matching functions to also handle : as a delimiter.
Typically there are rules like this in haproxy.cfg:
acl is_foo hdr_dom(host) www.foo.com
Most clients send "Host: www.foo.com" in their HTTP header, but some
send "Host: www.foo.com:80" (which is allowed), and the above
rule will now work for those clients as well.
[Note: patch was edited before merge, any unexpected bug is mine /willy]
Both Hank A. Paulson and Rob at pixsense reported a crash when
loading ACLs from a pattern file which contains empty lines.
From the tests, it appears that only files that contain nothing
but empty lines are causing that (in the past they would have had
their line feeds loaded as patterns).
The crash happens in the free_pattern() call which doesn't like to
be called with a NULL pattern. Let's make it accept it so that it's
more in line with the standard uses of free() which ignores NULLs.
Gabriel Sosa reported that haproxy unexpectedly reports an error
when a pattern file loaded by an ACL contains an empty line. The
test was present but inefficient as it did not consider the '\n'
as the end of the line. This fix relies on the line length instead.
It should be backported to 1.4.
Some functions which act on generic buffer contents without being
tcp-specific were historically in proto_tcp.c. This concerns ACLs
and RDP cookies. Those have been moved away to more appropriate
locations. Ideally we should create some new files for each layer6
protocol parser. Let's do that later.
Just like we do on health checks, we should consider that ACLs that make
use of buffer data are layer 6 and not layer 4, because we'll soon have
to distinguish between pure layer 4 ACLs (without any buffer) and these
ones.
This ACL was missing in complex setups where the status of a remote site
has to be considered in switching decisions. Until there, using a server's
status in an ACL required to have a dedicated backend, which is a bit heavy
when multiple servers have to be monitored.
Most often, pattern files used by ACLs will be produced by tools
which emit some comments (eg: geolocation lists). It's very annoying
to have to clean the files before using them, and it does not make
much sense to be able to support patterns we already can't input in
the config file. So this patch makes the pattern file loader skip
lines beginning with a sharp and the empty ones, and strips leading
spaces and tabs.
Networks patterns loaded from files for longest match ACL testing
will now be arranged into a prefix tree. This is possible thanks to
the new prefix features in ebtree v6.0. Longest match testing is
slightly slower than exact data maching. However, the measured impact
of running at 42000 requests per second and testing whether the IP
address found in a header belongs to a list of 52000 networks or
not is 3% CPU (increase from 66% to 69%). This is low enough to
permit true geolocation based on huge tables.
Now if some ACL patterns are loaded from a file and the operation is
an exact string match, the data will be arranged in a tree, yielding
a significant performance boost on large data sets. Note that this
only works when case is sensitive.
A new dedicated function, acl_lookup_str(), has been created for this
matching. It is called for every possible input data to test and it
looks the tree up for the data. Since the keywords are loosely typed,
we would have had to add a new columns to all keywords to adjust the
function depending on the type. Instead, we just compare on the match
function. We call acl_lookup_str() when we could use acl_match_str().
The tree lookup is performed first, then the remaining patterns are
attempted if the tree returned nothing.
A quick test shows that when matching a header against a list of 52000
network names, haproxy uses 68% of one core on a core2-duo 3.2 GHz at
42000 requests per second, versus 66% without any rule, which means
only a 2% CPU increase for 52000 rules. Doing the same test without
the tree leads to 100% CPU at 6900 requests/s. Also it was possible
to run the same test at full speed with about 50 sets of 52000 rules
without any measurable performance drop.
The code is now ready to support loading pattern from filesinto trees. For
that, it will be required that the ACL keyword has a flag ACL_MAY_LOOKUP
and that the expr is case sensitive. When that is true, the pattern will
have a flag ACL_PAT_F_TREE_OK to indicate that it is possible to feed the
tree instead of a usual pattern if the parsing function is able to do this.
The tree's root is pre-initialized in the pattern's value so that the
function can easily find it. At that point, if the parsing function decides
to use the tree, it just sets ACL_PAT_F_TREE in the return flags so that
the caller knows the tree has been used and the pattern can be recycled.
That way it will be possible to load some patterns into the tree when it
is compatible, and other ones as linear linked lists. A good example of
this might be IPv4 network entries : right now we support holes in masks,
but this very rare feature is not compatible with binary lookup in trees.
So the parser will be able to decide itself whether the pattern can go to
the tree or not.
The "acl XXX -f <file>" syntax was supported but nothing was read from
the file. This is now possible. All lines are merged verbatim, even if
they contain spaces (useful for user-agents). There are shortcomings
though. The worst one is that error reporting is too approximative.
The new anonymous ACL feature was buggy. If several ones are
declared, the first rule is always matched because all of them
share the same internal name (".noname"). Now we simply declare
them with an empty name and ensure that we disable any merging
when the name is empty.
Anonymous ACLs allow the declaration of rules which rely directly on
ACL expressions without passing via the declaration of an ACL. Example :
With named ACLs :
acl site_dead nbsrv(dynamic) lt 2
acl site_dead nbsrv(static) lt 2
monitor fail if site_dead
With anonymous ACLs :
monitor fail if { nbsrv(dynamic) lt 2 } || { nbsrv(static) lt 2 }
Support the new syntax (http-request allow/deny/auth) in
http stats.
Now it is possible to use the same syntax is the same like in
the frontend/backend http-request access control:
acl src_nagios src 192.168.66.66
acl stats_auth_ok http_auth(L1)
stats http-request allow if src_nagios
stats http-request allow if stats_auth_ok
stats http-request auth realm LB
The old syntax is still supported, but now it is emulated
via private acls and an aditional userlist.
This function automatically builds a rule, considering the if/unless
statements, and automatically updates the proxy's acl_requires, the
condition's file and line.
Commit 404e8ab461 introduced
smart checking for stupid acl typos. However, now haproxy shows
the warning even for valid acls, like this one:
acl Cookie-X-NoAccel hdr_reg(cookie) (^|\ |;)X-NoAccel=1(;|$)
I've discovered a configuration with lots of occurrences of the
following :
acl xxx hdr_beg (host) xxx
The problem is that hdr_beg will match every header against patterns
(host) and xxx due to the space between both, which certainly is not
what the user wanted. Now we detect such ACLs and report a warning
with a suggestion to add "--" between "hdr_beg" and "(host)" if this
is definitely what is wanted.
The RDP protocol is quite simple and documented, which permits
an easy detection and extraction of cookies. It can be useful
to match the MSTS cookie which can contain the username specified
by the client.
Now that we can perform TCP-based content switching, it makes sense
to be able to detect HTTP traffic and act accordingly. We already
have an HTTP decoder, we just have to call it in order to detect HTTP
protocol. Note that since the decoder will automatically fill in the
interesting fields of the HTTP transaction, it would make sense to
use this parsing to extend HTTP matching to TCP.
When an ACL is referenced at a wrong place (eg: response during request, layer7
during layer4), try to indicate precisely the name and requirements of this ACL.
Only the first faulty ACL is returned. A small change consisting in iterating
that way may improve reports :
cap = ACL_USE_any_unexpected
while ((acl=cond_find_require(cond, cap))) {
warning()
cap &= ~acl->requires;
}
This will report the first ACL of each unsupported type. But doing so will
mangle the error reporting a lot, so we need to rework error reports first.
All currently known ACL verbs have been assigned a type which makes
it possible to detect inconsistencies, such as response values used
in request rules.
ACL now hold information on the availability of the data they rely
on. They can indicate which parts of the requests/responses they
require, and the rules parser may now report inconsistencies.
As an example, switching rules are now checked for response-specific
ACLs, though those are not still set. A warning is reported in case
of mismatch. ACLs keyword restrictions will now have to be specifically
set wherever a better control is expected.
The line number where an ACL condition is declared has been added to
the conditions in order to be able to report the faulty line number
during post-loading checks.
The new "wait_end" acl delays evaluation of the rule (and the next ones)
to the end of the analysis period. This is intented to be used with TCP
content analysis. A rule referencing such an ACL will not match until
the delay is over. An equivalent default ACL "WAIT_END" has been created.
For protocol analysis, it's not always convenient to have to run through
a fetch then a match against dummy values. It's easier to let the fetch()
function set the result itself. This obviously works only for boolean
values.
With content inspection, checking the presence of data in the
request buffer is very important. It's getting boring to always
add such an ACL, so let's add it by default.
It should be stated as a rule that a C file should never
include types/xxx.h when proto/xxx.h exists, as it gives
less exposure to declaration conflicts (one of which was
caught and fixed here) and it complicates the file headers
for nothing.
Only types/global.h, types/capture.h and types/polling.h
have been found to be valid includes from C files.
This new function supports one major and one minor and makes an int of them.
It is very convenient to compare versions (eg: SSL) just as if they were plain
integers, as the comparison functions will still be based on integers.