We'll soon have an SSL socket layer, and to clearly distinguish the two,
we use the name "sock_raw" to designate the layer which talks directly
to the sockets without any conversion.
There is no longer any reason for the realign function to be HTTP-specific;
it only operates on a buffer now. Let's move it to buffers.c instead.
It's likely that buffer_bounce_realign is broken (it is not used); this will
have to be inspected. The function is worth rewriting as it can be
cheaper than buffer_slow_realign() to realign large wrapping buffers.
All keywords registered using a cfg_kw_list now make use of the new error reporting
framework. This allows easier and more precise error reporting without having to
deal with complex buffer allocation issues.
From time to time, some bugs are discovered that are caused by non-initialized
memory areas. It happens that most platforms return a zero-filled area upon
first malloc() thus hiding potential bugs. This patch also replaces malloc()
in pools with calloc() to ensure that all platforms exhibit the same behaviour
upon startup. In order to catch these bugs more easily, add a -dM command line
flag to enable memory poisoning. Optionally, passing -dM<byte> forces the
poisoning byte to <byte>.
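For instance, the flag might be used like this (the byte value and config
path below are arbitrary, only meant as an illustration) :
    # fill allocated pool areas with 0x55 instead of relying on zeroed memory
    haproxy -f /etc/haproxy/haproxy.cfg -dM0x55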
A number of important pieces of information were missing from the error captures, so
let's improve them. Now we also log source port, session flags, transaction
flags, message flags, pending output bytes, expected buffer wrapping position,
total bytes transferred, message chunk length, and message body length.
As such, the output format has slightly evolved and the source address moved
to the third line :
[08/May/2012:11:14:36.341] frontend echo (#1): invalid request
backend echo (#1), server <NONE> (#-1), event #1
src 127.0.0.1:40616, session #4, session flags 0x00000000
HTTP msg state 26, msg flags 0x00000000, tx flags 0x00000000
HTTP chunk len 0 bytes, HTTP body len 0 bytes
buffer flags 0x00909002, out 0 bytes, total 28 bytes
pending 28 bytes, wrapping at 8030, error at position 7:
00000 GET / /?t=20000 HTTP/1.1\r\n
00026 \r\n
[08/May/2012:11:13:13.426] backend echo (#1) : invalid response
frontend echo (#1), server local (#1), event #0
src 127.0.0.1:40615, session #1, session flags 0x0000044e
HTTP msg state 32, msg flags 0x0000000e, tx flags 0x08200000
HTTP chunk len 0 bytes, HTTP body len 20 bytes
buffer flags 0x00008002, out 81 bytes, total 92 bytes
pending 11 bytes, wrapping at 7949, error at position 9:
00000 Foo: bar\r\r\n
The previous sockstream_accept() function uses nothing from sockstream, and
is totally irrelevant to stream interfaces. Move this to the protocols.c
file which handles listeners and protocols, and call it listener_accept().
It now makes much more sense that the code dealing with listen() also handles
accept() and passes it to upper layers.
These operators are used regardless of the socket protocol family. Move
them to a "sock_ops" struct. ->read and ->write have been moved there too
as they have no reason to remain at the protocol level.
Make use of the new IPv6 pattern type so that acl_match_ip() knows how to
compare pattern and sample.
IPv6 addresses may be entered in their usual form, with or without a netmask appended.
Only bit counts are accepted for IPv6 netmasks. In order to avoid any risk of
trouble with randomly resolved IP addresses, host names are never allowed in
IPv6 patterns.
HAProxy is also able to match IPv4 addresses with IPv6 addresses in the
following situations :
- tested address is IPv4, pattern address is IPv4, the match applies
in IPv4 using the supplied mask if any.
- tested address is IPv6, pattern address is IPv6, the match applies
in IPv6 using the supplied mask if any.
- tested address is IPv6, pattern address is IPv4, the match applies in IPv4
using the pattern's mask if the IPv6 address matches with 2002:IPV4::,
::IPV4 or ::ffff:IPV4, otherwise it fails.
- tested address is IPv4, pattern address is IPv6, the IPv4 address is first
converted to IPv6 by prefixing ::ffff: in front of it, then the match is
applied in IPv6 using the supplied IPv6 mask.
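As a short illustration of the rules above (ACL names and networks are only
examples), mixed IPv4/IPv6 matching could be expressed as :
    acl from_v6_net  src  2001:db8:1::/48
    acl from_v4_net  src  192.168.0.0/16
    # an IPv4 client tested against an IPv6 pattern is first mapped to ::ffff:a.b.c.d
    block if !from_v6_net !from_v4_net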
We cannot currently match IPv6 addresses in ACL simply because we don't
support types on the patterns. Let's introduce this notion. For now, we
rely on the SMP_TYPES though it doesn't seem like it will last forever
given that some types are not present there (eg: regex, meth). Still it
should be enough to support mixed matchings for most types.
We use the special impossible value SMP_TYPES for types that don't exist
in the SMP_T_* space.
This is mainly a massive renaming in the code to get it in line with the
calling convention. Next patch will rename a few files to complete this
operation.
All parsing errors were known but impossible to return. Now by making use
of memprintf(), we're able to build meaningful error messages that the
caller can display.
HTTP header fetch is now done using smp_fetch_hdr() for both ACLs and
patterns. This one also supports an occurrence number, making it possible
to specify explicit occurrences for ACLs and patterns.
This way, fetch functions will be able to tell if they're called for a single
request or as part of a loop. This is important for instance when we use
hdr(foo), because in an ACL this means that all hdr(foo) occurrences must
be checked while in a pattern it means only one of them (eg: last one).
Patterns were using a bitmask to indicate if request or response was desired
in fetch functions and keywords. ACLs were using a bitmask in fetch keywords
and a single bit in fetch functions. ACLs were also using an ACL_PARTIAL bit
in fetch functions indicating that a non-final fetch was performed, which was
an abuse of the existing direction flag.
The change now consists in using :
- a capabilities field for fetch keywords => SMP_CAP_REQ/RES to indicate
if a keyword supports requests, responses, both, etc...
- an option field for fetch functions to indicate what the caller expects
(request/response, final/non-final)
The ACL_PARTIAL bit was reversed to get SMP_OPT_FINAL as it's more explicit
to know we're working on a final buffer than on a non-final one.
ACL_DIR_* were removed, as well as PATTERN_FETCH_*. L4 fetches were improved
to support being called on responses too since they're still available.
The <dir> field of all fetch functions was changed to <opt> which is now
unsigned.
The patch is large but mostly made of cosmetic changes to accommodate this, as
almost no logic change happened.
Having the args everywhere will make it easier to share fetch functions
between patterns and ACLs. The only place where we could have needed
the expr was in the http_prefetch function which can do well without.
Previously, pattern, backend and persist_rdp_cookie would each build a fake
ACL expression to fetch an RDP cookie by calling acl_fetch_rdp_cookie().
Now we switch roles. The RDP cookie fetch function is provided as a sample
fetch function that all others rely on, including ACL. The code is exactly
the same, only the args handling moved from expr->args to args. The code
was moved to proto_tcp.c, though a dedicated file would probably be better
suited to content handling.
Now there is no more reference to union pattern_data. All pattern fetch and
conversion functions now make use of the common sample type. Note: none of
them adjust the type right now so it's important to do it next otherwise
we would risk sharing such functions with ACLs and seeing them fail.
This change is pretty minor. Struct pattern is only used for
pattern_process() now so changing it to use the common type is
quite obvious. It's worth noting that the last argument of
pattern_process() is never used so the function is self-sufficient.
Note that pattern_process() does not initialize the pattern at all
before calling fetch->process(), and that minimal initialization
will be required when we later change the argument for the sample.
These ones were either unused or improperly used. Some integers were marked
read-only, which does not make much sense. Buffers are not read-only, they're
"constant" in that they must be kept intact after any possible change.
This one is not needed anymore as we can return the data and its type in the
sample provided by the caller. ACLs now always return the proper type. BOOL
is already returned when the result is expected to be processed as a boolean.
temp_pattern has been unexported now.
The new sample types are necessary for the acl-pattern convergence.
These types are boolean and signed int. Some types were renamed for
less ambiguity (ip->ipv4, integer->uint).
The pattern type is ambiguous because a pattern is only a type and a data
part, and is normally used to match against samples. Currently, patterns
cannot hold information related to the life of the data which was extracted.
We don't want to overload patterns either, so let's add a new "sample" type
which will progressively supersede the acl_test and maybe the pattern at most
places. The sample shares similar information with patterns and also has flags
describing the data volatility and protection.
This flag was used to force a boolean match even if there was no pattern
to match. It was used only by http_auth() and designed only for this one.
It's easier and cleaner to make the fetch function perform the test and
report the boolean result as a few other functions already do. It simplifies
the acl_exec_cond() logic and will help merging ACLs and patterns.
This is used to validate that arguments are coherent. For instance,
payload_lv expects that the last arg (if any) is not more negative
than the sum of the first two. The error is reported if any.
We don't need the pattern-specific args parsers anymore, make use of the
common parser instead. We still need to improve this by adding a validation
function to report abnormal argument values or combinations. We don't report
precise parsing errors yet but this was not previously done either.
arg_i was almost unused, and since we migrated to use struct arg everywhere,
the rare cases where arg_i was needed could be replaced by switching to
arg->type = ARGT_STOP.
The types and minimal number of ACL keyword arguments are now stored in
their declaration. This will allow many more fantasies if some ACL use
several arguments or types.
Doing so required to rework all ACL keyword declarations to add two
parameters. So this was a good opportunity for a general cleanup and
to sort all entries in alphabetical order.
We still have two pending issues :
- parse_acl_expr() checks for errors but has no way to report them to
the user ;
- the types of some arguments are still not resolved and kept as strings
(eg: ARGT_FE/BE/TAB) for compatibility reasons, which must be resolved
in acl_find_targets()
The ACL parser now uses the argument parser to build a typed argument list.
Right now arguments are all strings and only one argument is supported since
this is what ACLs currently support.
make_arg_list() builds an array of typed arguments with their values,
that the caller describes how to parse. This will be used to support
multiple arguments for ACLs and patterns, which is currently problematic
and prevents ACLs and patterns from being merged. Up to 7 arguments types
may be enumerated in a single 32-bit word, including their number of
mandatory parts.
At the moment, these files are not used yet, they're only built. Note that
the 4-bit encoding for the type has left only one unused type!
This is more convenient and efficient than buf->p = b_ptr(buf, n);
It simply advances the buffer's pointer by <n> and transfers that
amount of bytes from <in> to <out>. The BF_OUT_EMPTY flag is updated
accordingly.
A few occurrences of such computations in buffers.c and stream_sock.c
were updated to use b_adv(), which resulted in a small code shrink.
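As a rough sketch (member names assumed from the description above, this is
not the actual implementation), b_adv() behaves like this :
    /* Advance the buffer's pointer by <adv> bytes: that amount moves from
     * the input side (->i) to the output side (->o). The real code also
     * updates the BF_OUT_EMPTY flag accordingly. */
    static inline void b_adv(struct buffer *b, unsigned int adv)
    {
        b->i -= adv;
        b->o += adv;
        b->p = b_ptr(b, adv);
    }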
buffer_wrap_add was convenient for the migration but is not handy at all.
Let's have new wrappers that report input begin/end and output begin/end
instead.
It looks like we'll also need a b_adv(ofs) to advance a buffer's pointer.
buffer_ignore may only be used when the output of a buffer is empty,
but it's not guaranteed that this is always the case when sending HTTP error
responses. Better use buffer_cut_tail() instead, and use buffer_ignore
only on non-wrapping data.
msg->sol is now a relative pointer just like all other ones. There is no
more absolute references to the buffer outside the struct buffer itself.
Next two cleanups should include removing buffer references to functions
which already have an msg, and removal of wrapping detection in request
and response parsing which cannot wrap by definition.
ACLs and patterns only rely on a struct http_msg and don't know the pointer
to the actual data. struct http_msg will soon only hold relative references
so that's not possible. We need http_msg to hold a reference to the struct
buffer before having relative pointers everywhere.
It is likely that doing so will also result in opportunities to simplify
a number of functions arguments. The following functions are already
candidate :
http_buffer_heavy_realign
http_capture_bad_message
http_change_connection_header
http_forward_trailers
http_header_add_tail
http_header_add_tail2
http_msg_analyzer
http_parse_chunk_size
http_parse_connection_header
http_remove_header2
http_send_name_header
http_skip_chunk_crlf
http_upgrade_v09_to_v10
These offsets were relative to the buffer itself. Now they're relative to
the buffer's origin (buf->p) which normally corresponds to the start of
current message.
This saves a big dependency between the HTTP message struct and the buffers.
It appeared during this change that ->col is not used anymore (it will have
to be removed). Next step is to turn ->eol and ->sol from absolute to relative.
The buffer's pointer <lr> was only used by HTTP parsers which also use a
struct http_msg to keep track of the parser's state. We've reached a point
where it makes no sense to keep ->lr in the buffer, as the split between
buffer and msg is only arbitrary for historical reasons.
This change ensures that touching buffers will not impact HTTP messages
anymore, making the buffers more content-agnostic. However, it becomes
very important not to forget to update msg->next when some data get
forwarded or moved (and in general each time buf->p is updated).
The new pointer in http_msg becomes relative to buffer->p so that
parsing multiple messages becomes easier. It is possible that at one
point ->som and ->next will be merged.
Note: http_parse_reqline() and http_parse_stsline() have been temporarily
modified to know the message starting point in the buffer (->p).
This change gets rid of buf->r which is always equal to buf->p + buf->i.
It removed some wrapping detection at a number of places, but required addition
of new relative offset computations at other locations. A large number of places
can be simplified now with extreme care, since most of the time, either the
pointer has to be computed once or we need a difference between the old ->w and
old ->r to compute free space. The cleanup will probably happen with the rewrite
of the buffer_input_* and buffer_output_* functions anyway.
buf->lr still has to move to the struct http_msg and be relative to buf->p
for the rework to be complete.
This change introduces the buffer's base pointer, which is the limit between
incoming and outgoing data. It's the point where the parsing should start
from. A number of computations have already been greatly simplified, but
more simplifications are expected to come from the removal of buf->r.
The changes appear good and have revealed occasional improper use of some
pointers. It is possible that this patch has introduced bugs or revealed
some, although preliminary testing tends to indicate that everything still
works as it should.
We don't have buf->l anymore. We have buf->i for pending data and
the total length is retrieved by adding buf->o. Some computations
already become simpler.
Despite extreme care, bugs are not excluded.
It's worth noting that msg->err_pos as set by HTTP request/response
analysers becomes relative to pending data and not to the beginning
of the buffer. This has not been completed yet so differences might
occur when outgoing data are left in the buffer.
Too many flags are stored in the transaction structure. Some flags are
clearly message-specific and exist in two versions (request and response).
Move them to a new "flags" field in the http_message struct instead.
memprintf() is just like snprintf() except that it always returns a properly
sized allocated string that the caller is responsible for freeing. NULL is
returned on serious errors. It also supports stackable calls over the same
pointer since it offers support for automatically freeing a previous one :
memprintf(&err, "invalid argument: '%s'", arg);
...
memprintf(&err, "keyword parser said: <%s>", *err);
...
memprintf(&err, "line parser said: %s\n", *err);
...
free(*err);
It's very annoying that we have to deal with the crappy size_t and with ints
at some places because these ones don't mix well. Patch 6f61b2 changed the
chunk len to int but its size remains size_t and some functions are having
trouble being used by several callers depending on the type of their arguments.
Let's turn extract_cookie_value() to int from now on, and plan a massive cleanup
later to remove all size_t.
These callbacks are used to retrieve the source and destination address
of a socket. The address flags are now held on the stream interface and
not on the session anymore. The addresses are collected when needed.
This still needs to be improved to store the IP and port separately so
that it is not needed to perform a getsockname() when only the IP address
is desired for outgoing traffic.
The unique ID is an ID generated from several pieces of information. You can use
a log-format string to customize it, with the "unique-id-format" keyword,
and insert it in the request header, with the "unique-id-header" keyword.
%Fi: Frontend IP
%Fp: Frontend Port
%Si: Server IP
%Sp: Server Port
%Ts: Timestamp
%rt: HTTP request counter
%H: hostname
%pid: PID
+X: Hexadecimal representation
The +X mode in logformat displays hexadecimal for the following flags
%Ci %Cp %Fi %Fp %Bi %Bp %Si %Sp %Ts %ct %pid
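As an illustration (the header name and the chosen format are examples only),
a configuration could combine these variables like this :
    unique-id-format %pid-%Ts-%rt
    unique-id-header X-Unique-ID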
rename logformat_write_string() to lf_text()
Optimize size computation
* logformat functions now take a format linked list as argument
* build_logline() build a logline using a format linked list
* rename LOG_* by LOG_FMT_* in enum
* improve error management in build_logline()
Sometimes it is desirable to forward a particular request to a specific
server without having to declare a dedicated backend for this server. This
can be achieved using the "use-server" rules. These rules are evaluated after
the "redirect" rules and before evaluating cookies, and they have precedence
on them. There may be as many "use-server" rules as desired. All of these
rules are evaluated in their declaration order, and the first one which
matches will assign the server.
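A hedged example of what such a rule could look like (ACL and server names
are made up) :
    backend app
        acl static_req path_beg /static/
        use-server static_srv if static_req
        server static_srv 10.0.0.10:80 check
        server app_srv    10.0.0.11:80 check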
memcmp()/strcmp() calls were needed in different parts of code to determine
the status code. Each new status code introduces new calls, which can become
inefficient and source of bugs.
This patch reorganizes the code to rely on a numeric status code internally
and to be hopefully more generic.
Previously, the stats admin page required POST parameters to be provided
exactly in the same order as the HTML form.
This patch allows those parameters to be handled in any order.
Also, note that haproxy won't alter server states anymore if backend or server
names are ambiguous (duplicated names in the configuration) to prevent
unexpected results (the same should probably be applied to the stats socket).
Commit a1cc38 introduced a regression which was easy to trigger till ad4cd58
(snapshots 20120222 to 20120311 included). The bug was still present after
that but harder to trigger.
The bug is caused by the use of two distinct log buffers due to intermediary
changes. The issue happens when an HTTP request is logged just after a TCP
request during the same second and the HTTP request is too large for the buffer.
In this case, it happens that the HTTP request is logged into the TCP buffer
instead and that length controls can't detect anything.
Starting with bddd4f, the issue is still possible when logging too large an
HTTP request just after a send_log() call (typically a server status change).
We owe a big thanks to Sander Klein for testing several snapshots and more
specifically for taking significant risks in production by letting the buggy
version crash several times in order to provide an exploitable core ! The bug
could not have been found without this precious help. Thank you Sander !
This fix does not need to be backported, it did not affect any released version.
The difference could be seen when logging a request in HTTP mode with option
tcplog, as it would keep emitting 4 chars. Better use two distinct flags to
clear the confusion.
%Bi returns the backend source IP
%Bp returns the backend source port
Add a function pointer in logformat_type to do additional configuration
during the log-format variable parsing.
Merge http_sess_log() and tcp_sess_log() to sess_log() and move it to
log.c
A new field in logformat_type defines whether a logformat
variable may be used in TCP or HTTP mode.
doc: log-format in tcp mode
Note that due to the way log buffer allocation currently works, trying to
log an HTTP request without "option httplog" is still not possible. This
will change in the near future.
A number of offset computation functions use struct buffer* arguments
and return integers without modifying the input. Using consts helps
simplifying some operations in callers.
The principle behind this load balancing algorithm was first imagined
and modeled by Steen Larsen then iteratively refined through several
work sessions until it would totally address its original goal.
The purpose of this algorithm is to always use the smallest number of
servers so that extra servers can be powered off during non-intensive
hours. Additional tools may be used to do that work, possibly by
locally monitoring the servers' activity.
The first server with available connection slots receives the connection.
The servers are chosen from the lowest numeric identifier to the highest
(see server parameter "id"), which defaults to the server's position in
the farm. Once a server reaches its maxconn value, the next server is used.
It does not make sense to use this algorithm without setting maxconn. Note
that it can however make sense to use minconn so that servers are not used
at full load before starting new servers, and so that introduction of new
servers requires a progressively increasing load (the number of servers
would more or less follow the square root of the load until maxconn is
reached). This algorithm ignores the server weight, and is more beneficial
to long sessions such as RDP or IMAP than HTTP, though it can be useful
there too.
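Assuming the algorithm is selected with "balance first", a minimal setup
could look like this (addresses and maxconn values are arbitrary) :
    backend pool
        balance first
        server srv1 10.0.0.1:80 maxconn 100 check
        server srv2 10.0.0.2:80 maxconn 100 check
        server srv3 10.0.0.3:80 maxconn 100 check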
http_sess_log now uses the logformat linked list to build the log
string; snprintf is no longer used, for speed reasons.
CLF mode also uses logformat.
NOTE: as of now, empty fields in CLF are "" and not "-" anymore.
parse_logformat_string: parse the string, detect the type: text,
separator or variable
parse_logformat_var: detect variable name
parse_logformat_var_args: parse arguments and flags
add_to_logformat_list: add to the logformat linked list
send_log function is now split in 3 functions
* hdr_log: generate the syslog header
* send_log: send a syslog message with a printf format string
* __send_log: send a syslog message
It was reported that a server configured with a zero weight would
sometimes still take connections from the backend queue. This issue is
real, it happens this way :
1) the disabled server accepts a request with a cookie
2) many cookie-less requests accumulate in the backend queue
3) when the disabled server completes its request, it checks its own
queue and the backend's queue
4) the server takes a pending request from the backend queue and
processes it. In response, the server's cookie is assigned to
the client, which ensures that some requests will continue to
be served by this server, leading back to point 1 above.
The fix consists in preventing a zero-weight server from dequeuing pending
requests from the backend. Making use of srv_is_usable() in such tests makes
the tests more robust against future changes.
This fix must be backported to 1.4 and 1.3.
In a config where server "s1" is marked disabled and "s2" tracks "s1",
s2 appears disabled on the stats but is still inserted into the LB farm
because the tracking is resolved too late in the configuration process.
We now resolve tracked servers before building LB maps and we also mark
the tracking server in maintenance mode, which previously was not done,
causing half of the issue.
Last point is that we also protect srv_is_usable() against electing a
server marked for maintenance. This is not absolutely needed but is a
safe choice and makes a lot of sense.
This fix must be backported to 1.4.
New option "http-send-name-header" specifies the name of a header which
will hold the server name in outgoing requests. This is the name of the
server the connection is really sent to, which means that upon redispatches,
the header's value is updated so that it always matches the server's name.
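A brief illustration of the option (the header name is only an example) :
    backend app
        http-send-name-header X-Target-Server
        server app1 10.0.0.1:80 check
        server app2 10.0.0.2:80 check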
The new function does not return IP addresses but header values instead,
so that the caller is free to do what it wants with them. The conversion
is not quite clean yet, as the previous test which considered that address
0.0.0.0 meant "no address" is still used. A different IP parsing function
should be used to take this into account.
Till now the pattern data integer type was unsigned without any
particular reason. In order to make ACLs use it, we must switch it
to signed int instead.
(from ebtree 6.0.6)
This version is mainly aimed at clarifying the fact that the ebtree license
is LGPL. Some files used to indicate LGPL and other ones GPL, while the goal
clearly is to have it LGPL. A LICENSE file has also been added.
No code is affected, but it's better to have the local tree in sync anyway.
(cherry picked from commit 24dc7cca051f081600fe8232f33e55ed30e88425)
In commit 4b517ca93a (MEDIUM: buffers:
add some new primitives and rework existing ones), we forgot to check
if buffer_max_len() < l.
No backport is needed.
A number of primitives were missing for buffer management, and some
of them were particularly awkward to use. Specifically, the functions
used to compute free space could not always be used depending what was
wrapping in the buffers. Some documentation has been added about how
the buffers work and their properties. Some functions are still missing
such as a buffer replacement which would support wrapping buffers.
This patch removes the limitation to 2 loggers.
Loggers are now stored in linked lists.
Using "log global", the global loggers list content is added at the end
of the current proxy list. Each "log" entry is added at the end of
the proxy list.
"no log" flushes the logger list.
Ludovic Levesque reported and diagnosed an annoying bug. When a server is
configured to track another one and has a slowstart interval set, it's
assigned a minimal weight when the tracked server goes back up but keeps
this weight forever.
This is because the throttling during the warmup phase is only computed
in the health checking function.
After several attempts to resolve the issue, the only real solution is to
split the check processing task in two tasks, one for the checks and one
for the warmup. Each server with a slowstart setting has a warmup task
which is responsible for updating the server's weight after a down to up
transition. The task does not run in other situations.
In the end, the fix is neither complex nor long and should be backported
to 1.4 since the issue was detected there first.
When reading the code, the "tracked" member of a server makes one
think the server is tracked, while it's the opposite: it's a pointer
to the server being tracked. This is particularly true in constructs
such as :
if (srv->tracked) {
Since it's the second time I get caught misunderstanding it, let's
rename it to "track" to avoid the confusion.
For a long time, the max number of headers was taken as a part of the buffer
size. Since the header size can be configured at runtime, it does not make
much sense anymore.
Nothing was making it necessary to have a static value, so let's turn this into
a tunable with a default value of 101 which equals what was previously used.
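Assuming the tunable ends up exposed as a global keyword (the name below is
indicative only), the setting might be adjusted like this :
    global
        tune.http.maxhdr 101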
It makes no sense to have one pointer to the hdr_idx pool in each proxy
struct since these pools do not depend on the proxy. Let's have a common
pool instead as it is already the case for other types.
By default, pipes are created with the system's default size. But sometimes when
using TCP splicing, it can improve performance to increase pipe sizes,
especially if it is suspected that pipes are not filled and that many
calls to splice() are performed. This has an impact on the kernel's
memory footprint, so this must not be changed if impacts are not understood.
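Assuming the knob is exposed in the global section (keyword shown here is
indicative), enlarging pipes might look like :
    global
        tune.pipesize 1048576   # 1 MB pipes for splicing-heavy workloads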
Struct sockaddr_storage is huge (128 bytes) and severely impacts the
cache. It also displaces other struct members, causing them to have
larger relative offsets. By moving these few occurrences to the end
of the structs which host them, we can reduce the code size by no less
than 2 kB !
Stream interfaces used to distinguish between client and server addresses
because they were previously of different types (sockaddr_storage for the
client, sockaddr_in for the server). This is not the case anymore, and this
distinction is confusing at best and has caused a number of regressions to
be introduced in the process of converting everything to full-ipv6. We can
now remove this and have a much cleaner code.
This patch introduces hdr_len, path_len and url_len for matching these
respective parts lengths against integers. This can be used to detect
abuse or empty headers.
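For example (thresholds are arbitrary), these could be combined as :
    acl url_too_long  url_len gt 4096
    acl empty_host    hdr_len(host) eq 0
    block if url_too_long or empty_host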
These requests are mainly monitor requests, as well as stats requests when
the stats are processed by the frontend. Having this counter helps explain
the difference in number of sessions that is sometimes observed between a
frontend and a backend.
We now measure the work and idle times in order to report the idle
time in the stats. It's expected that we'll be able to use it at
other places later.
We already had the ability to kill a connection, but it was only
for the checks. Now we can do this for any session, and for this we
add a specific flag "K" to the logs.
The stats socket now allows the admin to disable, enable or shutdown a frontend.
This can be used when a bug is discovered in a configuration and it's desirable
to fix it but the rules in place don't allow to change a running config. Thus it
becomes possible to kill the frontend to release the port and start a new one in
a separate process.
This can also be used to temporarily make haproxy return TCP resets to incoming
requests to pretend the service is not bound. For instance, this may be useful
to quickly flush a very deep SYN backlog.
The frontend check and lookup code was factored with the "set maxconn" usage.
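For instance, on the stats socket (frontend name and socket path are
examples, assuming the usual socat invocation) :
    echo "disable frontend ext-http"  | socat stdio /var/run/haproxy.sock
    echo "shutdown frontend ext-http" | socat stdio /var/run/haproxy.sock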
This one enforces a per-process connection rate limit, regardless of what
may be set per frontend. It can be a way to limit the CPU usage of a process
being severely attacked.
The side effect is that the global process connection rate is now measured
for each incoming connection, so it will be possible to report it.
This option permits to change the global maxconn setting within the
limit that was set by the initial value, which is now reported as the
hard maxconn value. This allows to immediately accept more concurrent
connections or to stop accepting new ones until the value passes below
the indicated setting.
The main use of this option is on systems where many haproxy instances
are loaded and admins need to re-adjust resource sharing at run time
to regain a bit of fairness between processes.
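For example, via the stats socket (the value and socket path are illustrative) :
    echo "set maxconn global 2000" | socat stdio /var/run/haproxy.sock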
Trailing spaces after headers were not trimmed, only the leading ones
were. An issue was detected today with a content-length value which
was padded with spaces and which was rejected. Recent updates to the
http-bis draft made it a lot more clear that such spaces must be ignored,
so this is what this patch does.
It should be backported to 1.4.
Many inet_ntop calls were partially right, which was hard to detect given
the complex combinations. Some of them were relying on the listener's proto
instead of the address itself, which could have been different when dealing
with an accept-proxy connection.
The new addr_to_str() function does the dirty job and returns the family, which
makes it particularly suited to calls from switch/case statements. A large number
of if/else statements were removed and the stats output could even be cleaned up
in the case of session dump.
As a side effect of doing this, the resulting code is smaller by almost 1kB.
All changed parts have been tested and provided expected output.
Some older libc don't define splice() and don't define _syscall*()
either, which causes build errors if splicing is enabled.
To solve this, we now split the syscall redefinition into two layers :
- one file per syscall (epoll, splice)
- one common file to declare the _syscall*() macros
The code is cleaner because files using the syscalls just have to include
their respective file. It's not advised to merge multiple syscall families
into a same file if all are not intended to be used simultaneously, because
defining unused static functions causes warnings to be emitted during build.
As a result, the new USE_MY_SPLICE parameter was added in order to be able
to define the splice() syscall separately.
If "option forwardfor" has the "if-none" argument, then the header is
only added when the request did not already have one. This option has
security implications, and should not be set blindly.
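A minimal example of the syntax :
    defaults
        mode http
        option forwardfor if-none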
Manoj Kumar reported a case where haproxy would crash upon start-up. The
cause was an "http-check expect" statement declared in the defaults section,
which caused a NULL regex to be used during the check. This statement is not
allowed in defaults sections precisely because this requires saving a copy
of the regex in the default proxy. But the check was not made to prevent it
from being declared there, hence the issue.
Instead of adding code to detect its abnormal use, we decided to implement
it. It was not that much complex because the expect_str part was not used
with regexes, so it could hold the string form of the regex in order to
compile it again for every backend (there's no way to clone regexes).
This patch has been tested and works. So it's both a bugfix and a minor
feature enhancement.
It should be backported to 1.4 though it's not critical since the config
was not supposed to be supported.
Adding health checks has become a real pain, with cross-references to all
checks everywhere because they're all a single bit. Since they're all
exclusive, let's change this to have a check number only. We reserve 4
bits allowing up to 16 checks (15+tcp), only 7 of which are currently
used. The code has shrunk by almost 1kB and we saved a few option bits.
The "dispatch" option has been moved to px->options, making a few tests
a bit cleaner.
This patch provides a new "option redis-check" statement to enable server health checks based on redis PING request (http://www.redis.io/commands/ping).
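Following the same pattern as the other checks, usage would look like
(addresses are examples) :
    listen redis 127.0.0.1:6379
        mode tcp
        option redis-check
        server redis1 10.0.0.1:6379 check inter 2000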
This global task is used to periodically check for end of resource shortage
and to try to enable queued listeners again. This is important in case some
temporary system-wide shortage is encountered, so that we don't have to wait
for an existing connection to be released before checking the queue again.
For situations where listeners are queued due to the global maxconn being
reached, the task is woken up at least every second. For situations where
a system resource shortage is detected (memory, sockets, ...) the task is
woken up at least every 100 ms. That way, recovery from severe events can
still be achieved under acceptable conditions.
This was revealed with one of the very latest patches which caused
the listener_queue not to be initialized on the stats socket frontend.
And in fact a number of other ones were missing too. This is getting so
boring that now we'll always make use of the same function to initialize
any proxy. Doing so has even saved about 500 bytes on the binary due to
the avoided code redundancy.
No backport is needed.
This function is finally not needed anymore, as it has been replaced with
a per-proxy task that is scheduled when some limits are encountered on
incoming connections or when the process is stopping. The savings should
be noticeable on configs with a large number of proxies. The most important
point is that the rate limiting is now enforced in a clean and solid way.
Those states have been replaced with PR_STFULL and PR_STREADY respectively,
as it is what matches them the best now. Also, two occurrences of PR_STIDLE
in peers.c have been removed as this did not provide any form of error recovery
anyway.
All listeners that are limited by a proxy-specific resource are now
queued at the proxy's and not globally. This allows finer-grained
wakeups when releasing resource.
When an accept() fails because of a connection limit or a memory shortage,
we now disable it and queue it so that it's dequeued only when a connection
is released. This has improved the behaviour of the process near the fd limit
as now a listener with no connection (eg: stats) will not loop forever
trying to get its connection accepted.
The solution is still not 100% perfect, as we'd like to have this used when
proxy limits are reached (use a per-proxy list) and for safety, we'd need
to have dedicated tasks to periodically re-enable them (eg: to overcome
temporary system-wide resource limitations when no connection is released).
When a listener encounters a resource shortage, it currently stops until
one re-enables it. This is far from being perfect as it does not yet handle
the case where the single connection from the listener is rejected (eg: the
stats page).
Now we'll have a special status for resource limited listeners and we'll
queue them into one or multiple lists. That way, each time we have to stop
a listener because of a resource shortage, we can enqueue it and change its
state, so that it is dequeued once more resources are available.
This patch currently does not change any existing behaviour, it only adds
the basic building blocks for doing that.
Managing listeners state is difficult because they have their own state
and can at the same time have theirs dictated by their proxy. The pause
is not done properly, as the proxy code is fiddling with sockets. By
introducing new functions such as pause_listener()/resume_listener(), we
make it a bit more obvious how/when they're supposed to be used. The
listen_proxies() function was also renamed to resume_proxies() since
it's only used for pause/resume.
This patch is the first in a series aiming at getting rid of the maintain_proxies
mess. In the end, proxies should not call enable_listener()/disable_listener()
anymore.
Patch af5149 introduced an issue which can be detected only on out of
memory conditions : a LIST_DEL() may be performed on an uninitialized
struct member instead of a LIST_INIT() during the accept() phase,
causing crashes and memory corruption to occur.
This issue was detected and diagnosed by the Exceliance R&D team.
This is 1.5-specific and very recent, so no existing deployment should
be impacted.
Never add connections allocated to this server to a stick-table.
This may be used in conjunction with backup to ensure that
stick-table persistence is disabled for backup servers.
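An illustrative server line (assuming the behaviour is exposed as a
"non-stick" server keyword; address is an example) :
    server bkp1 10.0.0.9:80 check backup non-stick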
apsession_refresh() and apsess_refressh are only used inside apsession.c
and thus can be made static.
The only use of apsession_refresh() is appsession_task_init().
These functions have been re-ordered to avoid the need for
a forward-declaration of apsession_refresh().
If a connection is closed because the backend became unavailable
then log 'D' as the termination condition.
Signed-off-by: Simon Horman <horms@verge.net.au>
This adds the "on-marked-down shutdown-sessions" statement on "server" lines,
which causes all sessions established on a server to be killed at once when
the server goes down. The task's priority is reniced to the highest value
(1024) so that servers holding many tasks don't cause a massive slowdown due
to the wakeup storm.
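For example (address is illustrative) :
    server app1 10.0.0.1:80 check on-marked-down shutdown-sessions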
The motivation for this is to allow iteration of all the connections
of a server without the expense of iterating over the global list
of connections.
The first use of this will be to implement an option to close connections
associated with a server when it is marked as being down or in maintenance
mode.
* The declaration of peer_session_create() does
not match its definition. As it is only
used inside of peers.c make it static.
* Make the declaration of peers_register_table()
match its definition.
* Also, make all functions in peers.c that
are not also in peers.h static
Bashkim Kasa reported that the stats admin page did not work when colons
were used in server or backend names. This was caused by url-encoding
resulting in ':' being sent as '%3A'. Now we systematically decode the
field names and values to fix this issue.
It's more expensive to call splice() on short payloads than to use
recv()+send(). One of the reasons is that doing a splice() involves
allocating a pipe. One other reason is that the kernel will have to
copy itself if we try to splice less than a page. So let's set a
threshold of 4kB below which we don't splice.
A quick test shows that on chunked encoded data, with splice we had
6826 syscalls (1715 splice, 3461 recv, 1650 send) while with this
patch, the same transfer resulted in 5793 syscalls (3896 recv, 1897
send).
There are some very rare server-to-server applications that abuse the HTTP
protocol and expect the payload phase to be highly interactive, with many
interleaved data chunks in both directions within a single request. This is
absolutely not supported by the HTTP specification and will not work across
most proxies or servers. When such applications attempt to do this through
haproxy, it works but they will experience high delays due to the network
optimizations which favor performance by instructing the system to wait for
enough data to be available in order to only send full packets. Typical
delays are around 200 ms per round trip. Note that this only happens with
abnormal uses. Normal uses such as CONNECT requests or WebSockets are not
affected.
When "option http-no-delay" is present in either the frontend or the backend
used by a connection, all such optimizations will be disabled in order to
make the exchanges as fast as possible. Of course this offers no guarantee on
the functionality, as it may break at any other place. But if it works via
HAProxy, it will work as fast as possible. This option should never be used
by default, and should never be used at all unless such a buggy application
is discovered. The impact of using this option is an increase of bandwidth
usage and CPU usage, which may significantly lower performance in high
latency environments.
This change should be backported to 1.4 since the first report of such a
misuse was in 1.4. Next patch will also be needed.
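If such a broken application is identified, the workaround would be enabled
like this (backend name is an example) :
    backend legacy-app
        option http-no-delay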
This status code is used in response to requests matching "monitor-uri".
Some users need to adjust it to fit their needs (eg: make some strings
appear there). As it's already defined as a chunked string and used
exactly like other status codes, it makes sense to make it configurable
with the usual "errorfile", "errorloc", ...
John Helliwell reported a runtime issue on Solaris since 1.5-dev5. Traces
show that connect() returns EINVAL, which means the socket length is not
appropriate for the family. Solaris does not like being passed
sizeof(sockaddr_storage) and needs the address family's actual size instead.
The fix consists in adding a get_addr_len() function which returns the
socket's address length based on its family. Tests show that this works
for both IPv4 and IPv6 addresses.
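A minimal sketch of such a helper (the list of handled families is
illustrative, and the usual socket headers are assumed to be included) :
    /* Return the address length to pass to connect()/bind() for the
     * family stored in <addr>, or 0 when the family is not handled. */
    static inline int get_addr_len(const struct sockaddr_storage *addr)
    {
        switch (addr->ss_family) {
        case AF_INET:  return sizeof(struct sockaddr_in);
        case AF_INET6: return sizeof(struct sockaddr_in6);
        case AF_UNIX:  return sizeof(struct sockaddr_un);
        default:       return 0;
        }
    }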
Since IPv6 is a different type than IPv4, the pattern fetch functions
src6 and dst6 were added. IPv6 stick-tables can also fetch IPv4 addresses
with src and dst. In this case, the IPv4 addresses are mapped to their
IPv6 counterpart, according to RFC 4291.
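An illustrative stick-table configuration using these fetches (sizes and
timeouts are arbitrary) :
    backend app
        stick-table type ipv6 size 200k expire 30m
        stick on src6
        server app1 10.0.0.1:80 check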
Since the latest additions to buffer_forward(), it became too large for
inlining, so let's uninline it. The code size drops by 3kB. Should be
backported to 1.4 too.
Despite much care around handling the content-length as a 64-bit integer,
forwarding was broken on 32-bit platforms due to the 32-bit nature of
the ->to_forward member of the "buffer" struct. The issue is that this
member is declared as a long, so while it works OK on 64-bit platforms,
32-bit platforms truncate the content-length to the lower 32 bits.
One solution could consist in turning to_forward to a long long, but it
is used a lot in the critical path, so it's not acceptable to perform
all buffer size computations on 64-bit there.
The fix consists in changing the to_forward member to a strict 32-bit
integer and ensure in buffer_forward() that only the amount of bytes
that can fit into it is considered. Callers of buffer_forward() are
responsible for checking that their data were taken into account. We
arbitrarily ensure we never consider more than 2G at once.
That's the way it was intended to work on 32-bit platforms except that
it did not.
This issue was tracked down hard at Exosec with Bertrand Jacquin,
Thierry Fournier and Julien Thomas. It remained undetected for a long
time because files larger than 4G are almost always transferred in
chunked-encoded format, and most platforms dealing with huge contents
these days run on 64-bit.
The bug affects all 1.5 and 1.4 versions, and must be backported.
The parser now distinguishes between pure addresses and address:port. This
is useful for some config items where only an address is required.
Raw IPv6 addresses are now parsed, but IPv6 host name resolution is still not
handled (gethostbyname does not resolve IPv6 names to addresses).
This option enables use of the PROXY protocol with the server, which
allows haproxy to transport original client's address across multiple
architecture layers.
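Assuming the option is the per-server "send-proxy" keyword, enabling it would
look like this (address is an example) :
    backend app
        server app1 10.0.0.1:8080 check send-proxy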
Upon connection establishment, stream_sock is now able to send a PROXY
line before sending any data. Since it's possible that the buffer is
already full, and we don't want to allocate a block for that line, we
compute it on-the-fly when we need it. We just store the offset from
which to (re-)send from the end of the line, since it's assumed that
multiple outputs of the same proxy line will be strictly equivalent. In
practice, one call is enough. We just make sure to handle the case where
the first send() would indicate an incomplete output, even though it's
very unlikely to ever happen.
And also rename "req_acl_rule" "http_req_rule". At the beginning that
was a bit confusing to me, especially the "req_acl" list which in fact
holds what we call rules. After some digging, it appeared that some
part of the code is 100% HTTP and not just related to authentication
anymore, so let's move that part to HTTP and keep the auth-only code
in auth.c.
It's very annoying that frontend and backend stats are merged because we
don't know what we're observing. For instance, if a "listen" instance
makes use of a distinct backend, it's impossible to know what the bytes_out
means.
Some points take care of not updating counters twice if the backend points
to the frontend, indicating a "listen" instance. The thing becomes more
complex when we try to add support for server side keep-alive, because we
have to maintain a pointer to the backend used for last request, and to
update its stats. But we can't perform such comparisons anymore because
the counters will not match anymore.
So in order to get rid of this situation, let's have both frontend AND
backend stats in the "struct proxy". We simply update the relevant ones
during activity. Some of them are only accounted for in the backend,
while others are just for frontend. Maybe we can improve a bit on that
later, but the essential part is that those counters now reflect what
they really mean.
This patch turns internal server addresses to sockaddr_storage to
store IPv6 addresses, and makes the connect() function use it. This
code already works but some caveats with getaddrinfo/gethostbyname
still need to be sorted out while the changes had to be merged at
this stage of internal architecture changes. So for now the config
parser will not emit an IPv6 address yet so that user experience
remains unchanged.
This change should have absolutely zero user-visible effect, otherwise
it's a bug introduced during the merge, that should be reported ASAP.
This one has been removed and is now totally superseded by ->target.
To get the server, one must use target_srv(&s->target) instead of
s->srv now.
The function ensures that non-server targets still return NULL.
s->prev_srv is used by assign_server() only, but all code paths leading
to it now take s->prev_srv from the existing s->srv. So assign_server()
can do that copy into its own stack.
If at one point a different srv is needed, we still have a copy of the
last server on which we failed a connection attempt in s->target.
When dealing with HTTP keep-alive, we'll have to know if we can reuse
an existing connection. For that, we'll have to check if the current
connection was made on the exact same target (referenced in the stream
interface).
Thus, we need to first assign the next target to the session, then
copy it to the stream interface upon connect(). Later we'll check for
equivalence between those two operations.
Till now we used the fact that the dispatch address was not null to use
the dispatch mode. This is very inconvenient, so let's have a dedicated
option.
This is in fact where those parts belong. The old data_state was replaced
by applet.state and is now initialized when the applet is registered. It's
worth noting that the applet does not need to know the session nor the
buffer anymore since everything is brought by the stream interface.
It is possible that having a separate applet struct would simplify the
code but that's not a big deal.
Now that we have the target pointer and type in the stream interface,
we don't need the applet.handler pointer anymore. That makes the code
somewhat cleaner because we know we're dealing with an applet by checking
its type instead of checking the pointer is not null.
When doing a connect() on a stream interface, some information is needed
from the server and from the backend. In some situations, we don't have
a server and only a backend (eg: peers). In other cases, we know we have
an applet and we don't want to connect to anything, but we'd still like
to have the info about the applet being used.
For this, we now store a pointer to the "target" into the stream interface.
The target describes what's on the other side before trying to connect. It
can be a server, a proxy or an applet for now. Later we'll probably have
descriptors for multiple-stage chains so that the final information may
still be found.
This will help removing many specific cases in the code. It already made
it possible to remove the "srv" and "be" parameters to tcpv4_connect_server().
Those 3 parts are the buffer side, the remote side and the communication
functions. This change has no functional effect but is needed to proceed
further.
I/O handlers are still delicate to manipulate. They have no type, they're
just raw functions which have no knowledge of themselves. Let's have them
declared as applets once for all. That way we can have multiple applets
share the same handler functions and we can store their names there. When
we later need to add more parameters (eg: usage stats), we'll be able to
do so in the applets themselves.
The CLI functions has been prefixed with "cli" instead of "stats" as it's
clearly what is going on there.
The applet descriptor in the stream interface should get all the applet
specific data (st0, ...) but this will be done in the next patch so that
we don't pollute this one too much.
Till now, the forwarding code was making use of the hdr_content_len member
to hold the size of the last chunk parsed. As such, it was reset after being
scheduled for forwarding. The issue is that this entry was reset before the
data could be viewed by backend.c in order to parse a POST body, so the
"balance url_param check_post" did not work anymore.
In order to fix this, we need two things :
- the chunk size (reset upon every forward)
- the total body size (not reset)
hdr_content_len was thus replaced by the former (hence the size of the patch)
as it makes more sense to have it stored that way than the other way around.
This patch should be backported to 1.4 with care, considering that it affects
the forwarding code.
I have written a small patch to enable a correct PostgreSQL health check.
It works similarly to mysql-check, with the very same parameters.
E.g.:
listen pgsql 127.0.0.1:5432
mode tcp
option pgsql-check user pgsql
server masterdb pgsql.server.com:5432 check inter 10000
One of the requirements we have is to run multiple instances of haproxy on a
single host; this is so that we can split the responsibilities (and change
permissions) between product teams. An issue we ran up against is how we
would distinguish between the logs generated by each instance. The solution
we came up with (please let me know if there is a better way) is to override
the application tag written to syslog. We can then configure syslog to write
these to different files.
I have attached a patch adding a global option 'log-tag' to override the
default syslog tag 'haproxy' (actually defaults to argv[0]).
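A brief example (the tag value is arbitrary) :
    global
        log-tag haproxy-team-a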
Haproxy includes the IP of the machine rather than its hostname in
the syslog headers it sends. Unfortunately this means that for each log
line rsyslog does a reverse DNS lookup on the client IP, and in the case of
non-routable IPs one gets the public hostname, not the internal one.
While this is valid according to RFC3164, as one might imagine this is
troublesome if you have some machines with public IPs, internal IPs, no
reverse DNS entries, etc. and you want a standardized hostname-based log
directory structure. The RFC says the preferred value is the hostname.
This patch adds a global "log-send-hostname" statement which accepts an
optional string to force the host name. If unset, the local host name
is used.
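For example (the hostname is arbitrary) :
    global
        log-send-hostname web-front-01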
HTTP pipelining currently needs to monitor the response buffer to wait
for some free space to be able to send a response. It was not possible
for the HTTP analyser to be called based on response buffer activity.
Now we introduce a new buffer flag BF_WAKE_ONCE which is set when the
HTTP request analyser is set on the response buffer and some activity
is detected. This is not clean at all but one of the only ways to fix
the issue before we make it possible to register events for analysers.
Also it appeared that one realign condition did not cover all cases.
This counter will help quickly spot whether there are new errors or not.
It is also assigned to each capture so that a script can keep track of
which capture was taken when.
Debugging parsing errors can be greatly improved if we know what the parser
state was and what the buffer flags were (especially for closed inputs/outputs
and full buffers). Let's add that to the error snapshots.
When the number of servers is a multiple of the size of the input set,
map-based hash can be inefficient. This typically happens with 64
servers when doing URI hashing. The "avalanche" hash-type applies an
avalanche hash before performing a map lookup in order to smooth the
distribution. The result is slightly less smooth than the map for small
numbers of servers, but still better than the consistent hashing.
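Assuming the value is selected with the "hash-type" keyword, such a setup
might look like this :
    backend static
        balance uri
        hash-type avalanche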
We'll use this hash at other places, let's make it globally available.
The function has also been renamed because its "chash_hash" name was
not appropriate.
Ross West reported that int32_t breaks compilation on FreeBSD. Since an
int is 32-bit on all supported platforms and we already rely on that,
change the type.
Enhance pattern convs and fetch argument parsing; fetch and conv callbacks now use typed args.
Add more details to the error messages of the pattern expression parsing function.
Update existing pattern convs and fetches to the new prototype.
Create stick table key type "binary".
Manage truncation and padding if a pattern's fetch-converted result doesn't match the table key size.
MAXPATHLEN may be used at other places, it's inconvenient to have it
redefined in a few files. Also, since checking it requires including
sys/param.h, some versions of it cause a macro declaration conflict
with MIN/MAX which are defined in tools.h. The solution consists in
including sys/param.h in both files so that we ensure it's loaded
before the macros are defined and MAXPATHLEN is checked.
The introduction of a new PROXY protocol for proxied connections requires
an early analyser to decode the incoming connection and set the session
flags accordingly.
Some more work is needed, among which setting a flag on the session to
indicate it's proxied, and copying the original parameters for later
comparisons with new ACLs (eg: real_src, ...).
inetaddr_host_lim_ret() used to make use of const char** for some
args, but that makes it impossible to use char** due to the way
checks are performed by gcc. So let's change that.
This option makes haproxy preserve any persistence cookie emitted by
the server, which allows the server to change it or to unset it, for
instance, after a logout request.
(cherry picked from commit 52e6d75374c7900c1fe691c5633b4ae029cae8d5)
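A minimal sketch of a backend using this option (names and addresses are
examples) :
backend app
    cookie SERVERID insert indirect preserve
    server s1 10.0.0.1:80 cookie s1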
The stats web interface must be read-only by default to prevent security
holes. As it is now allowed to enable/disable servers, a new keyword
"stats admin" is introduced to activate this admin level, conditioned by ACLs.
(cherry picked from commit 5334bab92ca7debe36df69983c19c21b6dc63f78)
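A minimal sketch of a stats section restricted by an ACL (addresses are
examples) :
listen stats
    mode http
    bind :8404
    stats enable
    stats uri /stats
    acl internal src 10.0.0.0/8
    stats admin if internal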
Based on a patch provided by Judd Montgomery, it is now possible to
enable/disable servers from the stats web interface. This makes it possible
to select several servers in a backend and apply the action to them at the
same time.
Currently, there are 2 known limitations :
- The POST data are limited to one packet
(don't alter too many servers at a time).
- Expect: 100-continue is not supported.
(cherry picked from commit 7693948766cb5647ac03b48e782cfee2b1f14491)
If a cookie comes in with a first or last date, and they are configured on
the backend, they're checked. If a date is expired or too far in the future,
then the cookie is ignored and the specific reason appears in the cookie
field of the logs.
(cherry picked from commit faa3019107eabe6b3ab76ffec9754f2f31aa24c6)
These functions only require 5 chars to encode 30 bits, and don't expect
any padding. They will be used to encode dates in cookies.
(cherry picked from commit a7e2b5fc4612994c7b13bcb103a4a2c3ecd6438a)
The set-cookie status flags were not very handy and limited. Reorder
them to save some room for additional values and add the "U" flags
(for Updated expiration date) that will be used with expirable cookies
in insert mode.
(cherry picked from commit 5bab52f821bb0fa99fc48ad1b400769e66196ece)
We'll need one more bit to store and report the request cookie's status.
Doing this required moving a few bits around. However, now in 1.4 all bits
are used, there's no room left.
Cookie flags will need
(cherry picked from commit 09ebca0413c43620ddc375b5b4ab31a25d47b3f4)
In all cookie persistence modes but prefix, we now support cookies whose
value is suffixed with some contents after a vertical bar ('|'). This will
be used to pass an optional expiration date. So as of now we only consider
the part of the cookie value which is used before the vertical bar.
(cherry picked from commit a4486bf4e5b03b5a980d03fef799f6407b2c992d)
Add two new arguments to the "cookie" keyword, to be able to
fix a max idle and max life on them. Right now only the parameter
parsing is implemented.
(cherry picked from commit 9ad5dec4c3bb8f29129f292cb22d3fc495fcc98a)
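A minimal sketch of the intended syntax (values are only examples) :
backend app
    cookie SERVERID insert indirect maxidle 30m maxlife 8h
    server s1 10.0.0.1:80 cookie s1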
HTTP content-based health checks will be involved in searching text in pages.
Some pages may not fit in the default buffer (16kB) and sometimes it might be
desired to have larger buffers in order to find patterns. Running checks on
smaller URIs is always preferred of course.
(cherry picked from commit 043f44aeb835f3d0b57626c4276581a73600b6b1)
This patch adds the "http-check expect [r]{string,status}" statements
which enable health checks based on whether the response status or body
to an HTTP request contains a string or matches a regex.
This probably is one of the oldest patches that remained unmerged. Over
time, several people have contributed to it, among which FinalBSD
(first and second implementations), Nick Chalk (port to 1.4), Anze
Skerlavaj (tests and fixes), Cyril Bonté (general fixes), and of course
myself for the final fixes and doc during integration.
Some people already use an old version of this patch which has several
issues, among which the inability to search for a plain string that is
not at the beginning of the data, and the inability to look for response
contents that are provided in a second and subsequent recv() calls. But
since some configs are already deployed, it was quite important to ensure
a 100% compatible behaviour on the working cases.
Thus, that patch fixes the issues while maintaining config compatibility
with already deployed versions.
(cherry picked from commit b507c43a3ce9a8e8e4b770e52e4edc20cba4c37f)
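A minimal sketch of a backend using such a check (URI and status are only
examples; the rstring form takes a regex instead) :
backend app
    option httpchk GET /health
    http-check expect status 200
    server s1 10.0.0.1:80 check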
This patch provides a new "option ldap-check" statement to enable
server health checks based on LDAPv3 bind requests.
(cherry picked from commit b76b44c6fed8a7ba6f0f565dd72a9cb77aaeca7c)
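A minimal sketch of a backend using this check (address is an example) :
backend ldap
    mode tcp
    option ldap-check
    server ldap1 10.0.0.10:389 check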
There was no consistency between all the functions used to exchange data
between a buffer and a stream interface. Also, the functions used to send
data to a buffer did not consider the possibility that the buffer was
shutdown for read.
Now the functions are called buffer_{put,get}_{char,block,chunk,string}.
The old buffer_feed* functions have been left available for existing code
but marked deprecated.
This counter is incremented for each incoming connection and each active
listener, and is used to prevent haproxy from stopping upon SIGUSR1. It
will thus be possible for some tasks to increment this counter in order
to prevent haproxy from dying until they have completed their job.
Signal zero is never delivered by the system. However having a signal to
which functions and tasks can subscribe to be notified of a stopping event
is useful. So this patch does two things :
1) allow signal zero to be delivered from any function or signal handler
2) make soft_stop() deliver this signal so that tasks can be notified of
a stopping condition.
The two new functions below make it possible to register any number
of functions or tasks to a system signal. They will be called in the
registration order when the signal is received.
struct sig_handler *signal_register_fct(int sig, void (*fct)(struct sig_handler *), int arg);
struct sig_handler *signal_register_task(int sig, struct task *task, int reason);
In case of binding failure during startup, we wait for some time sending
signals to old pids so that they release the ports we need. But if there
aren't any old pids anymore, it's useless to wait, we prefer to fail fast.
Along with this change, we now have the number of old pids really found
in the nb_oldpids variable.
In case of HTTP keepalive processing, we want to release the counters tracked
by the backend. Till now only the second set of counters was released, while
it could have been assigned by the frontend, or the backend could also have
assigned the first set. Now we reuse two unused bits of the session flags to
mark which stick counters were assigned by the backend and to release them as
appropriate.
The assumption that there was a 1:1 relation between tracked counters and
the frontend/backend role was wrong. It is perfectly possible to track the
track-fe-counters from the backend and the track-be-counters from the
frontend. Thus, in order to reduce confusion, let's remove this useless
{fe,be} reference and simply use {1,2} instead. The keywords have also been
renamed in order to limit confusion. The ACL rule action now becomes
"track-sc{1,2}". The ACLs are now "sc{1,2}_*" instead of "trk{fe,be}_*".
That means that we can reasonably document "sc1" and "sc2" (sticky counters
1 and 2) as sort of patterns that are available during the whole session's
life and use them just like any other pattern.
Having a single tracking pointer for both frontend and backend counters
does not work. Instead let's have one for each. The keyword has changed
to "track-be-counters" and "track-fe-counters", and the ACL "trk_*"
changed to "trkfe_*" and "trkbe_*".
It is now possible to dump some select table entries based on criteria
which apply to the stored data. This is enabled by appending the following
options to the end of the "show table" statement :
data.<data_type> {eq|ne|lt|gt|le|ge} <value>
For instance :
show table http_proxy data.conn_rate gt 5
show table http_proxy data.gpc0 ne 0
The comparison applies to the integer value as it would be displayed, and
operates on signed long long integers.
It's a bit cumbersome to have to know all possible storable types
from the stats interface. Instead, let's have generic types for
all data, which will facilitate their manipulation.
It is now possible to dump a table's contents with keys, expire,
use count, and various data using the command above on the stats
socket.
"show table" only shows main table stats, while "show table <name>"
dumps table contents, only if the socket level is admin.
This patch adds support for the following session counters :
- http_req_cnt : HTTP request count
- http_req_rate: HTTP request rate
- http_err_cnt : HTTP request error count
- http_err_rate: HTTP request error rate
The equivalent ACLs have been added to check the tracked counters
for the current session or the counters of the current source.
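A minimal sketch combining these counters with tracking, in the style of
the examples below (the trk_* ACL name follows the naming convention used
there and is an assumption; the threshold is an example) :
stick-table type ip size 200k expire 10m store http_req_rate(10s),http_err_rate(10s)
tcp-request track-counters src
tcp-request reject if { trk_http_err_rate gt 20 }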
This counter may be used to track anything. Two sets of ACLs are available
to manage it, one gets its value, and the other one increments its value
and returns it. In the second case, the entry is created if it did not
exist.
Thus it is possible for example to mark a source as being an abuser and
to keep it marked as long as it does not remain quiet long enough for the
entry to expire :
# The rules below use gpc0 to track abusers, and reject them if
# a source has been marked as such. The track-counters statement
# automatically refreshes the entry which will not expire until a
# 1-minute silence is respected from the source. The second rule
# evaluates the second part if the first one is true, so GPC0 will
# be increased once the conn_rate is above 100/5s.
stick-table type ip size 200k expire 1m store conn_rate(5s),gpc0
tcp-request track-counters src
tcp-request reject if { trk_get_gpc0 gt 0 }
tcp-request reject if { trk_conn_rate gt 100 } { trk_inc_gpc0 gt 0 }
Alternatively, it is possible to let the entry expire even in presence of
traffic by swapping the check for gpc0 and the track-counters statement :
stick-table type ip size 200k expire 1m store conn_rate(5s),gpc0
tcp-request reject if { src_get_gpc0 gt 0 }
tcp-request track-counters src
tcp-request reject if { trk_conn_rate gt 100 } { trk_inc_gpc0 gt 0 }
It is also possible not to track counters at all, but entry lookups will
then be performed more often :
stick-table type ip size 200k expire 1m store conn_rate(5s),gpc0
tcp-request reject if { src_get_gpc0 gt 0 }
tcp-request reject if { src_conn_rate gt 100 } { src_inc_gpc0 gt 0 }
The '0' at the end of the counter name is there so that, if more such
counters prove useful, other ones can be added.
This function looks up a key, updates its expiration date, or creates
it if it was not found. acl_fetch_src_updt_conn_cnt() was updated to
make use of it.
These counters maintain incoming and outgoing byte rates in a stick-table,
over a period which is defined in the configuration (2 ms to 24 days).
They can be used to detect service abuse and enforce certain bandwidth
limits per source address for instance, and block if the rate is
exceeded. Since 32-bit counters are used to compute the rates, it is important
not to use too long periods so that we don't have to deal with rates above
4 GB per period.
Example :
# block if more than 5 Megs retrieved in 30 seconds from a source.
stick-table type ip size 200k expire 1m store bytes_out_rate(30s)
tcp-request track-counters src
tcp-request reject if { trk_bytes_out_rate gt 5000000 }
# cause a 15 seconds pause to requests from sources in excess of 2 megs/30s
tcp-request inspect-delay 15s
tcp-request content accept if { trk_bytes_out_rate gt 2000000 } WAIT_END
These counters maintain incoming connection rates and session rates
in a stick-table, over a period which is defined in the configuration
(2 ms to 24 days). They can be used to detect service abuse and
enforce a certain accept rate per source address for instance, and
block if the rate is exceeded.
Example :
# block if more than 50 requests per 5 seconds from a source.
stick-table type ip size 200k expire 1m store conn_rate(5s),sess_rate(5s)
tcp-request track-counters src
tcp-request reject if { trk_conn_rate gt 50 }
# cause a 3 seconds pause to requests from sources in excess of 20 requests/5s
tcp-request inspect-delay 3s
tcp-request content accept if { trk_sess_rate gt 20 } WAIT_END
We're now able to return errors based on the validity of an argument
passed to a stick-table store data type. We also support ARG_T_DELAY
to pass delays to stored data types (eg: for rate counters).
Some data types will require arguments (eg: period for a rate counter).
This patch adds support for such arguments between parenthesis in the
"store" directive of the stick-table statement. Right now only integers
are supported.
When a session tracks a counter, automatically increase the cumulated
connection count. This makes src_updt_conn_cnt() almost useless. In
fact it might still be used to update different tables.
The new "bytes_in_cnt" and "bytes_out_cnt" session counters have been
added. They're automatically updated when session counters are updated.
They can be matched with the "src_kbytes_in" and "src_kbytes_out" ACLs
which apply to the volume per source address. This can be used to deny
access to service abusers.
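A minimal sketch in the style of the examples above (the threshold is an
example) :
# block sources which already retrieved about 1 GB
stick-table type ip size 200k expire 1h store bytes_out_cnt
tcp-request track-counters src
tcp-request reject if { src_kbytes_out gt 1000000 }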
The new "conn_cur" session counter has been added. It is automatically
updated upon "track XXX" directives, and the entry is touched at the
moment we increment the value so that we don't consider further counter
updates as real updates, otherwise we would end up updating upon completion,
which may not be desired. Some other event counters (eg: HTTP requests)
will probably have to be updated upon each event though.
This counter can be matched against current session's source address using
the "src_conn_cur" ACL.
The "_cnt" suffix is already used by ACLs to count various data,
so it makes sense to use the same one in "conn_cnt" instead of
"conn_cum" to count cumulated connections.
This is not a problem because no version was emitted with those
keywords.
Thus we'll try to stick to the following rules :
xxxx_cnt : cumulated event count for criterion xxxx
xxxx_cur : current number of concurrent entries for criterion xxxx
xxxx_rate: event rate for criterion xxxx
This patch adds the ability to set a pointer in the session to an
entry in a stick table which holds various counters related to a
specific pattern.
Right now the syntax matches the target syntax and only the "src"
pattern can be specified, to track counters related to the session's
IPv4 source address. There is a special function to extract it and
convert it to a key. But the goal is to be able to later support as
many patterns as for the stick rules, and get rid of the specific
function.
The "track-counters" directive may only be set in a "tcp-request"
statement right now. Only the first one applies. Later we'll probably
support multi-criteria tracking for a single session, and we'll then
have to name tracking pointers.
No counter is updated right now, only the refcount is. Some subsequent
patches will have to bring that feature.
The buffer_feed* functions that are used to send data to buffers only
supported sending contiguous chunks since they rely on memcpy(). This
patch improves on this by making them able to write in two chunks if needed.
Thus, the buffer_almost_full() function has been improved to really consider
the remaining space and not just what can be written at once.
Sometimes it's necessary to be able to perform some "layer 6" analysis
in the backend. TCP request rules were not available till now, although
documented in the diagram. Enable them in backend now.
Some config parsing functions need to return composite status codes
when they rely on other functions. Let's provide a few such codes
for general use and extend them later.
Some freq counters will have to work on periods different from 1 second.
The original freq counters rely on the period to be exactly one second.
The new ones (freq_ctr_period) let the user define the period in ticks,
and all computations are operated over that period. When reading a value,
it indicates the amount of events over that period too.
We'll need to divide 64 bits by 32 bits with new frequency counters.
Gcc does not know when it can safely do that, but the way we build
our operations lets us be sure. So let's provide an optimised version
for that purpose.
This member will be used later when frontends are created on the
fly by some tasks. It will also be usable later if we need to
support multiple config instances for example.
When a connection is closed on a stream interface, some iohandlers
will need to be informed in order to release some resources. This
normally happens upon a shutr+shutw. It is the equivalent of the
fd_delete() call which is done for real sockets, except that this
time we release internal resources.
It can also be used with real sockets because it does not cost
anything else and might one day be useful.
The quote_arg() function can be used to quote an argument or indicate
"end of line" if it's null or empty. It should be useful to more precisely
report location of problems in the configuration.
When an entry already exists, we just need to update its expiration
timer. Let's have a dedicated function for that instead of spreading
open code everywhere.
This change also ensures that an update of an existing sticky session
really leads to an update of its expiration timer, which was apparently
not the case till now. This point needs to be checked in 1.4.
Till now sticky sessions only held server IDs. Now there are other
data types so it is not acceptable anymore to overwrite the server ID
when writing something. The server ID must then only be written from
the caller when appropriate. Doing this has also led to separate
lookup and storage.
This one can be parsed on the "stick-table" line after the "store"
keyword. It will hold the number of connections matching the entry,
for use with ACLs or anything else.
The stick_tables will now be able to store extra data for a same key.
A limited set of extra data types will be defined and for each of them
an offset in the sticky session will be assigned at startup time. All
of this information will be stored in the stick table.
The extra data types will have to be specified after the new "store"
keyword of the "stick-table" directive, which will reserve some space
for them.
pattern.c depended on stick_table while in fact it should be the opposite.
So we move from pattern.c everything related to stick_tables and invert the
dependency. That way the code becomes more logical and intuitive.
The name 'exps' and 'keys' in struct stksess was confusing because it was
the same name as in the table which holds all of them, while they only hold
one node each. Remove the trailing 's' to more clearly identify who's who.
Right now we're only able to store a server ID in a sticky session.
The goal is to be able to store anything whose size is known at startup
time. For this, we store the extra data before the stksess pointer,
using a negative offset. It will then be easy to cumulate multiple
data provided they each have their own offset.
It's very disturbing to see the "denied req" counter increase without
any other session counter moving. In fact, we can't count a rejected
TCP connection as "denied req" as we have not yet instantiated any
session at all. Let's use a new counter for that.
Now we're able to reject connections very early, so we need to use a
different counter for the connections that are received and the ones
that are accepted and converted into sessions, so that the rate limits
can still apply to the accepted ones. The session rate must still be
used to compute the rate limit, so that we can reject undesired traffic
without affecting the rate.
Analysers don't care (and must not care) about a few flags such as
BF_AUTO_CLOSE or BF_AUTO_CONNECT, so those flags should not be listed
in the BF_MASK_STATIC bitmask.
We should also recheck if some buffer flags should be ignored or not
in process_session() when deciding if we must loop again or not.
A new function session_accept() is now called from the lower layer to
instantiate a new session. Once the session is instantiated, the upper
layer's frontend_accept() is called. This one can be service-dependent.
That way, we have a 3-phase accept() sequence :
1) protocol-specific, session-less accept(), which is pointed to by
the listener. It defaults to the generic stream_sock_accept().
2) session_accept() which relies on a frontend but not necessarily
for use in a proxy (eg: stats or any future service).
3) the frontend's accept() which performs the accept for the service
offered by the frontend. It defaults to frontend_accept() which
is really what is used by a proxy.
The TCP/HTTP proxies have been moved to this mode so that we can now rely on
frontend_accept() for any type of session initialization relying on a frontend.
The next step will be to convert the stats to use the same system.
The conn_retries still lies in the session and its initialization depends
on the backend when it may not yet be known. Let's first move it to the
stream interface.
It's not normal to initialize the server-side stream interface from the
accept() function, because it may change later. Thus, we introduce a new
stream_sock_prepare_interface() function which is called just before the
connect() and which sets all of the stream_interface's callbacks to the
default ones used for real sockets. The ->connect function is also set
at the same instant so that we can easily add new server-side protocols
soon.
The connection timeout stored in the buffer has not been used since the
stream interfaces were introduced. Let's get rid of it as it's one of the
things that complicate factoring of the accept() functions.
We can disable the monitor-net rules on a listener if this flag is not
set in the listener's options. This will be useful when we don't want
to check that fe->addr is set or not for non-TCP frontends.
The new LI_O_TCP_RULES listener option indicates that some TCP rules
must be checked upon accept on this listener. It is now checked by
the frontend and the L4 rules are evaluated only in this case. The
flag is only set when at least one tcp-req rule is present in the
frontend.
The L4 rules check function has now been moved to proto_tcp.c where
it ought to be.
For a long time we had two large accept() functions, one for TCP
sockets instantiating proxies, and another one for UNIX sockets
instantiating the stats interface.
A lot of code was duplicated and both did not work exactly the same way.
Now we have a stream_sock layer accept() called for either TCP or UNIX
sockets, and this function calls the frontend-specific accept() function
which does the rest of the frontend-specific initialisation.
Some code is still duplicated (session & task allocation, stream interface
initialization), and might benefit from having an intermediate session-level
accept() callback to perform such initializations. Still there are some
minor differences that need to be addressed first. For instance, the monitor
nets should only be checked for proxies and not for other connection templates.
Last, we renamed l->private as l->frontend. The "private" pointer in
the listener is only used to store a frontend, so let's rename it to
eliminate this ambiguity. When we later support detached listeners
(eg: FTP), we'll add another field to avoid the confusion.
The 'client.c' file now only contained frontend-specific functions,
so it has naturally been renamed 'frontend.c'. Same for client.h. This
has also been an opportunity to remove some cross references from
files that should not have depended on it.
In the end, this file should contain a protocol-agnostic accept()
code, which would initialize a session, task, etc... based on an
accept() from a lower layer. Right now there are still references
to TCP.
Some functions which act on generic buffer contents without being
tcp-specific were historically in proto_tcp.c. This concerns ACLs
and RDP cookies. Those have been moved away to more appropriate
locations. Ideally we should create some new files for each layer6
protocol parser. Let's do that later.
Just like we do on health checks, we should consider that ACLs that make
use of buffer data are layer 6 and not layer 4, because we'll soon have
to distinguish between pure layer 4 ACLs (without any buffer) and these
ones.
This ACL was missing in complex setups where the status of a remote site
has to be considered in switching decisions. Until now, using a server's
status in an ACL required a dedicated backend, which is a bit heavy
when multiple servers have to be monitored.
The code is now ready to support loading patterns from files into trees. For
that, it will be required that the ACL keyword has a flag ACL_MAY_LOOKUP
and that the expr is case sensitive. When that is true, the pattern will
have a flag ACL_PAT_F_TREE_OK to indicate that it is possible to feed the
tree instead of a usual pattern if the parsing function is able to do this.
The tree's root is pre-initialized in the pattern's value so that the
function can easily find it. At that point, if the parsing function decides
to use the tree, it just sets ACL_PAT_F_TREE in the return flags so that
the caller knows the tree has been used and the pattern can be recycled.
That way it will be possible to load some patterns into the tree when it
is compatible, and other ones as linear linked lists. A good example of
this might be IPv4 network entries : right now we support holes in masks,
but this very rare feature is not compatible with binary lookup in trees.
So the parser will be able to decide itself whether the pattern can go to
the tree or not.
If we want to be able to match ACLs against a lot of possible values, we
need to put those values in trees. That will only work for exact matches,
which is normally just what is needed.
Right now, only IPv4 and string matching are planned, but others might come
later.
This is used to disable persistence depending on some conditions (for
example using an ACL matching static files or a specific User-Agent).
You can see it as a complement to "force-persist".
In the configuration file, the force-persist/ignore-persist declaration
order defines the rules' priority.
Used with the "appsession" keyword, it can also help reduce memory usage,
as the session won't be hashed when persistence is ignored.
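A minimal sketch disabling persistence for static files (paths and
addresses are examples) :
backend app
    cookie SERVERID insert indirect
    acl static path_end .png .gif .css .js
    ignore-persist if static
    server s1 10.0.0.1:80 cookie s1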
Some servers do not completely conform with RFC2616 requirements for
keep-alive when they receive a request with "Connection: close". More
specifically, they don't bother using chunked encoding, so the client
never knows whether the response is complete or not. One immediately
visible effect is that haproxy cannot maintain client connections alive.
The second issue is that truncated responses may be cached on clients
in case of network error or timeout.
Óscar Frías Barranco reported this issue on Tomcat 6.0.20, and
Patrik Nilsson with Jetty 6.1.21.
Cyril Bonté proposed this smart idea of pretending we run keep-alive
with the server and closing it at the last moment as is already done
with option forceclose. The advantage is that we only change one
emitted header but not the overall behaviour.
Since some servers such as nginx are able to close the connection
very quickly and save network packets when they're aware of the
close negotiation in advance, we don't enable this behaviour by
default.
"option http-pretend-keepalive" will have to be used for that, in
conjunction with "option http-server-close".
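A minimal sketch of the expected combination :
defaults
    mode http
    option http-server-close
    option http-pretend-keepalive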
Using get_ip_from_hdr2() we can look for occurrence #X or #-X and
extract the IP it contains. This is typically designed for use with
the X-Forwarded-For header.
Using "usesrc hdr_ip(name,occ)", it becomes possible to use the IP address
found in <name>, and possibly specify occurrence number <occ>, as the
source to connect to a server. This is possible both in a server and in
a backend's source statement. This is typically used to use the source
IP previously set by an upstream proxy.
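A minimal sketch using the last X-Forwarded-For occurrence as the source
address (addresses are examples) :
backend app
    source 0.0.0.0 usesrc hdr_ip(X-Forwarded-For,-1)
    server s1 10.0.0.1:80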
The transparent proxy address selection was set in the TCP connect function
which is not the most appropriate place since this function has limited
access to the amount of parameters which could produce a source address.
Instead, now we determine the source address in backend.c:connect_server(),
right after calling assign_server_address() and we assign this address in
the session and pass it to the TCP connect function. This cannot be performed
in assign_server_address() itself because in some cases (transparent mode,
dispatch mode or http_proxy mode), we assign the address somewhere else.
This change will open the ability to bind to addresses extracted from many
other criteria (eg: from a header).
We'll need another flag in the 'options' member close to PR_O_TPXY_*,
and all are used, so let's move this easy one to options2 (which are
already used for SQL checks).
The following patch fixed an issue but brought another one :
296897 [MEDIUM] connect to servers even when the input has already been closed
The new issue is that when a connection is inspected and aborted using
TCP inspect rules, now it is sent to the server before being closed. So
that test is not satisfying. A probably better way is not to prevent a
connection from establishing if only BF_SHUTW_NOW is set but BF_SHUTW
is not. That way, the BF_SHUTW flag is not set if the request has any
data pending, which still fixes the stats issue, but does not let any
empty connection pass through.
Also, as a safety measure, we extend buffer_abort() to automatically
disable the BF_AUTO_CONNECT flag. While it appears to always be OK,
it is by pure luck, so better safe than sorry.
We are seeing both real servers repeatedly going on- and off-line with
a period of tens of seconds. Packet tracing, stracing, and adding
debug code to HAProxy itself has revealed that the real servers are
always responding correctly, but HAProxy is sometimes receiving only
part of the response.
It appears that the real servers are sending the test page as three
separate packets. HAProxy receives the contents of one, two, or three
packets, apparently randomly. Naturally, the health check only
succeeds when all three packets' data are seen by HAProxy. If HAProxy
and the real servers are modified to use a plain HTML page for the
health check, the response is in the form of a single packet and the
checks do not fail.
(...)
I've added buffer and length variables to struct server, and allocated
space with the rest of the server initialisation.
(...)
It seems to be working fine in my tests, and handles check responses
that are bigger than the buffer.
today I've noticed that the stats page still displays v1.3 in the
"Updates" link, due to the PRODUCT_BRANCH value in version.h, then
it's maybe time to send you the result (notice that the patch updates
PRODUCT_BRANCH to "1.4").
--
Cyril Bonté
When trying to spot some complex bugs, it's often needed to access
information on stuck sessions, which is quite difficult. This new
command helps one get detailed information about a session, with
flags, timers, states, etc... The buffer data are not dumped yet.