The column used to report the throttle percentage when a server is in
slowstart is based on the time only. This is wrong, because server weights
in slowstart are updated at most once a second, so the reported value is
wrong at least fo rone second during each step, which means all the time
when using short delays (< 20s).
The second point is that it's disturbing to see a weight < 100% without
any throttle at the end of the period (during the last second), because
the effective weight has not yet been updated.
Instead, we now compute the exact ratio between eweight and uweight and
report it. It's always accurate and describes the value being used instead
of using only the date.
It can be backported to 1.4 though it's not particularly important.
A crash was reported by Igor at owind when changing a server's weight
on the CLI. Lukas Tribus could reproduce a related bug where setting
a server's weight would result in the new weight being multiplied by
the initial one. The two bugs are the same.
The incorrect weight calculation results in the total farm weight being
larger than what was initially allocated, causing the map index to be out
of bounds on some hashes. It's easy to reproduce using "balance url_param"
with a variable param, or with "balance static-rr".
It appears that the calculation is made at many places and is not always
right and not always wrong the same way. Thus, this patch introduces a
new function "server_recalc_eweight()" which is dedicated to this task
of computing ->eweight from many other elements including uweight and
current time (for slowstart), and all users now switch to use this
function.
The patch is a bit large but the code was not trivially fixable in a way
that could guarantee this situation would not occur anymore. The fix is
much more readable and has been verified to work with all algorithms,
with both consistent and map-based hashes, and even with static-rr.
Slowstart was tested as well, just like enable/disable server.
The same bug is very likely present in 1.4 as well, so the patch will
probably need to be backported eventhough it will not apply as-is.
Thanks to Lukas and Igor for the information they provided to reproduce it.
Paramatise the following functions over the check of a server
* set_server_down
* set_server_up
* srv_getinter
* server_status_printf
* set_server_check_status
* set_server_disabled
* set_server_enabled
Generally the server parameter of these functions has been removed.
Where it is still needed it is obtained using check->server.
This is in preparation for associating a agent check
with a server which runs as well as the server's existing check.
By paramatising these functions they may act on each of the checks
without further significant modification.
Explanation of the SSP_O_HCHK portion of this change:
* Prior to this patch SSP_O_HCHK serves a single purpose which
is to tell server_status_printf() weather it should print
the details of the check of a server or not.
With the paramatisation that this patch adds there are two cases.
1) Printing the details of the check in which case a
valid check parameter is needed.
2) Not printing the details of the check in which case
the contents check parameter are unused.
In case 1) we could pass SSP_O_HCHK and a valid check and;
In case 2) we could pass !SSP_O_HCHK and any value for check
including NULL.
If NULL is used for case 2) then SSP_O_HCHK becomes supurfulous
and as NULL is used for case 2) SSP_O_HCHK has been removed.
Signed-off-by: Simon Horman <horms@verge.net.au>
When a process with large stick tables is replaced by a new one and remains
present until the last connection finishes, it keeps these data in memory
for nothing since they will never be used anymore by incoming connections,
except during syncing with the new process. This is especially problematic
when dealing with long session protocols such as WebSocket as it becomes
possible to stack many processes and eat a lot of memory.
So the idea here is to know if a table still needs to be synced or not,
and to purge all unused entries once the sync is complete. This means that
after a few hundred milliseconds when everything has been synchronized with
the new process, only a few entries will remain allocated (only the ones
held by sessions during the restart) and all the remaining memory will be
freed.
Note that we carefully do that only after the grace period is expired so as
not to impact a possible proxy that needs to accept a few more connections
before leaving.
Doing this required to add a sync counter to the stick tables, to know how
many peer sync sessions are still in progress in order not to flush the entries
until all synchronizations are completed.
In preparation of more flexibility in the stick counters, make their
number configurable. It still defaults to 3 which is the minimum
accepted value. Changing the value alone is not sufficient to get
more counters, some bitfields still need to be updated and the TCP
actions need to be updated as well, but this update tries to be
easier, which is nice for experimentation purposes.
This function is also called directly from backend.c, so let's stop
building fake args to call it as a sample fetch, and have a lower
layer more generic function instead.
We're having a lot of duplicate code just because of minor variants between
fetch functions that could be dealt with if the functions had the pointer to
the original keyword, so let's pass it as the last argument. An earlier
version used to pass a pointer to the sample_fetch element, but this is not
the best solution for two reasons :
- fetch functions will solely rely on the keyword string
- some other smp_fetch_* users do not have the pointer to the original
keyword and were forced to pass NULL.
So finally we're passing a pointer to the keyword as a const char *, which
perfectly fits the original purpose.
Remove event_accept() in include/proto/proto_http.h and use correct function
name in other two files instead of event_accept().
Signed-off-by: Godbach <nylzhaowei@gmail.com>
It was a bit inconsistent to have gpc start at 0 and sc start at 1,
so make sc start at zero like gpc. No previous release was issued
with sc3 anyway, so no existing setup should be affected.
Some actions were clearly missing to process response headers. This
patch adds a new "http-response" ruleset which provides the following
actions :
- allow : stop evaluating http-response rules
- deny : stop and reject the response with a 502
- add-header : add a header in log-format mode
- set-header : set a header in log-format mode
Since commit cfd97c6f was merged into 1.5-dev14 (BUG/MEDIUM: checks:
prevent TIME_WAITs from appearing also on timeouts), some valid health
checks sometimes used to show some TCP resets. For example, this HTTP
health check sent to a local server :
19:55:15.742818 IP 127.0.0.1.16568 > 127.0.0.1.8000: S 3355859679:3355859679(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:15.742841 IP 127.0.0.1.8000 > 127.0.0.1.16568: S 1060952566:1060952566(0) ack 3355859680 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:15.742863 IP 127.0.0.1.16568 > 127.0.0.1.8000: . ack 1 win 257
19:55:15.745402 IP 127.0.0.1.16568 > 127.0.0.1.8000: P 1:23(22) ack 1 win 257
19:55:15.745488 IP 127.0.0.1.8000 > 127.0.0.1.16568: FP 1:146(145) ack 23 win 257
19:55:15.747109 IP 127.0.0.1.16568 > 127.0.0.1.8000: R 23:23(0) ack 147 win 257
After some discussion with Chris Huang-Leaver, it appeared clear that
what we want is to only send the RST when we have no other choice, which
means when the server has not closed. So we still keep SYN/SYN-ACK/RST
for pure TCP checks, but don't want to see an RST emitted as above when
the server has already sent the FIN.
The solution against this consists in implementing a "drain" function at
the protocol layer, which, when defined, causes as much as possible of
the input socket buffer to be flushed to make recv() return zero so that
we know that the server's FIN was received and ACKed. On Linux, we can make
use of MSG_TRUNC on TCP sockets, which has the benefit of draining everything
at once without even copying data. On other platforms, we read up to one
buffer of data before the close. If recv() manages to get the final zero,
we don't disable lingering. Same for hard errors. Otherwise we do.
In practice, on HTTP health checks we generally find that the close was
pending and is returned upon first recv() call. The network trace becomes
cleaner :
19:55:23.650621 IP 127.0.0.1.16561 > 127.0.0.1.8000: S 3982804816:3982804816(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:23.650644 IP 127.0.0.1.8000 > 127.0.0.1.16561: S 4082139313:4082139313(0) ack 3982804817 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:23.650666 IP 127.0.0.1.16561 > 127.0.0.1.8000: . ack 1 win 257
19:55:23.651615 IP 127.0.0.1.16561 > 127.0.0.1.8000: P 1:23(22) ack 1 win 257
19:55:23.651696 IP 127.0.0.1.8000 > 127.0.0.1.16561: FP 1:146(145) ack 23 win 257
19:55:23.652628 IP 127.0.0.1.16561 > 127.0.0.1.8000: F 23:23(0) ack 147 win 257
19:55:23.652655 IP 127.0.0.1.8000 > 127.0.0.1.16561: . ack 24 win 257
This change should be backported to 1.4 which is where Chris encountered
this issue. The code is different, so probably the tcp_drain() function
will have to be put in the checks only.
The req.hdr and res.hdr fetch methods do not work well on headers which
are allowed to contain commas, such as User-Agent, Date or Expires.
More specifically, full-length matching is impossible if a comma is
present.
This patch introduces 4 new fetch functions which are designed to work
with these full-length headers :
- req.fhdr, req.fhdr_cnt
- res.fhdr, res.fhdr_cnt
These ones do not stop at commas and permit to return full-length header
values.
Since 1.5-dev12 and commit 3bf1b2b8 (MAJOR: channel: stop relying on
BF_FULL to take action), the HTTP parser switched to channel_full()
instead of BF_FULL to decide whether a buffer had enough room to start
parsing a request or response. The problem is that channel_full()
intentionally ignores outgoing data, so a corner case exists where a
large response might still be left in a response buffer with just a
few bytes left (much less than the reserve), enough to accept a second
response past the last data, but not enough to permit the HTTP processor
to add some headers. Since all the processing relies on this space being
available, we can get some random crashes when clients pipeline requests.
The analysis of a core from haproxy configured with 20480 bytes buffers
shows this : with enough "luck", when sending back the response for the
first request, the client is slow, the TCP window is congested, the socket
buffers are full, and haproxy's buffer fills up. We still have 20230 bytes
of response data in a 20480 response buffer. The second request is sent to
the server which returns 214 bytes which fit in the small 250 bytes left
in this buffer. And the buffer arrangement makes it possible to escape all
the controls in http_wait_for_response() :
|<------ response buffer = 20480 bytes ------>|
[ 2/2 | 3 | 4 | 1/2 ]
^ start of circular buffer
1/2 = beginning of previous response (18240)
2/2 = end of previous response (1990)
3 = current response (214)
4 = free space (36)
- channel_full() returns false (20230 bytes are going to leave)
- the response headers does not wrap at the end of the buffer
- the remaining linear room after the headers is larger than the
reserve, because it's the previous response which wraps :
=> response is processed
Header rewriting causes it to reach 260 bytes, 10 bytes larger than what
the buffer could hold. So all computations during header addition are
wrong and lead to the corruption we've observed.
All the conditions are very hard to meet (which explains why it took
almost one year for this bug to show up) and are almost impossible to
reproduce on purpose on a test platform. But the bug is clearly there.
This issue was reported by Dinko Korunic who kindly devoted a lot of
time to provide countless traces and cores, and to experiment with
troubleshooting patches to knock the bug down. Thanks Dinko!
No backport is needed, but all 1.5-dev versions between dev12 and dev18
included must be upgraded. A workaround consists in setting option
forceclose to prevent pipelined requests from being processed.
By properly affecting the flags and values, it becomes easier to add
more tracked counters, for example for experimentation. It also slightly
reduces the code and the number of tests. No counters were added with
this patch.
This patch adds a "scope" box in the statistics page in order to
display only proxies with a name that contains the requested value.
The scope filter is preserved across all clicks on the page.
While ACL args were resolved after all the config was parsed, it was not the
case with sample fetch args because they're almost everywhere now.
The issue is that ACLs now solely rely on sample fetches, so their args
resolving doesn't work anymore. And many fetches involving a server, a
proxy or a userlist don't work at all.
The real issue is that at the bottom layers we have no information about
proxies, line numbers, even ACLs in order to report understandable errors,
and that at the top layers we have no visibility over the locations where
fetches are referenced (think log node).
After failing multiple unsatisfying solutions attempts, we now have a new
concept of args list. The principle is that every proxy has a list head
which contains a number of indications such as the config keyword, the
context where it's used, the file and line number, etc... and a list of
arguments. This list head is of the same type as the elements, so it
serves as a template for adding new elements. This way, it is filled from
top to bottom by the callers with the information they have (eg: line
numbers, ACL name, ...) and the lower layers just have to duplicate it and
add an element when they face an argument they cannot resolve yet.
Then at the end of the configuration parsing, a loop passes over each
proxy's list and resolves all the args in sequence. And this way there is
all necessary information to report verbose errors.
The first immediate benefit is that for the first time we got very precise
location of issues (arg number in a keyword in its context, ...). Second,
in order to do this we had to parse log-format and unique-id-format a bit
earlier, so that was a great opportunity for doing so when the directives
are encountered (unless it's a default section). This way, the recorded
line numbers for these args are the ones of the place where the log format
is declared, not the end of the file.
Userlists report slightly more information now. They're the only remaining
ones in the ACL resolving function.
The acl_expr struct used to hold a pointer to the ACL keyword. But since
we now have all the relevant pointers, we don't need that anymore, we just
need the pointer to the keyword as a string in order to return warnings
and error messages.
So let's change this in order to remove the dependency on the acl_keyword
struct from acl_expr.
During this change, acl_cond_kw_conflicts() used to return a pointer to an
ACL keyword but had to be changed to return a const char* for the same reason.
The ACLs now use the fetch's ->use and ->val to decide upon compatibility
between the place where they are used and where the information are fetched.
The code is capable of reporting warnings about very fine incompatibilities
between certain fetches and an exact usage location, so it is expected that
some new warnings will be emitted on some existing configurations.
Two degrees of detection are provided :
- detecting ACLs that never match
- detecting keywords that are ignored
All tests show that this seems to work well, though bugs are still possible.
Proxy's acl_requires was a copy of all bits taken from ACLs, but we'll
get rid of ACL flags and only rely on sample fetches soon. The proxy's
acl_requires was only used to allocate an HTTP context when needed, and
was even forced in HTTP mode. So better have a flag which exactly says
what it's supposed to be used for.
ACL fetch functions used to directly reference a fetch function. Now
that all ACL fetches have their sample fetches equivalent, we can make
ACLs reference a sample fetch keyword instead.
In order to simplify the code, a sample keyword name may be NULL if it
is the same as the ACL's, which is the most common case.
A minor change appeared, http_auth always expects one argument though
the ACL allowed it to be missing and reported as such afterwards, so
fix the ACL to match this. This is not really a bug.
The file acl.c is a real mess, it both contains functions to parse and
process ACLs, and some sample extraction functions which act on buffers.
Some other payload analysers were arbitrarily dispatched to proto_tcp.c.
So now we're moving all payload-based fetches and ACLs to payload.c
which is capable of extracting data from buffers and rely on everything
that is protocol-independant. That way we can safely inflate this file
and only use the other ones when some fetches are really specific (eg:
HTTP, SSL, ...).
As a result of this cleanup, the following new sample fetches became
available even if they're not really useful :
always_false, always_true, rep_ssl_hello_type, rdp_cookie_cnt,
req_len, req_ssl_hello_type, req_ssl_sni, req_ssl_ver, wait_end
The function 'acl_fetch_nothing' was wrong and never used anywhere so it
was removed.
The "rdp_cookie" sample fetch used to have a mandatory argument while it
was optional in ACLs, which are supposed to iterate over RDP cookies. So
we're making it optional as a fetch too, and it will return the first one.
If a log-format involves some sample fetches that may not be present at
the logging instant, we can now report a warning.
Note that this is done both for log-format and for add-header and carefully
respects the original fetch keyword's capabilities.
Samples fetches were relying on two flags SMP_CAP_REQ/SMP_CAP_RES to describe
whether they were compatible with requests rules or with response rules. This
was never reliable because we need a finer granularity (eg: an HTTP request
method needs to parse an HTTP request, and is available past this point).
Some fetches are also dependant on the context (eg: "hdr" uses request or
response depending where it's involved, causing some abiguity).
In order to solve this, we need to precisely indicate in fetches what they
use, and their users will have to compare with what they have.
So now we have a bunch of bits indicating where the sample is fetched in the
processing chain, with a few variants indicating for some of them if it is
permanent or volatile (eg: an HTTP status is stored into the transaction so
it is permanent, despite being caught in the response contents).
The fetches also have a second mask indicating their validity domain. This one
is computed from a conversion table at registration time, so there is no need
for doing it by hand. This validity domain consists in a bitmask with one bit
set for each usage point in the processing chain. Some provisions were made
for upcoming controls such as connection-based TCP rules which apply on top of
the connection layer but before instantiating the session.
Then everywhere a fetch is used, the bit for the control point is checked in
the fetch's validity domain, and it becomes possible to finely ensure that a
fetch will work or not.
Note that we need these two separate bitfields because some fetches are usable
both in request and response (eg: "hdr", "payload"). So the keyword will have
a "use" field made of a combination of several SMP_USE_* values, which will be
converted into a wider list of SMP_VAL_* flags.
The knowledge of permanent vs dynamic information has disappeared for now, as
it was never used. Later we'll probably reintroduce it differently when
dealing with variables. Its only use at the moment could have been to avoid
caching a dynamic rate measurement, but nothing is cached as of now.
Break out set weight processing code.
This is in preparation for reusing the code.
Also, remove duplicate check in nested if clauses.
{px->lbprm.algo & BE_LB_PROP_DYN) is checked by
the immediate outer if clause, so there is no need
to check it a second time.
Signed-off-by: Simon Horman <horms@verge.net.au>
Commit 2b0108ad accidently got rid of the ability to emit a "-" for
empty log fields. This can happen for captured request and response
cookies, as well as for fetches. Since we don't want to have this done
for headers however, we set the default log method when parsing the
format. It is still possible to force the desired mode using +M/-M.
We now have http_apply_redirect_rule() which does all the redirect-specific
job instead of having this inside http_process_req_common().
Also one of the benefit gained from uniformizing this code is that both
keep-alive and close response do emit the PR-- flags. The fix for the
flags could probably be backported to 1.4 though it's very minor.
The previous function http_perform_redirect() was becoming confusing
so it was renamed http_perform_server_redirect() since it only applies
to server-based redirection.
At the moment, we need trash chunks almost everywhere and the only
correctly implemented one is in the sample code. Let's move this to
the chunks so that all other places can use this allocator.
Additionally, the get_trash_chunk() function now really returns two
different chunks. Previously it used to always overwrite the same
chunk and point it to a different buffer, which was a bit tricky
because it's not obvious that two consecutive results do alias each
other.
The dumpstats code looks like a spaghetti plate. Several functions are
supposed to be able to do several things but rely on complex states to
dispatch the work to independant functions. Most of the HTML output is
performed within the switch/case statements of the whole state machine.
Let's clean this up by adding new functions to emit the data and have
a few more iterators to avoid relying on so complex states.
The new stats dump sequence looks like this for CLI and for HTTP :
cli_io_handler()
-> stats_dump_sess_to_buffer() // "show sess"
-> stats_dump_errors_to_buffer() // "show errors"
-> stats_dump_raw_info_to_buffer() // "show info"
-> stats_dump_raw_info()
-> stats_dump_raw_stat_to_buffer() // "show stat"
-> stats_dump_csv_header()
-> stats_dump_proxy()
-> stats_dump_px_hdr()
-> stats_dump_fe_stats()
-> stats_dump_li_stats()
-> stats_dump_sv_stats()
-> stats_dump_be_stats()
-> stats_dump_px_end()
http_stats_io_handler()
-> stats_http_redir()
-> stats_dump_http() // also emits the HTTP headers
-> stats_dump_html_head() // emits the HTML headers
-> stats_dump_csv_header() // emits the CSV headers (same as above)
-> stats_dump_http_info() // note: ignores non-HTML output
-> stats_dump_proxy() // same as above
-> stats_dump_http_end() // emits HTML trailer
The log-format parser reached a limit making it hard to add new features.
It also suffers from a weak handling of certain incorrect corner cases,
for example "%{foo}" is emitted as a litteral while syntactically it's an
argument to no variable. Also the argument parser had to redo some of the
job with some cases causing minor memory leaks (eg: ignored args).
This work aims at improving the situation so that slightly better reporting
is possible and that it becomes possible to extend the log format. The code
has a few more states but looks significantly simpler. The parser is now
capable of reporting ignored arguments and truncated lines.
stream_int_chk_rcv_conn() did not clear connection flags before updating them. It
is unsure whether this could have caused the stalled transfers that have been
reported since dev15.
In order to avoid such further issues, we now use a simple inline function to do
all the job.
Looking at the assembly code that updt_fd() and alloc/release_spec_entry
produce in the polling loops, it's clear that gcc has to recompute pointers
several times in a row because of limited spare registers. By better
grouping adjacent structure updates, we improve the code size by around
60 bytes in the fast path on x86.
The stick counters were in two distinct sets of struct members,
causing some code to be duplicated. Now we use an array, which
enables some processing to be performed in loops. This allowed
the code to be shrunk by 700 bytes.
Until now it was only possible to use track-sc1/sc2 with "src" which
is the IPv4 source address. Now we can use track-sc1/sc2 with any fetch
as well as any transformation type. It works just like the "stick"
directive.
Samples are automatically converted to the correct types for the table.
Only "tcp-request content" rules may use L7 information, and such information
must already be present when the tracking is set up. For example it becomes
possible to track the IP address passed in the X-Forwarded-For header.
HTTP request processing now also considers tracking from backend rules
because we want to be able to update the counters even when the request
was already parsed and tracked.
Some more controls need to be performed (eg: samples do not distinguish
between L4 and L6).
Sessions using client certs are huge (more than 1 kB) and do not fit
in session cache, or require a huge cache.
In this new implementation sshcachesize set a number of available blocks
instead a number of available sessions.
Each block is large enough (128 bytes) to store a simple session (without
client certs).
Huge sessions will take multiple blocks depending on client certificate size.
Note: some unused code for session sync with remote peers was temporarily
removed.
When the PROXY protocol header is expected and fails, leading to an
abort of the incoming connection, we now emit a log message. If option
dontlognull is set and it was just a port probe, then nothing is logged.
Since the introduction of SSL, it became quite annoying not to get any useful
info in logs about handshake failures. Let's improve reporting for embryonic
sessions by checking a per-connection error code and reporting it into the logs
if an error happens before the session is completely instanciated.
The "dontlognull" option is supported in that if a connection does not talk
before being aborted, nothing will be emitted.
At the moment, only timeouts are considered for SSL and the PROXY protocol,
but next patches will handle more errors.