The new tune.idletimer value allows one to set a different value for
idle stream detection. The default value remains set to one second.
It is possible to disable it using zero, and to change the default
value at build time using DEFAULT_IDLE_TIMER.
Disabling the streamer flags after an idle period will help TCP proxies
to better adapt to the streams they're forwarding, especially with SSL
where this will allow the SSL sender to use smaller records. This is
typically used to optimally relay HTTP and derivatives such as SPDY or
HTTP/2 in pure TCP mode when haproxy is used as an SSL offloader.
This idea was first proposed by Ilya Grigorik on the haproxy mailing
list, and his tests seem to confirm the improvement :
https://www.mail-archive.com/haproxy@formilux.org/msg12576.html
tcp-check must not reinitialize the SSL stack upon each check!
It's done once after the config parsing and leaks memory and eats
performance when done upon every check.
This bug was introduced in 1.5-dev22, no backport is needed.
It happens that latest change broke some monitoring tools which expect the
field to be found at the same position as indicated in the doc. Let's move
it to the last column instead.
I forgot to remove one human_time() in the CSV output for the backend's
lastsess entry in previous patch, which caused the value to be reported
as "1m18s" for example instead of 78.
Summary:
Track and report last session time on the stats page for each server
in every backend, as well as the backend.
This attempts to address the requirement in the ROADMAP
- add a last activity date for each server (req/resp) that will be
displayed in the stats. It will be useful with soft stop.
The stats page reports this as time elapsed since last session. This
change does not adequately address the requirement for long running
session (websocket, RDP... etc).
By having the stream interface pass the CF_STREAMER flag to the
snd_buf() primitive, we're able to tell the send layer whether
we're sending large chunks or small ones.
We use this information in SSL to adjust the max record dynamically.
This results in small chunks respecting tune.ssl.maxrecord at the
beginning of a transfer or for small transfers, with an automatic
switch to full records if the exchanges last long. This allows the
receiver to parse HTML contents on the fly without having to retrieve
16kB of data, which is even more important with small initcwnd since
the receiver does not need to wait for round trips to start fetching
new objects. However, sending large files still produces large chunks.
For example, with tune.ssl.maxrecord = 2859, we see 5 write(2885)
sent in two segments each and 6 write(16421).
This idea was first proposed on the haproxy mailing list by Ilya Grigorik.
This prevents us from passing other useful info and requires the
upper levels to know these flags. Let's use a new flags category
instead : CO_SFL_*. For now, only MSG_MORE has been remapped.
When no check type is configured (so the basic connection check), we
want the connection success to be immediately reported. Unfortunately,
it did not happen because in this case the connection is not registered
for read nor for write, and the wake_srv() callback does not handle this
case where no data transfer was requested. However, having option tcp-check
hides this problem because the check type follows a different setup mode,
by having check->type != 0 and the connection believing it must try to
send data.
The effect was that without any option, checks would succeed only at the
end of the check interval. So let's just add the wake-up condition.
This bug appeared with the recent polling changes, no backport is needed.
As a workaround, using "option tcp-check" fixes the problem.
Useless strncpy were done in those two sample fetches, the
"struct chunk" allows us to dump the specified len.
The encode_string() in capture.req.uri was judged inappropriate and was
deleted.
The return type was fixed to SMP_T_CSTR.
A typo made first step of a tcpcheck to be a connect step. This patch
prevents this behavior. The bug was introduced in 1.5-dev22 with
"tcp-check connect" and only affects these directives. No backport is
needed.
Add 2 sample fetchs allowing to extract the method and the uri of an
HTTP request.
FIXME: the sample fetches parser can't add the LW_REQ requirement, at
the moment this flag is used automatically when you use sample fetches.
Note: also fixed the alphabetical order of other capture.req.* keywords
in the doc.
Released version 1.5-dev22 with the following main changes :
- MEDIUM: tcp-check new feature: connect
- MEDIUM: ssl: Set verify 'required' as global default for servers side.
- MINOR: ssl: handshake optim for long certificate chains.
- BUG/MINOR: pattern: pattern comparison executed twice
- BUG/MEDIUM: map: segmentation fault with the stats's socket command "set map ..."
- BUG/MEDIUM: pattern: Segfault in binary parser
- MINOR: pattern: move functions for grouping pat_match_* and pat_parse_* and add documentation.
- MINOR: standard: The parse_binary() returns the length consumed and his documentation is updated
- BUG/MINOR: payload: the patterns of the acl "req.ssl_ver" are no parsed with the good function.
- BUG/MEDIUM: pattern: "pat_parse_dotted_ver()" set bad expect_type.
- BUG/MINOR: sample: The c_str2int converter does not fail if the entry is not an integer
- BUG/MEDIUM: http/auth: Sometimes the authentication credentials can be mix between two requests
- MINOR: doc: Bad cli function name.
- MINOR: http: smp_fetch_capture_header_* fetch captured headers
- BUILD: last release inadvertently prepended a "+" in front of the date
- BUG/MEDIUM: stream-int: fix the keep-alive idle connection handler
- BUG/MEDIUM: backend: do not re-initialize the connection's context upon reuse
- BUG: Revert "OPTIM/MEDIUM: epoll: fuse active events into polled ones during polling changes"
- BUG/MINOR: checks: successful check completion must not re-enable MAINT servers
- MINOR: http: try to stick to same server after status 401/407
- BUG/MINOR: http: always disable compression on HTTP/1.0
- OPTIM: poll: restore polling after a poll/stop/want sequence
- OPTIM: http: don't stop polling for read on the client side after a request
- BUG/MEDIUM: checks: unchecked servers could not be enabled anymore
- BUG/MEDIUM: stats: the web interface must check the tracked servers before enabling
- BUG/MINOR: channel: CHN_INFINITE_FORWARD must be unsigned
- BUG/MINOR: stream-int: do not clear the owner upon unregister
- MEDIUM: stats: add support for HTTP keep-alive on the stats page
- BUG/MEDIUM: stats: fix HTTP/1.0 breakage introduced in previous patch
- Revert "MEDIUM: stats: add support for HTTP keep-alive on the stats page"
- MAJOR: channel: add a new flag CF_WAKE_WRITE to notify the task of writes
- OPTIM: session: set the READ_DONTWAIT flag when connecting
- BUG/MINOR: http: don't clear the SI_FL_DONT_WAKE flag between requests
- MINOR: session: factor out the connect time measurement
- MEDIUM: session: prepare to support earlier transitions to the established state
- MEDIUM: stream-int: make si_connect() return an established state when possible
- MINOR: checks: use an inline function for health_adjust()
- OPTIM: session: put unlikely() around the freewheeling code
- MEDIUM: config: report a warning when multiple servers have the same name
- BUG: Revert "OPTIM: poll: restore polling after a poll/stop/want sequence"
- BUILD/MINOR: listener: remove a glibc warning on accept4()
- BUG/MAJOR: connection: fix mismatch between rcv_buf's API and usage
- BUILD: listener: fix recent accept4() again
- BUG/MAJOR: ssl: fix breakage caused by recent fix abf08d9
- BUG/MEDIUM: polling: ensure we update FD status when there's no more activity
- MEDIUM: listener: fix polling management in the accept loop
- MINOR: protocol: improve the proto->drain() API
- MINOR: connection: add a new conn_drain() function
- MEDIUM: tcp: report in tcp_drain() that lingering is already disabled on close
- MEDIUM: connection: update callers of ctrl->drain() to use conn_drain()
- MINOR: connection: add more error codes to report connection errors
- MEDIUM: tcp: report connection error at the connection level
- MEDIUM: checks: make use of chk_report_conn_err() for connection errors
- BUG/MEDIUM: unique_id: HTTP request counter is not stable
- DOC: fix misleading information about SIGQUIT
- BUG/MAJOR: fix freezes during compression
- BUG/MEDIUM: stream-interface: don't wake the task up before end of transfer
- BUILD: fix VERDATE exclusion regex
- CLEANUP: polling: rename "spec_e" to "state"
- DOC: add a diagram showing polling state transitions
- REORG: polling: rename "spec_e" to "state" and "spec_p" to "cache"
- REORG: polling: rename "fd_spec" to "fd_cache"
- REORG: polling: rename the cache allocation functions
- REORG: polling: rename "fd_process_spec_events()" to "fd_process_cached_events()"
- MAJOR: polling: rework the whole polling system
- MAJOR: connection: remove the CO_FL_WAIT_{RD,WR} flags
- MEDIUM: connection: remove conn_{data,sock}_poll_{recv,send}
- MEDIUM: connection: add check for readiness in I/O handlers
- MEDIUM: stream-interface: the polling flags must always be updated in chk_snd_conn
- MINOR: stream-interface: no need to call fd_stop_both() on error
- MEDIUM: connection: no need to recheck FD state
- CLEANUP: connection: use conn_ctrl_ready() instead of checking the flag
- CLEANUP: connection: use conn_xprt_ready() instead of checking the flag
- CLEANUP: connection: fix comments in connection.h to reflect new behaviour.
- OPTIM: raw-sock: don't speculate after a short read if polling is enabled
- MEDIUM: polling: centralize polled events processing
- MINOR: polling: create function fd_compute_new_polled_status()
- MINOR: cli: add more information to the "show info" output
- MEDIUM: listener: add support for limiting the session rate in addition to the connection rate
- MEDIUM: listener: apply a limit on the session rate submitted to SSL
- REORG: stats: move the stats socket states to dumpstats.c
- MINOR: cli: add the new "show pools" command
- BUG/MEDIUM: counters: flush content counters after each request
- BUG/MEDIUM: counters: fix stick-table entry leak when using track-sc2 in connection
- MINOR: tools: add very basic support for composite pointers
- MEDIUM: counters: stop relying on session flags at all
- BUG/MINOR: cli: fix missing break in command line parser
- BUG/MINOR: config: correctly report when log-format headers require HTTP mode
- MAJOR: http: update connection mode configuration
- MEDIUM: http: make keep-alive + httpclose be passive mode
- MAJOR: http: switch to keep-alive mode by default
- BUG/MEDIUM: http: fix regression caused by recent switch to keep-alive by default
- BUG/MEDIUM: listener: improve detection of non-working accept4()
- BUILD: listener: add fcntl.h and unistd.h
- BUG/MINOR: raw_sock: correctly set the MSG_MORE flag
A new tcp-check rule type: connect.
It allows HAProxy to test applications which stand on multiple ports or
multiple applications load-balanced through the same backend.
Due to a typo, the MSG_MORE flag used to replace MSG_NOSIGNAL and
MSG_DONTWAIT. Fortunately, sockets are always marked non-blocking,
so the loss of MSG_DONTWAIT is harmless, and the NOSIGNAL is covered
by the interception of the SIGPIPE. So no issue could have been
caused by this bug.
On ARM, glibc does not implement accept4() and simply returns ENOSYS
which was not caught as a reason to fall back to accept(), resulting
in a spinning process since poll() would call again.
Let's change the error detection mechanism to save the broken status
of the syscall into a local variable that is used to fall back to the
legacy accept().
In addition to this, since the code was becoming a bit messy, the
accept4() was removed, so now the fallback code and the legacy code
are the same. This will also increase bug report accuracy if needed.
This is 1.5-specific, no backport is needed.
Yesterday's commit 70dffda ("MAJOR: http: switch to keep-alive mode by default")
broke HTTP/1.0 handling without keep-alive when keep-alive is enabled both in
the frontend and in the backend.
Before this patch, it used to work because tunnel mode was the default one,
so if no mode was present in the frontend and a mode was set in the backend,
the backend was the first one to parse the header. This is what the original
patch tried to do with keep-alive by default, causing the version and the
connection header to be ignored if both the frontend and the backend were
running in keep-alive mode.
The fix consists in always parsing the header in non-tunnel mode, and
processing the rest of the logic in at least once, and again if the
backend works in a different mode than the frontend.
This is 1.5-specific, no backport is needed.
The authentication function "get_http_auth()" extract credentials from
the request and keep it this values in shared cache. This function set
a flag in the session indicating that the authentication is already
parsed and the value stored in the cache are avalaible. If this flag is
set the authorization header is not re-parsed and the shared cache is
used.
If two request are simultaneous processsed, the first one check the
credentials. After this, the second request check also it's credentials
and change the data stored in the shared cache. When the first request
re-check credentials (for many reasons), they are changed. The change
can introduce a segfault.
This patch deactivate the cache upon success. When we need
authentication information from one request, they are re-parsed and
re-decoded. However, a failure to retrieve credentials is still
cached to avoid useless lookups.
This fix needs to be backported to 1.4 as well.
Since we support HTTP keep-alive, there is no more reason for staying
in tunnel mode by default. It is confusing for new users and creates
more issues than it solves. Option "http-tunnel" is available to force
to use it if really desired.
Switching to KA by default has implied to change the value of some
option flags and some transaction flags so that value zero (default)
matches keep-alive. That explains why more code has been changed than
expected. Tests have been run on the 25 combinations of frontend and
backend options, plus a few with option http-pretend-keepalive, and
no anomaly was found.
The relation between frontend and backends remains the same. Options
have been updated to take precedence over http-keep-alive which is now
implicit.
All references in the doc to haproxy not supporting keep-alive have
been fixed, and the doc for config options has been updated.
There's no particular reason for having keep-alive + httpclose combine
into forceclose when set in different frontend/backend sections, since
keep-alive does not close anything by default. Let's have this still
combination remain httpclose only.
At the very beginning of haproxy, there was "option httpclose" to make
haproxy add a "Connection: close" header in both directions to invite
both sides to agree on closing the connection. It did not work with some
rare products, so "option forceclose" was added to do the same and actively
close the connection. Then client-side keep-alive was supported, so option
http-server-close was introduced. Now we have keep-alive with a fourth
option, not to mention the implicit tunnel mode.
The connection configuration has become a total mess because all the
options above may be combined together, despite almost everyone thinking
they cancel each other, as judging from the common problem reports on the
mailing list. Unfortunately, re-reading the doc shows that it's not clear
at all that options may be combined, and the opposite seems more obvious
since they're compared. The most common issue is options being set in the
defaults section that are not negated in other sections, but are just
combined when the user expects them to be overloaded. The migration to
keep-alive by default will only make things worse.
So let's start to address the first problem. A transaction can only work in
5 modes today :
- tunnel : haproxy doesn't bother with what follows the first req/resp
- passive close : option http-close
- forced close : option forceclose
- server close : option http-server-close with keep-alive on the client side
- keep-alive : option http-keep-alive, end to end
All 16 combination for each section fall into one of these cases. Same for
the 256 combinations resulting from frontend+backend different modes.
With this patch, we're doing something slightly different, which will not
change anything for users with valid configs, and will only change the
behaviour for users with unsafe configs. The principle is that these options
may not combined anymore, and that the latest one always overrides all the
other ones, including those inherited from the defaults section. The "no
option xxx" statement is still supported to cancel one option and fall back
to the default one. It is mainly needed to ignore defaults sections (eg:
force the tunnel mode). The frontend+backend combinations have not changed.
So for examplen the following configuration used to put the connection
into forceclose :
defaults http
mode http
option httpclose
frontend foo.
option http-server-close
=> http-server-close+httpclose = forceclose before this patch! Now
the frontend's config replaces the defaults config and results in
the more expected http-server-close.
All 25 combinations of the 5 modes in (frontend,backend) have been
successfully tested.
In order to prepare for upcoming changes, a new "option http-tunnel" was
added. It currently only voids all other options, and has the lowest
precedence when mixed with another option in another frontend/backend.
If no CA file specified on a server line, the config parser will show an error.
Adds an cmdline option '-dV' to re-set verify 'none' as global default on
servers side (previous behavior).
Also adds 'ssl-server-verify' global statement to set global default to
'none' or 'required'.
WARNING: this changes the default verify mode from "none" to "required" on
the server side, and it *will* break insecure setups.
When using some log-format directives in header insertion without HTTP mode,
the config parser used to report a cryptic message about option httplog being
downgraded to tcplog and with "(null):0" as the file name and line number.
This is because the lfs_file and lfs_line were not properly set for some valid
use cases of log-format directives. Now we cover http-request and http-response
as well.
Yesterday's commit 12833bb ("MINOR: cli: add the new "show pools" command")
missed a "break" statement causing trouble to the "show map" command. Spotted
by Thierry Fournier.
Till now, we had one flag per stick counter to indicate if it was
tracked in a backend or in a frontend. We just had to add another
flag per stick-counter to indicate if it relies on contents or just
connection. These flags are quite painful to maintain and tend to
easily conflict with other flags if their number is changed.
The correct solution consists in moving the flags to the stkctr struct
itself, but currently this struct is made of 2 pointers, so adding a
new entry there to store only two bits will cause at least 16 more bytes
to be eaten per counter due to alignment issues, and we definitely don't
want to waste tens to hundreds of bytes per session just for things that
most users don't use.
Since we only need to store two bits per counter, an intermediate
solution consists in replacing the entry pointer with a composite
value made of the original entry pointer and the two flags in the
2 unused lower bits. If later a need for other flags arises, we'll
have to store them in the struct.
A few inline functions have been added to abstract the retrieval
and assignment of the pointers and flags, resulting in very few
changes. That way there is no more dependence on the number of
stick-counters and their position in the session flags.
In 1.5-dev19, commit e25c917 ("MEDIUM: counters: add support for tracking
a third counter") introduced the third track counter. However, there was
a hard-coded test in the accept() error path to release only sc0 and sc1.
So it seems that if tracking sc2 at the connection level and deciding to
reject once the track-sc2 has been done, there could be some leaking of
stick-table entries which remain marked used forever, thus which can never
be purged nor expired. There's no memory leak though, it's just that
entries are unexpirable forever.
The simple solution consists in removing the test and always calling
the inline function which iterates over all entries.
One year ago, commit 5d5b5d8 ("MEDIUM: proto_tcp: add support for tracking
L7 information") brought support for tracking L7 information in tcp-request
content rules. Two years earlier, commit 0a4838c ("[MEDIUM] session-counters:
correctly unbind the counters tracked by the backend") used to flush the
backend counters after processing a request.
While that earliest patch was correct at the time, it became wrong after
the second patch was merged. The code does what it says, but the concept
is flawed. "TCP request content" rules are evaluated for each HTTP request
over a single connection. So if such a rule in the frontend decides to
track any L7 information or to track L4 information when an L7 condition
matches, then it is applied to all requests over the same connection even
if they don't match. This means that a rule such as :
tcp-request content track-sc0 src if { path /index.html }
will count one request for index.html, and another one for each of the
objects present on this page that are fetched over the same connection
which sent the initial matching request.
Worse, it is possible to make the code do stupid things by using multiple
counters:
tcp-request content track-sc0 src if { path /foo }
tcp-request content track-sc1 src if { path /bar }
Just sending two requests first, one with /foo, one with /bar, shows
twice the number of requests for all subsequent requests. Just because
both of them persist after the end of the request.
So the decision to flush backend-tracked counters was not the correct
one. In practice, what is important is to flush countent-based rules
since they are the ones evaluated for each request.
Doing so requires new flags in the session however, to keep track of
which stick-counter was tracked by what ruleset. A later change might
make this easier to maintain over time.
This bug is 1.5-specific, no backport to stable is needed.
show pools
Dump the status of internal memory pools. This is useful to track memory
usage when suspecting a memory leak for example. It does exactly the same
as the SIGQUIT when running in foreground except that it does not flush
the pools.
Just like the previous commit, we sometimes want to limit the rate of
incoming SSL connections. While it can be done for a frontend, it was
not possible for a whole process, which makes sense when multiple
processes are running on a system to server multiple customers.
The new global "maxsslrate" setting is usable to fix a limit on the
session rate going to the SSL frontends. The limits applies before
the SSL handshake and not after, so that it saves the SSL stack from
expensive key computations that would finally be aborted before being
accounted for.
The same setting may be changed at run time on the CLI using
"set rate-limit ssl-session global".
It's sometimes useful to be able to limit the connection rate on a machine
running many haproxy instances (eg: per customer) but it removes the ability
for that machine to defend itself against a DoS. Thus, better also provide a
limit on the session rate, which does not include the connections rejected by
"tcp-request connection" rules. This permits to have much higher limits on
the connection rate without having to raise the session rate limit to insane
values.
The limit can be changed on the CLI using "set rate-limit sessions global",
or in the global section using "maxsessrate".
In addition to previous outputs, we also emit the cumulated number of
connections, the cumulated number of requests, the maximum allowed
SSL connection concurrency, the current number of SSL connections and
the cumulated number of SSL connections. This will help troubleshoot
systems which experience memory shortage due to SSL.
If the string not start with a number, the converter fails. In other, it
converts a maximum of characters to a number and stop to the first
character that not match a number.
This is a regression introducted by the patches "MINOR: pattern: Each
pattern sets the expected input type" and "MEDIUM: acl: Last patch
change the output type". The expected value is SMP_T_CSTR in place of
SMP_T_UINT.
This bug impact all the acl using the parser "pat_parse_dotted_ver()".
The two acl are "req_ssl_ver()" and "req.ssl_ver()".
This is a recent bug, no backport is needed.
This function is used to compute the new polling state based on
the previous state. All pollers have to do this in their update
loop, so better centralize the logic for it.
Currently, each poll loop handles the polled events the same way,
resulting in a lot of duplicated, complex code. Additionally, epoll
was the only one to handle newly created FDs immediately.
So instead, let's move that code to fd.c in a new function dedicated
to this task : fd_process_polled_events(). All pollers now use this
function.
This is the reimplementation of the "done" action : when we experience
a short read, we're almost certain that we've exhausted the system's
buffers and that we'll meet an EAGAIN if we attempt to read again. If
the FD is not yet polled, the stream interface already takes care of
stopping the speculative read. When the FD is already being polled, we
have two options :
- either we're running from a level-triggered poller, in which case
we'd rather report that we've reached the end so that we don't
speculate over the poller and let it report next time data are
available ;
- or we're running from an edge-triggered poller in which case we
have no choice and have to see the EAGAIN to re-enable events.
At the moment we don't have any edge-triggered poller, so it's desirable
to avoid speculative I/O that we know will fail.
Note that this must not be ported to SSL since SSL hides the real
readiness of the file descriptor.
Thanks to this change, we observe no EAGAIN anymore during keep-alive
transfers, and failed recvfrom() are reduced by half in http-server-close
mode (the client-facing side is always being polled and the second recv
can be avoided). Doing so results in about 5% performance increase in
keep-alive mode. Similarly, we used to have up to about 1.6% of EAGAIN
on accept() (1/maxaccept), and these have completely disappeared under
high loads.
It's easier and safer to rely on conn_xprt_ready() everywhere than to
check the flag itself. It will also simplify adding extra checks later
if needed. Some useless controls for !xprt have been removed, as the
XPRT_READY flag itself guarantees xprt is set.
It's easier and safer to rely on conn_ctrl_ready() everywhere than to
check the flag itself. It will also simplify adding extra checks later
if needed. Some useless controls for !ctrl have been removed, as the
CTRL_READY flag itself guarantees ctrl is set.
We already have everything in the connection flags using the
CO_FL_DATA_*_ENA bits combined with the fd's ready state, so
we do not need to check fdtab[fd].ev anymore. This considerably
simplifies the connection handling logic since it doesn't
have to mix connection flags with past polling states.
We don't need to call fd_stop_both() since we already call
conn_cond_update_polling() which will do it. This call was introduced by
commit d29a066 ("BUG/MAJOR: connection: always recompute polling status
upon I/O").
We used to only update the polling flags in data phase, but after that
we could update other flags. It does not seem possible to trigger a
bug here but it's not very safe either. Better always keep them up to
date.
The recv/send callbacks must check for readiness themselves instead of
having their callers do it. This will strengthen the test and will also
ensure we never refrain from calling a handshake handler because a
direction is being polled while the other one is ready.
We simply remove these functions and replace their calls with the
appropriate ones :
- if we're in the data phase, we can simply report wait on the FD
- if we're in the socket phase, we may also have to signal the
desire to read/write on the socket because it might not be
active yet.
These flags were used to report the readiness of the file descriptor.
Now this readiness is directly checked at the file descriptor itself.
This removes the need for constantly synchronizing updates between the
file descriptor and the connection and ensures that all layers share
the same level of information.
For now, the readiness is updated in conn_{sock,data}_poll_* by directly
touching the file descriptor. This must move to the lower layers instead
so that these functions can disappear as well. In this state, the change
works but is incomplete. It's sensible enough to avoid making it more
complex.
Now the sock/data updates become much simpler because they just have to
enable/disable access to a file descriptor and not to care anymore about
its readiness.