The copy of the startup logs used to rely on a re-allocated memory area
on the fly, that would attempt to be delivered at once over the CLI. But
if it's too large (too many warnings) it will take time to start up, and
may not even show up on the CLI as it doesn't fit in a buffer.
The ring buffer infrastructure solves all this with no more code, let's
switch to this instead. It simply requires a parsing function to attach
the ring via ring_attach_cli() and all the rest is automatically handled.
Initially this was imagined as a code cleanup, until a test with a config
involving 100k backends and just one occurrence of
"load-server-state-from-file global" in the defaults section took approx
20 minutes to parse due to the O(N^2) cost of concatenating the warnings
resulting in ~1 TB of data to be copied, while it took only 0.57s with
the ring.
Ideally this patch should be backported to 2.0 and 1.9, though it relies
on the ring infrastructure which will then also need to be backported.
Configs able to trigger the bug are uncommon, so another workaround for
older versions without backporting the rings would consist in simply
limiting the size of the error message in print_message() to something
always printable, which will only return the first errors.
ring_attach_cli() is called by the keyword parsing function to dump a
ring to the CLI. It can only work with a specific handler and release
function. Let's make it set them appropriately instead of having the
caller know these functions. This way adding a command to dump a ring
is as simple as declaring a parsing function calling ring_attach_cli().
It was set to MAX_SYSLOG_LEN (1K). It is a bit short to print debug
traces. Especially when part of a buffers is dump. Now, the maximum length is
set to BUFSIZE (16K).
This is mandatory to process input one more time to add the EOM in the HTX
message and to set CS_FL_EOI on the conn-stream. Otherwise, in the stream, a
SHUTR will be reported on the corresponding channel without the EOI. It may be
erroneously interpreted as an abort.
This patch must be backported to 2.0 and 1.9.
Errors during the payload or the trailers parsing are reported with the
HTX_FL_PARSING_ERROR flag on the HTX message and not a negative return
value. This change was introduced when the fonctions to convert an H1 message to
HTX one were moved to a dedicated file. But the h1 mux was not fully updated
accordingly.
No backport needed except if the commits about file h1_htx.c are backported.
This adds two extra metrics per server, one for the current number of idle
connections and one for the configured limit :
* haproxy_server_idle_connections_current
* haproxy_server_idle_connections_limit
The following metrics have been renamed without the "_http" part :
* http_queue_time_average_seconds => queue_time_average_seconds
* http_connect_time_average_seconds => connect_time_average_seconds
* http_response_time_average_seconds => response_time_average_seconds
* http_total_time_average_seconds => total_time_average_seconds
These metrics are reported per backend and per server and are not specific to
HTTP sessions.
Now, for the sessions, the maximum times (queue, connect, response, total) are
reported in addition of the averages over the last 1024 connections. These
metrics are reported per backend and per server. Here are the metrics name :
* haproxy_backend_max_queue_time_seconds
* haproxy_backend_max_connect_time_seconds
* haproxy_backend_max_response_time_seconds
* haproxy_backend_max_total_time_seconds
and
* haproxy_server_max_queue_time_seconds
* haproxy_server_max_connect_time_seconds
* haproxy_server_max_response_time_seconds
* haproxy_server_max_total_time_seconds
This patch is related to #272.
Now, for the sessions, the maximum times (queue, connect, response, total) are
reported in addition of the averages over the last 1024 connections. These
values are called qtime_max, ctime_max, rtime_max and ttime_max.
This patch is related to #272.
For backends and servers, some average times for last 1024 connections are
already calculated. For the moment, the averages for the time passed in the
queue, the connect time, the response time (for HTTP session only) and the total
time are calculated. Now, in addition, the maximum time observed for these
values are also stored.
In addition, These new counters are cleared as all other max values with the CLI
command "clear counters".
This patch is related to #272.
This change make the payload filtering uniform between TCP and HTTP
filters. Now, in TCP, like in HTTP, there is only one callback responsible to
forward data. Thus, old callbacks, tcp_data() and tcp_forward_data(), are
replaced by a single callback function, tcp_payload(). This new callback gets
the offset in the payload to (re)start the filtering and the maximum amount of
data it can forward. It is the filter's responsibility to be compatible with HTX
streams. If not, it must not set the flag FLT_CFG_FL_HTX.
Because of this change, nxt and fwd offsets are no longer needed. Thus they are
removed from the filter structure with their update functions,
flt_change_next_size() and flt_change_forward_size(). Moreover, the trace filter
has been updated accordingly.
This patch breaks the compatibility with the old API. Thus it should probably
not be backported. But, AFAIK, there is no TCP filter, thus the breakage is very
limited.
For now, TCP callbacks are incompatible with the HTX streams because they are
designed to manipulate raw buffers. A new callback will probably be added to be
used in both modes, raw and HTX. So, for HTX streams, these callbacks are
ignored. This should not be a real problem because there is no known filters,
expect the trace filter, implementing these callbacks.
This patch must be backported to 2.0 and 1.9.
The rcsid variable is static an unused, causing a build warning. Let's
just add __attribute__((unused)) to shut the warning.
This may be backported to 2.0.
A corner case was opened in the listener_accept() code by commit 3f0d02bbc2
("MAJOR: listener: do not hold the listener lock in listener_accept()"). The
issue is when one listener (or a group of) managed to eat all the proxy's or
all the process's maxconn, and another listener tries to accept a new socket.
This results in the atomic increment to detect the excess connection count
and immediately abort, without pausing the listener, thus the call is
immediately performed again. This doesn't happen when the test is run on a
single listener because this listener got limited when crossing the limit.
But with 2 or more listeners, we don't have this luxury.
The solution consists in limiting the listener as soon as we have to
decline accepting an incoming connection. This means that the listener
will not be marked full yet if it gets the exact connection count but
this is not a problem in practice since all other listeners will only be
marked full after their first attempt. Thus from now on, a listener is
only full once it has already failed taking an incoming connection.
This bug was definitely responsible for the unreproduceable occasional
reports of high CPU usage showing epoll_wait() returning immediately
without accepting an incoming connection, like in bug #129.
This fix must be backported to 1.9 and 1.8.
The "shutdown sessions" admin-mode command used to open-code the list
traversal while there's already a function for this: srv_shutdown_streams().
Better use it.
The "shutdown session server" command used to open-code the list traversal
while there's already a function for this: srv_shutdown_streams(). Better
use it.
Since previous commit a132e5efa9 ("BUG/MEDIUM: Make sure we leave the
session list in session_free().") it's pointless to delete the conn
element inside "if" blocks given that the second test is always true
as well. Let's simplify this with a single LIST_DEL_INIT() before the
test.
In session_free(), if we're about to destroy a connection that had no mux,
make sure we leave the session_list before calling conn_free(). Otherwise,
conn_free() would call session_unown_conn(), which would potentially free
the associated srv_list, but session_free() also frees it, so that would
lead to a double free, and random memory corruption.
This should be backported to 1.9 and 2.0.
There is a very short race in the queues which happens in the following
situation:
- stream A on thread 1 is being processed by a server
- stream B on thread 2 waits in the backend queue for a server
- stream B on thread 2 is fed up with waiting and expires, calls
stream_free() which calls pendconn_free(), which sees the
stream attached
- at the exact same instant, stream A finishes on thread 1, sees
one stream is waiting (B), detaches it and wakes it up
- stream B continues pendconn_free() and calls pendconn_unlink()
- pendconn_unlink() now detaches the node again and performs a
second deletion (harmless since idempotent), and decrements
srv/px->nbpend again
=> the number of connections on the proxy or server may reach -1 if/when
this race occurs.
It is extremely tight as it can only occur during the test on p->leaf_p
though it has been witnessed at least once. The solution consists in
testing leaf_p again once the lock is held to make sure the element was
not removed in the mean time.
This should be backported to 2.0 and 1.9, probably even 1.8.
In tasklet_remove_from_tasket_list(), we can be called for a tasklet that is
either in the private task list, or in the shared tasklet list. Take that into
account and always use MT_LIST_DEL() to remove it, otherwise if we're in the
shared list and another thread attempts to add a tasklet in it, bad things
will happen.
__tasklet_remove_from_tasklet_list() is left unchanged, it's only supposed
to be used by process_runnable_task() to remove task/tasklets from the private
tast list.
This should not be backported.
This should fix github issue #357.
We need to call vars_init() when the list is empty otherwise we
can't use variables in the response scope. This regression was
introduced by cda7f3f5 (MINOR: stream: don't prune variables if
the list is empty).
The following config reproduces the issue:
defaults
mode http
frontend in
bind *:11223
http-request set-var(req.foo) str("foo") if { path /bar }
http-request set-header bar %[var(req.foo)] if { var(req.foo) -m found }
http-response set-var(res.bar) str("bar")
http-response set-header foo %[var(res.bar)] if { var(res.bar) -m found }
use_backend out
backend out
server s1 127.0.0.1:11224
listen back
bind *:11224
http-request deny deny_status 200
> GET /ba HTTP/1.1
> Host: localhost:11223
> User-Agent: curl/7.66.0
> Accept: */*
>
< HTTP/1.0 200 OK
< Cache-Control: no-cache
< Content-Type: text/html
> GET /bar HTTP/1.1
> Host: localhost:11223
> User-Agent: curl/7.66.0
> Accept: */*
>
< HTTP/1.0 200 OK
< Cache-Control: no-cache
< Content-Type: text/html
< foo: bar
This must be backported as far as 1.9.
Documentation states that the interval between 2 DNS resolution is
driven by "timeout resolve <time>" directive.
From a code point of view, this was applied unless the latest status of
the resolution was VALID. In such case, "hold valid" was enforce.
This is a bug, because "hold" timers are not here to drive how often we
want to trigger a DNS resolution, but more how long we want to keep an
information if the status of the resolution itself as changed.
This avoid flapping and prevent shutting down an entire backend when a
DNS server is not answering.
This issue was reported by hamshiva in github issue #345.
Backport status: 1.8
As reported by David Birdsong on the ML, the HTTP action do-resolve does
not use the DNS cache.
Actually, the action is "registred" to the resolution for said name to
be resolved and wait until an other requester triggers the it. Once the
resolution is finished, then the action is updated with the result.
To trigger this, you must have a server with runtime DNS resolution
enabled and run a do-resolve action with the same fqdn AND they use the
same resolvers section.
This patch fixes this behavior by ensuring the resolution associated to
the action has a valid answer which is not considered as expired. If
those conditions are valid, then we can use it (it's the "cache").
Backport status: 2.0
Since the legacy HTTP mode was removed, the stream is always released at the end
of each HTTP transaction and a new is created to handle the next request for
keep-alive connections. So the HTTP transaction is no longer reset and the
function http_reset_txn() can be removed.
All TCP and HTTP captures are stored in 2 arrays, one for the request and
another for the response. In HAPRoxy 1.5, these arrays are part of the HTTP
transaction and thus are released during its cleanup. Because in this version,
the transaction is part of the stream (in 1.5, streams are still called
sessions), the cleanup is always performed, for HTTP and TCP streams.
In HAProxy 1.6, the HTTP transaction was moved out from the stream and is now
dynamically allocated only when required (becaues of an HTTP proxy or an HTTP
sample fetch). In addition, still in 1.6, the captures arrays were moved from
the HTTP transaction to the stream. This way, it is still possible to capture
elements from TCP rules for a full TCP stream. Unfortunately, the release is
still exclusively performed during the HTTP transaction cleanup. Thus, for a TCP
stream where the HTTP transaction is not required, the TCP captures, if any, are
never released.
Now, all captures are released when the stream is freed. This fixes the memory
leak for TCP streams. For streams with an HTTP transaction, the captures are now
released when the transaction is reset and not systematically during its
cleanup.
This patch must be backported as fas as 1.6.
Runtime traces are now supported for the streams, only if compiled with
debug. process_stream() is covered as well as TCP/HTTP analyzers and filters.
In traces, the first argument is always a stream. So it is easy to get the info
about the channels and the stream-interfaces. The second argument, when defined,
is always a HTTP transaction. And the third one is an HTTP message. The trace
message is adapted to report HTTP info when possible.
The macros DBG_TRACE_*() can be used instead of existing trace macros to emit
trace messages in debug mode only, ie, when HAProxy is compiled with DEBUG_FULL
or DEBUG_DEV. Otherwise, these macros do nothing. So it is possible to add
traces for development purpose without impacting performance of production
instances.
Names of these macros may enter in conflict with the macros of the runtime
tracing mechanism. So the prefix "FLT_" has been added to avoid any ambiguities.
Despite the addition of the mux layer, no change have been made on how to enable
the TCP splicing on process_stream(). We still check if transport layer on both
sides support the splicing, but we don't check the muxes support. So it is
possible to start to splice data with an unencrypted H2 connection on a side and
an H1 connection on the other. This leads to a freeze of the stream until a
client or server timeout is reached.
This patch fixed a part of the issue #356. It must be backported as far as 1.8.
The mux H1 announces the support of the TCP splicing. It only works for payload
data. It works for messages with an explicit content-length or for tunnelled
data. For chunked messages, the mux H1 should normally not try to xfer more than
the current chunk through the pipe. Unfortunately, this works on the read side
but the send is completely bogus. During the output formatting, the announced
size of chunks does not handle the size that will be spliced. Because there is
no formatting when spliced data are sent, the produced message is malformed and
rejected by the peer.
For now, because it is quick and simple, the TCP splicing is disabled for
chunked messages. I will try to enable it again in a proper way. I don't know
for now if it will be backportable in previous versions. This will depend on the
amount of changes required to handle it.
This patch fixes a part of the issue #356. It must be backported to 2.0 and 1.9.
This patch is easy to review: let's call parse_logsrv() function to parse
"log" directive as this is already for other sections for proxies.
This enable us to log incoming TCP connections for the listeners for "peers"
sections.
Update the documentation for "peers" section.
These keywords received a second argument with commit ae6f125 ("MINOR:
sample: add us/ms support to date/http_date"). Each argument is optional,
it's not either both or none.
If the SSL_CTX of a previous instance (ckch_inst) was used as a
default_ctx, replace the default_ctx of the bind_conf by the first
SSL_CTX inserted in the SNI tree.
Use the RWLOCK of the sni tree to handle the change of the default_ctx.
When trying to update a certificate <file>.{rsa,ecdsa,dsa}, but this one
does not exist and if <file> was used as a regular file in the
configuration, the error was ambiguous. Correct it so we can return a
certificate not found error.
Commit bc6ca7c ("MINOR: ssl/cli: rework 'set ssl cert' as 'set/commit'")
broke the ability to commit a unique certificate which does not use a
bundle extension .{rsa,ecdsa,dsa}.
When doing an 'ssl set cert' with a certificate which does not exist in
configuration, the appctx->ctx.ssl.old_ckchs->path was duplicated while
app->ctx.ssl.old_ckchs was NULL, resulting in a NULL dereference.
Move the code so the 'not referenced' error is done before this.
Released version 2.1-dev4 with the following main changes :
- BUG/MINOR: cli: don't call the kw->io_release if kw->parse failed
- BUG/MINOR: mux-h2: Don't pretend mux buffers aren't full anymore if nothing sent
- BUG/MAJOR: stream-int: Don't receive data from mux until SI_ST_EST is reached
- DOC: remove obsolete section about header manipulation
- BUG/MINOR: ssl/cli: cleanup on cli_parse_set_cert error
- MINOR: ssl/cli: rework the 'set ssl cert' IO handler
- BUILD: CI: comment out cygwin build, upgrade various ssl libraries
- DOC: Improve documentation of http-re(quest|sponse) replace-(header|value|uri)
- BUILD/MINOR: tools: shut up the format truncation warning in get_gmt_offset()
- BUG/MINOR: spoe: fix off-by-one length in UUID format string
- BUILD/MINOR: ssl: shut up a build warning about format truncation
- BUILD: do not disable -Wformat-truncation anymore
- MINOR: chunk: add chunk_istcat() to concatenate an ist after a chunk
- Revert "MINOR: istbuf: add b_fromist() to make a buffer from an ist"
- MINOR: mux: Add a new method to get informations about a mux.
- BUG/MEDIUM: stream_interface: Only use SI_ST_RDY when the mux is ready.
- BUG/MEDIUM: servers: Only set SF_SRV_REUSED if the connection if fully ready.
- MINOR: doc: fix busy-polling performance reference
- MINOR: config: allow no set-dumpable config option
- MINOR: init: always fail when setrlimit fails
- MINOR: ssl/cli: rework 'set ssl cert' as 'set/commit'
- CLEANUP: ssl/cli: remove leftovers of bundle/certs (it < 2)
- REGTEST: vtest can now enable mcli with its own flag
- BUG/MINOR: config: Update cookie domain warn to RFC6265
- MINOR: sample: add us/ms support to date/http_date
- BUG/MINOR: ssl/cli: check trash allocation in cli_io_handler_commit_cert()
- BUG/MEDIUM: mux-h2: report no available stream on a connection having errors
- BUG/MEDIUM: mux-h2: immediately remove a failed connection from the idle list
- BUG/MEDIUM: mux-h2: immediately report connection errors on streams
- BUG/MINOR: stats: properly check the path and not the whole URI
- BUG/MINOR: ssl: segfault in cli_parse_set_cert with old openssl/boringssl
- BUG/MINOR: ssl: ckch->chain must be initialized
- BUG/MINOR: ssl: double free on error for ckch->{key,cert}
- MINOR: ssl: BoringSSL ocsp_response does not need issuer
- BUG/MEDIUM: ssl/cli: fix dot research in cli_parse_set_cert
- MINOR: backend: Add srv_name sample fetche
- DOC: Add GitHub issue config.yml
The sample fetche can get srv_name without foreach
`core.backends["bk"].servers`.
Then we can get Server class quickly via
`core.backends[txn.f:be_name()].servers[txn.f:srv_name()]`.
Issue#342