This adds a deinit_idle_conns() function that's called on deinit to
release the per-thread idle connection management tasks. The global
task was already taken care of.
The freeing of pre-check callbacks was missing when this feature was
recently added with commit b53eb8790 ("MINOR: init: add the pre-check
callback"); let's free them to make valgrind happy.
Tim reported in issue #1676 that, just like startup_logs, trash buffers
are not released on deinit since they're thread-local, but valgrind
notices it when quitting before creating threads (e.g. "-c -f ...").
Let's just subscribe the function to deinit in addition to threads' end.
The two "free(x);x=NULL;" in free_trash_buffers_per_thread() were
also simplified using ha_free().
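For reference, a minimal sketch of the ha_free() idiom (the real macro
lives in the tree):
    #include <stdlib.h>

    /* take the address of the pointer so the caller's copy is
     * reliably cleared after the free */
    #define ha_free(x) do { free(*(x)); *(x) = NULL; } while (0)
so that "free(p); p = NULL;" collapses into a single "ha_free(&p);".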
Tim reported in issue #1676 that we don't release startup logs if we
warn during startup and quit before creating threads (e.g. -c -f ...).
Let's subscribe deinit_errors_buffers() to both threads' end and
deinit. That's OK since it uses both per-thread and global variables,
and is idempotent.
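A sketch of this double subscription, assuming the registration helpers
used elsewhere in the tree:
    /* deinit_errors_buffers() is idempotent, so it is safe to run it
     * both at each thread's end and again at global deinit */
    REGISTER_PER_THREAD_FREE(deinit_errors_buffers);
    REGISTER_POST_DEINIT(deinit_errors_buffers);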
Low-footprint client machines may not have enough memory to download a
complete 16KB TLS record at once. With the new option the maximum
record size can be defined on the server side.
Note: Before limiting the record size on the server side, a client should
consider using the TLS Maximum Fragment Length Negotiation Extension defined
in RFC6066.
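For example, assuming the keyword introduced here is spelled
"tune.ssl.hard-maxrecord":
    global
        # never let the server emit TLS records larger than 4 kB
        tune.ssl.hard-maxrecord 4096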
This patch fixes GitHub issue #1679.
In issue #1677, Tim reported that we don't correctly free some shared
pools on exit. What happens in fact is that pool_destroy() is meant to
be called once per pool *pointer*, as it decrements the use count for
each pass and only releases the pool when it reaches zero. But since
pool_destroy_all() iterates over the pools list, it visits each pool
only once and will not eliminate some of them, which thus remain in the
list.
In an ideal case, the function should loop over all pools for as long
as the list is not empty, but that's pointless as we know we're exiting,
so let's just set the users count to 1 before the call so that
pool_destroy() knows it can delete and release the entry.
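The resulting exit path looks roughly like this (a sketch of the
described fix):
    void pool_destroy_all()
    {
        struct pool_head *entry, *back;

        list_for_each_entry_safe(entry, back, &pools, list) {
            /* each pool appears only once in the list; force the use
             * count to 1 so pool_destroy() releases it on this pass */
            entry->users = 1;
            pool_destroy(entry);
        }
    }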
This could be backported to all versions (memory.c in 2.0 and older) but
it's not a real problem in practice.
We thought that we could get rid of some DISGUISE() with commit a80e4a354
("MINOR: fd: add functions to set O_NONBLOCK and FD_CLOEXEC") thanks to
the calls being in a function but that was without counting on Coverity.
Let's put the DISGUISE() directly in the function since most if not all
callers don't care about this result.
It appears that it is safe to perform a clean deinit at this point, so
let's do this to exercise the deinit paths some more.
Running `valgrind --leak-check=full --show-leak-kinds=all ./haproxy -vv` with
this change reports:
==261864== HEAP SUMMARY:
==261864== in use at exit: 344 bytes in 11 blocks
==261864== total heap usage: 1,178 allocs, 1,167 frees, 1,102,089 bytes allocated
==261864==
==261864== 24 bytes in 1 blocks are still reachable in loss record 1 of 2
==261864== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==261864== by 0x324BA6: hap_register_pre_check (init.c:92)
==261864== by 0x155824: main (haproxy.c:3024)
==261864==
==261864== 320 bytes in 10 blocks are still reachable in loss record 2 of 2
==261864== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==261864== by 0x26E54E: cfg_register_postparser (cfgparse.c:4238)
==261864== by 0x155824: main (haproxy.c:3024)
==261864==
==261864== LEAK SUMMARY:
==261864== definitely lost: 0 bytes in 0 blocks
==261864== indirectly lost: 0 bytes in 0 blocks
==261864== possibly lost: 0 bytes in 0 blocks
==261864== still reachable: 344 bytes in 11 blocks
==261864== suppressed: 0 bytes in 0 blocks
which is looking pretty good.
A config like the following:
    global
        stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners

    resolvers unbound
        nameserver unbound 127.0.0.1:53
will report the following leak when running a configuration check:
==241882== 6,991 (6,952 direct, 39 indirect) bytes in 1 blocks are definitely lost in loss record 8 of 13
==241882== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==241882== by 0x25938D: cfg_parse_resolvers (resolvers.c:3193)
==241882== by 0x26A1E8: readcfgfile (cfgparse.c:2171)
==241882== by 0x156D72: init (haproxy.c:2016)
==241882== by 0x156D72: main (haproxy.c:3037)
because the `.px` member of `struct resolvers` is not freed.
The offending allocation was introduced in
c943799c86, which is a reorganization that
happened during development of 2.4.x. This fix can likely be backported without
issue to 2.4+ and is likely not needed for earlier versions as the leak happens
during deinit only.
To make the deinit function a proper inverse of the init function we need to
free the `http_err_chunks`:
==252081== 311,296 bytes in 19 blocks are still reachable in loss record 50 of 50
==252081== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==252081== by 0x2727EE: http_str_to_htx (http_htx.c:914)
==252081== by 0x272E60: http_htx_init (http_htx.c:1059)
==252081== by 0x26AC87: check_config_validity (cfgparse.c:4170)
==252081== by 0x155DFE: init (haproxy.c:2120)
==252081== by 0x155DFE: main (haproxy.c:3037)
A memory leak was introduced when the "ignore-empty" option was added to
redirect rules. When this option is set and there is no location, the
redirection is aborted and processing continues. But when this happened,
the trash buffer allocated to format the redirect response was not
released.
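The fix boils down to releasing the buffer on that abort path; a sketch
(the flag name is assumed from the option):
    /* in the redirect rule evaluation, on the ignore-empty path */
    if (!chunk->data && (rule->flags & REDIRECT_FLAG_IGNORE_EMPTY)) {
        free_trash_chunk(chunk);   /* was leaked before this fix */
        goto ignore;               /* continue processing */
    }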
The bug was introduced by commit bc1223be7 ("MINOR: http-rules: add a new
"ignore-empty" option to redirects.").
This patch should fix the issue #1675. It must be backported to 2.5.
If the "close-spread-time" option is set to "infinite", active
connection closing during a soft-stop can be disabled. The 'connection:
close' header or the GOAWAY frame will not be added anymore to the
server's response and active connections will only be closed once the
clients disconnect. Idle connections will not be closed all at once when
the soft-stop starts anymore, and each idle connection will follow its
own timeout based on the multiple timeouts set in the configuration (as
is the case during regular execution).
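For example:
    global
        # during soft-stop, let clients close at their own pace and
        # let idle connections follow their regular timeouts
        close-spread-time infinite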
This feature request was described in GitHub issue #1614.
This patch should be backported to 2.5. It depends on 'MEDIUM: global:
Add a "close-spread-time" option to spread soft-stop on time window'.
While weak symbols were finally fixed with commit fb1b6f5bc ("BUILD:
compiler: use a more portable set of asm(".weak") statements"), it
was an error to think that initcall symbols were also weak. They must
not be weak; they are only global. The reason is that any externally
linked code loaded as a .so would drop its weak symbols when being
loaded, hence its initcalls that may contain various function
registration calls.
The ambiguity came from the fact that we initially reused the initcall's
HA_GLOBL macro for OSX, then generalized it, then turned it into a
choice between .globl and .weak based on the OS, while in fact we needed
a macro to define weak symbols.
Let's rename the macro to HA_WEAK() to make it clear it's only for weak
symbols, and redefine HA_GLOBL() that initcall needs.
This will need to be backported wherever the commit above is backported
(at least 2.5 for now).
HAProxy crashes when "show ssl ca-file" is called on a ca-file which
contains a lot of certificates (127 in our test with
/etc/ssl/certs/ca-certificates.crt).
The root cause is that the function does not yield when there is no
space left while writing the details, and we may have a lot to write.
To fix the issue, we try to put the data with ci_putchk() after every
show_cert_detail() and we yield if the ci_putchk() fails.
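A sketch of the yield idiom (helper names from the stream-interface era
API, assumed here):
    /* after each show_cert_detail(), try to push the output */
    if (ci_putchk(si_ic(si), out) == -1) {
        /* response buffer full: remember the current certificate ID
         * and yield; the I/O handler will be called again later */
        si_rx_room_blk(si);
        return 0;
    }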
This patch also cleans up the code a little bit:
- the end label is now a real end with a return 1;
- i0 is used instead of (long)p1
- the ID is stored upon yield
Since the httpclient verify now has a fallback which disables SSL in
the httpclient without exiting haproxy at startup, we can safely
re-enable it by default.
It could still be disabled with "httpclient-ssl-verify none".
The cafile_tree was never freed upon deinit, making valgrind and ASAN
complain when haproxy quits.
This could be backported as far as 2.2 but it requires the
ssl_store_delete_cafile_entry() helper from
5daff3c8ab.
Emit a warning when the ca-file couldn't be loaded for the httpclient,
and disable the SSL of the httpclient.
We must never be in a case where verify is disabled without any
configuration, so it's better to disable SSL completely.
Move the check on the scheme above the initialization of the applet so
we can abort before initializing the appctx.
There were plenty of leftovers from old code that were never removed
and that are not needed at all since these files do not use any
definition depending on fcntl.h, let's drop them.
This gets rid of most open-coded fcntl() calls, some of which were passed
through DISGUISE() to avoid a useless test. The FD_CLOEXEC was most often
set without preserving previous flags, which could become a problem once
new flags are created. Now this will not happen anymore.
Instead of seeing each location manipulate the fcntl() themselves and
often forget to check previous flags, let's centralize the functions to
do this. It also allows dropping fcntl.h from most call places and will
ease the adoption of different OS-specific mechanisms if needed. Note
that the fd_set_nonblock() function purposely doesn't check the previous
flags as it's meant to be used on new FDs only.
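A sketch of the two helpers as described above:
    #include <fcntl.h>

    /* meant for freshly created FDs only, hence no F_GETFL first */
    void fd_set_nonblock(int fd)
    {
        fcntl(fd, F_SETFL, O_NONBLOCK);
    }

    /* preserve flags already set so that future FD flags survive */
    void fd_set_cloexec(int fd)
    {
        int flags = fcntl(fd, F_GETFD);

        if (flags != -1)
            fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
    }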
Despite what the 'close-spread-time' option should do, the
'connection:close' header was always added to HTTP responses during
soft-stops even with a soft-stop window defined.
This patch adds proper randomized closing of HTTP connections during a
soft-stop (based on the time left in the soft close window).
It should be backported to 2.5 once 'MEDIUM: global: Add a
"close-spread-time" option to spread soft-stop on time window' is
backported as well.
Some older systems may routinely return EWOULDBLOCK for some syscalls
while we tend to check only for EAGAIN nowadays. Modern systems define
EWOULDBLOCK as EAGAIN so that solves it, but on a few older ones (AIX,
VMS etc.) both are different, and for portability we'd need to test for
both, otherwise we never know whether we risk confusing some status
codes with plain errors.
There were a few entries; the most annoying ones are the switch/case
statements because they require adding the entry only when it differs,
but the other ones are really trivial.
__comp_fetch_init() only presets the maxzlibmem, and only when both
USE_ZLIB and DEFAULT_MAXZLIBMEM are set. The intent is to preset a
default value to protect the system against excessive memory usage
when no setting is set by the user.
Nowadays the entry in the global struct is always there so there's no
point anymore in passing via a constructor to possibly set this value.
Let's go the cleaner way by always presetting DEFAULT_MAXZLIBMEM to 0
in defaults.h unless these conditions are met, and always assigning it
instead of pre-setting the entry to zero. This is more straightforward
and removes some ifdefs and the last constructor. In addition, now the
setting has a chance of being found.
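A sketch of the defaults.h side of this (a simplification of the
described logic):
    /* without zlib there is no reason to preset a limit */
    #if !defined(USE_ZLIB)
    #undef DEFAULT_MAXZLIBMEM
    #endif

    /* preset to 0 unless set at build time */
    #ifndef DEFAULT_MAXZLIBMEM
    #define DEFAULT_MAXZLIBMEM 0
    #endif
and the global struct entry is then assigned unconditionally from
DEFAULT_MAXZLIBMEM instead of being preset to zero by a constructor.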
The constructor present there could be replaced with an initcall.
This one is set at level STG_PREPARE because it also zeroes the
lock_stats, and it's a bit odd that it could possibly have been
scheduled to run after other constructors that might already
preset some of these locks by accident.
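A sketch of the replacement (the function name is illustrative; the
mechanism and stage are the ones named above):
    /* run at the earliest initcall stage so no other constructor can
     * have touched the locks before their stats are zeroed */
    INITCALL0(STG_PREPARE, init_lock_stats);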
Transport layers (raw_sock, ssl_sock, xprt_handshake and xprt_quic)
were using 4 constructors and 2 destructors. The 4 constructors were
replaced with INITCALL and the destructors with REGISTER_POST_DEINIT()
so that we do not depend on this anymore.
Pollers are among the few remaining blocks still using constructors
to register themselves. That's not needed anymore since the introduction
of initcalls, so better turn these into initcalls.
On some systems, the hard limit for ulimit -n may be huge, in the order
of 1 billion, and using this to automatically compute maxconn doesn't
work as it requires way too much memory. Users tend to hard-code maxconn
but that's not convenient to manage deployments on heterogeneous systems,
nor when porting configs to developers' machines. The ulimit-n parameter
doesn't work either because it forces the limit. What most users seem to
want (and it makes sense) is to respect the system imposed limits up to
a certain value and cap this value. This is exactly what fd-hard-limit
does.
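For example, to respect the system limit up to one million FDs:
    global
        # accept the system's limit up to 1M FDs, cap anything above
        fd-hard-limit 1048576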
This addresses github issue #1622.
Almost all of our hash-based LB algorithms are implemented as special
cases of something that can now be achieved using sample expressions,
and some of them have adopted some options to adapt their behavior in
ways that could also be achieved using converters.
There are users who want to hash other parameters that are combined
into variables, and who set headers from these values and use
"balance hdr(name)" for this.
Instead of constantly implementing specific options and having users
hack around when they want a real hash, let's implement a native hash
mode that applies to a standard sample expression. This way, any
fetchable element (including variables) may be used to construct the
hash, even modified by any converter if desired.
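For example, assuming the new mode is spelled "balance hash <expr>":
    backend app
        # hash a value that was combined into a variable earlier,
        # optionally passed through converters
        balance hash var(txn.key)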
Any type except bool can be cast to bin, while bool can at least be cast
to string. That's a bit inconsistent, and prevents a boolean from being
used as the input of a hash function while any other type can. This is a
problem when passing via a variable, where someone could use:
... set-var(txn.bar) always_false
to temporarily disable something, but this would result in an empty
hash output when later doing:
... var(txn.bar),sdbm
Instead of using c_int2bin() as is done for the string output, better
enforce a set of inputs of exactly 0 or 1, so that a poorly written
sample fetch function does not result in a hash output that is difficult
to debug.
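A sketch of such a clamping cast (the function name and exact shape are
illustrative):
    /* cast a boolean sample to bin: force the input to exactly 0 or 1
     * so a sloppy fetch returning e.g. 42 cannot skew the hash */
    static int c_bool2bin(struct sample *smp)
    {
        smp->data.u.sint = !!smp->data.u.sint;
        return c_int2bin(smp);   /* reuse the integer-to-bin cast */
    }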
Surprisingly, while almost all calls to a sample cast function carefully
avoid calling c_none(), sample_fetch_as_type() makes no effort in this
regard, even though, by nature, the function is most often called with
an expected output type similar to that of the expression. Let's add it
to shorten the most common call path.
The use_backend and use-server contexts were not enumerated in
smp_resolve_args, and while use-server doesn't currently take an
expression, at least use_backend supports that, and both entries ought
to be listed for completeness. Now an error in a use_backend rule
becomes more precise, from:
[ALERT] (12373) : config : parsing [use-srv.cfg:33]: unable to find
backend 'foo' referenced in arg 1 of sample fetch
keyword 'nbsrv' in proxy 'echo'.
to:
[ALERT] (12307) : config : parsing [use-srv.cfg:33]: unable to find
backend 'foo' referenced in arg 1 of sample fetch
keyword 'nbsrv' in use_backend expression in proxy
'echo'.
This may be backported though this is totally harmless.
Since commit dd7e6c6dc ("BUG/MINOR: http-rules: completely free incorrect
TCP rules on error") free_act_rule() is called on some error paths, and one
of them involves incomplete redirect rules that may cause a crash if the
rule wasn't yet initialized, as shown in this config snippet:
    frontend ft
        mode http
        bind *:8001
        http-request redirect location /%[always_false,sdbm]
Let's simply make release_http_redir() more robust against null redirect
rules.
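The hardening is a simple guard; a sketch:
    /* tolerate rules whose redirect was never fully initialized */
    static void release_http_redir(struct act_rule *rule)
    {
        struct redirect_rule *redir = rule->arg.redir;

        if (!redir)
            return;
        /* ... free the location/cookie strings and condition ... */
    }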
No backport needed since it seems that the only way to trigger this was
the extra check above that was merged during 2.6-dev.
The function checking captures defined in a tcp-request content ruleset
didn't use the right rule arguments: "arg.trk_ctr" was used instead of
"arg.cap".
This patch must be backported as far as 2.2.
Since 2.5, it is possible to define TCP/HTTP rulesets in defaults
sections. However, rules defining a capture in a defaults section were
not properly handled because they were not shared with the proxies
inheriting from that defaults section. This led to a crash when haproxy
tried to store a new capture.
So now, to fix the issue, when a new proxy is created, its list of
captures points to the list of its defaults section. It may be NULL or
not. All new captures are prepended to this list. It is not a problem to
share the same defaults section between several proxies, because the
list is not altered, and we take care not to release it when the
corresponding proxies are freed but only when the defaults proxies are
freed. To do so, defaults proxies are now unreferenced at the end of the
free_proxy() function instead of at the beginning.
This patch should fix the issue #1674. It must be backported to 2.5.
Captures must only be defined in proxies with the frontend capability,
or in defaults sections used by proxies with the frontend capability.
Thus, an extra check is added to be sure a defaults section defining a
capture will never be referenced by a backend.
Note that in this case, only named captures in "tcp-request content" or
"http-request" rules are possible. It is not possible in a defaults
section to declare a capture slot. Not yet at least.
This patch must be backported to 2.5. It is related to issue #1674.
When using qc_stream_desc_ack(), the stream instance may be freed if
there is no more data in its buffers. This also means that all frames
still stored waiting for ACK for this stream are freed via
qc_stream_desc_free().
This is particularly important in quic_stream_try_to_consume() where we
loop over the frames tree of the stream. A use-after-free was present,
in case the stream had been freed, in the trace "stream consumed", which
dereferences the frame. Fix this by first checking whether the stream
has been freed or not.
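A sketch of the shape of the fix in the consume loop (simplified; the
exact API and trace arguments differ in the real code):
    /* qc_stream_desc_ack() may have released the stream and every
     * frame still waiting for an ACK, including the current one */
    stream_freed = qc_stream_desc_ack(&stream, offset, len);
    if (stream_freed)
        break;   /* do not emit the "stream consumed" trace: the
                  * frame it dereferences has been freed */
    /* safe to trace and continue iterating over the frames tree */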
This bug was detected by using ASAN + quic traces enabled.
Released version 2.6-dev7 with the following main changes :
- BUILD: calltrace: fix wrong include when building with TRACE=1
- MINOR: ssl: Use DH parameters defined in RFC7919 instead of hard coded ones
- MEDIUM: ssl: Disable DHE ciphers by default
- BUILD: ssl: Fix compilation with OpenSSL 1.0.2
- MINOR: mux-quic: split xfer and STREAM frames build
- REORG: quic: use a dedicated module for qc_stream_desc
- MINOR: quic-stream: use distinct tree nodes for quic stream and qcs
- MINOR: quic-stream: add qc field
- MEDIUM: quic: implement multi-buffered Tx streams
- MINOR: quic-stream: refactor ack management
- MINOR: quic: limit total stream buffers per connection
- MINOR: mux-quic: implement immediate send retry
- MINOR: cfg-quic: define tune.quic.conn-buf-limit
- MINOR: ssl: Add 'show ssl providers' cli command and providers list in -vv option
- REGTESTS: ssl: Update error messages that changed with OpenSSLv3.1.0-dev
- BUG/MEDIUM: quic: Possible crash with released mux
- BUG/MINOR: mux-quic: unsubscribe on release
- BUG/MINOR: mux-quic: handle null timeout
- BUG/MEDIUM: logs: fix http-client's log srv initialization
- BUG/MINOR: mux-quic: remove dead code in qcs_xfer_data()
- DEV: stream: Fix conn-streams dump in full stream message
- CLEANUP: conn-stream: Rename cs_conn_close() and cs_conn_drain_and_close()
- CLEANUP: conn-stream: Rename cs_applet_release()
- MINOR: conn-stream: Rely on endpoint shutdown flags to shutdown an applet
- BUG/MINOR: cache: Disable cache if applet creation fails
- BUG/MINOR: backend: Don't allow to change backend applet
- BUG/MEDIUM: conn-stream: Set back CS to RDY state when the appctx is created
- MINOR: stream: Don't needlessly detach server endpoint on early client abort
- MINOR: conn-stream: Make cs_detach_* private and use cs_destroy() from outside
- MINOR: init: add the pre-check callback
- MEDIUM: httpclient: change the init sequence
- MEDIUM: httpclient/ssl: verify required
- MINOR: httpclient/mworker: disable in the master process
- MEDIUM: httpclient/ssl: verify is configurable and disabled by default
- BUG/MAJOR: connection: Never remove connection from idle lists outside the lock
- BUG/MEDIUM: mux-quic: fix stalled POST requets
- BUG/MINOR: mux-quic: fix POST with abortonclose
- MINOR: task: add a new task_instant_wakeup() function
- MEDIUM: queue: use tasklet_instant_wakeup() to wake tasks
- DOC: remove my name from the config doc
I was surprised to notice that my name was still present as the author
at the top of the config manual. It turns out that this line and a few
other ones in this file remained unchanged since commit 6a06a40501 that
added this doc 15 years ago! It's long been time to get rid of this!
It's long been known that queues didn't scale with threads for various
reasons ranging from the cost of the queue lock to the cost of the
massive amount of inter-thread wakeups.
But some recent reports showing deplorable performance with threads used at
100% CPU helped us notice that the two elements above add on top of
each other:
- with plenty of inter-thread wakeups, the scheduler takes a lot of
time to dequeue pending tasks from the shared queue ;
- the lock held by the scheduler to do this slows down subsequent
task_wakeup() calls from the queue that are made under the
queue's lock
- the queue's lock slows down addition of new requests to the queue
and adds up to the number of needed queue entries for a steady
traffic.
But the cost of the shared queue has no reason for being paid because
it had already been paid when process_stream() added the request to
the queue. As such, an instant wakeup is a perfect fit for this.
This is exactly what this patch does: it uses task_instant_wakeup()
to dequeue pending requests, which has the effect of not bloating the
shared queue, hence not requiring the global queue lock, which in turn
results in the wakeup to be much faster, and the queue lock to be much
shorter. In the end, a test with 4k concurrent connections, which was
previously limited to 40-80k requests/s with 16 threads (some of which
were stuck at 100% CPU), now reaches 570k req/s with 4% idle.
Given that it's been found that it was possible to trigger the watchdog
on the queue lock under extreme conditions, and that such conditions
could happen when users want to protect their servers during a DoS, it
would definitely make sense to backport it to the most recent releases
(2.5 and 2.4 seem like good candidates especially because their scheduler
is modern enough to receive the change above). If a backport is performed,
the following patch is needed:
MINOR: task: add a new task_instant_wakeup() function
This function's purpose is to wake up either a local or remote task,
bypassing the tree-based run queue. It is meant for fast wakeups that
are supposed to be equivalent to those used with tasklets, i.e. a task
that had to pause some processing can now complete (typically because a
resource became available again). In all cases, it's important to keep
in mind
that the task must have gone through the regular scheduling path before
being blocked, otherwise the task priorities would be ignored.
The reason for this is that some wakeups are massively inter-thread
(e.g. server queues), and these inter-thread wakeups cause huge
contention on the shared runqueue lock. A user reported 47% CPU spent
in process_runnable_tasks with only 32 threads and 80k requests in
queues. With this mechanism, purely one-to-one wakeups can avoid
taking the lock thanks to the mt_list used for the shared tasklet
queue.
Right now the shared tasklet queue moves everything to the TL_URGENT
queue. It's not dramatic but it would seem better to have a new shared
list dedicated to tasks, and that would deliver into TL_NORMAL, for an
even better fairness. This could be improved in the future.
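Usage is then a one-liner at the point where the queued stream's slot
becomes available (the task pointer here is illustrative):
    /* wake the dequeued stream's task directly, bypassing the
     * tree-based run queue; TASK_WOKEN_RES = a resource became
     * available again */
    task_instant_wakeup(strm->task, TASK_WOKEN_RES);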
Remove the CS_EP_EOS flag that was erroneously set in qc_rcv_buf().
This fixes POST with abortonclose. Previously, the request was
preemptively aborted by haproxy due to the incorrect EOS flag.
For the moment, the EOS flag is not set anymore. It should eventually be
set again to warn about a premature close from the client.