We want to track the frames which have been duplicated during retransmissions so
that to avoid uselessly retransmitting frames which would already have been
acknowledged. ->origin new member is there to store the frame from which a copy
was done, ->reflist is a list to store the frames which are copies.
Also ensure all the frames are zeroed and that their ->reflist list member is
initialized.
Add QUIC_FL_TX_FRAME_ACKED flag definition to mark a TX frame as acknowledged.
Add a loop in the bidi STREAM function. This will call repeatdly
qcc_decode_qcs() and dequeue buffered frames.
This is useful when reception of more data is interrupted because the
MUX buffer was full. qcc_decode_qcs() has probably free some space so it
is useful to immediatly retry reception of buffered frames of the qcs
tree.
This may fix occurences of stalled Rx transfers with large payload.
Note however that there is still room for improvment. The conn-stream
layer is not able at this moment to retrigger demuxing. This is because
the mux io-handler does not treat Rx : this may continue to cause
stalled tranfers.
Previously, h3 layer was not able to demux a DATA frame if not fully
received in the Rx buffer. This causes evident limitation and prevents
to be able to demux a frame bigger than the buffer.
Improve h3_data_to_htx() to support partial frame demuxing. The demux
state is preserved in the h3s new fields : this is useful to keep the
current type and length of the demuxed frame.
Define a new structure h3s used to provide context for a H3 stream. This
structure is allocated and stored in the qcs thanks to previous commit
which provides app-layer context storage.
For now, h3s is empty. It will soon be completed to be able to support
stateful demux : this is required to be able to demux an incomplete
frame if the rx buffer is full.
Define 2 new callback for qcc_app_ops : attach and detach. They are
called when a qcs instance is respectively allocated and freed. If
implemented, they can allocate a custom context stored in the new
abstract field ctx of qcs.
For now, h3 and hq-interop does not use these new callbacks. They will
be soon implemented by the h3 layer to allocate a context used for
stateful demuxing.
This change is required to support the demuxing of H3 frames bigger than
a buffer.
Edit the functions used for HEADERS and DATA parsing. They now return
the number of bytes handled.
This change will help to demux H3 frames bigger than the buffer.
Improve the reception for STREAM frames. In qcc_recv(), if the frame is
bigger than the remaining space in rx buffer, do not reject it wholly.
Instead, copy as much data as possible. The rest of the data is
buffered.
This is necessary to handle H3 frames bigger than a buffer. The H3 code
does not demux until the frame is complete or the buffer is full.
Without this, the transfer on payload larger than the Rx buffer can
rapidly freeze.
Handle wrapping buffer in h3_data_to_htx(). If data is wrapping, first
copy the contiguous data, then copy the data in front of the buffer.
Note that h3_headers_to_htx() is not able to handle wrapping data. For
the moment, a BUG_ON was added as a reminder. This cas never happened,
most probably because HEADERS is the first frame of the stream.
Always set HTX flag HTX_SL_F_XFER_LEN for http/3. This is correct
becuase the size of H3 requests is always known thanks to the protocol
framing.
This may fix occurences of incomplete POST requests when the client side
of the connection has been closed before.
Add new qcs fields to count the sum of bytes received for each stream.
This is necessary to enforce flow-control for reception on the peer.
For the moment, the implementation is partial. No MAX_STREAM_DATA or
FLOW_CONTROL_ERROR are emitted. BUG_ON statements are here as a
remainder.
This means that for the moment we do not support POST payloads greater
that the initial max-stream-data announced (256k currently).
At least, we now ensure that we never buffer a frame which overflows the
flow-control limit : this ensures that the memory consumption per stream
should stay under control.
qcs and its field are not properly freed if the conn-stream allocation
fail in qcs_new(). Fix this by having a proper deinit code with a
dedicated label.
qcc_get_stream() was used when qcs and qc_stream_desc shared the same
node-tree. This is not the case anymore since
e4301da5ed
MINOR: quic-stream: use distinct tree nodes for quic stream and qcs
Now this function is broken as the qcc tree only contains qcs.
Thankfully it is unused so it can be removed without impact.
Comments were not properly edited since the splitting of functions for
stream emission. Also "payload" argument has been renamed to "in" as it
better reflects the function purpose.
This adds a deinit_idle_conns() function that's called on deinit to
release the per-thread idle connection management tasks. The global
task was already taken care of.
The freeing of pre-check callbacks was missing when this feature was
recently added with commit b53eb8790 ("MINOR: init: add the pre-check
callback"), let's do it to make valgrind happy.
Tim reported in issue #1676 that just like startup_logs, trash buffers
are not released on deinit since they're thread-local, but valgrind
notices it when quitting before creating threads like "-c -f ...". Let's
just subscribe the function to deinit in addition to threads' end.
The two "free(x);x=NULL;" in free_trash_buffers_per_thread() were
also simplified using ha_free().
Tim reported in issue #1676 that we don't release startup logs if we
warn during startup and quit before creating threads (e.g. -c -f ...).
Let's subscribe deinit_errors_buffers() to both thread's end and
deinit. That's OK since it uses both per-thread and global variables,
and is idempotent.
Low footprint client machines may not have enough memory to download a
complete 16KB TLS record at once. With the new option the maximum
record size can be defined on the server side.
Note: Before limiting the the record size on the server side, a client should
consider using the TLS Maximum Fragment Length Negotiation Extension defined
in RFC6066.
This patch fixes GitHub issue #1679.
In issue #1677, Tim reported that we don't correctly free some shared
pools on exit. What happens in fact is that pool_destroy() is meant to
be called once per pool *pointer*, as it decrements the use count for
each pass and only releases the pool when it reaches zero. But since
pool_destroy_all() iterates over the pools list, it visits each pool
only once and will not eliminate some of them, which thus remain in the
list.
In an ideal case, the function should loop over all pools for as long
as the list is not empty, but that's pointless as we know we're exiting,
so let's just set the users count to 1 before the call so that
pool_destroy() knows it can delete and release the entry.
This could be backported to all versions (memory.c in 2.0 and older) but
it's not a real problem in practice.
We thought that we could get rid of some DISGUISE() with commit a80e4a354
("MINOR: fd: add functions to set O_NONBLOCK and FD_CLOEXEC") thanks to
the calls being in a function but that was without counting on Coverity.
Let's put it directly in the function since most if not all callers don't
care about this result.
It appears that it is safe to call perform a clean deinit at this point, so
let's do this to exercise the deinit paths some more.
Running `valgrind --leak-check=full --show-leak-kinds=all ./haproxy -vv` with
this change reports:
==261864== HEAP SUMMARY:
==261864== in use at exit: 344 bytes in 11 blocks
==261864== total heap usage: 1,178 allocs, 1,167 frees, 1,102,089 bytes allocated
==261864==
==261864== 24 bytes in 1 blocks are still reachable in loss record 1 of 2
==261864== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==261864== by 0x324BA6: hap_register_pre_check (init.c:92)
==261864== by 0x155824: main (haproxy.c:3024)
==261864==
==261864== 320 bytes in 10 blocks are still reachable in loss record 2 of 2
==261864== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==261864== by 0x26E54E: cfg_register_postparser (cfgparse.c:4238)
==261864== by 0x155824: main (haproxy.c:3024)
==261864==
==261864== LEAK SUMMARY:
==261864== definitely lost: 0 bytes in 0 blocks
==261864== indirectly lost: 0 bytes in 0 blocks
==261864== possibly lost: 0 bytes in 0 blocks
==261864== still reachable: 344 bytes in 11 blocks
==261864== suppressed: 0 bytes in 0 blocks
which is looking pretty good.
A config like the following:
global
stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
resolvers unbound
nameserver unbound 127.0.0.1:53
will report the following leak when running a configuration check:
==241882== 6,991 (6,952 direct, 39 indirect) bytes in 1 blocks are definitely lost in loss record 8 of 13
==241882== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==241882== by 0x25938D: cfg_parse_resolvers (resolvers.c:3193)
==241882== by 0x26A1E8: readcfgfile (cfgparse.c:2171)
==241882== by 0x156D72: init (haproxy.c:2016)
==241882== by 0x156D72: main (haproxy.c:3037)
because the `.px` member of `struct resolvers` is not freed.
The offending allocation was introduced in
c943799c86 which is a reorganization that
happened during development of 2.4.x. This fix can likely be backported without
issue to 2.4+ and is likely not needed for earlier versions as the leak happens
during deinit only.
To make the deinit function a proper inverse of the init function we need to
free the `http_err_chunks`:
==252081== 311,296 bytes in 19 blocks are still reachable in loss record 50 of 50
==252081== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==252081== by 0x2727EE: http_str_to_htx (http_htx.c:914)
==252081== by 0x272E60: http_htx_init (http_htx.c:1059)
==252081== by 0x26AC87: check_config_validity (cfgparse.c:4170)
==252081== by 0x155DFE: init (haproxy.c:2120)
==252081== by 0x155DFE: main (haproxy.c:3037)
A memory leak was introduced when ignore-empty option was added to redirect
rules. If there is no location, when this option is set, the redirection is
aborted and the processing continues. But when this happened, the trash buffer
allocated to format the redirect response was not released.
The bug was introduced by commit bc1223be7 ("MINOR: http-rules: add a new
"ignore-empty" option to redirects.").
This patch should fix the issue #1675. It must be backported to 2.5.
If the "close-spread-time" option is set to "infinite", active
connection closing during a soft-stop can be disabled. The 'connection:
close' header or the GOAWAY frame will not be added anymore to the
server's response and active connections will only be closed once the
clients disconnect. Idle connections will not be closed all at once when
the soft-stop starts anymore, and each idle connection will follow its
own timeout based on the multiple timeouts set in the configuration (as
is the case during regular execution).
This feature request was described in GitHub issue #1614.
This patch should be backported to 2.5. It depends on 'MEDIUM: global:
Add a "close-spread-time" option to spread soft-stop on time window'.
While weak symbols were finally fixed with commit fb1b6f5bc ("BUILD:
compiler: use a more portable set of asm(".weak") statements"), it
was an error to think that initcall symbols were also weak. They must
not be and they're only global. The reason is that any externally
linked code loaded as a .so would drop its weak symbols when being
loaded, hence its initcalls that may contain various function
registration calls.
The ambiguity came from the fact that we initially reused the initcall's
HA_GLOBL macro for OSX then generalized it, then turned it to a choice
between .globl and .weak based on the OS, while in fact we needed a
macro to define weak symbols.
Let's rename the macro to HA_WEAK() to make it clear it's only for weak
symbols, and redefine HA_GLOBL() that initcall needs.
This will need to be backported wherever the commit above is backported
(at least 2.5 for now).
HAProxy crashes when "show ssl ca-file" is being called on a ca-file
which contains a lot of certificates. (127 in our test with
/etc/ssl/certs/ca-certificates.crt).
The root cause is the fonction does not yield when there is no available
space when writing the details, and we could write a lot.
To fix the issue, we try to put the data with ci_putchk() after every
show_cert_detail() and we yield if the ci_putchk() fails.
This patch also cleans up a little bit the code:
- the end label is now a real end with a return 1;
- i0 is used instead of (long)p1
- the ID is stored upon yield
Since the httpclient verify now has a fallback which disable the SSL in
the httpclient without exiting haproxy at startup, we can safely
re-enable it by default.
It could still be disabled with "httpclient-ssl-verify none".
The cafile_tree was never free upon deinit, making valgrind and ASAN
complains when haproxy quits.
This could be backported as far as 2.2 but it requires the
ssl_store_delete_cafile_entry() helper from
5daff3c8abc658760a0d0c5fbbc633bfff1afe44.
Emit a warning when the ca-file couldn't be loaded for the httpclient,
and disable the SSL of the httpclient.
We must never be in a case where the verify is disabled without any
configuration, so better disable the SSL completely.
Move the check on the scheme above the initialization of the applet so
we could abort before initializing the appctx.
There were plenty of leftovers from old code that were never removed
and that are not needed at all since these files do not use any
definition depending on fcntl.h, let's drop them.
This gets rid of most open-coded fcntl() calls, some of which were passed
through DISGUISE() to avoid a useless test. The FD_CLOEXEC was most often
set without preserving previous flags, which could become a problem once
new flags are created. Now this will not happen anymore.
Instead of seeing each location manipulate the fcntl() themselves and
often forget to check previous flags, let's centralize the functions to
do this. It also allows to drop fcntl.h from most call places and will
ease the adoption of different OS-specific mechanisms if needed. Note
that the fd_set_nonblock() function purposely doesn't check the previous
flags as it's meant to be used on new FDs only.
Despite what the 'close-spread-time' option should do, the
'connection:close' header was always added to HTTP responses during
soft-stops even with a soft-stop window defined.
This patch adds the proper random based closing to HTTP connections
during a soft-stop (based on the time left in the soft close window).
It should be backported to 2.5 once 'MEDIUM: global: Add a
"close-spread-time" option to spread soft-stop on time window' is
backported as well.
Some older systems may routinely return EWOULDBLOCK for some syscalls
while we tend to check only for EAGAIN nowadays. Modern systems define
EWOULDBLOCK as EAGAIN so that solves it, but on a few older ones (AIX,
VMS etc) both are different, and for portability we'd need to test for
both or we never know if we risk to confuse some status codes with
plain errors.
There were few entries, the most annoying ones are the switch/case
because they require to only add the entry when it differs, but the
other ones are really trivial.
__comp_fetch_init() only presets the maxzlibmem, and only when both
USE_ZLIB and DEFAULT_MAXZLIBMEM are set. The intent is to preset a
default value to protect the system against excessive memory usage
when no setting is set by the user.
Nowadays the entry in the global struct is always there so there's no
point anymore in passing via a constructor to possibly set this value.
Let's go the cleaner way by always presetting DEFAULT_MAXZLIBMEM to 0
in defaults.h unless these conditions are met, and always assigning it
instead of pre-setting the entry to zero. This is more straightforward
and removes some ifdefs and the last constructor. In addition, now the
setting has a chance of being found.
The constructor present there could be replaced with an initcall.
This one is set at level STG_PREPARE because it also zeroes the
lock_stats, and it's a bit odd that it could possibly have been
scheduled to run after other constructors that might already
preset some of these locks by accident.
Transport layers (raw_sock, ssl_sock, xprt_handshake and xprt_quic)
were using 4 constructors and 2 destructors. The 4 constructors were
replaced with INITCALL and the destructors with REGISTER_POST_DEINIT()
so that we do not depend on this anymore.
Pollers are among the few remaining blocks still using constructors
to register themselves. That's not needed anymore since the initcalls
so better turn to initcalls.
On some systems, the hard limit for ulimit -n may be huge, in the order
of 1 billion, and using this to automatically compute maxconn doesn't
work as it requires way too much memory. Users tend to hard-code maxconn
but that's not convenient to manage deployments on heterogenous systems,
nor when porting configs to developers' machines. The ulimit-n parameter
doesn't work either because it forces the limit. What most users seem to
want (and it makes sense) is to respect the system imposed limits up to
a certain value and cap this value. This is exactly what fd-hard-limit
does.
This addresses github issue #1622.
Almost all of our hash-based LB algorithms are implemented as special
cases of something that can now be achieved using sample expressions,
and some of them have adopted some options to adapt their behavior in
ways that could also be achieved using converters.
There are users who want to hash other parameters that are combined
into variables, and who set headers from these values and use
"balance hdr(name)" for this.
Instead of constantly implementing specific options and having users
hack around when they want a real hash, let's implement a native hash
mode that applies to a standard sample expression. This way, any
fetchable element (including variables) may be used to construct the
hash, even modified by any converter if desired.