Very simple patch to define and declare a pool for the QUIC TLS encryptions levels.
It will be used to dynamically allocate such objects to be attached to the
QUIC connection object (quic_conn struct) and to remove from quic_conn struct the
static array of encryption levels (see ->els[]).
Add a pool to dynamically handle the memory used for the QUIC TLS packet number spaces.
Remove the static array of packet number spaces at QUIC connection level (struct
quic_conn) and add three new members to quic_conn struc as pointers to quic_pktns
struct, one by packet number space as follows:
->ipktns for Initial packet number space,
->hpktns for Handshake packet number space and
->apktns for Application packet number space.
Also add a ->pktns_list new member (struct list) to quic_conn struct to attach
the list of the packet number spaces allocated for the QUIC connection.
Implement ssl_to_quic_pktns() to map and retrieve the addresses of these pointers
from TLS stack encryption levels.
Modify quic_pktns_init() to initialize these members.
Modify ha_quic_set_encryption_secrets() and ha_quic_add_handshake_data() to
allocate the packet numbers and initialize the encryption level.
Implement quic_pktns_release() which takes pointers to pointers to packet number
space objects to release the memory allocated for a packet number space attached
to a QUIC connection and reset their address values.
Modify qc_new_conn() to allocation only the Initial packet number space and
Initial encryption level.
Modify QUIC loss detection API (quic_loss.c) to use the new ->pktns_list
list attached to a QUIC connection in place of a static array of packet number
spaces.
Replace at several locations the use of elements of an array of packet number
spaces by one of the three pointers to packet number spaces
haproxy/quic_tls-t.h is the correct place to quic_enc_level structure
definition.
Should be backported as far as 2.6 to ease any further backport to come.
While HTTP makes no promises of sub-message delivery, haproxy tries to
make reasonable efforts to be friendly to applications relying on this,
particularly though the "option http-no-delay" statement. However, it
was reported that when slz compression is being used, a few bytes can
remain pending for more data to complete them in the SLZ queue when
built on a 64-bit little endian architecture. This is because aligning
blocks on byte boundary is costly (requires to switch to literals and
to send 4 bytes of block size), so incomplete bytes are left pending
there until they reach at least 32 bits. On other architecture, the
queue is only 8 bits long.
Robert Newson from Apache's CouchDB project explained that the heartbeat
used by CouchDB periodically delivers a single LF character, that it used
to work fine before the change enlarging the queue for 64-bit platforms,
but only forwards once every 3 LF after the change. This was definitely
caused by this incomplete byte sequence queuing. Zlib is not affected,
and the code shows that ->flush() is always called. In the case of SLZ,
the called function is rfc195x_flush_or_finish() and when its "finish"
argument is zero, no flush is performed because there was no such flush()
operation.
The previous patch implemented a flush() operation in SLZ, and this one
makes rfc195x_flush_or_finish() call it when finish==0. This way each
delivered data block will now provoke a flush of the queue as is done
with zlib.
This may very slightly degrade the compression ratio, but another change
is needed to condition this on "option http-no-delay" only in order to
be consistent with other parts of the code.
This patch (and the preceeding slz one) should be backported at least to
2.6, but any further change to depend on http-no-delay should not.
In some cases it may be desirable for latency reasons to forcefully
flush the queue even if it results in suboptimal compression. In our
case the queue might contain up to almost 4 bytes, which need an EOB
and a switch to literal mode, followed by 4 bytes to encode an empty
message. This means that each call can add 5 extra bytes in the ouput
stream. And the flush may also result in the header being produced for
the first time, which can amount to 2 or 10 bytes (zlib or gzip). In
the worst case, a total of 19 bytes may be emitted at once upon a flush
with 31 pending bits and a gzip header.
This is libslz upstream commit cf8c4668e4b4216e930b56338847d8d46a6bfda9.
The 32-bits version field of the Retry paquet was inversed by the code. As this
field must be in the network byte order on the wire, this code has supposed that
the sender of the Retry packet will always be little endian. Hopefully this is
often the case on our Intel machines ;)
Must be backported as far as 2.6.
The 4 bits least significant bits of the first byte in a Retry packet must be
random. There are generated calling statistical_prng_range() with 16 as argument.
Must be backported as far as 2.6.
When a stick-table is defined within a peers section, the name is
prefixed with the peers section name. However when checking for
duplicate table names, the check was using the table name without
the prefix, and would thus never match.
Must be backported as far as 2.6.
This patch introduces the "client-sigalgs" keyword for the server line,
which allows to configure the list of server signature algorithms
negociated during the handshake. Also available as
"ssl-default-server-client-sigalgs" in the global section.
This patch introduces the "sigalgs" keyword for the server line, which
allows to configure the list of server signature algorithms negociated
during the handshake. Also available as "ssl-default-server-sigalgs" in
the global section.
This warning happened in 2.9-dev with commit 723c73f8a ("MEDIUM: mux-h1:
Split h1_process_mux() to make code more readable"). It's the usual gcc
crap that relies on comments to disable the warning but which drops these
comments between the preprocessor and the compiler, so using any split
build system (distcc, ccache etc) reintroduces the warning. Use the more
reliable and portable __fallthrough instead. No backport needed.
Return a more acurate error than the previous patch, CO_ER_SSL_EMPTY is
the code for "Connection closed during SSL handshake" which is more
precise than CO_ER_SSL_ABORT ("Connection error during SSL handshake").
No backport needed.
During a SSL_do_handshake(), SSL_ERROR_ZERO_RETURN can be returned in case
the remote peer sent a close_notify alert. Previously this would set the
connection error to CO_ER_SSL_HANDSHAKE, this patch sets it to
CO_ER_SSL_ABORT to have a more acurate error.
This bug was introduced by this commit which was not sufficient:
BUG/MINOR: quic: Possible endless loop in quic_lstnr_dghdlr()
It was revealed by the blackhole interop runner test with neqo as client.
qc_conn_release() could be called after having locke the CID tree when two different
threads was creating the same connection at the same time. Indeed in this case
the last thread which tried to create a new connection for the same an already existing
CID could not manage to insert an already inserted CID in the connection CID tree.
This was expected. It had to destroy the newly created for nothing connection calling
qc_conn_release(). But this function also locks this tree calling free_quic_conn_cids() leading to a deadlock.
A solution would have been to delete the new CID created from its tree before
calling qc_conn_release().
A better solution is to stop inserting the first CID from qc_new_conn(), and to
insert it into the CID tree only if there was not an already created connection.
This is whas is implemented by this patch.
Must be backported as far as 2.7.
Aurelien Darragon found a case of leak when working on ticket #2184.
When a reexec_on_failure() happens *BEFORE* protocol_bind_all(), the
worker is not fork and the mworker_proc struct is still there with
its 2 socketpairs.
The socketpair that is supposed to be in the master is already closed in
mworker_cleanup_proc(), the one for the worker was suppposed to
be cleaned up in mworker_cleanlisteners().
However, since the fd is not bound during this failure, the fd is never
closed.
This patch fixes the problem by setting the fd to -1 in the mworker_proc
after the fork, so we ensure that this it won't be close if everything
was done right, and then we try to close it in mworker_cleanup_proc()
when it's not set to -1.
This could be triggered with the script in ticket #2184 and a `ulimit -H
-n 300`. This will fail before the protocol_bind_all() when trying to
increase the nofile setrlimit.
In recent version of haproxy, there is a BUG_ON() in fd_insert() that
could be triggered by this bug because of the global.maxsock check.
Must be backported as far as 2.6.
The problem could exist in previous version but the code is different
and this won't be triggered easily without other consequences in the
master.
In GH #2187 it was mentioned that the ifnone-forwardfor regtest
did not cover the case where forwardfor ifnone is explicitly set in
the frontend but forwardfor option is not used in the backend.
Expected behavior in this case is that the frontend takes the precedence
because the backend did not specify the option.
Adding this missing case to prevent regressions in the future.
A regression was introduced in 730b983 ("MINOR: proxy: move 'forwardfor'
option to http_ext")
Indeed, when the forwardfor if-none option is specified on the frontend
but forwardfor is not specified at all on the backend: if-none from the
frontend is ignored.
But this behavior conflicts with the historical one, if-none should only
be ignored if forwardfor is also enabled on the backend and if-none is
not set there.
It should fix GH #2187.
This should be backported in 2.8 with 730b983 ("MINOR: proxy: move
'forwardfor' option to http_ext")
h1_append_chunk_size() and h1_prepend_chunk_crlf() functions were marked as
possibly unused to be able to add them in standalone commits. Now these
functions are used, the __maybe_unused statement can be removed.
When the HTX was introduced, we have lost the support for the kernel
splicing for chunked messages. Thanks to this patch set, it is possible
again. Of course, we still need to keep the H1 parser synchronized. Thus
only the chunk content can be spliced. We still need to read the chunk
envelope using a buffer.
There is no reason to backport this feature. But, just in case, this patch
depends on following patches:
* "MEDIUM: filters/htx: Don't rely on HTX extra field if payload is filtered"
* "MINOR: mux-h1: Add function to prepend the chunk crlf to the output buffer"
* "MINOR: mux-h1: Add function to append the chunk size to the output buffer"
* "REORG: mux-h1: Rename functions to emit chunk size/crlf in the output buffer"
* "MEDIUM: mux-h1: Split h1_process_mux() to make code more readable"
If an HTTP data filter is registered on a channel, we must not rely on the
HTX extra field because the payload may be changed and we cannot predict if
this value will change or not. It is too errorprone to let filters deal with
this reponsibility. So we set it to 0 when payload filtering is performed,
but only if the payload length can be determined. It is important because
this field may be used when data are forwarded. In fact, it will be used by
the H1 multiplexer to be able to splice chunk-encoded payload.
h1_process_mux() function was pretty huge a quite hard to debug. So, the
funcion is split in sub-functions. Each sub-function is responsible to a
part of the message (start-line, headers, payload, trailers...). We are
still relying on a HTTP parser to format the message to be sure to detect
errors. Functionnaly speaking, there is no change. But the code is now more
readable.
Depending on the timing, time to time, the log messages can be mixed. A
client can start and be fully handled by HAProxy (including its log message)
before the log message of the previous client was emitted or received. To
fix the issue, a barrier was added to be sure to eval the "expect" rule on
logs before starting the next client.
It appears that dconv dislikes the "see also" part being on the same line as
the regular paragraph. The beginning of the line does not show up in the
rendered version.
Attempt to fix this by inserting an additional newline which is consistent with
other options.
This bug arrived with this commit:
MINOR: quic: Remove pool_zalloc() from qc_new_conn()
Missing initialization of largest packet number received during a keyupdate phase.
This prevented the keyupdate feature from working and made the keyupdate interop
tests to fail for all the clients.
Furthermore, ->flags from quic_tls_ctx was also not initialized. This could
also impact the keyupdate feature at least.
No backport needed.
Replace a "less than" comparison between two tick variable by a call to tick_is_lt()
in quic_loss_pktns(). This bug could lead to a wrong packet loss detection
when the loss time computed values could wrap. This is the case 20 seconds after
haproxy has started.
Must be backported as far as 2.6.
In ticket #2184, HAProxy is crashing in a BUG_ON() after a lot of reload
when the previous processes did not exit.
Each worker has a socketpair which is a FD in the master, when reloading
this FD still exists until the process leaves. But the global.maxconn
value is not incremented for each of these FD. So when there is too much
workers and the number of FD reaches maxsock, the next FD inserted in
the poller will crash the process.
This patch fixes the issue by increasing the maxsock for each remaining
worker.
Must be backported in every maintained version.
This bug was introduced by this commit:
MINOR: quic: Remove pool_zalloc() from qc_new_conn()
The transport parameters was not initialized. This leaded to a crash when
dumping the received ones from TRACE()s.
Also reset the lengths of the CIDs attached to a quic_conn struct to 0 value
to prevent them from being dumped from traces when not already initialized.
No backport needed.
Replace a call to pool_zalloc() by a call to pool_malloc() into quic_dgram_parse
to allocate quic_rx_packet struct objects.
Initialize almost all the members of quic_rx_packet struct.
->saddr is initialized by quic_rx_pkt_retrieve_conn().
->pnl and ->pn are initialized by qc_do_rm_hp().
->dcid and ->scid are initialized by quic_rx_pkt_parse() which calls
quic_packet_read_long_header() for a long packet. For a short packet,
only ->dcid will be initialized.
pool_zalloc() is replaced by pool_alloc() into qc_conn_alloc_ssl_ctx() to allocate
a ssl_sock_ctx struct. ssl_sock_ctx struct member are all initiliazed to null values
excepted ->ssl which is initialized by the next statement: a call to qc_ssl_sess_init().
qc_new_conn() is ued to initialize QUIC connections with quic_conn struct objects.
This function calls quic_conn_release() when it fails to initialize a connection.
quic_conn_release() is also called to release the memory allocated by a QUIC
connection.
Replace pool_zalloc() by pool_alloc() in this function and initialize
all quic_conn struct members which are referenced by quic_conn_release() to
prevent use of non initialized variables in this fonction.
The ebtrees, the lists attached to quic_conn struct must be initialized.
The tasks must be reset to their NULL default values to be safely destroyed
by task_destroy(). This is all the case for all the TLS cipher contexts
of the encryption levels (struct quic_enc_level) and those for the keyupdate.
The packet number spaces (struct quic_pktns) must also be initialized.
->prx_counters pointer must be initialized to prevent quic_conn_prx_cntrs_update()
from dereferencing this pointer.
->latest_rtt member of quic_loss struct must also be initialized. This is done
by quic_loss_init() called by quic_path_init().
This may happen when the initilization of a new QUIC conn fails with qc_new_conn()
when receiving an Initial paquet. This is done after having allocated a CID with
new_quic_cid() called by quic_rx_pkt_retrieve_conn() which stays in the listener
connections tree without a QUIC connection attached to. Then when the listener
receives another Initial packet for the same CID, quic_rx_pkt_retrieve_conn()
returns NULL again (no QUIC connection) but with an thread ID already bound to the
connection, leading the datagram to be requeued in the same datagram handler thread
queue. And so on.
To fix this, the connection is created after having created the connection ID.
If this fails, the connection is deallocated.
During the race condition, when two different threads handle two datagrams for
the same connection, in addition to releasing the newer created connection ID,
the newer QUIC connection must also be released.
Must be backported as far as 2.7.
quic_conn_prx_cntrs_update() may be called from quic_conn_release() with
NULL as value for ->prx_counters member. This is the case when qc_new_conn() fails
when allocating <buf_area>. In this case quic_conn_prx_cntrs_update() BUG_ON().
Must be backported as far as 2.7.
->tx.pto_probe member of quic_pktns struct was not initialized by quic_pktns_init().
This bug never occured because all quic_pktns structs are attached to quic_conn
structs which are always pool_zalloc()'ed.
Must be backported as far as 2.6.
On soft-stop, netns_sig_stop() function is called to purge the shared
namespace tree by iterating over each entries to free them.
However, once an entry is cleaned up and removed from the tree, the entry
itself isn't freed and this results into a minor leak when soft-stopping
because entry was allocated using calloc() in netns_store_insert() when
parsing the configuration.
This could be backported in every stable versions.
When support for 'namespace' keyword was added for the 'default-server'
directive in 22f41a2 ("MINOR: server: Make 'default-server' support
'namespace' keyword."), we forgot to copy the attribute from the parent
to the newly created server.
This resulted in the 'namespace' keyword being parsed without errors when
used from a 'default-server' directive, but in practise the option was
simply ignored.
There's no need to duplicate the netns struct because it is stored in
a shared list, so copying the pointer does the job.
This patch partially fixes GH #2038 and should be backported to all
stable versions.
The local address was dumped as "from" address by dump_quic_full() and
the peer address as "to" address. This patch fixes this issue.
Furthermore, to support the server side (QUIC client) to come, it is preferable
to stop using "from" and "to" labels to dump the local and peer addresses which
is confusing for a QUIC client which uses its local address as "from" address.
To mimic netstat, this is "Local Address" and "Foreign Address" which will
be displayed by "show quic" CLI command and "local_addr" and "foreign_addr"
for "show quic full" command to mention the local addresses and the peer
addresses.
Must be backported as far as 2.7.
This bug arrived with this commit which was supposed to fix another one:
BUG/MINOR: quic: Wrong Application encryption level selection when probing
The aim of this patch was to prevent the Application encryption to be selected
when probing leading to ACK only packets to be sent if the ack delay timer
had fired in the meantime, leading to crashes when no 01-RTT had been sent
because the ack range tree is empty in this case.
This statement is not correct (qc->pktns->flags & QUIC_FL_PKTNS_PROBE_NEEDED)
because qc->pktns is an array of packet number space. But it is equivalent
to (qc->pktns[QUIC_TLS_PKTNS_INITIAL].flags & QUIC_FL_PKTNS_PROBE_NEEDED).
That said, the patch mentionned above is not more useful since this following
which disable the ack time during the handshakes:
BUG/MINOR: quic: Do not use ack delay during the handshakes
This commit revert the first patch mentionned above.
Must be backported as far as 2.6.
It was reported in issue #2181, strange behavior during the new SSL
hanshake failure logs.
Errors were logged with the code 0, which is unknown to OpenSSL.
This patch mades 2 changes:
- It stops using ERR_error_string() when the SSL error code is 0
- It uses ERR_error_string_n() to be thread-safe
Must be backported to 2.8.