This patch allows the low level packet parser to drop packets whose type belongs
to a discarded packet number space. Furthermore, this prevents it from reallocating
encryption levels and packet number spaces which have already been released/discarded.
When a packet number space is discarded, it MUST NOT be reallocated.
As the packet number space discarding is now done as soon as the type of the received
packet is known, some packet number space discarding checks may be safely removed from
qc_try_rm_hp() and qc_qel_may_rm_hp(), which are called after the packet header, and
so its type, has been parsed.
As the packet number spaces and encryption levels are dynamically allocated,
the information about the discarded status of a packet number space must be kept
somewhere else than in these objects.
quic_tls_discard_keys() is no longer useful.
Modify quic_pktns_discard() to do the same job: flag the quic_conn object
as having a discarded packet number space.
Implement quic_tls_pktns_is_disarded() to check if a packet number space is
discarded. Note that the Application data packet number space is never discarded.
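As a minimal standalone sketch of the idea (simplified, made-up types and names,
not HAProxy's actual structures), the discarded state may be recorded as connection
flags so that it survives the release of the packet number space objects themselves:

    #include <stdio.h>

    #define QC_FL_IPKTNS_DCD  0x01u   /* Initial pktns discarded */
    #define QC_FL_HPKTNS_DCD  0x02u   /* Handshake pktns discarded */

    struct pktns { const char *name; };
    struct conn {
        unsigned int flags;
        struct pktns *ipktns, *hpktns, *apktns;
    };

    /* Flag <qc> as having discarded <pktns> (the Application space never is). */
    static void pktns_discard(struct conn *qc, const struct pktns *pktns)
    {
        if (pktns && pktns == qc->ipktns)
            qc->flags |= QC_FL_IPKTNS_DCD;
        else if (pktns && pktns == qc->hpktns)
            qc->flags |= QC_FL_HPKTNS_DCD;
    }

    /* Return non-zero if <pktns> was discarded. A NULL pointer means the
     * space was already released, which only ever happens after a discard.
     */
    static int pktns_is_discarded(const struct conn *qc, const struct pktns *pktns)
    {
        if (!pktns)
            return 1;
        if (pktns == qc->apktns)
            return 0;
        if (pktns == qc->ipktns)
            return !!(qc->flags & QC_FL_IPKTNS_DCD);
        return !!(qc->flags & QC_FL_HPKTNS_DCD);
    }

    int main(void)
    {
        struct pktns ipktns = { "Initial" }, apktns = { "Application" };
        struct conn qc = { 0, &ipktns, NULL, &apktns };

        pktns_discard(&qc, qc.ipktns);
        printf("Initial discarded: %d\n", pktns_is_discarded(&qc, qc.ipktns));
        printf("Application discarded: %d\n", pktns_is_discarded(&qc, qc.apktns));
        return 0;
    }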
There is no need to check that the packet number space associated with the encryption
level used to handle the CRYPTO frames is available when entering qc_handle_crypto_frm().
This has already been done by the caller: qc_treat_rx_pkts().
Release the memory allocated for the Initial and Handshake encryption level
under packet number spaces as soon as possible.
qc_treat_rx_pkts() has been modified to do that for the Initial case. This must
be done after all the Initial packets have been parsed and the underlying TLS
secrets have been flagged as to be discarded. As the Initial encryption level is
removed from the list attached to the quic_conn object inside a list_for_each_entry()
loop, the latter had to be converted into a list_for_each_entry_safe() loop.
The memory allocated by the Handshake encryption level and packet number spaces
is released just before leaving the handshake I/O callback (qc_conn_io_cb()).
As the ->iel and ->hel pointers to the Initial and Handshake encryption levels are
reset to a null value when releasing the encryption levels memory, some checks have
been added at several places before dereferencing them. Same thing for the ->ipktns
and ->hpktns pointers to packet number spaces.
Also take the opportunity of this patch to modify qc_dgrams_retransmit() to
use packet number space variables in place of encryption level variables to shorten
some statements. Furthermore, this reflects the job it has to do: retransmit
UDP datagrams from packet number spaces.
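For illustration, here is a minimal standalone example (plain C list, not HAProxy's
LIST_* API) of why the loop had to become a "safe" variant once the current element
may be unlinked and freed from inside the loop body:

    #include <stdio.h>
    #include <stdlib.h>

    struct qel { int level; struct qel *next; };

    /* Unlink <qel> from the list headed by <head>, then free it. */
    static void release_qel(struct qel **head, struct qel *qel)
    {
        for (struct qel **p = head; *p; p = &(*p)->next) {
            if (*p == qel) {
                *p = qel->next;
                break;
            }
        }
        free(qel);
    }

    int main(void)
    {
        struct qel *head = NULL, *qel, *next;

        /* build a 3-element list: levels 2, 1, 0 */
        for (int i = 0; i < 3; i++) {
            qel = malloc(sizeof(*qel));
            qel->level = i;
            qel->next = head;
            head = qel;
        }

        /* "safe" traversal: <next> is fetched before the body may free <qel> */
        for (qel = head; qel; qel = next) {
            next = qel->next;
            if (qel->level == 0)          /* say, the Initial level is done */
                release_qel(&head, qel);  /* freeing <qel> here is now safe */
        }

        for (qel = head; qel; qel = qel->next)
            printf("remaining level %d\n", qel->level);

        while (head) {                    /* final cleanup */
            next = head->next;
            free(head);
            head = next;
        }
        return 0;
    }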
quic_conn_io_cb() is the I/O handler callback used during the handshakes.
quic_conn_app_io_cb() is used after the handshakes. Both call qc_rm_hp_pkts()
before parsing the RX packets, ordered by their packet numbers, by calling
qc_treat_rx_pkts().
qc_rm_hp_pkts() is there to remove the header protection to reveal the packet
numbers, then reorder the packets by their packet numbers.
qc_rm_hp_pkts() may be safely directly called by qc_treat_rx_pkts(), which is
itself called by the I/O handlers.
Insert a loop around the existing code of qc_treat_rx_pkts() to handle all
the RX packets by encryption level for all the encryption levels allocated
for the connection, i.e. for all the encryption levels with secrets derived
from the ones provided by the TLS stack through ->set_encryption_secrets().
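A rough sketch of the resulting control flow (hypothetical helper names and types,
not the actual HAProxy code): the RX treatment function walks every allocated
encryption level, removes the header protection first, then handles the packets
ordered by their packet numbers.

    #include <stdio.h>

    struct enc_level { const char *name; struct enc_level *next; };

    static void rm_hp_pkts(struct enc_level *qel)
    {
        printf("%s: remove header protection, reorder by packet number\n",
               qel->name);
    }

    static void handle_pkts(struct enc_level *qel)
    {
        printf("%s: parse and acknowledge packets\n", qel->name);
    }

    /* qc_treat_rx_pkts()-like function: one pass per allocated level. */
    static void treat_rx_pkts(struct enc_level *qel_list)
    {
        for (struct enc_level *qel = qel_list; qel; qel = qel->next) {
            rm_hp_pkts(qel);    /* formerly called directly by the I/O handlers */
            handle_pkts(qel);
        }
    }

    int main(void)
    {
        struct enc_level hel = { "Handshake", NULL };
        struct enc_level iel = { "Initial", &hel };

        treat_rx_pkts(&iel);    /* as done from quic_conn_io_cb() and
                                 * quic_conn_app_io_cb() */
        return 0;
    }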
The maximum length of the secrets derived by the TLS stack is 384 bits (48 bytes).
Sizing the secret buffers accordingly reduces the size of the objects provided by
the "quic_tls_secret" pool by 16 bytes.
Should be backported as far as 2.6.
Replace ->els static array of encryption levels by 4 pointers into the QUIC
connection object, quic_conn struct.
->iel denotes the Initial encryption level,
->eel the Early-Data encryption level,
->hel the Handshake encryption level and
->ael the Application Data encryption level.
Add ->qel_list to this structure to list the encryption levels once they have been
allocated. Modify the encryption level object itself (quic_enc_level struct)
accordingly so that it may be added to ->qel_list, the QUIC connection list of
encryption levels.
Implement qc_enc_level_alloc() to initialize the value of a pointer to an encryption
level object. It is used to initialize the pointers newly added to the quic_conn
structure. It also takes a packet number space pointer address as argument and
initializes it if not already initialized.
Modify quic_tls_ctx_reset() so that it is called from quic_conn_enc_level_init(),
which is called by qc_enc_level_alloc() to allocate an encryption level object.
Implement 2 new helper functions:
- ssl_to_qel_addr() to match the address of a pointer to a QUIC encryption level
  attached to a quic_conn object with a TLS encryption level enum value;
- qc_quic_enc_level() to match a pointer to a QUIC encryption level attached
  to a quic_conn object with an internal encryption level enum value.
These functions are useful when called from the ->set_encryption_secrets() and
->add_handshake_data() TLS stack callbacks which take a TLS encryption level enum
as argument (enum ssl_encryption_level_t).
Replace all the uses of the qc->els[] array elements by one of the newly
added ->[ieha]el quic_conn struct members.
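A simplified sketch of the shape of this change (hypothetical, stripped-down
declarations, not the actual HAProxy ones): the four static array slots become four
pointers plus a list head, and a small helper maps the TLS stack enum to the
matching pointer slot.

    #include <stddef.h>

    /* mirrors enum ssl_encryption_level_t from the TLS stack */
    enum ssl_enc_level { SSL_LVL_INITIAL, SSL_LVL_EARLY_DATA,
                         SSL_LVL_HANDSHAKE, SSL_LVL_APPLICATION };

    struct list { struct list *n, *p; };

    struct quic_enc_level {
        struct list list;           /* attached to the connection's ->qel_list */
        /* ... TLS contexts, RX packets, CRYPTO data ... */
    };

    struct quic_conn {
        struct quic_enc_level *iel; /* Initial */
        struct quic_enc_level *eel; /* Early-Data */
        struct quic_enc_level *hel; /* Handshake */
        struct quic_enc_level *ael; /* Application Data */
        struct list qel_list;       /* all the allocated encryption levels */
    };

    /* ssl_to_qel_addr()-like helper: return the address of the pointer slot
     * matching the TLS stack encryption level.
     */
    static struct quic_enc_level **
    ssl_to_qel_addr(struct quic_conn *qc, enum ssl_enc_level level)
    {
        switch (level) {
        case SSL_LVL_INITIAL:     return &qc->iel;
        case SSL_LVL_EARLY_DATA:  return &qc->eel;
        case SSL_LVL_HANDSHAKE:   return &qc->hel;
        default:                  return &qc->ael;
        }
    }

    int main(void)
    {
        struct quic_conn qc = { 0 };

        /* the TLS callbacks can allocate *ssl_to_qel_addr(&qc, level) on
         * first use instead of relying on a preallocated static array */
        return ssl_to_qel_addr(&qc, SSL_LVL_HANDSHAKE) == &qc.hel ? 0 : 1;
    }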
Very simple patch to define and declare a pool for the QUIC TLS encryption levels.
It will be used to dynamically allocate such objects to be attached to the
QUIC connection object (quic_conn struct) and to remove the static array of
encryption levels (see ->els[]) from the quic_conn struct.
Add a pool to dynamically handle the memory used for the QUIC TLS packet number spaces.
Remove the static array of packet number spaces at QUIC connection level (struct
quic_conn) and add three new members to the quic_conn struct as pointers to quic_pktns
struct, one per packet number space as follows:
->ipktns for Initial packet number space,
->hpktns for Handshake packet number space and
->apktns for Application packet number space.
Also add a ->pktns_list new member (struct list) to quic_conn struct to attach
the list of the packet number spaces allocated for the QUIC connection.
Implement ssl_to_quic_pktns() to retrieve the addresses of these pointers
from TLS stack encryption level values.
Modify quic_pktns_init() to initialize these members.
Modify ha_quic_set_encryption_secrets() and ha_quic_add_handshake_data() to
allocate the packet number spaces and initialize the encryption levels.
Implement quic_pktns_release() which takes pointers to pointers to packet number
space objects to release the memory allocated for a packet number space attached
to a QUIC connection and reset their address values.
Modify qc_new_conn() to allocate only the Initial packet number space and
Initial encryption level.
Modify QUIC loss detection API (quic_loss.c) to use the new ->pktns_list
list attached to a QUIC connection in place of a static array of packet number
spaces.
Replace, at several locations, the use of elements of an array of packet number
spaces by one of the three pointers to packet number spaces.
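A hypothetical, simplified sketch of the idea (made-up helper names and minimal
types, not the actual HAProxy code): the TLS callbacks receive the address of the
pktns pointer so they can allocate it on first use, and the release helper resets
that pointer so later checks can detect an already released space. Note that, per
RFC 9000, Early-Data (0-RTT) packets share the Application packet number space.

    #include <stdlib.h>

    struct quic_pktns { /* ... RX/TX packet number state ... */ int dummy; };

    struct quic_conn {
        struct quic_pktns *ipktns;  /* Initial */
        struct quic_pktns *hpktns;  /* Handshake */
        struct quic_pktns *apktns;  /* Application */
    };

    /* mirrors enum ssl_encryption_level_t from the TLS stack */
    enum ssl_enc_level { SSL_LVL_INITIAL, SSL_LVL_EARLY_DATA,
                         SSL_LVL_HANDSHAKE, SSL_LVL_APPLICATION };

    /* ssl_to_quic_pktns()-like mapping to the address of a pktns pointer. */
    static struct quic_pktns **pktns_addr(struct quic_conn *qc,
                                          enum ssl_enc_level level)
    {
        switch (level) {
        case SSL_LVL_INITIAL:   return &qc->ipktns;
        case SSL_LVL_HANDSHAKE: return &qc->hpktns;
        default:                return &qc->apktns;  /* Early-Data + Application */
        }
    }

    /* quic_pktns_release()-like helper: free and reset the caller's pointer. */
    static void pktns_release(struct quic_pktns **pktns)
    {
        free(*pktns);
        *pktns = NULL;
    }

    int main(void)
    {
        struct quic_conn qc = { 0 };
        struct quic_pktns **p = pktns_addr(&qc, SSL_LVL_INITIAL);

        if (!*p)                     /* allocate on first use */
            *p = calloc(1, sizeof(**p));
        pktns_release(&qc.ipktns);   /* qc.ipktns is NULL again afterwards */
        return 0;
    }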
haproxy/quic_tls-t.h is the correct place for the quic_enc_level structure
definition.
Should be backported as far as 2.6 to ease any further backports to come.
While HTTP makes no promises of sub-message delivery, haproxy tries to
make reasonable efforts to be friendly to applications relying on this,
particularly through the "option http-no-delay" statement. However, it
was reported that when slz compression is being used, a few bytes can
remain pending for more data to complete them in the SLZ queue when
built on a 64-bit little endian architecture. This is because aligning
blocks on a byte boundary is costly (it requires switching to literals and
sending 4 bytes of block size), so incomplete bytes are left pending
there until they reach at least 32 bits. On other architectures, the
queue is only 8 bits long.
Robert Newson from Apache's CouchDB project explained that the heartbeat
used by CouchDB periodically delivers a single LF character, that it used
to work fine before the change enlarging the queue for 64-bit platforms,
but that it only gets forwarded once every 3 LF after the change. This was definitely
caused by this incomplete byte sequence queuing. Zlib is not affected,
and the code shows that ->flush() is always called. In the case of SLZ,
the called function is rfc195x_flush_or_finish() and when its "finish"
argument is zero, no flush is performed because there was no such flush()
operation.
The previous patch implemented a flush() operation in SLZ, and this one
makes rfc195x_flush_or_finish() call it when finish==0. This way each
delivered data block will now provoke a flush of the queue as is done
with zlib.
This may very slightly degrade the compression ratio, but another change
is needed to condition this on "option http-no-delay" only in order to
be consistent with other parts of the code.
This patch (and the preceding slz one) should be backported at least to
2.6, but any further change to depend on http-no-delay should not.
In some cases it may be desirable for latency reasons to forcefully
flush the queue even if it results in suboptimal compression. In our
case the queue might contain up to almost 4 bytes, which need an EOB
and a switch to literal mode, followed by 4 bytes to encode an empty
message. This means that each call can add 5 extra bytes in the output
stream. And the flush may also result in the header being produced for
the first time, which can amount to 2 or 10 bytes (zlib or gzip). In
the worst case, a total of 19 bytes may be emitted at once upon a flush
with 31 pending bits and a gzip header.
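Summing the figures above, that worst case breaks down as:

    pending queue (31 bits, rounded up to bytes) :  4 bytes
    EOB, switch to literals, 4-byte block size   :  5 bytes
    gzip header emitted on first output          : 10 bytes
    total                                        : 19 bytes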
This is libslz upstream commit cf8c4668e4b4216e930b56338847d8d46a6bfda9.
The 32-bit version field of the Retry packet was byte-swapped by the code. As this
field must be in network byte order on the wire, this code supposed that
the sender of the Retry packet would always be little endian. Hopefully this is
often the case on our Intel machines ;)
Must be backported as far as 2.6.
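A minimal standalone illustration of the expected behavior (not the actual HAProxy
code): the 32-bit version field has to go through htonl(), or an explicit big-endian
write, rather than being copied as a host-order integer.

    #include <arpa/inet.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        uint32_t version = 0x00000001;   /* QUIC v1 */
        unsigned char buf[4];
        uint32_t be = htonl(version);    /* network byte order */

        memcpy(buf, &be, sizeof(be));    /* wire format: 00 00 00 01 */
        printf("%02x %02x %02x %02x\n", buf[0], buf[1], buf[2], buf[3]);
        return 0;
    }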
The 4 least significant bits of the first byte in a Retry packet must be
random. They are generated by calling statistical_prng_range() with 16 as argument.
Must be backported as far as 2.6.
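A standalone sketch of the idea (using the standard rand() here instead of HAProxy's
statistical_prng_range()): the low 4 bits of the Retry first byte carry no meaning
and are filled with random values, while the high bits keep the fixed long-header
and Retry type encoding of QUIC v1.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void)
    {
        /* long header (0x80), fixed bit (0x40), packet type Retry (0x30) */
        unsigned char first_byte = 0x80 | 0x40 | 0x30;

        srand((unsigned int)time(NULL));
        first_byte |= rand() % 16;       /* 4 random least significant bits */
        printf("Retry first byte: 0x%02x\n", first_byte);
        return 0;
    }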
When a stick-table is defined within a peers section, the name is
prefixed with the peers section name. However when checking for
duplicate table names, the check was using the table name without
the prefix, and would thus never match.
Must be backported as far as 2.6.
This patch introduces the "client-sigalgs" keyword for the server line,
which allows to configure the list of server signature algorithms
negociated during the handshake. Also available as
"ssl-default-server-client-sigalgs" in the global section.
This patch introduces the "sigalgs" keyword for the server line, which
allows to configure the list of server signature algorithms negociated
during the handshake. Also available as "ssl-default-server-sigalgs" in
the global section.
This warning happened in 2.9-dev with commit 723c73f8a ("MEDIUM: mux-h1:
Split h1_process_mux() to make code more readable"). It's the usual gcc
crap that relies on comments to disable the warning but which drops these
comments between the preprocessor and the compiler, so using any split
build system (distcc, ccache etc) reintroduces the warning. Use the more
reliable and portable __fallthrough instead. No backport needed.
Return a more accurate error than the previous patch, CO_ER_SSL_EMPTY is
the code for "Connection closed during SSL handshake" which is more
precise than CO_ER_SSL_ABORT ("Connection error during SSL handshake").
No backport needed.
During a SSL_do_handshake(), SSL_ERROR_ZERO_RETURN can be returned in case
the remote peer sent a close_notify alert. Previously this would set the
connection error to CO_ER_SSL_HANDSHAKE, this patch sets it to
CO_ER_SSL_ABORT to have a more accurate error.
This bug was introduced by this commit which was not sufficient:
BUG/MINOR: quic: Possible endless loop in quic_lstnr_dghdlr()
It was revealed by the blackhole interop runner test with neqo as client.
qc_conn_release() could be called after having locked the CID tree when two different
threads were creating the same connection at the same time. Indeed, in this case
the last thread which tried to create a new connection for an already existing
CID could not manage to insert an already inserted CID into the connection CID tree.
This was expected. It had to destroy the connection it had created for nothing by
calling qc_conn_release(). But this function also locks this tree when calling
free_quic_conn_cids(), leading to a deadlock.
A solution would have been to delete the new CID created from its tree before
calling qc_conn_release().
A better solution is to stop inserting the first CID from qc_new_conn(), and to
insert it into the CID tree only if there was no already created connection.
This is what is implemented by this patch.
Must be backported as far as 2.7.
Aurelien Darragon found a case of leak when working on ticket #2184.
When a reexec_on_failure() happens *BEFORE* protocol_bind_all(), the
worker is not forked and the mworker_proc struct is still there with
its 2 socketpairs.
The socketpair that is supposed to be in the master is already closed in
mworker_cleanup_proc(), the one for the worker was supposed to
be cleaned up in mworker_cleanlisteners().
However, since the fd is not bound during this failure, the fd is never
closed.
This patch fixes the problem by setting the fd to -1 in the mworker_proc
after the fork, so we ensure that it won't be closed if everything
was done right, and then we try to close it in mworker_cleanup_proc()
when it's not set to -1.
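A generic illustration of that pattern (simplified, not the actual mworker code):
the slot of an fd that no longer belongs to the current process is reset to -1, and
the cleanup path only closes slots that are still valid.

    #include <unistd.h>

    struct proc_slot { int ipc_fd[2]; };

    /* Close whatever is still open in this slot, and only that. */
    static void proc_cleanup(struct proc_slot *proc)
    {
        for (int i = 0; i < 2; i++) {
            if (proc->ipc_fd[i] > -1) {
                close(proc->ipc_fd[i]);
                proc->ipc_fd[i] = -1;
            }
        }
    }

    int main(void)
    {
        struct proc_slot proc = { { -1, -1 } };

        if (pipe(proc.ipc_fd) != 0)
            return 1;

        /* after fork(), this end belongs to the other process: drop our copy
         * and mark the slot unused so cleanup cannot close it a second time */
        close(proc.ipc_fd[1]);
        proc.ipc_fd[1] = -1;

        proc_cleanup(&proc);   /* only closes ipc_fd[0], which is still valid */
        return 0;
    }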
This could be triggered with the script in ticket #2184 and a `ulimit -H
-n 300`. This will fail before the protocol_bind_all() when trying to
increase the nofile setrlimit.
In recent version of haproxy, there is a BUG_ON() in fd_insert() that
could be triggered by this bug because of the global.maxsock check.
Must be backported as far as 2.6.
The problem could exist in previous version but the code is different
and this won't be triggered easily without other consequences in the
master.
In GH #2187 it was mentioned that the ifnone-forwardfor regtest
did not cover the case where forwardfor ifnone is explicitly set in
the frontend but forwardfor option is not used in the backend.
Expected behavior in this case is that the frontend takes precedence
because the backend did not specify the option.
Adding this missing case to prevent regressions in the future.
A regression was introduced in 730b983 ("MINOR: proxy: move 'forwardfor'
option to http_ext")
Indeed, when the forwardfor if-none option is specified on the frontend
but forwardfor is not specified at all on the backend, if-none from the
frontend is ignored.
But this behavior conflicts with the historical one, if-none should only
be ignored if forwardfor is also enabled on the backend and if-none is
not set there.
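A hypothetical sketch of that precedence rule (made-up types, not the http_ext
code): the backend only overrides the frontend's "if-none" when it explicitly
enables forwardfor itself.

    #include <stdio.h>

    struct fwd_opts { int enabled; int ifnone; };

    /* Return 1 when an existing x-forwarded-for header must be preserved. */
    static int keep_existing_xff(const struct fwd_opts *fe,
                                 const struct fwd_opts *be)
    {
        if (be->enabled)
            return be->ifnone;              /* backend explicitly set: it wins */
        return fe->enabled && fe->ifnone;   /* otherwise the frontend rules */
    }

    int main(void)
    {
        struct fwd_opts fe = { 1, 1 };   /* option forwardfor if-none */
        struct fwd_opts be = { 0, 0 };   /* no forwardfor on the backend */

        printf("keep existing header: %d\n", keep_existing_xff(&fe, &be));
        return 0;
    }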
It should fix GH #2187.
This should be backported in 2.8 with 730b983 ("MINOR: proxy: move
'forwardfor' option to http_ext")
h1_append_chunk_size() and h1_prepend_chunk_crlf() functions were marked as
possibly unused to be able to add them in standalone commits. Now that these
functions are used, the __maybe_unused statement can be removed.
When the HTX was introduced, we have lost the support for the kernel
splicing for chunked messages. Thanks to this patch set, it is possible
again. Of course, we still need to keep the H1 parser synchronized. Thus
only the chunk content can be spliced. We still need to read the chunk
envelope using a buffer.
There is no reason to backport this feature. But, just in case, this patch
depends on the following patches:
* "MEDIUM: filters/htx: Don't rely on HTX extra field if payload is filtered"
* "MINOR: mux-h1: Add function to prepend the chunk crlf to the output buffer"
* "MINOR: mux-h1: Add function to append the chunk size to the output buffer"
* "REORG: mux-h1: Rename functions to emit chunk size/crlf in the output buffer"
* "MEDIUM: mux-h1: Split h1_process_mux() to make code more readable"
If an HTTP data filter is registered on a channel, we must not rely on the
HTX extra field because the payload may be changed and we cannot predict if
this value will change or not. It is too error-prone to let filters deal with
this responsibility. So we set it to 0 when payload filtering is performed,
but only if the payload length can be determined. It is important because
this field may be used when data are forwarded. In fact, it will be used by
the H1 multiplexer to be able to splice chunk-encoded payload.
The h1_process_mux() function was pretty huge and quite hard to debug. So, the
function is split into sub-functions. Each sub-function is responsible for a
part of the message (start-line, headers, payload, trailers...). We are
still relying on an HTTP parser to format the message to be sure to detect
errors. Functionally speaking, there is no change. But the code is now more
readable.
Depending on the timing, from time to time the log messages can be mixed. A
client can start and be fully handled by HAProxy (including its log message)
before the log message of the previous client was emitted or received. To
fix the issue, a barrier was added to be sure to eval the "expect" rule on
logs before starting the next client.
It appears that dconv dislikes the "see also" part being on the same line as
the regular paragraph. The beginning of the line does not show up in the
rendered version.
Attempt to fix this by inserting an additional newline which is consistent with
other options.
This bug arrived with this commit:
MINOR: quic: Remove pool_zalloc() from qc_new_conn()
Missing initialization of largest packet number received during a keyupdate phase.
This prevented the keyupdate feature from working and made the keyupdate interop
tests fail for all the clients.
Furthermore, ->flags from quic_tls_ctx was also not initialized. This could
also impact the keyupdate feature at least.
No backport needed.
Replace a "less than" comparison between two tick variable by a call to tick_is_lt()
in quic_loss_pktns(). This bug could lead to a wrong packet loss detection
when the loss time computed values could wrap. This is the case 20 seconds after
haproxy has started.
Must be backported as far as 2.6.
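A standalone illustration of the principle (not HAProxy's ticks.h, but the same
idea): comparing wrapping 32-bit timestamps with a plain "<" gives the wrong answer
across the wrap point, while comparing the signed difference stays correct as long
as the two values are less than 2^31 apart.

    #include <stdint.h>
    #include <stdio.h>

    /* Wrap-safe "t1 happens before t2" comparison. */
    static int tick_is_lt(uint32_t t1, uint32_t t2)
    {
        return (int32_t)(t1 - t2) < 0;
    }

    int main(void)
    {
        uint32_t before_wrap = 0xfffffff0u;   /* just before the wrap */
        uint32_t after_wrap  = 0x00000010u;   /* just after the wrap */

        /* raw comparison: 0 (wrong), wrap-safe comparison: 1 (right) */
        printf("raw <     : %d\n", before_wrap < after_wrap);
        printf("tick_is_lt: %d\n", tick_is_lt(before_wrap, after_wrap));
        return 0;
    }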
In ticket #2184, HAProxy is crashing in a BUG_ON() after a lot of reload
when the previous processes did not exit.
Each worker has a socketpair which is an FD in the master; when reloading,
this FD still exists until the process leaves. But the global.maxconn
value is not incremented for each of these FDs. So when there are too many
workers and the number of FDs reaches maxsock, the next FD inserted in
the poller will crash the process.
This patch fixes the issue by increasing the maxsock for each remaining
worker.
Must be backported in every maintained version.
This bug was introduced by this commit:
MINOR: quic: Remove pool_zalloc() from qc_new_conn()
The transport parameters were not initialized. This led to a crash when
dumping the received ones from TRACE()s.
Also reset the lengths of the CIDs attached to a quic_conn struct to 0 value
to prevent them from being dumped from traces when not already initialized.
No backport needed.
Replace a call to pool_zalloc() by a call to pool_malloc() in quic_dgram_parse()
to allocate quic_rx_packet struct objects.
Initialize almost all the members of quic_rx_packet struct.
->saddr is initialized by quic_rx_pkt_retrieve_conn().
->pnl and ->pn are initialized by qc_do_rm_hp().
->dcid and ->scid are initialized by quic_rx_pkt_parse() which calls
quic_packet_read_long_header() for a long packet. For a short packet,
only ->dcid will be initialized.
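A generic illustration of the change (simplified struct and standard malloc()
instead of HAProxy's pool API): when switching from a zeroing allocator to a plain
one, every member that is not set later by the parsing functions must be initialized
explicitly right after the allocation.

    #include <stdlib.h>

    struct rx_pkt {
        unsigned char type;
        size_t len;
        unsigned char dcid[20], scid[20];   /* filled while parsing the header */
        size_t dcid_len, scid_len;
        int refcnt;
    };

    static struct rx_pkt *rx_pkt_new(unsigned char type, size_t len)
    {
        struct rx_pkt *pkt = malloc(sizeof(*pkt));   /* no implicit zeroing */

        if (!pkt)
            return NULL;
        /* explicit initialization of everything the parser does not set */
        pkt->type = type;
        pkt->len = len;
        pkt->dcid_len = pkt->scid_len = 0;
        pkt->refcnt = 0;
        return pkt;
    }

    int main(void)
    {
        struct rx_pkt *pkt = rx_pkt_new(0, 1200);

        free(pkt);
        return 0;
    }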