The "bind" parsing code was duplicated for the peers section and as a
result it wasn't kept updated, resulting in slightly different error
behavior (e.g. errors were not freed, warnings were emitted as alerts)
Let's first unify it into a new dedicated function that properly reports
and frees the error.
There's been some great confusion between proto_type, ctrl_type and
sock_type. It turns out that ctrl_type was improperly chosen because
it's not the control layer that is of this or that type, but the
transport layer, and it turns out that the transport layer doesn't
(normally) denaturate the underlying control layer, except for QUIC
which turns dgrams to streams. The fact that the SOCK_{DGRAM|STREAM}
set of values was used added to the confusion.
Let's replace it with xprt_type which reuses the later introduced
PROTO_TYPE_* values, and update the comments to explain which one
works at what level.
Just trying "quic4@:4433" with USE_QUIC not set rsults in such a cryptic
error:
[ALERT] (14610) : config : parsing [quic-mini.cfg:44] : 'bind' : unsupported protocol family 2 for address 'quic4@:4433'
Let's at least add the stream and datagram statuses to indicate what was
being looked for:
[ALERT] (15252) : config : parsing [quic-mini.cfg:44] : 'bind' : unsupported stream protocol for datagram family 2 address 'quic4@:4433'
Still not very pretty but gives a little bit more info.
In case the str2listener() parser reports a generic error with no message
when parsing the argument of a "bind" statement in a "peers" section, the
reported error indicates an invalid address on the empty arg. This has
existed since 2.0 with commit 355b2033e ("MINOR: cfgparse: SSL/TLS binding
in "peers" sections."), so this must be backported till 2.0.
As specified by the RFC reception of different STREAM data for the same
offset should be treated with a CONNECTION_CLOSE with error
PROTOCOL_VIOLATION.
Use ncbuf API to detect this case : if add operation fails with
NCB_RET_DATA_REJ with add mode NCB_ADD_COMPARE.
Send a CONNECTION_CLOSE on reception of a STREAM frame for a STREAM id
exceeding the maximum value enforced. Only implemented for bidirectional
streams for the moment.
Send a CONNECTION_CLOSE if the peer emits more data than authorized by
our flow-control. This is implemented for both stream and connection
level.
Fields have been added in qcc/qcs structures to differentiate received
offsets for limit enforcing with consumed offsets for sending of
MAX_DATA/MAX_STREAM_DATA frames.
Define an API to easily set a CONNECTION_CLOSE. This will mainly be
useful for the MUX when an error is detected which require to close the
whole connection.
On the MUX side, a new flag is added when a CONNECTION_CLOSE has been
prepared. This will disable add future send operations.
We rely on <conn_opening> stats counter and tune.quic.retry_threshold
setting to dynamically start sending Retry packets. We continue to send such packets
when "quic-force-retry" setting is set. The difference is when we receive tokens.
We check them regardless of this setting because the Retry could have been
dynamically started. We must also send Retry packets when we receive Initial
packets without token if the dynamic Retry threshold was reached but only for connection
which are not currently opening or in others words for Initial packets without
connection already instantiated. Indeed, we must not send Retry packets for all
Initial packets without token. For instance a client may have already sent an
Initial packet without receiving Retry packet because the Retry feature was not
started, then the Retry starts on exeeding the threshold value due to others
connections, then finally our client decide to send another Initial packet
(to ACK Initial CRYPTO data for instance). It does this without token. So, for
this already existing connection we must not send a Retry packet.
This QUIC specific keyword may be used to set the theshold, in number of
connection openings, beyond which QUIC Retry feature will be automatically
enabled. Its default value is 100.
First commit to handle the QUIC stats counters. There is nothing special to say
except perhaps for ->conn_openings which is a gauge to count the number of
connection openings. It is incremented after having instantiated a quic_conn
struct, then decremented when the handshake was successful (handshake completed
state) or failed or when the connection timed out without reaching the handshake
completed state.
Move the code which finalizes the QUIC connections initialisations after
having called qc_new_conn() into this function to benefit from its
error handling to release the memory allocated for QUIC connections
the initialization of which could not be finalized.
If we add a new stats module to C source files including only
stats.h we get these errors:
include/haproxy/stats.h:39:31: error: array type has incomplete element type
‘struct name_desc’
39 | extern const struct name_desc stat_fields[];
include/haproxy/stats.h:55:50: warning: ‘struct listener’ declared inside
parameter list will not be visible outside of this definition or declaration
55 | int stats_fill_li_stats(struct proxy *px, struct listener *l, int flags,
name_desc struct is defined in tools-t.h and listener struct in listner-t.h.
Here is the format of a token:
- format (1 byte)
- ODCID (from 9 up 21 bytes)
- creation timestamp (4 bytes)
- salt (16 bytes)
A format byte is required to distinguish the Retry token from others sent in
NEW_TOKEN frames.
The Retry token is ciphered after having derived a strong secret from the cluster secret
and generated the AEAD AAD, as well as a 16 bytes long salt. This salt is
added to the token. Obviously it is not ciphered. The format byte is not
ciphered too.
The AAD are built by quic_generate_retry_token_aad() which concatenates the version,
the client SCID and the IP address and port. We had to implement quic_saddr_cpy()
to copy the IP address and port to the AAD buffer. Only the Retry SCID is generated
on our side to build a Retry packet, the others fields come from the first packet
received by the client. It must reuse this Retry SCID in response to our Retry packet.
So, we have not to store it on our side. Everything is offloaded to the client (stateless).
quic_generate_retry_token() must be used to generate a Retry packet. It calls
quic_pkt_encrypt() to cipher the token.
quic_generate_retry_check() must be used to check the validity of a Retry token.
It is able to decipher a token which arrives into an Initial packet in response
to a Retry packet. It calls parse_retry_token() after having deciphered the token
to store the ODCID into a local quic_cid struct variable. Finally this ODCID may
be stored into the transport parameter thanks to qc_lstnr_params_init().
The Retry token lifetime is 10 seconds. This lifetime is also checked by
quic_generate_retry_check(). If quic_generate_retry_check() fails, the received
packet is dropped without anymore packet processing at this time.
This function does exactly the same thing as quic_tls_decrypt(), except that
it does reuse its input buffer as output buffer. This is needed
to decrypt the Retry token without modifying the packet buffer which
contains this token. Indeed, this would prevent us from decryption
the packet itself as the token belong to the AEAD AAD for the packet.
This function must be used to derive strong secrets from a non pseudo-random
secret (cluster-secret setting in our case) and an IV. First it call
quic_hkdf_extract_and_expand() to do that for a temporary strong secret (tmpkey)
then two calls to quic_hkdf_expand() reusing this strong temporary secret
to derive the final strong secret and IV.
In issue #1563, Coverity reported a very interesting issue about a
possible UAF in the config parser if the config file ends in with a
very large line followed by an empty one and the large one causes an
allocation failure.
The issue essentially is that we try to go on with the next line in case
of allocation error, while there's no point doing so. If we failed to
allocate memory to read one config line, the same may happen on the next
one, and blatantly dropping it while trying to parse what follows it. In
the best case, subsequent errors will be incorrect due to this prior error
(e.g. a large ACL definition with many patterns, followed by a reference of
this ACL).
Let's just immediately abort in such a condition where there's no recovery
possible.
This may be backported to all versions once the issue is confirmed to be
addressed.
Thanks to Ilya for the report.
The local and remote TPs were both processed through the same function
quic_transport_params_init(). This caused the remote TPs to be
overwritten with values configured for our local usage.
Change this by reserving quic_transport_params_init() only for our local
TPs. Remote TPs are simply initialized via
quic_dflt_transport_params_cpy().
This bug could result in a connection closed in error by the client due
to a violation of its TPs. For example, curl client closed the
connection after receiving too many CONNECTION_ID due to an invalid
active_connection_id value used.
If an unlisted errno is reported, abort the process. If a crash is
reported on this condition, we must determine if the error code is a
bug, should interrupt emission on the fd or if we can retry the syscall.
If sendto returns an error, we should not retry the call and break from
the sending loop. An exception is made for EINTR which allows to retry
immediately the syscall.
This bug caused an infinite loop reproduced when the process is in the
closing state by SIGUSR1 but there is still QUIC data emission left.
Instead of using the health-check to subscribe to read/write events, we now
rely on the conn-stream. Indeed, on the server side, the conn-stream's
endpoint is a multiplexer. Thus it seems appropriate to handle subscriptions
for read/write events the same way than for the streams. Of course, the I/O
callback function is not the same. We use srv_chk_io_cb() instead of
cs_conn_io_cb().
event_srv_chk_io() function is renamed srv_chk_io_cb() to be consistant with
the I/O callback function of connections. In addition, this function is
exported. It will be required to use the conn-stream's subscriptions.
The connection is already closed by the health-check itself. Thus there is
now reason to duplicate this part in the .wake callback function. It is
enough to wake the health-check and wait.
The buffer wait list is used to deal with buffer allocation failure. But at
the end of health-check, it must be reinitialized. There is no reason to
reason to get a buffer between two health-check runs. And in fact, the
associated flags, CHK_ST_IN_ALLOC and CHK_ST_OUT_ALLOC, are already cleared
at the end of a health-check.
This patch must be backported as far as 2.2. On the 2.2, MT_LIST_ADDED and
MT_LIST_DEL must be used instead of LIST_INLIST and LIST_DEL_INIT.
When the line parsing failed because outline buffer must be reallocated, if
my_realloc2() call fails, the buffer size must be reset. Indeed, in this case
the current line is skipped, a fatal error is reported and we jump to the next
line. At this stage the outline buffer is NULL. If the buffer size is not reset,
the next call to parse_line() crashes because we try to write in the buffer. We
fail to detect the outline buffer is too small to copy any character.
To fix the issue, outlinesize variable must be set to 0 when outline allocation
failed.
This patch should fix the issue #1563. It must be backported as far as 2.2.
Release the QCS RX buffer if emptied afer qcs_consume(). This improves
memory usage and avoids a QCS to keep an allocated buffer, particularly
when no data is received anymore. Buffer is automatically reallocated if
needed via qc_get_ncbuf().
qc_free_ncbuf() may now be used with a NCBUF_NULL buffer as parameter.
This is useful when using this function on a QCS with no allocated
buffer. This case was not reproduced for the moment, but it will soon
become more present as buffers will be released if emptied.
Also a call to offer_buffers() is added to conform with the dynamic
buffer management of haproxy.
This commit is similar to the previous one but deals with MAX_DATA for
connection-level data flow control. It uses the same function
qcc_consume_qcs() to update flow control level and generate a MAX_DATA
frame if needed.
Send MAX_STREAM_DATA frames when at least half of the allocated
flow-control has been demuxed, frame and cleared. This is necessary to
support QUIC STREAM with received data greater than a buffer.
Transcoders must use the new function qcc_consume_qcs() to empty the QCS
buffer. This will allow to monitor current flow-control level and
generate a MAX_STREAM_DATA frame if required. This frame will be emitted
via qc_io_cb().
Adjust the mechanism for MAX_STREAMS_BIDI emission. When a bidirectional
stream is removed, current flow-control level is checked. If needed, a
MAX_STREAMS_BIDI frame is generated and inserted in a new list in the
QCS instance. The new frames will be emitted at the start of qc_send().
This has no impact on the current MAX_STREAMS_BIDI behavior. However,
this mechanism is more flexible and will allow to implement quickly
MAX_STREAM_DATA/MAX_DATA emission.
Slightly change the interface for qcc_recv() between MUX and XPRT. The
MUX is now responsible to call qcc_decode_qcs(). This is cleaner as now
the XPRT does not have to deal with an extra QCS parameter and the MUX
will call qcc_decode_qcs() only if really needed.
This change is possible since there is no extra buffering for
out-of-order STREAM frames and the XPRT does not have to handle buffered
frames.
Previously, qc_io_cb() of mux-quic only dealt with TX. Add support for
RX in it. This is done through a new function qc_recv(qcc). It loops
over all QCS instances and call qcc_decode_qcs(qcs).
This has no impact from the quic-conn layer as qcc_decode_qcs(qcs) is
called directly. However, this allows to have a resume point when demux
is blocked on the upper layer HTX full buffer.
Note that for the moment, only RX for bidirectional streams is managed
in qc_io_cb(). Unidirectional streams use their own mechanism for both
TX/RX. It should be unified in the near future in a refactoring.
Flag QCS if HTX buffer is full on demux. This will block all future
operations on QCS demux and should limit unnecessary decode_qcs() calls.
The flag is cleared on rcv_buf operation called by conn-stream.
Previously, H3 demuxer refused to proceed the payload if the frame was
not entirely received and the QCS buffer is not full. This code was
duplicated from the H2 demuxer.
In H2, this is a justified optimization as only one frame at a time can
be demuxed. However, this is not the case in H3 with interleaved frames
in the lower layer QUIC STREAM frames.
This condition is now removed. H3 demuxer will proceed payload as soon
as possible. An exception is kept for HEADERS frame as the code is not
able to deal with partial HEADERS.
With this change, H3 demuxer should consume less memory. To ensure that
we never received a HEADER bigger than the RX buffer, we should use the
H3 SETTINGS_MAX_FIELD_SECTION_SIZE.
This commit is an adaptation from the following patch :
commit d0de677682
Author: Willy Tarreau <w@1wt.eu>
Date: Fri Feb 4 09:05:37 2022 +0100
BUG/MINOR: mux-h2: update the session's idle delay before creating the stream
This should fix the incorrect timeouts present in httplog format for
QUIC requests.
First adjusted some typos in comments inside the function. Second,
change the naming of some variable to reduce confusion.
A special case has been inserted when advance is done inside a GAP block
and this block is the last of the buffer. In this case, the whole buffer
will be emptied, equivalent to a ncb_init() operation.
ncb_is_empty() was plainly incorrect as it directly dereferences the
memory to read offset blocks instead of ncb_read_off(). The result is
undefined.
Also, BUG_ON() statement is wrong when the buffer starts with a data
block. In this case, ncb_head() is not the first gap offset but instead
just random data. The calculated sum in BUG_ON() statement has thus no
meaning and may cause an abort. Adjust this by reorganizing the whole
function. Only the first data block size is read. If and only if not
nul, the first gap size is then checked.
ncb_is_full() has been rewritten to share the same model as
ncb_is_empty().
quic_rx_pkts_del() function removes packets from QUIC RX buffer. In most
cases, the buffer will be emptied after it. In this case, it's useful to
realign it. This will avoid future data wrapping and use of an
unnecessary junk to fill a too small contiguous space.
The quic-conn manages a buffer to store received QUIC packets. When the
buffer wraps, the gap is filled until the end with junk and packets can
be inserted at the start of the buffer.
On the other end, deletion is implemented via quic_rx_pkts_del().
Packets are removed one by one if their refcount is nul. If junk is
found, the buffer is emptied until its wrap.
This seems to work in most cases but a bug was found in a particular
case : on insertion if buffer gap is not at the end of the buffer. In
this case, the gap was filled, which is useless as now the buffer is
full and the packet cannot be inserted. Worst, on deletion, when junk is
removed there is a risk to removed new packets. This can happens in the
following case :
1. buffer contig space is too small, junk is inserted in the middle of
it
2. on quic_rx_pkts_del() invocation, a packet is removed, but not the
next one because its refcount is still positive. When a new packet is
received, it will be stored after the junk.
3. on next quic_rx_pkts_del(), when junk is removed, all contig data is
cleared, with newer packets data too.
This will cause a transfer between a client and haproxy to be stalled.
This can be reproduced with big enough POST requests. I triggered it
with ngtcp2 and 10M of posted data.
Hopefully, the solution of this bug is simple. If contig space is not
big enough to store a packet, but the space is not at the end of the
buffer, no junk is inserted and the packet is dropped as we cannot
buffered it. This ensures that junk is only present at the end of the
buffer and when removed no packets data is purged with it.
In httpclient_applet_init() function, ss_dst variable is always defined
before the call to sockaddr_alloc(). There is no reason to test it.
This patch should fix the issue #1706.
labels used in goto statement was not called in the right order. Thus if
there is an error during the appctx startup, it is possible to leak a task.
This patch should fix the issue #1703. No backport needed.