First adjusted some typos in comments inside the function. Second,
change the naming of some variable to reduce confusion.
A special case has been inserted when advance is done inside a GAP block
and this block is the last of the buffer. In this case, the whole buffer
will be emptied, equivalent to a ncb_init() operation.
quic_rx_pkts_del() function removes packets from QUIC RX buffer. In most
cases, the buffer will be emptied after it. In this case, it's useful to
realign it. This will avoid future data wrapping and use of an
unnecessary junk to fill a too small contiguous space.
It is now possible to start an appctx on a thread subset. Some controls were
added here and there. It is forbidden to start a backend appctx on another
thread than the local one. If a frontend appctx is started on another thread
or a thread subset, the applet .init callback function must be defined. This
callback function is responsible to finalize the appctx startup. It can be
performed synchornously. In this case, the appctx is started on the local
thread. It is not really useful but it is valid. Or it can be performed
asynchronously. In this case, .init callback function is called when the
appctx is woken up for the first time. When this happens, the appctx
affinity is set to the current thread to be able to start the session and
the stream.
In the same way than for the tasks, the applets api was changed to be able
to start a new appctx on a thread subset. For now the feature is
disabled. Only appctx_new_here() is working. But it will be possible to
start an appctx on a specific thread or a subset via a mask.
appctx_free_on_early_error() must be used to release a freshly created
frontend appctx if an error occurred during the init stage. It takes care to
release the stream instead of the appctx if it exists. For a backend appctx,
it just calls appctx_free().
appctx_finalize_startup() may be used to finalize the frontend appctx
startup. It is responsible to create the appctx's session and the frontend
conn-stream. On error, it is the caller responsibility to release the
appctx. However, the session is released if it was created. On success, if
an error is encountered in the caller function, the stream must be released
instead of the appctx.
This function should ease the init stage when new appctx is created.
It is just a helper function that call the .init applet callback function,
if it exists. This will simplify a bit the init stage when a new applet is
started. For now, this callback function is only used when a new service is
started.
Applets were moved at the same level than multiplexers. Thus, gradually,
applets code is changed to be less dependent from the stream. With this
commit, the frontend appctx are ready to own the session. It means a
frontend appctx will be responsible to release the session.
When HAProxy is linked to an OpenSSLv3 library, this option can be used
to load a provider during init. You can specify multiple ssl-provider
options, which will be loaded in the order they appear. This does not
prevent OpenSSL from parsing its own configuration file in which some
other providers might be specified.
A linked list of the providers loaded from the configuration file is
kept so that all those providers can be unloaded during cleanup. The
providers loaded directly by OpenSSL will be freed by OpenSSL.
The "http-restrict-req-hdr-names" option can now be set to restrict allowed
characters in the request header names to the "[a-zA-Z0-9-]" charset.
Idea of this option is to not send header names with non-alphanumeric or
hyphen character. It is especially important for FastCGI application because
all those characters are converted to underscore. For instance,
"X-Forwarded-For" and "X_Forwarded_For" are both converted to
"HTTP_X_FORWARDED_FOR". So, header names can be mixed up by FastCGI
applications. And some HAProxy rules may be bypassed by mangling header
names. In addition, some non-HTTP compliant servers may incorrectly handle
requests when header names contain characters ouside the "[a-zA-Z0-9-]"
charset.
When this option is set, the policy must be specify:
* preserve: It disables the filtering. It is the default mode for HTTP
proxies with no FastCGI application configured.
* delete: It removes request headers with a name containing a character
outside the "[a-zA-Z0-9-]" charset. It is the default mode for
HTTP backends with a configured FastCGI application.
* reject: It rejects the request with a 403-Forbidden response if it
contains a header name with a character outside the
"[a-zA-Z0-9-]" charset.
The option is evaluated per-proxy and after http-request rules evaluation.
This patch may be backported to avoid any secuirty issue with FastCGI
application (so as far as 2.2).
quic_rx_strm_frm type was used to buffered STREAM frames received out of
order. Now the MUX is able to deal directly with these frames and
buffered it inside its ncbuf.
Rx has been simplified since the conversion of buffer to a ncbuf. The
old buffer can now be removed. The frms tree is also removed. It was
used previously to stored out-of-order received STREAM frames. Now the
MUX is able to buffer them directly into the ncbuf.
Add a ncbuf for data reception on qcs. Thanks to this, the MUX is able
to buffered all received frame directly into the buffer. Flow control
parameters will be used to ensure there is never an overflow.
This change will simplify Rx path with the future deletion of acked
frames tree previously used for frames out of order.
Redefine the initial local flow-control to enforce by us. Use bufsize as
the maximum offset allowed to be received.
This change is part of an adjustement on the Rx path. Mux buffer will be
converted to a ncbuf. Flow-control parameters must ensure that we never
receive a frame larger than the buffer. With this, all received frames
will be stored in the MUX buffer.
The two functions became exact copies since there's no more special case
for the appctx owner. Let's merge them into a single one, that simplifies
the code.
This one is the pointer to the conn_stream which is always in the
endpoint that is always present in the appctx, thus it's not needed.
This patch removes it and replaces it with appctx_cs() instead. A
few occurences that were using __cs_strm(appctx->owner) were moved
directly to appctx_strm() which does the equivalent.
The former takes a conn_stream still attached to a valid appctx,
which also complicates the termination of the applet. Instead, let's
pass the appctx which already points to the endpoint, this allows us
to properly detach the conn_stream before the call, which is cleaner
and safer.
The mux ->detach() function currently takes a conn_stream. This causes
an awkward situation where the caller cs_detach_endp() has to partially
mark it as released but not completely so that ->detach() finds its
endpoint and context, and it cannot be done later since it's possible
that ->detach() deletes the endpoint. As such the endpoint link between
the conn_stream and the mux's stream is in a transient situation while
we'd like it to be clean so that the mux's ->detach() code can call any
regular function it wants that knows the regular semantics of the
relation between the CS and the endpoint.
A better approach consists in slightly modifying the detach() API to
better match the reality, which is that the endpoint is detached but
still alive and that it's the only part the function is interested in.
As such, this patch modifies the function to take an endpoint there,
and by analogy (or simplicity) does the same for ->attach(), even
though it looks less important there since we're always attaching an
endpoint to a conn_stream anyway. It is possible that in the future
the API could evolve to use more endpoints that provide a bit more
flexibility in the API, but at this point we don't need to go further.
Muxes and applets need to have both a pointer to the endpoint and to the
conn_stream. It would seem more natural that they only have a pointer to
the endpoint (that is always there) and that this one has an optional
pointer to the conn_stream. This would reduce the number of elements to
manipulate in lower level code. In addition, the conn_stream is not much
used from the lower layers (wake and exceptional events mostly).
Wherever we need to report an error, we have an even easier access to
the endpoint than the conn_stream. Let's first adjust the API to use
the endpoint and rename the function accordingly to cs_ep_set_error().
A new function ncb_advance() is implemented. This is used to advance the
buffer head pointer. This will consume the front data while forming a
new gap at the end for future data.
On success NCB_RET_OK is returned. The operation can be rejected if a
too small new gap is formed in front of the buffer.
Define three different ways to proceed insertion. This configures how
overlapping data is treated.
- NCB_ADD_PRESERVE : in this mode, old data are kept during insertion.
- NCB_ADD_OVERWRT : new data will overwrite old ones.
- NCB_ADD_COMPARE : this mode adds a new test in check stage. The
overlapping old and new data must be identical or else the insertion
is not conducted. An error NCB_RET_DATA_REJ is used in this case.
The mode is specified with a new argument to ncb_add() function.
Implement a new function ncb_add() to insert data in ncbuf. This
operation is conducted in two stages. First, a simulation will be run to
ensure that insertion can be proceeded. If a gap is formed, either
before or after the new data, it must be big enough to store its header,
or else the insertion is aborted.
After this check stage, the insertion is conducted block by block with
the function pair ncb_fill_data_blk()/ncb_fill_gap_blk().
A new type ncb_ret is used as a return value. For the moment, only
success or gap-size error is used. It is planned to add new error types
in the future when insertion will be extended.
Relax the constraint for gap storage when this is the last block.
ncb_blk API functions will consider that if a gap is stored near the end
of the buffer, without the space to store its header, the gap will cover
entirely the buffer end.
For these special cases, the gap size/data size are not write/read
inside the gap to prevent an overflow. Such a gap is designed in
functions as "reduced gap" and will be flagged with the value
NCB_BK_F_FIN.
This should reduce the rejection on future add operation when receiving
data in-order. Without reduced gap handling, an insertion would be
rejected if it covers only partially the last buffer bytes, which can be
a very common case.
Implement two new functions to report the total data stored accross the
whole buffer and the data stored at a specific offset until the next gap
or the buffer end.
To facilitate implementation of these new functions and also future
add/delete operations, a new abstraction is introduced : ncb_blk. This
structure represents a block of either data or gap in the buffer. It
simplifies operation when moving forward in the buffer. The first buffer
block can be retrieved via ncb_blk_first(buf). The block at a specific
offset is accessed via ncb_blk_find(buf, off).
This abstraction is purely used in functions but not stored in the ncbuf
structure per-se. This is necessary to keep the minimal memory
footprint.
Define the new type ncbuf. It can be used as a buffer with
non-contiguous data and wrapping support.
To reduce as much as possible the memory footprint, size of data and
gaps are stored in the gaps themselves. This put some limitation on the
buffer usage. A reserved space is present just before the head to store
the size of the first data block. Also, add and delete operations will
be constrained to ensure minimal gap sizes are preserved.
The sizes stored in the gaps are represented by a custom type named
ncb_sz_t. This type is a typedef to easily change it : this has a
direct impact on the maximum buffer size (MAX(ncb_sz_t) - sizeof(ncb_sz_t))
and the minimal gap sizes (sizeof(ncb_sz_t) * 2)).
Currently, it is set to uint32_t.
Add send_stateless_reset() to send a stateless reset packet. It prepares
a packet to build a 1-RTT packet with quic_stateless_reset_token_cpy()
to copy a stateless reset token derived from the cluster secret with
the destination connection ID received as salt.
Also add QUIC_EV_STATELESS_RST new trace event to at least to have a trace
of the connection which are reset.
This function will have to call another one from quic_tls.[ch] soon.
As we do not want to include quic_tls.h from xprt_quic.h because
quic_tls.h already includes xprt_quic.h, let's moving it into
xprt_quic.c.
This is a wrapper function around OpenSSL HKDF API functions to
use the "extract-then-expand" HKDF mode as defined by rfc5869.
This function will be used to derived stateless reset tokens
from secrets ("cluster-secret" conf. keyword) and CIDs (as salts).
It could be usefull to set a ASCII secret which could be used for different
usages. For instance, it will be used to derive QUIC stateless reset tokens.
A ->time_received new member is added to quic_rx_packet to store the time the
packet are received. ->largest_time_received is added the the packet number
space structure to store this timestamp for the packet with a new largest
packet number to be acknowledged. QUIC_FL_PKTNS_NEW_LARGEST_PN new flag is
added to mark a packet number space as having to acknowledged a packet wih a
new largest packet number. In this case, the packet number space ack delay
must be recalculated.
Add quic_compute_ack_delay_us() function to compute the ack delay from the value
of the time a packet was received. Used only when a packet with a new largest
packet number.
The call to quic_dflt_transport_params_cpy() is already first done by
quic_transport_params_init() which is a good thing. But this function was also
called each time we parsed a transport parameters with quic_transport_param_decode(),
re-initializing to default values some of them. The transport parameters concerned
by this bug are the following:
- max_udp_payload_size
- ack_delay_exponent
- max_ack_delay
- active_connection_id_limit
So, let's remove this call to quic_dflt_transport_params_cpy() which has nothing
to do here!
As we do not have any task to be wake up by the poller after sendto() error,
we add an sendto() error counter to the quic_conn struct.
Dump its values from qc_send_ppkts().
This patch adds a lock on the struct dgram_conn to ensure
that an other thread cannot trash a fd or alter its status
while the current thread processing it on for send/receive/connect
operations.
Starting with the 2.4 version this could cause a crash when a DNS
request is failing, setting the FD of the dgram structure to -1. If the
dgram structure is reused after that, a read access to fdtab[-1] is
attempted. The crash was only triggered when compiled with ASAN.
In previous versions the concurrency issue also exists but is less
likely to crash.
This patch must be backported until v2.4 and should be
adapt for v < 2.4.
The statefile before this patch can only parse lines within 512
characters, now as we made the value to 2000, it can support a
line of length of 2kB.
This patch fixes GitHub issue #1530.
It should be backported to all stable releases.
As was first reported by Ilya in issue #1513, compiling with gcc-12
adds warnings about size 0 around each BUG_ON() call due to the
ABORT_NOW() macro that tries to dereference pointer value 1.
The problem is known, seems to be complex inside gcc and could only
be worked around for now by adjusting a pointer limit so that the
warning still catches NULL derefs in the first page but not other
values commonly used in kernels and boot loaders:
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=91f7d7e1b
It's described in more details here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104657https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103768
And some projects had to work around it using various approaches,
some of which are described in the bugs reports above, plus another
one here:
https://mail.coreboot.org/hyperkitty/list/seabios@seabios.org/thread/HLK3BHP2T3FN6FZ46BIPIK3VD5FOU74Z/
In haproxy we can hide it by hiding the pointer in a DISGUISE() macro,
but this forces the pointer to be loaded into a register, so that
register is lost precisely where we want to get the maximum of them.
In our case we purposely use a low-value non-null pointer because:
- it's mandatory that this value fits within an unmapped page and
only the lowest one has this property
- we really want to avoid register loads for the address, as these
will be lost and will complicate the bug analysis, and they tend
to be used for large addresses (i.e. instruction length limit).
- the compiler may decide to optimize away the null deref when it
sees it (seen in the past already)
As such, the current workaround merged in gcc-12 is not effective for
us.
Another approach consists in using pragmas to silently disable
-Warray-bounds and -Wnull-dereference only for this part. The problem
is that pragmas cannot be placed into macros.
The resulting solution consists in defining a forced-inlined function
only to trigger the crash, and surround the dereference with pragmas,
themselves conditionned to gcc >= 5 since older versions don't
understand them (but they don't complain on the dereference at least).
This way the code remains the same even at -O0, without the stack
pointer being modified nor any address register being modified on
common archs (x86 at least). A variation could have been to rely on
__builtin_trap() but it's not everywhere and it behaves differently
on different platforms (undefined opcode or a nasty abort()) while
the segv remains uniform and effective.
This may need to be backported to older releases once users start to
complain about gcc-12 breakage.
The obsolete stats states STAT_ST_* were marked as deprecated with recent
commit 6ef1648dc ("CLEANUP: stats: rename the stats state values an mark
the old ones deprecated"), except that this feature requires gcc 6 and
above. Let's use the macro that depends on this condition instead.
The issue appeared on 2.6-dev9 so no backport is needed.