We're adding a new argument "relaxed" to h2_make_htx_request() so that
we can control its level of acceptance of certain invalid requests at
the proxy level with "option accept-invalid-http-request". The goal is
to add checks that can be disabled but that remain desirable by
default. For now no check is subject to it.
As its name implies, this function checks if a path component has any
forbidden characters starting at the designated location. The goal is
to refine the result of a successful ist_find_range() with a more
precise character check. Here we're focusing on 0x00-0x1F, 0x20 and
0x23 to make sure we're not too strict at this point.
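A minimal byte-wise sketch of such a check, assuming a name like
http_path_has_forbidden_char() and haproxy's ist API (the real version
may be written differently):

  /* Sketch: non-zero if a forbidden char (0x00-0x1F, 0x20 or 0x23) is
   * found in path <ist> starting at <start>, which must point inside
   * the string.
   */
  static inline int http_path_has_forbidden_char(const struct ist ist,
                                                 const char *start)
  {
          while (start < istend(ist)) {
                  if ((unsigned char)*start <= 0x20 || *start == '#')
                          return 1;
                  start++;
          }
          return 0;
  }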
This looks up the character range <min>..<max> in the input string and
returns a pointer to the first one found. It's essentially the equivalent
of ist_find_ctl() in that it searches by 32 or 64 bits at once, but deals
with a range.
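A byte-at-a-time sketch of the semantics (the real implementation scans
by 32- or 64-bit words like ist_find_ctl(); the not-found convention
used here is an assumption):

  /* Sketch: return a pointer to the first char in <min>..<max>, or
   * the end of the string if none is found.
   */
  static inline const char *ist_find_range(const struct ist ist,
                                           unsigned char min,
                                           unsigned char max)
  {
          const char *p = istptr(ist);
          const char *end = istend(ist);

          while (p < end) {
                  if ((unsigned char)*p >= min && (unsigned char)*p <= max)
                          return p;
                  p++;
          }
          return end;
  }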
This function is not H2 specific but rather generic to HTTP. We'll
need it in H3 soon, so let's move it to HTTP and rename it to
http_header_has_forbidden_char().
When the connection enters the "connection closing" state after having
sent a datagram with CONNECTION_CLOSE frames inside its packets, a lot
of memory may be freed from quic_conn objects (QUIC connection). This
is done by allocating a reduced-size object which keeps enough
information to handle the remaining incoming packets for the connection
in "connection closing" state, and to resend the previous datagram with
CONNECTION_CLOSE frames inside, which has already been sent.
Define a new quic_cc_conn struct which represents the connection object
after entering the "connection close" state and after having released
the quic_conn connection object.
Define a new pool, <pool_head_quic_cc_conn>, for these quic_cc_conn
struct objects.
Define the QUIC_CONN_COMMON structure which is shared between the
quic_conn struct object (the connection before entering the "connection
close" state) and the new quic_cc_conn struct object (the connection
after entering the "connection close" state). This way, all the members
inside QUIC_CONN_COMMON may be indifferently dereferenced from a
quic_conn struct or a quic_cc_conn struct pointer.
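A sketch of this shared-prefix pattern (the member list is purely
illustrative):

  /* Both structs begin with the exact same members, so the common
   * part may be dereferenced from either pointer type.
   */
  #define QUIC_CONN_COMMON                           \
          struct {                                   \
                  struct quic_err err;               \
                  unsigned int flags;                \
                  /* ... other shared members ... */ \
          }

  struct quic_conn {
          QUIC_CONN_COMMON;
          /* full-sized connection members follow */
  };

  struct quic_cc_conn {
          QUIC_CONN_COMMON;
          /* only what "connection closing" handling needs */
  };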
Implement the qc_new_cc_conn() function to allocate such connections in
"connection close" state. This function is responsible for copying the
required information from the original connection (quic_conn) to the
remaining connection (quic_cc_conn). Among other initializations, it
redefines the QUIC packet handler task to quic_cc_conn_io_cb() and the
idle timer task to qc_cc_idle_timer_task(). quic_cc_conn_io_cb() drains
the received packets and resends the datagram with the CONNECTION_CLOSE
frame which was already sent when entering the "connection close"
state. qc_cc_idle_timer_task() only releases the remaining quic_cc_conn
struct object.
Modify quic_conn_release() to allocate quic_cc_conn struct objects from
the original connection passed as argument. It does nothing if this
original connection is not in the closing state, or if the idle timer
has already expired.
Implement quic_release_cc_conn() to release a "connection close"
connection. It is called when its timer expires or when an error occurs
while sending a packet from this connection because the peer is no
longer reachable.
Add "quic_cids" new pool to allocate the ->cids trees of quic_conn objects.
Replace ->cids member of quic_conn objects by pointer to "quic_cids" and
adapt the code consequently. Nothing special.
Add a new pool, <pool_head_quic_cc_buf>, for the buffers used when
building datagrams with CONNECTION_CLOSE frames inside, with
QUIC_MIN_CC_PKTSIZE(128) as minimum size.
Add ->cc_buf_area to the quic_conn struct to store such buffers.
Add ->cc_dgram_len to store the size of the "connection close"
datagrams and ->cc_buf, a buffer struct to be used with ->cc_buf_area
as its ->area member value.
Implement qc_get_txb() to be called in place of qc_txb_alloc(): it
returns the special "connection close" buffer when the connection needs
an immediate close, or allocates a regular buffer struct if not.
Modify qc_prep_hpkts() and qc_prep_app_pkts() to allow them to use such
a "connection close" buffer when an immediate close is required.
Move the rx.bytes, tx.bytes and tx.prep_bytes quic_conn struct members
to a "bytes" anonymous struct (bytes.rx, bytes.tx and bytes.prep
members respectively). They are moved in preparation for being part of
a "bytes" anonymous struct common to a future struct to be defined.
Adapt the code accordingly.
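In short (sketch):

  struct quic_conn {
          /* ... */
          struct {
                  uint64_t rx;   /* bytes received */
                  uint64_t tx;   /* bytes sent */
                  uint64_t prep; /* bytes prepared to be sent */
          } bytes;
          /* ... */
  };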
Add a BUG_ON() to quic_peer_validated_addr() to check that the
amplification limit is respected when it returns false (0), i.e. when
the connection is not validated.
Implement quic_may_send_bytes() which returns the number of bytes which
may be sent when the connection has not yet been validated, and call
this function at the several places where this is the case (after
having called quic_peer_validated_addr()).
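A sketch of the computation, assuming the usual 3x anti-amplification
factor from RFC 9000 and the member names after the move described
above:

  /* Number of bytes which may still be prepared for sending while the
   * peer address is not yet validated.
   */
  static inline uint64_t quic_may_send_bytes(struct quic_conn *qc)
  {
          return 3 * qc->bytes.rx - qc->bytes.prep;
  }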
Furthermore, this patch improves the code maintainability. Some patches to
come will have to rename ->[rt]x.bytes quic_conn struct members.
An HTTP server may provide a complete response even prior to receiving
the full request. In this case, RFC 9114 allows the server to abort
reading with a STOP_SENDING frame with error code H3_NO_ERROR.
This scenario was notably reproduced with haproxy and an inactive
server. If the client sends a POST request, haproxy may provide a full
HTTP 503 response before the end of the full request.
Fields of the sedesc structure were documented in the comment about the
structure itself. It was not really convenient: hard to read, hard to
update. So comments about the fields are moved onto the corresponding
field line, as usual.
There is a mechanism in the H1 and H2 multiplexers to skip the payload
when a response returned to the client must not contain any payload
(response to a HEAD request, or a 204/304 response). However, this does
not work when splicing is used. The H2 multiplexer does not support
splicing, so there is no issue there. But with the mux-h1, when data
are sent using kernel splicing, the mux on the server side is not aware
that the client side should skip the payload. And once the data are put
in a pipe, there is no way to stop the sending.
It is a defect of the current design. It will be easier to deal with
this case when mux-to-mux forwarding is implemented. But for now, to
fix the issue, we should add an HTX flag on the start-line to pass the
info from the client side to the server side and be able to disable
splicing when necessary.
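A sketch of the idea (the flag name and value are assumptions, not
necessarily the ones from this patch):

  /* hypothetical flag carried by the HTX start-line */
  #define HTX_SL_F_BODYLESS_RESP  0x00100000

  /* Client side: set the flag on the request start-line when the
   * response payload must be skipped (HEAD request, 204/304 expected).
   * Server side: consult it before enabling kernel splicing.
   */
  static inline int h1_resp_may_splice(const struct htx_sl *sl)
  {
          return !(sl->flags & HTX_SL_F_BODYLESS_RESP);
  }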
The associated reg-test was improved to be sure it does not fail when
splicing is configured.
This patch should be backported as far as 2.4.
Move the TX part of the code to quic_tx.c.
Add quic_tx-t.h and quic_tx.h headers for this TX part of the code.
The definition of the quic_tx_packet struct has been moved from
quic_conn-t.h to quic_tx-t.h.
Same thing as for the TX part:
Move the RX part of the code to quic_rx.c.
Add quic_rx-t.h and quic_rx.h headers for this RX part of the code.
The definition of the quic_rx_packet struct has been moved from
quic_conn-t.h to quic_rx-t.h.
Move the code which directly calls the functions of the OpenSSL QUIC
API into a new C file, quic_ssl.c.
Some code has been extracted from qc_conn_finalize() to implement only
the QUIC TLS part (see quic_tls_finalize()) into quic_tls.c.
qc_conn_finalize() has also been exported to be used from this new
quic_ssl.c C module.
To accelerate the compilation of the quic_conn.c file, move the code
related to the traces from quic_conn.c to quic_trace.c.
Also add the corresponding headers (quic_trace-t.h and quic_trace.h).
This setting, which may be used in a "global" section, enables the QUIC
listener bindings when haproxy is compiled with the OpenSSL wrapper. It
has no effect when haproxy is compiled against a TLS stack with QUIC
support, typically quictls.
This wrapper needs access to an encoded version of the local transport
parameters (to be sent to the peer). They are provided to the TLS stack
thanks to the qc_ssl_compat_add_tps_cb() callback.
These encoded transport parameters were attached to the QUIC connection but
removed by this commit to save memory:
MINOR: quic: Stop storing the TX encoded transport parameters
This patch restores these transport parameters and attaches them again
to the QUIC connection (quic_conn struct), but only when the QUIC OpenSSL wrapper
is compiled.
Implement qc_set_quic_transport_params() to encode the transport
parameters for a connection and to set them into the stack, and make
this function work with both the OpenSSL wrapper and any other TLS
stack with QUIC support. It uses the encoded version of the transport
parameters attached to the connection when compiled for the OpenSSL
wrapper, or local parameters when compiled against a TLS stack with
QUIC support. These parameters are passed to
quic_transport_params_encode() and SSL_set_quic_transport_params() as
before this patch.
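A simplified sketch of the logic (the encoder signature and field names
are approximated from context; buffer sizes are illustrative):

  static int qc_set_quic_transport_params(struct quic_conn *qc, SSL *ssl,
                                          const struct quic_version *ver)
  {
          size_t in_len;
  #ifdef USE_QUIC_OPENSSLCOMPAT
          /* keep the encoded copy attached to the connection so that
           * qc_ssl_compat_add_tps_cb() may access it later
           */
          unsigned char *in = qc->enc_params;
          size_t in_sz = sizeof(qc->enc_params);
  #else
          unsigned char buf[128];
          unsigned char *in = buf;
          size_t in_sz = sizeof(buf);
  #endif

          in_len = quic_transport_params_encode(in, in + in_sz,
                                                &qc->rx.params, ver, 1);
          if (!in_len)
                  return 0;
  #ifdef USE_QUIC_OPENSSLCOMPAT
          qc->enc_params_len = in_len;
  #endif
          return SSL_set_quic_transport_params(ssl, in, in_len);
  }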
Include haproxy/quic_openssl_compat.h from haproxy/openssl-compat.h when the
compilation of the QUIC openssl wrapper for TLS stacks is enabled with
USE_QUIC_OPENSSLCOMPAT.
Highly inspired by the nginx openssl wrapper code.
This wrapper implements the following functions:
SSL_set_quic_method(),
SSL_quic_read_level(),
SSL_quic_write_level(),
SSL_set_quic_transport_params(),
SSL_provide_quic_data(),
SSL_process_quic_post_handshake()
and the SSL_QUIC_METHOD QUIC-specific bio method, which are also
implemented by quictls to support QUIC from OpenSSL. So, its aim is to
support QUIC on top of a standard OpenSSL stack without QUIC support.
It relies on the OpenSSL keylog feature to retrieve the secrets derived
by the OpenSSL stack during a handshake and to pass them to the
->set_encryption_secrets() callback as this is done by quictls. It
makes use of a callback (quic_tls_compat_msg_callback()) to handle some
TLS messages only on the receive path. Some of them must be passed to
the ->add_handshake_data() callback, as this is done with quictls, to
be sent to the peer as CRYPTO data. The quic_tls_compat_msg_callback()
callback also sends any received TLS alert via the ->send_alert()
callback.
AES 128-bit with CCM mode is not supported at this time. It is often
disabled by the OpenSSL stack, but as it can be enabled by
"ssl-default-bind-ciphersuites", the wrapper will send a TLS alert
(handshake failure) if this algorithm is negotiated between the client
and the server.
0-RTT is also not supported by this wrapper.
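A sketch of the emulated interface, mirroring the quictls
SSL_QUIC_METHOD structure (callback names follow haproxy's existing
ha_quic_* convention):

  static SSL_QUIC_METHOD ha_quic_method = {
          .set_encryption_secrets = ha_quic_set_encryption_secrets,
          .add_handshake_data     = ha_quic_add_handshake_data,
          .flush_flight           = ha_quic_flush_flight,
          .send_alert             = ha_quic_send_alert,
  };

  /* installed on the connection's SSL object, as with quictls:
   * SSL_set_quic_method(ssl, &ha_quic_method);
   */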
The aim of this patch is to allow the building of QUIC datagrams with
as many packets with different encryption levels as possible during the
handshake. At this time, this is possible only for at most two
encryption levels. That said, most of the time, a server only needs to
use two encryption levels per datagram, except during retransmissions.
Modify qc_prep_pkts(), the function responsible for building datagrams,
to take a list of encryption levels as parameter in place of two
encryption levels. This function is also used when retransmitting
datagrams; in this case it is a customized/flexible list of encryption
levels which is passed to it.
Add a new member, ->retrans, to the quic_enc_level struct, to be used
as the attachment point to the list of encryption levels used only
during retransmission, and a new member, ->retrans_frms, which is a
pointer to a list of frames to be retransmitted.
This context may be released at the same time as the Initial TLS
context. This is done by calling quic_tls_ctx_secs_free() and
pool_free() in two code locations. Implement quic_nictx_free() to do
that.
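A sketch grouping the two calls (the pool name comes from the companion
patch below):

  static void quic_nictx_free(struct quic_conn *qc)
  {
          quic_tls_ctx_secs_free(qc->nictx);
          pool_free(pool_head_quic_tls_ctx, qc->nictx);
          qc->nictx = NULL;
  }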
Rename the ->negotiated_ictx quic_conn struct member to the shorter
->nictx. This variable is used during version negotiation. Indeed, a
connection may have to support several QUIC versions of packets during
the handshake. ->nictx is the QUIC TLS cipher context used for the
negotiated QUIC version.
This patch allows a connection to dynamically allocate this TLS cipher
context.
Add a new pool (pool_head_quic_tls_ctx) for such QUIC TLS cipher
context objects.
Modify qc_new_conn() to initialize ->nictx to NULL.
quic_tls_ctx_secs_free() frees all the secrets attached to a QUIC TLS
cipher context. Modify it to do nothing when called with a NULL TLS
cipher context.
Modify qc_conn_finalize() to allocate ->nictx just before initializing
its secrets. qc_conn_finalize() allocates ->nictx only if needed (i.e.
if a new QUIC version was negotiated).
Modify qc_conn_release(), which releases a QUIC connection (quic_conn
struct), to also release the ->nictx TLS cipher context.
There is no need to keep an encoded version of the QUIC listener
transport parameters attached to the connection.
Remove the ->enc_params and ->enc_params_len members of the quic_conn
struct.
Use variables local to ha_quic_set_encryption_secrets() to build the
encoded transport parameters before they are passed to
SSL_set_quic_transport_params().
Modify the qc_ssl_sess_init() prototype. It was expected to be used
with the encoded transport parameters as a parameter, but they were not
used. Clean up this function.
parse_cpu_set() stopped returning the undocumented -1, which was a
leftover from an earlier attempt, when it was changed from ulong to int
since it only returns a success/failure and no longer a mask. Thus it
must not return -1 and its callers must only test for != 0, as
documented.
This field used to store the cpumap of the first thread in a group, and
was used till 2.4 to hold some default settings, after which it was no
longer used. Let's just drop it.
We're currently having a problem with the porting of cpu_map from
processes to thread-groups as it happened in 2.7 with commit 5b09341c0
("MEDIUM: cpu-map: replace the process number with the thread group
number"), though it seems that it has deeper roots even in 2.0 and
that it was progressively made wrong over time.
The issue stems from the way the per-process and per-thread cpu-sets
were employed over time. Originally only processes were supported. Then
threads were added after an optional "/" and it was documented that
"cpu-map 1" is exactly equivalent to "cpu-map 1/all" (this was
clarified in 2.5 by commit 317804d28 ("DOC: update references to
process numbers in cpu-map and bind-process")).
The reality is different: when processes were still supported, setting
"cpu-map 1" would apply the mask to the process itself (and only when
run in the background, which is not documented either and is also a
bug for another fix), and would be combined with any possible per-thread
mask when calculating the threads' affinity, possibly resulting in empty
sets. However, "cpu-map 1/all" would only set the mask for the threads
and not the process. As such the following:
cpu-map 1 odd
cpu-map 1/1-8 even
would leave no CPU while doing:
cpu-map 1/all odd
cpu-map 1/1-8 even
would allow all CPUs.
While such configs are very unlikely to ever be met (which is why this
bug is tagged minor), this is becoming much more visible while testing
automatic CPU binding during 2.9 development because, due to this bug,
it's much more common to end up with incorrect bindings.
This patch fixes it by simply removing the .proc entry from cpu_map and
always setting all threads' maps. The process is no longer arbitrarily
bound to the group 1's mask, but in case threads are disabled, we'll
use thread 1's mask since it contains the configured CPUs.
This fix should be backported at least to 2.6, but no need to insist if
it resists as it's easier to break cpu-map than to fix an unlikely issue.
Since we'll soon want to adjust the "thread-groups" degree of freedom
based on the presence of cpu-map, we first need to be able to detect
if cpu-map was used. This function scans all cpu-map sets to detect if
any is present, and returns true accordingly.
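A sketch of what such a scan may look like (the function name is an
assumption):

  /* Returns non-zero if any cpu-map directive was found in the
   * configuration.
   */
  int cpu_map_configured(void)
  {
          int grp, thr;

          for (grp = 0; grp < MAX_TGROUPS; grp++) {
                  for (thr = 0; thr < MAX_THREADS_PER_GROUP; thr++)
                          if (ha_cpuset_count(&cpu_map[grp].thread[thr]))
                                  return 1;
          }
          return 0;
  }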
lua_yieldk's ctx argument is of type lua_KContext, which is typedefed
to intptr_t when available so it can be used to store pointers.
But the wrapper function hlua_yieldk() passes it as a regular int, so
it breaks that promise.
Change the hlua_yieldk() prototype so that the ctx argument is of type
lua_KContext.
This bug had no functional impact because the ctx argument is not being
actively used so far. This may be backported to all stable versions
anyway.
The internal tick clock was used to export the timestamp in the token
on retry packets. Doing this in cluster mode, the nodes don't
understand the timestamp from tokens generated by others.
This patch reworks this using the real current date (wall-clock time).
Timestamps are also now considered in seconds instead of milliseconds.
This patch should be backported until v2.6.
sink_write() currently relies on sink->maxlen to know when to stop
writing a given payload.
But it could be useful to pass a smaller, explicit value to sink_write()
to stop before the ring maxlen, for instance if the ring is shared between
multiple feeders.
sink_write() now takes an optional maxlen parameter: if maxlen is > 0,
then sink_write() will stop writing at maxlen if maxlen is smaller than
ring->maxlen; else only ring->maxlen will be considered.
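A sketch of the effective limit (signatures approximated, trailing
arguments omitted):

  ssize_t sink_write(struct sink *sink, size_t maxlen,
                     const struct ist msg[], size_t nmsg /* , ... */)
  {
          size_t max = sink->maxlen;

          /* an explicit <maxlen> may only tighten the ring's own
           * limit, never extend it
           */
          if (maxlen && maxlen < max)
                  max = maxlen;

          /* write at most <max> bytes of the payload to the sink */
          return __sink_write(sink, max, msg, nmsg /* , ... */);
  }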
[for haproxy <= 2.7, the patch must be applied by hand, that is:
__sink_write() and sink_write() should be patched to take maxlen into
account, and function calls to sink_write() should use 0 as the second
argument to keep the original behavior]
Thierry Fournier reported an annoying side-effect when using the debug()
converter.
Consider the following examples:
[1] http-request set-var(txn.test) bool(true),ipmask(24)
[2] http-request redirect location /match if { bool(true),ipmask(32) }
When starting haproxy with [1] example we get:
config : parsing [test.conf:XX] : error detected in frontend 'fe' while parsing 'http-request set-var(txn.test)' rule : converter 'ipmask' cannot be applied.
With [2], we get:
config : parsing [test.conf:XX] : error detected in frontend 'fe' while parsing 'http-request redirect' rule : error in condition: converter 'ipmask' cannot be applied in ACL expression 'bool(true),ipmask(32)'.
Now consider the following examples which are based on [1] and [2]
but with the debug() sample conv inserted in-between those incompatible
sample types:
[1*] http-request set-var(txn.test) bool(true),debug,ipmask(24)
[2*] http-request redirect location /match if { bool(true),debug,ipmask(32) }
According to the documentation, "it is safe to insert the debug converter
anywhere in a chain, even with non-printable sample types".
Thus we don't expect any side-effect from using it within a chain.
However in the current implementation, because debug() returns the
SMP_T_ANY type, which is a pseudo type (only resolved at runtime), the
sample compatibility checks performed at parsing time are completely
ineffective (haproxy will start and no warning will be emitted).
The undesirable effect of this is that debug() prevents haproxy from
warning you about impossible type conversions, hiding potential errors
in the configuration that could result in unexpected evaluation or
logic while serving live traffic. We'd better let haproxy warn you
about this kind of error when it has the chance.
With our previous examples, this could cause some inconveniences. Let's
say for example that you are testing a configuration prior to deploying
it. When testing the config you might want to use debug() converter from
time to time to check how the conversion chain behaves.
Now after deploying the exact same conf, you might want to remove those
testing debug() statements since they are no longer relevant, but
removing them could "break" the config and suddenly prevent haproxy
from starting upon reload/restart. (This is the expected behavior, but
it comes a bit too late because debug() hid the errors during testing.)
To fix this, we introduce a new output type for sample expressions:
SMP_T_SAME - may only be used as "expected" out_type (parsing time)
for sample converters.
As its name implies, it is a way for developers to indicate that the
resulting converter's output type is guaranteed to match the type of
the sample that is presented on the converter's input side.
(The converter may alter the data content, but the data type must not
be changed.)
What it does is tell haproxy that if switching to the converter (by
looking at the converter's input only, since the out_type is SAME) is
conversion-free, then the converter's type can safely be ignored for
type compatibility checks within the chain.
debug()'s out_type is thus set to SMP_T_SAME instead of ANY, which allows
it to fully comply with the doc in the sense that it does not impact the
conversion chain when inserted between sample items.
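For instance, a converter keyword entry may now declare it this way
(entry layout approximated from sample.c conventions; the last two
fields are the expected in_type and out_type):

  static struct sample_conv_kw_list sample_conv_kws = {ILH, {
          { "debug", sample_conv_debug, ARG2(0,STR,STR), smp_check_debug,
            SMP_T_ANY, SMP_T_SAME },
          { /* END */ },
  }};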
Co-authored-by: Thierry Fournier <thierry.f.78@gmail.com>
SMP_T_ADDR support was added in b805f71 ("MEDIUM: sample: let the cast
functions set their output type").
According to the above commit, it is made clear that the ADDR type is
a pseudo/generic type that may be used for compatibility checks but
that cannot be emitted from a fetch or converter.
With that in mind, all conversions from ADDR to other types were
explicitly disabled in the compatibility matrix.
But later, when the map_*_ip functions were updated in b2f8f08 ("MINOR:
map: The map can return IPv4 and IPv6"), we started using ADDR as the
"expected" output type for converters. This still complies with the
original description from b805f71, because it is used as the
theoretical output type, and is never emitted from the converters
themselves (only "real" types such as IPV4 or IPV6 are actually emitted
at runtime).
But this introduced an ambiguity as well as a few bugs, because some
compatibility checks are performed at config parse time, and thus rely
on the expected output type to check if the conversion from the current
element to the next element in the chain is theoretically supported.
However, because the compatibility matrix doesn't support ADDR to other
types, it is currently impossible to use map_*_ip converters in the
middle of a chain (the only supported usage is when map_*_ip converters
are at the very end of the chain).
To illustrate this, consider the following examples:
acl test str(ok),map_str_ip(test.map) -m found # this will work
acl test str(ok),map_str_ip(test.map),ipmask(24) -m found # this will raise an error
Likewise, the stktable_compatible_sample() check for stick tables also
relies on the out_type[table_type] compatibility check, so map_*_ip
cannot be used with stick tables at the moment:
backend st_test
stick-table type string size 1m expire 10m store http_req_rate(10m)
frontend fe
bind localhost:8080
mode http
http-request track-sc0 str(test),map_str_ip(test.map) table st_test # raises an error
To fix this, and to prevent future usage of ADDR as an expected output
type (for either fetches or converters) from introducing new bugs, the
ADDR=>? part of the matrix should follow the ANY type logic.
That is, ADDR, which is a pseudo-type, must be compatible with itself,
and where IPV4 and IPV6 both support a backward conversion to a given
type, ADDR must support it as well. It is done by setting the relevant
ADDR entries to c_pseudo() in the compatibility matrix to indicate that
the operation is theoretically supported (c_pseudo() will never be
executed because ADDR should not be emitted: this only serves as a hint
for compatibility checks at parse time).
This is what's being done in this commit, thanks to this the broken
examples documented above should work as expected now, and future
usage of ADDR as out_type should not cause any issue.
This function is used for ANY=>!ANY conversions in the compatibility
matrix to help differentiate between real NOOPs (c_none) and pseudo
conversions that are theoretically supported at config parse time but
can never occur at runtime. That is, to make explicit the fact that the
actual related runtime operations (e.g.: ANY->IPV4) are not NOOPs since
they might require some conversion to be performed depending on the
input type.
When checking the conf we don't know the effective out types, so
cast[pseudo type][pseudo type] is allowed in the compatibility matrix;
but at runtime we only expect cast[real type][(real type || pseudo
type)] because fetches and converters may not emit pseudo types. Thus
using c_none() everywhere was too ambiguous.
The process will crash if c_pseudo() is invoked to help catch bugs:
crashing here means that a pseudo type has been encountered on a
converter's input at runtime (because it was emitted earlier in the
chain), which is not supported and results from a broken sample fetch
or converter implementation. (pseudo types may only be used as out_type
in sample definitions for compatibility checks at parsing time)
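A sketch of the catch-all (ABORT_NOW() being haproxy's usual way to
crash deliberately):

  /* Must never run: being called means a pseudo type was emitted at
   * runtime by a broken sample fetch or converter.
   */
  static int c_pseudo(struct sample *smp)
  {
          ABORT_NOW();
          return 0;
  }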
Both sample_parse_expr() and parse_acl_expr() implement some code logic
to parse the sample conv list after the respective fetch or acl
keyword. (It seems like the acl one was historically inspired by the
sample one.)
But there is clearly code duplication between the two functions, making
them hard to maintain.
Fortunately, the parsing logic between them has stayed pretty much the
same, thus the sample conv parsing part may be moved into a dedicated
helper parsing function.
This is what's being done in this commit, we're adding the new function
sample_parse_expr_cnv() which does a single thing: parse the converters
that are listed right after a sample fetch keyword and inject them into
an already existing sample expression.
Both sample_parse_expr() and parse_acl_expr() were adapted to now make
use of this specific parsing function and duplicated code parts were
cleaned up.
Although sample_parse_expr() remains quite complicated (numerous function
arguments due to contextual parsing data) the first goal was to get rid of
code duplication without impacting the current behavior, with the added
benefit that it may allow further code cleanups / simplification in the
future.