Commit Graph

19287 Commits

Author SHA1 Message Date
Aurelien DARRAGON
e5958d0292 BUG/MEDIUM: stats: fix resolvers dump
In ("BUG/MEDIUM: stats: Rely on a local trash buffer to dump the stats"),
we forgot to apply the patch in resolvers.c which provides the
stats_dump_resolvers() function that is involved when dumping with "resolvers"
domain.

As a consequence, resolvers dump was broken because stats_dump_one_line(),
which is used in stats_dump_resolv_to_buffer(), implicitely uses trash_chunk
from stats.c to prepare the dump, and stats_putchk() is then called with
global trash (currently empty) as output data.

Given that trash_dump variable is static and thus only available within stats.c
we change stats_putchk() function prototype so that the function does not take
the output buffer as an argument. Instead, stats_putchk() will implicitly use
the local trash_dump variable declared in stats.c.

It will also prevent further mixups between stats_dump_* functions and
stats_putchk().

This needs to be backported with ("BUG/MEDIUM: stats: Rely on a local trash
buffer to dump the stats")
2023-02-06 07:53:03 +01:00
Aurelien DARRAGON
14656844cc BUG/MINOR: stats: fix source buffer size for http dump
In ("BUG/MINOR: stats: use proper buffer size for http dump"),
we used trash.size as source buffer size before applying the htx
overhead computation.

It is safer to use res->buf.size instead since res_htx (which is <htx> argument
passed to stats_putchk() in http context) is made from res->buf:

in http_stats_io_handler:
    | res_htx = htx_from_buf(&res->buf);

This will prevent the hang bug from showing up again if res->buf.size were to be
less than trash.size (which is set according to tune.bufsize).

This should be backported with ("BUG/MINOR: stats: use proper buffer size for http dump")
2023-02-06 07:53:03 +01:00
Willy Tarreau
e74d77b301 [RELEASE] Released version 2.8-dev3
Released version 2.8-dev3 with the following main changes :
    - BUG/MINOR: sink: make sure to always properly unmap a file-backed ring
    - DEV: haring: add a new option "-r" to automatically repair broken files
    - BUG/MINOR: ssl: Fix leaks in 'update ssl ocsp-response' CLI command
    - MINOR: ssl: Remove debug fprintf in 'update ssl ocsp-response' cli command
    - MINOR: connection: add a BUG_ON() to detect destroying connection in idle list
    - MINOR: mux-quic/h3: send SETTINGS as soon as transport is ready
    - BUG/MINOR: h3: fix GOAWAY emission
    - BUG/MEDIUM: mux-quic: fix crash on H3 SETTINGS emission
    - BUG/MEDIUM: hpack: fix incorrect huffman decoding of some control chars
    - BUG/MINOR: log: release global log servers on exit
    - BUG/MINOR: ring: release the backing store name on exit
    - BUG/MINOR: sink: free the forwarding task on exit
    - CLEANUP: trace: remove the QUIC-specific ifdefs
    - MINOR: trace: add a TRACE_ENABLED() macro to determine if a trace is active
    - MINOR: trace: add a trace_no_cb() dummy callback for when to use no callback
    - MINOR: trace: add the long awaited TRACE_PRINTF()
    - MINOR: h2: add h2_phdr_to_ist() to make ISTs from pseudo headers
    - MEDIUM: mux-h2/trace: add tracing support for headers
    - CLEANUP: mux-h2/trace: shorten the name of the header enc/dec functions
    - DEV: hpack: fix `trash` build regression
    - MINOR: http_htx: add http_append_header() to append value to header
    - MINOR: http_htx: add http_prepend_header() to prepend value to header
    - MINOR: sample: add ARGC_OPT
    - MINOR: proxy: introduce http only options
    - MINOR: proxy/http_ext: introduce proxy forwarded option
    - REGTEST: add ifnone-forwardfor test
    - MINOR: proxy: move 'forwardfor' option to http_ext
    - MINOR: proxy: move 'originalto' option to http_ext
    - MINOR: http_ext: introduce http ext converters
    - MINOR: http_ext: add rfc7239_is_valid converter
    - MINOR: http_ext: add rfc7239_field converter
    - MINOR: http_ext: add rfc7239_n2nn converter
    - MINOR: http_ext: add rfc7239_n2np converter
    - REGTEST: add RFC7239 forwarded header tests
    - OPTIM: http_ext/7239: introduce c_mode to save some space
    - MINOR: http_ext/7239: warn the user when fetch is not available
    - MEDIUM: proxy/http_ext: implement dynamic http_ext
    - MINOR: cfgparse/http_ext: move post-parsing http_ext steps to http_ext
    - DOC: config: fix option spop-check proxy compatibility
    - BUG/MINOR: fcgi-app: prevent 'use-fcgi-app' in default section
    - DOC: config: 'http-send-name-header' option may be used in default section
    - BUG/MINOR: mux-h2: Fix possible null pointer deref on h2c in _h2_trace_header()
    - BUG/MINOR: http_ext/7239: ipv6 dumping relies on out of scope variables
    - BUG/MEDIUM: h3: do not crash if no buf space for trailers
    - OPTIM: h3: skip buf realign if no trailer to encode
    - MINOR: mux-quic/h3: define stream close callback
    - BUG/MEDIUM: h3: handle STOP_SENDING on control stream
    - BUG/MINOR: h3: reject RESET_STREAM received for control stream
    - MINOR: h3: add missing traces on closure
    - BUG/MEDIUM: ssl: wrong eviction from the session cache tree
    - BUG/MINOR: h3: fix crash due to h3 traces
    - BUG/MINOR: h3: fix crash due to h3 traces
    - BUG/MEDIUM: thread: consider secondary threads as idle+harmless during boot
    - BUG/MINOR: stats: use proper buffer size for http dump
    - BUILD: makefile: fix PCRE overriding specific lib path
    - MINOR: quic: remove fin from quic_stream frame type
    - MINOR: quic: ensure offset is properly set for STREAM frames
    - MINOR: quic: define new functions for frame alloc
    - MINOR: quic: refactor frame deallocation
    - MEDIUM: quic: implement a retransmit limit per frame
    - MINOR: quic: add config for retransmit limit
    - OPTIM: htx: inline the most common memcpy(8)
    - CLEANUP: quic: no need for atomics on packet refcnt
    - MINOR: stats: add by HTTP version cumulated number of sessions and requests
    - BUG/MINOR: quic: Possible stream truncations under heavy loss
    - BUG/MINOR: quic: Too big PTO during handshakes
    - MINOR: quic: Add a trace about variable states in qc_prep_fast_retrans()
    - BUG/MINOR: quic: Do not ignore coalesced packets in qc_prep_fast_retrans()
    - MINOR: quic: When probing Handshake packet number space, also probe the Initial one
    - BUG/MAJOR: quic: Possible crash when processing 1-RTT during 0-RTT session
    - MEDIUM: quic: Remove qc_conn_finalize() from the ClientHello TLS callbacks
    - BUG/MINOR: quic: Unchecked source connection ID
    - MEDIUM: listener: move the analysers mask to the bind_conf
    - MINOR: listener: move maxseg and tcp_ut to bind_conf
    - MINOR: listener: move maxaccept from listener to bind_conf
    - MINOR: listener: move the backlog setting from listener to bind_conf
    - MINOR: listener: move the maxconn parameter to the bind_conf
    - MINOR: listener: move the ->accept callback to the bind_conf
    - MINOR: listener: remove the useless ->default_target field
    - MINOR: listener: move the nice field to the bind_conf
    - MINOR: listener: move the NOLINGER option to the bind_conf
    - MINOR: listener: move the NOQUICKACK option to the bind_conf
    - MINOR: listener: move the DEF_ACCEPT option to the bind_conf
    - MINOR: listener: move TCP_FO to bind_conf
    - MINOR: listener: move the ACC_PROXY and ACC_CIP options to bind_conf
    - MINOR: listener: move LI_O_UNLIMITED and LI_O_NOSTOP to bind_conf
    - MINOR: listener: get rid of LI_O_TCP_L4_RULES and LI_O_TCP_L5_RULES
    - CLEANUP: listener: remove the now unused options field
    - MINOR: listener: remove the now useless LI_F_QUIC_LISTENER flag
    - CLEANUP: config: remove test for impossible case regarding bind thread mask
    - MINOR: thread: add a simple thread_set API
    - MEDIUM: listener/config: make the "thread" parser rely on thread_sets
    - CLEANUP: config: stop using bind_tgroup and bind_thread
    - CLEANUP: listener/thread: remove now unused bind_conf's bind_tgroup/bind_thread
    - CLEANUP: listener/config: remove the special case for shards==1
    - MEDIUM: config: restrict shards, not bind_conf to one group each
    - BUG/MEDIUM: quic: do not split STREAM frames if no space
    - BUILD: thread: fix build warnings with older gcc compilers
2023-02-04 10:51:05 +01:00
Willy Tarreau
15c8428060 BUILD: thread: fix build warnings with older gcc compilers
The "{ 0 }" form to initialize an empty structure triggers build warnings
on gcc 4.8, let's use the more common "{ }" instead.
2023-02-04 10:49:01 +01:00
Amaury Denoyelle
f2f08f88ef BUG/MEDIUM: quic: do not split STREAM frames if no space
When building STREAM frames in a packet buffer, if a frame is too large
it will be splitted in two. A shorten version will be used and the
original frame will be modified to represent the remaining space.

To ensure there is enough space to store the frame data length encoded
as a QUIC integer, we use the function max_available_room(). This
function can return 0 if there not only a small space left which is
insufficient for the frame header and the shorten data. Prior to this
patch, this wasn't check and an empty unneeded STREAM frame was built
and sent for nothing.

Change this by checking the value return by max_available_room(). If 0,
do not try to split this frame and continue to the next ones in the
packet.

On 2.6, this patch serves as an optimization which will prevent the building
of unneeded empty STREAM frames.

On 2.7, this behavior has the side-effect of triggering a BUG_ON()
statement on quic_build_stream_frame(). This BUG_ON() ensures that we do
not use quic_frame with OFF bit set if its offset is 0. This can happens
if the condition defined above is reproduced for a STREAM frame at
offset 0. An empty unneeded frame is built as descibed. The problem is
that the original frame is modified with its OFF bit set even if the
offset is still 0.

This must be backported up to 2.6.
2023-02-03 19:19:50 +01:00
Willy Tarreau
50440457e3 MEDIUM: config: restrict shards, not bind_conf to one group each
Now that we're using thread_sets there's no need to restrict an entire
bind_conf to 1 group, the real concern being the FD, we can move that
restriction to the shard only. This means that as long as we have enough
shards and that they're properly aligned on group boundaries (i.e. shards
are an integer divider of the number of threads), we can support "bind"
lines spanning more than one group.

The check is still performed for shards to span more than one group,
and an error is emitted when this happens. But at least now it becomes
possible to have this:

    global
       nbthread 256

    frontend foo
       bind :1111 shards 4
       bind :2222 shards by-thread
2023-02-03 18:00:21 +01:00
Willy Tarreau
484093df80 CLEANUP: listener/config: remove the special case for shards==1
In fact this case is already handled by the regular shards code, there
is no need to special-case it.
2023-02-03 18:00:21 +01:00
Willy Tarreau
f2988e1447 CLEANUP: listener/thread: remove now unused bind_conf's bind_tgroup/bind_thread
Not needed anymore since last commit, let's get rid of it.
2023-02-03 18:00:21 +01:00
Willy Tarreau
e6b88592a8 CLEANUP: config: stop using bind_tgroup and bind_thread
Let's now retrieve the first thread group and its mask from the
thread_set so that we don't need these fields in the bind_conf anymore.
For now we're still limited to the first group (like before) but that
allows to get rid of these fields and to make sure that there's nothing
"special" being done there anymore.
2023-02-03 18:00:21 +01:00
Willy Tarreau
f0de8cacc4 MEDIUM: listener/config: make the "thread" parser rely on thread_sets
Instead of reading and storing a single group and a single mask for a
"thread" directive on a bind line, we now store the complete range in
a thread set that's stored in the bind_conf. The bind_parse_thread()
function now just calls parse_thread_set() to complete the current set,
which starts empty, and thread_resolve_group_mask() was updated to
support retrieving thread group numbers or absolute thread numbers
directly from the pre-filled thread_set, and continue to feed bind_tgroup
and bind_thread. The CLI parsers which were pre-initialized to set the
bind_tgroup to 1 cannot do it anymore as it would prevent one from
restricting the thread set. Instead check_config_validity() now detects
the CLI frontend and passes the info down to thread_resolve_group_mask()
that will automatically use only the group 1's threads for these
listeners. The same is done for the peers listeners for now.

At this step it's already possible to start with all previous valid
configs as well as extended ones supporting comma-delimited thread
sets. In addition the parser already accepts large ranges spanning
multiple groups, but since the underlying listeners infrastructure
is not read, for now we're maintaining a specific check against this
at the higher level of the config validity check.

The patch is a bit large because thread resolution is performed in
multiple steps, so we need to adjust all of them at once to preserve
functional and technical consistency.
2023-02-03 18:00:21 +01:00
Willy Tarreau
bef43dfa60 MINOR: thread: add a simple thread_set API
The purpose is to be able to store large thread sets, defined by ranges
that may cross group boundaries, as well as define lists of groups and
masks. The thread_set struct implements the storage, and the parser is
in parse_thread_set(), with a focus on "bind" lines, but not only.
2023-02-03 18:00:21 +01:00
Willy Tarreau
53c6c673ac CLEANUP: config: remove test for impossible case regarding bind thread mask
During 2.5 development, a fallback was implemented for bind "thread"
directives that would not map to existing threads, with commit e3f4d7496
("MEDIUM: config: resolve relative threads on bind lines to absolute ones").
The approch consisted in remapping the threads to other ones. But now
that relative threads and not absolute threads are stored in this mask,
this case cannot happen anymore, and this confusing hack is not needed
anymore.
2023-02-03 18:00:20 +01:00
Willy Tarreau
9e2682afed MINOR: listener: remove the now useless LI_F_QUIC_LISTENER flag
This flag is only used to tag a QUIC listener, which we now know by
its bind_conf's xprt as well. It's only used to decide whether or not
to perform an extra initialization step on the listener. Let's drop it
as well as the flags field.

With the various fields and options moved, the listener struct reduced
by 48 bytes total.
2023-02-03 18:00:20 +01:00
Willy Tarreau
b25634d23e CLEANUP: listener: remove the now unused options field
All options that made sense were moved to the bind_conf, and remaining
ones were removed. This field isn't used at all anymore. The thr_idx
field was moved there to plug the hole.
2023-02-03 18:00:20 +01:00
Willy Tarreau
4c1d3a953d MINOR: listener: get rid of LI_O_TCP_L4_RULES and LI_O_TCP_L5_RULES
LI_O_TCP_L4_RULES and LI_O_TCP_L5_RULES are only set by from the proxy
based on the presence or absence of tcp_req l4/l5 rules. It's basically
as cheap to check the list as it is to check the flag, except that there
is no need to maintain a copy. Let's get rid of them, and this may ease
addition of more dynamic stuff later.
2023-02-03 18:00:20 +01:00
Willy Tarreau
1714680cec MINOR: listener: move LI_O_UNLIMITED and LI_O_NOSTOP to bind_conf
These two flags are entirely for internal use and are even per proxy
in practice since they're used for peers and CLI to indicate (for the
first one) that the listener(s) are not subject to connection limits,
and for the second that the listener(s) should not be stopped on
soft-stop. No need to keep them in the listeners, let's move them to
the bind_conf under names BC_O_UNLIMITED and BC_O_NOSTOP.
2023-02-03 18:00:20 +01:00
Willy Tarreau
f1b4730f7d MINOR: listener: move the ACC_PROXY and ACC_CIP options to bind_conf
These are only set per bind line and used when creating a sessions,
we can move them to the bind_conf under the names BC_O_ACC_PROXY and
BC_O_ACC_CIP respectively.
2023-02-03 18:00:20 +01:00
Willy Tarreau
c492f1b17f MINOR: listener: move TCP_FO to bind_conf
It's set per bind line ("tfo") and only used in tcp_bind_listener() so
there's no point keeping the address family tests, let's just store the
flag in the bind_conf under the name BC_O_TCP_FO.
2023-02-03 18:00:20 +01:00
Willy Tarreau
d9b4d21248 MINOR: listener: move the DEF_ACCEPT option to the bind_conf
This option is set per bind line, and was only set stored when the
address family is AF_INET4 or AF_INET6. That's pointless since it's
used only in tcp_bind_listener() which is only used for such families
as well, so it can now be moved to the bind_conf under the name
BC_O_DEF_ACCEPT.
2023-02-03 18:00:20 +01:00
Willy Tarreau
9bdcf42922 MINOR: listener: move the NOQUICKACK option to the bind_conf
It solely depends on the bind line so let's move it there under the
name BC_O_NOQUICKACK.
2023-02-03 18:00:20 +01:00
Willy Tarreau
cfb7c2f515 MINOR: listener: move the NOLINGER option to the bind_conf
It's currently declared per-frontend, though it would make sense to
support it per-line but in no case per-listener. Let's move the option
to a bind_conf option BC_O_NOLINGER.
2023-02-03 18:00:20 +01:00
Willy Tarreau
7dbd4187dc MINOR: listener: move the nice field to the bind_conf
This is another bind line setting which can move to the bind_conf.
Note that it leaves a 2-byte hole in the listener struct.
2023-02-03 18:00:20 +01:00
Willy Tarreau
d5983cef80 MINOR: listener: remove the useless ->default_target field
This field is used by stream_new() to optionally set the applet the
stream will connect to for simple proxies like the CLI for example.
But it has never been configurable to anything and is always strictly
equal to the frontend's ->default_target. Let's just drop it and make
stream_new() only use the frontend's. It makes more sense anyway as
we don't want the proxy to work differently based on the "bind" line.
This idea was brought in 1.6 hoping that the h2 implementation would
use applets for decoding (which was dropped after the very first
attempt in 1.8).
2023-02-03 18:00:20 +01:00
Willy Tarreau
3083615410 MINOR: listener: move the ->accept callback to the bind_conf
The accept callback directly derives from the upper layer, generally
it's session_accept_fd(). As such it's also defined per bind line
so it makes sense to move it there.
2023-02-03 18:00:20 +01:00
Willy Tarreau
758c69d951 MINOR: listener: move the maxconn parameter to the bind_conf
The maxconn is set per bind line so let's move it there. This might
possibly even slightly reduce inter-thread contention since this one
is read-mostly and it was stored next to nbconn which changes for
each connection setup or teardown.
2023-02-03 18:00:20 +01:00
Willy Tarreau
1920f897d8 MINOR: listener: move the backlog setting from listener to bind_conf
The backlog setting is also defined by the bind_conf, so let's move
it there.
2023-02-03 18:00:20 +01:00
Willy Tarreau
882f2485a1 MINOR: listener: move maxaccept from listener to bind_conf
Like for previous values, maxaccept is really per-bind_conf, so let's
move it there. Some frontends (peers, log) set it to 1 so the assignment
was slightly moved.
2023-02-03 18:00:20 +01:00
Willy Tarreau
ee378165fb MINOR: listener: move maxseg and tcp_ut to bind_conf
These two arguments were only set and only used with tcpv4/tcpv6. Let's
just store them into the bind_conf instead of duplicating them for all
listeners since they're fixed per "bind" line.
2023-02-03 18:00:20 +01:00
Willy Tarreau
7866e8e50d MEDIUM: listener: move the analysers mask to the bind_conf
When bind_conf were created, some elements such as the analysers mask
ought to have moved there but that wasn't the case. Now that it's
getting clearer that bind_conf provides all binding parameters and
the listener is essentially a listener on an address, it's starting
to get really confusing to keep such parameters in the listener, so
let's move the mask to the bind_conf. We also take this opportunity
for pre-setting the mask to the frontend's upon initalization. Now
several loops have one less argument to take care of.
2023-02-03 18:00:20 +01:00
Frédéric Lécaille
0aa79953c9 BUG/MINOR: quic: Unchecked source connection ID
The SCID (source connection ID) used by a peer (client or server) is sent into the
long header of a QUIC packet in clear. But it is also sent into the transport
parameters (initial_source_connection_id). As these latter are encrypted into the
packet, one must check that these two pieces of information do not differ
due to a packet header corruption. Furthermore as such a connection is unusuable
it must be killed and must stop as soon as possible processing RX/TX packets.

Implement qc_kill_con() to flag a connection as unusable and to kille it asap
waking up the idle timer task to release the connection.

Add a check to quic_transport_params_store() to detect that the SCIDs do not
match and make it call qc_kill_con().

Add several tests about connection to be killed at several critial locations,
especially in the TLS stack callback to receive CRYPTO data from or derive secrets,
and before preparing packet after having received others.

Must be backported to 2.6 and 2.7.
2023-02-03 17:55:55 +01:00
Frédéric Lécaille
af25a69c8b MEDIUM: quic: Remove qc_conn_finalize() from the ClientHello TLS callbacks
This is a bad idea to make the TLS ClientHello callback call qc_conn_finalize().
If this latter fails, this would generate a TLS alert and make the connection
send packet whereas it is not functional. But qc_conn_finalize() job was to
install the transport parameters sent by the QUIC listener. This installation
cannot be done at any time. This must be done after having possibly negotiated
the QUIC version and before sending the first Handshake packets. It seems
the better moment to do that in when the Handshake TX secrets are derived. This
has been found inspecting the ngtcp2 code. Calling SSL_set_quic_transport_params()
too late would make the ServerHello to be sent without the transport parameters.

The code for the connection update which was done from qc_conn_finalize() has
been moved to quic_transport_params_store(). So, this update is done as soon as
possible.

Add QUIC_FL_CONN_TX_TP_RECEIVED to flag the connection as having received the
peer transport parameters. Indeed this is required when the ClientHello message
is splitted between packets.

Add QUIC_FL_CONN_FINALIZED to protect the connection from calling qc_conn_finalize()
more than one time. This latter is called only when the connection has received
the transport parameters and after returning from SSL_do_hanshake() which is the
function which trigger the TLS ClientHello callback call.

Remove the calls to qc_conn_finalize() from from the TLS ClientHello callbacks.

Must be backported to 2.6. and 2.7.
2023-02-03 17:55:55 +01:00
Frédéric Lécaille
8417beb7da BUG/MAJOR: quic: Possible crash when processing 1-RTT during 0-RTT session
This bug was revealed by some C1 interop tests (heavy hanshake packet
corruption) when receiving 1-RTT packets with a key phase update.
This lead the packet to be decrypted with the next key phase secrets.
But this latter is initialized only after the handshake is complete.

In fact, 1-RTT must never be processed before the handshake is complete.
Relying on the "qc->mux_state == QC_MUX_NULL" condition to check the
handshake is complete is wrong during 0-RTT sessions when the mux
is initialized before the handshake is complete.

Must be backported to 2.7 and 2.6.
2023-02-03 17:55:55 +01:00
Frédéric Lécaille
37ed4a3842 MINOR: quic: When probing Handshake packet number space, also probe the Initial one
This is not really a bug fix but an improvement. When the Handshake packet number
space has been detected as needed to be probed, we should also try to probe the
Initial packet number space if there are still packets in flight. Furthermore
we should also try to send up to two datagrams.

Must be backported to 2.6 and 2.7.
2023-02-03 17:55:55 +01:00
Frédéric Lécaille
055e82657e BUG/MINOR: quic: Do not ignore coalesced packets in qc_prep_fast_retrans()
This function is called only when probing only one packet number space
(Handshake) or two times the same one (Application). So, there is no risk
to prepare two times the same frame when uneeded because we wanted to
probe two packet number spaces. The condition "ignore the packets which
has been coalesced to another one" is not necessary. More importantly
the bug is when we want to prepare a Application packet which has
been coalesced to an Handshake packet. This is always the case when
the first Application packet is sent. It is always coalesced to
an Handshake packet with an ACK frame. So, when lost, this first
application packet was never resent. It contains the HANDSHAKE_DONE
frame to confirm the completion of the handshake to the client.

Must be backported to 2.6 and 2.7.
2023-02-03 17:55:55 +01:00
Frédéric Lécaille
6dead91b8a MINOR: quic: Add a trace about variable states in qc_prep_fast_retrans()
This has already been very useful to diagnose retransmission issues.

Must be backported to 2.6 and 2.7.
2023-02-03 17:55:55 +01:00
Frédéric Lécaille
b75eecc874 BUG/MINOR: quic: Too big PTO during handshakes
During the handshake and when the handshake has not been confirmed
the acknowledgement delays reported by the peer may be larger
than max_ack_delay. max_ack_delay SHOULD be ignored before the
handshake is completed when computing the PTO. But the current code considered
the wrong condition "before the hanshake is completed".

Replace the enum value QUIC_HS_ST_COMPLETED by QUIC_HS_ST_CONFIRMED to
fix this issue. In quic_loss.c, the parameter passed to quic_pto_pktns()
is renamed to avoid any possible confusion.

Must be backported to 2.7 and 2.6.
2023-02-03 17:55:55 +01:00
Frédéric Lécaille
dd419461ef BUG/MINOR: quic: Possible stream truncations under heavy loss
This may happen during retransmission of frames which can be splitted
(CRYPTO, or STREAM frames). One may have to split a frame to be
retransmitted due to the QUIC protocol properties (packet size limitation
and packet field encoding sizes). The remaining part of a frame which
cannot be retransmitted must be detached from the original frame it is
copied from. If not, when the really sent part will be acknowledged
the remaining part will be acknowledged too but not sent!

Must be backported to 2.7 and 2.6.
2023-02-03 17:55:55 +01:00
Frédéric Lécaille
9969adbcdc MINOR: stats: add by HTTP version cumulated number of sessions and requests
Add cum_sess_ver[] new array of counters to count the number of cumulated
HTTP sessions by version (h1, h2 or h3).
Implement proxy_inc_fe_cum_sess_ver_ctr() to increment these counter.
This function is called each a HTTP mux is correctly initialized. The QUIC
must before verify the application operations for the mux is for h3 before
calling proxy_inc_fe_cum_sess_ver_ctr().
ST_F_SESS_OTHER stat field for the cumulated of sessions others than
HTTP sessions is deduced from ->cum_sess_ver counter (for all the session,
not only HTTP sessions) from which the HTTP sessions counters are substracted.

Add cum_req[] new array of counters to count the number of cumulated HTTP
requests by version and others than HTTP requests. This new member replace ->cum_req.
Modify proxy_inc_fe_req_ctr() which increments these counters to pass an HTTP
version, 0 special values meaning "other than an HTTP request". This is the case
for instance for syslog.c from which proxy_inc_fe_req_ctr() is called with 0
as version parameter.
ST_F_REQ_TOT stat field compputing for the cumulated number of requests is modified
to count the sum of all the cum_req[] counters.

As this patch is useful for QUIC, it must be backported to 2.7.
2023-02-03 17:55:49 +01:00
Willy Tarreau
1ea5f410ff CLEANUP: quic: no need for atomics on packet refcnt
This is a leftover from the implementation's history, but the
quic_rx_packet and quic_tx_packet ref counts were still atomically
updated. It was found in perf top that the cost of the atomic inc
in quic_tx_packet_refinc() alone was responsible for 1% of the CPU
usage at 135 Gbps. Given that packets are only processed on their
assigned thread we don't need that anymore and this can be replaced
with regular non-atomic operations.

Doing this alone has reduced the CPU usage of qc_do_build_pkt()
from 3.6 to 2.5% and increased the overall bit rate by about 1%.
2023-02-03 13:39:20 +01:00
Willy Tarreau
23aa79d9a9 OPTIM: htx: inline the most common memcpy(8)
On high traffic benchmarks, it's visible the the CPU is dominated by
calls to memcpy(), and many of those come from htx functions. It was
measured that 63% of those coming from htx are made on 8-byte blocks
which really are not worth a call to the function since a single
read-write cycle does it fine.

This commit adds an inline htx_memcpy() function that explicitly
checks for this length and just copies the data without that call.
It's even likely that it could be detected on const sizes, though
that was not done. This is already effective in reducing the number
of calls to memcpy().
2023-02-03 13:39:18 +01:00
Amaury Denoyelle
24d5b72ca9 MINOR: quic: add config for retransmit limit
Define a new configuration option "tune.quic.max-frame-loss". This is
used to specify the limit for which a single frame instance can be
detected as lost. If exceeded, the connection is closed.

This should be backported up to 2.7.
2023-02-03 11:56:46 +01:00
Amaury Denoyelle
e4abb1f2da MEDIUM: quic: implement a retransmit limit per frame
Add a <loss_count> new field in quic_frame structure. This field is set
to 0 and incremented each time a sent packet is declared lost. If
<loss_count> reached a hard-coded limit, the connection is deemed as
failing and is closed immediately with a CONNECTION_CLOSE using
INTERNAL_ERROR.

By default, limit is set to 10. This should ensure that overall memory
usage is limited if a peer behaves incorrectly.

This should be backported up to 2.7.
2023-02-03 11:56:42 +01:00
Amaury Denoyelle
57b3eaa793 MINOR: quic: refactor frame deallocation
Define a new function qc_frm_free() to handle frame deallocation. New
BUG_ON() statements ensure that the deallocated frame is not referenced
by other frame. To support this, all LIST_DELETE() have been replaced by
LIST_DEL_INIT(). This should enforce that frame deallocation is robust.

As a complement, qc_frm_unref() has been moved into quic_frame module.
It is justified as this is a utility function related to frame
deallocation. It allows to use it in quic_pktns_tx_pkts_release() before
calling qc_frm_free().

This should be backported up to 2.7.
2023-02-03 11:55:41 +01:00
Amaury Denoyelle
40c24f1a10 MINOR: quic: define new functions for frame alloc
Define two utility functions for quic_frame allocation :
* qc_frm_alloc() is used to allocate a new frame
* qc_frm_dup() is used to allocate a new frame by duplicating an
  existing one

Theses functions are useful to centralize quic_frame initialization.
Note that pool_zalloc() is replaced by a proper pool_alloc() + explicit
initialization code.

This commit will simplify implementation of the per frame retransmission
limitation. Indeed, a new counter will be added in quic_frame structure
which must be initialized to 0.

This should be backported up to 2.7.
2023-02-03 10:44:26 +01:00
Amaury Denoyelle
1dac018d9f MINOR: quic: ensure offset is properly set for STREAM frames
Care must be taken when reading/writing offset for STREAM frames. A
special OFF bit is set in the frame type to indicate that the field is
present. If not set, it is assumed that offset is 0.

To represent this, offset field of quic_stream structure must always be
initialized with a valid value in regards with its frame type OFF bit.

The previous code has no bug in part because pool_zalloc() is used to
allocate quic_frame instances. To be able to use pool_alloc(), offset is
always explicitely set to 0. If a non-null value is used, OFF bit is set
at the same occasion. A new BUG_ON() statement is added on frame builder
to ensure that the caller has set OFF bit if offset is non null.

This should be backported up to 2.7.
2023-02-03 09:46:55 +01:00
Amaury Denoyelle
2216b0866e MINOR: quic: remove fin from quic_stream frame type
A dedicated <fin> field was used in quic_stream structure. However, this
info is already encoded in the frame type field as specified by QUIC
protocol.

In fact, only code for packet reception used the <fin> field. On the
sending side, we only checked for the FIN bit. To align both sides,
remove the <fin> field and only used the FIN bit.

This should be backported up to 2.7.
2023-02-03 09:46:55 +01:00
Amaury Denoyelle
565e3cc43a BUILD: makefile: fix PCRE overriding specific lib path
PCRE relies on pcre-config binary tool to provide includes/libs paths.
This may generate standard entries such as '/usr/lib' which will
override more specific ones if present before them on the linking step.

This situation was encountered when building with both QuicTLS and PCRE.
This generates a linking error as the default SSL libraries were used
for linking even with correct SSL flags pointing to QuicTLS dirs.

To fix this issue, USE_PCRE and its affiliated options have been moved
at the end of 'use_opts' variable. Indeed, related CFLAGS/LDFLAGS are
concatenated in their order of appearance through the macro
collect_opts_flags (see include/make/options.mk). PCRE in the last
position ensures it won't override specific entries declared before.
2023-02-03 09:42:49 +01:00
Aurelien DARRAGON
5e7ecbec99 BUG/MINOR: stats: use proper buffer size for http dump
In an attempt to fix GH #1873, ("BUG/MEDIUM: stats: Rely on a local trash
buffer to dump the stats") explicitly reduced output buffer size to leave
enough space for htx overhead under http context.

Github user debtsandbooze, who first reported the issue, came back to us
and said he was still able to make the http dump "hang" with the new fix.

After some tests, it became clear that htx_add_data_atonce() could fail from
time to time in stats_putchk(), even if htx was completely empty:

In http context, buffer size is maxed out at channel_htx_recv_limit().
Unfortunately, channel_htx_recv_limit() is not what we're looking for here
because limit() doesn't compute the proper htx overhead.

Using buf_room_for_htx_data() instead of channel_htx_recv_limit() to compute
max "usable" data space seems to be the last piece of work required for
the previous fix to work properly.

This should be backported everywhere the aforementioned commit is.
2023-02-02 17:10:11 +01:00
Aurelien DARRAGON
739281b3d6 BUG/MEDIUM: thread: consider secondary threads as idle+harmless during boot
idle and harmless bits in the tgroup_ctx structure were not explicitly
set during boot.

    | struct tgroup_ctx ha_tgroup_ctx[MAX_TGROUPS] = { };

As the structure is first statically initialized,
.threads_harmless and .threads_idle are automatically zero-
initialized by the compiler.

Unfortulately, this means that such threads are not considered idle
nor harmless by thread_isolate(_full)() functions until they enter
the polling loop (thread_harmless_now() and thread_idle_now() are
respectively called before entering the polling loop)

Because of this, any attempt to call thread_isolate() or thread_isolate_full()
during a startup phase with nbthreads >= 2 will cause thread_isolate to
loop until every secondary threads make it through their first polling loop.

If the startup phase is aborted during boot (ie: "-c" option to check the
configuration), secondary threads may be initialized but will never be started
(ie: they won't enter the polling loop), thus thread_isolate()
could would loop forever in such cases.

We can easily reveal the bug with this patch reproducer:

    |  diff --git a/src/haproxy.c b/src/haproxy.c
    |  index e91691658..0b733f6ee 100644
    |  --- a/src/haproxy.c
    |  +++ b/src/haproxy.c
    |  @@ -2317,6 +2317,10 @@ static void init(int argc, char **argv)
    |   		if (pr || px) {
    |   			/* At least one peer or one listener has been found */
    |   			qfprintf(stdout, "Configuration file is valid\n");
    |  +			printf("haproxy will loop...\n");
    |  +			thread_isolate();
    |  +			printf("we will never reach this\n");
    |  +			thread_release();
    |   			deinit_and_exit(0);
    |   		}
    |   		qfprintf(stdout, "Configuration file has no error but will not start (no listener) => exit(2).\n");

Now we start haproxy with a valid config:
$> haproxy -c -f valid.conf
Configuration file is valid
haproxy will loop...

^C

------------------------------------------------------------------------------

This did not cause any issue so far because no early deinit paths require
full thread isolation. But this may change when new features or requirements
are introduced, so we should fix this before it becomes a real issue.

To fix this, we explicitly assign .threads_harmless and .threads_idle
to .threads_enabled value in thread_map_to_groups() function during boot.
This is the proper place to do this since as long as .threads_enabled is not
explicitly set, its default value is also 0 (zero-initialized by the compiler)

code snippet from thread_isolate() function:
       ulong te = _HA_ATOMIC_LOAD(&ha_tgroup_info[tgrp].threads_enabled);
       ulong th = _HA_ATOMIC_LOAD(&ha_tgroup_ctx[tgrp].threads_harmless);

       if ((th & te) == te)
           break;

Thus thread_isolate(_full()) won't be looping forever in thread_isolate()
even if it were to be used before thread_map_to_groups() is executed.

No backport needed unless this is a requirement.
2023-02-02 08:21:15 +01:00
Amaury Denoyelle
78adb4b451 BUG/MINOR: h3: fix crash due to h3 traces
This commit is identical to the preceeding patch. However, these traces
are from another patch with a different backport scope :
    56a86ddfb9
    MINOR: h3: add missing traces on closure

This must be backported up to 2.7 where above patch is scheduled.
2023-01-31 16:09:47 +01:00