Commit Graph

23061 Commits

Author SHA1 Message Date
Aurelien DARRAGON
3ba924a4da MINOR: action: add do-log action
Thanks to the two previous commits, we can now expose the do-log action
on all available action contexts, including the new quic-init context.

Each context is responsible for exposing the do-log action by registering
the relevant log steps, saving the identifier, and then storing it in the
rule's context so that do_log_action() automatically uses it to produce
the log during runtime.

To use the feature, simply use "do-log" (without argument) in an action
directive, for example:

   tcp-request connection do-log

As mentioned before, each context where the action is exposed has its own
log step identifier. Currently known identifiers are:

  quic-initial:           quic-init
  tcp-request connection: tcp-req-conn
  tcp-request session:    tcp-req-sess
  tcp-request content:    tcp-req-cont
  tcp-response content:   tcp-res-cont
  http-request:           http-req
  http-response:          http-res
  http-after-response:    http-after-res
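
As an illustration only, the context-to-identifier mapping listed above can
be pictured as a small lookup table. The sketch below is not the haproxy
implementation: the struct, the table and the do_log_step_for() helper are
hypothetical names, and only the strings come from the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pair: rule context name and its log step identifier. */
struct ctx_step {
    const char *context;
    const char *step;
};

/* Values copied from the list of currently known identifiers. */
static const struct ctx_step do_log_steps[] = {
    { "quic-initial",           "quic-init"      },
    { "tcp-request connection", "tcp-req-conn"   },
    { "tcp-request session",    "tcp-req-sess"   },
    { "tcp-request content",    "tcp-req-cont"   },
    { "tcp-response content",   "tcp-res-cont"   },
    { "http-request",           "http-req"       },
    { "http-response",          "http-res"       },
    { "http-after-response",    "http-after-res" },
};

/* Return the log step identifier for <context>, or NULL if unknown. */
static const char *do_log_step_for(const char *context)
{
    size_t i;

    assert(context != NULL);
    for (i = 0; i < sizeof(do_log_steps) / sizeof(do_log_steps[0]); i++) {
        if (strcmp(do_log_steps[i].context, context) == 0)
            return do_log_steps[i].step;
    }
    return NULL;
}
```

The returned string is the name that would follow the "on" keyword in a
log-profile section for the matching context.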

Thus, these "additional" logging steps can be used as-is under log-profile
section (after "on" keyword). However, although the parser will accept
them, it makes no sense to use them with the "log-steps" proxy keyword,
since the only path for these origins to trigger a log generation is
through the explicit use of "do-log" action.

This need was described in GH #401. It should help to conditionally
trigger logs using ACLs at specific key points, and may either be used
alone or combined with "log-steps" to add additional log "trackers" during
transaction handling.

Documentation was updated and some examples were added.
2024-10-04 21:38:14 +02:00
Aurelien DARRAGON
0e271f1d2a MINOR: log: add do_log_parse_act() helper func
Function may be used from places where per-context actions are usually
registered (tcp_act.c, http_act.c, quic_rules.c, to name a few) in
order to expose the do_log() action.
2024-10-04 21:38:08 +02:00
Aurelien DARRAGON
e63c7da508 MINOR: log: add do_log() logging helper
do_log() is quite similar to sess_log() or strm_log(), except that it
may be called at any time during session handling in an opportunistic
way as long as the session exists (the stream may or may not exist).

Also, it will try to emit the log as INFO by default, unless set-log-level
is used on the stream, or error origin flag is set.
2024-10-04 21:38:02 +02:00
Amaury Denoyelle
f6599cf5a6 MEDIUM: quic: decount out-of-order ACK data range for MUX txbuf window
This commit is the last one of a series whose objective is to restore
QUIC transfer throughput performance to the state prior to the recent
QUIC MUX buffer allocator rework.

This gain is obtained by reporting received out-of-order ACK data range
to the QUIC MUX which can then decount room in its txbuf window. This is
implemented in QUIC streamdesc layer by adding a new invocation of
notify_room callback. This is done in qc_stream_buf_store_ack() which
handles out-of-order ACK data range.

Previous commit has introduced merging of overlapping ACK data range. As
such, it's easy to only report the newly acknowledged data range.

As with in-order ACKs, this new notification is only performed on
released streambuf. As such, when a streambuf instance is released,
notify_room notification now also reports the total length of
out-of-order ACK data range currently stored. This value is stored in a
new streambuf member <room> to avoid unnecessary tree lookup.

This <room> member also serves on in-order ACK notification to reduce
the notified room. This prevents reporting invalid values when overlapping
ranges are treated first out-of-order and then in-order, which would
cause an invalid QUIC MUX txbuf window value.

After this change has been implemented, performance has been
significantly improved, both with ngtcp2-client rate usage and on
interop goodput test. These values are now similar to the rate observed
on older haproxy version before QUIC MUX buffer allocator rework.
2024-10-04 18:09:51 +02:00
Amaury Denoyelle
ae3e768d32 MEDIUM: quic: merge contiguous/overlapping buffered ack stream range
Transfer throughput has deteriorated since the recent rework of QUIC MUX
txbuf allocator. This was partially restored with the commit to
decount individual in-order ACK from the MUX buffer window.

To fully retrieve the old performance level, all ACKs must be decounted
when handled by QUIC streamdesc layer, even out-of-order ranges.
However, this is not easily implemented as several ranges may exist in
parallel with overlap on the underlying data. It would cause
miscalculation for QUIC MUX buffer window if such ranges were blindly
reported.

The proper solution is to first implement merge of contiguous or
overlapping ACK data ranges to reduce the number of stored ranges to the
minimal. This is the purpose of this patch. This is implemented in a new
static function named qc_stream_buf_store_ack() in the streamdesc layer.

The merge algorithm is simple enough. First, it ensures the newly added
range is not already fully covered by a preexisting entry. Then, it
checks if there is contiguity/overlap with one or several ranges
starting at the same or a greater offset. If true, the newly added entry
is extended to cover them all, and all contiguous/overlapped ranges are
removed. Finally, if there is contiguity or overlap with an entry
starting at a smaller offset, no new range is instantiated and instead
the entry with the smaller offset is extended.
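
The merge described above can be sketched as follows. This is a simplified
illustration and not the code of this patch: it uses a small sorted array
instead of an ebtree, and the ack_range/ack_set types, MAX_RANGES and
ack_set_store() are hypothetical names. It returns the number of newly
covered bytes, which is the kind of value a later commit could report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_RANGES 16

/* Hypothetical buffered ACK range covering [off, off + len). */
struct ack_range {
    uint64_t off;
    uint64_t len;
};

/* Sorted array of ranges that never overlap nor touch each other. */
struct ack_set {
    struct ack_range r[MAX_RANGES];
    size_t nb;
};

/* Store [off, off + len), merging with every contiguous or overlapping
 * entry. Returns the number of bytes that were not covered before.
 */
static uint64_t ack_set_store(struct ack_set *s, uint64_t off, uint64_t len)
{
    const uint64_t new_off = off, new_end = off + len;
    uint64_t end = new_end, newly = len;
    size_t i = 0, j;

    assert(len > 0);

    /* Skip entries ending strictly before the new range. */
    while (i < s->nb && s->r[i].off + s->r[i].len < off)
        i++;

    /* Absorb every entry which touches or overlaps the new range. */
    for (j = i; j < s->nb && s->r[j].off <= end; j++) {
        uint64_t r_off = s->r[j].off, r_end = r_off + s->r[j].len;
        uint64_t ov_start = r_off > new_off ? r_off : new_off;
        uint64_t ov_end = r_end < new_end ? r_end : new_end;

        if (ov_end > ov_start)
            newly -= ov_end - ov_start; /* already acknowledged */
        if (r_off < off)
            off = r_off;
        if (r_end > end)
            end = r_end;
    }

    /* A new slot is only needed when nothing was absorbed. */
    assert(j > i || s->nb < MAX_RANGES);

    /* Replace entries [i, j) with the single merged range. */
    memmove(&s->r[i + 1], &s->r[j], (s->nb - j) * sizeof(s->r[0]));
    s->r[i].off = off;
    s->r[i].len = end - off;
    s->nb = s->nb - (j - i) + 1;
    return newly;
}
```

A range already fully covered by an existing entry returns 0 and leaves the
set unchanged, which matches the first check of the algorithm.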

Now that contiguous or overlapped ranges cannot exist anymore, ACK data
range tree instantiation can use EB_ROOT_UNIQUE.

Outside of the longer term objective which is to decount out-of-order
ACKs from MUX txbuf window, this commit could also improve some
performance and/or memory usage for connections where stream data
fragmentation and packet reordering is high.
2024-10-04 18:07:52 +02:00
Amaury Denoyelle
e7578084b0 MINOR: quic: implement dedicated type for out-of-order stream ACK
QUIC streamdesc layer is responsible to handle reception of ACK for
streams. It removes stream data from the underlying buffers on ACK
reception.

Streamdesc layer treats ACK in order at the stream level. Out of order
ACKs are buffered in a tree until they can be handled on older data
acknowledgement reception. Previously, qf_stream instance which comes
from the quic_tx_packet was used as tree node to buffer such ranges.

Introduce a new type dedicated to represent out of order stream ack data
range. This type is named qc_stream_ack. It contains minimal infos only
relative to the acknowledged stream data range.

This allows to reduce size of frequently used quic_frame with the
removal of tree node from qf_stream. Another side effect of this change
is that now quic_frame are always released immediately on ACK reception,
both in-order and out-of-order. This allows to also release the
quic_tx_packet instance which should reduce memory consumption.

The drawback of this change is that qc_stream_ack instance must be
allocated on out-of-order ACK reception. As such, qc_stream_desc_ack()
may fail if an error happens on allocation. For the moment, such error
is silently recovered up to qc_treat_rx_pkts() with the dropping of the
received packet containing the ACK frame. In the future, it may be
useful to close the connection as this error may only happen when memory
is low.
2024-10-04 17:56:45 +02:00
Amaury Denoyelle
4ff87db5fe MEDIUM: quic: decount acknowledged data for MUX txbuf window
Recently, a new allocation mechanism was implemented for Tx buffers used
by QUIC MUX. Now, underlying congestion window size is used to determine
if it is still possible or not to allocate a new buffer when necessary.

This mechanism has rendered the QUIC stack more flexible. However, it also
has brought some performance degradation, with longer transfer times in
certain environments. It was first discovered on the measurement results
of the interop. It can also easily be reproduced using the following
ngtcp2-client example which forces a very small congestion window due to
frequent loss:

 $ ngtcp2-client -q --no-quic-dump --no-http-dump --exit-on-all-streams-close -r 0.1 127.0.0.1 20443 "https://[::]:20443/?s=10m"

This performance decrease is caused by the allocator which is now too
strict. It may cause buffer underrun frequently at the MUX layer when
the congestion window is too small, as new buffers cannot be allocated
until the current one is fully acknowledged. This results in transfers
with very bad throughput utilisation. The objective of this new series of
patches is to relax some restrictions to permit QUIC MUX to allocate new
buffers more quickly, while preserving the initial limitation based on
congestion window size.

An interesting method for this is to notify QUIC MUX about newly
available room on individual ACK reception, without waiting for the full
buffer acknowledgement. This is easily implemented by adding a new
notify_room invocation in QUIC streamdesc layer on ACK reception.
However, ACK receptions are handled in-order at the stream level. Out of
order ACKs are buffered and are not decounted for now. This will be
implemented in a future commit.

Note that for a single buffer instance, data can in parallel be written
by QUIC MUX and removed on ACK reception. This could cause room
notification to QUIC MUX layer to report invalid values. As such, ACK
receptions are only accounted for released buffers. This ensures that
such buffers won't receive any new data. At the same time, buffer room
is notified on release operation as it does not need acknowledgement.

This commit has improved performance for the ngtcp2-client
scenario above. However, it is not yet sufficient for the interop
goodput test.
2024-10-04 17:31:26 +02:00
Amaury Denoyelle
324a49ed4d MINOR: quic: strengthen qc_release_frm()
quic_frame is the type used to represent frames emitted in a QUIC Tx
packet. Each frame is attached to a packet, and can also be linked to
other frames from the same packet, or duplicated frames for
retransmission. As such, quic_frame free operation is a tedious process.

qc_release_frm() has been implemented to ensure quic_frame is always
properly freed after detaching from all its list attach points. One
particular point is to ensure that when a frame is released, the frame
origin and all origin copies, including the current <frm> are flagged as
acked and detached from the reflist. Add a BUG_ON() to ensure this loop
is properly conducted when dealing with the current <frm> instance.
2024-10-04 16:00:05 +02:00
Christopher Faulet
131b877565 BUG/MINOR: stats: Fix the name for the total number of streams created
Because of a copy/paste error, CurrStreams was reused by mistake. It should
be "CumStreams".

No backports needed.
2024-10-04 15:44:40 +02:00
Amaury Denoyelle
c1d714156e BUG/MAJOR: mux-quic: do not crash on empty STREAM frame emission
Most of the time STREAM frames emitted by QUIC MUX have some data in them.
However, it is possible to use an empty frame when a delayed FIN must be
transferred.

Recently, QUIC MUX send callback notification has been refactored. Now,
this callback is blindly called by quic_conn lower layer each time a
STREAM frame is built into a new Tx packet. QUIC MUX is responsible to
ensure the notified frame corresponds to newly emitted data or
retransmission. Offsets are used for this comparison, but this requires
special care for empty FIN frames.

Sadly, the comparison written to determine if an empty FIN frame was
sent for the first time or retransmitted is not correct. This caused
such frames to always be dismissed as retransmissions in QUIC MUX send
callback. This prevented the related QCS instance from being removed from the
send_list, causing qcc_io_send() to retry a new emission. This was
finally interrupted by the BUG_ON() assertion to prevent an infinite
loop.

Fix this crash by updating the condition in QUIC MUX send callback. For
empty STREAM frame, it is sufficient to check if QC_SF_FIN_STREAM was
already removed or not to detect a retransmission. Indeed, empty STREAM
frames are never used outside of delayed FIN reporting.
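
The corrected check can be pictured with the sketch below. It is only an
illustration under assumptions, not the haproxy code: the stream_tx type,
the SF_FIN_STREAM flag (standing in for QC_SF_FIN_STREAM) and the
on_frame_sent() helper are hypothetical, and the offset comparison used for
non-empty frames is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: a FIN still has to be emitted for this stream. */
#define SF_FIN_STREAM 0x01u

/* Hypothetical minimal Tx state of a stream. */
struct stream_tx {
    uint64_t sent_off;   /* highest offset already notified as sent */
    unsigned int flags;
};

/* Called for each STREAM frame built into a Tx packet.
 * Returns 1 for a first emission, 0 for a retransmission to ignore.
 */
static int on_frame_sent(struct stream_tx *s, uint64_t off, uint64_t len)
{
    assert(s != NULL);

    if (len == 0) {
        /* Empty frame: only used to carry a delayed FIN. If the flag
         * was already removed, the FIN was sent before.
         */
        if (!(s->flags & SF_FIN_STREAM))
            return 0;
        s->flags &= ~SF_FIN_STREAM;
        return 1;
    }

    /* Non-empty frame: offsets tell if the data is new. */
    if (off + len <= s->sent_off)
        return 0;
    s->sent_off = off + len;
    return 1;
}
```

With this shape, the first empty FIN frame is accounted as a new emission,
so the stream can leave the send list instead of being retried forever.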

No need to backport. This crash was introduced in the current dev branch
by the following commit.
  d7f4e5abf0
  MEDIUM: quic: strengthen MUX send notification
2024-10-04 11:31:11 +02:00
Willy Tarreau
7cdc9325a1 [RELEASE] Released version 3.1-dev9
Released version 3.1-dev9 with the following main changes :
    - MINOR: tools: add minimal file name management
    - CLEANUP: stick-table: make the file location point to a global file name
    - MINOR: proxy: use the global file names for conf->file
    - CLEANUP: cfgparse: factor proxy vs log-forward collisions
    - BUG/MINOR: cfgparse: detect another uncaught case of duplicate defaults
    - MINOR: proxy: add a list of orphaned defaults sections
    - MEDIUM: cfgparse: drop duplicate named defaults sections after use
    - OPTIM: cfgparse: speed up duplicate server detection
    - MEDIUM: cfgparse: warn about deprecated use of duplicate server names
    - BUG/MINOR: server: shut down streams under thread isolation
    - BUG/MINOR: proxy: also make the cli and resolvers use the global name
    - REGTESTS: log: fix log-profile.vtc
    - MEDIUM: mailers: warn about deprecated legacy mailers
    - BUG/MEDIUM: cli: Be sure to catch immediate client abort
    - DEV: flags/applet: decode appctx flags
    - BUG/MEDIUM: cli: Deadlock when setting frontend maxconn
    - MINOR: log: fix indent in strm_log()
    - MINOR: log: introduce extra log profile steps
    - MINOR: log: handle extra log origins in _process_send_log_override()
    - MINOR: log: introduce log_orig flags
    - MINOR: log: explicitly handle extra log origins as error when relevant
    - MINOR: log: support extra log origins for '%OG' alias
    - MINOR: proxy: add log_steps struct member
    - MINOR: log: introduce "log-steps" proxy keyword
    - MINOR: log: add log_orig_proxy() helper function
    - MEDIUM: log: consider log-steps proxy setting for existing log origins
    - DOC: config: document proxy "log-steps" keyword
    - REGTESTS: add a test for proxy "log-steps"
    - Revert "BUG/MINOR: server: shut down streams under thread isolation"
    - MINOR: task: define two new one-shot events for use with WOKEN_OTHER or MSG
    - BUG/MEDIUM: stream: make stream_shutdown() async-safe
    - BUG/MINOR: server: make sure the HMAINT state is part of MAINT
    - BUG/MINOR: queue: make sure that maintenance redispatches server queue
    - MINOR: server: make srv_shutdown_sessions() call pendconn_redistribute()
    - BUILD: tools: only include execinfo.h for the real backtrace() function
    - MINOR: tools: do not attempt to use backtrace() on linux without glibc
    - OPTIM: channel: speed up co_getline()'s search of the end of line
    - OPTIM: stconn: Don't pretend mux have more data to deliver on EOI/EOS/ERROR
    - BUG/MINOR: mcli: Pretend the mux have more data to deliver between two commands
    - MINOR: action: Export release_expr_int_action() release function
    - MINOR: stream: Rely on a per-stream max connection retries value
    - MINOR: stream: Support dynamic changes of the number of connection retries
    - MINOR: stream/stats: Expose the current number of streams in stats
    - MINOR: stream/stats: Expose the total number of streams ever created in stats
    - BUG/MINOR: cfgparse-global: fix allowed args number for setenv
    - MINOR: cfgparse-global: add dedicated parser for *env keywords
    - MINOR: mux-quic: complete Tx infos for QCS dump
    - MINOR: quic: ensure txbuf realloc is only performed on empty buffer
    - MINOR: mux-quic: strengthen qcs_send_metadata() usage
    - MINOR: quic: remove unneeded notification of txbuf room
    - MINOR: quic: refactor MUX send notification
    - MEDIUM: quic: strengthen MUX send notification
    - MINOR: quic: refactor STREAM room notification
    - MINOR: quic: do not remove qc_stream_desc automatically on ACK handling
    - MINOR: quic: store streambuf in a streamdesc tree
    - MINOR: quic: move buffered ACK to streambuf
    - MEDIUM: quic: handle out-of-order ACK at streamdesc layer
    - MEDIUM: quic: refactor buffered STREAM ACK consuming
    - BUG/MEDIUM: queue: always dequeue the backend when redistributing the last server
    - MINOR: config/trace: Add a 'traces' section to declare debug traces
    - MINOR: trace: Be able to chain commands for a source in one line
    - MINOR: tcpcheck: Add support for an option host header value for httpchk option
    - BUG/MINOR: mux-h1: Fix condition to set EOI on SE during zero-copy forwarding
    - MINOR: mux-h1: Use a dedicated function to conditionnaly set EOI flag on SE
    - BUG/MINOR: http-ana: Disable fast-fwd for unfinished req waiting for upgrade
    - BUG/MINOR: mux-quic: fix crash on qcc_init() early return
    - BUG/MINOR: quic: fix trace on releasing STREAM frame after ack
2024-10-03 17:47:33 +02:00
Amaury Denoyelle
b74df9fbc9 BUG/MINOR: quic: fix trace on releasing STREAM frame after ack
Fix NULL argument passed to qc_release_frm(). This provides more
context in the traces inside it. Note that no crash occurred as QUIC
traces always check validity on first arg before dereferencing it.

No backport needed.
2024-10-02 17:10:51 +02:00
Amaury Denoyelle
58b7a72d07 BUG/MINOR: mux-quic: fix crash on qcc_init() early return
qcc_release() may be used in case qcc_init() cannot complete. In this
case, connection instance is NULL. As such, it cannot be dereferenced
without testing it first.

This should fix github coverity report #2739.

No backport needed.
2024-10-02 17:06:31 +02:00
Christopher Faulet
cea1379cf1 BUG/MINOR: http-ana: Disable fast-fwd for unfinished req waiting for upgrade
If a request is waiting for a protocol upgrade but it is not finished, the
data fast-forwarding is disabled. Otherwise, the request analyzers will miss
the end of the message.

This case is possible since the commit 01fb1a54 ("BUG/MEDIUM: mux-h1/mux-h2:
Reject upgrades with payload on H2 side only"). Indeed, before, a protocol
upgrade was not allowed for request with payload. But it is now possible and
this comes with a side-effect. It is not really satisfying but for now there
is no other way to sync the muxes and the applicative stream. It seems to be
a reasonable fix for now, waiting for a deeper refactoring.

This patch must be backported with the commit above.
2024-10-02 10:31:40 +02:00
Christopher Faulet
267ba1d889 MINOR: mux-h1: Use a dedicated function to conditionnaly set EOI flag on SE
The same conditions are evaluated in h1_process_demux() and h1_fastfwd() to
know if SE_FL_EOI flag must be set or not on the sedesc. So now, a dedicated
function is used.
2024-10-02 10:22:51 +02:00
Christopher Faulet
6b39e245e1 BUG/MINOR: mux-h1: Fix condition to set EOI on SE during zero-copy forwarding
During zero-copy data forwarding, the producer must set the EOI flag on the SE
when end of the message is reached. It is already done but there is a case where
this flag is set while it should not. When a request wants to perform a protocol
upgrade and it is waiting for the server response, the flag must not be set
because the HTTP message is finished but some data are possibly still expected,
depending on the server response. On a 101-switching-protocol, more data will be
sent because the producer is switched to TUNNEL state.

So, now, the right condition is used. In DONE state, SE_FL_EOI flag is set on the sedesc iff:

  - it is the response
  - it is the request and the response is also in DONE state
  - it is a request but not a protocol upgrade nor a CONNECT
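
The three cases above can be written as a single predicate. The sketch
below is an illustration only, not the mux-h1 code: the h1_view type, its
fields and must_set_eoi() are hypothetical names, and the message being
evaluated is assumed to already be in DONE state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of the message which just reached DONE state. */
struct h1_view {
    bool is_response;     /* evaluating the response side */
    bool response_done;   /* for a request: response also in DONE state */
    bool is_upgrade;      /* request asks for a protocol upgrade */
    bool is_connect;      /* request uses the CONNECT method */
};

/* Tell if the EOI flag must be reported for a message in DONE state. */
static bool must_set_eoi(const struct h1_view *m)
{
    assert(m != NULL);

    if (m->is_response)
        return true;                      /* it is the response */
    if (m->response_done)
        return true;                      /* request, response is DONE too */
    return !m->is_upgrade && !m->is_connect; /* plain request */
}
```

A request waiting for the server's answer to an upgrade thus does not get
the flag until the response is known.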

This patch must be backported as far as 2.9.
2024-10-02 10:22:51 +02:00
Christopher Faulet
27ee292731 MINOR: tcpcheck: Add support for an option host header value for httpchk option
Support for headers and body hidden in the version for the "option httpchk"
directive was removed. However a Host header is mandatory for HTTP/1.1
requests and some servers may return an error if it is not set. For now, to
add it, an "http-check send" rule must be added. But it is not really handy
to use an extra config line for this purpose.

So now, it is possible to set the host header value, a log-format string, as
extra argument to "option httpchk" directive. It must be the fourth argument:

  option httpchk GET / HTTP/1.1 www.srv.com

While this patch is not a bug fix, it is simple enough to be backported if
necessary. On 2.9 and older, lf_init_expr() does not exist and LIST_INIT() must
be used instead.
2024-10-02 10:22:51 +02:00
Christopher Faulet
c39c351a73 MINOR: trace: Be able to chain commands for a source in one line
In the configuration file or on the CLI, configuring traces for a specific
source is a bit painful because this must be done in several lines. Thanks
to this patch, it is now possible to fully configure traces for a source in
one line. For instance, the following on the CLI:

  trace h1 sink stderr; trace h1 level developer; trace h1 verbosity complete; trace h1 start now

can now be replaced by:

  trace h1 sink stderr level developer verbosity complete start now

The same is true for the 'trace' directives in the configuration file.
2024-10-02 10:22:51 +02:00
Christopher Faulet
15a520d474 MINOR: config/trace: Add a 'traces' section to declare debug traces
It is no longer supported to declare debug traces, via 'trace' directive, in
a global section. A 'traces' section must be used instead. The syntax of
the 'trace' directive in these sections remains the same. But it is no
longer experimental.

The main reason for this change is to avoid having a ring section defined
before a global one. Indeed, for now, forward declarations of ring sections
are not supported. So to configure traces, you had to add a ring section
before the global one defining the traces. Most of the time, that meant
having two global sections:

  global
    [...] # global settings

  ring <name>
    [...]

  global
    [...] # trace config

In addition, it will be possible to easily extend the traces section by
adding some new directives.
2024-10-02 10:22:51 +02:00
Willy Tarreau
53f52e67a0 BUG/MEDIUM: queue: always dequeue the backend when redistributing the last server
An interesting bug was revealed by commit 5541d4995d ("BUG/MEDIUM: queue:
deal with a rare TOCTOU in assign_server_and_queue()"). When shutting
down a server to redistribute its connections, no check is made on the
backend's queue. If we're turning off the last server and the backend
has pending connections, these ones will wait there till the queue
timeout. But worse, since the commit above, we can enter an endless loop
in the following situation:

  - streams are present in the backend's queue
  - streams are purged on the last server via srv_shutdown_streams()
  - that one calls pendconn_redistribute(srv) which does not purge
    the backend's pendconns
  - a stream performs some load balancing and enters assign_server_and_queue()
  - assign_server() is called in turn
  - the LB algo is non-deterministic and there are entries in the
    backend's queue. The function notices it and returns SRV_STATUS_FULL
  - assign_server_and_queue() calls pendconn_add() to add the connection
    to the backend's queue
  - on return, pendconn_must_try_again() is called, it figures there's
    no stream served anymore on the server nor the proxy, so it removes
    the pendconn from the queue and returns 1
  - assign_server_and_queue() loops back to the beginning to try again,
    while the conditions have not changed, resulting in an endless loop.

Ideally a change count should be used in the queues so that it's possible
to detect that some dequeuing happened and/or that a last stream has left.
But that wouldn't completely solve the problem that is that we must never
ever add to a queue when there's no server streams to dequeue the new
entries.

The current solution consists in making pendconn_redistribute() take care
of the proxy after the server in case there's no more server available on
the proxy. It at least ensures that no pending streams are left in the
backend's queue when shutting streams down or when the last server goes
down. The try_again loop remains necessary to deal with inevitable races
during pendconn additions. It could be limited to a few rounds, though,
but it should never trigger if the conditions are sufficient to permit
it to converge.

One way to reproduce the issue is to run a config with a single server
with maxconn 1 and plenty of threads, then run in loops series of:

 "disable server px/s;shutdown sessions server px/s;
  wait 100ms server-removable px/s; show servers conn px;
  enable server px/s"

on the CLI at ~10/s while injecting with around 40 concurrent conns at
40-100k RPS. In this case in 10s - 1mn the crash can appear with a
backtrace like this one for at least 1 thread:

  #0  pendconn_add (strm=strm@entry=0x17f2ce0) at src/queue.c:487
  #1  0x000000000064797d in assign_server_and_queue (s=s@entry=0x17f2ce0) at src/backend.c:1064
  #2  0x000000000064a928 in srv_redispatch_connect (s=s@entry=0x17f2ce0) at src/backend.c:1962
  #3  0x000000000064ac54 in back_handle_st_req (s=s@entry=0x17f2ce0) at src/backend.c:2287
  #4  0x00000000005ae1d5 in process_stream (t=t@entry=0x17f4ab0, context=0x17f2ce0, state=<optimized out>) at src/stream.c:2336

It's worth noting that other threads may often appear waiting after the
poller and one in server_atomic_sync() waiting for isolation, because
the event that is processed when shutting the server down is consumed
under isolation, and having fewer threads available to dequeue remaining
requests increases the probability to trigger the problem, though it is
not at all necessary (some less common traces never show them).

This should carefully be backported wherever the commit above was
backported.
2024-10-01 18:57:51 +02:00
Amaury Denoyelle
8d68717a41 MEDIUM: quic: refactor buffered STREAM ACK consuming
For the moment, streamdesc layer can only deal with in-order ACK at the
stream level. Received out-of-order ACKs are buffered in a tree attached
to a streambuf instance.

Previously, caller of qc_stream_desc_ack() was responsible to implement
consumption of these buffered ACKs. Refactor this by implementing it
directly at the streamdesc layer within qc_stream_desc_ack(). This
simplifies quic_rx ACK handling and ensures buffered ACKs are consumed as
soon as possible.
2024-10-01 16:22:23 +02:00
Amaury Denoyelle
cc4384aeb7 MEDIUM: quic: handle out-of-order ACK at streamdesc layer
qc_stream_desc_ack() is the entrypoint for streamdesc layer to handle a
new acknowledgement of previously emitted STREAM data.

Previously, it was only able to deal with in-order ACK offset. The
caller was responsible to buffer out-of-order ACKs. Change this by
dealing with the latter case directly in qc_stream_desc_ack(). This
notably simplifies ACK handling in quic_rx module.
2024-10-01 16:22:20 +02:00
Amaury Denoyelle
62558a9285 MINOR: quic: move buffered ACK to streambuf
QUIC streamdesc layer is used to manage QUIC MUX stream txbuf data
storage until acknowledgment. Currently, it only supports in-order
acknowledgment at the stream level. This requires to be able to buffer
out-of-order ACKs until they can be handled.

Previously, these ACKs were stored in a tree attached to the streamdesc
instance. Move this indexed storage to the streambuf instance.

This commit is purely an architecture change. However, it will allow to
extend ACK management in future patches, such as the ability to merge
overlapping out-of-order ACKs.
2024-10-01 16:19:42 +02:00
Amaury Denoyelle
943e48dadd MINOR: quic: store streambuf in a streamdesc tree
qc_stream_desc layer is used by QUIC MUX to store emitted STREAM data
until their acknowledgement. Each stream with Tx capability can allocate
its own qc_stream_desc. In turn, each stream desc can have one or
multiple data buffers. This is useful when a MUX stream releases a
buffer and allocates a new one, to preserve bandwidth without waiting to
receive all acknowledgements of the previous buffer.

Each buffer is encapsulated in a qc_stream_buf structure. Previously, it
was stored as a list into qc_stream_desc. Change this storage to use a
tree instead. Each buffer is indexed by its offset.

This commit does not introduce functional changes. However, this
rearchitecture will be necessary for future commits to extend ACK
management which requires fetching individual buffer instances, not just
the first or last element of a streamdesc, by their offset.
2024-10-01 16:19:41 +02:00
Amaury Denoyelle
f4a83fbb14 MINOR: quic: do not remove qc_stream_desc automatically on ACK handling
qc_stream_desc_ack() is used to handle ACK received for STREAM frame. It
removes acknowledged data from their underlying buffer.

If all data were removed after ACK handling, qc_stream_desc instance
would automatically be freed at the end of qc_stream_desc_ack().
However, this renders the function complicated to use. Simplify this by
removing this automatic removal. Now, caller is responsible to check
after ACK handling if qc_stream_desc instance can be removed. This is
easily done using qc_stream_desc_done() helper.
2024-10-01 16:19:25 +02:00
Amaury Denoyelle
db68f8ed86 MINOR: quic: refactor STREAM room notification
qc_stream_desc is an intermediary layer between QUIC MUX and quic_conn.
It is a facility which permits to store data to emit and keep them for
retransmission until acknowledgment. This layer is responsible to notify
QUIC MUX each time a buffer is freed. This is necessary as MUX buffer
allocation is limited by the underlying congestion window size.

Refactor this to use a mechanism similar to send notification. A new
callback notify_room can now be registered to qc_stream_desc instance.
This is set by QUIC MUX to qmux_ctrl_room(). On MUX QUIC free, special
care is now taken to reset notify_room callback to NULL.

Thanks to this refactoring, further adjustment have been made to refine
the architecture. One of them is the removal of qc_stream_desc
QC_SD_FL_OOB_BUF, which is now converted to a MUX layer flag
QC_SF_TXBUF_OOB.
2024-10-01 16:19:25 +02:00
Amaury Denoyelle
d7f4e5abf0 MEDIUM: quic: strengthen MUX send notification
Previous commit implemented a refactor of MUX send notification from
quic_conn layer. With this new architecture, a proper callback is
defined for each qc_stream_desc instance.

This architecture change allows to simplify notification from quic_conn
layer. First, ensure the MUX callback properly ignores retransmission
of an already emitted frame. Luckily, this can be handled easily by
comparing offsets and FIN status. Also, each QCS instance can now be
unregistered from send notification just prior to qc_stream_desc release.
This ensures a QCS is never manipulated from quic_conn after its
emission ending. Both these changes render the send notification more
robust. As a nice effect, flag QUIC_FL_CONN_TX_MUX_CONTEXT can be
removed as it is now unneeded.
2024-10-01 16:19:25 +02:00
Amaury Denoyelle
6ad99af0a9 MINOR: quic: refactor MUX send notification
For STREAM emission, MUX QUIC generates one or several frames and emits
them via qc_send_mux(). Lower layer may use them as-is, or split them into
smaller chunks to fit in a QUIC packet. It is then responsible to notify
the MUX to report the amount of data sent.

Previously, this was done via a direct call from quic_conn to MUX using
qcc_streams_sent_done(). Modify this to have a better isolation across
layers. Define a send callback handled by the qc_stream_desc instance.
This allows the MUX to register each QCS instance individually to the
renamed qmux_ctrl_send() which replaces qcc_streams_sent_done().

At quic_conn layer, qc_stream_desc_send() can be used now. This is a
wrapper to qc_stream_desc layer to invoke the send callback if
registered.

This mechanism of qc_stream_desc callback should be extended later to
implement other notifications across the QUIC stack.
2024-10-01 16:19:25 +02:00
Amaury Denoyelle
4859d8e71d MINOR: quic: remove unneeded notification of txbuf room
When a stream buffer is freed, qc_stream_desc notifies the MUX. This is useful
if MUX is waiting for Tx buffer allocation.

Remove this notification in qc_stream_desc(). This is because the
function is called when all stream data have been acknowledged and thus
notified. This function can also be called with some data
unacknowledged, but in this case this is only true just before
connection closure. As such, it is useless to notify the MUX in this
condition.
2024-10-01 16:19:25 +02:00
Amaury Denoyelle
12782da020 MINOR: mux-quic: strengthen qcs_send_metadata() usage
This function is reserved for QCS instance where no data was emitted.
A BUG_ON() ensures this by checking that streamdesc buf_list is empty.

However, this condition would not be enough if data were previously
emitted but already fully acknowledged. Thus, extend the condition by
also checking the streamdesc ack_offset is 0.
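
The extended condition can be sketched as below. The structure and helper
are hypothetical simplifications (the buffer list is reduced to a counter);
they only illustrate why an empty buffer list alone is not sufficient.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced view of a stream descriptor. */
struct sd_sketch {
    size_t nb_bufs;      /* number of buffers still held in the list */
    uint64_t ack_offset; /* offset up to which data was acknowledged */
};

/* Returns 1 only if no data was ever emitted on this stream:
 * no buffer is pending AND nothing was acknowledged so far.
 */
static int sd_never_emitted(const struct sd_sketch *sd)
{
    return sd->nb_bufs == 0 && sd->ack_offset == 0;
}
```

A stream whose data were all sent and acknowledged has an empty list but a
non-zero ack_offset, so only the combined check rejects it.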
2024-10-01 16:17:03 +02:00
Amaury Denoyelle
fdc16c1e01 MINOR: quic: ensure txbuf realloc is only performed on empty buffer
QUIC application protocol layer has the ability to either allocate a
standard buffer or a smaller one. The latter is useful when only small
data are transferred to prevent consuming too much of the QUIC MUX
buffer window.

This operation is performed using qc_stream_buf_realloc(). Add a new
BUG_ON() in it to ensure no data is present in the buffer. Indeed, this
would cause data loss, or even a crash when trying to acknowledge data.

Note that for the moment qc_stream_buf_realloc() is only used for HTTP/3
headers transmission, and this usage conforms to the new BUG_ON. This
commit is thus not a bug fix, but only to strengthen the API.
2024-10-01 11:51:51 +02:00
Amaury Denoyelle
172404a8ec MINOR: mux-quic: complete Tx infos for QCS dump
Complete debug info when a QCS instance is dumped either on traces or
show quic. Display the value of Tx offset both soft and real, along with
the current flow-control limit.
2024-10-01 11:51:51 +02:00
Valentine Krasnobaeva
f18b52cc80 MINOR: cfgparse-global: add dedicated parser for *env keywords
This commit prepares the config parser to support MODE_DISCOVERY and, thus,
the refactored master-worker mode. The latter implies that the master process
reads only the 'DISCOVERY' tagged keywords from the global section, and it
must call an appropriate keyword parser for this.

So, let's move the code, which parses *env keywords, from the global section
parser to its own keyword registered parser.
2024-10-01 10:37:29 +02:00
Valentine Krasnobaeva
df68f7ec96 BUG/MINOR: cfgparse-global: fix allowed args number for setenv
Keywords setenv and presetenv take 2 arguments: variable name and value.
So the number that should be passed to alertif_too_many_args is 2
("setenv <name> <value>") instead of 3. For alertif_too_many_args the first
argument index is 0.
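
The off-by-one can be illustrated with a simplified stand-in for the check.
too_many_args() below is hypothetical and is not the real
alertif_too_many_args(); it assumes args[0] holds the keyword and that the
word list ends with an empty string.

```c
/* Returns 1 if a non-empty word follows the last allowed argument.
 * args[0] is the keyword itself; the list ends with an empty string.
 */
static int too_many_args(int maxarg, const char *const *args)
{
    int i;

    /* Stop at the terminating empty string so we never read past it. */
    for (i = 0; i <= maxarg; i++)
        if (!*args[i])
            return 0;

    return *args[maxarg + 1] != '\0';
}
```

With a limit of 3, a line such as "setenv NAME value junk" would pass the
check silently, while a limit of 2 reports the extra word.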

This should be backported in all stable versions.
2024-10-01 10:35:09 +02:00
Christopher Faulet
273d322b6f MINOR: stream/stats: Expose the total number of streams ever created in stats
A shared counter is added in the thread context to track the total number of
streams created on the thread. This number is then reported in stats. It
will be a useful information to diagnose some bugs.
2024-09-30 16:55:53 +02:00
Christopher Faulet
18ee22ff76 MINOR: stream/stats: Expose the current number of streams in stats
A shared counter is added in the thread context to track the current number
of streams. This number is then reported in stats. It will be a useful
information to diagnose some bugs.
2024-09-30 16:55:53 +02:00
Christopher Faulet
6a94b7419e MINOR: stream: Support dynamic changes of the number of connection retries
Thanks to the previous patch, it is now possible to add an action to
dynamically change the maximum number of connection retries for a stream.
"set-retries" action may now be used to do so, from a "tcp-request content"
or a "http-request" rule. This action accepts an expression or an integer
between 0 and 100. The integer value is checked during the configuration
parsing and leads to an error if it is not in the expected range. However,
for the expression, the value is retrieved at runtime. So, invalid values are
just ignored.

Too high a value is forbidden to avoid any trouble. 100 retries already seems
to be an amazingly high value. In addition, the option is only available on
backend or listen sections.

Because the max retries is limited to 100 at most, it can be stored as a
unsigned short. This saves some space in the stream structure.
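
The two validation paths can be sketched as follows. The helper names and
the MAX_RETRIES_LIMIT macro are hypothetical; only the 0..100 range, the
parse-time error and the runtime "ignore" behavior come from the
description above.

```c
#define MAX_RETRIES_LIMIT 100 /* upper bound stated in the commit message */

/* Parse-time check for a literal integer argument.
 * Returns 0 if the value is acceptable, -1 to report a config error.
 */
static int check_retries_literal(long value)
{
    return (value >= 0 && value <= MAX_RETRIES_LIMIT) ? 0 : -1;
}

/* Runtime path for an expression result: an out-of-range value is
 * silently ignored and the current per-stream setting is kept.
 */
static void apply_retries_expr(unsigned short *max_retries, long value)
{
    if (value < 0 || value > MAX_RETRIES_LIMIT)
        return;
    *max_retries = (unsigned short)value;
}
```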
2024-09-30 16:55:53 +02:00
Christopher Faulet
91e785edc9 MINOR: stream: Rely on a per-stream max connection retries value
Instead of directly relying on the backend parameter to limit the number of
connection retries, we now use a per-stream value. This value is by default
inherited from the backend value when it is set. So for now, there is no
change except the stream value is used instead of the backend value. But
thanks to this change, it will be possible to dynamically change this value.
2024-09-30 16:55:53 +02:00
Christopher Faulet
0d91de2be4 MINOR: action: Export release_expr_int_action() release function
This function was only used by TCP actions and was private to tcp_act.c
file. However, it makes sense to make it public to be used by any action
relying on an int-or-expression argument.
2024-09-30 16:55:53 +02:00
Christopher Faulet
688abb6f30 BUG/MINOR: mcli: Pretend the mux have more data to deliver between two commands
Since the commit "OPTIM: stconn: Don't pretend mux have more data to deliver
on EOI/EOS/ERROR", the SC no longer pretends its mux has more data to
deliver when one of the EOI/EOS/ERROR flags is set on its sedesc.

However, for the master cli, it is an issue because any EOI/EOS at the end
of a command is in fact detected on the attempt to get the next command. To
do so, the stream is reset. Because of the commit above, the next receive
is never performed. To fix the issue, when the stream is reset, the front SC
pretends its mux has more data to deliver.

This patch must only be backported if the commit above is backported.
2024-09-30 16:55:53 +02:00
Christopher Faulet
bca5e14235 OPTIM: stconn: Don't pretend mux have more data to deliver on EOI/EOS/ERROR
Doing some benchs on the 3.0, we encountered a small loss on requests/sec on
small objects compared to 2.8. After bisecting the issue, it appeared
that this was introduced when the mux-to-mux zero-copy data forwarding was
implemented in 2.9-dev8. Extra subscribes on receives at the end of the
message were responsible for the loss.

A basic configuration, sending H2 requests to a H1 server returning
responses without payload is enough to observe the issue. With the following
command, we can observe a huge increase of epoll_ctl calls on 2.9/3.x:

  h2load -c 100 -m 10 -n 100000 http://...

On 2.8 we have around 3200 calls to epoll_ctl against more than 20k on 3.1.

The fix seems obvious. After a receive, there is no reason to state that a
mux has more data to deliver if an EOI/EOS/ERROR flag was set on the
stream-endpoint descriptor. With this change, extra calls to epoll_ctl
disappear. However it is a sensitive part so it is important to keep an eye
on it and to not backport it.

Thanks to Willy and Emeric for spotting the issue.
2024-09-30 16:55:48 +02:00
Willy Tarreau
11051ed9c7 OPTIM: channel: speed up co_getline()'s search of the end of line
Previously, co_getline() was essentially used for occasional parsing
in peers' banner or Lua, so it could afford to read one character at
a time. However now it's also used on the TCP log path, where it can
consume up to 40% CPU as mentioned in GH issue #2731. Let's speed it
up by using memchr() to look for the LF, and copying the data at once
using memcpy().

Previously it would take 2.44s to consume 1 GB of log on a single
thread of a Core i7-8650U, now it takes 1.56s (-36%).
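
The technique can be sketched on a plain contiguous buffer as below.
copy_line() is a hypothetical helper, not co_getline() itself: the real
function works on a channel's output buffer and has its own return
conventions, which this sketch does not try to reproduce.

```c
#include <stddef.h>
#include <string.h>

/* Copies the first line of <src>, including its LF, into <dst>.
 * Returns the number of bytes copied, or 0 if no LF is found within
 * the first min(src_len, dst_size) bytes.
 */
static size_t copy_line(char *dst, size_t dst_size,
                        const char *src, size_t src_len)
{
    size_t max = src_len < dst_size ? src_len : dst_size;
    const char *lf = memchr(src, '\n', max); /* one scan instead of a byte loop */
    size_t n;

    if (!lf)
        return 0;

    n = (size_t)(lf - src) + 1; /* include the LF itself */
    memcpy(dst, src, n);        /* one bulk copy instead of per-byte stores */
    return n;
}
```

Both memchr() and memcpy() are typically vectorized by the C library, which
is where a gain over a character-at-a-time loop would come from.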
2024-09-30 11:36:39 +02:00
Willy Tarreau
7caf073faa MINOR: tools: do not attempt to use backtrace() on linux without glibc
The function is provided by glibc. Nothing prevents us from using our
own outside of glibc there (tested on aarch64 with musl). We still do
not enable it by default as we don't yet know if all archs work well,
but it's sufficient to pass USE_BACKTRACE=1 when building with musl to
verify it's OK.
2024-09-29 09:52:23 +02:00
Willy Tarreau
1c4776dbc3 BUILD: tools: only include execinfo.h for the real backtrace() function
No need to include this possibly non-existing file when using our own
backtrace() implementation, it's only needed for the libc-provided one.
Because of this it's currently not possible to build with musl when
backtrace is enabled.
2024-09-29 09:52:23 +02:00
Willy Tarreau
1d403caf8a MINOR: server: make srv_shutdown_sessions() call pendconn_redistribute()
When shutting down server sessions, the queue was not considered, which
is a problem if some element reached the queue at the moment the server
was going down, because there will be no more requests to kick them out
of it. Let's always make sure we scan the queue to kick these streams
out of it and that they can possibly find a more suitable server. This
may make a difference in the time it takes to shut down a server on the
CLI when lots of servers are in the queue.

It might be interesting to backport this to 3.0 but probably not much
further.
2024-09-27 19:01:38 +02:00
Willy Tarreau
1385e33eb0 BUG/MINOR: queue: make sure that maintenance redispatches server queue
Turning a server to maintenance currently doesn't redispatch the server
queue unless there's an explicit "option redispatch" and no "option
persist", while the former has never really been the purpose of this
test. Better refine this so that forced maintenance also causes the
queue to be flushed, and possibly redispatched unless the proxy has
option persist. This way now when turning a server to maintenance,
the queue is immediately flushed and streams can decide what to do.

This can be backported, though there's no need to go far since it was
never directly reported and only noticed as part of debugging some
rare "shutdown sessions" strangeness, to which it might contribute.
2024-09-27 18:54:07 +02:00
Willy Tarreau
a4d04c649a BUG/MINOR: server: make sure the HMAINT state is part of MAINT
In 1.8 when adding "set server fqdn" with commit b418c1228c ("MINOR:
server: cli: Add server FQDNs to server-state file and stats socket."),
the HMAINT flag was not made part of the MAINT ones, so technically
speaking when changing the FQDN, the server is not completely considered
as in maintenance mode.

In its defense, the code location around that was completely messy, with
the aggregator flag being hidden between other values and purposely but
discretely ignoring one of the flags, so the comments were updated to
make the intent clearer (particularly regarding CMAINT which looked like
it was also forgotten while it was on purpose).

This can be backported anywhere.
2024-09-27 18:40:15 +02:00
Willy Tarreau
b8e3b0a18d BUG/MEDIUM: stream: make stream_shutdown() async-safe
The solution found in commit b500e84e24 ("BUG/MINOR: server: shut down
streams under thread isolation") to deal with inter-thread stream
shutdown doesn't work well because there exist code paths involving
a server lock which can then deadlock on thread_isolate(). A better
solution then consists in deferring the shutdown to the stream itself
and just wake it up for that.

The only thing is that TASK_WOKEN_OTHER is a bit too generic and we
need to pass at least 2 types of events (SF_ERR_DOWN and SF_ERR_KILLED),
so we're now leveraging the new TASK_F_UEVT1 and _UEVT2 flags on the
task's state to convey this info. The caller only needs to wake the
task up with these flags set, and the stream handler will then finish
the job locally using stream_shutdown_self().

This needs to be carefully backported to all branches affected by the
dequeuing issue and containing any of the 5541d4995d ("BUG/MEDIUM:
queue: deal with a rare TOCTOU in assign_server_and_queue()"), and/or
b11495652e ("BUG/MEDIUM: queue: implement a flag to check for the
dequeuing").
2024-09-27 12:15:41 +02:00
Willy Tarreau
b5281283bb MINOR: task: define two new one-shot events for use with WOKEN_OTHER or MSG
TASK_WOKEN_MSG only says "someone sent you a message" but doesn't convey
any info about the message. TASK_WOKEN_OTHER says "you're woken for another
reason" but doesn't tell which one. Most often they're used as-is by the
task handlers to report very specific situations.

For some important control notifications, having the ability to modulate
the message a little bit is useful, so let's define two user event types
UEVT1 and UEVT2 to be used in conjunction with TASK_WOKEN_MSG or _OTHER
so that the application can know that a specific condition was explicitly
requested. It will be used this way:

  task_wakeup(s->task, TASK_WOKEN_MSG | TASK_F_UEVT1);
or:
  task_wakeup(s->task, TASK_WOKEN_OTHER | TASK_F_UEVT2);

Since events are cumulative, keep in mind not to consider a 3rd value
as the combination of UEVT1+UEVT2; these really mean that the two events
appeared (though in unspecified order).
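
The cumulative behavior can be sketched with a toy task structure. All
names and bit values below are illustrative assumptions and do not match
the real task.h definitions; the point is only that a handler must test
each event bit on its own.

```c
#include <stdint.h>

/* Illustrative bit values only; NOT the real HAProxy definitions. */
#define SK_WOKEN_OTHER 0x01u
#define SK_F_UEVT1     0x10u
#define SK_F_UEVT2     0x20u

/* Hypothetical minimal task: only the state word matters here. */
struct sk_task {
    uint32_t state;
};

/* Wake-ups accumulate: flags are OR-ed into the pending state. */
static void sk_wakeup(struct sk_task *t, uint32_t flags)
{
    t->state |= flags;
}

/* Handler side: check each user event independently, then clear the
 * pending bits. Returns the set of user events that were seen.
 */
static uint32_t sk_process(struct sk_task *t)
{
    uint32_t seen = 0;

    if (t->state & SK_F_UEVT1)
        seen |= SK_F_UEVT1; /* first condition was requested */
    if (t->state & SK_F_UEVT2)
        seen |= SK_F_UEVT2; /* second condition was requested */

    t->state &= ~(SK_WOKEN_OTHER | SK_F_UEVT1 | SK_F_UEVT2);
    return seen;
}
```

If two wake-ups land before the handler runs, both bits are set at once,
and the order in which they were raised can no longer be recovered.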
2024-09-27 11:56:10 +02:00
Willy Tarreau
d1c398b786 Revert "BUG/MINOR: server: shut down streams under thread isolation"
This reverts commit b500e84e24fd19ccbcdf4fae5165aeb07e46bd67.

Thread isolation does not work well for this, there exist code paths
which already hold the server's lock and result in a deadlock. Let's
revert that and address it better without isolation.
2024-09-27 10:17:31 +02:00