Commit Graph

23468 Commits

Author SHA1 Message Date
Willy Tarreau c329bfe3f5 BUILD: makefile: reorder object files by build time
mux_spop is quite long to build and was at the end. The rest did not
change much, but the build time is now dominated by hlua.o and mux_h2.o
and by a large margin. On the 80-core ARM mux_h2.o is present from
beginning to end and on the PC it's hlua.o, so both might have to be
split at some point to benefit from multi-core.

Nevertheless, the changes allowed to shrink about one second out of
the 18 it was taking on that machine.
2024-11-20 18:49:56 +01:00
Willy Tarreau f16edcd34c BUILD: makefile: build flags.c before haproxy to speed up the build
The end of the build is often super slow. In practice it's flags.o that
now takes ages (3.4 seconds) and blocks everything on a single core at
the end. Let's declare it before the haproxy target so that it starts
earlier. On a quad-2.2 GHz CPU, the build time goes down from 44 to 42s
and the end feels less painful.
2024-11-20 18:49:56 +01:00
Christopher Faulet 667ac8acc6 DOC: management: Clearly state "show errors" only reports malformed H1 messages
For now, only the H1 multiplexer is able to capture malformed messages. So
it is better to update the management guide accordingly to avoid any
confusion.
2024-11-20 18:08:17 +01:00
Christopher Faulet e863d8d681 DOC: config: Improve documentation of tune.http.maxhdr directive
The description was inproved to clrealy mentionned it is applied on received
requests and responses. In addition, a comment was added about HTTP/2 and
HTTP/3 limitation when messages are encoded to be sent.
2024-11-20 18:02:36 +01:00
Christopher Faulet 3bd9a9e7d7 BUG/MEDIUM: h3: Increase max number of headers when sending headers
In the same way than for the H2, the maximum number of headers that can be
encoded when headers are sent must be increased to match the limit imposed
when they are received.

Reasons are the sames. On receive path, the maximum number of headers
accepted must be higher than the configured limit to be able to handle
pseudo headers and cookies headers. On the sending path, the same limit must
be applied because the pseudo headers will consume some extra slots and the
cookie header could be splitted.

This patch should be backported as far as 2.6.
2024-11-20 17:44:22 +01:00
Christopher Faulet 785e633353 BUG/MEDIUM: h3: Properly limit the number of headers received
The number of headers are limited before the decoding but pseudo headers and
cookie headers consume extra slots. In practice, this lowers the maximum number
of headers that can be received.

To workaround this issue, the limit is doubled during the frame decoding to be
sure to have enough extra slots. And the number of headers is tested against the
configured limit after the HTX message was created to be able to report an
error. Unfortunatly no parsing error are reported because the QUIC multiplexer
is not able to do so for now.

The same is performed on trailers to be consistent with H2.

This patch should be backported as far as 2.6.
2024-11-20 17:44:22 +01:00
Christopher Faulet 63d2760dfa BUG/MEDIUM: mux-h2: Check the number of headers in HEADERS frame after decoding
There is no explicit test on the number of headers when a HEADERS frame is
received. It is implicitely limited by the size of the header list. But it
is twice the configured limit to be sure to decode the frame.

So now, a check is performed after the HTX message was created. This way, we
are sure to not exceed the configured limit after the decoding stage. If
there are too many headers, a parsing error is reported.

Note the same is performed on the trailers.

This patch should patially address the issue #2685. It should be backported
to all stable versions.
2024-11-20 17:44:22 +01:00
Christopher Faulet e415e3cb7a BUG/MEDIUM: mux-h2: Increase max number of headers when encoding HEADERS frames
When a HEADERS frame is encoded to be sent, the maximum number of headers
allowed in the frame is lower than on receiving path. This can lead to
report a sending error while the message was accepted. It could be
confusing.

In addition, the start-line is splitted into pseudo-headers and consummes
this way some header slots, increasing the difference between HEADERS frames
encoding and decoding. It is even more noticeable because when a HEADERS
frame is decoded, a margin is used to be able to handle splitted cookie
headers. Concretly, on decoding path, a limit of twice the maxumum number of
headers allowed in a message (tune.http.maxhdr * 2) is used. On encoding
path, the exact limit is used. It is not consistent.

Note that when a frame is decoded, we must use a larger limit because the
pseudo headers are reassembled in the start-line and must count for one. But
also because, most of time, the cookies are splitted into several headers
and are reassembled too.

To fix the issue, the same ratio is applied on sending path. A limit must be
defined because an dynamic allocation is not acceptable. Twice of the
configured limit should be good enough to support headers manipulation.

This patch should be backported to all stable versions.
2024-11-20 17:44:22 +01:00
Frederic Lecaille 349954601f MINOR: quic: add "bbr" new "quic-cc-algo" option
Add this new "bbr" option to the list of the congestion control algorithms which
may be set by "quic-cc-algo" setting.

This new algorithm is considered as experimental and may be enabled only if
"expose-experimental-directive" is set.

Also update the documentation for this new setting.
2024-11-20 17:34:22 +01:00
Frederic Lecaille e778b9a2b6 MINOR: quic: TX part modifications to support BBR.
Very few modifications: call ->on_transmit() and ->drs_on_transmit() congestion
control algorithm (quic_cc) callbacks from qc_send_ppkts() just after having
sents some packets.
2024-11-20 17:34:22 +01:00
Frederic Lecaille 44af88d856 MINOR: quic: RX part modifications to support BBR
qc_notify_cc_of_newly_acked_pkts() aim is to notify the congestion algorithm
of all the packet acknowledgements. It must call quic_cc_drs_update_rate_sample()
to update the delivery rate sampling information. It must also call
quic_cc_drs_on_ack_recv() to update the state of the delivery rate sampling part
used by BBR.
Finally, ->on_ack_rcvd() is called with the total number of bytes delivered
by the sender from the newly acknowledged packets with <bytes_delivered> as
parameter to do so. <pkt_delivered> store the per-packet number of bytes
delivered by the newly sent acknowledged packet (the packet with the highest
packet number). <bytes_lost> is also used and has been set by
qc_packet_loss_lookup() before calling qc_notify_cc_of_newly_acked_pkts().
2024-11-20 17:34:22 +01:00
Frederic Lecaille d85eb127e9 MINOR: quic: quic_loss modifications to support BBR
qc_packet_loss_lookup() aim is to detect the packet losses. This is this function
which must called ->on_pkt_lost() BBR specific callback. It also set
<bytes_lost> passed parameter to the total number of bytes detected as lost upon
an ACK frame receipt for its caller.
Modify qc_release_lost_pkts() to call ->congestion_event() with the send time
from the newest packet detected as lost.
Modify qc_release_lost_pkts() to call ->slow_start() callback only if define
by the congestion control algorithm. This is not the case for BBR.
2024-11-20 17:34:22 +01:00
Frederic Lecaille af75665cb7 MINOR: quic: quic_cc modifications to support BBR
Add several callbacks to quic_cc_algo struct which are only called by BBR.
->get_drs() may be used to retrieve the delivery rate sampling information
from an congestion algorithm struct (quic_cc).
->on_transmit() must be called before sending any packet a QUIC sender.
->on_ack_rcvd() must be called after having received an ACK.
->on_pkt_lost() must be called after having detected a packet loss.
->congestion_event() must be called after any congestion event detection
Modify quic_cc.c to call ->event only if defined. This is not the case
for BBR.
2024-11-20 17:34:22 +01:00
Frederic Lecaille d04adf44dc MINOR: quic: implement BBR congestion control algorithm for QUIC
Implement the version 3 of BBR for QUIC specified by the IETF in this draft:

https://datatracker.ietf.org/doc/draft-ietf-ccwg-bbr/

Here is an extract from the Abstract part to sum up the the capabilities of BBR:

BBR ("Bottleneck Bandwidth and Round-trip propagation time") uses recent
measurements of a transport connection's delivery rate, round-trip time, and
packet loss rate to build an explicit model of the network path. BBR then uses
this model to control both how fast it sends data and the maximum volume of data
it allows in flight in the network at any time. Relative to loss-based congestion
control algorithms such as Reno [RFC5681] or CUBIC [RFC9438], BBR offers
substantially higher throughput for bottlenecks with shallow buffers or random
losses, and substantially lower queueing delays for bottlenecks with deep buffers
(avoiding "bufferbloat"). BBR can be implemented in any transport protocol that
supports packet-delivery acknowledgment. Thus far, open source implementations
are available for TCP [RFC9293] and QUIC [RFC9000].

In haproxy, this implementation is considered as still experimental. It depends
on the newly implemented pacing feature.

BBR was asked in GH #2516 by @KazuyaKanemura, @osevan and @kennyZ96.
2024-11-20 17:34:22 +01:00
Frederic Lecaille 472d575950 MINOR: quic: implement delivery rate sampling algorithm
This patch implements an algorithm which may be used by congestion algorithms
for QUIC to estimate the current delivery rate of a sender. It is at least used
by BBR and could be used by others congestion algorithms as cubic.

This algorithm was specified by an RFC draft here:
https://datatracker.ietf.org/doc/html/draft-cheng-iccrg-delivery-rate-estimation
before being merged into BBR v3 here:
https://datatracker.ietf.org/doc/html/draft-cardwell-ccwg-bbr#section-4.5.2.2
2024-11-20 17:34:22 +01:00
Frederic Lecaille c08b877657 MINOR: window_filter: Implement windowed filter (only max)
Implement the Kathleen Nichols' algorithm used by several congestion control
algorithm implementation (TCP/BBR in Linux kernel, QUIC/BBR in quiche) to track
the maximum value of a data type during a fixe time interval.
In this implementation, counters which are periodically reset are used in place
of timestamps.
Only the max part has been implemented.
(see lib/minmax.c implemenation for Linux kernel).
2024-11-20 17:34:22 +01:00
Frederic Lecaille 7bbe8828ba MINOR: quic: Add the congestion window initial value to QUIC path
Add ->initial_wnd new member to quic_cc_path struct to keep the initial value
of the congestion window. This member is initialized as soon as a QUIC connection
is allocated. This modification is required for BBR congestion control algorithm.
2024-11-20 17:34:22 +01:00
William Lallemand 5ebecbe45b REGTESTS: disable temporarly mworker test on OSX
-Ws on VTest is not working correctly for an unknown reason, the polling
of the NOTIFY_SOCKET seems to timeout, and VTest never receives the
READY message.

This patch disables the reg-tests using -Ws on OS X.
2024-11-20 17:13:59 +01:00
William Lallemand b7d81b3511 REGTESTS: switch to -Ws for master-worker reg-tests
The -W mode implemented in VTest is not reliable anymore, because VTest
waits for the pidfile to be created. But with the new master-worker
mode, this file is created long before haproxy is ready. This can lead
to the test being started too soon, and failing from time to time.

The -Ws option allows to wait for haproxy to deliver a message to VTest
once it is ready.
2024-11-20 17:13:59 +01:00
Aurelien DARRAGON 2ce0db4e4b OPTION: map/hlua: make core.set_map() lookup more efficient
0844bed7d3 ("MEDIUM: map/acl: Improve pat_ref_set() efficiency (for
"set-map", "add-acl" action perfs)") improved lookup efficiency for
set-map http action, but the core.set_map() lua method which is built
on the same construct was overlooked. Let's also benefit from this optim
as it easily applies.
2024-11-20 16:14:13 +01:00
Willy Tarreau 311dc748b0 DOC: config: indent the list of environment variables
In the doc our lists are indented but for any reason this one was not,
making it harder to visually delimit. Let's just indent it. No need to
backport this, it's totally cosmetic and would need adaptations since
it was recently touched.
2024-11-20 15:57:09 +01:00
Valentine Krasnobaeva 41d906d69b DOC: configuration: update "Environment variables" chapter
There are some variables, which are set by HAProxy process (HAPROXY_*). Some
of them are handy to check or to redefine in the configuration, in order to
create conditional blocks and make the configuration more flexible. But it
wasn't clear in the documentation, which variables are really safe and usefull
to redefine and which ones could be only read via "show env" output.

Latest changes in master-worker architecture makes the existed description even
more confusing.

So let's sort all HAPROXY_* variables to four categories and let's also mark
explicitly, which ones are set in which process, when haproxy is started in
master-worker mode.

In addition, update examples in chapter "2.4. Conditional blocks". This might
bring more ideas for users how HAPROXY_* variables could be used in the
conditional blocks.
2024-11-20 15:56:50 +01:00
Valentine Krasnobaeva d6ccd1738b MINOR: startup: set HAPROXY_LOCALPEER only once
Before this patch HAPROXY_LOCALPEER variable could be set in init_early(),
in init_args() and in cfg_parse_global(). In master-worker mode, if localpeer
keyword set in the global section, HAPROXY_LOCALPEER in the worker
environment is set to this keyword's value, but in the master environment it
still keeps the default, a localhost name. This is confusing.

To fix it, let's set HAPROXY_LOCALPEER only once, when a worker or process in a
standalone mode has finished to parse its configuration. And let's set this
variable only for the worker process or for the process in a standalone mode,
because the master doesn't need it.

HAPROXY_LOCALPEER takes the value saved in localpeer global variable, which is
always set by default in init_early() to the local hostname. Then, localpeer
could be reset in init_args (-L option) and in cfg_parse_global() (while
parsing "localpeer" keyword).
2024-11-20 15:44:10 +01:00
Willy Tarreau 1171a23aec BUILD: makefile: make ERR apply to build options as well
Once in a while we find some makefiles ignoring some outdated arguments
and just emit a warning. What's annoying is that if users (say, distro
packagers), have purposely added ERR=1 to their build scripts to make
sure to fail on any warning, these ones will be ignored and the build
can continue with invalid or missing options.

William rightfully suggested that ERR=1 should also catch make's warnings
so this patch implements this, by creating a new "complain" variable that
points either to "error" or "warning" depending on $(ERR), and that is
used to send the messages using $(call $(complain),...). This does the
job right at little effort (tested from GNU make 3.82 to 4.3).

Note that for this purpose the ERR declaration was upped in the makefile
so that it appears before the new errors.mk file is included.
2024-11-20 14:58:35 +01:00
William Lallemand b861dc9371 MINOR: systemd: replace SOCK_CLOEXEC by fcntl call to FD_CLOEXEC
Since we build systemd.o for every target, we need it to be more
portable.

The SOCK_CLOEXEC argument from socket() is not portable and won't build
on some OS like macOS X.

This patch fixes the issue by replace SOCK_CLOEXEC by a fnctl set to
FD_CLOEXEC.
2024-11-20 14:26:23 +01:00
William Lallemand 1ceeeacbad CI: vtest: temporarily build from the sd-notify PR
Build VTest temporarily from the sd-notify PR until the
https://github.com/vtest/VTest/pull/41 is merged.

This PR allows starting with -Ws in order to have more reliables tests
in master-worker mode.
2024-11-20 12:07:38 +01:00
William Lallemand 15845247db MEDIUM: mworker: remove USE_SYSTEMD requirement for -Ws
Since sd_notify() is now implemented in src/systemd.c, there is no need
anymore to build its support conditionnally with USE_SYSTEMD.

This patch add supports for -Ws for every build and removes the
USE_SYSTEMD build option. It also remove every reference to USE_SYSTEMD
in the documentation and the CI.

This also allows to run the reg-tests in -Ws with the new VTest support.
2024-11-20 12:07:38 +01:00
Amaury Denoyelle 16147e6cf3 BUG/MINOR: cfgparse-quic: fix renaming of max-window-size
A patch has recently tried to rename QUIC max-window-size global
parameter to default-max-window-size to better reflect its usage.
However, only the documentation was edited but not cfgparse-quic.c.

Fix this by updating cfgparse-quic.c with the new default- naming.

No need to backport.
2024-11-20 11:12:06 +01:00
Christopher Faulet 17d4e6eaf9 MINOR: http-fetch: Add an option to 'query" to get the QS with the '?'
As mentionned by Thayne McCombs in #2728, it could be handy to have a sample
fetch function to retrieve the query string with the question mark
character.

Indeed, for now, "query" sample fetch function already extract the query
string from the path, but the question mark character is not
included. Instead of adding a new sample fetch function with a too similar
name, an optional argument is added to "query". If "with_qm" is passed as
argument, the question mark will be included in the query string, but only
if it is not empty.

Thanks to this patch, the following rule:

  http-request redirect location /destination?%[query] if { -m found query }  some_condition
  http-request redirect location /destination if some_condition

can now be expressed this way:

  http-request redirect location /destination%[query(with_qm)] if some_condition
2024-11-20 10:20:05 +01:00
Christopher Faulet 2a5da31cce BUG/MINOR: http-ana: Adjust the server status before the L7 retries
The server status must be adjusted, if necessary, at each retry. It is
properly performed when "obersve layer4" directive is set. But for the layer
7, only the last attempt was considered.

When the L7 retries were implemented, all retries were added before the
server status adjutement. So only the last attempt was considered. To fix
the issue, we must adjut the server status first, and then try to perform a
L7 retry.

This patch should fix the issue #2679. It must be backported to all stable
versions.
2024-11-20 09:22:06 +01:00
Willy Tarreau 5c15899410 DOC: configuration: wrap long line for "strstr()" conditional expression
This keyword had too long a description line, let's split it. This can be
backported to 2.8.
2024-11-20 09:04:53 +01:00
Willy Tarreau da1620b317 DOC: configuration: explain quotes and spaces in conditional blocks
Conditional blocks inherit the same tokenizer and argument parser as
the rest of the configuration, but are also silently concatenated
around groups of spaces and tabs. This can lead to subtle failures
for configs containing spaces around commas and parenthesis, where
a string comparison might silently fail for example. Let's better
document this particular case.

Thanks to Valentine for analysing and reporting the problem.

This can be backported to 2.4.
2024-11-20 09:04:53 +01:00
Willy Tarreau 962d5e038f DOC: configuration: explain the rules regarding spaces in arguments
Spaces around commas or parenthesis in expressions are generally part
of the value due to the long history of supporting unquoted arguments.
But this tends to come as a surprise to new users and sometimes creates
subtly invalid configurations. Let's add some text covering this.

This can be backported to 2.4.
2024-11-20 08:42:02 +01:00
Willy Tarreau 12fcd65468 MINOR: tasklet: support an optional set of wakeup flags to tasklet_wakeup_on()
tasklet_wakeup_on() and its derivates (tasklet_wakeup_after() and
tasklet_wakeup()) do not support passing a wakeup cause like
task_wakeup(). This is essentially due to an API limitation cause by
the fact that for a very long time the only reason for waking up was
to process pending I/O. But with the growing complexity of mux tasks,
it is becoming important to be able to skip certain heavy processing
when not strictly needed.

One possibility is to permit the caller of tasklet_wakeup() to pass
flags like task_wakeup(). Instead of going with a complex naming scheme,
let's simply make the flags optional and be zero when not specified. This
means that tasklet_wakeup_on() now takes either 2 or 3 args, and that the
third one is the optional flags to be passed to the callee. Eligible flags
are essentially the non-persistent ones (TASK_F_UEVT* and TASK_WOKEN_*)
which are cleared when the tasklet is executed. This way the handler
will find them in its <state> argument and will be able to distinguish
various causes for the call.
2024-11-19 20:13:41 +01:00
Willy Tarreau 0334cb28a9 MINOR: tasklet: make the low-level tasklet API take a flag
Everything in the tasklet layer supports flags, except that they are
just not implemented in the wakeup functions, while they are in the
task_wakeup functions. Initially it was not considered useful to pass
wakeup causes because these were essentially I/O, but with the growing
number of I/O handlers having to deal with various types of operations
(typically cheap I/O notifications on subscribe vs heavy parsing on
application-level wakeups), it would be nice to start to make this
distinction possible.

This commit extends _tasklet_wakeup_on() and _tasklet_wakeup_after()
to pass a set of flags that continues to be set as zero. For now this
changes nothing, but new functions will come.
2024-11-19 20:13:41 +01:00
Willy Tarreau e57581d76d MINOR: tools: add new macro DEFZERO to provide a default zero argument
This is the equivalent of DEFNULL except that it sets a zero value instead
of a NULL for a missing argument.
2024-11-19 20:13:41 +01:00
Willy Tarreau c5052bad8a MINOR: sched: add TASK_F_WANTS_TIME to make the scheduler update the call date
Currently tasks being profiled have th_ctx->sched_call_date set to the
current nanosecond in monotonic time. But there's no other way to have
this, despite the scheduler being capable of it. Let's just declare a
new task flag, TASK_F_WANTS_TIME, that makes the scheduler take the time
just before calling the handler. This way, a task that needs nanosecond
resolution on the call date will be able to be called with an up-to-date
date without having to abuse now_mono_time() if not needed. In addition,
if CLOCK_MONOTONIC is not supported (now_mono_time() always returns 0),
the date is set to the most recently known now_ns, which is guaranteed
to be atomic and is only updated once per poll loop.

This date can be more conveniently retrieved using task_mono_time().

This can be useful, e.g. for pacing. The code was slightly adjusted so
as to merge the common parts between the profiling case and this one.
2024-11-19 20:13:41 +01:00
Willy Tarreau 12969c1b17 MINOR: tinfo/clock: turn sched_call_date to 64-bits
We used to store it in 32-bits since we'd only use it for latency and CPU
usage calculation but usages will evolve so let's not truncate the value
anymore. Now we store the full 64 bits. Note that this doesn't even
increase the storage size due to alignment. The 3 usage places were
verified to still be valid (most were already cast to 32 bits anyway).
2024-11-19 20:13:41 +01:00
Willy Tarreau 33c461314c MINOR: stream: don't update s->lat_time when the wakeup date is not set
In 2.7 was added a stream wakeup latency calculation with commit
6a28a30efa ("MINOR: tasks: do not keep cpu and latency times in struct
task"). However, due to the transformation of the previous code, it
kept unconditionally updating s->lat_time even of the sched_wake_date
was zero. In other words, s->lat_time is constantly updated for the
huge majority of calls that are made without profiling. Let's just
check the sched_wake_date status before doing so.
2024-11-19 20:13:41 +01:00
Willy Tarreau 973c81ceec CLEANUP: tinfo: move sched_*_date/*_mono_time to the thread-local area
These ones are never atomically accessed, they have nothing to do in
the atomic ops cache line, let's move them to the thread-local area.
2024-11-19 20:13:41 +01:00
Willy Tarreau 8dc68f3c75 DOC: sched: document the missing TASK_F_UEVT* flags
These are user-defined one-shot events that are application-specific
and reset upon wakeup and were not documented. No backport is needed
since these were added to 3.1.
2024-11-19 20:13:41 +01:00
Willy Tarreau e5ca72cb6f DOC: sched: add missing scheduler API documentation for tasklet_wakeup_after()
This was added to 2.6 but the doc was forgotten. Let's add it. It's not
needed to backport this since it's only used for new developments.
2024-11-19 20:13:41 +01:00
Aurelien DARRAGON 501827ebe0 DOC: lua: fix yield-dependent methods expected contexts
Contrary to what the doc states, it is not expected (nor relevant) to
use yield-dependent methods such as core.yield() or core.(m)sleep() from
contexts that don't support yielding. Such contexts include body, init,
fetches and converters.

Thus the doc got it wrong since the beginning, because such methods were
never supported from the above contexts, yet it was listed in the list
of compatible contexts (probably the result of a copy-paste), which is
error-prone because it could either cause a Lua runtime error to be
thrown, or be ignored in some other cases.

It should be backported to all stable versions.
2024-11-19 19:36:02 +01:00
William Lallemand 6f746af915 REGTESTS: use -dW by default on every reg-tests
Every reg-test now runs without any warning, so let's acivate -dW by
default so the new ones will inheritate the option.

This patch reverts 9d511b3c ("REGTESTS: enable -dW on almost all tests
to fail on warnings") and adds -dW in the default HAPROXY_ARGS of
scripts/run-regtests.sh instead.
2024-11-19 16:53:10 +01:00
William Lallemand e1fb9a47e1 MEDIUM: stats-file: silently ignore be/fe mistmatch
Most of the invalid or unknow field in the stats-file parser are ignored
silently, which is not the case of the frontend/backend mismatch on a
guid, which is kind of strange.

Since this is ""documented"" to be ignored in the
reg-tests/stats/sample-stats-file file, let's also ignore this kind of
line. This will allow to run the associated reg-test with -dW.
2024-11-19 16:44:51 +01:00
Amaury Denoyelle 5a29fd6c61 MINOR: mux_quic/pacing: display pacing info on show quic
To improve debugging, extend "show quic" output to report if pacing is
activated on a connection. Two values will be displayed for pacing :

* a new counter paced_sent_ctr is defined in QCC structure. It will be
  incremented each time an emission is interrupted due to pacing.

* pacing engine now saves the number of datagrams sent in the last paced
  emission. This will be helpful to ensure burst parameter is valid.
2024-11-19 16:21:05 +01:00
Amaury Denoyelle 24cea66e07 MEDIUM: quic: define cubic-pacing congestion algorithm
Define a new QUIC congestion algorithm token 'cubic-pacing' for
quic-cc-algo bind keyword. This is identical to default cubic
implementation, except that pacing is used for STREAM frames emission.

This algorithm supports an extra argument to specify a burst size. This
is stored into a new bind_conf member named quic_pacing_burst which can
be reuse to initialize quic path.

Pacing support is still considered experimental. As such, 'cubic-pacing'
can only be used with expose-experimental-directives set.
2024-11-19 16:20:58 +01:00
Amaury Denoyelle 6dfc8fbf1d MINOR: quic: extend quic-cc-algo optional parameters
Modify quic-cc-algo for better extensability of optional parameters
parsing. This will be useful to support a new parameter for maximum
allowed pacing burst size.

Take this opportunity to refine quic-cc-algo documentation. Optional
parameters are now presented as a list which would be soon extended.
2024-11-19 16:20:52 +01:00
Amaury Denoyelle a6504c9cfb MINOR: quic: use dynamic cc_algo on bind_conf
A QUIC congestion algorithm can be specified on the bind line via
keyword quic-cc-algo. As such, bind_conf structure has a member
quic_cc_algo.

Previously, if quic-cc-algo was set, bind_conf member was initialized to
one of the globally defined CC algo structure. This patch changes
bind_conf quic_cc_algo initialization to point to a dynamically
allocated copy of CC algo structure.

With this change, it will be possible to tweak individually each CC algo
of a bind line. This will be used to activate pacing on top of the
congestion algorithm.

As bind_conf member is dynamically allocated now, its member is now
freed via free_proxy() to prevent any leak.
2024-11-19 16:16:48 +01:00
Amaury Denoyelle 796446a15e MAJOR: mux-quic: support pacing emission
Support pacing emission for STREAM frames at the QUIC MUX layer. This is
implemented by adding a quic_pacer engine into QCC structure.

The main changes have been written into qcc_io_send(). It now
differentiates cases when some frames have been rejected by transport
layer. This can occur as previously due to congestion or FD buffer full,
which requires subscribing on transport layer. The new case is when
emission has been interrupted due to pacing timing. In this case, QUIC
MUX I/O tasklet is rescheduled to run with the flag TASK_F_USR1.

On tasklet execution, if TASK_F_USR1 is set, all standard processing for
emission and reception is skipped. Instead, a new function
qcc_purge_sending() is called. Its purpose is to retry emission with the
saved STREAM frames list. Either all remaining frames can now be send,
subscribe is done on transport error or tasklet must be rescheduled for
pacing purging.

In the meantime, if tasklet is rescheduled due to other conditions,
TASK_F_USR1 is reset. This will trigger a full regeneration of STREAM
frames. In this case, pacing expiration must be check before calling
qcc_send_frames() to ensure emission is now allowed.
2024-11-19 16:16:48 +01:00