At the beginning of the 3.0-dev cycle, the zero-copy forwarding support was
added only for the cache applet with an option to disable it. This was a
hack, waiting for a better integration with applets. It is now possible to
implement the zero-copy forwarding for any applets. So the specific option
for the cache applet was renamed to be used for all applets. And this option
is now also checked for the stats applet.
Concretely, 'tune.cache.zero-copy-forwarding' was renamed to
'tune.applet.zero-copy-forwarding'.
This field was introduced when the first implementation of the zero-copy
forwarding was added. It is now useless. However, we must still save the
body-size of the object in the cache.
Default .rcv_buf and .snd_buf functions that applets can use are now
specialized to manipulate raw buffers or HTX buffers.
Thus a TCP applet should use appctx_raw_rcv_buf() and appctx_raw_snd_buf()
while HTTP applet should use appctx_htx_rcv_buf() and appctx_htx_snd_buf().
Note that the appctx is now directly passed to these functions instead of
the SC.
Just like for the cache applet, it is now possible to send response to the
opposite side using the zero-copy forwarding. Internal functions were
slightly updated but there is nothing special to say. Except the requested
size during the nego stage is not exact.
Till now, for chunked messages, the H1 mux used the size requested during
the zero-copy forwarding negotiation as the chunk size. And till now, this
was accurate because the requested size was indeed the chunk size on the
producer side.
But this will be a problem to implement the zero-copy forwarding on some
applets because the content size is not known during the nego but only when
it is produced. Thanks to previous patches, it is now possible to know the
requested size is not exact and we are able to reserve a larger space to
write the chunk size later, in h1_done_ff(), with some padding.
Now, during the zero-copy forwarding negotiation, when the requested size is
exact, we are now able to check if it is bigger than the expected one or
not. If it is indeed bigger than expeceted, the zero-copy forwarding is
disabled, the error will be triggered later on the normal sending path.
It is now possible to use a flag during zero-copy forwarding negotiation to
specify the requested size is exact, it means the producer really expect to
receive at least this amount of data.
It can be used by consumer to prepare some processing at this stage, based
on the requested size. For instance, in the H1 mux, it is used to write the
next chunk size.
It is now possible to impose the length to represent the chunk size in the
function used to prepended the chunk size in a buffer (so before the chunk
itself). It is thus possible to reserve a specific space for an unknown
chunk size and padding it with leading '0' to use all the space and avoid
holes.
During zero-copy forwarding negotiation, a pseudo flag was already used to
notify the consummer if the producer is able to use kernel splicing or not. But
this was not extensible. So, now we use a true bitfield to be able to pass flags
during the negotiation. NEGO_FF_FL_* flags may be used now.
Of course, for now, there is only one flags, the kernel splicing support on
producer side (NEGO_FF_FL_MAY_SPLICE).
The cache applet will be refactored to use its own buffer. Thus, for now,
the zero-copy forwarding support is removed and it will be reintrocuded
later.
The HTTP stat applets and all internal functions was adapted to use its own
buffers instead of the channels ones. The CLI part was not refactored yet,
thus there are still some access to channels in the file. But for the HTTP
part, we no longer use the channels at all.
To do so, the HTTP stats applet now uses default .rcv_buf and .snd_buf
callback function. In addition, it sets appctx flags instead of SE ones.
We no longer test the opposite stream-connector to detect aborted partial
post. Applets must not try to access to info ouside their scope. This make
the code more sensitive to changes and it is a common source of bug.
Tests on the sedesc flags at the begining of the I/O handler should be
enough.
Thanks to this patch, it is possible to an applet to directly send data to
the opposite endpoint. To do so, it must implement <fastfwd> appctx callback
function and set SE_FL_MAY_FASTFWD flag.
Everything will be handled by appctx_fastfwd() function. The applet is only
responsible to transfer data. If it sets <to_forward> value, it is used to
limit the amount of data to forward.
This patch introduces the support for the callback function responsible to
produce data via the zero-copy forwarding mechanism. There is no
implementation for now. But <to_forward> field was added in the appctx
structure to let an applet inform how much data it want to forward. It is
not mandatory but it will be used during the zero-copy forwarding
negociation.
We have indroduced flags to deal with end of input, end of stream and errors
at the applet level. With this patch we make the link with the endpoint
descriptor.
In appctx_rcv_buf(), applet flags are converted to SE flags.
There is no shutdown for reads and send with applets. Both are performed
when the appctx is released. So instead of 2 flags, like for
muxes/connections, only one flag is used. But the idea is the same:
acknowledge the event at the applet level.
The appctx state was never really used as a state. It is only used to know
when an applet should be freed on the next wakeup. This can be converted to
a flag and the state can be removed. This is what this patch does.
Dedicated appctx flags to report EOI, EOS and errors (pending or terminal) were
added with the functions to set these flags. It is pretty similar to what it
done on most of muxes.
Till now, we've extended the appctx state to add some flags. However, the
field name is misleading. So a bitfield was added to handle real flags. And
helper functions to manipulate this bitfield were added.
A dedicated function to run applets was introduced, in addition to the old
one, to deal with applets that use their own buffers. The main differnce
here is that this handler does not use channels at all. It performs a
synchronous send before calling the applet and performs a synchronous
receive just after.
No applets are plugged on this handler for now.
There is no tasklet to handle I/O subscriptions for applets, but functions
to deal with receives and sends from the SC layer were added. it meanse a
function to retrieve data from an applet with this synchronous version and a
function to push data to an applet wit this synchronous version.
It is pretty similar to the functions used for muxes but there are some
differences. So for now, we keep them separated.
Zero-copy forwarding is not supported for now. In addition, there is no
subscription mechanism.
In this patch, we add default functions to copy data from a channel to the
<inbuf> buffer of an applet (appctx_rcv_buf) and another on to copy data
from <outbuf> buffer of an applet to a channel (appctx_snd_buf).
These functions are not used for now, but they will be used by applets to
define their <rcv_buf> and <snd_buf> callback functions. Of course, it will
be possible for a specific applet to implement its own functions but these
ones should be good enough for most of applets. HTX and RAW buffers are
supported.
For now, it is not usable, but this patch introduce the support of callback
functions, in the applet structure, to exchange data between channels and
applets. It is pretty similar to callback functions defined by muxes.
It is the first patch of a series aimed to align applets on connections.
Here, dedicated buffers are added for applets. For now, buffers are
initialized and helpers function to deal with allocation are added. In
addition, flags to report allocation failures or full buffers are also
introduced. <inbuf> will be used to push data to the applet from the stream
and <outbuf> will be used to push data from the applet to the stream.
sc_attach_applet() was changed to be able to fail and callers were updated
accordingly. For now it cannot fail but if this changes, callers will be
prepared to handle errors.
In sc_attach_applet, an untyped pointer (void *) was used to attach a SC on
an applet. There is no reason to not use the right type here. So now a
pointer on an appctx is explicitly used.
IS_HXT_SC() macro is only usable if the stream-connector is attached to a
connection. It is a bit restrictive because this cannot work if the SC is
attached to an applet. So let's fix that be adding the support of applets
too.
wait_event structure was in connection header file because it is only used
by connections and muxes. But, this may change. For instance applets may be
good candidates to use it too. So, the structure is moved to the task header
file instead.
Avoid loss of precision when computing K cubic value.
Same issue when computing the congestion window value from cubic increase function
formula with possible integer varaiable wrap around.
Depends on this commit:
MINOR: quic: Code clarifications for QUIC CUBIC (RFC 9438)
Must be backported as far as 2.6.
The first version of our QUIC CUBIC implementation is confusing because relying on
TCP CUBIC linux kernel implementation and with references to RFC 8312 which is
obsoleted by RFC 9438 (August 2023) after our implementation. RFC 8312 is a little
bit hard to understand. RFC 9438 arrived with much more clarifications.
So, RFC 9438 is about "CUBIC for Fast Long-Distance Networks". Our implementation
for QUIC is not very well documented. As it was difficult to reread this
code, this patch adds only some comments at complicated locations and rename
some macros, variables without logic modifications at all.
So, the aim of this patch is to add first some comments and variables/macros renaming
to avoid embedding too much code modifications in the same big patch.
Some code modifications will come to adapt this CUBIC implementation to this new
RFC 9438.
Rename some macros:
CUBIC_BETA -> CUBIC_BETA_SCALED
CUBIC_C -> CUBIC_C_SCALED
CUBIC_BETA_SCALE_SHIFT -> CUBIC_SCALE_FACTOR_SHIFT (this is the scaling factor
which is used only for CUBIC_BETA_SCALED)
CUBIC_DIFF_TIME_LIMIT -> CUBIC_TIME_LIMIT
CUBIC_ONE_SCALED was added (scaled value of 1).
These cubic struct members were renamed:
->tcp_wnd -> ->W_est
->origin_point -> ->W_target
->epoch_start -> ->t_epoch
->remaining_tcp_inc -> remaining_W_est_inc
Local variables to quic_cubic_update() were renamed:
t -> elapsed_time
diff ->t
delta -> W_cubic_t
Add a grahpic curve about the CUBIC Increase function.
Add big copied & pasted RFC 9438 extracts in relation with the 3 different increase
function regions.
Same thing for the fast convergence.
Fix a typo about the reference to QUIC RFC 9002.
Must be backported as far as 2.6 to ease any further modifications to come.
This commit adds support for an optional second argument to BUG_ON(),
WARN_ON(), CHECK_IF(), that can be a constant string. When such an
argument is given, it will be printed on a second line after the
existing first message that contains the condition.
This can be used to provide more human-readable explanations about
what happened, such as "too low on memory" or "memory corruption
detected" that may help a user resolve the incident by themselves.
The ABORT_NOW() macro is not much used since we have BUG_ON(), but
there are situations where it makes sense, typically if the program
must always die regardless od DEBUG_STRICT, or if the condition must
always be evaluated (e.g. decompress something and check it).
It's not convenient not to have any hint about what happened there. But
providing too much info also results in wiping some registers, making
the trace less exploitable, so a compromise must be found.
What this patch does is to provide the support for an optional argument
to ABORT_NOW(). When an argument is passed (a string), then a message
will be emitted with the file name, line number, the message and a
trailing LF, before the stack dump and the crash. It should be used
reasonably, for example in functions that have multiple calls that need
to be more easily distinguished.
If we were to enable 'ocsp-update' on a certificate that does not have
an OCSP URI, we would exit ssl_sock_load_ocsp with a negative error code
which would raise a misleading error message ("<cert> has an OCSP URI
and OCSP auto-update is set to 'on' ..."). This patch simply fixes the
error message but an error is still raised.
This issue was raised in GitHub #2432.
It can be backported up to branch 2.8.
As seen in previous commit 59acb27001 ("BUILD: quic: Variable name typo
inside a BUG_ON()."), it can sometimes happen that with DEBUG forced
without DEBUG_STRICT, BUG_ON() statements are ignored. Sadly, it means
that typos there are not even build-tested.
This patch makes these statements reference sizeof(cond) to make sure
the condition is parsed. This doesn't result in any code being emitted,
but makes sure the expression is correct so that an issue such as the one
above will fail to build (which was verified).
This may be backported as it can help spot failed backports as well.
Since d480b7b ("MINOR: debug: make ABORT_NOW() store the caller's line
number when using abort"), building with 'DEBUG_USE_ABORT' fails with:
|In file included from include/haproxy/api.h:35,
| from include/haproxy/activity.h:26,
| from src/ev_poll.c:20:
|include/haproxy/thread.h: In function ‘ha_set_thread’:
|include/haproxy/bug.h:107:47: error: expected ‘;’ before ‘_with_line’
| 107 | #define ABORT_NOW() do { DUMP_TRACE(); abort()_with_line(__LINE__); } while (0)
| | ^~~~~~~~~~
|include/haproxy/bug.h:129:25: note: in expansion of macro ‘ABORT_NOW’
| 129 | ABORT_NOW(); \
| | ^~~~~~~~~
|include/haproxy/bug.h:123:9: note: in expansion of macro ‘__BUG_ON’
| 123 | __BUG_ON(cond, file, line, crash, pfx, sfx)
| | ^~~~~~~~
|include/haproxy/bug.h:174:30: note: in expansion of macro ‘_BUG_ON’
| 174 | # define BUG_ON(cond) _BUG_ON (cond, __FILE__, __LINE__, 3, "FATAL: bug ", "")
| | ^~~~~~~
|include/haproxy/thread.h:201:17: note: in expansion of macro ‘BUG_ON’
| 201 | BUG_ON(!thr->ltid_bit);
| | ^~~~~~
|compilation terminated due to -Wfatal-errors.
|make: *** [Makefile:1006: src/ev_poll.o] Error 1
This is because of a leftover: abort()_with_line(__LINE__);
^^
Fixing it by removing the extra parentheses after 'abort' since the
abort() call is now performed under abort_with_line() helper function.
This was raised by Ilya in GH #2440.
No backport is needed, unless the above commit gets backported.
Fetch will return true if the stream underwent a redispatch according to
"option redispatch" setting upon retries.
Documentation was added, and the "%rc" logformat alternative now mentions
the new fetch to properly emulate the logformat behavior.
This build issued was introduced by this previous commit which is a bugfix:
BUG/MINOR: quic: Wrong ack ranges handling when reaching the limit.
A BUG_ON() referenced <fist> variable in place of <first>.
Must be backported as far as 2.6 as the previous commit.