The ckch_conf_cmp() function allow to compare multiple ckch_conf
structures in order to check that multiple usage of the same crt in the
configuration uses the same ckch_conf definition.
A crt-list allows to use "crt-store" keywords that defines a ckch_store,
that can lead to inconsistencies when a crt is called multiple time with
different parameters.
This function compare and dump a list of differences in the err variable
to be output as error.
The variant ckch_conf_cmp_empty() compares the ckch_conf structure to an
empty one, which is useful for bind lines, that are not able to have
crt-store keywords.
These functions are used when a crt-store is already inialized and we
need to verify if the parameters are compatible.
ckch_conf_cmp() handles multiple cases:
- When the previous ckch_conf was declared with CKCH_CONF_SET_EMPTY, we
can't define any new keyword in the next initialisation
- When the previous ckch_conf was declared with keywords in a crtlist
(CKCH_CONF_SET_CRTLIST), the next initialisation must have the exact
same keywords.
- When the previous ckch_conf was declared in a "crt-store"
(CKCH_CONF_SET_CRTSTORE), the next initialisaton could use no keyword
at all or the exact same keywords.
This patch adds crt-store keywords from the crt-list on the CLI.
- keywords from crt-store can be used over the CLI when inserting
certificate in a crt-list
- keywords from crt-store are dumped when showing a crt-list content
over the CLI
The ckch_conf_kws.func function pointer needed a new "cli" parameter, in
order to differenciate loading that come from the CLI or from the
startup, as they don't behave the same. For example it must not try to
load a file on the filesystem when loading a crt-list line from the CLI.
dump_crtlist_sslconf() was renamed in dump_crtlist_conf() and takes a
new ckch_conf parameter in order to dump relevant crt-store keywords.
This option allow to disable completely the ocsp-update.
To achieve this, the ocsp-update.mode global keyword don't rely anymore
on SSL_SOCK_OCSP_UPDATE_OFF during parsing to call
ssl_create_ocsp_update_task().
Instead, we will inherit the SSL_SOCK_OCSP_UPDATE_* value from
ocsp-update.mode for each certificate which does not specify its own
mode.
To disable completely the ocsp without editing all crt entries,
ocsp-update.disable is used instead of "ocsp-update.mode" which is now
only used as the default value for crt.
Use the ocsp-update keyword in the crt-store section. This is not used
as an exception in the crtlist code anymore.
This patch introduces the "ocsp_update_mode" variable in the ckch_conf
structure.
The SSL_SOCK_OCSP_UPDATE_* enum was changed to a define to match the
ckch_conf on/off parser so we can have off to -1.
The callback used by ckch_store_load_files() only works with
PARSE_TYPE_STR.
This allows to use a callback which will use a integer type for PARSE_TYPE_INT
and PARSE_TYPE_ONOFF.
This require to change the type of the callback to void * to pass either
a char * or a int depending of the parsing type.
The ssl_sock_load_* functions were encapsuled in ckch_conf_load_*
function just to match the type.
This will allow to handle crt-store keywords that are ONOFF or INT
types.
Remove the "ocsp-update" keyword handling from the crt-list.
The code was made as an exception everywhere so we could activate the
ocsp-update for an individual certificate.
The feature will still exists but will be parsed as a "crt-store"
keyword which will still be usable in a "crt-list". This will appear in
future commits.
This commit also disable the reg-tests for now.
This patch allows the usage of "crt-store" keywords from a "crt-list".
The crtstore_parse_load() function was splitted into 2 functions, so the
keywords parsing is done in ckch_conf_parse().
With this patch, crt are loaded with ckch_store_new_load_files_conf() or
ckch_store_new_load_files_path() depending on weither or not there is a
"crt-store" keyword.
More checks need to be done on "crt" bind keywords to ensure that
keywords are compatible.
This patch does not introduce the feature on the CLI.
ckch_store_new_load_files_conf() is the equivalent of
new_ckch_store_load_files_path() but instead of trying to find the files
using a base filename, it will load them from a list of files.
This mask value is unused, so we can safely remove it. It is a chance
because its value was wrong. But there is no bug here, even in stable
versions, because it is no longer used in all versions.
There was a flag to skip the response payload on output, if any, by stating
it is bodyless. It is used for responses to HEAD requests or for 204/304
responses. This allow rewrites during analysis. For instance a HEAD request
can be rewrite to a GET request for any reason (ie, a server not supporting
HEAD requests). In this case, the server will send a response with a
payload. On frontend side, the payload will be skipped and a valid response
(without payload) will be sent to the client.
With this patch we introduce the corresponding flag for the request. It will
be used to skip the request payload. In addition, when payload must be
skipped for a request or a response, The zero-copy data forwarding is now
disabled.
After every release we say that MIN/MAX should be changed to be an
expression that only evaluates each operand once, and before every
version we forget to change it and we recheck that the code doesn't
misuse them. Let's fix them now.
Aurlien reported that clang's build was broken by the recent fix
845fb846c7 ("BUG/MEDIUM: stick-tables: properly mark stktable_data
as packed"), because it now wants to use a helper for some atomic
ops (to increment std_t_uint). While this makes no sense to do
something that slow on modern architectures like x86 and arm64 which
are fine with unaligned accesses, we actually we can simply mark the
struct as aligned to its smallest element which is 32-bit (but still
packed). With this, it was verified that it is enough for clang to
see that its 32-bit operations will always be aligned, while making
64-bit operations safe on 64-bit platforms that do not support unaligned
accesses.
This should be backported wherever the patch above is backported.
Implement basic support for glitches on QUIC multiplexer. This is mostly
identical too glitches for HTTP/2.
A new configuration option named tune.quic.frontend.glitches-threshold
is defined to limit the number of glitches on a connection before
closing it.
Glitches counter is incremented via qcc_report_glitch(). A new
qcc_app_ops callback <report_susp> is defined. On threshold reaching, it
allows to set an application error code to close the connection. For
HTTP/3, value H3_EXCESSIVE_LOAD is returned. If not defined, default
code INTERNAL_ERROR is used.
For the moment, no glitch are reported for QUIC or HTTP/3 usage. This
will be added in future patches as needed.
Rename enum values used for HTTP/3 and QPACK RFC defined codes. First
uses a prefix H3_ERR_* which serves as identifier between them. Also
separate QPACK values in a new dedicated enum qpack_err. This is deemed
cleaner.
There is two distinct enums both related to QPACK error management. The
first one is dedicated to RFC defined code. The other one is a set of
internal values returned by qpack_decode_fs(). There has been issues
discovered recently due to the confusion between them.
Rename internal values with the prefix QPACK_RET_*. The older name
QPACK_ERR_* will be used in a future commit for the first enum.
In order to forcefully unregister a buffer waiter during an inter-thread
takeover under isolation, we'll need to that the function works without
th_ctx but the target thread's ctx instead. Let's implement this by
passing the target thread as an argument. Now b_dequeue() simply calls
this one with tid. It's OK it's not on that critical a path, especially
since the list has been checked for existence before performing the call.
The stktable_data union is made of types of varying sizes, and depending
on which types are stored in a table, some offsets might not necessarily
be aligned. This results in a bus error for certain regtests (e.g.
lb-services) on MIPS64. This bug may impact MIPS64, SPARC64, armv7 when
accessing a 64-bit counter (e.g. bytes) and depending on how the compiler
emitted the operation, and cause a trap that's emulated by the OS on RISCV
(heavy cost). x86_64 and armv8 are not affected at all.
Let's properly mark the struct with __attribute__((packed)) so that the
compiler emits the suitable unaligned-compatible instructions when
accessing the fields.
This should be backported to all versions where it applies.
A test on MIPS64 revealed that the following reg tests would all
fail at the same place in htx_replace_stline() when updating
parts of the request line:
reg-tests/cache/if-modified-since.vtc
reg-tests/http-rules/h1or2_to_h1c.vtc
reg-tests/http-rules/http_after_response.vtc
reg-tests/http-rules/normalize_uri.vtc
reg-tests/http-rules/path_and_pathq.vtc
While the status line is normally aligned since it's the first block
of the HTX, it may become unaligned once replaced. The problem is, it
is a structure which contains some u16 and u32, and dereferencing them
on machines not natively supporting unaligned accesses makes them crash
or handle crap. Typically, MIPS/MIPS64/SPARC will crash, ARMv5 will
either crash or (more likely) return swapped values and do crap, and
RISCV will trap and turn to slow emulation.
We can assign the htx_sl struct the packed attribute, but then this
also causes the ints to fill the 2-bytes gap before them, always causing
unaligned accesses for this part on such machines. The patch does a bit
better, by explicitly filling this two-bytes hole, and packing the
struct.
This should be backported to all versions.
qpack_decode_fs() is used to decode QPACK field section on HTTP/3
headers parsing. Its return value is incoherent as it returns either
QPACK_DECOMPRESSION_FAILED defined in RFC 9204 or any other internal
values defined in qpack-dec.h. On failure, such return code is reused by
HTTP/3 layer to be reported via a CONNECTION_CLOSE frame. This is
incorrect if an internal error values was reported as it is not defined
by any specification.
Fir return values of qpack_decode_fs() in two ways. Firstly, fix invalid
usages of QPACK_DECOMPRESSION_FAILED when decoded content is too large
for the correct internal error QPACK_ERR_TOO_LARGE.
Secondly, adjust qpack_decode_fs() API to only returns internal code
values. A new internal enum QPACK_ERR_DECOMP is defined to replace
QPACK_DECOMPRESSION_FAILED. Caller is responsible to convert it to a
suitable error value. For other internal values, H3_INTERNAL_ERROR is
used. This is done through a set of convert functions.
This should be backported up to 2.6. Note that trailers are not
supported in 2.6 so chunk related to h3_trailers_to_htx() can be safely
skipped.
Now, if a pool_alloc() fails for a buffer and if conditions are met
based on the queue number, we'll try to get an emergency buffer.
Thanks to this the situation is way more stable now. With only 4 reserve
buffers and 1 buffer it's possible to reliably serve 500 concurrent end-
to-end H1 connections and consult stats in parallel in loops showing the
growing number of buf_wait events in "show activity" without facing an
instant stall like in the past. Lower values still cause quick stalls
though.
It's also apparent that some subsystems do not seem to detach from the
buffer_wait lists when leaving. For example several crashes in the H1
part showed list elements still present after a free(), so maybe some
operations performed inside h1_release() after the b_dequeue() call
can sometimes result in a new allocation. Same for streams, where
the dequeue is done relatively early.
The buffer reserve set by tune.buffers.reserve has long been unused, and
in order to deal gracefully with failed memory allocations we'll need to
resort to a few emergency buffers that are pre-allocated per thread.
These buffers are only for emergency use, so every time their count is
below the configured number a b_free() will refill them. For this reason
their count can remain pretty low. We changed the default number from 2
to 4 per thread, and the minimum value is now zero (e.g. for low-memory
systems). The tune.buffers.limit setting has always been a problem when
trying to deal with the reserve but now we could simplify it by simply
pushing the limit (if set) to match the reserve. That was already done in
the past with a static value, but now with threads it was a bit trickier,
which is why the per-thread allocators increment the limit on the fly
before allocating their own buffers. This also means that the configured
limit is saner and now corresponds to the regular buffers that can be
allocated on top of emergency buffers.
At the moment these emergency buffers are not used upon allocation
failure. The only reason is to ease bisecting later if needed, since
this commit only has to deal with resource management.
Now when trying to allocate a channel buffer, we can check if we've been
notified of availability via the producer stream connector callback, in
which case we should not consult the queue, or if we're doing a first
allocation and check the queue.
When the buffer allocation callback is notified of a buffer availability,
it will now set a MAYALLOC flag in addition to clearing the ALLOC one, for
each of the 3 levels where we may fail an allocation. The flag will be
cleared upon a successful allocation. This will soon be used to decide to
re-allocate without waiting again in the queue. For now it has no effect.
There's just a trick, we need to clear the various *_ALLOC flags before
testing h1_recv_allowed() otherwise it will return false!
When appctx_buf_available() is called, it now sets APPCTX_FL_IN_MAYALLOC
or APPCTX_FL_OUT_MAYALLOC depending on the reportedly permitted buffer
allocation, and these flags are cleared when the said buffers are
allocated. For now they're not used for anything else.
When the buffer allocation callback is notified of a buffer availability,
it will now set a MAYALLOC flag on the stream so that the stream knows it
is allowed to bypass the queue checks. For now this is not used.
We used to have two states for the channel's input buffer used by the SC,
NEED_BUFF or not, flipped by sc_need_buff() and sc_have_buff(). We want to
have a 3rd state, indicating that we've just got a desired buffer. Let's
add an HAVE_BUFF flag that is set by sc_have_buff() and that is cleared by
sc_used_buff(). This way by looking at HAVE_BUFF we know that we're coming
back from the allocation callback and that the offered buffer has not yet
been used.
Now b_alloc() will check the queues at the same and higher criticality
levels before allocating a buffer, and will refrain from allocating one
if these are not empty. The purpose is to put some priorities in the
allocation order so that most critical allocators are offered a chance
to complete.
However in order to permit a freshly dequeued task to allocate again while
siblings are still in the queue, there is a special DB_F_NOQUEUE flag to
pass to b_alloc() that will take care of this special situation.
When we want to allocate an in buffer, it's in order to pass data to
the applet, that will consume it, so it must be seen as the same as
a send() from the higher level, i.e. MUX_TX. And for the outbuf, it's
a stream endpoint returning data, i.e. DB_SE_RX.
Instead of having each caller of appctx_get_buf() think about setting
the blocking flag, better have the function do it, since it's already
handling the queue anyway. This way we're sure that both are consistent.
Now that we need to keep the bitmap in sync with the list heads, we don't
want tasks to leave just doing a LIST_DEL_INIT() without updating the map.
Let's provide a b_dequeue() function for that purpose. The function detects
when it's going to remove the last element and figures the queue number
based on the pointer since it points to the root. It's not used yet.
The introduction of buffer_wq[] in thread_ctx pushed a few fields around
and the cache line alignment is less satisfying. And more importantly, even
before this, all the lists in the local parts were 8-aligned, with the first
one split across two cache lines.
We can do better:
- sched_profile_entry is not atomic at all, the data it points to is
atomic so it doesn't need to be in the atomic-only region, and it can
fill the 8-hole before the lists
- the align(2*void) that was only before tasklets[] moves before all
lists (and it's a nop for now)
This now makes the lists and buffer_wq[] start on a cache line boundary,
leaves 48 bytes after the lists before the atomic-only cache line, and
leaves a full cache line at the end for 128-alignment. This way we still
have plenty of room in both parts with better aligned fields.
Let's turn the buffer_wq into an array of 4 list heads. These are chosen
by criticality. The DB_CRIT_TO_QUEUE() macro maps each criticality level
into one of these 4 queues. The goal here clearly is to make it possible
to wake up the most critical queues in priority in order to let some tasks
finish their job and release buffers that others can use.
In order to avoid having to look up all queues, a bit map indicates which
queues are in use, which also allows to avoid looping in the most common
case where queues are empty..
The code places that were used to manipulate the buffer_wq manually
now just call b_queue() or b_requeue(). This will simplify the multiple
list management later.
When failing an allocation we always do the same dance, add the
buffer_wait struct to a list if it's not, and return. Let's just add
dedicated functions to centralize this, this will be useful to implement
a bit more complex logic.
For now they're not used.
The goal is to indicate how critical the allocation is, between the
least one (growing an existing buffer ring) and the topmost one (boot
time allocation for the life of the process).
The 3 tcp-based muxes (h1, h2, fcgi) use a common allocation function
to try to allocate otherwise subscribe. There's currently no distinction
of direction nor part that tries to allocate, and this should be revisited
to improve this situation, particularly when we consider that mux-h2 can
reduce its Tx allocations if needed.
For now, 4 main levels are planned, to translate how the data travels
inside haproxy from a producer to a consumer:
- MUX_RX: buffer used to receive data from the OS
- SE_RX: buffer used to place a transformation of the RX data for
a mux, or to produce a response for an applet
- CHANNEL: the channel buffer for sync recv
- MUX_TX: buffer used to transfer data from the channel to the outside,
generally a mux but there can be a few specificities (e.g.
http client's response buffer passed to the application,
which also gets a transformation of the channel data).
The other levels are a bit different in that they don't strictly need to
allocate for the first two ones, or they're permanent for the last one
(used by compression).
There are 2 new ctl commands that may be used to retrieve the current number
of streams openned for a connection and its limit (the maximum number of
streams a mux connection supports).
For the PT and H1 muxes, the limit is always 1 and the current number of
streams is 0 for idle connections, otherwise 1 is returned.
For the H2 and the FCGI muxes, info are already available in the mux
connection.
For the QUIC mux, the limit is also directly available. It is the maximum
initial sub-ID of bidirectional stream allowed for the connection. For the
current number of streams, it is the number of SC attached on the connection
and the number of not already attached streams present in the "opening_list"
list.
A reason is now passed as parameter to muxes shutdowns to pass additional
info about the abort, if any. No info means no abort or only generic one.
For now, the reason is composed of 2 32-bits integer. The first on represents
the abort code and the other one represents the info about the code (for
instance the source). The code should be interpreted according to the associated
info.
One info is the source, encoding on 5 bits. Other bits are reserverd for now.
For now, the muxes are the only supported source. But we can imagine to extend
it to applets, streams, health-checks...
The current design is quite simple and will most probably evolved.. But the
idea is to let the opposite side forward some errors and let's a mux know
why its stream was aborted. At first glance, a abort reason must only be
evaluated if SE_SHW_SILENT flag is set.
The main goal at short term, is to forward some H2 RST_STREAM codes because
it is mandatory for gRPC applications, mainly to forward gRPC cancellation
from an H2 client to an H2 server. But we can imagine to alter this reason
at the applicative level to enrich it. It would also be used to report more
accurate errors in logs.
Instead of chaining 2 switchcases and performing encoding checks for all
nodes let's actually split the logic in 2: first handle simple node types
(text/separator), and then handle dynamic node types (tag, expr). Encoding
options are only evaluated for dynamic node types.
Also, last_isspace is always set to 0 after next_fmt label, since next_fmt
label is only used for dynamic nodes, thus != LOG_FMT_SEPARATOR.
Since LF_NODE_WITH_OPT() macro (which was introduced recently) is now
unused, let's get rid of it.
No functional change should be expected.
(Use diff -w to check patch changes since reindentation makes the patch
look heavy, but in fact it remains fairly small)
Split code related to proxies list looping in cli_parse_clear_counters()
to a new dedicated function. This function is placed in the new module
stats-proxy.
Create a new module stats-proxy. Move stats functions related to proxies
list looping in it. This allows to reduce stats source file dividing its
size by half.
Convert FN_AGE in stat_cols_px[] as generic columns. These values will
be automatically used for dump/preload of a stats-file.
Remove srv_lastsession() / be_lastsession() function which are now
useless as last_sess is calculated via me_generate_field().
last_change was a member present in both proxy and server struct. It is
used as an age statistics to report the last update of the object.
Move last_change into fe_counters/be_counters. This is necessary to be
able to manipulate it through generic stat column and report it into
stats-file.
Note that there is a change for proxy structure with now 2 different
last_change values, on frontend and backend side. Special care was taken
to ensure that the value is initialized only on the proxy side. The
other value is set to 0 unless a listen proxy is instantiated. For the
moment, only backend counter is reported in stats. However, with now two
distinct values, stats could be extended to report it on both side.
Implement support for FN_RATE stat column into stat-file.
For the output part, only minimal change is required. Reuse the function
read_freq_ctr() to print the same value in both stats output and
stats-file dump.
For counter preloading, define a new utility function
preload_freq_ctr(). This can be used to initialize a freq-ctr type by
preloading previous period value. Reuse this function in load_ctr()
during stats-file parsing.
At the moment, no rate column is defined as generic. Thus, this commit
does not have functional change. This will be changed as soon as FN_RATE
are converted to generic columns.
Move freq-ctr defined in proxy or server structures into their dedicated
fe_counters/be_counters struct.
Functionnaly no change here. This commit will allow to convert rate
stats column to generic one, which is mandatory to manipulate them in
the stats-file.
Currently, only FN_COUNTER are dumped and preloaded via a stats-file.
Thus in several places we relied on the assumption that only FN_COUNTER
are valid in stats-file context.
New stats types will soon be implemented as they are also eligilible to
statistics reloading on process startup. Thus, prepare stats-file
functions to remove any FN_COUNTER restriction.
As one of this change, generate_stat_tree() now uses stcol_is_generic()
for stats name tree indexing before stats-file parsing.
Also related to stats-file parsing, individual counter preloading step
as been extracted from line parsing in a dedicated new function
load_ctr(). This will allow to extend it to support multiple mechanism
of counter preloading depending on the stats type.