This commit cleans up cfg_parse_global() and prepares the config parser to
support MODE_DISCOVERY. This step is needed at the early startup stage, just to
figure out in which mode the process was started, to set the parameters
required by this mode and to continue the initialization stage.
'pidfile' is one of these common keywords, which need to be parsed very early
and which are used in almost all process modes (except foreground, '-d').
The 'pidfile' keyword parser is called by the section parser and follows the
common API, i.e. it returns -1 on failure, 0 on success and 1 on a recoverable
error. In the case of a recoverable error we previously returned ERR_ALERT
(0x10) and emitted an alert message at startup. The section parser treats any
rc > 0 as ERR_WARN. So if pidfile was already specified via the command line,
the keyword parser now returns 1 (in order to respect the common API), the
section parser treats this as ERR_WARN and a warning message is emitted during
process startup instead of an alert, as it was before.
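For illustration, a keyword parser following this convention looks roughly
like the sketch below, assuming the usual cfg_kw_list parser signature (the
exact checks and messages of the patch may differ):

    static int cfg_parse_global_pidfile(char **args, int section_type,
                                        struct proxy *curpx,
                                        const struct proxy *defpx,
                                        const char *file, int line, char **err)
    {
        if (global.pidfile != NULL) {
            /* already set, e.g. via the command line: recoverable error */
            memprintf(err, "'%s' already specified, ignoring.", args[0]);
            return 1;   /* the section parser turns this into ERR_WARN */
        }
        if (!*args[1]) {
            memprintf(err, "'%s' expects a file name as argument.", args[0]);
            return -1;  /* fatal error */
        }
        global.pidfile = strdup(args[1]);
        return 0;       /* success */
    }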
When the given keyword is found in the global 'cfg_kws' list, we go to the
'out' label anyway after testing the rc returned by the keyword's parser. So
there is not much to gain from performing the 'goto out' jump specifically when
rc > 0.
This patch fixes commits 118ac11ce
("MINOR: cfgparse-global: move mode's keywords in cfg_kw_list") and 83ff4db18
("MINOR: cfgparse-global: move no<poller_name> in cfg_kw_list").
'common_kw_list' serves to show the best-matching keyword in
cfg_parse_global(), if the given keyword was not matched by any of the
"if..else if.." cases. cfg_parse_global() is still used as a parser for some
keywords from the global section.
Mode-specific and no<poller_name> keywords now have their own parsers. They no
longer appear in the "if..else if.." chain of cfg_parse_global() and they are
registered in the 'cfg_kws' list. So there is no longer any need to duplicate
them in 'common_kw_list'; otherwise they would be shown twice in the parser
error message.
This patch fixes commit 118ac11ce
("MINOR: cfgparse-global: move mode's keywords in cfg_kw_list"). The error
message delivered by a keyword parser in **err is always shown with ha_alert()
by the caller, cfg_parse_global(). The caller always supplies these alerts with
the filename and the line number.
The previous commit switched to small buffers for HTTP/3 HEADERS emission.
This ensures that several parallel streams can allocate their own buffer
without hitting the connection buffer limit, which is now based on the
congestion window size.
However, this prevents the transmission of responses with uncommonly
large headers. Indeed, if all headers cannot be encoded in a single
buffer, an error is reported which causes the whole connection to be closed.
Adjust this by implementing a realloc API exposed by the QUIC MUX. This
allows the application layer to switch from a small buffer to a default one
and restart its processing. This guarantees again that headers no longer than
bufsize can be properly transferred.
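The intended usage from the application layer can be sketched as follows (the
function names below are purely illustrative and do not necessarily match the
API actually exposed by the MUX):

    /* try to encode the HEADERS frame into the small buffer first */
    if (h3_encode_headers(qcs, buf, htx) < 0) {
        /* not enough room: ask the MUX to swap the small buffer for a
         * default-sized one, then restart the encoding from scratch */
        buf = qmux_realloc_stream_txbuf(qcs);
        if (!buf || h3_encode_headers(qcs, buf, htx) < 0)
            return -1;  /* headers larger than bufsize: report an error */
    }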
A major change was recently implemented to change the QUIC MUX Tx buffer
allocation limit, which is now based on the current connection
congestion window size. As this size may be smaller than the previous
static value, it is likely that the limit will be reached more
frequently.
When using HTTP/3, the majority of request streams are used for small
object exchanges. Every response starts with a HEADERS frame which
should be much smaller than the default buffer. But as the whole
buffer size is accounted against the congestion window, a single stream
can block others even when emitting only a single HEADERS frame, which is
suboptimal for bandwidth usage if the congestion window is small enough.
To adapt to this new situation, rely on the newly available small
buffers to transfer the HEADERS frame of the response. This at least
guarantees that several parallel streams can allocate their own buffer for the
first part of the response, even with a small congestion window.
The situation could be further improved by using various indications of the
data size and selecting a small buffer when sufficient. This could be done
for example via the Content-Length value or the HTX extra field. However,
this must be the subject of a dedicated patch.
This patch extends the qc_stream_desc API to be able to allocate small
buffers. The QUIC MUX API is similarly updated, as ultimately each application
protocol is responsible for choosing between a default and a smaller buffer.
Internally, the type of allocated buffer is remembered via the qc_stream_buf
instance. This is mandatory to ensure that the buffer is released into the
correct pool, in particular as small and standard buffers can be
configured with the same size.
This commit is purely an API change. For the moment, small buffers are
not used. This will change in a dedicated patch.
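A rough sketch of how the buffer type can be remembered and honoured on
release (field, flag and pool names are illustrative, not the exact ones):

    struct qc_stream_buf {
        struct buffer buf;  /* allocated buffer area */
        int sbuf;           /* set if <buf> comes from the small-buffer pool */
        /* ... other existing fields ... */
    };

    /* release the area into the pool it was taken from; checking the flag
     * is required since both pools may be configured with the same size */
    static void qc_stream_buf_free_area(struct qc_stream_buf *stream_buf)
    {
        if (stream_buf->sbuf)
            pool_free(pool_head_sbuf, b_orig(&stream_buf->buf));
        else
            b_free(&stream_buf->buf);
    }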
Define a new buffer pool reserved for allocating smaller memory areas. For
the moment, its usage is restricted to QUIC, as such it is declared in
the quic_stream module.
Add a new config option "tune.bufsize.small" to specify the size of the
allocated objects. A special check ensures that it is not greater than
the default bufsize to avoid unexpected effects.
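Since the object size comes from the configuration, the pool cannot be a
compile-time DECLARE_POOL(); its creation at init time could look roughly like
this (the pool and field names here are assumptions, not the exact ones):

    /* in quic_stream.c, once the configuration has been parsed */
    pool_head_sbuf = create_pool("sbuf", global.tune.bufsize_small, MEM_F_SHARED);
    if (!pool_head_sbuf) {
        ha_alert("Failed to create the small buffer pool.\n");
        exit(1);
    }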
The QUIC MUX buffer allocation limit is now directly based on the underlying
congestion window size. The previous static limit based on conn-tx-buffers
is now unused. As such, this commit adds a warning to inform users
that this setting is now obsolete.
Secondly, update the max-window-size setting. It is now the main entry point
to limit both the maximum congestion window size and the number of QUIC
MUX buffers allocated for emission. Remove its special value '0' which was
used to automatically adjust it based on the now unused conn-tx-buffers.
Each QUIC MUX may allocate buffers for MUX stream emission. These
buffers are then shared with the quic_conn to handle ACK reception and
retransmission. A limit on the number of concurrent buffers used per
connection has been defined statically and can be updated via a
configuration option. This commit replaces this static limit with one based
on the current underlying congestion window size.
The purpose of this change is to remove the artificial static buffer
count limit, which may be difficult to choose. Indeed, if a connection
experiences a minimal loss rate, the buffer count would severely limit
its throughput. It could be increased to fix this, but this also impacts
other connections, even those with less optimal performance, causing too much
extra data buffering at the MUX layer. By using the dynamic congestion
window size, haproxy ensures that MUX buffering roughly corresponds to
the network conditions.
Using the QCC <buf_in_flight> counter, a new buffer can be allocated only if
this counter is less than the current window size. If not, QCS emission is
interrupted and the haproxy stream layer will subscribe until a new buffer is
ready.
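The allocation criterion can be summarized by the sketch below (the field
paths are approximate and only meant to illustrate the idea):

    /* a new Tx buffer may be allocated for a stream only while the sum of
     * currently allocated buffer sizes stays below the congestion window
     * of the connection */
    static inline int qcc_bufwnd_full(const struct qcc *qcc)
    {
        const struct quic_conn *qc = qcc->conn->handle.qc;

        return qcc->buf_in_flight + global.tune.bufsize > qc->path->cwnd;
    }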
One of the critical parts is to ensure that a MUX layer previously
blocked on buffer allocation is properly woken up when sending can be
retried. This occurs on two occasions:
* after an already used Tx buffer is cleared on ACK reception. This case
  is already handled by qcc_notify_buf() via the quic_stream layer.
* on congestion window increase. A new qcc_notify_buf() invocation is
  added into qc_notify_send().
Finally, remove the <avail_bufs> QCC field which is now unused.
This commit is labelled MAJOR as it may have unexpected effects and could
cause significant behavior changes. For example, in the previous
implementation the QUIC MUX would be able to buffer more data even if the
congestion window was small. With this patch, data cannot be transferred
from the stream layer in this case, which may cause more streams to be shut
down on client timeout. Another effect may be higher CPU consumption as the
connection limit will be hit more often, causing more streams to be
interrupted and woken up in cycles.
Define a new QCC counter named <buf_in_flight>. Its purpose is to
account for the current sum of the sizes of all stream buffers allocated for
emission.
For the moment, this counter is only updated on buffer allocation and
deallocation. It will be used to replace <avail_bufs> once the congestion
window is used as the limit for buffer allocation, in a future commit.
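For illustration, the accounting simply follows the buffer life cycle (field
names approximate):

    /* on stream Tx buffer allocation */
    qcc->buf_in_flight += b_size(buf);

    /* on buffer release, once all of its data has been acknowledged */
    qcc->buf_in_flight -= b_size(buf);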
Work is currently in progress to change the QUIC MUX buffer allocation limit
from a configurable static value to the size of the congestion
window instead. This change may cause the buffer allocation limit to be
triggered more frequently.
To ensure HTTP/3 control emission is not disturbed by this change, mark
the stream with qcs_send_metadata(). This ensures that buffer allocation
for this stream won't be subject to the connection limit. This is
necessary to guarantee that SETTINGS and GOAWAY frames are emitted.
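For illustration, marking the control stream once it has been opened may look
like this (the surrounding code is simplified and some names may differ):

    /* open the HTTP/3 control uni stream and flag it so that its Tx buffer
     * is allocated outside of the connection buffer limit */
    qcs = qcc_init_stream_local(h3c->qcc, 0);
    if (!qcs)
        return 1;
    qcs_send_metadata(qcs);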
Define a new qc_stream_desc flag QC_SD_FL_OOB_BUF. This is to mark
streams which are not subject to the connection limit on allocated MUX
stream buffers.
The purpose is to simplify the handling of QUIC MUX streams which do not
transfer data and as such are not driven by the haproxy stream layer, for
example the HTTP/3 control stream. These streams interact synchronously with
the QUIC MUX and cannot retry emission in case of a temporary failure.
This commit will be useful once the connection buffer allocation limit is
reimplemented to rely directly on the congestion window size. This will
probably cause the buffer limit to be reached more frequently, maybe
even on QUIC MUX initialization. As such, it will be possible to mark
control streams so that they are not subject to the buffer limit.
The QUIC MUX exposes a new function qcs_send_metadata(). It can be used by an
application protocol to specify which streams are used for control
exchanges. For the moment, no such stream uses this mechanism.
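In the allocation path, the flag is then simply consulted along these lines
(the helper name is hypothetical):

    /* streams flagged as out-of-band are never accounted against the
     * per-connection buffer limit */
    if (!(stream->flags & QC_SD_FL_OOB_BUF) && qc_stream_buf_limit_reached(qcc))
        return NULL;  /* the caller must subscribe and retry later */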
A limit per connection is put on the number of buffers allocated by the QUIC
MUX for emission across all its streams. This ensures memory
consumption remains under control. This limit is simply expressed as a
count of buffers which can be concurrently allocated for each
connection.
Until now, the quic_conn structure was used to account for currently allocated
buffers. However, a quic_conn never allocates new stream buffers; this
is only done at the QUIC MUX layer. As such, this commit moves the buffer
accounting into the QCC structure. This simplifies the API, most notably
qc_stream_buf_alloc() usage.
Note that this commit inverts the accounting. Previously, the counter was
initially set to 0 and incremented for each allocated buffer. Now, it is
set to the maximum value and decremented for each buffer in use. This is
considered clearer to use.
This commit simply adjusts QUIC stream buffer allocation. This operation
is conducted by the QUIC MUX using the qc_stream_desc layer. Previously,
qc_stream_buf_alloc() would return a qc_stream_buf instance and the QUIC MUX
would finalize the buffer area allocation. Change this to perform the
buffer allocation directly inside qc_stream_buf_alloc().
This patch clarifies the interaction between the QUIC MUX and
qc_stream_desc. It is cleaner to allocate the buffer via qc_stream_desc
as it is already responsible for freeing it.
It also ensures that the connection buffer accounting is only done after the
whole qc_stream_buf and its buffer are allocated. Previously, the
increment operation was performed between the two steps. This was not an
issue, as this kind of allocation error currently triggers the closure of the
whole connection. However, if in the future this is handled as a stream
closure instead, this commit ensures that the buffer accounting remains valid
in all cases.
Define a new global keyword tune.quic.frontend.max-window-size. This
allows setting globally the maximum congestion window size for each QUIC
frontend connection.
The default value is 0. It is a special value which automatically derives
the size from the configured QUIC connection buffer limit. This is
similar to the previous "quic-cc-algo" behavior, which can be used to
override the maximum window size per bind line.
quic-cc-algo is a bind line keyword which allows selecting a QUIC
congestion control algorithm. It can take an optional integer argument to
specify the maximum window size. This value supports the suffixes
'k', 'm' and 'g' to specify kilobytes, megabytes and gigabytes
respectively.
Extract the maximum window size parsing into a dedicated function named
parse_window_size(). It accepts as input an integer value with an
optional suffix 'k', 'm' or 'g'. The first invalid character is
returned by the function to the caller.
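A minimal self-contained sketch of such a parser (not the exact haproxy
implementation) could be:

    #include <stdlib.h>

    /* Parses <arg> as an integer with an optional 'k'/'m'/'g' suffix.
     * Returns the decoded value and sets <end> to the first invalid
     * character so that the caller can report parsing errors.
     */
    static unsigned long long parse_window_size(const char *arg, char **end)
    {
        unsigned long long value = strtoull(arg, end, 10);

        switch (**end) {
        case 'g': value <<= 30; (*end)++; break;
        case 'm': value <<= 20; (*end)++; break;
        case 'k': value <<= 10; (*end)++; break;
        }
        return value;
    }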
No functional change. This commit will allow quickly implementing a new
keyword to configure a default congestion window size in the global
section.
Document the nocc congestion algorithm as an entry of quic-cc-algo.
Highlight the fact that it is reserved for debugging and should not be
used outside of this use case.
It is possible to override the default QUIC congestion algorithm on a
bind line. With the same setting, it is also possible to specify the
maximum congestion window size.
The parser rejects values outside of the range between 10k and 4g. This
contradicts the documentation, which specifies 1k as the lower bound.
Correct this value in the documentation.
This should be backported up to 2.9.
load_cfg_in_mem() can continuously reallocate memory in order to load an
extremely large input from /dev/stdin, until it fails with ENOMEM, which means
that the process has consumed all available RAM. In containers and
virtualized environments this is particularly harmful.
So, in order to prevent this, let's introduce MAX_CFG_SIZE, set to 10MB, which
will limit the size of the input supplied via /dev/stdin.
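A sketch of the guard inside the read loop (simplified; variable names and the
error path only approximate the surrounding code):

    #define MAX_CFG_SIZE (10 * 1024 * 1024)

    if (read_bytes + bytes_to_read > MAX_CFG_SIZE) {
        ha_alert("Input from stdin exceeds the maximum of %d bytes.\n",
                 MAX_CFG_SIZE);
        goto free_mem;
    }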
Some systems require logs in the CLF format and that meant that I
could not send my logs for proxies in mode tcp to those servers. This
implements a format that uses log variables that are compatible with TCP
mode frontends and replaces the traditional HTTP values in the CLF format
to make them stand out. Instead of logging the method and URI like this:
"GET /example HTTP/1.1", it will log "TCP ", and for the response code I
used "000" so it would be easy to separate from legitimate HTTP
traffic. Now your log servers that require a CLF format can see the
timings for TCP traffic as well as HTTP.
It is now possible to use the "drop" keyword for "on" lines under a
log-profile section to specify that no log at all should be emitted for
the specified step (setting an empty format was not sufficient to do so
because only the log payload would be empty, not the log header, thus the
log would still be emitted).
It may be useful to selectively disable logging at specific steps for a
given log target (since the log profile may be set on log directives):
log-profile myprof
on request format "blabla" sd "custom sd"
on response drop
A new testcase was added to reg-tests/log/log_profiles.vtc.
With 7a21c3a ("MAJOR: log: implement proper postparsing for logformat
expressions") which finally made postparsing checks reliable, we started
to get reports from users that couldn't start haproxy 3.0 with configs that
used to work in the past. The current situation is described in GH #2642.
While the checks are mostly relevant, it turns out they are not strictly
needed anymore from a technical point of view. Most of them were useful in
the early logformat implementation to prevent runtime bugs due to the use of
an alias or fetch at runtime from an incompatible proxy. It's been a few
versions already that the code handling fetches and log aliases is robust
enough to support fetches/aliases used from the wrong context: all that
happens is that the fetch/alias silently fails if it's not available.
This can be proved by the fact that even if the postparsing checks were
partially broken in the past, it didn't cause runtime issues (at least
on recent haproxy versions).
Most of these checks can now be seen as configuration hints: when a check
triggers, it will indicate a configuration inconsistency in most cases,
but there are some corner cases where it is not possible to know at config
time if the conditions will be met for the alias/fetch to work properly.
So instead of failing with a hard error as we did so far, let's just be
more permissive and report our findings using "diag_warning": such
warnings are only emitted when haproxy is started with the '-dD' cli option.
We also took this opportunity to improve the clarity of the messages and make
them more precise (report the offending item instead of complaining about the
whole expression because of a single element).
With this patch, configs that used to start before 7a21c3a shouldn't
trigger hard errors anymore.
This may be backported to 3.0.
Fix the shebang of the python script to use /usr/bin/env, allowing the script
to be called directly from a virtualenv with `./release-estimator.py`
without using the system's python3 install.
The CHANGELOG URL which is parsed from the HTML now has a relative
scheme, which is incompatible with requests. This patch adds an https
scheme to the URL.
It might be useful to investigate the logs of failed tests. To keep
artifacts small, the following actions are taken:
- only failed logs are kept
- log retention is set to 6 days
pat_ref_set_elt() returns 0 if we run out of memory or can't parse a new
map value. Any error message emitted by pat_ref_set_elt() is saved in the err
buffer, if it is provided by the caller. These error messages are accumulated
during the loop.
pat_ref_set() is used to update the map values referenced by the same given
key. If pat_ref_set_elt() fails during the update, let's return 0 to the
caller immediately. We have the same non-unique key and the same new value in
each loop iteration, so it seems quite odd to accumulate the same error
messages and print them in the CLI:
> add map @1 mytest.map <<
+ 1.0.1.11 TestA
+ 1.0.1.11 TESTA
+ 1.0.1.11 test_a
+
> set map mytest.map 1.0.1.11 15
unable to parse '15' unable to parse '15' unable to parse '15'.
cli_parse_set_map(), which calls pat_ref_set() to update the map, will return
only one error message with this patch:
> set map mytest.map 1.0.1.11 15
unable to parse '15'.
hlua_set_map() and http_action_set_map() don't provide an error buffer and
will just exit on the first error.
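The adjusted loop can be sketched like this (simplified; the structure and
field names only approximate the real code):

    int pat_ref_set(struct pat_ref *ref, const char *key, const char *value,
                    char **err)
    {
        struct pat_ref_elt *elt;
        int found = 0;

        list_for_each_entry(elt, &ref->head, list) {
            if (strcmp(key, elt->pattern) != 0)
                continue;
            found = 1;
            /* stop at the first failure instead of accumulating the same
             * error message for every duplicated key */
            if (!pat_ref_set_elt(ref, elt, value, err))
                return 0;
        }
        return found;
    }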
This should be backported to all stable versions.
memprintf() performs a realloc and then updates the pointer to the output
buffer where it has written the data. So free() is called on the previous
buffer address, if one was provided.
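In other words, repeated calls on the same pointer behave like this:

    char *err = NULL;

    memprintf(&err, "first message");   /* allocates a new area */
    memprintf(&err, "second message");  /* frees the previous area, then
                                         * allocates and stores a new one */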
Both pat_ref_set_elt() and pat_ref_set() use memprintf() to write their error
messages. So, when we re-enter the while loop for the second time and
pat_ref_set_elt() has returned, the *err pointer (the previous value of *merr)
has already been freed by the memprintf() called from pat_ref_set_elt().
The 'if (!found)' condition is false at this point, because we've already
found a node during the first iteration. So the second memprintf(), called to
write the error message, calls free(*err) again.
This should be backported to all stable versions.
When encoding HTX to HTTP/3 headers on the response path, a bunch of
ABORT_NOW() were used when there was not enough buffer room. In most cases
this is safe as the output buffer has just been allocated and so is empty at
the start of the function. However, with a header list longer than a
whole buffer, this would cause an unexpected crash.
Fix this by replacing the ABORT_NOW() statements with a proper error return
path. For the moment, this causes the whole connection to be closed
rather than only the stream. This may be further improved in the future.
Also remove the ABORT_NOW() used when encoding the frame length at the end of
headers or trailers encoding. Buffer room is sufficient there as it was
already checked earlier in the same function.
This should be backported up to 2.6. Special care should be taken
however as this code path has changed frequently:
* for 2.9 and older, the following extra statement must be inserted
  prior to each newly added goto statement:
    h3c->err = H3_INTERNAL_ERROR;
* for 2.6, trailers support is not implemented. As such, the related chunks
  should just be ignored when backporting.
qcc_send_frames() can be called with an empty list and returns
immediately with an error code. This is convenient as it allows calling
it in a while loop.
Remove the "error" trace emitted in this case and replace it
with a less alarming "leaving on..." message. This should help debugging
when traces are active.
This commit fixes potential null pointer dereferences reported by Coverity;
see issues #2676 and #2668 for more details.
The 'outline' pointer, which is explicitly initialized to NULL and used as a
temporary buffer to store split keywords, may in theory be implicitly
dereferenced in some corner cases (which we haven't encountered yet with
real-world configurations) in 'if (!**args)'. The parse_line() code, called
before under some conditions, assigns args[arg] = outline + outpos, and the
initial value of outpos is 0.
This patch is done mostly as a safeguard in order not to trigger the
BUG_ON(fdtab[fd].owner != NULL) check, if listen() fails on a UNIX domain
socket.
In uxst_bind_listener(), pretty much the same logic of closing the socket on
the error path was kept as in tcp_bind_listener() before. The use of
fd_delete() was not generalized when support for the UNIX sock_stream protocol
was implemented.
So, let's remove the fd from the fdtab on failure, instead of just closing it.
Otherwise, uxst_bind_listener(), which may be called in a loop for each
receiver, will obtain the same fd via socket() for the next receiver, bind it
again and try to re-insert it into the fdtab.
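The error path thus becomes, roughly (the label names are illustrative):

 close_return:
    /* remove the fd from the fdtab and close it, so that a later socket()
     * call for the next receiver can safely reuse the same fd number */
    fd_delete(fd);
    goto err_return;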
This can be backported to all stable versions.
QUIC stream IDs are expressed as QUIC variable-length integers, which cover
the range from 0 to 2^62 - 1. As such, it is forbidden to send a value in a
MAX_STREAMS flow-control frame which would allow stream IDs to exceed this
limit.
This patch fixes MAX_STREAMS emission to ensure the sent value is valid.
This also ensures that the peer cannot open a stream with an invalid ID,
as this would now cause a flow-control violation instead.
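The cap can be expressed roughly as follows (assuming QUIC_VARINT_8_BYTE_MAX
stands for 2^62 - 1; the exact code may differ):

    /* a MAX_STREAMS value of <n> allows stream IDs up to roughly 4 * n,
     * so the advertised count must stay below 2^62 / 4 = 2^60 */
    if (max_streams > QUIC_VARINT_8_BYTE_MAX / 4)
        max_streams = QUIC_VARINT_8_BYTE_MAX / 4;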
This must be backported up to 2.6.
This helps to optimize load_cfg_in_mem() a bit and fixes a potential null
pointer dereference in the fread() call. If (read_bytes + bytes_to_read)
equals the initial chunk_size (zero), realloc is never called and *cfg_content
keeps its NULL value.
So, let's ensure that the initial number of bytes to read
(read_bytes + bytes_to_read) is strictly positive when we enter the loop for
the first time.
A recent fix broke pipelined commands on the master CLI; this
reg-test implements a simple test that allows checking their correct
behavior.
This could be backported as far as 2.6.
Since commit 3d93ecc ("BUG/MAJOR: cli: Restore non-interactive mode
behavior with pipelined commands") and commit 598c7f16 ("BUG/MEDIUM:
cli: Warn if pipelined commands are delimited by a \n"), pipelined
commands on the master CLI are either broken or emit warnings depending
on the version.
The reason is that the modes applied on the master CLI are saved in
the current CLI session and then reinserted for each pipelined command;
however, these commands were inserted as new lines.
For example:
"@1; expert-mode on; debug dev log foo; debug dev log bar"
Would be sent as:
"expert mode on\ndebug dev log foo"
"expert mode on\ndebug dev log bar"
This patch fixes the issue by using the new ci_insert() function, which
inserts a string instead of a new line, and the commands are now suffixed
by ';' upon insertion, allowing a correct pipelined command chain.
This must be backported together with the previous commit introducing
ci_insert() to every stable version.
This has been broken since version 3.0, but it emits a warning in every
version below, because 598c7f164 was backported.