When a negative initial windows size is reported, we're going to close
the connection, so it's important to report a trace to explain why!
This should be backported at least to 3.1 and possibly 3.0 (adapting the
context since there's no glitches there).
The COUNT_IF() macro was initially meant to return true/false to be used
in if() conditions but had an extra do { } while(0) that prevents it from
doing so. Let's get rid of the do { } while(0) before the code generalizes
to too many places. There's no impact on existing code, but may have to be
backported if future fixes rely on it.
Define a new function quic_register_build_options(). Its purpose is to
register a build options string for QUIC features which is reported when
using haproxy -vv.
This will allow to easily determine if connection socket-owner mode and
GSO are supported or not. Here is the new filtered output :
$ ./haproxy -vv|grep '^QUIC:'
QUIC: connection socket-owner mode support : yes
QUIC: GSO emission support : yes
Two features are tested on startup via quic_test_socketopts() :
connection socket-owner mode support and GSO. Extract both test in their
separated functions called by quic_test_socketopts().
This patch will allow to reuse easily QUIC features detection for build
options report via haproxy -vv.
quic_test_socketopts() function is used to detect system support for
QUIC network stack. Previously, it relies on an already bound listener
instance, notably to ensure that two UDP sockets can be bound on the
same source address.
Improve quic_test_socketopts() to run without any listener argument. It
now automatically instantiates and manipulates two dummy sockets FDs to
check for multi-bind support. This brings two advantages :
* the function is now called via an initcall
* it will easily be reusable to implement build option description
This commit is the counterpart of the previous one for FCGI mux. It
replaces objt_server() by unsafe __objt_server(), as conn target is
guarantee to point to a valid server instance, which can then be used as
_srv_add_idle() argument.
This commit is the counterpart of the previous one for SPOP mux. It
replaces objt_server() by unsafe __objt_server(), as conn target is
guarantee to point to a valid server instance, which can then be used as
_srv_add_idle() argument.
This should fix coverity report from github issue #2811.
This commit is the counterpart of the previous one for H2 mux. It
replaces objt_server() by unsafe __objt_server(), as conn target is
guarantee to point to a valid server instance, which can then be used as
_srv_add_idle() argument.
When dealing with a backend connection, H1 mux IO handler must reinsert
it in its idle list pool if it was extracted from it at the beginning.
This is the case if conn_in_list is true.
On reinsert, idle list pool is retrieved via the server instance
accessible from <conn.target>. Replace objt_server usage with
__objt_server as an idle connection is always attached to a server. This
ensures that there is no issue when using _srv_add_idle() then.
This should fix coverity report from github issue #2810.
in 50322dff ("MEDIUM: server: add init-state"), some examples on how to
use init-state server keyword were added alongside with the keyword
documentation.
However, as reported by Nick Ramirez, there was an error because the
example that stated that haproxy will pass the traffic to the server after
3 successful health checks used the "init-state down" instead of the
"init-state fully-down". Thus the behavior wouldn't match what the
comment said (only 1 successful health check was required).
Here we fix the example in itself to match with the comment. Also the
following example ("# or") was also affected, but it is kind of
redundant as the main purpose of the examples are to illustrate the
feature in itself and not how to use server-template directive, so we
remove it.
This should be backported in 3.1 with 50322dff
Update comment and condition. nb_oldpids it's not a pointer, but a signed int,
which keeps the max number of elements in oldpids array. So, it's a good
practice to check, if it's strictly positive here.
If master process can't open a pidfile, there is no sense to send SIGTTIN to
oldpids, as it will exit. So, old workers will terminate as well. It's better
to send the last alert to the log about unrecoverable error, because master is
already in its polling loop.
For the standalone mode we should keep the previous logic in this case: send
SIGTTIN to old process and unbind listeners for the new one. So, it's better
to put this error path in main(), as it's done when other configuration settings
can't be applied.
This patch should be backported only in 3.1.
When a new master process is launched like below:
./haproxy -W -D -p ha.pid -sf $(cat ha.pid)...
The old master process and its workers do not stop. Since the master-worker
refactoring, the code, which sends USR1/TERM to old pids from -sf, is called
only for the standalone mode. In master-worker mode we should receive the READY
message from the newly forked worker at first, in order to be able to terminate
the previous master.
So, to fix this, let's terminate the previous master in _send_status(), where
we parse the READY message from the newly forked worker. And let's continue to
use oldpids array, as it was in 3.0, in order to stop the workers, launched
before the reload.
This patch should be backported only in 3.1.
After reload, previously launched programs are stopped explicitly in
mworker_ext_launch_all(). So, there is no longer need to save their PIDs in
oldpids array before the master reexec().
This also prepares the fix of "-D -W -sf/-st" modes, as we will need to
loop over this array in the master process context, in order to stop the
previous master, when the new one is ready.
This patch should be backported only in 3.1.
These options are now deprectated, but the proxy capabilities are not
properly checked during the configuration parsing leading to always ignore
these options. This is now fixed by checking the frontend capability for
"accept-invalid-http-request" option and the backend capability for
"accept-invalid-http-response" option.
In addition, the messages about the deprecation of these options are now
emitted with ha_warning() instead of ha_alert() because they are only
warnings and not errors.
This patch should fix the issue #2806. It must be backported to 3.1.
Recently, an issue was found with QUIC zero-copy forwarding on 3.0
version. A desynchronization could occur internally in QCS Tx bytes
counters which would cause a BUG_ON() crash on qcs_destroy() when the
stream is detached.
It was silently fixed in version 3.1 by the following patch. As it was
considered as an optimization, it was not scheduled yet for backport.
6697e87ae5
MINOR: mux-quic: Don't send an emtpy H3 DATA frame during zero-copy forwarding
This mistake has been caused due to some counter-intuitive manipulation
in QUIC zero-copy implementation. Try to streamline this in QUIC MUX
done_ff callback and its application protocol counterpart. Especially
for values exchanged between MUX and application on one side, and MUX
and stconn layer as done_fastfwd return value.
First, application done_ff callback now returns the length of the wholly
encoded frame. For HTTP/3, it means header length + payload length h3
frame. This value can then be reused as qcc_send_stream() argument to
increase QCS Tx soft offset.
As previously, special care has been taken to ensure that QUIC MUX
done_ff only return the transferred data bytes. Thus, any extra offset
for HTTP/3 header is properly excluded. This is mandatory for stconn
layer to consider the transfer has completed.
Secondly, remove duplicated code in application done_ff to reset iobuf
info. This is now factorize in QUIC MUX done_ff itself.
This patch is related to github issue #2678.
Since commit 1cc851d9f2 ("MEDIUM: mux-h2: start to update stream when
sending WU") we started storing stream offsets in the h2s struct. These
offsets are updated at a few points, where it's safe to write to the
stream, and in h2c_send_strm_wu(), where the h2s->h2c was not performed.
Due to this, nothing protects the h2s from being updated when sending a
WU for a closed stream, which might only happen when acknowledging a
frame after resetting that stream, which is quite unlikely. In any case
if this happens, it will crash as in issue #2793 since the closed streams
are purposely read-only to catch such bugs.
The fix is trivial, just check h2s->h2c before deciding to update the
stream.
Thanks to @Wahnes for reporting this, and Christopher for spotting the
cause. This needs to be backported to 3.1 only.
Better late than never, commit 1f73d35 ("MINOR: stktable: implement
"recv-only" table option") implemented stktable flags and initial
definitions, but it lacks some comments plus the flag is stored as
16bits but the SKT_FL_ definition width allows for only 8bits so
it is a bit confusing, let's fix that
Thanks to previous commit stktable struct now have a "flags" struct member
Let's take this opportunity to remove the isolated "nopurge" attribute in
stktable struct and rely on a flag named STK_FL_NOPURGE instead.
This helps to better organize stktable struct members.
When "recv-only" keyword is added on a stick table declaration (in peers
or proxy section), haproxy considers that the table is only used for
data retrieval from a remote location and not used to perform local
updates. As such, it enables the retrieval of local-only values such
as conn_cur that are ignored by default. This can be useful in some
contexts where we want to know about local-values such are conn_cur
from a remote peer.
To do this, add stktable struct flags which default to NONE and enable
the RECV_ONLY flag on the table then "recv-only" keyword is found in the
table declaration. Then, when in peer_treat_updatemsg(), when handling
table updates, don't ignore data updates for local-only values if the flag
is set.
This patch is similar to the previous one, but for GSO support. Remove
alert level message to a diag report only visible with argument -dD.
This must be backported up to 3.1.
QUIC relies on several advanced network API features from the kernel to
perform optimally. Checks are performed during startup to ensure that
these features are supported. A fallback is automatically performed for
every incompatible feature.
Besides the automatic fallback mechanism, a message is also reported to
the user at the same time. Previously, alert level was used, but it is
incorrect as it is reserved for unrecoverable errors which should
prevent haproxy to start. Warning level could be used, but this can
annoy users running with zero-warning mode.
This patch removes the alert message when 'socket-owner connection' mode
cannot be activated. Convert the message to a diag level. This allows
users to start without forcing configuration modification to hide a
warning. Besides, several feature fallback such as the polling mechanism
does not emit any warning either, so it's better to adopt a similar
behavior for QUIC features.
This must be backported up to 2.8.
TASK_F_USR1 is used by MUX tasklet when emission has been interrupted
due to pacing. When the tasklet runs again, only qcc_purge_sending()
will be called as an optimization.
Pacing status is only removed via qcc_wakeup(). Until then, TASK_F_USR1
is not cleared. This causes an issue after emission with pacing
completion if the MUX tasklet is woken up for a recv subscribe, as
qcc_wakeup() is not used by quic-conn layer. The tasklet will
incorrectly run only for pacing emission, without handling reception
process. Worst, a crash will occur if QCC tx frames list is empty, due
to a BUG_ON() in qcc_purge_sending().
Recv subscribe is only used for 0-RTT, when QUIC MUX is instantiated
before quic-conn handshake completion. Thus, this bug can only be
reproduced with 0-rtt. Furthermore, MUX must already have emitted at
least a few response bytes with pacing, before QUIC handshake
completion. It cannot easily be reproduced, at least with CLI clients
where the handshake is always already completed before MUX exchanges.
To fix this, remove TASK_F_USR1 when pacing emission has been completed.
At least, this prevents BUG_ON() on qcc_purge_sending() as it won't be
called with an empty QCC Tx frame list anymore. However, this bug has
revealed that MUX tasklet architecture is not suitable when both
handling reception and emission part. This will be improved in a future
serie of patches.
This should fix github issue #2796.
This must be backported up to 3.1.
In 3.1-dev10, commit 8dd4efe42f ("MAJOR: mworker: move master-worker
fork in init()") made the fork_poller() code unconditional, while it
is only desirable for processes that have been forked from a parent
(standalone daemon mode) or from a master (master-worker mode). The
call can be expensive in some cases as it will create a new poller,
scan and try to migrate to it all existing FDs till the highest known
one. With very high numbers of FDs, this can take several seconds to
start.
This should be backported to 3.1.
Commit 8dd4efe42f ("MAJOR: mworker: move master-worker fork in init()")
introduced some sensitive changes to the startup code (which was
expected), and one sensitive change is that the second call to setsid()
was accidentally made unconditional. As such it even applies to foreground
processes, resulting in foreground processes being detached from the
terminal and no longer responding to Ctrl-C nor Ctrl-Z. An example of
this simply consists in start haproxy -db under sudo. Then a new shell
is required to stop it.
This patch removes this second setsid(), as it is already done in
apply_daemon_mode().
This must be backported to 3.1.
This patch fixes two wrong calls to bbr_inflight().
bbr_target_inflight() aim is to compute the number of bytes BBR has to put on
the network as bytes in flight (sent but not acked bytes). It must call
bbr_inflight() with the current window gain value (in place of a wrong fixed 100
gain value here, in percents).
bbr_is_time_to_cruise() also called bbr_inflight() with a wrong gain value
as parameter due to a confusion between the value mentioned by the RFC (1
meaning 100% of the current window) and our implementation which needs value in
percents (so 100 in place of 1 here). Note that bbr_is_time_to_cruise() aim is to
make BBR the decision to leave the probing_bw down state. The bug had as side
effect to make BBR stay in this state during too long periods of time during
which the bottleneck bandwidth is decreasing, leading to big oscillations
between the mininum and maximum bottleneck bandwidth estimations.
This patch must be backported to 3.1 where BBR was first implemented.
Now when tasklets are woken up via tasklet_wakeup(), tasklet_wakeup_on()
or tasklet_wakeup_after(), either the optional wakeup flags will be used,
or TASK_WOKEN_OTHER will be used.
This allows tasklet handlers waking up for any given cause to notice
whether or not they were also woken for another reason. For example, a
mux handler could skip heavy parts when seeing that TASK_WOKEN_OTHER is
absent, proving that no standard tasklet_wakeup() was done, for example
in response to a subscribe().
The benefit of the TASK_WOKEN_* flags is that they're purged during the
wakeup, and that they're easy to check for using TASK_WOKEN_ANY.
TASK_F_UEVT1 and TASK_F_UEVT2 are also usable for private use (e.g. wakeup
from a stream to a connection inside a mux).
Probably that in the future, code dealing with subscribe events should
start to place TASK_WOKEN_IO like is done for upper layers.
Pidfile should be created at the latest initialization stage, when we are
sure, that process is able to start successfully, otherwise PID value, written
in this file is no longer valid.
So, for the standalone mode, let's move the block, which opens the pidfile and
let's put it just before applying "chroot". In master-worker mode, master
doesn't perform chroot. So it creates the pidfile, only when the "READY"
message from the newly forked worker is received.
This should be backported only in 3.1
After master-worker mode refactoring, global.pidfile is only used in
handle_pidfile(), which opens the provided file and writes the PID into it. So,
it's more appropriate to perform the close(pidfd) and ha_free(&global.pidfile)
also in this function.
This commit prepares the fix of the pidfile creation, as it's created now very
early, when we are not sure, that process has successfully started. In
master-worker mode handle_pidfile() can be called in the master process context.
So, let's make it accessible from other compilation units via global.h.
This should be backported only in 3.1.
When haproxy is launched in a background and in a subshell (see example below),
according to POSIX standard (2.11. Signals and Error Handling), it inherits
from the subshell SIG_IGN signal handler for SIGINT and SIGQUIT.
$ (./haproxy -f env4.cfg &)
So, when haproxy is lanched like this, it doesn't stop upon receiving
the SIGINT. This can be a root cause of some unexpected timeouts, when haproxy
is started under VTest, as VTest sends to the process SIGINT in order to
terminate it. To fix this, let's explicitly register the default signal
handler for the SIGINT in signal_init() initcall.
This should be backported in all stable versions.
In GH #2804, @Bbulatov reported that the result of hlua_stream_ctx_get()
was used and de-referenced without checking if it's NULL in
hlua_filter_delete() while other functions used to check for NULL before
de-referencing it.
In fact hlua_stream_ctx_get() can only return NULL if
hlua_stream_ctx_prepare() failed or was not called on the current stream.
Now because of the filter's API, since hlua_filter_delete() is mapped as
detach method and hlua_filter_new() as attach method, and since
hlua_filter_new() is responsible for calling hlua_stream_ctx_prepare(),
there's no reason hlua_filter_delete() should be called if
hlua_filter_new() failed or wasn't called. Thus we can assume that hlua
can never be NULL in hlua_filter_delete(), so we add a BUG_ON() to ensure
it is always the case and remove the ambiguity.
As reported by @Bbulatov on GH #2804, fe is found at multiple places in
listener_release(): in some places it is first checked against NULL before
being de-referenced while in some other places it is not, which is
ambiguous and could hide a bug.
In practise, fe cannot be NULL for now, but it might not be the case in
the future as we want to keep the possibility to run isolated listeners
(that is, without proxy attached).
We've already ensured this was the case with a57786e ("BUG/MINOR:
listener: null pointer dereference suspected by coverity"), but
this promise was recently broken by 65ae134 ("BUG/MINOR: listener: Wake
proxy's mngmt task up if necessary on session release").
Let's fix that by conditionning the block with an "else if" statement
instead of a regular "else".
No need for backport except if multi-connection protocols (ie: FTP) were
to be backported as well.
This is to please a non identified compilers which complains about an hypothetic
<time_ns> variable which would be not initialized even if this is the case only
when it is not used.
This build issue arrived with this commit:
BUG/MINOR: improve BBR throughput on very fast links
Should be backported to 3.1 with this previous commit.
When the response status line is formatted before sending it to the client,
if there is no reason set, HAProxy should add one that matches the status
code, as stated in the configuration manual. However it is not performed.
It is possible to hit this bug when the response comes from a H2 server,
because there is no reason field in HTTP/2 and above.
This patch should fix the issue #2798. It should be backported to all stable
versions.
It is possible to loose the request after several L7 retries, leading to
crashes, because the request channel flag stating some data were sent is not
properly reset.
When a L7 retry is performed, some flags on different entities must be reset
to be sure a new connection will be properly retried, just like it was the
first one, mainly because there was no connection establishment failure. One
of them, on the request channel, is not reset. The flag stating some data
were already sent. It is annoying because this flag is used during the
connection establishment to know if an error is triggered at the connection
level or at the data level. In the last case, the error must be handled by
the HTTP response analyzer, to eventually perform another L7 retry.
Because CF_WROTE_DATA flag is not removed when a L7 retry is performed, a
subsequent connection establishment error may be handled as a L7 error while
in fact the request was never sent. It also means the request was never
saved in the buffer used to performed L7 retries. Thus, on the next L7
retires, the request is just lost. This forecefully leads to a bunch of
undefined behavior. One of them is a crash, when the request is used to
perform the load-balancing.
This patch should fix issue #2793. It must be backported to all stable
versions.
On snd_buf completion, QUIC MUX tasklet is scheduled if newly data has
been transferred from the stream layer. Thanks to qcc_wakeup(), pacing
status is removed from tasklet, which ensure next emission will reset Tx
frames and use the new data.
Tasklet is not scheduled if MUX is already subscribed on send due to a
previous blocking condition. This is an optimization to prevent an
unneeded IO handler execution. However, this causes a bug if an emission
is currently delayed due to pacing. As pacing status is not removed on
snd_buf, next emission process will continue emission with older data
without refreshing the newly transferred one.
This causes a transfer freeze. Unless there is some activity on the
connection, the transfer will be eventually aborted due to idle timeout.
To fix this, remove TASK_F_USR1 if tasklet wakeup is not called due to
send subscription. Note that this code is also duplicated in done_ff for
zero-copy transfer.
This must be backported up to 3.1.
In _event_hdl_publish(), when we prepare the asynchronous event and no
<data> was provided (set to NULL), we forgot to initialize the _data
event_hdl_async_event struct member to NULL, which leads to uninitialized
reads in event_hdl_async_free_event() when the event is freed:
==1002331== Conditional jump or move depends on uninitialised value(s)
==1002331== at 0x35D9D1: event_hdl_async_free_event (event_hdl.c:224)
==1002331== by 0x1CC8EC: hlua_event_runner (hlua.c:9917)
==1002331== by 0x39AD3F: run_tasks_from_lists (task.c:641)
==1002331== by 0x39B7B4: process_runnable_tasks (task.c:883)
==1002331== by 0x314B48: run_poll_loop (haproxy.c:2976)
==1002331== by 0x315218: run_thread_poll_loop (haproxy.c:3190)
==1002331== by 0x18061D: main (haproxy.c:3747)
The bug severity was set to MEDIUM because of its nature, and it's best
if this patch can be backported up to 2.8. But in practise it can only be
triggered with events that don't provide optional data: since PAT_REF
events are the first native events making use of this feature, this bug
shouldn't be an issue before f72a66e ("MINOR: pattern: publish event_hdl
events on pat_ref updates")
Patref:set(key, val[, force]) takes optional "force" parameter (defaults
to false) to force the entry to be created if it doesn't already exist
To retrieve the value, lua_tointeger() was used in place of
lua_toboolean(), and because of that force is not enabled if "true"
is passed as parameter (only numbers were recognized) despite the
documentation mentioning that "force" is a boolean.
To fix the issue, we replace lua_tointeger by lua_toboolean.
Also, the doc was updated to rename "bool" to "boolean" for the "force"
parameter to stay consistent with historical naming in the file.
No backport needed unless 9ee37de5c ("MINOR: hlua_fcn: add Patref:set()")
is.
Patref:set() can achieve the same thing as core.set_map()
Patref:add() can achieve the same thing as core.add_acl()
Patref:del() can achieve the same thing as core.del_map() and
core.del_acl()
As a bonus, Patref:{set,add} are more efficient than their core
legacy equivalent, because they don't require systematic pattern
reference lookup for each individual operation.
Let's mention that in the doc to encourage Patref methods adoption.
Just like we did for server events, in this patch we expose the PAT_REF
event family (see "MINOR: event_hdl: add PAT_REF events") in Lua.
Unlike server events, Patref events don't provide additional event data,
and the registration can only take place from a Patref object (ie: not
globally).
Thanks to this commit it now becomes possible to trigger actions when
updates are performed on a map (or acl list) being monitor, without
the need to loop or use inefficient workarounds.
There is no cli equivalent for this one. It is similar to Patref:add()
excepts thay it takes a table as parameter (for acl: table of keys, for
maps: table of keys:values). The goal is to add multiple entries at once
to limit locking time to the strict minimum. It is recommended to use this
one over Patref:add() when adding multiple entries at once.
Just like "set map" on the cli, the Patref:set() method (only relevant
for maps) can be used to modify an existing entry's value in the pattern
reference pointed to by the Lua Patref object. Lookup is performed on the
key. The update will target the live pattern reference version, unless
Patref:prepare() is ongoing.
Just like "del map" and "del acl" on the cli, the Patref:del() method can
be used to delete an existing entry in the pattern reference pointed to
by the Lua Patref object. The update will target the live pattern
reference version, unless Patref:prepare() is ongoing.
Just like "add map" and "add acl" on the cli, the Patref:add() method can
be used to add a new entry to the pattern reference pointed to by the
Lua Patref object. The update will target the live pattern reference
version, unless Patref:prepare() is ongoing.
If Patref:commit() was used and the new version (generation) isn't going
to be committed, calling Patref:giveup() will allow allocated resources
to be freed and reused. It is a good habit to call this if commit()
isn't called after a prepare().
It is a special Lua Patref method: it bypasses the commit/prepare logic
and purges the whole pattern reference items pointed to by Patref Lua
object (all versions, not just the current one). It doesn't have a cli
equivalent: it leverages pat_ref_purge_range().
Just like the "prepare map" or "prepare acl" on the cli, but for Lua:
it leverages the pattern API to create a subset (ie: a new generation id)
that will automatically be used as target for following Patref operations
(add/set/del...) until the "commit" method is invoked to atomically push
the pending updates.