For a while there has been the constraint of having to run as root for
transparent proxying, and we're starting to see some cases where QUIC is
not running in socket-per-connection mode due to the missing capability
that would be needed to bind a privileged port. It's not realistic to
ask all QUIC users on port 443 to run as root, so instead let's provide
basic support for capabilities, at least on Linux. The ones currently
supported are cap_net_raw, cap_net_admin and cap_net_bind_service. The
mechanism was made OS-specific with a dedicated file because it really
is. It can be easily refined later for other OSes if needed.
A new keyword "setcaps" is added to the global section, to enumerate the
capabilities that must be kept when switching from root to non-root. This
is ignored in other situations though. HAProxy has to be built with
USE_LINUX_CAP=1 for this to be supported, which is enabled by default
for linux-glibc, linux-glibc-legacy and linux-musl.
A good way to test this is to start haproxy with such a config:
global
uid 1000
setcap cap_net_bind_service
frontend test
mode http
timeout client 3s
bind quic4@:443 ssl crt rsa+dh2048.pem allow-0rtt
and run it under "sudo strace -e trace=bind,setuid", then connect to it
from an H3 client. The bind() syscall must succeed despite the
user id having been switched.
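For reference, the sequence below is a minimal standalone C sketch (using
libcap and a hard-coded uid, both assumptions of the example, not HAProxy's
own implementation) of how a capability can survive the switch away from root:
  #include <stdio.h>
  #include <unistd.h>
  #include <sys/prctl.h>
  #include <sys/capability.h>   /* libcap; link with -lcap */

  int main(void)
  {
      cap_value_t keep[] = { CAP_NET_BIND_SERVICE };
      cap_t caps;

      /* ask the kernel to preserve permitted capabilities across setuid() */
      if (prctl(PR_SET_KEEPCAPS, 1, 0, 0, 0) == -1)
          return perror("prctl"), 1;

      if (setuid(1000) == -1)
          return perror("setuid"), 1;

      /* setuid() cleared the effective set; raise the kept capability again */
      caps = cap_get_proc();
      cap_set_flag(caps, CAP_PERMITTED, 1, keep, CAP_SET);
      cap_set_flag(caps, CAP_EFFECTIVE, 1, keep, CAP_SET);
      if (cap_set_proc(caps) == -1)
          return perror("cap_set_proc"), 1;
      cap_free(caps);

      /* binding a socket to port 443 would now succeed despite uid 1000 */
      return 0;
  }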
A new protocol named "reverse_connect" is created. This will be used to
instantiate connections that are opened by a reverse bind.
For the moment, only a minimal set of callbacks is defined, with no real
work done yet. This will be extended in the next patches.
make "range" which was introduced with 06d34d4 ("DEV: makefile: add a
new "range" target to iteratively build all commits") does not work with
POSIX shells (namely: bourne shell), and will fail with this kind of
errors:
|/bin/sh: 6: Syntax error: "(" unexpected (expecting ")")
|make: *** [Makefile:1226: range] Error 2
This is because arrays and arithmetic expressions which are used for the
"range" target are not supported by sh (unlike bash and other "modern"
interpreters).
However the make "all" target already complies with POSIX, so in this
commit we try to make "range" target POSIX compliant to ensure that the
makefile works as expected on systems where make uses /bin/sh as default
intepreter and where /bin/sh points to POSIX shell.
This will iterate over all commits in the range passed in RANGE, or all
those from master to RANGE if no ".." exists in RANGE, and run "make all"
with the exact same variables. This aims to ease the verification that
no build failure exists inside a series. In case of error, it prints the
faulty commit and stops there with the tree checked out. Example:
$ make-distcc range RANGE=HEAD
Found 14 commit(s) in range master..HEAD.
Current branch is 20230809-plock+tbl+peers-4
Starting to build now...
[ 1/14 ] 392922bc5 #############################
(...)
Done! 14 commit(s) built successfully for RANGE master..HEAD
Maybe in the future it will automatically use HEAD as a default for RANGE
depending on the feedback.
It's not listed in the help target so as not to encourage users to try it
as it can very quickly become confusing due to the checkouts.
Move the TX part of the code to quic_tx.c.
Add quic_tx-t.h and quic_tx.h headers for this TX part code.
The definition of the quic_tx_packet struct has been moved from quic_conn-t.h
to quic_tx-t.h.
Same thing as for the TX part:
Move the RX part of the code to quic_rx.c.
Add quic_rx-t.h and quic_rx.h headers for this RX part code.
The definition of the quic_rx_packet struct has been moved from quic_conn-t.h
to quic_rx-t.h.
Move the code which directly calls the functions of the OpenSSL QUIC API into
a new C file, quic_ssl.c.
Some code has been extracted from qc_conn_finalize() into quic_tls.c to
implement only the QUIC TLS part (see quic_tls_finalize()).
qc_conn_finalize() has also been exported to be used from this new quic_ssl.c
C module.
To accelerate the compilation of the quic_conn.c file, move the code related
to the traces from quic_conn.c to quic_trace.c.
Also add some headers (quic_trace-t.h and quic_trace.h).
-pthread is normally the right way to enable threads: it involves -lpthread
at the end of the arguments and also enables -D_REENTRANT=1. We normally
don't care about the subtle difference, but building with a static openssl
library that has threads enabled breaks because -lpthread is placed before
the SSL_LDFLAGS and openssl doesn't find pthread_atfork().
Let's change the flag to -pthread once for all, that's something we've
considered over the last decade without having a good reason to do it
since it didn't bring any value. Now at least it fixes a build issue,
which is a good reason. This doesn't need to be backported since it is
one of the consequences of the new more flexible build options in 2.8.
Building with an install of wolfssl and openssl side-by-side breaks
because for wolfssl we need the two include levels and since some
names are in common, this results in some files being found in the
original openssl tree. Let's swap the two include paths so that all
that is related to wolfssl is found there first when needed.
No backport is needed.
Due to the test on the target introduced by commit 9577a152b ("BUILD:
makefile: do not erase build options for some build options"), if a
tool (e.g. halog) is built first, before haproxy, after a clean or a
fresh source extraction, the .build_opts file does not exist and
"make" complains since there's no such target. Make sure to define
the empty target for all "else" blocks there. No backport is needed.
One painfully annoying thing with the build options change detection
is that the options file gets refreshed for about every target except when
the build target is exactly "reg-tests". But in practice, every time reg
tests are run we end up experiencing a full rebuild because the reg-tests
script runs "make version", which is sufficient to refresh the file.
There are two issues here. The first one is that we ought to skip all
targets that do not make use of the build options. This includes all
the tools such as "flags" for example, or utility targets like "tags",
"help" or "version". The second issue is that with most of these extra
targets we do not set the TARGET variable, and that one is used when
creating the build_opts file, so let's preserve the file when TARGET
is not set.
Now it's possible to re-run a make after a make reg-tests without having
to rebuild the whole project.
"make help" ends with a list of enabled/disabled features for TARGET '',
which makes no sense. Let's only display enabled/disabled features when
a target is set. It also removes visual pollution when users seek help.
The issue was introduced with commit c108f37c2 ("BUILD: makefile:
rework 51D to split v3/v4"), and is also related to commit b16d9b58
("BUILD: makefile: never force -latomic, set USE_LIBATOMIC instead")
where USE_ATOMIC has been replaced.
LIST_DELETE doesn't affect the previous pointers of the stored element.
This can sometimes hide bugs when such a pointer is reused by accident
in a LIST_NEXT() or equivalent after having been detached for example, or
if another LIST_DELETE is performed again, something that LIST_DEL_INIT()
is immune to. By compiling with -DDEBUG_LIST, we'll replace a freshly
detached list element with two invalid pointers that will cause a crash
in case of accidental misuse. It's not enabled by default.
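To illustrate the difference, the three variants roughly behave like the
sketch below (a generic intrusive list, not haproxy's list.h, and made-up
poison values; compile-only illustration):
  struct node { struct node *next, *prev; };

  /* like LIST_DELETE: the detached node still points into the list,
   * so an accidental reuse silently walks live elements */
  static void del_keep(struct node *n)
  {
      n->prev->next = n->next;
      n->next->prev = n->prev;
  }

  /* like LIST_DEL_INIT: the node points to itself, so deleting it
   * again or iterating from it is harmless */
  static void del_init(struct node *n)
  {
      n->prev->next = n->next;
      n->next->prev = n->prev;
      n->next = n->prev = n;
  }

  /* like LIST_DELETE built with -DDEBUG_LIST: the pointers are replaced
   * with invalid values so any later misuse crashes immediately */
  static void del_poison(struct node *n)
  {
      n->prev->next = n->next;
      n->next->prev = n->prev;
      n->next = (struct node *)0x100dead1;
      n->prev = (struct node *)0x100dead2;
  }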
This algorithm does nothing except initializing the congestion control window
to a fixed value. Very smart!
Modify the QUIC congestion control configuration parser to support this new
algorithm. The congestion control algorithm must be set as follows:
quic-cc-algo nocc-<cc window size(KB)>
For instance if "nocc-15" is provided as quic-cc-algo keyword value, this
will set a fixed window of 15KB.
Minor build update to keep supporting both the v2 and v3 APIs from
the 3.1.7 release, which supports a cache but would require a change
in the HAProxy build that is not necessary at the moment.
In the second half of the year and for the next major HAProxy release
branch, v2 could be dropped altogether; thus the next major HAProxy
release, 2.9, will contain more changes towards v3 support and a
reminder of the v2 EOL.
To be backported.
PCRE relies on pcre-config binary tool to provide includes/libs paths.
This may generate standard entries such as '/usr/lib' which will
override more specific ones if present before them on the linking step.
This situation was encountered when building with both QuicTLS and PCRE.
This generates a linking error as the default SSL libraries were used
for linking even with correct SSL flags pointing to QuicTLS dirs.
To fix this issue, USE_PCRE and its affiliated options have been moved
to the end of the 'use_opts' variable. Indeed, related CFLAGS/LDFLAGS are
concatenated in their order of appearance through the macro
collect_opts_flags (see include/make/options.mk). PCRE in the last
position ensures it won't override specific entries declared before.
Introducing http_ext class for http extension related work that
doesn't fit into existing http classes.
HTTP extension "forwarded", introduced with 7239 RFC is now supported
by haproxy.
The option supports various modes from simple to complex usages involving
custom sample expressions.
Examples :
# Those servers want the ip address and protocol of the client request
# Resulting header would look like this:
# forwarded: proto=http;for=127.0.0.1
backend www_default
mode http
option forwarded
#equivalent to: option forwarded proto for
# Those servers want the requested host and hashed client ip address
# as well as client source port (you should use seed for xxh32 if ensuring
# ip privacy is a concern)
# Resulting header would look like this:
# forwarded: host="haproxy.org";for="_000000007F2F367E:60138"
backend www_host
mode http
option forwarded host for-expr src,xxh32,hex for_port
# Those servers want custom data in host, for and by parameters
# Resulting header would look like this:
# forwarded: host="host.com";by=_haproxy;for="[::1]:10"
backend www_custom
mode http
option forwarded host-expr str(host.com) by-expr str(_haproxy) for for_port-expr int(10)
# Those servers want random 'for' obfuscated identifiers for request
# tracing purposes while protecting sensitive IP information
# Resulting header would look like this:
# forwarded: for=_000000002B1F4D63
backend www_for_hide
mode http
option forwarded for-expr rand,hex
By default (no argument provided), the forwarded option will try to mimic
common x-forwarded-for setups (source client ip address + source protocol).
The option is not available for frontends.
"no option forwarded" is supported.
More info about 7239 RFC here: https://www.rfc-editor.org/rfc/rfc7239.html
More info about the feature in doc/configuration.txt
This should address feature request GH #575
Depends on:
- "MINOR: http_htx: add http_append_header() to append value to header"
- "MINOR: sample: add ARGC_OPT"
- "MINOR: proxy: introduce http only options"
Thanks to the generic naming of the build options, it's now relatively
easy to enumerate all _CFLAGS and _LDFLAGS for defined USE_* options.
That was added to the first line of 'make opts', but is only listed for
enabled options, non-empty variables or cmd-line defined variables.
By creating USE_SSL and enabling it when USE_OPENSSL is set, we can
get rid of the special case that was made with it regarding cflags
collect and when resetting options. The option doesn't need to be
manually set, though in the future it might prove useful if other
non-openssl APIs are supported.
It's getting complicated to configure includes and lib dirs for
OpenSSL API variants such as WolfSSL, because some settings are
common and others are specific but carry a prefix that doesn't
match the USE_* rule scheme.
This patch simplifies everything by considering that all SSL libs
will use SSL_INC, SSL_LIB, SSL_CFLAGS and SSL_LDFLAGS. That's much
more convenient. This works thanks to the settings collector which
explicitly checks the SSL_* settings. When USE_OPENSSL_WOLFSSL is
set, then USE_OPENSSL is implied, so that there's no need to
duplicate maintenance effort.
In order to simplify maintenance and long-term evolutions, now the
feature remains enabled by setting USE_51DEGREES=1 and the version
is set in 51DEGREES_VER (only 3 or 4 are supported). The default
version remains 3. All 51DEGREES flags are shared between both
versions and only use the "51DEGREES_" prefix.
The related CFLAGS and LDFLAGS can now be overridden using
51DEGREES_CFLAGS and 51DEGREES_LDFLAGS, both of which are automatically
collected into the respective OPTIONS_*. The USE_51DEGREES_V4 option is
now removed, and the doc was updated.
The CFLAGS and LDFLAGS appended by USE_PCRE/USE_PCRE2 can now be
overridden using PCRE_CFLAGS/PCRE2_CFLAGS and PCRE_LDFLAGS/PCRE2_LDFLAGS.
It's worth noting that PCRE2_LDFLAGS already existed and was preset from
the pkgconfig output then complemented with -lpcre2-posix, and only then
the -L and optional -Wl,-Bstatic were appended when adding them to the
resulting global LDFLAGS. A search on the net did not reveal any use of
PCRE2_LDFLAGS in any public build scripts, and for consistency's sake it's
important to make sure that we can now finally override the -L settings
like we're able to do with every other build option. Thus the meaning of
this variable changed to include all the related ldflags (-L and -Wl).
These flags are now automatically collected into OPTIONS_*.
The CFLAGS and LDFLAGS appended by USE_LUA can now be overridden using
LUA_CFLAGS and LUA_LDFLAGS. Note that if these flags are forced, they
have to contain the optional -DHLUA_PREPEND_PATH= since this is added
to CFLAGS.
The CFLAGS appended by USE_ENGINE can now be overridden using
ENGINE_CFLAGS. These would have been better located inside the
OPENSSL stuff but it's a bit too late now.
There are multiple options for 51DEGREES, v3/v4, threading or not,
pattern/trie for v3, vhash for v4, use of libatomic, etc. While the
current rules deal with all of that correctly, it's too difficult to
focus on one version because the two are interleaved for every single
option. Let's just split them into two independent blocks. This removes
some if/endif, and makes it much easier to read.
Some conditional blocks have become out of control over time and are
totally unreadable. It took 15 minutes to figure out which "endif" matched
which "if" in the PCRE one for example, and DA and 51D use multiple
levels as well that are not easy to sort out.
Let's reindent the whole thing. Most places that were already indented
used 2 spaces per level, so here we're keeping that principle. It was
just not done on the two last ones that are used to define some rules
because we don't want spaces before rule names. A few had the opening
condition indicated on the endif line.
It would be desirable to preserve this more maintainable layout over
time.
The PCRE/PCRE2 CFLAGS forcefully add -DUSE_PCRE or -DUSE_PCRE2 because
we want USE_STATIC_PCRE or USE_PCRE_JIT to implicitly enable them.
However, doing it this way is incorrect because the option is not visible
in BUILD_FEATURES, and for example, some regtests depending on such
features (such as map_redirect.vtc) would be skipped if only the static
or jit versions are enabled.
The correct way to do this is to always set the USE_PCRE feature for such
variants instead of adding the define.
This could almost be backported but would require backporting other
makefile patches and likely only has effects on the reg-tests at the
moment, so it's probably not worth the hassle.
Lua and 51d make use of -lm, which would be better served by having its
own option than being passed in the LDFLAGS. It also simplifies linking
against a static version of libm. The option uses its own LDFLAGS which
are automatically collected into OPTIONS_LDFLAGS.
Two places, 51Dv4 and AIX7.2, used to forcefully add -latomic to the
ldflags (and via different variables). This must not be done because
it depends on compiler, arch, etc. USE_LIBATOMIC=implicit is much
better: it allows the user to forcefully disable it if undesired.
The LIBATOMIC_LDFLAGS are set to -latomic and automatically added
to OPTIONS_LDFLAGS.
It will make this dependency appear in haproxy -vv but that's not
an issue and it may even sometimes help when troubleshooting.
The HLUA_PREPEND_PATH and HLUA_PREPEND_CPATH settings were only applied
when LUA_LIB_NAME was empty, otherwise they were silently ignored. Let's
take them out of that conditional block as it makes no sense to enforce
such a restriction (the main reason in fact is that this whole block is
unreadable).
Also take this opportunity to unfold the last two nested tests of
LUA_LIB_NAME and put comments around certain blocks to know what "endif"
matches what "if".
While LUA_INC is sometimes set in the makefile (only when LUA_LIB_NAME
is not set), LUA_LIB is never pre-initialized and faces the risk of
being accidentally inherited from the environment. Let's make sure both
are properly reset first when not explicitly set. For this we always
set LUA_INC based on the autodetection if it's not set, and always
pre-initialize LUA_LIB to empty. This also helps make that block
slightly less difficult to understand.
There used to be special cases where USE_DL was only for the SSL library,
then for Lua, then was used globally, but each of them kept their own copy
of -ldl. When building on a system supporting libdl, with SSL and Lua
enabled, no less than 3 -ldl are found on the linker's command line.
What matters is only that it's close to the end, so let's remove the old
specific ones and move the global one to the end. The option now uses its
own DL_LDFLAGS that is automatically collected into OPTIONS_LDFLAGS.
I got a build error when adding USE_OPENSSL_WOLFSSL to my make command
line because SSL_INC was still set and caused some conflicting headers
to be included first. There's already an exclusion test for the wolfssl
variant used for SSL_LIB, make it also cover SSL_INC to avoid this.
This may be backported to 2.7 to ease testing of wolfssl.
The default include paths for wolfssl didn't match the explicit pattern
one. This was causing some confusion about what to look for, complicating
the rules and causing /usr/local/include to be automatically included if a
path was not set.
Let's just proceed as we usually do, i.e. pass -I only when a path is
specified, so that it works similarly to openssl. Let's also simplify
the LDFLAG rule at the same time.
This may be backported to 2.7 to ease testing of wolfssl.
It happens that a few "if USE_foo" were placed too low in the makefile,
and would mostly work by luck thanks to not using variables that were
already referenced before. The opentracing include is even trickier
because it extends OPTIONS_CFLAGS that was last read a few lines before
being included, but it only works because COPTS is defined as a macro and
not a variable, so it will be evaluated later. At least now it doesn't
touch OPTIONS_* anymore and since it's cleanly arranged, it will work by
default via the flags collector.
Let's just move these late USE_* handlers higher up and place a visible
delimiter after them reminding not to add any after.
Now OPTIONS_CFLAGS and OPTIONS_LDFLAGS don't need to be set anymore
for options USE_xxx that set xxx_CFLAGS or xxx_LDFLAGS. These ones
will be automatically connected.
The only entry for now that was ready for this was PCRE2, so it was
adjusted so as not to append to OPTIONS_LDFLAGS anymore. More will
come later.
A lot of _SRC, _INC, _LIB etc variables are set and expected to be
initialized to an empty string by default. However, an in-depth
review of all of them showed that WOLFSSL_{INC,LIB}, SSL_{INC,LIB},
LUA_{INC,LIB}, and maybe others were not always initialized and could
sometimes leak from the environment and as such cause strange build
issues when running from cascaded scripts that had exported them.
The approach taken here consists in iterating over all USE_* options
and unsetting any _SRC, _INC, _LIB, _CFLAGS and _LDFLAGS that follows
the same name. For the few variable names that don't exactly
match the build option (SSL & WOLFSSL), these ones are specifically
added to the list. The few that were explicitly cleared in their own
sections were just removed since not needed anymore. Note that an
"undefine" command appeared in GNU make 3.82 but since we support
older ones we can only initialize the variables to an empty string
here. It's not a problem in practice.
We're now certain that these variables are empty wherever they are
used, and that it is possible to just append to them, or use them
as-is.
Some macros and functions are barely understandable and are only used
to iterate over known options from the use_opts list. Better assign
them a name and move them into a dedicated file to clean the makefile
a little bit. Now at least "use_opts" only appears once, where it is
defined. This also made it possible to completely remove the BUILD_FEATURES
macro that caused some confusion until the previous commit.
The BUILD_FEATURES string was created too early to inherit implicit
additions. This could make the features list report that some features
were disabled while they had later been enabled. Better make it a macro
that is interpreted where needed based on the current state of each
option.
Commit b81483cf2 ("MEDIUM: da: update doc and build for new scheduler
mode service.") added a new directory to the Device Atlas dummy lib,
but this one is not cleaned during "make clean", causing build failures
sometimes when switching between compiler versions during development.
This should be backported to 2.6.
Adding base code to provide subscribe/publish API for internal
events processing.
event_hdl provides two complementary APIs, both are implemented
in src/event_hdl.c and include/haproxy/event_hdl{-t.h,.h}:
One API targeting developers that want to register event handlers
that will be notified on specific events.
(SUBSCRIBE)
One API targeting developers that want to notify registered handlers
about an event.
(PUBLISH)
This feature is being considered to address the following scenarios:
- mailers code refactoring (getting rid of deprecated
tcp-check ruleset implementation)
- server events from lua code (registering user defined
lua function that is executed with relevant data when a
server is dynamically added/removed or on server state change)
- providing a stable and easy to use API for upcoming
developments that rely on specific events to perform actions.
(e.g.: resource cleanup when a server is deleted from haproxy)
At this time though, we don't have many use cases in mind in addition to
server events handling, but the API is aimed at being multipurpose
so that new event families, with their own particularities, can be
easily implemented afterwards, hopefully without requiring breaking
changes to the API.
Moreover, you should know that the API was not designed to cope well
with high rate event publishing.
Mostly because publishing means iterating over an unsorted subscriber list.
So it won't scale well as the subscriber list grows, but this is intentional
in order to keep the code simple and versatile.
Instead, it is assumed that events implemented using this API
should be periodic events, and that events related to critical
io/networking processing should be handled using
dedicated facilities anyway.
(After all, this is meant to be a general purpose event API)
Apart from being easily extensible, one of the main goals of this API is
to make subscriber code as simple and safe as possible.
This is done by offering multiple event handling modes:
- SYNC mode:
publishing code directly
leverages handler code (callback function)
and handler code has a direct access to "live" event data
(pointers mostly, along with lock hints/context
so that accessing data pointers can be done properly)
- normal ASYNC mode:
handler is executed in a backward compatible way with sync mode,
so that it is easy to switch from and to SYNC/ASYNC mode.
Only here the handler has access to "offline" event data, and
not "live" data (ptrs) so that data consistency is guaranteed.
By offline, you should understand "snapshot" of relevant data
at the time of the event, so that the handler can consume it
later (even if the associated resource is not valid anymore)
- advanced ASYNC mode
same as normal ASYNC mode, but here handler is not a function
that is executed with event data passed as argument: handler is a
user defined tasklet that is notified when event occurs.
The tasklet may consume pending events and associated data
through its own message queue.
ASYNC mode should be considered first if you don't rely on live event
data and you want to make sure that your code has the lowest impact
possible on publisher code (i.e.: you don't want to break stuff).
Internal API documentation will follow; it will give more details about
the notions roughly outlined here.
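As a rough illustration of the SYNC flow only (the names and types below are
invented for this sketch and do not match the real event_hdl API):
  #include <stdio.h>

  #define MAX_SUBS 8

  struct event { int type; const void *data; };
  typedef void (*hdl_fn)(const struct event *e);

  static hdl_fn subs[MAX_SUBS];
  static int nb_subs;

  /* SUBSCRIBE side: register a handler for later notification */
  static int subscribe(hdl_fn fn)
  {
      if (nb_subs >= MAX_SUBS)
          return -1;
      subs[nb_subs++] = fn;
      return 0;
  }

  /* PUBLISH side: walk the unsorted subscriber list and call each
   * handler directly (SYNC mode: handlers see "live" event data) */
  static void publish(const struct event *e)
  {
      for (int i = 0; i < nb_subs; i++)
          subs[i](e);
  }

  static void on_srv_event(const struct event *e)
  {
      printf("got server event %d\n", e->type);
  }

  int main(void)
  {
      struct event e = { .type = 1, .data = NULL };
      subscribe(on_srv_event);
      publish(&e);
      return 0;
  }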
This patch also adds a set of new global options:
- 51degrees-use-performance-graph { on | off }
- 51degrees-use-predictive-graph { on | off }
- 51degrees-drift <number>
- 51degrees-difference <number>
- 51degrees-allow-unmatched { on | off }
To build using the latest 51Degrees V4 engine with Hash algorithm, set
USE_51DEGREES_V4=1.
Other supported build options are 51DEGREES_INC, 51DEGREES_LIB and
51DEGREES_SRC which needs to be set to the directory that contains
headers and C files. For example:
make TARGET=<target> USE_51DEGREES_V4=1 51DEGREES_SRC='51D_REPO_PATH'/src
This adds a USE_OPENSSL_WOLFSSL option; wolfSSL must be used with the
OpenSSL compatibility layer. This must be used with USE_OPENSSL=1.
WolfSSL build options:
./configure --prefix=/opt/wolfssl --enable-haproxy
HAProxy build options:
USE_OPENSSL=1 USE_OPENSSL_WOLFSSL=1 WOLFSSL_INC=/opt/wolfssl/include/ WOLFSSL_LIB=/opt/wolfssl/lib/ ADDLIB='-Wl,-rpath=/opt/wolfssl/lib'
This requires at least commit 54466b6 ("Merge pull request #5810 from
Uriah-wolfSSL/haproxy-integration") from WolfSSL (2022-11-23).
This is still to be improved, reg-tests are not supported yet, and more
tests are to be done.
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
This time the current ordering of common objects remained mostly
unchanged, except for flt_bwlim that was added. However, the SSL
and QUIC build order still had not been handled and were extremely
imbalanced, so they were adjusted. It's even possible to start
building QUIC before openssl to save a little bit more, but it's more
likely that a few large quic files will get split again over time.
There's quite a large barely readable functions block in the makefile
dedicated to compiler option support. It provides no value here and
makes it harder to find user-configurable stuff, so let's move it to
include/make/compiler.mk to keep the makefile a bit cleaner. It's better
to keep the options themselves in the makefile however.
It's better to see "make" entering a subdir than seeing nothing, so
let's use a command name for make. Since make 3.81, "+" needs to be
prepended in front of the command to pass the job server to the subdir.
The $(Q), $(V), $(cmd_xx) handling needs to be reused in sub-project
makefiles and it's a pain to maintain inside the main makefile. Let's
just move that into a new subdir include/make/ with a dedicated file
"verbose.mk". It slightly cleans up the makefile in addition.
The "poll" and "tcploop" sub-projects have their own makefiles. But
since the cmd_* commands were migrated from "echo" to $(info) with
make 3.81, the command is confusingly displayed in the top-level
makefile before entering the directory, even making one think that
the build occurred.
Let's instead propagate the verbosity level through the sub-projects
and let them adapt their own cmd_CC. For now this means a little bit
of duplication for poll and tcploop.
Since these ones come with their own makefiles, the top-level makefile
cannot decide when they have to be rebuilt, it should always defer the
decision to the component's makefile, so we must mark them as phony.
Because of these, they were not updated after a change without calling
a "clean" first.
When compiled with USE_SHM_OPEN=1 the startup-logs are now able to use
a shm which is used to keep the logs when switching to mworker wait
mode. This allows keeping the failed reload logs.
When allocating the startup-logs at first start of the process, haproxy
will do a shm_open() with a unique path using the PID of the process; the
file is unlinked immediately so that no unwanted files are left behind. The
fd resulting from this shm is stored in the HAPROXY_STARTUPLOGS_FD
environment variable so it can be mmapped again when switching to wait
mode.
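The pattern is roughly the following (a simplified standalone sketch, not
haproxy's code; the size, names and environment variable are placeholders):
  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <sys/mman.h>
  #include <unistd.h>

  int main(void)
  {
      char name[64], fdstr[16];
      void *area;
      int fd;

      /* unique shm path derived from the PID */
      snprintf(name, sizeof(name), "/demo-startup-logs-%d", (int)getpid());
      fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);  /* may need -lrt */
      if (fd < 0)
          return perror("shm_open"), 1;
      shm_unlink(name);                    /* no stale file is left behind */

      if (ftruncate(fd, 65536) < 0)
          return perror("ftruncate"), 1;

      area = mmap(NULL, 65536, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      if (area == MAP_FAILED)
          return perror("mmap"), 1;

      /* publish the fd so a re-executed process can mmap() it again */
      snprintf(fdstr, sizeof(fdstr), "%d", fd);
      setenv("DEMO_STARTUPLOGS_FD", fdstr, 1);

      strcpy(area, "startup logs would be stored here");
      return 0;
  }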
When forking children, the process copies the mmapped area to a mallocated
ring so we never share the same memory section between the master and
the workers. When switching to wait mode, the shm is not used anymore as
it is also copied to a mallocated structure.
This allows using the "show startup-logs" command over the master CLI,
to get the logs of the latest startup or reload. This way the logs of
the latest failed reload are also kept.
This is only activated on the linux-glibc target for now.
xprt_quic module was too large and did not reflect the true architecture
by contrast to the other protocols in haproxy.
Extract code related to XPRT layer and keep it under xprt_quic module.
This code should only contain a simple API to communicate between QUIC
lower layer and connection/MUX.
The vast majority of the code has been moved into a new module named
quic_conn. This module is responsible for the implementation of the QUIC
lower layer. Conceptually, it overlaps with the TCP kernel implementation
when comparing QUIC and HTTP1/2 stacks of haproxy.
This should be backported up to 2.6.
Extract function dealing with HTX outside of MUX QUIC. For the moment,
only rcv_buf stream operation is concerned.
The main objective is to be able to support both TCP and HTTP proxy mode
with a common base and add specialized modules on top of it.
This should be backported up to 2.6.
QUIC MUX implements several APIs to interface with stream, quic-conn and
app-ops layers. It is planned to better separate these roles, possibly
by using several files.
The first step is to extract QUIC MUX traces into a dedicated source
file. This will allow reusing traces in multiple files.
The main objective is to be able to support both TCP and HTTP proxy mode
with a common base and add specialized modules on top of it.
This should be backported up to 2.6.
With the ability to back a memory ring into an mmapped file, it makes
sense to be able to dump these files. That's what this utility does.
The entire ring is dumped to stdout. It's well suited to large dumps,
it converts roughly 6 GB of logs per second.
The utility is really meant for developers at the moment. It might
evolve into a more general tool but at the moment it's still possible
that it might need to be run under gdb to process certain crash dumps.
Also at the moment it must not be used on a ring being actively written
to or it will dump garbage.
The code is made so that we can envision later to attach to a live
ring and dump live contents, but this requires that the utility is
built with the exact same options (threads etc), and that the file
is opened read-write. For now these parts have been commented out,
waiting for a reasonably balanced and non-intrusive solution to be
found (e.g. signals must be intercepted so that the tool cannot
leave the ring with a watcher present).
If it is detected that the memory layout of the ring struct differs,
a warning is emitted. At the end, if an error occurs, a warning is
printed as well (this does happen when the process is not cleanly
stopped, but it indicates the end was reached).
Since version 1.1.0, OpenSSL's libcrypto ignores the provided locking
mechanism and uses pthread's rwlocks instead. The problem is that for
some code paths (e.g. async engines) this results in a huge amount of
syscalls on systems facing a bit of contention, to the point where more
than 80% of the CPU can be spent in the system dealing with spinlocks
just for futex_wake().
This patch provides an alternative by reimplementing the relevant pthread
rwlocks on top of the low-overhead progressive rw locks. This
way there will be no more syscalls in case of contention, and CPU will
be burnt in userland. Doing this saves massive amounts of CPU, where
the locks only take 12-15% vs 80% before, which allows SSL to work much
faster on large thread counts (e.g. 24 or more).
The tryrdlock and trywrlock variants have been implemented using a CAS
since their goal is only to succeed on no contention and never to wait.
The pthread_rwlock API is complete except that the timed versions of
the rdlock and wrlock do not wait and simply fall back to trylock
versions.
Since the gains have only been observed with async engines for now,
this option remains disabled by default. It can be enabled at build
time using USE_PTHREAD_EMULATION=1.
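For the try* variants specifically, the idea can be sketched with plain C11
atomics (this only illustrates "single CAS, succeed when uncontended, never
wait"; the real emulation is built on the plock primitives and differs):
  #include <stdatomic.h>

  /* one atomic word: top bit = writer held, low bits = reader count */
  #define WR_BIT 0x80000000u

  static atomic_uint lockword;

  static int emu_trywrlock(void)
  {
      unsigned int expected = 0;        /* only a free lock is accepted */
      return atomic_compare_exchange_strong(&lockword, &expected, WR_BIT) ? 0 : -1;
  }

  static int emu_tryrdlock(void)
  {
      unsigned int cur = atomic_load(&lockword);
      if (cur & WR_BIT)                 /* a writer is present: fail, never wait */
          return -1;
      return atomic_compare_exchange_strong(&lockword, &cur, cur + 1) ? 0 : -1;
  }

  static void emu_rdunlock(void) { atomic_fetch_sub(&lockword, 1); }
  static void emu_wrunlock(void) { atomic_fetch_and(&lockword, ~WR_BIT); }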
Cubic is the congestion control algorithm used by default by the Linux kernel
since version 2.6.15. This algorithm is supposed to achieve good scalability
and fairness between flows using the same network path, so it should also be
used by QUIC by default. This patch implements this algorithm and selects it
as the default congestion control algorithm.
Must be backported to 2.6.
Add a new INSTALL variable to allow overriding the flags passed to
install(1). install(1) on OpenBSD/NetBSD/Solaris/AIX does not support
the -v flag. With the new INSTALL variable and its handling, only use the
-v flag with the Linux targets.
This patch adds a filter to limit bandwidth at the stream level. Several
filters can be defined. A filter may limit incoming data (upload) or
outgoing data (download). The limit can be defined per-stream or shared via
a stick-table. For a given stream, the bandwidth limitation filters can be
enabled using the "set-bandwidth-limit" action.
A bandwidth limitation filter can be used indifferently for HTTP or TCP
streams. For HTTP streams, only the payload transfer is limited. The filter is
pretty simple for now. But it was designed to be extensible. The current
design tries, as far as possible, to never exceed the limit. There is no
burst.
Implement a standalone binary to be able to easily decode a hex-string QPACK
stream. The binary must be compiled via the Makefile. Hex-strings are
specified on stdin.
As usual, let's sort objects by inverse build time at -O2. It will
still vary based on the options but keeps them optimally sorted for
parallel builds.
Add a new ->inc_err_cnt callback to the qcc_app_ops struct which can
be called from the xprt to increment the application level error code counters.
It takes the application context as first parameter to be generic and support
new QUIC applications to come.
Add h3_stats.c module with counters for all the frame types and error codes.
Make the transport parameters standalone as much as possible, as this code
consists only in encoding/decoding data into/from buffers.
Reduce the size of xprt_quic.h. Unfortunately, I think we will
have to continue to include <xprt_quic-t.h> to use the trace API
in this module.
There's no more reason for keeping the code and definitions in conn_stream,
let's move all that to stconn. The alphabetical ordering of include files
was adjusted.
Define the new type ncbuf. It can be used as a buffer with
non-contiguous data and wrapping support.
To reduce as much as possible the memory footprint, size of data and
gaps are stored in the gaps themselves. This puts some limitations on the
buffer usage. A reserved space is present just before the head to store
the size of the first data block. Also, add and delete operations will
be constrained to ensure minimal gap sizes are preserved.
The sizes stored in the gaps are represented by a custom type named
ncb_sz_t. This type is a typedef so it can easily be changed: this has a
direct impact on the maximum buffer size (MAX(ncb_sz_t) - sizeof(ncb_sz_t))
and the minimal gap sizes (sizeof(ncb_sz_t) * 2).
Currently, it is set to uint32_t.
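For example, with the current uint32_t choice, the two limits above can be
checked with a trivial program (illustrative only, not taken from ncbuf.h):
  #include <stdint.h>
  #include <stdio.h>

  typedef uint32_t ncb_sz_t;

  int main(void)
  {
      /* MAX(ncb_sz_t) - sizeof(ncb_sz_t): 4294967291 bytes */
      printf("max buffer size: %llu\n",
             (unsigned long long)UINT32_MAX - sizeof(ncb_sz_t));
      /* sizeof(ncb_sz_t) * 2: 8 bytes */
      printf("minimal gap size: %zu\n", sizeof(ncb_sz_t) * 2);
      return 0;
  }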
Some error reports are misleading on some recent versions of gcc because
it goes on to build for a very long time after it meets an error. Not
only this makes it hard to scroll back to the beginning of the error,
but it also hides the cause of the error when it's prominently printed
in a "#error" statement. This typically happens when building with QUIC
and without OPENSSL where there can be 4 pages of unknown types and such
errors after the "Must define USE_OPENSSL" suggestion.
The flag -Wfatal-errors serves exactly this purpose, to stop after the
first error, and it's supported on all the compilers we support, so let's
enable this now.
Regroup all type definitions and functions related to qc_stream_desc in
the source file src/quic_stream.c.
qc_stream_desc complexity will be increased with the development of Tx
multi-buffers. Having a dedicated module is useful to avoid mixing it with
pure transport/quic-conn code.
OpenSSL 3.0 emits tons of deprecation warnings for the engine API, and
it becomes a real problem because these hide other real warnings and
will prevent distros from building with -Werror. Fortunately there's a
macro to shut this one, OPENSSL_SUPPRESS_DEPRECATED, that is sufficient
to get things back to normal, so let's define it when USE_ENGINE is set.
This way we still get a chance to see other deprecation warnings when
engines are not used.
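In source form the equivalent is simply to define the macro before any
OpenSSL header (the build does it via the command line when USE_ENGINE is
set, so this snippet is only illustrative):
  /* must come before any OpenSSL include to take effect */
  #define OPENSSL_SUPPRESS_DEPRECATED
  #include <openssl/engine.h>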
Previous patch forgot to add USE_ENGINE to the list of options to be
transferred to CFLAGS, so USE_ENGINE had no effect and engines would
remain disabled.
The OpenSSL engine API is deprecated starting with OpenSSL 3.0.
In order to have a clean build this feature is now disabled by default.
It can be reactivated with USE_ENGINE=1 on the build line.
Remove the definition of DEBUG_HPACK in qpack-dec.c which forces the
QPACK decoding traces on stderr. Also rename it to a dedicated name for
QPACK decoding, DEBUG_QPACK.
Move all inline functions with trace from quic_loss.h to a dedicated
object file. This allows removing the TRACE_SOURCE macro definition
from the include file.
This change is required to be able to define another TRACE_SOURCE inside
mux_quic.c for a dedicated trace module.
kFreeBSD needs to be treated as a distinct target from FreeBSD
since the underlying system libc is the GNU one. Thus, relying
only on __GLIBC__ no longer suffices.
- freebsd-glibc new target, key difference is including crypt.h
and linking to libdl like linux.
- cpu affinity available but the api is still the FreeBSD's.
- enabling auxiliary data access only for Linux.
Patch based on preliminary work done by @bigon.
closes #1555
We used to have DEBUG_STRICT_NOCRASH to disable crashes on BUG_ON().
Now we have other levels (WARN_ON(), CHECK_IF()) so we need something
finer-grained.
This patch introduces DEBUG_STRICT_ACTION which takes an integer value.
0 disables crashes and is the equivalent of DEBUG_STRICT_NOCRASH. 1 is
the default and only enables crashes on BUG_ON(). 2 also enables crashes
on WARN_ON(), and 3 also enables warnings on CHECK_IF(), and is suited
to developers and CI.
The first one will enable all currently deployed BUG_ON() checks. These
ones are safe from a performance perspective and from a reliability
perspective. New ones may be added later with different categories
(hot path, detection of uncertain events, etc).
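Schematically, the gating can be pictured like this (a simplified
illustration of the levels described above, not the real bug.h macros):
  #include <stdio.h>
  #include <stdlib.h>

  #ifndef DEBUG_STRICT_ACTION
  #define DEBUG_STRICT_ACTION 1   /* default: only BUG_ON() may crash */
  #endif

  /* level >= 1: BUG_ON() crashes, otherwise it only warns */
  #define BUG_ON(c)  do { if (c) { if (DEBUG_STRICT_ACTION >= 1) abort(); \
                                   else fprintf(stderr, "BUG_ON hit\n"); } } while (0)

  /* level >= 2: WARN_ON() crashes too, otherwise it only warns */
  #define WARN_ON(c) do { if (c) { if (DEBUG_STRICT_ACTION >= 2) abort(); \
                                   else fprintf(stderr, "WARN_ON hit\n"); } } while (0)

  /* level >= 3: CHECK_IF() emits a warning, otherwise it stays silent */
  #define CHECK_IF(c) do { if ((c) && DEBUG_STRICT_ACTION >= 3) \
                                   fprintf(stderr, "CHECK_IF hit\n"); } while (0)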
DEBUG_MEMORY_POOLS enables the "tag" pool debugging option by default,
so that pools may be better traced in dumps. This one alone results in
almost imperceptible performance difference, and 8 extra bytes per
allocated object.
Both options are safe for production use (they're among those enabled
all the time on haproxy.org) and allow producing much more trustworthy
bug reports, which should save a few round trips with the reporters.