Over time, some of the buffer management functions grew quite a bit,
and were still forced to remain inlined since all defined in buf.h.
Let's create buf.c and move the heaviest ones there. All those moved
here were above 200 bytes.
This is an import of the compact elastic binary trees at commit
a9cd84a ("OPTIM: descent: better prefetch less and for writes when
deleting")
These will be used to replace certain lists (and possibly certain
tree nodes as well). They're as fast (or even faster) than ebtrees
for lookups, as fast for insertion and slower for deletion, and a
node only uses 2 pointers (like a list).
The only changes were cebtree.h where common/tools.h was replaced
with ebtree.h which we already have and already provides the needed
functions and macros, and the addition of a wrapper cebtree-prv.h in
src/ to redirect to import/cebtree-prv.h.
There exist two sorts of token used by QUIC. They are both used to validate
the peer address (path validation). Retry are used for the current
connection the client want to open. This patch implement the other
sort of tokens which after having been received from a connection, may
be provided for the next connection from the same IP address to validate
it (or validate the network path between the client and the server).
The token generation is implemented by quic_generate_token(), and
the token validation by quic_token_chek(). The same method
is used as for Retry tokens to build such tokens to be reused for
future connections. The format is very simple: one byte for the format
identifier to distinguish these new tokens for the Retry token, followed
by a 32bits timestamps. As this part is ciphered with AEAD as cryptographic
algorithm, 16 bytes are needed for the AEAD tag. 16 more random bytes
are added to this token and a salt to derive the AEAD secret used
to cipher the token. In addition to this salt, this is the client IP address
which is used also as AAD to derive the AEAD secret. So, the length of
the token is fixed: 37 bytes.
Implement a new set of rules labelled as quic-initial.
These rules as specific to QUIC. They are scheduled to be executed early
on Initial packet parsing, prior a new QUIC connection instantiation.
Contrary to tcp-request connection, this allows to reject traffic
earlier, most notably by avoiding unnecessary QUIC SSL handshake
processing.
A new module quic_rules is created. Its main function
quic_init_exec_rules() is called on Initial packet parsing in function
quic_rx_pkt_retrieve_conn().
For the moment, only "accept" and "dgram-drop" are valid actions. Both
are final. The latter drops silently the Initial packet instead of
allocating a new QUIC connection.
It is no possible yet to use it. Idles connections and pipelining mode are
not supported for now. But it should be possible to open a SPOP connection,
perform the HELLO handshake, send a NOTIFY frame based on data produced by
the client side and receive the corresponding ACK frame to transfer its
content to the client side.
The related issue is #2502.
The code which gets, sets and checks initial and current fd limits and process
related limits (maxconn, maxsock, ulimit-n, fd-hard-limit) is spread around
different functions in haproxy.c and in fd.c. Let's group it together in
dedicated limits.c and limits.h.
This patch is done in order to prepare the moving of limits-related functions
from different places to the new 'limits' compilation unit. It helps to keep
clean the next patch, which will do only the move without any additional
modifications.
Such detailed split is needed in order to be sure not to break accidentally
limits logic and in order to be able to compile each commit separately in case
of git-bisect.
Move the code which is used to select the final certificate with the
clienthello callback. ssl_sock_client_sni_pool need to be exposed from
outside ssl_sock.c
Some large files have been split since 2.9 (e.g. stats) and build times
have moved and become less smooth, causing a less even parallel build.
As usual, a small reordering cleans all this up. The effect was less
visible than previous years though.
Create a new module stats-proxy. Move stats functions related to proxies
list looping in it. This allows to reduce stats source file dividing its
size by half.
Define a new CLI command "dump stats-file" with its handler
cli_parse_dump_stat_file(). It will loop twice on proxies_list to dump
first frontend and then backend side. It reuses the common function
stats_dump_stat_to_buffer(), using STAT_F_BOUND to restrict on the
correct side.
A new module stats-file.c is added to regroup function specifics to
stats-file. It defines two main functions :
* stats_dump_file_header() to generate the list of column list prefixed
by the line context, either "#fe" or "#be"
* stats_dump_fields_file() to generate each stat lines. Object without
GUID are skipped. Each stat entry is separated by a comma.
For the moment, stats-file does not support statistics modules. As such,
stats_dump_*_line() functions are updated to prevent looping over stats
module on stats-file output.
Extract functions related to HTML stats webpage from stats.c into a new
module named stats-html. This allows to reduce stats.c to roughly half
of its original size.
"make opts" is nice because it shows all options being used, but it
does so in a copy-pastable way that aggregates everything on a single
line, rendering very poorly and making it hard to spot the relevant
variables.
Let's break lines and append a trailing backslash to them instead. This
gives something much more readable which remains copy-pastable. Options
can be inspected and more easily replicated. It was also verified that
copy-pasting the whole block after "make" does continue to work like
before and produces the same output.
This one is often used as a gateway to inject regular CFLAGS, even though
not designed for this. It's now ignored, but any attempt at setting it
reports a warning suggesting to use CFLAGS or ARCH_FLAGS instead.
The VERBOSE_CFLAGS variable is used to report the CFLAGS used in the
output of "haproxy -vv". It currently contains all -Wxxx and -Wno-xxx
because there was no other possibility till now, but these warnings
are highly specific to the compiler version in use, are automatically
detected and do not bring value being presented here. Worse, by
encouraging users to copy-paste them, they can end up with warnings
that are not relevant to their build environment.
In addition, their presence there doesn't give indication of whether
or not they triggered, so they cannot vouch for the output code quality.
Better just not report them there. However, as part of VERBOSE_CFLAGS
they were also used to detect whether or not to rebuild, via build_opts,
so they still have to be explicitly mentioned there if we want to make
sure that changing warning options triggers a rebuild to see their
effect.
Now we can have more useful outputs like this one which indicate precisely
what to use to safely rebuild the executable:
Build options :
TARGET = linux-glibc
CC = gcc
CFLAGS = -O2 -g -fwrapv
OPTIONS = USE_OPENSSL=1 USE_LUA=1 USE_ZLIB= USE_SLZ=1 USE_DEVICEATLAS=1 USE_51DEGREES=1 USE_WURFL=1 USE_QUIC=1 USE_PROMEX=1 USE_PCRE=1
DEBUG = -DDEBUG_DONT_SHARE_POOLS -DDEBUG_MEMORY_POOLS -DDEBUG_EXPR -DDEBUG_STRICT=2 -DDEBUG_DEV -DDEBUG_MEM_STATS -DDEBUG_POOL_TRACING
It's currently not possible to only set some -Wno... without breaking
the -W... and conversely. Let's split both sets apart so that it's now
possible to set -W... alone in WARN_CFLAGS to enable only some warnings,
and pass the -Wno... in SPEC_CFLAGS without losing the enabled ones.
The compiler-specific CFLAGS that are automatically detected (SPEC_CFLAGS)
are currently the ones affected by ERR and FAILFAST. Not only this makes
these options useless as soon as SPEC_CFLAGS is forced, but it also means
that any change to them causes a rebuild, so disabling FAILFAST or
enabling ERR to get more info on a faulty object causes a full rebuild
of all others. Let's just move them to ERROR_CFLAGS that only contains
these two. It's not documented outside of the makefile because it's not
supposed to be touched.
-Wfatal-errors is set by default and is not supported on older compilers.
Since it's part of all the automatically detected flags, it's painful to
remove when needed. Also it's a matter of taste, some developers might
prefer to get a long list of all errors at once, others prefer that the
build stops immediately after the root cause.
The default is now back to no -Wfatal-errors, and when FAILFAST is set to
any non-empty non-zero value, -Wfatal-errors is added:
$ make TARGET=linux-glibc USE_OPENSSL=0 USE_QUIC=1 FAILFAST=0 2>&1 | wc
132 536 6111
$ make TARGET=linux-glibc USE_OPENSSL=0 USE_QUIC=1 FAILFAST=1 2>&1 | wc
8 39 362
It's among the options that change a lot on the developer's side and it's
tempting to change from ERR=1 to ERR=0 on the make command line by reusing
the history, except it doesn't work. Let's explictily permit ERR=0 to
disable -Werror like ERR= does.
We now have a separate CFLAGS variable for sensitive options that affect
code generation and/or standard compliance. This one must not be mixed
with the warnings auto-detection because changing such warnings can
result in losing the options. Now it's easier to affect them separately.
The option was not listed in the series of variables useful to packagers
because they're not supposed to touch it.
ARCH_FLAGS used to be merged into LDFLAGS so that it was not possible to
pass extra options to LDFLAGS without losing ARCH_FLAGS. This commit now
splits them apart and leaves LDFLAGS empty by default. The doc explains
how to use it for rpath and such occasional use cases.
ARCH_FLAGS was always present and is documented as being fed to both
CC and LD during the build. This is meant for options that need to be
consistent between the two stages such as -pg, -flto, -fsanitize=address,
-m64, -g etc. Its doc was lacking a bit of clarity though, and it was
not enumerated in the makefile's variables list.
ARCH however was only documented as affecting ARCH_FLAGS, and was just
never used as the only two really usable and supported ARCH_FLAGS options
were -m32 and -m64. In addition it was even written in the makefile that
it was CPU that was affecting the ARCH_FLAGS. Let's just drop ARCH and
improve the documentation on ARCH_FLAGS. Again, if ARCH is set, a warning
is emitted explaining how to proceed.
ARCH_FLAGS is now preset to -g so that we finally have a correct place
to deal with such debugging options that need to be passed to both
stages. The fedora and musl CI workflows were updated to also use it
instead of sticking to duplicate DEBUG_CFLAGS+LDFLAGS.
It's also worth noting that BUILD_ARCH was being passed to the build
process and never used anywhere in the code, so its removal will not
be noticed.
The CPU variable, when used, is almost always exclusively used with
"generic" to disable any CPU-specific optimizations, or "native" to
enable "-march=native". Other options are not used and are just making
CPU_CFLAGS more confusing.
This commit just drops all pre-configured variants and replaces them
with documentation about examples of supported options. CPU_CFLAGS is
preserved as it appears that it's mostly used as a proxy to inject the
distro's CFLAGS, and it's just empty by default.
The CPU variable is checked, and if set to anything but "generic", it
emits a warning about its deprecation and invites the user to read
INSTALL.
Users who would just set CPU_CFLAGS will be able to continue to do so,
those who were using CPU=native will have to pass CPU_CFLAGS=-march=native
and those who were passing one of the other options will find it in the
doc as well.
Note that this also removes the "CPU=" line from haproxy -vv, that most
users got used to seeing set to "generic" or occasionally "native"
anyway, thus that didn't provide any useful information.
CPU_CFLAGS is meant to set the CPU-specific options (-mcpu, -march etc).
The fact that it also includes the optimization level is annoying because
one cannot be set without replacing the other. Let's move the optimization
level to a new independent OPT_CFLAGS that is added early to the list, so
that other CFLAGS (including CPU_CFLAGS) can continue to override it if
necessary.
These settings were appended to the final build CFLAGS and used to
contain a mix of obsolete settings that can equally be passed in one
of the many other variables such as DEFINE or more recently CFLAGS.
Let's just drop the obsolete comment about it, and check if anything
was forced there, then emit a warning suggesting to move that to other
variables such as DEFINE or CFLAGS, so as to be kind to package
maintainers.
CFLAGS has always been a troublemaker because the variable was preset
based on other options, including dynamically detected ones, so
overriding it would just lose the original contents, forcing users
to resort to various alternatives such as DEFINE, ADDINC or SMALL_OPTS.
Now that the variable's usage was cleared, let's just preset it to
empty (and it MUST absolutely remain like this) and append it at the
end of the compiler's options. This will now allow to change an
optimization level, force a CPU type or disable a warning as users
commonly expect from CFLAGS passed to a makefile, and not to override
*all* the compiler flags as it has progressively become.
CFLAGS currently is a concatenation of 4 other variables, some of which
are dynamically determined. This has long been totally unusable to pass
any extra option. Let's just get rid of it and pass the 4 variables at
the 3 only places CFLAGS was used. This will later allow us to make
CFLAGS something really usable.
This also has the benefit of implicitly restoring the build on AIX5
which needs to disable DEBUG_CFLAGS to solve symbol issues when built
with -g. Indeed, that one got ignored since the targets moved past the
CFLAGS definition which collects DEBUG_CFLAGS.
This option has been set by default for a very long time and also
complicates the manipulation of the DEBUG variable. Let's make it
the official default and permit to unset it by setting it to zero.
The other pool-related DEBUG options were adjusted to also explicitly
check for the zero value for consistency.
We continue to carry it in the makefile, which adds to the difficulty
of passing new options. Let's make DEBUG_STRICT=1 the default so that
one has to explicitly pass DEBUG_STRICT=0 to disable it. This allows us
to remove the option from the default DEBUG variable in the makefile.
William rightfully reported that not supporting =0 to disable a USE_xxx
option is sometimes painful (e.g. a script might do USE_xxx=$(command)).
It's not that difficult to handle actually, we just need to consider the
value 0 as empty at the few places that test for an empty string in
options.mk, and in each "ifneq" test in the main Makefile, so let's do
that. We even take care of preserving the original value in the build
options string so that building with USE_OPENSSL=0 will be reported
as-is in haproxy -vv, and with "-OPENSSL" in the feature list.
William suggested that it would be nice to warn about unknown USE_*
variables to more easily catch misspelled ones. The valid ones are
present in use_opts, so by appending "=%" to each of them, we can
build a series of patterns to exclude from MAKEOVERRIDES and emit
a warning for the ones that stand out.
Example:
$ make TARGET=linux-glibc USE_QUIC_COMPAT_OPENSSL=1
Makefile:338: Warning: ignoring unknown build option: USE_QUIC_COMPAT_OPENSSL=1
CC src/slz.o
Define a new module guid. Its purpose is to be able to attach a global
identifier for various objects such as proxies, servers and listeners.
A new type guid_node is defined. It will be stored in the objects which
can be referenced by such GUID. Several functions are implemented to
properly initialized, insert, remove and lookup GUID in a global tree.
Modification operations should only be conducted under thread isolation.
Since the systemd notify feature is now independant of any library,
lets enable it by default for linux-glibc.
The -Ws mode still need to be used in order to use the sd_nofify()
function. And the function won't do anything if the NOTIFY_SOCKET
environment variable is not defined.
Given the xz drama which allowed liblzma to be linked to openssh, lets remove
libsystemd to get rid of useless dependencies.
The sd_notify API seems to be stable and is now documented. This patch replaces
the sd_notify() and sd_notifyf() function by a reimplementation inspired by the
systemd documentation.
This should not change anything functionnally. The function will be built when
haproxy is built using USE_SYSTEMD=1.
References:
https://github.com/systemd/systemd/issues/32028https://www.freedesktop.org/software/systemd/man/devel/sd_notify.html#Notes
Before:
wla@kikyo:~% ldd /usr/sbin/haproxy
linux-vdso.so.1 (0x00007ffcfaf65000)
libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x000074637fef4000)
libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x000074637fe4f000)
libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x000074637f400000)
liblua5.4.so.0 => /lib/x86_64-linux-gnu/liblua5.4.so.0 (0x000074637fe0d000)
libsystemd.so.0 => /lib/x86_64-linux-gnu/libsystemd.so.0 (0x000074637f92a000)
libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x000074637f365000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x000074637f000000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x000074637f27a000)
libcap.so.2 => /lib/x86_64-linux-gnu/libcap.so.2 (0x000074637fdff000)
libgcrypt.so.20 => /lib/x86_64-linux-gnu/libgcrypt.so.20 (0x000074637eeb8000)
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x000074637fdcd000)
libzstd.so.1 => /lib/x86_64-linux-gnu/libzstd.so.1 (0x000074637ee01000)
liblz4.so.1 => /lib/x86_64-linux-gnu/liblz4.so.1 (0x000074637fda8000)
/lib64/ld-linux-x86-64.so.2 (0x000074637ff5d000)
libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x000074637f904000)
After:
wla@kikyo:~% ldd /usr/sbin/haproxy
linux-vdso.so.1 (0x00007ffd51901000)
libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f758d6c0000)
libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x00007f758d61b000)
libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x00007f758ca00000)
liblua5.4.so.0 => /lib/x86_64-linux-gnu/liblua5.4.so.0 (0x00007f758d5d9000)
libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007f758d365000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f758d5ba000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f758c600000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f758c915000)
/lib64/ld-linux-x86-64.so.2 (0x00007f758d729000)
A backport to all stable versions could be considered at some point.
As previously mentioned in cd352c0db ("MINOR: log/balance: rename
"log-sticky" to "sticky""), let's define a sticky algorithm that may be
used from any protocol. Sticky algorithm sticks on the same server as
long as it remains available.
The documentation was updated accordingly.
A ring is used for the DNS code but slightly differently from the generic
one, which prevents some important changes from being made to the generic
code without breaking DNS. As the use cases differ, it's better to just
split them apart for now and have the DNS code use its own ring that we
rename dns_ring and let the generic code continue to live on its own.
The unused parts such as CLI registration were dropped, resizing and
allocation from a mapped area were dropped. dns_ring_detach_appctx() was
kept despite not being used, so as to stay consistent with the comments
that say it must be called, despite the DNS code explicitly mentioning
that it skips it for now (i.e. this may change in the future).
Hopefully after the generic rings are converted the DNS code can migrate
back to them, though this is really not necessary.
In this patch we add a registration mechanism for modules. To do so, a
module must defined the "promex_module" structure. The dump itself will be
based on 2 contexts. One for all the dump and another one for each metric
time-series. These contexts are used as restart points when the dump is
interrupted.
Modules must also implement 6 callback functions:
* start_metric_dump(): It is an optional callback function. If defined, it
is responsible to initialize the dump context use
as the first restart point.
* stop_metric_dump(): It is an optional callback function. If defined, it
is responsible to deinit the dump context.
* metric_info(): This one is mandatory. It returns the info about the
metric: name, type and flags and descrition.
* start_ts(): This one is mandatory, it initializes the context for a time
series for a given metric. This context is the second
restart point.
* next_ts(): This one is mandatory. It interates on time series for a
given metrics. It is also responsible to handle end of a
time series and deinit the context.
* fill_ts(): It fills info on the time series for a given metric : the
labels and the value.
In addition, a module must set its name and declare the number of metrics is
exposed.
Create a new module dedicated to flow control handling. It will be used
to implement earlier flow control update on snd_buf stream callback.
For the moment, only Tx part is implemented (i.e. limit set by the peer
that haproxy must respect for sending). A type quic_fctl is defined to
count emitted data bytes. Two offsets are used : a real one and a soft
one. The difference is that soft offset can be incremented beyond limit
unless it is already in excess.
Soft offset will be used for HTX to H3 parsing. As size of generated H3
is unknown before parsing, it allows to surpass the limit one time. Real
offset will be used during STREAM frame generation : this time the limit
must not be exceeded to prevent protocol violation.
- Reflecing the changes done in addons/deviceatlas/Makefile.inc.
Enabling the cache feature and its disabling option as well.
- Now the `dadwsch` application is part of the API's package for more
general purposes, we remove it.
- Minor and transparent to user changes into da.c's workflow, also
making more noticeable some notices with appropriate logging levels.
- Adding support for the new `deviceatlas-cache-size` config keyword,
a no-op when the cache support is disabled.
- Adding missing compilation units and relevant api updates to
the dummy library version.
Add quic_retry.c new C file for the QUIC retry feature:
quic_saddr_cpy() moved from quic_tx.c,
quic_generate_retry_token_aad() moved from
quic_generate_retry_token() moved from
parse_retry_token() moved from
quic_retry_token_check() moved from
quic_retry_token_check() moved from
Move quic_cid and quic_connnection_id from quic_conn-t.h to new quic_cid-t.h header.
Move defintions of quic_stateless_reset_token_init(), quic_derive_cid(),
new_quic_cid(), quic_get_cid_tid() and retrieve_qc_conn_from_cid() to quic_cid.c
new C file.
This adds a new option for the Makefile USE_OPENSSL_AWSLC, and
update the documentation with instructions to use HAProxy with
AWS-LC.
Update the type of the OCSP callback retrieved with
SSL_CTX_get_tlsext_status_cb with the actual type for
libcrypto versions greater than 1.0.2. This doesn't affect
OpenSSL which casts the callback to void* in SSL_CTX_ctrl.
For a while there has been the constraint of having to run as root for
transparent proxying, and we're starting to see some cases where QUIC is
not running in socket-per-connection mode due to the missing capability
that would be needed to bind a privileged port. It's not realistic to
ask all QUIC users on port 443 to run as root, so instead let's provide
a basic support for capabilities at least on linux. The ones currently
supported are cap_net_raw, cap_net_admin and cap_net_bind_service. The
mechanism was made OS-specific with a dedicated file because it really
is. It can be easily refined later for other OSes if needed.
A new keyword "setcaps" is added to the global section, to enumerate the
capabilities that must be kept when switching from root to non-root. This
is ignored in other situations though. HAProxy has to be built with
USE_LINUX_CAP=1 for this to be supported, which is enabled by default
for linux-glibc, linux-glibc-legacy and linux-musl.
A good way to test this is to start haproxy with such a config:
global
uid 1000
setcap cap_net_bind_service
frontend test
mode http
timeout client 3s
bind quic4@:443 ssl crt rsa+dh2048.pem allow-0rtt
and run it under "sudo strace -e trace=bind,setuid", then connecting
there from an H3 client. The bind() syscall must succeed despite the
user id having been switched.
A new protocol named "reverse_connect" is created. This will be used to
instantiate connections that are opened by a reverse bind.
For the moment, only a minimal set of callbacks are defined with no real
work. This will be extended along the next patches.
make "range" which was introduced with 06d34d4 ("DEV: makefile: add a
new "range" target to iteratively build all commits") does not work with
POSIX shells (namely: bourne shell), and will fail with this kind of
errors:
|/bin/sh: 6: Syntax error: "(" unexpected (expecting ")")
|make: *** [Makefile:1226: range] Error 2
This is because arrays and arithmetic expressions which are used for the
"range" target are not supported by sh (unlike bash and other "modern"
interpreters).
However the make "all" target already complies with POSIX, so in this
commit we try to make "range" target POSIX compliant to ensure that the
makefile works as expected on systems where make uses /bin/sh as default
intepreter and where /bin/sh points to POSIX shell.