Commit Graph

22011 Commits

Author SHA1 Message Date
Willy Tarreau
6b6a6ad431 CI: update the build options to get rid of unneeded DEBUG options
Now that DEBUG_STRICT and DEBUG_MEMORY_POOLS are the default, we can
drop them from the build options.
2024-04-11 17:25:45 +02:00
Willy Tarreau
772f9a5874 BUILD: pools: make DEBUG_MEMORY_POOLS=1 the default option
This option has been set by default for a very long time and also
complicates the manipulation of the DEBUG variable. Let's make it
the official default and permit to unset it by setting it to zero.
The other pool-related DEBUG options were adjusted to also explicitly
check for the zero value for consistency.
2024-04-11 17:25:45 +02:00
Willy Tarreau
b70981532a BUILD: debug: make DEBUG_STRICT=1 the default
We continue to carry it in the makefile, which adds to the difficulty
of passing new options. Let's make DEBUG_STRICT=1 the default so that
one has to explicitly pass DEBUG_STRICT=0 to disable it. This allows us
to remove the option from the default DEBUG variable in the makefile.
2024-04-11 17:25:45 +02:00
Willy Tarreau
b22b968a48 BUILD: cache: fix non-inline vs inline declaration mismatch to silence a warning
Some compilers report this on the cache:
  src/cache.c:235: warning: 'release_entry_locked' declared inline after being called
  src/cache.c:235: warning: previous declaration of 'release_entry_locked' was here

And indeed, the function is first declared non-inline and later inline.
Let's just set the inline status from the beginning. It's not really
needed to backport this.
2024-04-11 17:25:25 +02:00
Willy Tarreau
e791b243f0 BUG/MINOR: debug: make sure DEBUG_STRICT=0 does work as documented
Setting DEBUG_STRICT=0 only validates the defined(DEBUG_STRICT) test
regarding DEBUG_STRICT_ACTION, which is equivalent to DEBUG_STRICT>=0.
Let's make sure the test checks for >0 so that DEBUG_STRICT=0 properly
disables DEBUG_STRICT.
2024-04-11 16:41:08 +02:00
Willy Tarreau
2a9ccf5b25 BUILD: atomic: fix peers build regression on gcc < 4.7 after recent changes
Recent commit 4c1480f13b ("MINOR: stick-tables: mark the seen stksess
with a flag "seen"") introduced a build regression on older versions of
gcc before 4.7. This is in the old __sync_ API, the HA_ATOMIC_LOAD()
implementation uses an intermediary return value called "ret" that is
of the same name as the variable passed in argument to the macro in the
aforementioned commit. As such, the compiler complains with a cryptic
error:
  src/peers.c: In function 'peer_teach_process_stksess_lookup':
  src/peers.c:1502: error: invalid type argument of '->' (have 'int')

The solution is to avoid referencing the argument in the expression and
using an intermediary variable for the pointer as done elsewhere in the
code. It seems there's no other place affected with this. It probably
does not need to be backported since this code is antique and very rarely
used nowadays.
2024-04-11 16:41:08 +02:00
Amaury Denoyelle
32b9e97f92 BUG/MINOR: guid: fix crash on invalid guid name
Using an invalid GUID for guid_insert() causes a crash. This is easily
reproducible using for example an invalid character with "guid" keyword.
Here is the provided backtrace :

  Thread 1 "haproxy" received signal SIGSEGV, Segmentation fault.
  0x00005555561fda95 in guid_insert (objt=0x520000002080, uid=0x519000002dac "@foo2", errmsg=0x7ffff4c0a7a0)
      at src/guid.c:83
  83              ha_free(&guid->node.key);

This error is present in guid_insert() cleanup parts. GUID node is not
allocated in case of an early error so it's impossible to dereference it
to free guid.node.key. Fix this simply by using an intermediary pointer
key.

This does not need to be backported.
2024-04-11 15:09:53 +02:00
Willy Tarreau
d78c346670 BUILD: makefile: support USE_xxx=0 as well
William rightfully reported that not supporting =0 to disable a USE_xxx
option is sometimes painful (e.g. a script might do USE_xxx=$(command)).
It's not that difficult to handle actually, we just need to consider the
value 0 as empty at the few places that test for an empty string in
options.mk, and in each "ifneq" test in the main Makefile, so let's do
that. We even take care of preserving the original value in the build
options string so that building with USE_OPENSSL=0 will be reported
as-is in haproxy -vv, and with "-OPENSSL" in the feature list.
2024-04-11 11:06:19 +02:00
Willy Tarreau
aa32ab13f0 BUILD: makefile: warn about unknown USE_* variables
William suggested that it would be nice to warn about unknown USE_*
variables to more easily catch misspelled ones. The valid ones are
present in use_opts, so by appending "=%" to each of them, we can
build a series of patterns to exclude from MAKEOVERRIDES and emit
a warning for the ones that stand out.

Example:

  $ make TARGET=linux-glibc  USE_QUIC_COMPAT_OPENSSL=1
  Makefile:338: Warning: ignoring unknown build option: USE_QUIC_COMPAT_OPENSSL=1
    CC      src/slz.o
2024-04-11 11:06:19 +02:00
Christopher Faulet
1fa6eb2eb9 BUG/MINOR: http-ana: Fix TX_L7_RETRY and TX_D_L7_RETRY values
These values are obviously wrong. There is an extra zero at the end for both
defines. By chance, it is harmless. But it is better to fix it.

This patch should be backported as far as 2.6.
2024-04-10 15:50:00 +02:00
Christopher Faulet
3fc38593d5 BUG/MEDIUM: http-ana: Deliver 502 on keep-alive for fressh server connection
In HTTP keep-alive, if we face a connection error to the server while
sending the request, the error should not be reported, and the client-side
connection should simply be closed, so that client knows it can retry.

If the error happens during the connection stage, there is two cases. We
have a connection timeout or an allocation error. In this case, the 503
response must be skipped if it is not the first request on the client-side
connection. Or we have a connection error. In this case, the 503 response
must be skipped if it is a reused server connection. Otherwise, during the
connection stage, the 503-Service-unavailable response is delivered to the
client. The part works properly.

If the error happens after this stage, the 502-Bad-gateway response
delivering should only be based on the server-side connection status. For a
reused server connection, the client-side connection must be closed with no
reponses. However, for a fresh server-side connection, a 502-Bad-gateway
response must be delivered to the client. Unfortunately, This part is
buggy. Only the client-side connection state is considered and the response
is skipped if it is not the first request for the same client connection.

The bug is not so visbile in HTTP/1.1 but in H2 and H3 it is pretty annoying
because for a connection, requests are multiplexed, in parallels. It means
there is no first request. So, because of this bug, for H2 and H3,
502-Bad-gateway responses because of a connection error before receiveing
the response are always skipped.

To fix the issue, in http_wait_for_response() analyser, we must only rely on
SF_SRV_REUSED stream flag to skip the 502 response or not. This flag is set
if the server connection was reused.

The bug is their since a while. SF_SRV_REUSED flag was added in the version
1.5 especially to fix this kind of bug. But only the 503 case was fixed.

This patch should fix the issue #2285. It must be backported to every stable
versions.
2024-04-10 15:42:22 +02:00
Amaury Denoyelle
c6e3d60fc1 OPTIM: quic: do not call qc_prep_pkts() if everything sent
qc_send() is implemented as a loop to repeatedly invoke
qc_prep_pkts()/qc_send_ppkts(). This ensures that all data are emitted
even if bigger that a single Tx buffer instance. This is useful if
congestion window is empty but big enough for application data.

Looping is interrupted if qc_prep_pkts() returns a negative error
code, for example due to no space left in congestion window. It can also
returns 0 if no input data to sent, which also interrupt the loop.

To limit this last case, removed quic_enc_level from send_list each time
everything already send via qc_prep_pkts(). Loop can then be interrupted
as soon as send_list is empty, avoiding an extra superfluous call to
qc_prep_pkts().
2024-04-10 11:18:01 +02:00
Amaury Denoyelle
34b31d85cb OPTIM: quic: do not call qc_send() if nothing to emit
qc_send() was systematically called by quic_conn IO handlers with all
instantiated quic_enc_level. Change this to only register quic_enc_level
for send if needed. Do not call at all qc_send() if no qel registered.

A new function qel_need_sending() is defined to detect if sending is
required. First, it checks if quic_enc_level has prepared frames or
probing is set. It can also returns true if ACK required either on
quic_enc_level itself or because of quic_conn ack timer fired. Finally,
a CONNECTION_CLOSE emission for quic_conn is also a valid case.

This should reduce the number of invocations of qc_send(). This could
improve slightly performance, as well as simplify traces debugging.
2024-04-10 11:17:21 +02:00
Amaury Denoyelle
7fc1ce5bc8 MEDIUM: quic: remove duplicate hdshk/app send functions
A series of previous patches have clean up sending function for
handshake case. Their new exposed API is now flexible enough to convert
app case to use the same functions.

As such, qc_send_hdshk_pkts() is renamed qc_send() and become the single
entry point for QUIC emission. It is used during application packets
emission in quic_conn_app_io_cb(), qc_send_mux(). Also the internal
function qc_prep_hpkts() is renamed qc_prep_pkts().

Remove the new unneeded qc_send_app_pkts() and qc_prep_app_pkts().

Also removed qc_send_app_probing(). It was a simple wrapper over other
application send functions. Now, default qc_send() can be reuse for such
cases with <old_data> argument set to true.

An adjustment was needed when converting qc_send_hdshk_pkts() to the
general qc_send() version. Previously, only a single packets
encoding/emission cycle was performed. This was enough as handshake
packets are always smaller than Tx buffer. However, it may be possible
to emit more application data. As such, a loop is necessary to perform
multiple encoding/emission cycles, as this was already the case in
qc_send_app_pkts().

No functional difference should happen with this commit. However, as
these are critcal functions with a lot of changes, this patch is
labelled as medium.
2024-04-10 11:07:35 +02:00
Amaury Denoyelle
4e4127a66d MINOR: quic: use qc_send_hdshk_pkts() in handshake IO cb
quic_conn_io_cb() manually implements emission by using lower level
functions qc_prep_pkts() and qc_send_ppkts(). Replace this by using the
higher level function qc_send_hdshk_pkts() which notably handle buffer
allocation and purging.

This allows to clean up send API by flagging qc_prep_pkts() and
qc_send_ppkts() as static. They are now used in a single location inside
qc_send_hdshk_pkts().
2024-04-10 11:07:19 +02:00
Amaury Denoyelle
3a8f4761e7 MINOR: quic: improve sending API on retransmit
qc_send_hdshk_pkts() is a wrapper for qc_prep_hpkts() used on
retransmission. It was restricted to use two quic_enc_level pointers as
distinct arguments. Adapt it to directly use the same list of
quic_enc_level which is passed then to qc_prep_hpkts().

Now for retransmission quic_enc_level send list is built directly into
qc_dgrams_retransmit() which calls qc_send_hdshk_pkts().

Along this change, a new utility function qel_register_send() is
defined. It is an helper to build the quic_enc_level send list. It
enfores that each quic_enc_level instance is only registered in a single
list to prevent memory issues. It is both used in qc_dgrams_retransmit()
and quic_conn_io_cb().
2024-04-10 11:06:55 +02:00
Amaury Denoyelle
93f5b4c8ae MINOR: quic: uniformize sending methods for handshake
Emission of packets during handshakes was implemented via an API which
uses two alternative ways to specify the list of frames.

The first one uses a NULL list of quic_enc_level as argument for
qc_prep_hpkts(). This was an implicit method to iterate on all qels
stored in quic_conn instance, with frames already inserted in their
corresponding quic_pktns.

The second method was used for retransmission. It uses a custom local
quic_enc_level list specified by the caller as input to qc_prep_hpkts().
Frames were accessible through <retransmit> list pointers of each
quic_enc_level used in an implicit mechanism.

This commit clarifies the API by using a single common method. Now
quic_enc_level list must always be specified by the caller. As for
frames list, each qels must set its new field <send_frms> pointer to the
list of frames to send. Callers of qc_prep_hpkts() are responsible to
always clear qels send list. This prevent a single instance of
quic_enc_level to be inserted while being attached to another list.

This allows notably to clean up some unnecessary code. First,
<retransmit> list of quic_enc_level is removed as it is replaced by new
<send_frms>. Also, it's now possible to use proper list_for_each_entry()
inside qc_prep_hpkts() to loop over each qels. Internal functions for
quic_enc_level selection is now removed.
2024-04-10 11:06:41 +02:00
Amaury Denoyelle
44eec848e8 MINOR: quic: simplify qc_send_hdshk_pkts() return
Clean up trailer of qc_send_hdshk_pkts() by removing label "leave". Only
"out" label is now used. This operation is safe as LIST_DEL_INIT() is
idempotent. Caller of qc_send_hdshk_pkts() also ensures input frame
lists are freed, so it's better to always reset quic_enc_level
<retrans_frms> member.

Also take the opportunity to reset QUIC_FL_CONN_RETRANS_OLD_DATA only if
already set. This is considered more robust and will also remove
unneeded trace occurences.

No functional change. The main objective of this commit is to clean up
code in preparation of a refactoring on send functions.
2024-04-10 10:14:36 +02:00
Aurelien DARRAGON
9420cfc0db CLEANUP: log: lf_text_len() returns a pointer not an integer
In c83684519 ("MEDIUM: log: add the ability to include samples in logs")
we checked the return value of lf_text_len() as an integer instead of
comparing the pointer with NULL explicitly. Since this may be confusing,
let's test the return value against NULL.

[ada: for backports, the patch needs to be applied manually because of
 c6a713842 ("MINOR: log: simplify last_isspace in sess_build_logline()")]
2024-04-09 17:35:53 +02:00
Aurelien DARRAGON
28548f812f BUG/MINOR: log: invalid snprintf() usage in sess_build_logline()
According to snprintf() man page:

       The  functions snprintf() and vsnprintf() do not write more than
       size bytes (including the terminating null byte ('\0')).  If the
       output was truncated due to this limit, then the return value is
       the number of  characters  (excluding  the  terminating null byte)
       which would have been written to the final string if enough space
       had been available.  Thus, a return value of size or more means
       that the output was truncated.

However, in sess_build_logline(), each time we need to check the return
value of snprintf(), here is how we proceed:

       iret = snprintf(tmplog, max, ...);
       if (iret < 0 || iret > max)
           // error
       // success
       tmplog += iret;

Here is the issue: if snprintf() lacks 1 byte space to write the
terminating NULL byte, it will return max. Which means in this case
that we fail to know that snprintf() truncated the output in reality,
and we still add iret to tmplog pointer. Considering sess_build_logline()
should NOT write more than <maxsize> bytes (including the terminating NULL
byte) as per the function description, in this case the function would
write <maxsize>+1 byte (to write the terminating NULL byte upon return),
which may lead to invalid write if <dst> was meant to hold <maxsize> bytes
at maximum.

Hopefully, this bug wasn't triggered so far because sess_build_logline()
is called with logline as <dst> argument and <global.max_syslog_len> as
<maxsize> argument, logline being initialized with 1 extra byte upon
startup.

But we better fix this to comply with the function description and prevent
any side-effect since some sess_build_logline() helpers may assume that
'tmplog-dst < maxsize' is always true. Also sess_build_logline() users
probably don't expect NULL-byte to be accounted for in the produced
logline length.

This should be backported to all stable versions.

[ada: for backports, the patch needs to be applied manually because of
 c6a713842 ("MINOR: log: simplify last_isspace in sess_build_logline()")]
2024-04-09 17:35:53 +02:00
Aurelien DARRAGON
8226e92eb0 BUG/MINOR: tools/log: invalid encode_{chunk,string} usage
encode_{chunk,string}() is often found to be used this way:

  ret = encode_{chunk,string}(start, stop...)
  if (ret == NULL || *ret != '\0') {
	//error
  }
  //success

Indeed, encode_{chunk,string} will always try to add terminating NULL byte
to the output string, unless no space is available for even 1 byte.
However, it means that for the caller to be able to spot an error, then it
must provide a buffer (here: start) which is already initialized.

But this is wrong: not only this is very tricky to use, but since those
functions don't return NULL on failure, then if the output buffer was not
properly initialized prior to calling the function, the caller will
perform invalid reads when checking for failure this way. Moreover, even
if the buffer is initialized, we cannot reliably tell if the function
actually failed this way because if the buffer was previously initialized
with NULL byte, then the caller might think that the call actually
succeeded (since the function didn't return NULL and didn't update the
buffer).

Also, sess_build_logline() relies lf_encode_{chunk,string}() functions
which are in fact wrappers for encode_{chunk,string}() functions and thus
exhibit the same error handling mechanism. It turns out that
sess_build_logline() makes unsafe use of those functions because it uses
the error-checking logic mentionned above while buffer (tmplog) is not
guaranteed to be initialized when entering the function. This may
ultimately cause malfunctions or invalid reads if the output buffer is
lacking space.

To fix the issue once and for all and prevent similar bugs from being
introduced, we make it so encode_{string, chunk} and escape_string()
(based on encode_string()) now explicitly return NULL on failure
(when the function failed to write at least the ending NULL byte)

lf_encode_{string,chunk}() helpers had to be patched as well due to code
duplication.

This should be backported to all stable versions.

[ada: for 2.4 and 2.6 the patch won't apply as-is, it might be helpful to
 backport ae1e14d65 ("CLEANUP: tools: removing escape_chunk() function")
 first, considering it's not very relevant to maintain a dead function]
2024-04-09 17:35:45 +02:00
Aurelien DARRAGON
b15f6dfae8 BUG/MINOR: log: fix lf_text_len() truncate inconsistency
In c5bff8e550 ("BUG/MINOR: log: improper behavior when escaping log data")
we fixed lf_text_len() behavior with +E (escape) option.

However we introduced an inconsistency if output buffer is too small to
hold the whole output and truncation occurs: indeed without +E option up
to <size> bytes (including NULL byte) will be used whereas with +E option
only <size-1> bytes will be used. Fixing the function and related comment
so that the function behaves the same in regards to truncation whether +E
option is used or not.

This should be backported to all stable versions.
2024-04-09 17:30:13 +02:00
Willy Tarreau
0db8b6034d BUG/MINOR: listener: always assign distinct IDs to shards
When sharded listeners were introdcued in 2.5 with commit 6dfbef4145
("MEDIUM: listener: add the "shards" bind keyword"), a point was
overlooked regarding how IDs are assigned to listeners: they are just
duplicated! This means that if a "option socket-stats" is set and a
shard is configured, or multiple thread groups are enabled, then a stats
dump will produce several lines with exactly the same socket name and ID.

This patch tries to address this by trying to assign consecutive numbers
to these sockets. The usual algo is maintained, but with a preference for
the next number in a shard. This will help users reserve ranges for each
socket, for example by using multiples of 100 or 1000 on each bind line,
leaving enough room for all shards to be assigned.

The mechanism however is quite tricky, because the configured listener
currently ends up being the last one of the shard. This helps insert them
before the current position without having to revisit them. But here it
causes a difficulty which is that we'd like to restart from the current
ID and assign new ones on top of it. What is done is that the number is
passed between shards and the current one is cleared (and removed from
the tree) so that we instead insert the new one. It's tricky because of
the situation which depends whether it's the listener that was already
assigned on the bind line or not. But overall, always removing the entry,
always adding the new one when the ID is not zero, and passing them from
the reference to the next one does the trick.

This may be backported to all versions till 2.6.
2024-04-09 08:57:02 +02:00
Christopher Faulet
70251a2aeb BUG/MINOR: cli: Don't warn about a too big command for incomplete commands
When a command is too big to fit in a buffer, a error is returned before
closing. However, the error is also returned if the command is small enough
but incomplete. It happens on abort. In this case, the error must not be
reported. The regression was introduced when a dedicated sn_buf callbac
function was added.

To fix the issue, both cases are now handled separately.

No backport needed.
2024-04-08 11:49:13 +02:00
Willy Tarreau
0046922aed [RELEASE] Released version 3.0-dev7
Released version 3.0-dev7 with the following main changes :
    - BUG/MINOR: ssl: Wrong ocsp-update "incompatibility" error message
    - BUG/MINOR: ssl: Detect more 'ocsp-update' incompatibilities
    - MEDIUM: ssl: Add 'tune.ssl.ocsp-update.mode' global option
    - REGTESTS: ssl: Add OCSP update compatibility tests
    - REGTESTS: ssl: Add functional test for global ocsp-update option
    - BUG/MINOR: server: reject enabled for dynamic server
    - BUG/MINOR: server: fix persistence cookie for dynamic servers
    - MINOR: server: allow cookie for dynamic servers
    - REGTESTS: Fix script about OCSP update compatibility tests
    - BUG/MINOR: cli: Report an error to user if command or payload is too big
    - MINOR: sc_strm: Add generic version to perform sync receives and sends
    - MEDIUM: stream: Use generic version to perform sync receives and sends
    - MEDIUM: buf: Add b_getline() and b_getdelim() functions
    - MEDIUM: applet: Handle applets with their own buffers in put functions
    - MEDIUM: cli/applet: Stop to test opposite SC in I/O handler of CLI commands
    - MINOR: applet: Always use applet API to set appctx flags
    - BUG/MEDIUM: applet: State appctx have more data if its EOI/EOS/ERROR flag is set
    - MAJOR: cli: Update the CLI applet to handle its own buffers
    - MINOR: applet: Let's applets .snd_buf function deal with full input buffers
    - MINOR: stconn: Add a connection flag to notify sending data are the last ones
    - MAJOR: cli: Use a custom .snd_buf function to only copy the current command
    - DOC: config: balance 'first' not usable in LOG mode
    - BUG/MINOR: log/balance: detect if user tries to use unsupported algo
    - MINOR: lbprm: implement true "sticky" balance algo
    - MEDIUM: log/balance: leverage lbprm api for log load-balancing
    - BUG/BUILD: debug: fix unused variable error
    - MEDIUM: lb-chash: Deterministic node hashes based on server address
    - BUG/MEDIUM: stick-tables: fix a small remaining race in expiration task
    - REGTESTS: Do not use REQUIRE_VERSION for HAProxy 2.5+ (4)
    - REGTESTS: Remove REQUIRE_VERSION=1.9 from all tests (2)
    - CLEANUP: Reapply ist.cocci (3)
    - CLEANUP: Reapply strcmp.cocci (2)
    - CLEANUP: Reapply xalloc_cast.cocci
    - CLEANUP: Reapply ha_free.cocci
    - CI: vtest: show coredumps if any
    - REGTESTS: ssl: disable ssl/ocsp_auto_update.vtc
    - BUG/MINOR: backend: properly handle redispatch 0
    - MINOR: quic: HyStart++ implementation (RFC 9406)
    - BUG/MEDIUM: stconn: Don't forward shutdown to SE if iobuf is not empty
    - BUG/MEDIUM: stick-table: use the update lock when reading tables from peers
    - BUG/MAJOR: applet: fix a MIN vs MAX usage in appctx_raw_rcv_buf()
    - OPTIM: peers: avoid the locking dance around peer_send_teach_process_msgs()
    - BUILD: quic: 32 bits compilation issue (QUIC_MIN() usage)
    - BUG/MEDIUM: server/lbprm: fix crash in _srv_set_inetaddr_port()
    - MEDIUM: mworker: get rid of libsystemd
    - BUILD: systemd: fix build error on non-systemd systems with USE_SYSTEMD=1
    - BUG/MINOR: bwlim/config: fix missing '\n' after error messages
    - MINOR: stick-tables: mark the seen stksess with a flag "seen"
    - OPTIM: stick-tables: check the stksess without taking the read lock
    - MAJOR: stktable: split the keys across multiple shards to reduce contention
    - CI: extend Fedora Rawhide, add m32 mode
    - BUG/MINOR: stick-tables: Missing stick-table key nullity check
    - BUILD: systemd: enable USE_SYSTEMD by default with TARGET=linux-glibc
    - MINOR: systemd: Include MONOTONIC_USEC field in RELOADING=1 message
    - BUG/MINOR: proxy: fix logformat expression leak in use_backend rules
    - MEDIUM: log: rename logformat var to logformat tag
    - MINOR: log: expose logformat_tag struct
    - MEDIUM: log: carry tag context in logformat node
    - MEDIUM: tree-wide: add logformat expressions wrapper
    - MINOR: proxy: add PR_FL_CHECKED flag
    - MAJOR: log: implement proper postparsing for logformat expressions
    - MEDIUM: log: add compiling logic to logformat expressions
    - MEDIUM: proxy/log: leverage lf_expr API for logformat preparsing
    - MINOR: guid: introduce global UID module
    - MINOR: guid: restrict guid format
    - MINOR: proxy: implement GUID support
    - MINOR: server: implement GUID support
    - MINOR: listener: implement GUID support
    - DOC: configuration: grammar fixes for strict-sni
    - BUG/MINOR: init: relax LSTCHK_NETADM checks for non root
    - MEDIUM: capabilities: check process capabilities sets
    - CLEANUP: global: remove LSTCHK_CAP_BIND
    - BUG/MEDIUM: quic: don't blindly rely on unaligned accesses
2024-04-06 17:02:07 +02:00
Willy Tarreau
c499cd15c7 BUG/MEDIUM: quic: don't blindly rely on unaligned accesses
There are several places where the QUIC low-level code performs unaligned
accesses by casting unaligned char* pointers to uint32_t, but this is
totally forbidden as it only works on machines that support unaligned
accesses, and either crashes on other ones (SPARC, MIPS), can result in
reading garbage (ARMv5) or be very slow due to the access being emulated
(RISC-V). We do have functions for this, such as read_u32() and write_u32()
that rely on the compiler's knowledge of the machine's capabilities to
either perform an unaligned access or do it one byte at a time.

This must be backported at least as far as 2.6. Some of the code moved a
few times since, so in order to figure the points that need to be fixed,
one may look for a forced pointer cast without having verified that either
the machine is compatible or that the pointer is aligned using this:

  $ git grep 'uint[36][24]_t \*)'

Or build and run the code on a MIPS or SPARC and perform requests using
curl to see if they work or crash with a bus error. All the places fixed
in this commit were found thanks to an immediate crash on the first
request.

This was tagged medium because the affected archs are not the most common
ones where QUIC will be found these days.
2024-04-06 00:07:49 +02:00
Valentine Krasnobaeva
eef14e9574 CLEANUP: global: remove LSTCHK_CAP_BIND
Remove LSTCHK_CAP_BIND as it is never set and never checked.
2024-04-05 18:01:54 +02:00
Valentine Krasnobaeva
f0b6436f57 MEDIUM: capabilities: check process capabilities sets
Since the Linux capabilities support add-on (see the commit bd84387beb
("MEDIUM: capabilities: enable support for Linux capabilities")), we can also
check haproxy process effective and permitted capabilities sets, when it
starts and runs as non-root.

Like this, if needed network capabilities are presented only in the process
permitted set, we can get this information with capget and put them in the
process effective set via capset. To do this properly, let's introduce
prepare_caps_from_permitted_set().

First, it checks if binary effective set has CAP_NET_ADMIN or CAP_NET_RAW. If
there is a match, LSTCHK_NETADM is removed from global.last_checks list to
avoid warning, because in the initialization sequence some last configuration
checks are based on LSTCHK_NETADM flag and haproxy process euid may stay
unpriviledged.

If there are no CAP_NET_ADMIN and CAP_NET_RAW in the effective set, permitted
set will be checked and only capabilities given in 'setcap' keyword will be
promoted in the process effective set. LSTCHK_NETADM will be also removed in
this case by the same reason. In order to be transparent, we promote from
permitted set only capabilities given by user in 'setcap' keyword. So, if
caplist doesn't include CAP_NET_ADMIN or CAP_NET_RAW, LSTCHK_NETADM would not
be unset and warning about missing priviledges will be emitted at
initialization.

Need to call it before protocol_bind_all() to allow binding to priviledged
ports under non-root and 'setcap cap_net_bind_service' must be set in the
global section in this case.
2024-04-05 18:01:54 +02:00
Valentine Krasnobaeva
e4306fb822 BUG/MINOR: init: relax LSTCHK_NETADM checks for non root
Linux capabilities support and ability to preserve it for running process
after switching to a global.uid was added recently by the commit bd84387beb
("MEDIUM: capabilities: enable support for Linux capabilities")).
This new feature hasn't yet been taken into account by last config checks,
which are performed at initialization stage.

So, to update it, let's perform it after set_identity() call. Like this,
current EUID is already changed to a global.uid and prepare_caps_for_setuid()
would unset LSTCHK_NETADM flag, only if capabilities given in the 'setcap'
keyword in the configuration file were preserved.

Otherwise, if system doesn't support Linux capabilities or they were not set
via 'setcap', we keep the previous strict behaviour: process will terminate
with an alert, in order to insist that user: either needs to change
run UID (worst case: start and run as root), or he needs to set/recheck
capabilities listed as 'setcap' arguments.

In the case, when haproxy will start and run under a non-root user this patch
doesn't change the previous behaviour: we'll still let him try the
configuration, but we inform via warning that unexpected things may occur.

Need to be backported until v2.9, including v2.9.
2024-04-05 18:01:54 +02:00
Nicolas CARPi
a4f564b05e DOC: configuration: grammar fixes for strict-sni
Fix incorrect grammar in strict-sni:
* is allow -> is allowed
* which match -> that matches
* allows to start -> allows starting
2024-04-05 17:56:37 +02:00
Amaury Denoyelle
0489d85263 MINOR: listener: implement GUID support
This commit is similar with the two previous ones. Its purpose is to add
GUID support on listeners. Due to bind_conf and listeners configuration,
some specifities were required.

Its possible to define several listeners on a single bind line, for
example by specifying multiple addresses. As such, it's impossible to
support a "guid" keyword on a bind line. The problem is exacerbated by
the cloning of listeners when sharding is used.

To resolve this, a new keyword "guid-prefix" is defined for bind lines.
It allows to specify a string which will be used as a prefix for
automatically generated GUID for each listeners attached to a bind_conf.

Automatic GUID listeners generation is implemented via a new function
bind_generate_guid(). It is called on post-parsing, after
bind_complete_thread_setup(). For each listeners on a bind_conf, a new
GUID is generated with bind_conf prefix and the index of the listener
relative to other listeners in the bind_conf. This last value is stored
in a new bind_conf field named <guid_idx>. If a GUID cannot be inserted,
for example due to a non-unique value, an error is returned, startup is
interrupted with configuration rejected.
2024-04-05 15:40:42 +02:00
Amaury Denoyelle
8259456981 MINOR: server: implement GUID support
This commit is similar to previous one, except that it implements GUID
support for server instances. A guid_node field is inserted into server
structure. A new "guid" server keyword is defined.
2024-04-05 15:40:42 +02:00
Amaury Denoyelle
da754b4533 MINOR: proxy: implement GUID support
Implement proxy identiciation through GUID. As such, a guid_node member
is inserted into proxy structure. A proxy keyword "guid" is defined to
allow user to fix its value.
2024-04-05 15:40:42 +02:00
Amaury Denoyelle
1009ca4160 MINOR: guid: restrict guid format
GUID format is unspecified to allow users to choose the naming scheme.
Some restrictions however are added by this patch, mainly to ensure
coherence and memory usage.

The first restriction is on the length of GUID. No more than 127
characters can be used to prevent memory over consumption.

The second restriction is on the character set allowed in GUID. Utility
function invalid_char() is used for this : it allows alphanumeric
values and '-', '_', '.' and ':'.
2024-04-05 15:40:42 +02:00
Amaury Denoyelle
84fa6b344a MINOR: guid: introduce global UID module
Define a new module guid. Its purpose is to be able to attach a global
identifier for various objects such as proxies, servers and listeners.

A new type guid_node is defined. It will be stored in the objects which
can be referenced by such GUID. Several functions are implemented to
properly initialized, insert, remove and lookup GUID in a global tree.
Modification operations should only be conducted under thread isolation.
2024-04-05 15:40:42 +02:00
Aurelien DARRAGON
e751eebfc6 MEDIUM: proxy/log: leverage lf_expr API for logformat preparsing
Currently, the way proxy-oriented logformat directives are handled is way
too complicated. Indeed, "log-format", "log-format-error", "log-format-sd"
and "unique-id-format" all rely on preparsing hints stored inside
proxy->conf member struct. Those preparsing hints include the original
string that should be compiled once the proxy parameters are known plus
the config file and line number where the string was found to generate
precise error messages in case of failure during the compiling process
that happens within check_config_validity().

Now that lf_expr API permits to compile a lf_expr struct that was
previously prepared (with original string and config hints), let's
leverage lf_expr_compile() from check_config_validity() and instead
of relying on individual proxy->conf hints for each logformat expression,
store string and config hints in the lf_expr struct directly and use
lf_expr helpers funcs to handle them when relevant (ie: original
logformat string freeing is now done at a central place inside
lf_expr_deinit(), which allows for some simplifications)

Doing so allows us to greatly simplify the preparsing logic for those 4
proxy directives, and to finally save some space in the proxy struct.

Also, since httpclient proxy has its "logformat" automatically compiled
in check_config_validity(), we now use the file hint from the logformat
expression struct to set an explicit name that will be reported in case
of error ("parsing [httpclient:0] : ...") and remove the extraneous check
in httpclient_precheck() (logformat was parsed twice previously..)
2024-04-04 19:10:01 +02:00
Aurelien DARRAGON
2b79457bc0 MEDIUM: log: add compiling logic to logformat expressions
split parse_logformat_string() into two functions:

parse_logformat_string() sticks to the same behavior, but now becomes an
helper for lf_expr_compile() which uses explicit arguments so that it
becomes possible to use lf_expr_compile() without a proxy, but also
compile an expression which was previously prepared for compiling (set
string and config hints within the logformat expression to avoid manually
storing string and config context if the compiling step happens later).

lf_expr_dup() may be used to duplicate an expression before it is
compiled, lf_expr_xfer() now makes sure that the input logformat is
already compiled.

This is some prerequisite works for log-profiles implementation, no
functional change should be expected.
2024-04-04 19:10:01 +02:00
Aurelien DARRAGON
7a21c3a4ef MAJOR: log: implement proper postparsing for logformat expressions
This patch tries to address a design flaw with how logformat expressions
are parsed from config. Indeed, some parse_logformat_string() calls are
performed during config parsing when the proxy mode is not yet known.

Here's a config example that illustrates the issue:

  defaults
     mode tcp

  listen test
     bind :8888
     http-response set-header custom-hdr "%trl" # needs http
     mode http

The above config should work, because the effective proxy mode is http,
yet haproxy fails with this error:

  [ALERT]    (99051) : config : parsing [repro.conf:6] : error detected in proxy 'test' while parsing 'http-response set-header' rule : format tag 'trl' is reserved for HTTP mode.

To fix the issue once and for all, let's implement smart postparsing for
logformat expressions encountered during config parsing:

  - split parse_logformat_string() (and subfonctions) in order to create a
    new lf_expr_postcheck() function that must be called to finish
    preparing and checking the logformat expression once the proxy type is
    known.
  - save some config hints info during parse_logformat_string() to
    generate more precise error messages during lf_expr_postcheck(), if
    needed, we rely on curpx->conf.args.{file,line} hints for that because
    parse_logformat_string() doesn't know about current file and line
    number.
  - lf_expr_postcheck() uses PR_FL_CHECKED proxy flag to know if the
    function may try to make the proxy compatible with the expression, or
    if it should simply fail as soon as an incompatibility is detected.
  - if parse_logformat_string() is called from an unchecked proxy, then
    schedule the expression for postparsing, else (ie: during runtime),
    run the postcheck right away.

This change will also allow for some logformat expression error handling
simplifications in the future.
2024-04-04 19:10:01 +02:00
Aurelien DARRAGON
56d8074798 MINOR: proxy: add PR_FL_CHECKED flag
PR_FL_CHECKED is set on proxy once the proxy configuration was fully
checked (including postparsing checks).

This information may be useful to functions that need to know if some
config-related proxy properties are likely to change or not due to parsing
or postparsing/check logics. Also, during runtime, except for some rare cases
config-related proxy properties are not supposed to be changed.
2024-04-04 19:10:01 +02:00
Aurelien DARRAGON
6810c41f8e MEDIUM: tree-wide: add logformat expressions wrapper
log format expressions are broadly used within the code: once they are
parsed from input string, they are converted to a linked list of
logformat nodes.

We're starting to face some limitations because we're simply storing the
converted expression as a generic logformat_node list.

The first issue we're facing is that storing logformat expressions that
way doesn't allow us to add metadata alongside the list, which is part
of the prerequites for implementing log-profiles.

Another issue with storing logformat expressions as generic lists of
logformat_node elements is that it's starting to become really hard to
tell when we rely on logformat expressions or not in the code given that
there isn't always a comment near the list declaration or manipulation
to indicate that it's relying on logformat expressions under the hood,
so this adds some complexity for code maintenance.

This patch looks quite impressive due to changes in a lot of header and
source files (since logformat expressions are broadly used), but it does
a simple thing: it defines the lf_expr structure which itself holds a
generic list of logformat nodes, and then declares some helpers to
manipulate lf_expr elements and fixes the code so that we now exclusively
manipulate logformat_node lists as lf_expr elements outside of log.c.

For now, lf_expr struct only contains the list of logformat nodes (no
additional metadata), but now that we have dedicated type and helpers,
doing so in the future won't be problematic at all and won't require
extensive code changes.
2024-04-04 19:10:01 +02:00
Aurelien DARRAGON
7d8f45b647 MEDIUM: log: carry tag context in logformat node
This is a pretty simple patch despite requiring to make some visible
changes in the code:

When parsing a logformat string, log tags (ie: '%tag', AKA log tags) are
turned into logformat nodes with their type set to the type of the
corresponding logformat_tag element which was matched by name. Thus, when
"compiling" a logformat tag, we only keep a reference to the tag type
from the original logformat_tag.

For example, for "%B" log tag, we have the following logformat_tag
element:

  {
    .name = "B",
    .type = LOG_FMT_BYTES,
    .mode = PR_MODE_TCP,
    .lw = LW_BYTES,
    .config_callback = NULL
  }

When parsing "%B" string, we search for a matching logformat tag
inside logformat_tags[] array using the provided name, once we find a
matching element, we craft a logformat node whose type will be
LOG_FMT_BYTES, but from the node itself, we no longer have access to
other informations that are set in the logformat_tag struct element.

Thus from a logformat_node resulting from a log tag, with current
implementation, we cannot easily get back to matching logformat_tag
struct element as it would require us to scan the whole logformat_tags
array at runtime using node->type to find the matching element.

Let's take a simpler path and consider all tag-specific LOG_FMT_*
subtypes as being part of the same logformat node type: LOG_FMT_TAG.

Thanks to that, we're now able to distinguish logformat nodes made
from logformat tag from other logformat nodes, and link them to
their corresponding logformat_tag element from logformat_tags[] array. All
it costs is a simple indirection and an extra pointer in logformat_node
struct.

While at it, all LOG_FMT_* types related to logformat tags were moved
inside log.c as they have no use outside of it since they are simply
lookup indexes for sess_build_logline() and could even be replaced by
function pointers some day...
2024-04-04 19:10:01 +02:00
Aurelien DARRAGON
8cf5c3d7f0 MINOR: log: expose logformat_tag struct
rename logformat_type internal struct to logformat_tag to to make it less
confusing, then expose logformat_tag struct through header file so that it
can be referenced in other structs.

also rename logformat_keywords[] to logformat_tags[] for better
consistency.
2024-04-04 19:10:01 +02:00
Aurelien DARRAGON
c85cbc1061 MEDIUM: log: rename logformat var to logformat tag
What we use to call logformat variable in the code is referred as
log-format tag in the documentation. Having both 'var' and 'tag' labels
referring to the same thing is really confusing. Let's make the code
comply with the documentation by replacing all logformat var/variable/VAR
occurences with either tag or TAG.

No functional change should be expected, the only visible side-effect from
user point of view is that "variable" was replaced by "tag" in some error
messages.
2024-04-04 19:10:01 +02:00
Aurelien DARRAGON
64b5ab87ef BUG/MINOR: proxy: fix logformat expression leak in use_backend rules
When support for dynamic names was added for use_backend rules in
702d44f2f ("MEDIUM: proxy: support use_backend with dynamic names"), the
sample expression resulting from parse_logformat_string() was only freed
for non dynamic rules (when the expression resolved to a simple string
node). But for complex expressions (ie: multiple nodes), rule->dynamic
was set but the expression was never released, resulting in a small
memory leak when freeing the parent proxy.

To fix the issue, in free_proxy(), we free the switching rule expression
if the switching rule is dynamic.

This should be backported to every stable versions.
[ada: prior to 2.9, free_logformat_list() helper did not exist: we may
 use the same manual sample expr freeing logic as in server_rules pruning
 right above it]
2024-04-04 19:10:01 +02:00
Tim Duesterhus
ad54273cf9 MINOR: systemd: Include MONOTONIC_USEC field in RELOADING=1 message
As per the `sd_notify` manual:

> A field carrying the monotonic timestamp (as per CLOCK_MONOTONIC) formatted
> in decimal in μs, when the notification message was generated by the client.
> This is typically used in combination with "RELOADING=1", to allow the
> service manager to properly synchronize reload cycles. See systemd.service(5)
> for details, specifically "Type=notify-reload".

Thus this change allows users with a recent systemd to switch to
`Type=notify-reload`, should they desire to do so. Correct behavior was
verified with a Fedora 39 VM.

see systemd/systemd#25916

[wla: the service file should be updated this way:]

    diff --git a/admin/systemd/haproxy.service.in b/admin/systemd/haproxy.service.in
    index 22a53d8aab..8c6dadb5e5 100644
    --- a/admin/systemd/haproxy.service.in
    +++ b/admin/systemd/haproxy.service.in
    @@ -8,12 +8,11 @@ EnvironmentFile=-/etc/default/haproxy
     EnvironmentFile=-/etc/sysconfig/haproxy
     Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid" "EXTRAOPTS=-S /run/haproxy-master.sock"
     ExecStart=@SBINDIR@/haproxy -Ws -f $CONFIG -p $PIDFILE $EXTRAOPTS
    -ExecReload=@SBINDIR@/haproxy -Ws -f $CONFIG -c $EXTRAOPTS
    -ExecReload=/bin/kill -USR2 $MAINPID
     KillMode=mixed
     Restart=always
     SuccessExitStatus=143
    -Type=notify
    +Type=notify-reload
    +ReloadSignal=SIGUSR2

     # The following lines leverage SystemD's sandboxing options to provide
     # defense in depth protection at the expense of restricting some flexibility

Signed-off-by: William Lallemand <wlallemand@haproxy.com>
2024-04-04 15:58:29 +02:00
William Lallemand
310e3d070c BUILD: systemd: enable USE_SYSTEMD by default with TARGET=linux-glibc
Since the systemd notify feature is now independant of any library,
lets enable it by default for linux-glibc.

The -Ws mode still need to be used in order to use the sd_nofify()
function. And the function won't do anything if the NOTIFY_SOCKET
environment variable is not defined.
2024-04-04 14:06:11 +02:00
Frederic Lecaille
fcb096f7cd BUG/MINOR: stick-tables: Missing stick-table key nullity check
This bug arrived with this commit:
     MAJOR: stktable: split the keys across multiple shards to reduce contention

At this time, there are no callers which call stktable_get_entry() without checking
the nullity of <key> passed as parameter. But the documentation of this function
says it supports this case where the <key> passed as parameter could be null.

Move the nullity test on <key> at first statement of this function.

Thanks to @chipitsine for having reported this issue in GH #2518.
2024-04-04 11:08:56 +02:00
Ilya Shipitsin
ba1a0559e4 CI: extend Fedora Rawhide, add m32 mode
hopefully it will allow to catch regressions like this
https://github.com/haproxy/haproxy/commit/e41638a
2024-04-04 08:59:34 +02:00
Willy Tarreau
1a088da7c2 MAJOR: stktable: split the keys across multiple shards to reduce contention
In order to reduce the contention on the table when keys expire quickly,
we're spreading the load over multiple trees. That counts for keys and
expiration dates. The shard number is calculated from the key value
itself, both when looking up and when setting it.

The "show table" dump on the CLI iterates over all shards so that the
output is not fully sorted, it's only sorted within each shard. The Lua
table dump just does the same. It was verified with a Lua program to
count stick-table entries that it works as intended (the test case is
reproduced here as it's clearly not easy to automate as a vtc):

  function dump_stk()
    local dmp = core.proxies['tbl'].stktable:dump({});
    local count = 0
    for _, __ in pairs(dmp) do
        count = count + 1
    end
    core.Info('Total entries: ' .. count)
  end

  core.register_action("dump_stk", {'tcp-req', 'http-req'}, dump_stk, 0);

  ##
  global
    tune.lua.log.stderr on
    lua-load-per-thread lua-cnttbl.lua

  listen front
    bind :8001
    http-request lua.dump_stk if { path_beg /stk }
    http-request track-sc1 rand(),upper,hex table tbl
    http-request redirect location /

  backend tbl
    stick-table size 100k type string len 12 store http_req_cnt

  ##
  $ h2load -c 16 -n 10000 0:8001/
  $ curl 0:8001/stk

  ## A count close to 100k appears on haproxy's stderr
  ## On the CLI, "show table tbl" | wc will show the same.

Some large parts were reindented only to add a top-level loop to iterate
over shards (e.g. process_table_expire()). Better check the diff using
git show -b.

The number of shards is decided just like for the pools, at build time
based on the max number of threads, so that we can keep a constant. Maybe
this should be done differently. For now CONFIG_HAP_TBL_BUCKETS is used,
and defaults to CONFIG_HAP_POOL_BUCKETS to keep the benefits of all the
measurements made for the pools. It turns out that this value seems to
be the most reasonable one without inflating the struct stktable too
much. By default for 1024 threads the value is 32 and delivers 980k RPS
in a test involving 80 threads, while adding 1kB to the struct stktable
(roughly doubling it). The same test at 64 gives 1008 kRPS and at 128
it gives 1040 kRPS for 8 times the initial size. 16 would be too low
however, with 675k RPS.

The stksess already have a shard number, it's the one used to decide which
peer connection to send the entry. Maybe we should also store the one
associated with the entry itself instead of recalculating it, though it
does not happen that often. The operation is done by hashing the key using
XXH32().

The peers also take and release the table's lock but the way it's used
it not very clear yet, so at this point it's sure this will not work.

At this point, this allowed to completely unlock the performance on a
80-thread setup:

 before: 5.4 Gbps, 150k RPS, 80 cores
  52.71%  haproxy    [.] stktable_lookup_key
  36.90%  haproxy    [.] stktable_get_entry.part.0
   0.86%  haproxy    [.] ebmb_lookup
   0.18%  haproxy    [.] process_stream
   0.12%  haproxy    [.] process_table_expire
   0.11%  haproxy    [.] fwrr_get_next_server
   0.10%  haproxy    [.] eb32_insert
   0.10%  haproxy    [.] run_tasks_from_lists

 after: 36 Gbps, 980k RPS, 80 cores
  44.92%  haproxy    [.] stktable_get_entry
   5.47%  haproxy    [.] ebmb_lookup
   2.50%  haproxy    [.] fwrr_get_next_server
   0.97%  haproxy    [.] eb32_insert
   0.92%  haproxy    [.] process_stream
   0.52%  haproxy    [.] run_tasks_from_lists
   0.45%  haproxy    [.] conn_backend_get
   0.44%  haproxy    [.] __pool_alloc
   0.35%  haproxy    [.] process_table_expire
   0.35%  haproxy    [.] connect_server
   0.35%  haproxy    [.] h1_headers_to_hdr_list
   0.34%  haproxy    [.] eb_delete
   0.31%  haproxy    [.] srv_add_to_idle_list
   0.30%  haproxy    [.] h1_snd_buf

WIP: uint64_t -> long

WIP: ulong -> uint

code is much smaller
2024-04-03 17:34:47 +02:00
Willy Tarreau
864ac31174 OPTIM: stick-tables: check the stksess without taking the read lock
Thanks to the previous commit, we can now simply perform an atomic read
on stksess->seen and take the write lock to recreate the entry only if
at least one peer has seen it, otherwise leave it untouched. On a test
on 40 cores, the performance used to drop from 2.10 to 1.14M RPS when
one peer was connected, now it drops to 2.05, thus there's basically
no impact of connecting a peer vs ~45% previously, all spent in the
read lock. This can be particularly important when often updating the
same entries (user-agent, source address during an attack etc).
2024-04-03 17:34:47 +02:00