haproxy

mirror of http://git.haproxy.org/git/haproxy.git/ synced 2025-01-27 16:23:18 +00:00

Author	SHA1	Message	Date
Willy Tarreau	6b6a6ad431	CI: update the build options to get rid of unneeded DEBUG options Now that DEBUG_STRICT and DEBUG_MEMORY_POOLS are the default, we can drop them from the build options.	2024-04-11 17:25:45 +02:00
Willy Tarreau	772f9a5874	BUILD: pools: make DEBUG_MEMORY_POOLS=1 the default option This option has been set by default for a very long time and also complicates the manipulation of the DEBUG variable. Let's make it the official default and permit to unset it by setting it to zero. The other pool-related DEBUG options were adjusted to also explicitly check for the zero value for consistency.	2024-04-11 17:25:45 +02:00
Willy Tarreau	b70981532a	BUILD: debug: make DEBUG_STRICT=1 the default We continue to carry it in the makefile, which adds to the difficulty of passing new options. Let's make DEBUG_STRICT=1 the default so that one has to explicitly pass DEBUG_STRICT=0 to disable it. This allows us to remove the option from the default DEBUG variable in the makefile.	2024-04-11 17:25:45 +02:00
Willy Tarreau	b22b968a48	BUILD: cache: fix non-inline vs inline declaration mismatch to silence a warning Some compilers report this on the cache: src/cache.c:235: warning: 'release_entry_locked' declared inline after being called src/cache.c:235: warning: previous declaration of 'release_entry_locked' was here And indeed, the function is first declared non-inline and later inline. Let's just set the inline status from the beginning. It's not really needed to backport this.	2024-04-11 17:25:25 +02:00
Willy Tarreau	e791b243f0	BUG/MINOR: debug: make sure DEBUG_STRICT=0 does work as documented Setting DEBUG_STRICT=0 only validates the defined(DEBUG_STRICT) test regarding DEBUG_STRICT_ACTION, which is equivalent to DEBUG_STRICT>=0. Let's make sure the test checks for >0 so that DEBUG_STRICT=0 properly disables DEBUG_STRICT.	2024-04-11 16:41:08 +02:00
Willy Tarreau	2a9ccf5b25	BUILD: atomic: fix peers build regression on gcc < 4.7 after recent changes Recent commit `4c1480f13b` ("MINOR: stick-tables: mark the seen stksess with a flag "seen"") introduced a build regression on older versions of gcc before 4.7. This is in the old __sync_ API, the HA_ATOMIC_LOAD() implementation uses an intermediary return value called "ret" that is of the same name as the variable passed in argument to the macro in the aforementioned commit. As such, the compiler complains with a cryptic error: src/peers.c: In function 'peer_teach_process_stksess_lookup': src/peers.c:1502: error: invalid type argument of '->' (have 'int') The solution is to avoid referencing the argument in the expression and using an intermediary variable for the pointer as done elsewhere in the code. It seems there's no other place affected with this. It probably does not need to be backported since this code is antique and very rarely used nowadays.	2024-04-11 16:41:08 +02:00
Amaury Denoyelle	32b9e97f92	BUG/MINOR: guid: fix crash on invalid guid name Using an invalid GUID for guid_insert() causes a crash. This is easily reproducible using for example an invalid character with "guid" keyword. Here is the provided backtrace : Thread 1 "haproxy" received signal SIGSEGV, Segmentation fault. 0x00005555561fda95 in guid_insert (objt=0x520000002080, uid=0x519000002dac "@foo2", errmsg=0x7ffff4c0a7a0) at src/guid.c:83 83 ha_free(&guid->node.key); This error is present in guid_insert() cleanup parts. GUID node is not allocated in case of an early error so it's impossible to dereference it to free guid.node.key. Fix this simply by using an intermediary pointer key. This does not need to be backported.	2024-04-11 15:09:53 +02:00
Willy Tarreau	d78c346670	BUILD: makefile: support USE_xxx=0 as well William rightfully reported that not supporting =0 to disable a USE_xxx option is sometimes painful (e.g. a script might do USE_xxx=$(command)). It's not that difficult to handle actually, we just need to consider the value 0 as empty at the few places that test for an empty string in options.mk, and in each "ifneq" test in the main Makefile, so let's do that. We even take care of preserving the original value in the build options string so that building with USE_OPENSSL=0 will be reported as-is in haproxy -vv, and with "-OPENSSL" in the feature list.	2024-04-11 11:06:19 +02:00
Willy Tarreau	aa32ab13f0	BUILD: makefile: warn about unknown USE_* variables William suggested that it would be nice to warn about unknown USE_* variables to more easily catch misspelled ones. The valid ones are present in use_opts, so by appending "=%" to each of them, we can build a series of patterns to exclude from MAKEOVERRIDES and emit a warning for the ones that stand out. Example: $ make TARGET=linux-glibc USE_QUIC_COMPAT_OPENSSL=1 Makefile:338: Warning: ignoring unknown build option: USE_QUIC_COMPAT_OPENSSL=1 CC src/slz.o	2024-04-11 11:06:19 +02:00
Christopher Faulet	1fa6eb2eb9	BUG/MINOR: http-ana: Fix TX_L7_RETRY and TX_D_L7_RETRY values These values are obviously wrong. There is an extra zero at the end for both defines. By chance, it is harmless. But it is better to fix it. This patch should be backported as far as 2.6.	2024-04-10 15:50:00 +02:00
Christopher Faulet	3fc38593d5	BUG/MEDIUM: http-ana: Deliver 502 on keep-alive for fressh server connection In HTTP keep-alive, if we face a connection error to the server while sending the request, the error should not be reported, and the client-side connection should simply be closed, so that client knows it can retry. If the error happens during the connection stage, there is two cases. We have a connection timeout or an allocation error. In this case, the 503 response must be skipped if it is not the first request on the client-side connection. Or we have a connection error. In this case, the 503 response must be skipped if it is a reused server connection. Otherwise, during the connection stage, the 503-Service-unavailable response is delivered to the client. The part works properly. If the error happens after this stage, the 502-Bad-gateway response delivering should only be based on the server-side connection status. For a reused server connection, the client-side connection must be closed with no reponses. However, for a fresh server-side connection, a 502-Bad-gateway response must be delivered to the client. Unfortunately, This part is buggy. Only the client-side connection state is considered and the response is skipped if it is not the first request for the same client connection. The bug is not so visbile in HTTP/1.1 but in H2 and H3 it is pretty annoying because for a connection, requests are multiplexed, in parallels. It means there is no first request. So, because of this bug, for H2 and H3, 502-Bad-gateway responses because of a connection error before receiveing the response are always skipped. To fix the issue, in http_wait_for_response() analyser, we must only rely on SF_SRV_REUSED stream flag to skip the 502 response or not. This flag is set if the server connection was reused. The bug is their since a while. SF_SRV_REUSED flag was added in the version 1.5 especially to fix this kind of bug. But only the 503 case was fixed. This patch should fix the issue #2285. It must be backported to every stable versions.	2024-04-10 15:42:22 +02:00
Amaury Denoyelle	c6e3d60fc1	OPTIM: quic: do not call qc_prep_pkts() if everything sent qc_send() is implemented as a loop to repeatedly invoke qc_prep_pkts()/qc_send_ppkts(). This ensures that all data are emitted even if bigger that a single Tx buffer instance. This is useful if congestion window is empty but big enough for application data. Looping is interrupted if qc_prep_pkts() returns a negative error code, for example due to no space left in congestion window. It can also returns 0 if no input data to sent, which also interrupt the loop. To limit this last case, removed quic_enc_level from send_list each time everything already send via qc_prep_pkts(). Loop can then be interrupted as soon as send_list is empty, avoiding an extra superfluous call to qc_prep_pkts().	2024-04-10 11:18:01 +02:00
Amaury Denoyelle	34b31d85cb	OPTIM: quic: do not call qc_send() if nothing to emit qc_send() was systematically called by quic_conn IO handlers with all instantiated quic_enc_level. Change this to only register quic_enc_level for send if needed. Do not call at all qc_send() if no qel registered. A new function qel_need_sending() is defined to detect if sending is required. First, it checks if quic_enc_level has prepared frames or probing is set. It can also returns true if ACK required either on quic_enc_level itself or because of quic_conn ack timer fired. Finally, a CONNECTION_CLOSE emission for quic_conn is also a valid case. This should reduce the number of invocations of qc_send(). This could improve slightly performance, as well as simplify traces debugging.	2024-04-10 11:17:21 +02:00
Amaury Denoyelle	7fc1ce5bc8	MEDIUM: quic: remove duplicate hdshk/app send functions A series of previous patches have clean up sending function for handshake case. Their new exposed API is now flexible enough to convert app case to use the same functions. As such, qc_send_hdshk_pkts() is renamed qc_send() and become the single entry point for QUIC emission. It is used during application packets emission in quic_conn_app_io_cb(), qc_send_mux(). Also the internal function qc_prep_hpkts() is renamed qc_prep_pkts(). Remove the new unneeded qc_send_app_pkts() and qc_prep_app_pkts(). Also removed qc_send_app_probing(). It was a simple wrapper over other application send functions. Now, default qc_send() can be reuse for such cases with <old_data> argument set to true. An adjustment was needed when converting qc_send_hdshk_pkts() to the general qc_send() version. Previously, only a single packets encoding/emission cycle was performed. This was enough as handshake packets are always smaller than Tx buffer. However, it may be possible to emit more application data. As such, a loop is necessary to perform multiple encoding/emission cycles, as this was already the case in qc_send_app_pkts(). No functional difference should happen with this commit. However, as these are critcal functions with a lot of changes, this patch is labelled as medium.	2024-04-10 11:07:35 +02:00
Amaury Denoyelle	4e4127a66d	MINOR: quic: use qc_send_hdshk_pkts() in handshake IO cb quic_conn_io_cb() manually implements emission by using lower level functions qc_prep_pkts() and qc_send_ppkts(). Replace this by using the higher level function qc_send_hdshk_pkts() which notably handle buffer allocation and purging. This allows to clean up send API by flagging qc_prep_pkts() and qc_send_ppkts() as static. They are now used in a single location inside qc_send_hdshk_pkts().	2024-04-10 11:07:19 +02:00
Amaury Denoyelle	3a8f4761e7	MINOR: quic: improve sending API on retransmit qc_send_hdshk_pkts() is a wrapper for qc_prep_hpkts() used on retransmission. It was restricted to use two quic_enc_level pointers as distinct arguments. Adapt it to directly use the same list of quic_enc_level which is passed then to qc_prep_hpkts(). Now for retransmission quic_enc_level send list is built directly into qc_dgrams_retransmit() which calls qc_send_hdshk_pkts(). Along this change, a new utility function qel_register_send() is defined. It is an helper to build the quic_enc_level send list. It enfores that each quic_enc_level instance is only registered in a single list to prevent memory issues. It is both used in qc_dgrams_retransmit() and quic_conn_io_cb().	2024-04-10 11:06:55 +02:00
Amaury Denoyelle	93f5b4c8ae	MINOR: quic: uniformize sending methods for handshake Emission of packets during handshakes was implemented via an API which uses two alternative ways to specify the list of frames. The first one uses a NULL list of quic_enc_level as argument for qc_prep_hpkts(). This was an implicit method to iterate on all qels stored in quic_conn instance, with frames already inserted in their corresponding quic_pktns. The second method was used for retransmission. It uses a custom local quic_enc_level list specified by the caller as input to qc_prep_hpkts(). Frames were accessible through <retransmit> list pointers of each quic_enc_level used in an implicit mechanism. This commit clarifies the API by using a single common method. Now quic_enc_level list must always be specified by the caller. As for frames list, each qels must set its new field <send_frms> pointer to the list of frames to send. Callers of qc_prep_hpkts() are responsible to always clear qels send list. This prevent a single instance of quic_enc_level to be inserted while being attached to another list. This allows notably to clean up some unnecessary code. First, <retransmit> list of quic_enc_level is removed as it is replaced by new <send_frms>. Also, it's now possible to use proper list_for_each_entry() inside qc_prep_hpkts() to loop over each qels. Internal functions for quic_enc_level selection is now removed.	2024-04-10 11:06:41 +02:00
Amaury Denoyelle	44eec848e8	MINOR: quic: simplify qc_send_hdshk_pkts() return Clean up trailer of qc_send_hdshk_pkts() by removing label "leave". Only "out" label is now used. This operation is safe as LIST_DEL_INIT() is idempotent. Caller of qc_send_hdshk_pkts() also ensures input frame lists are freed, so it's better to always reset quic_enc_level <retrans_frms> member. Also take the opportunity to reset QUIC_FL_CONN_RETRANS_OLD_DATA only if already set. This is considered more robust and will also remove unneeded trace occurences. No functional change. The main objective of this commit is to clean up code in preparation of a refactoring on send functions.	2024-04-10 10:14:36 +02:00
Aurelien DARRAGON	9420cfc0db	CLEANUP: log: lf_text_len() returns a pointer not an integer In `c83684519` ("MEDIUM: log: add the ability to include samples in logs") we checked the return value of lf_text_len() as an integer instead of comparing the pointer with NULL explicitly. Since this may be confusing, let's test the return value against NULL. [ada: for backports, the patch needs to be applied manually because of `c6a713842` ("MINOR: log: simplify last_isspace in sess_build_logline()")]	2024-04-09 17:35:53 +02:00
Aurelien DARRAGON	28548f812f	BUG/MINOR: log: invalid snprintf() usage in sess_build_logline() According to snprintf() man page: The functions snprintf() and vsnprintf() do not write more than size bytes (including the terminating null byte ('\0')). If the output was truncated due to this limit, then the return value is the number of characters (excluding the terminating null byte) which would have been written to the final string if enough space had been available. Thus, a return value of size or more means that the output was truncated. However, in sess_build_logline(), each time we need to check the return value of snprintf(), here is how we proceed: iret = snprintf(tmplog, max, ...); if (iret < 0 \|\| iret > max) // error // success tmplog += iret; Here is the issue: if snprintf() lacks 1 byte space to write the terminating NULL byte, it will return max. Which means in this case that we fail to know that snprintf() truncated the output in reality, and we still add iret to tmplog pointer. Considering sess_build_logline() should NOT write more than <maxsize> bytes (including the terminating NULL byte) as per the function description, in this case the function would write <maxsize>+1 byte (to write the terminating NULL byte upon return), which may lead to invalid write if <dst> was meant to hold <maxsize> bytes at maximum. Hopefully, this bug wasn't triggered so far because sess_build_logline() is called with logline as <dst> argument and <global.max_syslog_len> as <maxsize> argument, logline being initialized with 1 extra byte upon startup. But we better fix this to comply with the function description and prevent any side-effect since some sess_build_logline() helpers may assume that 'tmplog-dst < maxsize' is always true. Also sess_build_logline() users probably don't expect NULL-byte to be accounted for in the produced logline length. This should be backported to all stable versions. [ada: for backports, the patch needs to be applied manually because of `c6a713842` ("MINOR: log: simplify last_isspace in sess_build_logline()")]	2024-04-09 17:35:53 +02:00
Aurelien DARRAGON	8226e92eb0	BUG/MINOR: tools/log: invalid encode_{chunk,string} usage encode_{chunk,string}() is often found to be used this way: ret = encode_{chunk,string}(start, stop...) if (ret == NULL \|\| *ret != '\0') { //error } //success Indeed, encode_{chunk,string} will always try to add terminating NULL byte to the output string, unless no space is available for even 1 byte. However, it means that for the caller to be able to spot an error, then it must provide a buffer (here: start) which is already initialized. But this is wrong: not only this is very tricky to use, but since those functions don't return NULL on failure, then if the output buffer was not properly initialized prior to calling the function, the caller will perform invalid reads when checking for failure this way. Moreover, even if the buffer is initialized, we cannot reliably tell if the function actually failed this way because if the buffer was previously initialized with NULL byte, then the caller might think that the call actually succeeded (since the function didn't return NULL and didn't update the buffer). Also, sess_build_logline() relies lf_encode_{chunk,string}() functions which are in fact wrappers for encode_{chunk,string}() functions and thus exhibit the same error handling mechanism. It turns out that sess_build_logline() makes unsafe use of those functions because it uses the error-checking logic mentionned above while buffer (tmplog) is not guaranteed to be initialized when entering the function. This may ultimately cause malfunctions or invalid reads if the output buffer is lacking space. To fix the issue once and for all and prevent similar bugs from being introduced, we make it so encode_{string, chunk} and escape_string() (based on encode_string()) now explicitly return NULL on failure (when the function failed to write at least the ending NULL byte) lf_encode_{string,chunk}() helpers had to be patched as well due to code duplication. This should be backported to all stable versions. [ada: for 2.4 and 2.6 the patch won't apply as-is, it might be helpful to backport `ae1e14d65` ("CLEANUP: tools: removing escape_chunk() function") first, considering it's not very relevant to maintain a dead function]	2024-04-09 17:35:45 +02:00
Aurelien DARRAGON	b15f6dfae8	BUG/MINOR: log: fix lf_text_len() truncate inconsistency In `c5bff8e550` ("BUG/MINOR: log: improper behavior when escaping log data") we fixed lf_text_len() behavior with +E (escape) option. However we introduced an inconsistency if output buffer is too small to hold the whole output and truncation occurs: indeed without +E option up to <size> bytes (including NULL byte) will be used whereas with +E option only <size-1> bytes will be used. Fixing the function and related comment so that the function behaves the same in regards to truncation whether +E option is used or not. This should be backported to all stable versions.	2024-04-09 17:30:13 +02:00
Willy Tarreau	0db8b6034d	BUG/MINOR: listener: always assign distinct IDs to shards When sharded listeners were introdcued in 2.5 with commit `6dfbef4145` ("MEDIUM: listener: add the "shards" bind keyword"), a point was overlooked regarding how IDs are assigned to listeners: they are just duplicated! This means that if a "option socket-stats" is set and a shard is configured, or multiple thread groups are enabled, then a stats dump will produce several lines with exactly the same socket name and ID. This patch tries to address this by trying to assign consecutive numbers to these sockets. The usual algo is maintained, but with a preference for the next number in a shard. This will help users reserve ranges for each socket, for example by using multiples of 100 or 1000 on each bind line, leaving enough room for all shards to be assigned. The mechanism however is quite tricky, because the configured listener currently ends up being the last one of the shard. This helps insert them before the current position without having to revisit them. But here it causes a difficulty which is that we'd like to restart from the current ID and assign new ones on top of it. What is done is that the number is passed between shards and the current one is cleared (and removed from the tree) so that we instead insert the new one. It's tricky because of the situation which depends whether it's the listener that was already assigned on the bind line or not. But overall, always removing the entry, always adding the new one when the ID is not zero, and passing them from the reference to the next one does the trick. This may be backported to all versions till 2.6.	2024-04-09 08:57:02 +02:00
Christopher Faulet	70251a2aeb	BUG/MINOR: cli: Don't warn about a too big command for incomplete commands When a command is too big to fit in a buffer, a error is returned before closing. However, the error is also returned if the command is small enough but incomplete. It happens on abort. In this case, the error must not be reported. The regression was introduced when a dedicated sn_buf callbac function was added. To fix the issue, both cases are now handled separately. No backport needed.	2024-04-08 11:49:13 +02:00
Willy Tarreau	0046922aed	[RELEASE] Released version 3.0-dev7 Released version 3.0-dev7 with the following main changes : - BUG/MINOR: ssl: Wrong ocsp-update "incompatibility" error message - BUG/MINOR: ssl: Detect more 'ocsp-update' incompatibilities - MEDIUM: ssl: Add 'tune.ssl.ocsp-update.mode' global option - REGTESTS: ssl: Add OCSP update compatibility tests - REGTESTS: ssl: Add functional test for global ocsp-update option - BUG/MINOR: server: reject enabled for dynamic server - BUG/MINOR: server: fix persistence cookie for dynamic servers - MINOR: server: allow cookie for dynamic servers - REGTESTS: Fix script about OCSP update compatibility tests - BUG/MINOR: cli: Report an error to user if command or payload is too big - MINOR: sc_strm: Add generic version to perform sync receives and sends - MEDIUM: stream: Use generic version to perform sync receives and sends - MEDIUM: buf: Add b_getline() and b_getdelim() functions - MEDIUM: applet: Handle applets with their own buffers in put functions - MEDIUM: cli/applet: Stop to test opposite SC in I/O handler of CLI commands - MINOR: applet: Always use applet API to set appctx flags - BUG/MEDIUM: applet: State appctx have more data if its EOI/EOS/ERROR flag is set - MAJOR: cli: Update the CLI applet to handle its own buffers - MINOR: applet: Let's applets .snd_buf function deal with full input buffers - MINOR: stconn: Add a connection flag to notify sending data are the last ones - MAJOR: cli: Use a custom .snd_buf function to only copy the current command - DOC: config: balance 'first' not usable in LOG mode - BUG/MINOR: log/balance: detect if user tries to use unsupported algo - MINOR: lbprm: implement true "sticky" balance algo - MEDIUM: log/balance: leverage lbprm api for log load-balancing - BUG/BUILD: debug: fix unused variable error - MEDIUM: lb-chash: Deterministic node hashes based on server address - BUG/MEDIUM: stick-tables: fix a small remaining race in expiration task - REGTESTS: Do not use REQUIRE_VERSION for HAProxy 2.5+ (4) - REGTESTS: Remove REQUIRE_VERSION=1.9 from all tests (2) - CLEANUP: Reapply ist.cocci (3) - CLEANUP: Reapply strcmp.cocci (2) - CLEANUP: Reapply xalloc_cast.cocci - CLEANUP: Reapply ha_free.cocci - CI: vtest: show coredumps if any - REGTESTS: ssl: disable ssl/ocsp_auto_update.vtc - BUG/MINOR: backend: properly handle redispatch 0 - MINOR: quic: HyStart++ implementation (RFC 9406) - BUG/MEDIUM: stconn: Don't forward shutdown to SE if iobuf is not empty - BUG/MEDIUM: stick-table: use the update lock when reading tables from peers - BUG/MAJOR: applet: fix a MIN vs MAX usage in appctx_raw_rcv_buf() - OPTIM: peers: avoid the locking dance around peer_send_teach_process_msgs() - BUILD: quic: 32 bits compilation issue (QUIC_MIN() usage) - BUG/MEDIUM: server/lbprm: fix crash in _srv_set_inetaddr_port() - MEDIUM: mworker: get rid of libsystemd - BUILD: systemd: fix build error on non-systemd systems with USE_SYSTEMD=1 - BUG/MINOR: bwlim/config: fix missing '\n' after error messages - MINOR: stick-tables: mark the seen stksess with a flag "seen" - OPTIM: stick-tables: check the stksess without taking the read lock - MAJOR: stktable: split the keys across multiple shards to reduce contention - CI: extend Fedora Rawhide, add m32 mode - BUG/MINOR: stick-tables: Missing stick-table key nullity check - BUILD: systemd: enable USE_SYSTEMD by default with TARGET=linux-glibc - MINOR: systemd: Include MONOTONIC_USEC field in RELOADING=1 message - BUG/MINOR: proxy: fix logformat expression leak in use_backend rules - MEDIUM: log: rename logformat var to logformat tag - MINOR: log: expose logformat_tag struct - MEDIUM: log: carry tag context in logformat node - MEDIUM: tree-wide: add logformat expressions wrapper - MINOR: proxy: add PR_FL_CHECKED flag - MAJOR: log: implement proper postparsing for logformat expressions - MEDIUM: log: add compiling logic to logformat expressions - MEDIUM: proxy/log: leverage lf_expr API for logformat preparsing - MINOR: guid: introduce global UID module - MINOR: guid: restrict guid format - MINOR: proxy: implement GUID support - MINOR: server: implement GUID support - MINOR: listener: implement GUID support - DOC: configuration: grammar fixes for strict-sni - BUG/MINOR: init: relax LSTCHK_NETADM checks for non root - MEDIUM: capabilities: check process capabilities sets - CLEANUP: global: remove LSTCHK_CAP_BIND - BUG/MEDIUM: quic: don't blindly rely on unaligned accesses	2024-04-06 17:02:07 +02:00
Willy Tarreau	c499cd15c7	BUG/MEDIUM: quic: don't blindly rely on unaligned accesses There are several places where the QUIC low-level code performs unaligned accesses by casting unaligned char* pointers to uint32_t, but this is totally forbidden as it only works on machines that support unaligned accesses, and either crashes on other ones (SPARC, MIPS), can result in reading garbage (ARMv5) or be very slow due to the access being emulated (RISC-V). We do have functions for this, such as read_u32() and write_u32() that rely on the compiler's knowledge of the machine's capabilities to either perform an unaligned access or do it one byte at a time. This must be backported at least as far as 2.6. Some of the code moved a few times since, so in order to figure the points that need to be fixed, one may look for a forced pointer cast without having verified that either the machine is compatible or that the pointer is aligned using this: $ git grep 'uint[36][24]_t \*)' Or build and run the code on a MIPS or SPARC and perform requests using curl to see if they work or crash with a bus error. All the places fixed in this commit were found thanks to an immediate crash on the first request. This was tagged medium because the affected archs are not the most common ones where QUIC will be found these days.	2024-04-06 00:07:49 +02:00
Valentine Krasnobaeva	eef14e9574	CLEANUP: global: remove LSTCHK_CAP_BIND Remove LSTCHK_CAP_BIND as it is never set and never checked.	2024-04-05 18:01:54 +02:00
Valentine Krasnobaeva	f0b6436f57	MEDIUM: capabilities: check process capabilities sets Since the Linux capabilities support add-on (see the commit `bd84387beb` ("MEDIUM: capabilities: enable support for Linux capabilities")), we can also check haproxy process effective and permitted capabilities sets, when it starts and runs as non-root. Like this, if needed network capabilities are presented only in the process permitted set, we can get this information with capget and put them in the process effective set via capset. To do this properly, let's introduce prepare_caps_from_permitted_set(). First, it checks if binary effective set has CAP_NET_ADMIN or CAP_NET_RAW. If there is a match, LSTCHK_NETADM is removed from global.last_checks list to avoid warning, because in the initialization sequence some last configuration checks are based on LSTCHK_NETADM flag and haproxy process euid may stay unpriviledged. If there are no CAP_NET_ADMIN and CAP_NET_RAW in the effective set, permitted set will be checked and only capabilities given in 'setcap' keyword will be promoted in the process effective set. LSTCHK_NETADM will be also removed in this case by the same reason. In order to be transparent, we promote from permitted set only capabilities given by user in 'setcap' keyword. So, if caplist doesn't include CAP_NET_ADMIN or CAP_NET_RAW, LSTCHK_NETADM would not be unset and warning about missing priviledges will be emitted at initialization. Need to call it before protocol_bind_all() to allow binding to priviledged ports under non-root and 'setcap cap_net_bind_service' must be set in the global section in this case.	2024-04-05 18:01:54 +02:00
Valentine Krasnobaeva	e4306fb822	BUG/MINOR: init: relax LSTCHK_NETADM checks for non root Linux capabilities support and ability to preserve it for running process after switching to a global.uid was added recently by the commit `bd84387beb` ("MEDIUM: capabilities: enable support for Linux capabilities")). This new feature hasn't yet been taken into account by last config checks, which are performed at initialization stage. So, to update it, let's perform it after set_identity() call. Like this, current EUID is already changed to a global.uid and prepare_caps_for_setuid() would unset LSTCHK_NETADM flag, only if capabilities given in the 'setcap' keyword in the configuration file were preserved. Otherwise, if system doesn't support Linux capabilities or they were not set via 'setcap', we keep the previous strict behaviour: process will terminate with an alert, in order to insist that user: either needs to change run UID (worst case: start and run as root), or he needs to set/recheck capabilities listed as 'setcap' arguments. In the case, when haproxy will start and run under a non-root user this patch doesn't change the previous behaviour: we'll still let him try the configuration, but we inform via warning that unexpected things may occur. Need to be backported until v2.9, including v2.9.	2024-04-05 18:01:54 +02:00
Nicolas CARPi	a4f564b05e	DOC: configuration: grammar fixes for strict-sni Fix incorrect grammar in strict-sni: * is allow -> is allowed * which match -> that matches * allows to start -> allows starting	2024-04-05 17:56:37 +02:00
Amaury Denoyelle	0489d85263	MINOR: listener: implement GUID support This commit is similar with the two previous ones. Its purpose is to add GUID support on listeners. Due to bind_conf and listeners configuration, some specifities were required. Its possible to define several listeners on a single bind line, for example by specifying multiple addresses. As such, it's impossible to support a "guid" keyword on a bind line. The problem is exacerbated by the cloning of listeners when sharding is used. To resolve this, a new keyword "guid-prefix" is defined for bind lines. It allows to specify a string which will be used as a prefix for automatically generated GUID for each listeners attached to a bind_conf. Automatic GUID listeners generation is implemented via a new function bind_generate_guid(). It is called on post-parsing, after bind_complete_thread_setup(). For each listeners on a bind_conf, a new GUID is generated with bind_conf prefix and the index of the listener relative to other listeners in the bind_conf. This last value is stored in a new bind_conf field named <guid_idx>. If a GUID cannot be inserted, for example due to a non-unique value, an error is returned, startup is interrupted with configuration rejected.	2024-04-05 15:40:42 +02:00
Amaury Denoyelle	8259456981	MINOR: server: implement GUID support This commit is similar to previous one, except that it implements GUID support for server instances. A guid_node field is inserted into server structure. A new "guid" server keyword is defined.	2024-04-05 15:40:42 +02:00
Amaury Denoyelle	da754b4533	MINOR: proxy: implement GUID support Implement proxy identiciation through GUID. As such, a guid_node member is inserted into proxy structure. A proxy keyword "guid" is defined to allow user to fix its value.	2024-04-05 15:40:42 +02:00
Amaury Denoyelle	1009ca4160	MINOR: guid: restrict guid format GUID format is unspecified to allow users to choose the naming scheme. Some restrictions however are added by this patch, mainly to ensure coherence and memory usage. The first restriction is on the length of GUID. No more than 127 characters can be used to prevent memory over consumption. The second restriction is on the character set allowed in GUID. Utility function invalid_char() is used for this : it allows alphanumeric values and '-', '_', '.' and ':'.	2024-04-05 15:40:42 +02:00
Amaury Denoyelle	84fa6b344a	MINOR: guid: introduce global UID module Define a new module guid. Its purpose is to be able to attach a global identifier for various objects such as proxies, servers and listeners. A new type guid_node is defined. It will be stored in the objects which can be referenced by such GUID. Several functions are implemented to properly initialized, insert, remove and lookup GUID in a global tree. Modification operations should only be conducted under thread isolation.	2024-04-05 15:40:42 +02:00
Aurelien DARRAGON	e751eebfc6	MEDIUM: proxy/log: leverage lf_expr API for logformat preparsing Currently, the way proxy-oriented logformat directives are handled is way too complicated. Indeed, "log-format", "log-format-error", "log-format-sd" and "unique-id-format" all rely on preparsing hints stored inside proxy->conf member struct. Those preparsing hints include the original string that should be compiled once the proxy parameters are known plus the config file and line number where the string was found to generate precise error messages in case of failure during the compiling process that happens within check_config_validity(). Now that lf_expr API permits to compile a lf_expr struct that was previously prepared (with original string and config hints), let's leverage lf_expr_compile() from check_config_validity() and instead of relying on individual proxy->conf hints for each logformat expression, store string and config hints in the lf_expr struct directly and use lf_expr helpers funcs to handle them when relevant (ie: original logformat string freeing is now done at a central place inside lf_expr_deinit(), which allows for some simplifications) Doing so allows us to greatly simplify the preparsing logic for those 4 proxy directives, and to finally save some space in the proxy struct. Also, since httpclient proxy has its "logformat" automatically compiled in check_config_validity(), we now use the file hint from the logformat expression struct to set an explicit name that will be reported in case of error ("parsing [httpclient:0] : ...") and remove the extraneous check in httpclient_precheck() (logformat was parsed twice previously..)	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	2b79457bc0	MEDIUM: log: add compiling logic to logformat expressions split parse_logformat_string() into two functions: parse_logformat_string() sticks to the same behavior, but now becomes an helper for lf_expr_compile() which uses explicit arguments so that it becomes possible to use lf_expr_compile() without a proxy, but also compile an expression which was previously prepared for compiling (set string and config hints within the logformat expression to avoid manually storing string and config context if the compiling step happens later). lf_expr_dup() may be used to duplicate an expression before it is compiled, lf_expr_xfer() now makes sure that the input logformat is already compiled. This is some prerequisite works for log-profiles implementation, no functional change should be expected.	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	7a21c3a4ef	MAJOR: log: implement proper postparsing for logformat expressions This patch tries to address a design flaw with how logformat expressions are parsed from config. Indeed, some parse_logformat_string() calls are performed during config parsing when the proxy mode is not yet known. Here's a config example that illustrates the issue: defaults mode tcp listen test bind :8888 http-response set-header custom-hdr "%trl" # needs http mode http The above config should work, because the effective proxy mode is http, yet haproxy fails with this error: [ALERT] (99051) : config : parsing [repro.conf:6] : error detected in proxy 'test' while parsing 'http-response set-header' rule : format tag 'trl' is reserved for HTTP mode. To fix the issue once and for all, let's implement smart postparsing for logformat expressions encountered during config parsing: - split parse_logformat_string() (and subfonctions) in order to create a new lf_expr_postcheck() function that must be called to finish preparing and checking the logformat expression once the proxy type is known. - save some config hints info during parse_logformat_string() to generate more precise error messages during lf_expr_postcheck(), if needed, we rely on curpx->conf.args.{file,line} hints for that because parse_logformat_string() doesn't know about current file and line number. - lf_expr_postcheck() uses PR_FL_CHECKED proxy flag to know if the function may try to make the proxy compatible with the expression, or if it should simply fail as soon as an incompatibility is detected. - if parse_logformat_string() is called from an unchecked proxy, then schedule the expression for postparsing, else (ie: during runtime), run the postcheck right away. This change will also allow for some logformat expression error handling simplifications in the future.	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	56d8074798	MINOR: proxy: add PR_FL_CHECKED flag PR_FL_CHECKED is set on proxy once the proxy configuration was fully checked (including postparsing checks). This information may be useful to functions that need to know if some config-related proxy properties are likely to change or not due to parsing or postparsing/check logics. Also, during runtime, except for some rare cases config-related proxy properties are not supposed to be changed.	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	6810c41f8e	MEDIUM: tree-wide: add logformat expressions wrapper log format expressions are broadly used within the code: once they are parsed from input string, they are converted to a linked list of logformat nodes. We're starting to face some limitations because we're simply storing the converted expression as a generic logformat_node list. The first issue we're facing is that storing logformat expressions that way doesn't allow us to add metadata alongside the list, which is part of the prerequites for implementing log-profiles. Another issue with storing logformat expressions as generic lists of logformat_node elements is that it's starting to become really hard to tell when we rely on logformat expressions or not in the code given that there isn't always a comment near the list declaration or manipulation to indicate that it's relying on logformat expressions under the hood, so this adds some complexity for code maintenance. This patch looks quite impressive due to changes in a lot of header and source files (since logformat expressions are broadly used), but it does a simple thing: it defines the lf_expr structure which itself holds a generic list of logformat nodes, and then declares some helpers to manipulate lf_expr elements and fixes the code so that we now exclusively manipulate logformat_node lists as lf_expr elements outside of log.c. For now, lf_expr struct only contains the list of logformat nodes (no additional metadata), but now that we have dedicated type and helpers, doing so in the future won't be problematic at all and won't require extensive code changes.	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	7d8f45b647	MEDIUM: log: carry tag context in logformat node This is a pretty simple patch despite requiring to make some visible changes in the code: When parsing a logformat string, log tags (ie: '%tag', AKA log tags) are turned into logformat nodes with their type set to the type of the corresponding logformat_tag element which was matched by name. Thus, when "compiling" a logformat tag, we only keep a reference to the tag type from the original logformat_tag. For example, for "%B" log tag, we have the following logformat_tag element: { .name = "B", .type = LOG_FMT_BYTES, .mode = PR_MODE_TCP, .lw = LW_BYTES, .config_callback = NULL } When parsing "%B" string, we search for a matching logformat tag inside logformat_tags[] array using the provided name, once we find a matching element, we craft a logformat node whose type will be LOG_FMT_BYTES, but from the node itself, we no longer have access to other informations that are set in the logformat_tag struct element. Thus from a logformat_node resulting from a log tag, with current implementation, we cannot easily get back to matching logformat_tag struct element as it would require us to scan the whole logformat_tags array at runtime using node->type to find the matching element. Let's take a simpler path and consider all tag-specific LOG_FMT_* subtypes as being part of the same logformat node type: LOG_FMT_TAG. Thanks to that, we're now able to distinguish logformat nodes made from logformat tag from other logformat nodes, and link them to their corresponding logformat_tag element from logformat_tags[] array. All it costs is a simple indirection and an extra pointer in logformat_node struct. While at it, all LOG_FMT_* types related to logformat tags were moved inside log.c as they have no use outside of it since they are simply lookup indexes for sess_build_logline() and could even be replaced by function pointers some day...	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	8cf5c3d7f0	MINOR: log: expose logformat_tag struct rename logformat_type internal struct to logformat_tag to to make it less confusing, then expose logformat_tag struct through header file so that it can be referenced in other structs. also rename logformat_keywords[] to logformat_tags[] for better consistency.	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	c85cbc1061	MEDIUM: log: rename logformat var to logformat tag What we use to call logformat variable in the code is referred as log-format tag in the documentation. Having both 'var' and 'tag' labels referring to the same thing is really confusing. Let's make the code comply with the documentation by replacing all logformat var/variable/VAR occurences with either tag or TAG. No functional change should be expected, the only visible side-effect from user point of view is that "variable" was replaced by "tag" in some error messages.	2024-04-04 19:10:01 +02:00
Aurelien DARRAGON	64b5ab87ef	BUG/MINOR: proxy: fix logformat expression leak in use_backend rules When support for dynamic names was added for use_backend rules in `702d44f2f` ("MEDIUM: proxy: support use_backend with dynamic names"), the sample expression resulting from parse_logformat_string() was only freed for non dynamic rules (when the expression resolved to a simple string node). But for complex expressions (ie: multiple nodes), rule->dynamic was set but the expression was never released, resulting in a small memory leak when freeing the parent proxy. To fix the issue, in free_proxy(), we free the switching rule expression if the switching rule is dynamic. This should be backported to every stable versions. [ada: prior to 2.9, free_logformat_list() helper did not exist: we may use the same manual sample expr freeing logic as in server_rules pruning right above it]	2024-04-04 19:10:01 +02:00
Tim Duesterhus	ad54273cf9	MINOR: systemd: Include MONOTONIC_USEC field in RELOADING=1 message As per the `sd_notify` manual: > A field carrying the monotonic timestamp (as per CLOCK_MONOTONIC) formatted > in decimal in μs, when the notification message was generated by the client. > This is typically used in combination with "RELOADING=1", to allow the > service manager to properly synchronize reload cycles. See systemd.service(5) > for details, specifically "Type=notify-reload". Thus this change allows users with a recent systemd to switch to `Type=notify-reload`, should they desire to do so. Correct behavior was verified with a Fedora 39 VM. see systemd/systemd#25916 [wla: the service file should be updated this way:] diff --git a/admin/systemd/haproxy.service.in b/admin/systemd/haproxy.service.in index 22a53d8aab..8c6dadb5e5 100644 --- a/admin/systemd/haproxy.service.in +++ b/admin/systemd/haproxy.service.in @@ -8,12 +8,11 @@ EnvironmentFile=-/etc/default/haproxy EnvironmentFile=-/etc/sysconfig/haproxy Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid" "EXTRAOPTS=-S /run/haproxy-master.sock" ExecStart=@SBINDIR@/haproxy -Ws -f $CONFIG -p $PIDFILE $EXTRAOPTS -ExecReload=@SBINDIR@/haproxy -Ws -f $CONFIG -c $EXTRAOPTS -ExecReload=/bin/kill -USR2 $MAINPID KillMode=mixed Restart=always SuccessExitStatus=143 -Type=notify +Type=notify-reload +ReloadSignal=SIGUSR2 # The following lines leverage SystemD's sandboxing options to provide # defense in depth protection at the expense of restricting some flexibility Signed-off-by: William Lallemand <wlallemand@haproxy.com>	2024-04-04 15:58:29 +02:00
William Lallemand	310e3d070c	BUILD: systemd: enable USE_SYSTEMD by default with TARGET=linux-glibc Since the systemd notify feature is now independant of any library, lets enable it by default for linux-glibc. The -Ws mode still need to be used in order to use the sd_nofify() function. And the function won't do anything if the NOTIFY_SOCKET environment variable is not defined.	2024-04-04 14:06:11 +02:00
Frederic Lecaille	fcb096f7cd	BUG/MINOR: stick-tables: Missing stick-table key nullity check This bug arrived with this commit: MAJOR: stktable: split the keys across multiple shards to reduce contention At this time, there are no callers which call stktable_get_entry() without checking the nullity of <key> passed as parameter. But the documentation of this function says it supports this case where the <key> passed as parameter could be null. Move the nullity test on <key> at first statement of this function. Thanks to @chipitsine for having reported this issue in GH #2518.	2024-04-04 11:08:56 +02:00
Ilya Shipitsin	ba1a0559e4	CI: extend Fedora Rawhide, add m32 mode hopefully it will allow to catch regressions like this https://github.com/haproxy/haproxy/commit/e41638a	2024-04-04 08:59:34 +02:00
Willy Tarreau	1a088da7c2	MAJOR: stktable: split the keys across multiple shards to reduce contention In order to reduce the contention on the table when keys expire quickly, we're spreading the load over multiple trees. That counts for keys and expiration dates. The shard number is calculated from the key value itself, both when looking up and when setting it. The "show table" dump on the CLI iterates over all shards so that the output is not fully sorted, it's only sorted within each shard. The Lua table dump just does the same. It was verified with a Lua program to count stick-table entries that it works as intended (the test case is reproduced here as it's clearly not easy to automate as a vtc): function dump_stk() local dmp = core.proxies['tbl'].stktable:dump({}); local count = 0 for _, __ in pairs(dmp) do count = count + 1 end core.Info('Total entries: ' .. count) end core.register_action("dump_stk", {'tcp-req', 'http-req'}, dump_stk, 0); ## global tune.lua.log.stderr on lua-load-per-thread lua-cnttbl.lua listen front bind :8001 http-request lua.dump_stk if { path_beg /stk } http-request track-sc1 rand(),upper,hex table tbl http-request redirect location / backend tbl stick-table size 100k type string len 12 store http_req_cnt ## $ h2load -c 16 -n 10000 0:8001/ $ curl 0:8001/stk ## A count close to 100k appears on haproxy's stderr ## On the CLI, "show table tbl" \| wc will show the same. Some large parts were reindented only to add a top-level loop to iterate over shards (e.g. process_table_expire()). Better check the diff using git show -b. The number of shards is decided just like for the pools, at build time based on the max number of threads, so that we can keep a constant. Maybe this should be done differently. For now CONFIG_HAP_TBL_BUCKETS is used, and defaults to CONFIG_HAP_POOL_BUCKETS to keep the benefits of all the measurements made for the pools. It turns out that this value seems to be the most reasonable one without inflating the struct stktable too much. By default for 1024 threads the value is 32 and delivers 980k RPS in a test involving 80 threads, while adding 1kB to the struct stktable (roughly doubling it). The same test at 64 gives 1008 kRPS and at 128 it gives 1040 kRPS for 8 times the initial size. 16 would be too low however, with 675k RPS. The stksess already have a shard number, it's the one used to decide which peer connection to send the entry. Maybe we should also store the one associated with the entry itself instead of recalculating it, though it does not happen that often. The operation is done by hashing the key using XXH32(). The peers also take and release the table's lock but the way it's used it not very clear yet, so at this point it's sure this will not work. At this point, this allowed to completely unlock the performance on a 80-thread setup: before: 5.4 Gbps, 150k RPS, 80 cores 52.71% haproxy [.] stktable_lookup_key 36.90% haproxy [.] stktable_get_entry.part.0 0.86% haproxy [.] ebmb_lookup 0.18% haproxy [.] process_stream 0.12% haproxy [.] process_table_expire 0.11% haproxy [.] fwrr_get_next_server 0.10% haproxy [.] eb32_insert 0.10% haproxy [.] run_tasks_from_lists after: 36 Gbps, 980k RPS, 80 cores 44.92% haproxy [.] stktable_get_entry 5.47% haproxy [.] ebmb_lookup 2.50% haproxy [.] fwrr_get_next_server 0.97% haproxy [.] eb32_insert 0.92% haproxy [.] process_stream 0.52% haproxy [.] run_tasks_from_lists 0.45% haproxy [.] conn_backend_get 0.44% haproxy [.] __pool_alloc 0.35% haproxy [.] process_table_expire 0.35% haproxy [.] connect_server 0.35% haproxy [.] h1_headers_to_hdr_list 0.34% haproxy [.] eb_delete 0.31% haproxy [.] srv_add_to_idle_list 0.30% haproxy [.] h1_snd_buf WIP: uint64_t -> long WIP: ulong -> uint code is much smaller	2024-04-03 17:34:47 +02:00
Willy Tarreau	864ac31174	OPTIM: stick-tables: check the stksess without taking the read lock Thanks to the previous commit, we can now simply perform an atomic read on stksess->seen and take the write lock to recreate the entry only if at least one peer has seen it, otherwise leave it untouched. On a test on 40 cores, the performance used to drop from 2.10 to 1.14M RPS when one peer was connected, now it drops to 2.05, thus there's basically no impact of connecting a peer vs ~45% previously, all spent in the read lock. This can be particularly important when often updating the same entries (user-agent, source address during an attack etc).	2024-04-03 17:34:47 +02:00

1 2 3 4 5 ...

22011 Commits