haproxy

mirror of http://git.haproxy.org/git/haproxy.git/ synced 2024-12-26 14:42:21 +00:00

Author	SHA1	Message	Date
Aurelien DARRAGON	ecd8f3bfd7	BUG/MINOR: hlua: missing lock in hlua_filter_new() This is a complementary patch to `8670db7` ("BUG/MAJOR: hlua: improper lock usage with hlua_ctx_resume()") for hlua_filter_new(). Indeed, the HLUA_E_ERRMSG case still relies on the lua stack but didn't take the lock to do so. This should be backported up to 2.6.	2024-03-13 09:24:46 +01:00
Aurelien DARRAGON	4aefffc38c	BUG/MINOR: hlua: segfault when loading the same filter from different contexts Trying to register the same lua filter from global and per-thread context (using 'lua-load' + 'lua-load-per-thread') causes a segmentation fault in hlua_post_init(). This is due to a simple copy paste error as we try to print the function name in the error message (like we do when loading the same lua function from different contexts) instead of the filter name. This should be backported up to 2.6.	2024-03-13 09:24:46 +01:00
William Lallemand	bb25ee7b26	CI: github: add -DDEBUG_LIST to the default builds Add the -DDEBUG_LIST flag, which allows checking whether a list element was removed twice.	2024-03-13 09:01:11 +01:00
William Lallemand	bbc215d3bd	CLEANUP: ssl: remove useless #ifdef in openssl-compat.h Remove a useless #ifdef in openssl-compat.h	2024-03-13 08:51:04 +01:00
William Lallemand	501d9fdb86	MEDIUM: ssl: allow to change the OpenSSL security level from global section The new "ssl-security-level" option allows one to change the OpenSSL security level without having to change the openssl.cnf global file of your distribution. This directive applies to every SSL_CTX context. People sometimes change their security level directly in the ciphers directive, however there are some cases when the security level change is not applied in the right order (for example when applying a DH param). Before this patch, it was possible to work around this by using a specific openssl.cnf file and starting haproxy this way: OPENSSL_CONF=./openssl.cnf ./haproxy -f bug-2468.cfg Values for the security level can be found there: https://www.openssl.org/docs/man1.1.1/man3/SSL_CTX_set_security_level.html This was discussed in github issue #2468.	2024-03-12 17:37:11 +01:00
William Lallemand	7e9e4a8f50	MEDIUM: ssl: initialize the SSL stack explicitly In issue #2448, users are complaining that FIPS is not working correctly since the removal of SSL_library_init(). This was removed because SSL_library_init() is deprecated with OpenSSL 3.x and emits a warning. But the initialization was not needed anymore because it is done at the first openssl API call. However, in some cases it is needed. SSL_library_init() is now a define to OPENSSL_init_ssl(0, NULL). This patch adds OPENSSL_init_ssl(0, NULL) to the init. This could be backported to every stable branch, however let's wait before backporting it.	2024-03-12 12:03:07 +01:00
Willy Tarreau	7223296092	BUG/MINOR: server: fix first server template not being indexed 3.0-dev1 introduced a small regression with commit `b4db3be86e` ("BUG/MINOR: server: fix server_find_by_name() usage during parsing"). By changing the way servers are indexed and moving it into the server template loop, the first one is no longer indexed because the loop starts at low+1 since it focuses on duplication. Let's index the first one explicitly now. This should not be backported, unless the commit above is backported.	2024-03-12 08:23:03 +01:00
Dragan Dosen	0091692d97	BUG/MINOR: ssl: do not set the aead_tag flags in sample_conv_aes_gcm() This was not useful and was using uninitialized value. Introduced with the commit `08ac28237` ("MINOR: Add aes_gcm_enc converter"). Must be backported wherever the commit `08ac28237` was backported.	2024-03-11 19:20:44 +01:00
Dragan Dosen	d7610e6dde	BUG/MINOR: ssl: fix possible ctx memory leak in sample_conv_aes_gcm() The issue was introduced with the commit `c31499d74` ("MINOR: ssl: Add aes_gcm_dec converter"). This must be backported to all stable branches where the above converter is present, but it may need to be adjusted for older branches because of code refactoring.	2024-03-11 19:20:31 +01:00
Brooks Davis	c03a023882	MINOR: tools: use public interface for FreeBSD get_exec_path() Where possible (FreeBSD 13+), use the public, documented interface to the ELF auxiliary argument vector: elf_aux_info(). __elf_aux_vector is a private interface exported so that the runtime linker can set its value during process startup and not intended for public consumption. In FreeBSD 15 it has been removed from libc and moved to libsys.	2024-03-11 19:00:37 +01:00
William Lallemand	3262c2ddcd	DOC: configuration: clarify ciphersuites usage (V2) The previous attempt removed the TLSv1.3 version for the "ciphersuites" keywords. However it looks like the TLSv1.2 support for SSL_CTX_set_ciphersuites() is a bug, and can have inconsistent behavior. This patch reverts the previous attempt and adds an explanation of this problem and clear examples on how to configure TLSv1.2 ciphers + TLSv1.3 ciphersuites. Revert "DOC: configuration: clarify ciphersuites usage" This reverts commit `e2a44d6c94`. This must be backported to all stable branches. Fixes issue #2459.	2024-03-11 17:58:16 +01:00
Amaury Denoyelle	c499d66f37	MINOR: quic: remove qc_treat_rx_crypto_frms() This commit removes qc_treat_rx_crypto_frms(). This function was used in a single place inside qc_ssl_provide_all_quic_data(). Besides, its naming was confusing as conceptually it is directly linked to quic_ssl module instead of quic_rx. Thus, the body of qc_treat_rx_crypto_frms() is inlined directly inside qc_ssl_provide_all_quic_data(). Also, qc_ssl_provide_quic_data() is now only used inside quic_ssl so its scope is set to static. Overall, API for CRYPTO frame handling is now cleaner.	2024-03-11 14:27:51 +01:00
Amaury Denoyelle	b068e758fb	MINOR: quic: simplify rescheduling for handshake On CRYPTO frames reception, tasklet is rescheduled with TASK_HEAVY to limit CPU consumption. This commit slightly simplifies this by regrouping TASK_HEAVY setting and tasklet_wakeup() instructions in a single location in qc_handle_crypto_frm(). All other unnecessary tasklet_wakeup() are removed.	2024-03-11 14:15:36 +01:00
Willy Tarreau	6770259083	MEDIUM: mux-h2: allow to set the glitches threshold to kill a connection Till now it was still needed to write rules to eliminate bad behaving H2 clients, while most of the time it would be desirable to just be able to set a threshold on the level of anomalies on a connection. This is what this patch does. By setting a glitches threshold for frontend and backend, it allows to automatically turn a connection to the error state when the threshold is reached so that the connection dies by itself without having to write possibly complex rules. One subtlety is that we still have the error state being exclusive to the parser's state so this requires the h2c_report_glitches() function to return a status indicating if the threshold was reached or not so that processing can instantly stop and bypass the state update, otherwise the state could be turned back to a valid one (e.g. after parsing CONTINUATION); we should really contemplate the possibility to use H2_CF_ERROR for this. Fortunately there were very few places where a glitch was reported outside of an error path so the changes are quite minor. Now by setting the front value to 1000, a client flooding with short CONTINUATION frames is instantly stopped.	2024-03-11 08:25:08 +01:00
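The threshold mechanism described in the entry above can be pictured with a minimal C sketch. It is not the mux-h2 code: `struct demo_conn`, `demo_report_glitches()`, the `>=` comparison and the "0 disables the check" convention are assumptions made for illustration. Only two ideas come from the commit messages: the increment is passed as an argument, and the function returns a status telling the caller that the threshold was reached.

```c
#include <assert.h>

/* Hypothetical connection state, reduced to what the sketch needs. */
struct demo_conn {
    int glitches;   /* anomalies counted so far */
    int in_error;   /* set once the connection must die */
};

/* Adds <increment> glitches to <c>. Returns 1 if the threshold was reached
 * and the connection was switched to the error state, otherwise 0.
 * A <threshold> of 0 disables the check (assumption of this sketch).
 */
static int demo_report_glitches(struct demo_conn *c, int increment, int threshold)
{
    assert(increment >= 0);
    c->glitches += increment;
    if (threshold > 0 && c->glitches >= threshold) {
        c->in_error = 1;
        return 1;
    }
    return 0;
}
```

A caller in a parsing loop would test the return value and stop processing at once, so that a later state update cannot overwrite the error state.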
Willy Tarreau	e6e7e1587e	MINOR: mux-h2: always use h2c_report_glitch() The function aims at centralizing counter measures but due to the fact that it only increments the counter by one unit, sometimes it was not used and the value was calculated directly. Let's pass the increment in argument so that it can be used everywhere.	2024-03-11 07:36:56 +01:00
Willy Tarreau	db1a7513b7	[RELEASE] Released version 3.0-dev5 Released version 3.0-dev5 with the following main changes : - BUG/MEDIUM: applet: Fix HTX .rcv_buf callback function to release outbuf buffer - BUG/MAJOR: ssl/ocsp: crash with ocsp when old process exit or using ocsp CLI - BUG/MEDIUM: server: fix dynamic servers initial settings - BUG/MINOR: ssl/cli: duplicate cleaning code in cli_parse_del_crtlist - LICENSE: event_hdl: fix GPL license version - LICENSE: http_ext: fix GPL license version - BUG/MEDIUM: mux-h1: Fix again 0-copy forwarding of chunks with an unknown size - BUG/MINOR: mux-h1: Properly report when mux is blocked during a nego - MINOR: mux-h1: Move checks performed before a shutdown in a dedicated function - MINOR: mux-h1: Move all stuff to detach a stream in an internal function - MAJOR: mux-h1: Drain requests on client side before shut a stream down - MEDIUM: htx/http-ana: No longer close connection on early HAProxy response - MINOR: quic: filter show quic by address - MINOR: quic: specify show quic output fields - MINOR: quic: add MUX output for show quic - CLEANUP: mux-h2: Fix h2s_make_data() comment about the return value - DOC: configuration: clarify ciphersuites usage - BUG/MINOR: config/quic: Alert about PROXY protocol use on a QUIC listener - BUG/MINOR: hlua: Fix log level to the right value when set via TXN:set_loglevel - MINOR: hlua: Be able to disable logging from lua - BUG/MINOR: tools: seed the statistical PRNG slightly better - BUG/MINOR: hlua: fix unsafe lua_tostring() usage with empty stack - BUG/MINOR: hlua: don't use lua_tostring() from unprotected contexts - BUG/MINOR: hlua: fix possible crash in hlua_filter_new() under load - BUG/MINOR: hlua: improper lock usage in hlua_filter_callback() - BUG/MINOR: hlua: improper lock usage in hlua_filter_new() - BUG/MEDIUM: hlua: improper lock usage with SET_SAFE_LJMP() - BUG/MAJOR: hlua: improper lock usage with hlua_ctx_resume() - BUG/MINOR: hlua: don't call ha_alert() in 
hlua_event_subscribe() - MINOR: hlua: use SEND_ERR to report errors in hlua_event_runner() - CLEANUP: hlua: txn class functions may LJMP - BUG/MINOR: sink: fix a race condition in the TCP log forwarding code - BUILD: thread: move lock label definitions to thread-t.h - BUILD: tree-wide: fix a few missing includes in a few files - BUILD: buf: make b_ncat() take a const for the source - CLEANUP: assorted typo fixes in the code and comments - CLEANUP: fix typo in naming for variable "unused" - CI: run more smoke tests on config syntax to check memory related issues - CI: enable monthly build only test on netbsd-9.3 - CI: skip scheduled builds on forks - BUG/MINOR: ssl/cli: typo in new ssl crl-file CLI description - BUG/MEDIUM: quic: fix connection freeze on post handshake - BUG/MINOR: mux-quic: fix crash on aborting uni remote stream - CLEANUP: log: fix obsolete comment for add_sample_to_logformat_list() - CLEANUP: tree-wide: use proper ERR_* return values for PRE_CHECK fcts - BUG/MINOR: cfgparse: report proper location for log-format-sd errors - MINOR: vars: export var_set and var_unset functions - MINOR: Add aes_gcm_enc converter - BUG/MEDIUM: quic: fix handshake freeze under high traffic - MINOR: quic: always use ncbuf for rx CRYPTO - BUILD: ssl: define EVP_CTRL_AEAD_GET_TAG for older versions - DOC: design: write first notes about ring-v2 - OPTIM: sink: try to merge "dropped" messages faster - OPTIM: sink: drop the sink lock used to count drops - DEV: haring: make haring not depend on the struct ring itself - DEV: haring: split the code between ring and buffer - DEV: haring: automatically use the advertised ring header size - BUILD: solaris: fix compilation errors	2024-03-09 16:50:15 +01:00
matthias sweertvaegher	062ea3a3d4	BUILD: solaris: fix compilation errors Compilation on solaris fails because of usage of names reserved on that platform, i.e. 'queue' and 's_addr'. This patch redefines 'queue' as '_queue' and renames 's_addr' to 'srv_addr' which fixes compilation for now. Future plan: rename 'queue' in code base so define can be removed again. Backporting: 2.9, 2.8	2024-03-09 11:24:54 +01:00
Willy Tarreau	88e141b823	DEV: haring: automatically use the advertised ring header size Instead of emitting a warning, since we don't need the ring struct anymore, we can just read what we need, parse the buffer and use the advertised offset. Thus for now -f is simply ignored.	2024-03-09 11:23:52 +01:00
Willy Tarreau	77d7c35243	DEV: haring: split the code between ring and buffer By splitting the initialization and the parsing of the ring, we'll ease the support for multiple ring sizes and get rid of the annoyances of the optional lock.	2024-03-09 11:23:52 +01:00
Willy Tarreau	4dddbb63a0	DEV: haring: make haring not depend on the struct ring itself haring needs to be self-sufficient about the ring format so that it continues to build when the ring API changes. Let's import the struct ring definition and call it "ring_v1".	2024-03-09 11:23:52 +01:00
Willy Tarreau	758cb450a2	OPTIM: sink: drop the sink lock used to count drops The sink lock was made to prevent event producers from passing while there were other threads trying to print a "dropped" message, in order to guarantee the absence of reordering. It has a serious impact however, which is that all threads need to take the read lock when producing a regular trace even when there's no reader. This patch takes a different approach. The drop counter is shifted left by one so that the lowest bit is used to indicate that one thread is already taking care of trying to dump the counter. Threads only read this value normally, and will only try to change it if it's non-null, in which case they'll first check if they are the first ones trying to dump it, otherwise will simply count another drop and leave. This has a large benefit. First, it will avoid the locking that causes stalls as soon as a slow reader is present. Second, it avoids any write on the fast path as long as there's no drop. And it remains very lightweight since we just need to add +2 or subtract 2*dropped in operations, while offering the guarantee that the sink_write() has succeeded before unlocking the counter. While a reader was previously limiting the traffic to 11k RPS under 4C/8T, now we reach 36k RPS vs 14k with no reader, so readers will no longer slow the traffic down and will instead even speed it up due to avoiding the contention down the chain in the ring. The locking cost dropped from ~75% to ~60% now (it's in ring_write now).	2024-03-09 11:23:52 +01:00
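The counter layout described in the entry above can be sketched with C11 atomics. This is an illustration under stated assumptions, not the sink code: `dropped`, `count_drop()` and `report_drops()` are hypothetical names, `printf()` stands in for the real write, and the reported count and the reporter bit are released in a single subtraction for brevity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical drop counter: the number of lost events is stored shifted
 * left by one, and bit 0 is set while one thread is busy reporting them.
 */
static atomic_uint dropped;

/* Records one lost event: adding 2 leaves the reporter bit untouched. */
static void count_drop(void)
{
    atomic_fetch_add(&dropped, 2);
}

/* Tries to report pending drops. Returns the number of drops reported, or 0
 * if there was nothing to report or another thread is already reporting.
 */
static unsigned int report_drops(void)
{
    unsigned int cur = atomic_load(&dropped);
    unsigned int n;

    if (!cur)
        return 0;   /* fast path: a plain read, no write at all */

    /* Become the reporter only if bit 0 is clear and we set it first. */
    if ((cur & 1) || !atomic_compare_exchange_strong(&dropped, &cur, cur | 1))
        return 0;

    n = cur >> 1;
    printf("%u event(s) dropped\n", n);   /* stand-in for the real output */

    /* The message is out: subtract what was reported (2 * n) together with
     * the reporter bit. Drops counted in the meantime stay in the counter.
     */
    cur = atomic_fetch_sub(&dropped, (n << 1) | 1);
    assert(cur & 1);
    return n;
}
```

In this sketch a thread that gets 0 back while the counter is non-zero would be expected to call `count_drop()` for its own event and leave, which matches the "count another drop and leave" behaviour the commit message describes.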
Willy Tarreau	eb7b2ec83a	OPTIM: sink: try to merge "dropped" messages faster When a reader doesn't read fast enough and causes drops, subsequent threads try to produce a "dropped" message. But it takes time to produce and emit this message, in part due to the use of chunk_printf() that relies on vfprintf() which has to parse the printf format, and during this time other threads may continue to increment the counter. This is the reason why this is currently performed in a loop. When reading what is received, it's common to see a large count followed by one or two single-digit counts, indicating that we could possibly have improved that by writing faster. Let's improve the situation a little bit. First we're now using a static message prefixed with enough space to write the digits, and a call to ultoa_r() fills these digits from right to left so that we don't have to process a format string nor perform a copy of the message. Second, we now re-check the counter immediately after having prepared the message so that we still get an opportunity for updating it. In order to avoid too long loops, this is limited to 10 iterations. Tests show that the number of single-digit "dropped" counters on output now dropped roughly by 15-30%. Also, it was observed that with 8 threads, there's almost never more than one retry.	2024-03-09 11:23:52 +01:00
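The "static message with room for the digits" idea from the entry above can be sketched as follows. This is a hedged illustration: `drop_msg`, `PAD` and `format_drops()` are hypothetical names, and the function is a simplified stand-in for the right-to-left fill the commit attributes to ultoa_r(), not a copy of it.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder bytes reserved for the digits (20 of them, enough for a
 * 64-bit value). Their content is irrelevant since it is never printed.
 */
#define PAD "0123456789" "0123456789"
#define DIGITS_ROOM (sizeof(PAD) - 1)

/* Static message: the text after the reserved area never changes. */
static char drop_msg[] = PAD " events dropped";

/* Writes <n> in decimal from right to left, ending just before the fixed
 * text, and returns a pointer to the first digit. No format string is
 * parsed and the message itself is never copied.
 */
static const char *format_drops(unsigned long n)
{
    char *pos = drop_msg + DIGITS_ROOM;   /* one past the last digit slot */

    do {
        assert(pos > drop_msg);           /* never run before the buffer */
        *--pos = (char)('0' + n % 10);
        n /= 10;
    } while (n);

    return pos;
}
```

Stale digits from an earlier, longer number may remain to the left of the returned pointer. They are harmless because output starts at the returned position.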
Willy Tarreau	571232535a	DOC: design: write first notes about ring-v2 This explains the observed limitations of the current ring applied to traces and proposes a multi-step, more scalable, improvement.	2024-03-09 11:23:52 +01:00
Willy Tarreau	26cd248feb	BUILD: ssl: define EVP_CTRL_AEAD_GET_TAG for older versions Amaury reported that previous commit `08ac282375` ("MINOR: Add aes_gcm_enc converter") broke the CI on OpenSSL 1.0.2 due to the define above not existing there. Let's just map it to its older name when not existing. For reference, these were renamed when switching to 1.1.0: https://marc.info/?l=openssl-cvs&m=142244190907706&w=2 No backport is needed.	2024-03-08 18:23:34 +01:00
Amaury Denoyelle	1ee7bf5bd9	MINOR: quic: always use ncbuf for rx CRYPTO The previous patch fixed the handling of in-order CRYPTO frames, which requires the usage of a new buffer for these data as their handling is delayed to run under TASK_HEAVY. In fact, as now all CRYPTO frames handling must be delayed, their handling can be unified. This is the purpose of this commit, which removes the just introduced new buffer. Now, all CRYPTO frames are buffered inside the ncbuf. Unused elements such as crypto_frms member for encryption level are also removed. This commit is not a bugfix but is a direct follow-up to the last one. As such, it can probably be backported with it to 2.9 to reduce code differences between these versions.	2024-03-08 17:22:48 +01:00
Amaury Denoyelle	81f118cec0	BUG/MEDIUM: quic: fix handshake freeze under high traffic QUIC relies on SSL_do_handshake() to be able to validate handshake. As this function is computation heavy, it is since 2.9 called only under TASK_HEAVY. This has been implemented by the following patch : `94d20be138` MEDIUM: quic: Heavy task mode during handshake Instead of handling CRYPTO frames immediately during reception, this patch delays the process to run under TASK_HEAVY tasklet. A frame copy is stored in qel.rx.crypto_frms list. However, this frame still references the receive buffer. If the receive buffer is cleared before the tasklet is rescheduled, it will point to garbage data, resulting in haproxy decryption error. This happens if a fair amount of data is received constantly to preempt the quic_conn tasklet execution. This bug can be reproduced with a fair amount of clients. It is exhibited by 'show quic full' which can report connections blocked on handshake. Using the following command results in h2load not being able to complete the last connections. $ h2load --alpn-list h3 -t 8 -c 800 -m 10 -w 10 -n 8000 "https://127.0.0.1:20443/?s=10k" Also, haproxy QUIC listener socket mode was active to trigger the issue. This forces several connections to share the same reception buffer, rendering the bug even more plausible to occur. It should be possible to reproduce it with connection socket if increasing the clients amount. To fix this bug, define a new buffer under quic_cstream. It is used exclusively to copy CRYPTO data for in-order frame if ncbuf is empty. This ensures data remains accessible even if receive buffer is cleared. Note that this fix is only a temporary step. Indeed, a ncbuf is also already used for out-of-order data. It should be possible to unify its usage for both in and out-of-order data, rendering this new buffer instance unnecessary. In this case, several unneeded elements will become obsolete such as qel.rx.crypto_frms list. This will be done in a future refactoring patch. This must be backported up to 2.9.	2024-03-08 17:22:48 +01:00
Nenad Merdanovic	08ac282375	MINOR: Add aes_gcm_enc converter The converter can be used to encrypt the raw byte input using the AES-GCM algorithm, using provided nonce and key. Co-authored-by: Dragan Dosen (ddosen@haproxy.com)	2024-03-08 17:20:43 +01:00
Nenad Merdanovic	e225e04ba7	MINOR: vars: export var_set and var_unset functions Co-authored-by: Dragan Dosen <ddosen@haproxy.com>	2024-03-08 17:20:43 +01:00
Aurelien DARRAGON	cf37e4cc1b	BUG/MINOR: cfgparse: report proper location for log-format-sd errors When a parsing error occurs inside a log-format-sd expression, we report the location of the log-format directive (which may not be set) instead of reporting the proper log-format-sd directive location where the parsing error occurred. 1\|listen test 2\| log-format "%B" # no error 3\| log-format-sd "%bad" # error \| [ALERT] (322261) : config : Parsing [empty.conf:2]: failed to parse log-format-sd : no such format variable 'bad'. If you wanted to emit the '%' character verbatim, you need to use '%%'. The fix consists in using the config hints dedicated to log-format-sd directive instead of the log-format one. The bug was introduced in `8a4e4420` ("MEDIUM: log-format: Use standard HAProxy log system to report errors"). This should be backported to all stable versions.	2024-03-07 11:48:17 +01:00
Aurelien DARRAGON	59f08f65fd	CLEANUP: tree-wide: use proper ERR_* return values for PRE_CHECK fcts httpclient_precheck(), ssl_ocsp_update_precheck(), and resolvers_create_default() functions are registered through REGISTER_PRE_CHECK() macro to be called by haproxy during init from the pre_check_list list. When calling functions registered in pre_check_list, haproxy expects ERR_* return values. However those 3 functions currently use raw return values, so we better use explicit ERR_* macros to prevent breakage in the future if ERR_* values mapping were to change.	2024-03-07 11:48:08 +01:00
Aurelien DARRAGON	2df7e077c7	CLEANUP: log: fix obsolete comment for add_sample_to_logformat_list() Since `833cc794` ("MEDIUM: sample: handle comma-delimited converter list") logformat expressions now support having a comma-delimited converter list right after the fetch. Let's remove a leftover comment from the initial implementation that says otherwise.	2024-03-07 11:47:56 +01:00
Amaury Denoyelle	b0dd4810e7	BUG/MINOR: mux-quic: fix crash on aborting uni remote stream A remote unidirectional stream can be aborted prematurely if application layers cannot identify its type. In this case, a STOP_SENDING frame is emitted. Since QUIC MUX refactoring, a crash would occur in this scenario due to 2 specific characteristics of remote uni streams : * qcs.tx.fctl was not initialized completely. This caused a crash due to BUG_ON() statement inside qcs_destroy(). * qcs.stream is never allocated. This caused qcs_prep_bytes() to crash inside qcc_io_send(). This bug is considered minor as it happens only on very specific QUIC clients. It was detected when using s2n-quic over interop. This does not need to be backported.	2024-03-06 10:41:01 +01:00
Amaury Denoyelle	d8f1ff8648	BUG/MEDIUM: quic: fix connection freeze on post handshake After handshake completion, QUIC server is responsible to emit HANDSHAKE_DONE frame. Some clients wait for it to begin STREAM transfers. Previously, there was no explicit tasklet_wakeup() after handshake completion, which is necessary to emit post-handshake frames. In most cases, this was undetected as most client continue emission which will reschedule the tasklet. However, as there is no tasklet_wakeup(), this is not a consistent behavior. If this bug occurs, it causes a connection freeze, preventing the client to emit any request. The connection is finally closed on idle timeout. To fix this, add an explicit tasklet_wakeup() after handshake completion. It sounds simple enough but in fact it's difficult to find the correct location efor tasklet_wakeup() invocation, as post-handshake is directly linked to connection accept, with different orderings. Notably, if 0-RTT is used, connection can be accepted prior handshake completion. Another major point is that along HANDSHAKE_DONE frame, a series of NEW_CONNECTION_ID frames are emitted. However, these new CIDs allocation must occur after connection is migrated to its new thread as these CIDs are tied to it. A BUG_ON() is present to check this in qc_set_tid_affinity(). With all this in mind, 2 locations were selected for the necessary tasklet_wakeup() : * on qc_xprt_start() : this is useful for standard case without 0-RTT. This ensures that this is done only after connection thread migration. * on qc_ssl_provide_all_quic_data() : this is done on handshake completion with 0-RTT used. In this case only, connection is already accepted and migrated, so tasklet_wakeup() is safe. Note that as a side-change, quic_accept_push_qc() API has evolved to better reflect differences between standard and 0-RTT usages. It is now forbidden to call it multiple times on a single quic_conn instance. A BUG_ON() has been added. 
This issue is labelled as medium even though it seems pretty rare. It was only reproducible using QUIC interop runner, with haproxy compiled with LibreSSL with quic-go as client. However, affected code parts are pretty sensible, which justify the chosen severity. This should fix github issue #2418. It should be backported up to 2.6, after a brief period of observation. Note that the extra comment added in qc_set_tid_affinity() can be removed in 2.6 as thread migration is not implemented for this version. Other parts should apply without conflict.	2024-03-06 10:39:57 +01:00
William Lallemand	3a3c2b2695	BUG/MINOR: ssl/cli: typo in new ssl crl-file CLI description The `new ssl crl-file` option description on the CLI lacks the dash. Must be backported as far as 2.6.	2024-03-05 14:49:17 +01:00
Ilya Shipitsin	e3af7f832c	CI: skip scheduled builds on forks Tracking bleeding-edge changes with some rare platforms or modern compilers on a scheduled basis is not what forks usually do. Let's skip these builds by default in forks; if a fork is interested, they can be enabled locally.	2024-03-05 11:50:34 +01:00
Ilya Shipitsin	6142a3fad7	CI: enable monthly build only test on netbsd-9.3 It is interesting to try the https://github.com/vmactions/netbsd-vm action.	2024-03-05 11:50:34 +01:00
Ilya Shipitsin	6b54a7b65e	CI: run more smoke tests on config syntax to check memory related issues The config syntax check seems to add value by testing code paths not covered by VTest; these checks are also very fast.	2024-03-05 11:50:34 +01:00
Ilya Shipitsin	96cd04f8db	CLEANUP: fix typo in naming for variable "unused" In resolvers.c:rslv_promex_next_ts() and in stick-tables.c:stk_promex_next_ts(), an unused argument was mistakenly called "unsued" instead of "unused". Let's fix this in a separate patch so that it can be omitted from backports if this causes build problems.	2024-03-05 11:50:34 +01:00
Ilya Shipitsin	da3b12ade1	CLEANUP: assorted typo fixes in the code and comments This is the 39th iteration of typo fixes. The naming issue on the argument called "unsued" instead of "unused" in two functions from resolvers and stick-tables was put into a second patch so that it can be omitted if it were to cause backport issues.	2024-03-05 11:50:34 +01:00
Willy Tarreau	93a0fb74f4	BUILD: buf: make b_ncat() take a const for the source In 2.7 with commit 35df34223b ("MINOR: buffers: split b_force_xfer() into b_cpy() and b_force_xfer()"), b_ncat() was extracted from b_force_xfer() but kept its source variable instead of constant, making it unusable for calls from a const source. Let's just fix it.	2024-03-05 11:50:34 +01:00
Willy Tarreau	0a0041d195	BUILD: tree-wide: fix a few missing includes in a few files Some include files, mostly types definitions, are missing a few includes to define the types they're using, causing include ordering dependencies between files, which are most often not seen due to the alphabetical order of includes. Let's just fix them. These were spotted by building pre-compiled headers for all these files to .h.gch.	2024-03-05 11:50:34 +01:00
Willy Tarreau	ac692d7ee5	BUILD: thread: move lock label definitions to thread-t.h The 'lock_label' enum is defined in thread.h but it's used in a few type files, so let's move it to thread-t.h to allow explicit includes.	2024-03-05 11:50:34 +01:00
Willy Tarreau	962c129dc1	BUG/MINOR: sink: fix a race condition in the TCP log forwarding code That's exactly the same as commit 53bfab080c ("BUG/MINOR: sink: fix a race condition between the writer and the reader") that went into 2.7 and was backported as far as 2.4, except that since the code was duplicated, the second instance was not noticed, leaving the race present. The race has a limited impact, if a forwarder reaches the end of the logs and a new message arrives before it leaves, the forwarder will only wake up after yet another new message will be sent. In practice it remains unnoticeable because for the race to trigger, one needs to have a steady flow of logs, which means the wakeup will happen anyway. This should be backported, but no need to insist on it if it resists.	2024-03-05 11:48:44 +01:00
Aurelien DARRAGON	75c8a1bc2d	CLEANUP: hlua: txn class functions may LJMP Clarify that some txn related class functions may LJMP by adding the __LJMP tag to their prototype.	2024-03-04 16:48:51 +01:00
Aurelien DARRAGON	f364f4670b	MINOR: hlua: use SEND_ERR to report errors in hlua_event_runner() Instead of reporting lua errors using ha_alert(), let's use SEND_ERR() helper which will also try to generate a log message according to lua log settings.	2024-03-04 16:48:48 +01:00
Aurelien DARRAGON	e1b0031650	BUG/MINOR: hlua: don't call ha_alert() in hlua_event_subscribe() hlua_event_subscribe() is meant to be called from a protected lua env during init and/or runtime. As such, only hlua_event_sub() makes uses of it: when an error happens hlua_event_sub() will already raise a Lua exception. Thus it's not relevant to use ha_alert() there as it could generate log pollution (error is relevant from Lua script point of view, not from haproxy one). This could be backported in 2.8.	2024-03-04 16:48:42 +01:00
Aurelien DARRAGON	8670db7a89	BUG/MAJOR: hlua: improper lock usage with hlua_ctx_resume() hlua_ctx_resume() itself can safely be used as-is in a multithreading context because it takes care of taking the lua lock. However, when hlua_ctx_resume() returns, the lock is released and it is thus the caller's responsibility to ensure it owns the lock prior to performing additional manipulations on the Lua stack. Unfortunately, since early haproxy lua implementation, we used to do it wrong: The most common hlua_ctx_resume() pattern we can find in the code (because it was duplicated over and over over time) is the following: \|ret = hlua_ctx_resume() \|switch (ret) { \| case HLUA_E_OK: \| break; \| case HLUA_E_ERRMSG: \| break; \| [...] \|} Problem is: for some of the switch cases, we still perform lua stack manipulations. This is the case for the HLUA_E_ERRMSG for instance where we often use lua_tostring() to retrieve last lua error message on the top of the stack, or sometimes for the HLUA_E_OK case, when we need to perform some lua cleanup logic once the resume ended. But all of this is done WITHOUT the lua lock, so this means that the main lua stack could be accessed simultaneously by concurrent threads when a script was loaded using 'lua-load'. While it is not critical for switch-cases dedicated to error handling, (those are not supposed to happen very often), it can be very problematic for stack manipulations occuring in the HLUA_E_OK case under heavy load for instance. In this case, main lua stack corruptions will eventually happen. This is especially true inside hlua_filter_new(), where this bug was known to cause lua stack corruptions under load, leading to lua errors and even crashing the process as reported by @bgrooot in GH #2467. The fix is relatively simple, once hlua_ctx_resume() returns: we should consider that ANY lua stack access should be lua-lock protected. 
If the related lua calls may raise lua errors, then (RE)SET_SAFE_LJMP combination should be used as usual (it allows to lock the lua stack and catch lua exceptions at the same time), else hlua_{lock,unlock} may be used if no exceptions are expected. This patch should fix GH #2467. It should be backported to all stable versions. [ada: some ctx adj will be required for older versions as event_hdl doesn't exist prior to 2.8 and filters were implemented in 2.5, thus some chunks won't apply]	2024-03-04 16:48:31 +01:00
Aurelien DARRAGON	19b016f9f8	BUG/MEDIUM: hlua: improper lock usage with SET_SAFE_LJMP() When we want to perform some unsafe lua stack manipulations from an unprotected lua environment, we use SET_SAFE_LJMP() RESET_SAFE_LJMP() combination to lock lua stack and catch potential lua exceptions that may occur between the two. Hence, we regularly find this pattern (duplicated over and over): \|if (!SET_SAFE_LJMP(hlua)) { \| const char *error; \| \| if (lua_type(hlua->T, -1) == LUA_TSTRING) \| error = hlua_tostring_safe(hlua->T, -1); \| else \| error = "critical error"; \| SEND_ERR(NULL, ": %s.\n", error); \|} This is wrong because when SET_SAFE_LJMP() returns false (meaning that an exception was caught), then the lua lock was released already, thus the caller is not expected to perform lua stack manipulations (because the main lua stack may be shared between multiple threads). In the pattern above we only want to retrieve the lua exception message which may be found at the top of the stack, to do so we now explicitly take the lua lock before accessing the lua stack. Note that hlua_lock() doesn't catch lua exceptions so only safe lua functions are expected to be used there (lua functions that may NOT raise exceptions). It should be backported to all stable versions. [ada: some ctx adj will be required for older versions as event_hdl doesn't exist prior to 2.8 and filters were implemented in 2.5, thus some chunks won't apply, but other fixes should stay relevant]	2024-03-04 16:47:20 +01:00
Aurelien DARRAGON	d81c2205a3	BUG/MINOR: hlua: improper lock usage in hlua_filter_new() In hlua_filter_new(), after each hlua resume, we systematically try to empty the stack by calling lua_settop(). However we're doing this without locking the lua context, so it is unsafe in multithreading context if the script is loaded using 'lua-load'. To fix the issue, we protect the call with hlua_{lock,unlock}() helpers. This should be backported up to 2.6.	2024-03-04 16:47:18 +01:00
Aurelien DARRAGON	51f291c795	BUG/MINOR: hlua: improper lock usage in hlua_filter_callback() In hlua_filter_callback(), some lua stack work is performed under SET_SAFE_LJMP() guard which also takes care of locking the hlua context when needed. However, a lua_gettop() call is performed out of the guard, thus it is unsafe in multithreading context if the script is loaded using 'lua-load' because in this case the main lua stack is shared between threads and each access to a lua stack must be performed under the lock, thus we move lua_gettop() call under the lock. It should be backported up to 2.6.	2024-03-04 16:47:17 +01:00

22063 Commits