haproxy

mirror of http://git.haproxy.org/git/haproxy.git/ synced 2025-01-29 09:32:49 +00:00

Author	SHA1	Message	Date
William Lallemand	46bea1c616	BUILD: peers: peers-t.h depends on stick-table-t.h peers-t.h uses "struct stktable" as well as STKTABLE_DATA_TYPES which are defined in stick-table-t.h. It works by accident because stick-table-t.h was always included before. But could provoke build issue with EXTRA code. To be backported as far as 2.2.	2022-12-16 15:51:44 +01:00
Aurelien DARRAGON	5594184190	MINOR: stats: introduce stats field ctx Add a new value in stats ctx: field. Implement field support in line dumping parent functions stats_print_proxy_field_json() and stats_dump_proxy_to_buffer(). This will allow child dumping functions to support partial line dumping when needed. ie: when dumping buffer is exhausted: do a partial send and wait for a new buffer to finish the dump. Thanks to field ctx, the function can start dumping where it left off on previous (unterminated) invokation.	2022-12-15 16:53:49 +01:00
Amaury Denoyelle	15f3cc4b38	MINOR: http: extract content-length parsing from H2 Extract function h2_parse_cont_len_header() in the generic HTTP module. This allows to reuse it for all HTTP/x parsers. The function is now available as http_parse_cont_len_header(). Most notably, this will be reused in the next bugfix for the H3 parser. This is necessary to check that content-length header match the length of DATA frames. Thus, it must be backported to 2.6.	2022-12-14 11:34:18 +01:00
Amaury Denoyelle	dbf6ad470b	BUG/MINOR: quic: properly handle alloc failure in qc_new_conn() qc_new_conn() is used to allocate a quic_conn instance and its various internal members. If one allocation fails, quic_conn_release() is used to cleanup things. For the moment, pool_zalloc() is used which ensures that all content is null. However, some members must be initialized to a special values to be able to use quic_conn_release() safely. This is the case for quic_conn lists and its tasklet. Also, some quic_conn internal allocation functions were doing their own cleanup on failure without reset to NULL. This caused an issue with quic_conn_release() which also frees this members. To fix this, these functions now only return an error without cleanup. It is the caller responsibility to free the allocated content, which is done via quic_conn_release(). Without this patch, allocation failure in qc_new_conn() would often result in segfault. This was reproduced easily using fail-alloc at 10%. This should be backported up to 2.6.	2022-12-12 11:44:34 +01:00
Cedric Paillet	e06e31ea3b	MINOR: promex: introduce haproxy_backend_agg_check_status This patch introduces haproxy_backend_agg_check_status metric as we wanted in `42d7c402d` but with the right data source. This patch could be backported as far as 2.4.	2022-12-09 10:54:48 +01:00
Cedric Paillet	7d6644e689	BUG/MINOR: promex: create haproxy_backend_agg_server_status haproxy_backend_agg_server_check_status currently aggregates haproxy_server_status instead of haproxy_server_check_status. We deprecate this and create a new one, haproxy_backend_agg_server_status to clarify what it really does. This patch could be backported as far as 2.4.	2022-12-09 10:54:27 +01:00
Willy Tarreau	9192d20f02	MINOR: pools: make DEBUG_UAF a runtime setting Since the massive pools cleanup that happened in 2.6, the pools architecture was made quite more hierarchical and many alternate code blocks could be moved to runtime flags set by -dM. One of them had not been converted by then, DEBUG_UAF. It's not much more difficult actually, since it only acts on a pair of functions indirection on the slow path (OS-level allocator) and a default setting for the cache activation. This patch adds the "uaf" setting to the options permitted in -dM so that it now becomes possible to set or unset UAF at boot time without recompiling. This is particularly convenient, because every 3 months on average, developers ask a user to recompile haproxy with DEBUG_UAF to understand a bug. Now it will not be needed anymore, instead the user will only have to disable pools and enable uaf using -dMuaf. Note that -dMuaf only disables previously enabled pools, but it remains possible to re-enable caching by specifying the cache after, like -dMuaf,cache. A few tests with this mode show that it can be an interesting combination which catches significantly less UAF but will do so with much less overhead, so it might be compatible with some high-traffic deployments. The change is very small and isolated. It could be helpful to backport this at least to 2.7 once confirmed not to cause build issues on exotic systems, and even to 2.6 a bit later as this has proven to be useful over time, and could be even more if it did not require a rebuild. If a backport is desired, the following patches are needed as well: CLEANUP: pools: move the write before free to the uaf-only function CLEANUP: pool: only include pool-os from pool.c not pool.h REORG: pool: move all the OS specific code to pool-os.h CLEANUP: pools: get rid of CONFIG_HAP_POOLS DEBUG: pool: show a few examples in -dMhelp	2022-12-08 18:54:59 +01:00
Willy Tarreau	4da51bd190	CLEANUP: pools: get rid of CONFIG_HAP_POOLS This one was set in defaults.h only when neither DEBUG_NO_POOLS nor DEBUG_UAF were set. This was not the most convenient location to look for it, and it was only used in pool.c to decide on the initial value of POOL_DBG_NO_CACHE. Let's just use DEBUG_NO_POOLS \|\| DEBUG_UAF directly on this flag and get rid of the intermediary condition. This also has the benefit of removing a double inversion, which is always nice for understanding.	2022-12-08 17:45:08 +01:00
Willy Tarreau	a95636682d	REORG: pool: move all the OS specific code to pool-os.h Till now pool-os used to contain a mapping from pool_{alloc,free}_area() to pool_{alloc,free}_area_uaf() in case of DEBUG_UAF, or the regular malloc-based function. And the _uaf() functions were in pool.c. But since 2.4 with the first cleanup of the pools, there has been no more calls to pool_{alloc,free}_area() from anywhere but pool.c, from exactly one place each. As such, there's no more need to keep _uaf() apart in pool.c, we can inline it into pool-os.h and leave all the OS stuff there, with pool.c calling either based on DEBUG_UAF. This is cleaner with less round trips between both files and easier to find.	2022-12-08 17:32:57 +01:00
Willy Tarreau	76a97a98ca	CLEANUP: pool: only include pool-os from pool.c not pool.h There's no need for the low-level pool functions to be known from all callers anymore, they're only used by pool.c. Let's reduce the amount of header files processed.	2022-12-08 17:32:40 +01:00
Willy Tarreau	5ab3c61932	BUILD: atomic: atomic.h may need compiler.h on ARMv8.2-a We get a build error in ncbuf.c when building for ARMv8.2-a because ncbuf has minimal includes and among them bug.h which includes atomic.h. Atomic.h may use "forceinline" without including compiler.h, hence the build error. It was verified that adding it doesn't inflate the total headers. Since all other C files include api.h which already covers this, there's no real need to bapkport this. The issue was already there in 2.3 though.	2022-12-08 08:36:24 +01:00
Aurelien DARRAGON	7d541a91ec	BUG/MINOR: checks: restore legacy on-error fastinter behavior With previous commit, `9e080bf` ("BUG/MINOR: checks: make sure fastinter is used even on forced transitions"), on-error mark-down\|sudden-death\|fail-check are now working as expected. However, on-error fastinter remains broken because srv_getinter(), used in the above commit to check the expiration date, won't return fastinter interval if server health is maxed out (which is the case with on-error fastinter mode). To fix this, we introduce a check flag named CHK_ST_FASTINTER. This flag is set when on-error is triggered. This way we can force srv_getinter() to return fastinter interval whenever the flag is set. The flag is automatically cleared as soon as the new check task expiry is recalculated in process_chk_conn(). This restores original behavior prior to `d114f4a` ("MEDIUM: checks: spread the checks load over random threads"). It must be backported to 2.7 along with the aforementioned commits.	2022-12-07 17:03:55 +01:00
Ilya Shipitsin	5fa29b8a74	CLEANUP: assorted typo fixes in the code and comments This is 34th iteration of typo fixes	2022-12-07 09:08:18 +01:00
Aurelien DARRAGON	22f82f81e5	MINOR: server/event_hdl: add support for SERVER_UP and SERVER_DOWN events We're using srv_update_status() as the only event source or UP/DOWN server events in an attempt to simplify the support for these 2 events. It seems srv_update_status() is the common path for server state changes anyway Tested with server state updated from various sources: - the cli - server-state file (maybe we could disable this or at least don't publish in global event queue in the future if it ends in slower startup for setups relying on huge server state files) - dns records (ie: srv template) (again, could be fined tuned to only publish in server specific subscriber list and no longer in global subscription list if mass dns update tend to slow down srv_update_status()) - normal checks and observe checks (HCHK_STATUS_HANA) (same as above, if checks related state update storms are expected) - lua scripts - html stats page (admin mode)	2022-12-06 10:22:07 +01:00
Aurelien DARRAGON	129ecf441f	MINOR: server/event_hdl: add support for SERVER_ADD and SERVER_DEL events Basic support for ADD and DEL server events are added through this commit: SERVER_ADD is published on dynamic server addition through cli. SERVER_DEL is published on dynamic server deletion through cli. This work depends on: "MINOR: event_hdl: add event handler base api" "MINOR: server: add srv->rid (revision id) value"	2022-12-06 10:22:07 +01:00
Aurelien DARRAGON	745ce8e8ad	MINOR: stats: add server revision id support Make use of the new srv->rid value in stats. Stat is referred as ST_F_SRID, it is now used in stats_fill_sv_stats function in order to be included in csv and json stats dumps. Moreover, "rid: $value" will be displayed next to server puid in html stats page if "stats show-legend" is specified in the stats frontend. (mouse hovering tooltip) Depends on the following commit: "MINOR: server: add srv->rid (revision id) value"	2022-12-06 10:22:06 +01:00
Aurelien DARRAGON	61e3894dfe	MINOR: server: add srv->rid (revision id) value With current design, we could not distinguish between previously existing deleted server and a new server reusing the deleted server name/id. This can cause some confusion when auditing stats/events/logs, because the new server will look similar to the old one. To address this, we're adding a new value in server structure: rid rid (revision id) value is an unsigned 32bits value that is set upon server creation. Value is derived from a global counter that starts at 0 and is incremented each time one or multiple server deletions are followed by a server addition (meaning that old name/id reuse could occur). Thanks to this revision id, it is now easy to tell whether the server we're looking at is the same as before or if it has been deleted and re-added in the meantime. (combining server name/id + server revision id yields a process-wide unique identifier)	2022-12-06 10:22:06 +01:00
Amaury Denoyelle	d3083c9df9	MINOR: quic: reconnect quic-conn socket on address migration UDP addresses may change over time for a QUIC connection. When using quic-conn owned socket, we have to detect address change to break the bind/connect association on the socket. For the moment, on change detected, QUIC connection socket is closed and a new one is opened. In the future, we may improve this by trying to keep the original socket and reexecute only bind/connect syscalls. This change is part of quic-conn owned socket implementation. It may be backported to 2.7 after a period of observation.	2022-12-02 14:45:43 +01:00
Amaury Denoyelle	7c9fdd9c3a	MEDIUM: quic: move receive out of FD handler to quic-conn io-cb This change is the second part for reception on QUIC connection socket. All operations inside the FD handler has been delayed to quic-conn tasklet via the new function qc_rcv_buf(). With this change, buffer management on reception has been simplified. It is now possible to use a local buffer inside qc_rcv_buf() instead of quic_receiver_buf(). This change is part of quic-conn owned socket implementation. It may be backported to 2.7 after a period of observation.	2022-12-02 14:45:43 +01:00
Amaury Denoyelle	5b41486b7f	MEDIUM: quic: use quic-conn socket for reception Try to use the quic-conn socket for reception if it is allocated. For this, the socket is inserted in the fdtab. This will call the new handler quic_conn_io_cb() which is responsible to process the recv() system call. It will reuse datagram dispatch for simplicity. However, this is guaranteed to be called on the quic-conn thread, so it will be more efficient to use a dedicated buffer. This will be implemented in another commit. This patch should improve performance by reducing contention on the receiver socket. However, more gain can be obtained when the datagram dispatch operation will be skipped. Older quic_sock_fd_iocb() is renamed to quic_lstnr_sock_fd_iocb() to emphasize its usage for the receiver socket. This change is part of quic-conn owned socket implementation. It may be backported to 2.7 after a period of observation.	2022-12-02 14:45:43 +01:00
Amaury Denoyelle	40909dfec5	MINOR: quic: allocate a socket per quic-conn Allocate quic-conn owned socket if possible. This requires that this is activated in haproxy configuration. Also, this is done only if local address is known so it depends on the support of IP_PKTINFO. For the moment this socket is not used. This causes QUIC support to be broken as received datagram are not read. This commit will be completed by a following patch to support recv operation on the newly allocated socket. This change is part of quic-conn owned socket implementation. It may be backported to 2.7 after a period of observation.	2022-12-02 14:45:43 +01:00
Amaury Denoyelle	75839a44e7	MINOR: quic: startup detect for quic-conn owned socket support To be able to use individual sockets for QUIC connections, we rely on the OS network stack which must support UDP sockets binding on the same local address. Add a detection code for this feature executed on startup. When the first QUIC listener socket is binded, a test socket is created and binded on the same address. If the bind call fails, we consider that it's impossible to use individual socket for QUIC connections. A new global option GTUNE_QUIC_SOCK_PER_CONN is defined. If startup detect fails, this value is resetted from global options. For the moment, there is no code to activate the option : this will be in a follow-up patch with the introduction of a new configuration option. This change is part of quic-conn owned socket implementation. It may be backported to 2.7 after a period of observation.	2022-12-02 14:45:43 +01:00
Amaury Denoyelle	eec0b3c1bd	MINOR: quic: detect connection migration Detect connection migration attempted by the client. This is done by comparing addresses stored in quic-conn with src/dest addresses of the UDP datagram. A new function qc_handle_conn_migration() has been added. For the moment, no operation is conducted and the function will be completed during connection migration implementation. The only notable things is the increment of a new counter "quic_conn_migration_done". This should be backported up to 2.7.	2022-12-02 14:45:43 +01:00
Amaury Denoyelle	21e611dc89	MINOR: tools: add port for ipcmp as optional criteria Complete ipcmp() function with a new argument <check_port>. If this argument is true, the function will compare port values besides IP addresses and return true only if both are identical. This commit will simplify QUIC connection migration detection. As such, it should be backported to 2.7.	2022-12-02 14:45:43 +01:00
Amaury Denoyelle	8687b63c69	MINOR: quic: extract datagram parsing code Extract individual datagram parsing code outside of datagrams list loop in quic_lstnr_dghdlr(). This is moved in a new function named quic_dgram_parse(). To complete this change, quic_lstnr_dghdlr() has been moved into quic_sock source file : it belongs to QUIC socket lower layer and is directly called by quic_sock_fd_iocb(). This commit will ease implementation of quic-conn owned socket. New function quic_dgram_parse() will be easily usable after a receive operation done on quic-conn IO-cb. This should be backported up to 2.7.	2022-12-02 14:45:43 +01:00
Amaury Denoyelle	518c98f150	MINOR: quic: remove qc from quic_rx_packet quic_rx_packet struct had a reference to the quic_conn instance. This is useless as qc instance is always passed through function argument. In fact, pkt.qc is used only in qc_pkt_decrypt() on key update, even though qc is also passed as argument. Simplify this by removing qc field from quic_rx_packet structure definition. Also clean up qc_pkt_decrypt() documentation and interface to align it with other quic-conn related functions. This should be backported up to 2.7.	2022-12-02 14:45:43 +01:00
William Lallemand	52ddd99940	MEDIUM: ssl: rename the struct "cert_key_and_chain" to "ckch_data" Rename the structure "cert_key_and_chain" to "ckch_data" in order to avoid confusion with the store whcih often called "ckchs". The "cert_key_and_chain ckch" were renamed "ckch_data data", so we now have store->data instead of ckchs->ckch. Marked medium because it changes the API.	2022-12-02 11:48:30 +01:00
Aurelien DARRAGON	68e692da02	MINOR: event_hdl: add event handler base api Adding base code to provide subscribe/publish API for internal events processing. event_hdl provides two complementary APIs, both are implemented in src/event_hdl.c and include/haproxy/event_hdl{-t.h,.h}: One API targeting developers that want to register event handlers that will be notified on specific events. (SUBSCRIBE) One API targeting developers that want to notify registered handlers about an event. (PUBLISH) This feature is being considered to address the following scenarios: - mailers code refactoring (getting rid of deprecated tcp-check ruleset implementation) - server events from lua code (registering user defined lua function that is executed with relevant data when a server is dynamically added/removed or on server state change) - providing a stable and easy to use API for upcoming developments that rely on specific events to perform actions. (e.g: ressource cleanup when a server is deleted from haproxy) At this time though, we don't have much use cases in mind in addition to server events handling, but the API is aimed at being multipurpose so that new event families, with their own particularities, can be easily implemented afterwards (and hopefully) without requiring breaking changes to the API. Moreover, you should know that the API was not designed to cope well with high rate event publishing. Mostly because publishing means iterating over unsorted subscriber list. So it won't scale well as subscriber list increases, but it is intended in order to keep the code simple and versatile. Instead, it is assumed that events implemented using this API should be periodic events, and that events related to critical io/networking processing should be handled using dedicated facilities anyway. (After all, this is meant to be a general purpose event API) Apart from being easily extensible, one of the main goals of this API is to make subscriber code as simple and safe as possible. This is done by offering multiple event handling modes: - SYNC mode: publishing code directly leverages handler code (callback function) and handler code has a direct access to "live" event data (pointers mostly, alongside with lock hints/context so that accessing data pointers can be done properly) - normal ASYNC mode: handler is executed in a backward compatible way with sync mode, so that it is easy to switch from and to SYNC/ASYNC mode. Only here the handler has access to "offline" event data, and not "live" data (ptrs) so that data consistency is guaranteed. By offline, you should understand "snapshot" of relevant data at the time of the event, so that the handler can consume it later (even if associated ressource is not valid anymore) - advanced ASYNC mode same as normal ASYNC mode, but here handler is not a function that is executed with event data passed as argument: handler is a user defined tasklet that is notified when event occurs. The tasklet may consume pending events and associated data through its own message queue. ASYNC mode should be considered first if you don't rely on live event data and you wan't to make sure that your code has the lowest impact possible on publisher code. (ie: you don't want to break stuff) Internal API documentation will follow: You will find more details about the notions we roughly approached here.	2022-12-02 09:40:52 +01:00
Willy Tarreau	eaded987ee	[RELEASE] Released version 2.8-dev0 Released version 2.8-dev0 with the following main changes : - MINOR: version: mention that it's development again	2022-12-01 15:25:34 +01:00
Willy Tarreau	989c55dc2f	MINOR: version: mention that it's development again This essentially reverts `d705b85a4a`.	2022-12-01 15:24:10 +01:00
Willy Tarreau	d705b85a4a	MINOR: version: mention that it's stable now This version will be maintained up to around Q1 2024. The INSTALL file also mentions it.	2022-12-01 15:15:24 +01:00
Stefan Eissing	b82296c10e	BUILD: quic: allow build with USE_QUIC and USE_OPENSSL_WOLFSSL WolfSSL does not implement the TLS1_3_CK_AES_128_CCM_SHA256 cipher as well as the SSL_ERROR_WANT_ASYNC, SSL_ERROR_WANT_ASYNC_JOB and SSL_ERROR_WANT_CLIENT_HELLO_CB error codes. This patch disables them for WolfSSL. Signed-off-by: William Lallemand <wlallemand@haproxy.org>	2022-11-30 17:38:27 +01:00
Ilya Shipitsin	6f86eaae4f	CLEANUP: assorted typo fixes in the code and comments This is 33rd iteration of typo fixes	2022-11-30 14:02:36 +01:00
Willy Tarreau	d5cae6a0c7	MINOR: stick-table: change the API of the function used to calculate the shard The function used to calculate the shard number currently requires a stktable_key on input for this. Unfortunately, it happens that peers currently miss this calculation and they do not provide stktable_key at all, instead they're open-coding all the low-level stick-table work (hence why it's missing). Thus we'll need to be able to calculate the shard number in keys coming from peers as well but the current API does not make it possible. This commit addresses this by inverting the order where the length and the shard number are used. Now the low-level function is independent on stksess and stktable_key, it takes a table, pointer and length and does all the job. The upper function takes care of the type and key to get the its length, and is for use only from stick-table code. This doesn't change anything except that the low-level one will be usable from outside (hence why it's exported now).	2022-11-29 18:06:42 +01:00
Amaury Denoyelle	d64a26f023	CLEANUP: ncbuf: inline small functions ncbuf API relies on lot of small functions. Mark these functions as inline to reduce call invocations and facilitate compiler optimizations to reduce code size. This should be backported up to 2.6.	2022-11-29 15:14:39 +01:00
Willy Tarreau	56460ee52a	MINOR: stick-table: store a per-table hash seed and use it Instead of using memcpy() to concatenate the table's name to the key when allocating an stksess, let's compute once for all a per-table seed at boot time and use it to calculate the key's hash. This saves two memcpy() and the usage of a chunk, it's always nice in a fast path. When tested under extreme conditions with a 80-byte long table name, it showed a 1% performance increase.	2022-11-28 18:58:06 +01:00
Willy Tarreau	63b5b33ba8	CLEANUP: stick-table: fill alignment holes in the stktable struct There were two 32-bit holes in the stktable struct surrounding 32-bit words, so let's just reorder them a little bit to address the issue.	2022-11-28 18:49:55 +01:00
William Lallemand	0a2d63236c	BUG/MINOR: ssl: shut the ca-file errors emitted during httpclient init With an OpenSSL library which use the wrong OPENSSLDIR, HAProxy tries to load the OPENSSLDIR/certs/ into @system-ca, but emits a warning when it can't. This patch fixes the issue by allowing to shut the error when the SSL configuration for the httpclient is not explicit. Must be backported in 2.6.	2022-11-24 19:14:19 +01:00
Uriah Pollock	3cbf09ed64	MEDIUM: ssl: add minimal WolfSSL support with OpenSSL compatibility mode This adds a USE_OPENSSL_WOLFSSL option, wolfSSL must be used with the OpenSSL compatibility layer. This must be used with USE_OPENSSL=1. WolfSSL build options: ./configure --prefix=/opt/wolfssl --enable-haproxy HAProxy build options: USE_OPENSSL=1 USE_OPENSSL_WOLFSSL=1 WOLFSSL_INC=/opt/wolfssl/include/ WOLFSSL_LIB=/opt/wolfssl/lib/ ADDLIB='-Wl,-rpath=/opt/wolfssl/lib' Using at least the commit 54466b6 ("Merge pull request #5810 from Uriah-wolfSSL/haproxy-integration") from WolfSSL. (2022-11-23). This is still to be improved, reg-tests are not supported yet, and more tests are to be done. Signed-off-by: William Lallemand <wlallemand@haproxy.org>	2022-11-24 11:29:03 +01:00
Uriah Pollock	79320cb074	BUILD: quic: use openssl-compat.h instead of openssl/ssl.h Replace the include of openssl/ssl.h by openssl-compat.h. Signed-off-by: William Lallemand <wlallemand@haproxy.org>	2022-11-24 11:29:03 +01:00
Willy Tarreau	946d370d22	BUILD: flags: really restrict the cases where flags are exposed A number of internal flags started to be exposed to external programs at the location of their definition since commit `77acaf5af` ("MINOR: flags: add a new file to host flag dumping macros"). This allowed the "flags" utility to decode many more of them and always correctly. The condition to expose them was to rely on the preliminary definition of EOF that indicates that stdio is already included. But this was a wrong approach. It only guarantees that snprintf() can safely be used but still causes large functions to be built. But stdio is often included before some of these includes, so these heavy inline functions actually have to be compiled in many cases. The result is that the build time significantly increased, especially with fast compilers like gcc -O0 which took +50% or TCC which took +100%! This patch addresses the problem by instead relying on an explicit macro HA_EXPOSE_FLAGS that the calling program must explicitly define before including these files. flags.c does this and that's all. The previous build time is now restored with a speed up of 20 to 50% depending on the build options.	2022-11-24 08:32:27 +01:00
Willy Tarreau	08093cc0fa	CLEANUP: tools: do not needlessly include xxhash nor cli from tools.h These includes brought by commit `9c76637ff` ("MINOR: anon: add new macros and functions to anonymize contents") resulted in an increase of exactly 20% of the number of lines to build. These include are not needed there, only tools.c needs xxhash.h.	2022-11-24 08:30:48 +01:00
Willy Tarreau	4d46638540	BUILD: compiler: include compiler's definitions before ours Building with TCC caused a warning on __attribute__() being redefined, because we do define it on compilers that don't have it, but we didn't include the compiler's definitions first to leave it a chance to expose its definitions. The correct way to do this would be to include sys/cdefs.h but we currently don't include it explicitly and a few reports on the net mention some platforms where it could be missing by default. Let's use inttypes.h instead, it always causes it (or its equivalent) to be included and we know it's present on supported platforms since we already depend on it. No backport is needed.	2022-11-24 08:30:48 +01:00
Willy Tarreau	fc50b9dd14	BUG/MAJOR: sched: protect task during removal from wait queue The issue addressed by commit `fbb934da9` ("BUG/MEDIUM: stick-table: fix a race condition when updating the expiration task") is still present when thread groups are enabled, but this time it lies in the scheduler. What happens is that a task configured to run anywhere might already have been queued into one group's wait queue. When updating a stick table entry, sometimes the task will have to be dequeued and requeued. For this a lock is taken on the current thread group's wait queue lock, but while this is necessary for the queuing, it's not sufficient for dequeuing since another thread might be in the process of expiring this task under its own group's lock which is different. This is easy to test using 3 stick tables with 1ms expiration, 3 track-sc rules and 4 thread groups. The process crashes almost instantly under heavy traffic. One approach could consist in storing the group number the task was queued under in its descriptor (we don't need 32 bits to store the thread id, it's possible to use one short for the tid and another one for the tgrp). Sadly, no safe way to do this was figured, because the race remains at the moment the thread group number is checked, as it might be in the process of being changed by another thread. It seems that a working approach could consist in always having it associated with one group, and only allowing to change it under this group's lock, so that any code trying to change it would have to iterately read it and lock its group until the value matches, confirming it really holds the correct lock. But this seems a bit complicated, particularly with wait_expired_tasks() which already uses upgradable locks to switch from read state to a write state. Given that the shared tasks are not that common (stick-table expirations, rate-limited listeners, maybe resolvers), it doesn't seem worth the extra complexity for now. This patch takes a simpler and safer approach consisting in switching back to a single wq_lock, but still keeping separate wait queues. Given that shared wait queues are almost always empty and that otherwise they're scanned under a read lock, the contention remains manageable and most of the time the lock doesn't even need to be taken since such tasks are not present in a group's queue. In essence, this patch reverts half of the aforementionned patch. This was tested and confirmed to work fine, without observing any performance degradation under any workload. The performance with 8 groups on an EPYC 74F3 and 3 tables remains twice the one of a single group, with the contention remaining on the table's lock first. No backport is needed.	2022-11-22 09:10:08 +01:00
Willy Tarreau	c21a187ec0	MINOR: server/idle: make the next_takeover index per-tgroup In order to evenly pick idle connections from other threads, there is a "next_takeover" index in the server, that is incremented each time a connection is picked from another thread, and indicates which one to start from next time. With thread groups this doesn't work well because the index is the same regardless of the group, and if a group has more threads than another, there's even a risk to reintroduce an imbalance. This patch introduces a new per-tgroup storage in servers which, for now, only contains an instance of this next_takeover index. This way each thread will now only manipulate the index specific to its own group, and the takeover will become fair again. More entries may come soon.	2022-11-21 19:21:07 +01:00
Willy Tarreau	9dc231a6b2	BUG/MINOR: server/idle: at least use atomic stores when updating max_used_conns In 2.2, some idle conns usage metrics were added by commit `cf612a045` ("MINOR: servers: Add a counter for the number of currently used connections."), which mentioned that the operation doesn't need to be atomic since we're not seeking exact values. This is true but at least we should use atomic stores to make sure not to cause invalid values to appear on archs that wouldn't guarantee atomicity when writing an int, such as writing two 16-bit words. This is pretty unlikely on our targets but better keep the code safe against this. This may be backported as far as 2.2.	2022-11-21 19:21:07 +01:00
Willy Tarreau	2fba08faec	MINOR: cli/pools: add sorting capabilities to "show pools" The "show pools" command is used a lot for debugging but didn't get much love over the years. This patch brings new capabilities: - sorting the output by pool names to ese their finding ("byname"). - sorting the output by reverse item size to spot the biggest ones("bysize") - sorting the output by reverse number of allocated bytes ("byusage") The last one (byusage) also omits displaying the ones with zero allocation. In addition, an optional max number of output entries may be passed so as to dump only the N most relevant ones.	2022-11-21 10:14:52 +01:00
Ilya Shipitsin	ace3da8dd4	CLEANUP: quic: replace "choosen" with "chosen" all over the code Some variables were set as "choosen" instead of "chosen", this is dedicated spelling fix	2022-11-21 09:22:28 +01:00
Frédéric Lécaille	74b5f7b31b	BUG/MAJOR: quic: Crash after discarding packet number spaces This previous patch was not sufficient to prevent haproxy from crashing when some Handshake packets had to be inspected before being possibly retransmitted: "BUG/MAJOR: quic: Crash upon retransmission of dgrams with several packets" This patch introduced another issue: access to packets which have been released because still attached to others (in the same datagram). This was the case for instance when discarding the Initial packet number space before inspecting an Handshake packet in the same datagram through its ->prev or member in our case. This patch implements quic_tx_packet_dgram_detach() which detaches a packet from the adjacent ones in the same datagram to be called when ackwowledging a packet (as done in the previous commit) and when releasing its memory. This was, we are sure the released packets will not be accessed during retransmissions. Thank you to @gabrieltz for having reported this issue in GH #1903. Must be backported to 2.6.	2022-11-20 18:35:46 +01:00
Fr�d�ric L�caille	814645f42f	BUG/MAJOR: quic: Crash upon retransmission of dgrams with several packets As revealed by some traces provided by @gabrieltz in GH #1903 issue, there are clients (chrome I guess) which acknowledge only one packet among others in the same datagram. This is the case for the first datagram sent by a QUIC haproxy listener made an Initial packet followed by an Handshake one. In this identified case, this is the Handshake packet only which is acknowledged. But if the client is able to respond with an Handshake packet (ACK frame) this is because it has successfully parsed the Initial packet. So, why not also acknowledging it? AFAIK, this is mandatory. On our side, when restransmitting this datagram, the Handshake packet was accessed from the Initial packet after having being released. Anyway. There is an issue on our side. Obviously, we must not expect an implementation to respect the RFC especially when it want to build an attack ;) With this simple patch for each TX packet we send, we also set the previous one in addition to the next one. When a packet is acknowledged, we detach the next one and the next one in the same datagram from this packet, so that it cannot be resent when resending these packets (the previous one, in our case). Thank you to @gabrieltz for having reported this issue. Must be backported to 2.6.	2022-11-19 04:56:55 +01:00

1 2 3 4 5 ...

6593 Commits