Commit Graph

7454 Commits

Author SHA1 Message Date
Frédéric Lécaille
0b872e24cd REORG: quic: Move qc_may_probe_ipktns() to quic_tls.h
This function relates to the Initial packet number space, which is more
closely tied to the QUIC TLS specifications. Let's move it to quic_tls.h
so it can be inlined.
2023-11-28 15:37:50 +01:00
Frédéric Lécaille
c93ebcc59b REORG: quic: Move quic_build_post_handshake_frames() to quic_conn module
Move quic_build_post_handshake_frames() from quic_rx.c to quic_conn.c. This
is a function which is also called from the TX part (quic_tx.c).
2023-11-28 15:37:50 +01:00
Frédéric Lécaille
3482455ddd REORG: quic: Move qc_handle_conn_migration() to quic_conn.c
This function manipulates only quic_conn objects. It definitely belongs
in quic_conn.c.
2023-11-28 15:37:50 +01:00
Frédéric Lécaille
581549851c REORG: quic: Move QUIC path definitions/declarations to quic_cc module
Move quic_path struct from quic_conn-t.h to quic_cc-t.h and rename it to quic_cc_path.
Update the code accordingly.
Also move some inlined functions related to the QUIC path to quic_cc.h.
2023-11-28 15:37:50 +01:00
Frédéric Lécaille
f32fc26b62 REORG: quic: Rename some functions used upon ACK receipt
Rename some functions to better reflect their jobs.
Move qc_release_lost_pkts() to quic_loss.c
2023-11-28 15:37:50 +01:00
Frédéric Lécaille
f74d882ef0 REORG: quic: Move the QUIC DCID parser to quic_sock.c
Move quic_get_dgram_dcid() from quic_conn.c to quic_sock.c because it is
only used in this file, and define it as static.
2023-11-28 15:37:50 +01:00
Frédéric Lécaille
09ab48472c REORG: quic: Move several inlined functions from quic_conn.h
Move quic_pkt_type(), quic_saddr_cpy(), quic_write_uint32(), max_available_room(),
max_stream_data_size(), quic_packet_number_length(), quic_packet_number_encode()
and quic_compute_ack_delay_us() to quic_tx.c because they are only used in this
file. Also move quic_ack_delay_ms() and quic_read_uint32() to quic_tx.c because
they are only used in this file.

Move quic_rx_packet_refinc() and quic_rx_packet_refdec() to quic_rx.h header.
Move qc_el_rx_pkts(), qc_el_rx_pkts_del() and qc_list_qel_rx_pkts() to quic_tls.h
header.
2023-11-28 15:37:47 +01:00
Frédéric Lécaille
831764641f REORG: quic: Move QUIC CRYPTO stream definitions/declarations to QUIC TLS
Move quic_cstream struct definition from quic_conn-t.h to quic_tls-t.h.
Its pool is also moved from quic_conn module to quic_tls. Same thing for
quic_cstream_new() and quic_cstream_free().
2023-11-28 15:37:22 +01:00
Frédéric Lécaille
ae885b9b68 REORG: quic: Move CRYPTO data buffer definitions to QUIC TLS module
Move quic_crypto_buf struct definition from quic_conn-t.h to quic_tls-t.h.
Also move its pool definition/declaration to quic_tls-t.h/quic_tls.c.
2023-11-28 15:37:22 +01:00
Frédéric Lécaille
5f9bd6bbce BUILD: quic: Missing RX header inclusions
Fix build issues such as:
   In file included from src/quic_tx.c:15:
        include/haproxy/quic_tx.h:51:23: warning: ‘struct quic_rx_packet’

It is not clear why the compiler only warns about these missing header
inclusions now. It should have complained a long time ago during the big QUIC
source code split.
2023-11-28 15:37:22 +01:00
Frédéric Lécaille
f949f7df83 REORG: quic: QUIC connection types header cleaning
Move UDP datagram definitions from quic_conn-t.h to quic_sock-t.h
Move debug quic_rx_crypto_frm struct from quic_conn-t.h to quic_trace-t.h
2023-11-28 15:37:22 +01:00
Frédéric Lécaille
0fc0d45745 REORG: quic: Add a new module to handle QUIC connection IDs
Move quic_cid and quic_connection_id from quic_conn-t.h to the new quic_cid-t.h header.
Move definitions of quic_stateless_reset_token_init(), quic_derive_cid(),
new_quic_cid(), quic_get_cid_tid() and retrieve_qc_conn_from_cid() to the new
quic_cid.c C file.
2023-11-28 15:37:22 +01:00
Frédéric Lécaille
21615d4376 CLEANUP: quic: Remove dead definitions/declarations
Remove useless definitions and declarations.
2023-11-28 15:37:22 +01:00
Christopher Faulet
2a307d273a BUG/MEDIUM: stconn: Don't perform zero-copy FF if opposite SC is blocked
When zero-copy data fast-forwarding is in use, if the opposite SC is blocked,
there is no reason to try to fast-forward more data. Worse, in some cases,
this can lead to a receive loop on the producer side while the consumer side
is blocked.

No backport needed.
2023-11-28 14:01:56 +01:00
Amaury Denoyelle
e97489a526 MINOR: trace: support -dt optional format
Add an optional argument for "-dt". This argument is interpreted as a
list of several trace statements separated by commas. For each statement,
a specific trace name can be specified, or none to act on all sources.
Using a double-colon separator, it is possible to add specifications on
the wanted level and verbosity.
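An illustrative invocation (the exact statement syntax is an assumption based
on the description above; "h2:user" would restrict the h2 source to the user
level, while "quic::advanced" would only raise the verbosity):

  haproxy -f haproxy.cfg -dt "h2:user,quic::advanced"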
2023-11-27 17:15:14 +01:00
Amaury Denoyelle
cef29d3708 MINOR: trace: define simple -dt argument
Add '-dt' haproxy process argument. This will automatically activate all
trace sources on stderr with the error level. This could be useful to
troubleshoot issues such as protocol violations.
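For example, an illustrative use on the command line:

  haproxy -f haproxy.cfg -dt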
2023-11-27 17:10:18 +01:00
Willy Tarreau
3ac9912837 OPTIM: pattern: save memory and time using ebst instead of ebis
In the pat_ref_elt struct, the pattern string is stored outside of the
node element, using a pointer to a strdup(). Not only does this needlessly
waste at least 16-24 bytes per entry (8 for the pointer, 8-16 for the
allocator), it also makes the tree descent less efficient since both
the node and the string have to be visited for each layer (hence at least
two cache lines). Let's use an ebmb storage and place the pattern right
at the end of the pat_ref_elt, making it a variable-sized element instead.

The set-map test below jumps from 173 to 182 kreq/s/core, and the memory
usage drops from 356 MB to 324 MB:

  http-request set-map(/dev/null) %[rand(1000000)] 1

This is even more visible with large maps: after loading 16M IP addresses
into a map, the process uses this amount of memory:

  - 3.15 GB with haproxy-2.8
  - 4.21 GB with haproxy-2.9-dev11
  - 3.68 GB with this patch

So that's a net saving of 32 bytes per entry here, which cuts in half the
extra cost of the tree, and loading a large map takes about 20% less time.
2023-11-27 11:25:07 +01:00
Willy Tarreau
fc800b6cb7 MINOR: task/profiling: do not record task_drop_running() as a caller
Task_drop_running() is used to remove the RUNNING bit and check if
while the task was running it got a new wakeup from itself. Thus
each time task_drop_running() marks itself as a caller, it in fact
removes the previous caller that woke up the task, such as below:

Tasks activity over 10.439 sec till 0.000 sec ago:
  function                      calls   cpu_tot   cpu_avg   lat_tot   lat_avg
  task_run_applet            57895273   6.396m    6.628us   2.733h    170.0us <- run_tasks_from_lists@src/task.c:658 task_drop_running

Better not mark this function as a caller and keep the original one:

Tasks activity over 13.834 sec till 0.000 sec ago:
  function                      calls   cpu_tot   cpu_avg   lat_tot   lat_avg
  task_run_applet            62424582   5.825m    5.599us   5.717h    329.7us <- sc_app_chk_rcv_applet@src/stconn.c:952 appctx_wakeup
2023-11-27 11:24:52 +01:00
William Lallemand
3dd55fa132 MINOR: mworker/cli: implement hard-reload over the master CLI
The mworker mode never had a proper 'hard-stop' (-st) for the reload;
this feature was commonly used with the daemon mode, but it was
never implemented in mworker mode.

This patch fixes the problem by implementing a "hard-reload" command
over the master CLI. It does the same as the "reload" command, but
instead of waiting for the connections to stop in the previous process,
it immediately quits the previous process after binding.
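For example, issuing the command through the master socket (the socket path
is just an example):

  echo "hard-reload" | socat /var/run/haproxy-master.sock -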
2023-11-24 21:44:25 +01:00
Aurelien DARRAGON
f2629ebd4e MINOR: proxy: add free_server_rules() helper function
Take the px->server_rules freeing part out of free_proxy() and make it
a dedicated helper function so that it becomes possible to use it from
anywhere.
2023-11-24 16:27:55 +01:00
Aurelien DARRAGON
24da4d3ee7 MINOR: tools: use const for read only pointers in ip{cmp,cpy}
In this patch we fix the prototype for ipcmp() and ipcpy() functions so
that input pointers that are used exclusively for reads are used as const
pointers. This way, the compiler can safely assume that those variables
won't be altered by the function.
2023-11-24 16:27:55 +01:00
Aurelien DARRAGON
683b2ae013 MINOR: server/event_hdl: add SERVER_INETADDR event
In this patch we add the support for a new SERVER event in the
event_hdl API.

SERVER_INETADDR is implemented as an advanced server event.
It is published each time the server's ip address or port is
about to change. (ie: from the cli, dns, lua...)

SERVER_INETADDR data is an event_hdl_cb_data_server_inetaddr struct
that provides additional info related to the server inet addr change,
but can be casted as a regular event_hdl_cb_data_server struct if
additional info is not needed.
2023-11-24 16:27:55 +01:00
Christopher Faulet
8d46a2c973 MAJOR: h3: Implement zero-copy support to send DATA frame
When possible, we try to send DATA frames without copying data. To do so, we
swap the input buffer with the QCS tx buffer. It is only possible iff:

 * There is only one HTX block of data at the beginning of the message
 * The amount of data to send is equal to the size of the HTX data block
 * The QCS tx buffer is empty

In this case, both buffers are swapped. The frame metadata are written at
the beginning of the buffer, before the data and where the HTX structure is
stored.
2023-11-24 07:42:43 +01:00
Christopher Faulet
1bcc0f8892 MEDIUM: mux-quic: Add consumer-side fast-forwarding support
The QUIC multiplexer now implements callbacks to consume fast-forwarded
data. It relies on the H3 stack to acquire the buffer and format the frame.
2023-11-24 07:42:43 +01:00
Amaury Denoyelle
a3187fe06c MINOR: rhttp: add count of active conns per thread
Add a new member <nb_rhttp_conns> in the thread_ctx structure. Its purpose
is to count the current number of opened reverse HTTP connections with
regard to their listener membership.

This patch will be useful to support multi-threading for active reverse
HTTP, in order to select the least loaded thread.

Note that although <nb_rhttp_conns> is only accessed by the
current thread, atomic operations are used. This is because once
multi-thread support is added, external threads will also retrieve
values from the others.
2023-11-23 17:43:01 +01:00
Amaury Denoyelle
55e78ff7e1 MINOR: rhttp: large renaming to use rhttp prefix
The previous commit renamed the 'proto_reverse_connect' module to 'proto_rhttp'.
This commit follows up by replacing various custom prefixes with 'rhttp_'
to make the code uniform.

Note that the 'reverse_' prefix was kept in the connection module. This is
because if a new reversible protocol not based on HTTP is implemented,
it may be necessary to reuse the same connection functions, which are
protocol agnostic.
2023-11-23 17:40:01 +01:00
Amaury Denoyelle
e09af499b4 MINOR: rhttp: rename proto_reverse_connect
This commit renames the proto_reverse_connect module to proto_rhttp.
This name is selected as it is shorter and more precise.
2023-11-23 17:38:58 +01:00
Willy Tarreau
1de44daf7d MINOR: ext-check: add an option to preserve environment variables
In Github issue #2128, @jvincze84 explained the complexity of using
external checks in some advanced setups due to the systematic purge of
environment variables, and expressed the desire to preserve the
existing environment. During the discussion an agreement was found
around having an option to "external-check" to do that and that
solution was tested and confirmed to work by user @nyxi.

This patch just cleans this up, implements the option as
"preserve-env" and documents it. The default behavior does not change,
the environment is still purged, unless "preserve-env" is passed. The
choice of not using "import-env" instead was made so that we could
later use it to name specific variables that have to be imported
instead of keeping the whole environment.

The patch is simple enough that it could be backported if needed (and
was in fact tested on 2.6 first).
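A minimal configuration sketch (paths and names are placeholders):

  backend be_app
      option external-check
      external-check preserve-env
      external-check command /usr/local/bin/my_check.sh
      server srv1 192.0.2.10:80 check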
2023-11-23 16:53:57 +01:00
Ilya Shipitsin
80813cdd2a CLEANUP: assorted typo fixes in the code and comments
This is the 37th iteration of typo fixes.
2023-11-23 16:23:14 +01:00
Willy Tarreau
6455fd5024 MINOR: debug: add the ability to enter components in the post_mortem struct
Here the idea is to collect components' versions and build options. The
main component is haproxy, but the API is made so that any sub-system
can easily add a component there (for example the detailed version of a
device detection lib, or some info about a lib loaded from Lua).

The elements are stored as a pointer to an array of structs and its count
so that it's sufficient to issue this in gdb to list them all at once:

  print *post_mortem.components@post_mortem.nb_components

For now we collect name, version, toolchain, toolchain options, build
options and path. Maybe more could be useful in the future.
2023-11-23 15:39:21 +01:00
Willy Tarreau
2268f10dd6 DEBUG: tinfo: store the pthread ID and the stack pointer in tinfo
When debugging a core, it's difficult to match a given gdb thread number
against an internal thread. Let's just store the pthread ID and the stack
pointer in each tinfo. This could help in the future by allowing to just
glance over them and pick the right one depending what info is found
first.
2023-11-23 14:32:55 +01:00
Amaury Denoyelle
54c94c60d2 DEBUG: connection/flags: update flags for reverse HTTP
Add missing CO_FL_REVERSED and CO_FL_ACT_REVERSING flag definitions in
conn_show_flags(). These flags were introduced in this release with
reverse HTTP support.

No need to backport
2023-11-20 18:10:12 +01:00
Amaury Denoyelle
decf29d06d MINOR: quic: remove unneeded QUIC specific stopping function
On CONNECTION_CLOSE reception/emission, QUIC connections enter CLOSING
state. At this stage, only CONNECTION_CLOSE can be reemitted and all
other exchanges are stopped.

Previously, on haproxy process stopping, if all QUIC connections were in
CLOSING state, they were released before their closing timer expiration
so as not to block the process shutdown. However, since a recent commit, the
closing timer has been shortened to a more reasonable delay. It is now
considered viable to respect the connections' closing state even on process
shutdown. As such, the stopping-specific code in the QUIC connections' idle
timer task was removed.

A specific function quic_handle_stopping() was implemented to notify
QUIC connections of shutdown from the main() function. It should have been
deleted along with the removal in the QUIC idle timer task. This patch just
does this.
2023-11-20 17:59:52 +01:00
Willy Tarreau
445fc1fe3a BUG/MINOR: sock: mark abns sockets as non-suspendable and always unbind them
In 2.3, we started to get a cleaner socket unbinding mechanism with
commit f58b8db47 ("MEDIUM: receivers: add an rx_unbind() method in
the protocols"). This mechanism rightfully refrains from unbinding
when sockets are expected to be transferrable to another worker via
"expose-fd listeners", but this is not compatible with ABNS sockets,
which do not support reuseport, unbinding nor being renamed: in short
they will always prevent a new process from binding.

It turns out that this is not very visible because, by pure accident,
GTUNE_SOCKET_TRANSFER is only set in the code dealing with master mode
and daemons, so it's never set in foreground mode nor in tests even if
present on the stats socket. However with master mode, it is now always
set even when not present on the stats socket, and will always conflict.

The only reasonable approach seems to consist in marking these abns
sockets as non-suspendable so that the generic sock_unbind() code can
decide to just unbind them regardless of GTUNE_SOCKET_TRANSFER.

This should carefully be backported as far as 2.4.
2023-11-20 11:38:26 +01:00
Aurelien DARRAGON
4b2616f784 MINOR: log/backend: prevent stick table and stick rules with LOG mode
Report a warning and prevent errors if user tries to declare a stick table
or use stick rules within a log backend.
2023-11-18 11:16:21 +01:00
Aurelien DARRAGON
6a29888f60 MINOR: log/backend: ensure log exclusive params are not used in other modes
Add the proxy_cfg_ensure_no_log() function (similar to
proxy_cfg_ensure_no_http()) to ensure at the end of proxy parsing that
no log-exclusive options are found if the proxy is not in log mode.
2023-11-18 11:16:21 +01:00
Aurelien DARRAGON
b61147fd2a MEDIUM: log/balance: merge tcp/http algo with log ones
"log-balance" directive was recently introduced to configure the
balancing algorithm to use when in a log backend. However, it is
confusing and it causes issues when used in default section.

In this patch, we take another approach: first we remove the
"log-balance" directive, and instead we rely on the existing "balance"
directive to configure log load balancing in a log backend.

Some algorithms such as roundrobin can be used as-is in a log backend,
and for log-only algorithms, they are implemented as "log-$name" inside
the "backend" directive.

The documentation was updated accordingly.
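An illustrative log backend using the unified directive (addresses are
placeholders; roundrobin is one of the algorithms usable as-is):

  backend syslog_lb
      mode log
      balance roundrobin
      server log1 udp@192.0.2.1:514
      server log2 udp@192.0.2.2:514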
2023-11-18 11:16:21 +01:00
Aurelien DARRAGON
f42dfaa214 MEDIUM: lbprm: store algo params on 32bits
Make sure lbprm.algo can store 32bits by declaring it as uint32_t

Then, use all 32 available bits to offer 4 extra bits for the BE_LB_NEED
inputs. This will allow new required inputs to be easily added (up to 4
new ones, plus one that wasn't used yet if we keep them exclusive)

This required some cleanup: all ALGO bitfields were rewritten in the
32bits format and the high ones were shifted to make room for the
new BE_LB_NEED bits.
2023-11-18 11:16:21 +01:00
Aurelien DARRAGON
a327b80f1f CLEANUP: backend: removing unused LB param
BE_LB_HASH_RND was introduced with 760e81d35 ("MINOR: backend: implement
random-based load balancing") but was never used since. Removing it
to regain an extra slot for future types.
2023-11-18 11:16:21 +01:00
Aurelien DARRAGON
e10cf61099 MINOR: stktable: add stktable_deinit function
Add the stktable_deinit() helper function to properly clean up a stick table
that was initialized using stktable_init().
2023-11-18 11:16:21 +01:00
Willy Tarreau
f592a0d5dd MINOR: rhttp: remove the unused outgoing connect() function
A dummy connect() function previously had to be installed for the log
server so that a reverse-http address could be referenced on a "server"
line, but after the recent rework of the server line parsing, this is
no longer needed, and this is actually annoying as it makes one believe
there is a way to connect outside, which is not true. Let's now get rid
of this function.
2023-11-17 18:10:16 +01:00
Frédéric Lécaille
888d1dc3dc MINOR: quic: Rename "handshake" timeout to "client-hs"
Use a more specific name for this timeout to distinguish it from a possible
future one on the server side.
Also update the documentation.
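An illustrative frontend setting using the renamed keyword (values are
arbitrary):

  frontend fe_quic
      bind quic4@:443 ssl crt /etc/haproxy/site.pem alpn h3
      timeout client-hs 5s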
2023-11-17 18:09:41 +01:00
Frédéric Lécaille
e3e0bb90ce MEDIUM: quic: Add support for "handshake" timeout setting.
The idle timer task may be used to trigger the client handshake timeout.
The handshake timeout expiration date (qc->hs_expire) is initialized when the
connection is allocated. Obviously, this timeout is taken into account only
during the handshake by qc_idle_timer_do_rearm() whose job is to rearm the idle timer.

The idle timer expiration date could be initialized only one time, then
never updated until the handshake completes. But this only works if the
handshake timeout is smaller than the idle timer task timeout. If the handshake
timeout is set greater than the idle timeout, the latter may expire before the
handshake timeout.

This patch may have an impact on the L1/C1 interop tests (with heavy packet loss
or corruption). This is probably why some implementations with handshake timeout
support set a big timeout during this test. This is at least the case for ngtcp2
which sets a 180s handshake timeout! haproxy will certainly have to proceed the
same way if it wants to keep passing this test now that it has a handshake
timeout.
2023-11-17 17:31:42 +01:00
Frédéric Lécaille
b33eacc523 MINOR: proxy: Add "handshake" new timeout (frontend side)
Add a new timeout for the handshake, on the frontend side only. Such a handshake
will typically be a TLS handshake during client connections to TLS/TCP or
QUIC frontends.
2023-11-17 17:31:42 +01:00
Christopher Faulet
5ed101e09c BUG/MINOR: stconn: Report read activity on non-indep streams for partial sends
A partial send is an activity, not a full block. Thus a read activity must
be reported for non-independent streams. It is especially important for very
congested streams where full sends are uncommon.

This patch must be backported to 2.8.
2023-11-17 15:36:43 +01:00
Christopher Faulet
020231ea79 MINOR: channel: Add functions to get info on buffers and deal with HTX streams
This patch adds HTX-aware versions of the functions c_data(), ci_data() and
c_empty(). The channel_data() function returns the amount of data in the
channel, channel_input_data() returns the amount of input data and
channel_empty() returns true if the channel's buffer is empty. These
functions handle HTX buffers.

In addition, the channel_data_limit() function, also HTX-aware, can be used to
get the maximum absolute amount of data that can be copied into a buffer,
independently of the data already present in the buffer.
2023-11-17 15:08:15 +01:00
Christopher Faulet
7393bf7e42 MINOR: htx: Use a macro for overhead induced by HTX
The overhead induced by the HTX format was set to the HTX structure itself
and two HTX blocks. It was set this way to optimize zero-copy during
transfers. This value may (and will) be used at different places. Thus we
now use a macro, called HTX_BUF_OVERHEAD.
2023-11-17 12:13:00 +01:00
Christopher Faulet
b68c579eda BUG/MEDIUM: stconn: Update fsb date on partial sends
The first-send-blocked date was originally designed to save the date of the
first send of a series where some data remain blocked. It was relaxed
recently (3083fd90e "BUG/MEDIUM: stconn: Report a send activity everytime
data were sent") to save the date of the first fully blocked send. However,
it is not accurate.

When all data are sent, the fsb value must be reset to TICK_ETERNITY. When
nothing is sent and if it is not already set, it must be set. But when data
are partially sent, the value must be updated and not reset. Otherwise the
write timeout may be ignored because the fsb date is never set.

So, the changes brought by the patch above are reverted and
sc_ep_report_blocked_send() was changed to know whether some data were sent or
not. This way we are able to update the fsb value.

This patch must be backported to 2.8.
2023-11-17 12:13:00 +01:00
Remi Tricot-Le Breton
45a2ff0f4a MINOR: shctx: Remove 'use_shared_mem' variable
This global variable was used to avoid using locks on shared_contexts in
the unlikely case of nbthread==1. Since the locks do not do anything
when USE_THREAD is not defined, it will be more beneficial to simply
remove this variable and the systematic test on its value in the shared
context locking functions.
2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton
4fe6c1365d MINOR: shctx: Remove redundant arg from free_block callback
The free_block callback does not get called on blocks that are not row
heads anymore, so we don't need two shared_block parameters.
2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton
48f81ec09d MAJOR: cache: Delay cache entry delete in reserve_hot function
A reference counter on the cache_entry was added in a previous commit.
Its value is atomically increased and decreased via the retain_entry and
release_entry functions.

This is needed because of the latest cache and shared_context
modifications that introduced two separate locks instead of the
preexisting single shctx_lock one.
With the new logic, we have two main blocks competing for the two locks:
- the one in the http_action_req_cache_use that performs a lookup in the
  cache tree (locked by the cache lock) and then tries to remove the
  corresponding blocks from the shared_context's 'avail' list until the
  response is sent to the client by the cache applet,
- the shctx_row_reserve_hot that traverses the 'avail' list and gives
  them back to the caller, while removing previous row heads from the
  cache tree
Those two blocks require the two locks but one of them would take the
cache lock first, and the other one the shctx_lock first, which would
end in a deadlock without the current patch.

The way this conflict is resolved in this patch is by ensuring that at
least one of those uses works without taking the two locks at the same
time.
The solution found was to keep taking the two locks in the cache_use
case. We first lock the cache to lookup for an entry and we then take
the shctx lock as well to detach the corresponding blocks from the
'avail' list. The subtlety is that between the cache lookup and the
actual locking of the shctx, another thread might have called the
reserve_hot function in which we only take the shctx lock.
In this function we traverse the 'avail' list to remove blocks that are
then given to the caller. If one of those blocks corresponds to a
previous row head, we call the 'free_blocks' callback that used to
delete the cache entry from the tree.
We now avoid deleting directly the cache entries in reserve_hot and we
rather set the cache entries 'complete' param to 0 so that no other
thread tries to work with this entry. This way, when we release the
shctx lock in reserve_hot, the first thread that had performed the cache
lookup and had found an entry that we just gave to another thread will
see that the 'complete' field is 0 and it won't try to work with this
response.

The actual removal of entries from the cache tree will now be performed
in the new 'reserve_finish' callback called at the end of the
shctx_row_reserve_hot function. It will iterate over all the row heads that
were inserted in a dedicated list in the 'free_block' callback and
perform the actual delete.
2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton
1cd91b4f2a MINOR: shctx: Add new reserve_finish callback call to shctx_row_reserve_hot
This patch adds a reserve_finish callback that can be defined by the
subsystems that require a shared_context. It is called at the end of
shctx_row_reserve_hot after the shared_context lock is released.
2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton
ed35b9411a MEDIUM: cache: Switch shctx spinlock to rwlock and restrict its scope
Since a lock on the cache tree was added in the latest cache changes, we
do not need to use the shared_context's lock to lock more than pure
shared_context related data anymore. This already existing lock will now
only cover the 'avail' list from the shared_context. It can then be
changed to a rwlock instead of a spinlock because we might want to only
run through the avail list sometimes.

Apart from changing the type of the shctx lock, the main modification
introduced by this patch is to limit the amount of code covered by the
shctx lock. This lock does not need to cover any code strictly related
to the cache tree anymore.
2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton
3831d8454f MEDIUM: shctx: Remove 'hot' list from shared_context
The "hot" list stored in a shared_context was used to keep a reference
to shared blocks that were currently being used and were thus removed
from the available list (so that they don't get reused for another cache
response). This 'hot' list does not ever need to be shared across
threads since every one of them only works on their current row.

The main need behind this 'hot' list was to detach the corresponding
blocks from the 'avail' list and to have a known list root when calling
list_for_each_entry_from in shctx_row_data_append (for instance).

Since we actually never need to iterate over all members of the 'hot'
list, we can remove it and replace the inc_hot/dec_hot logic by a
detach/reattach one.
2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton
ac9c49b40d MEDIUM: cache: Use dedicated cache tree lock alongside shctx lock
Every use of the cache tree was covered by the shctx lock even when no
operations were performed on the shared_context lists (avail and hot).
This patch adds a dedicated RW lock for the cache so that blocks of code
that work on the cache tree only can use this lock instead of the
superseding shctx one. This is useful for operations during which the
concerned blocks are already in the hot list.
When the two locks need to be taken at the same time, in
http_action_req_cache_use and in shctx_row_reserve_hot, the shctx one
must be taken first.
A new parameter needed to be added to the shared_context's free_block
callback prototype so that cache_free_block can take the cache lock and
release it afterwards.
2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton
81d8014af8 MINOR: shctx: Remove explicit 'from' param from shctx_row_data_append
This parameter is not necessary since the first element of a row always
has a pointer to the row's tail.
2023-11-16 19:35:10 +01:00
Amaury Denoyelle
8cc3fc73f1 MINOR: connection: update rhttp flags usage
Change the flags used for reversed connections:
* CO_FL_REVERSED is now put after reversal for passive connect. For
  active connect, it is delayed until accept is completed after reversal.
* CO_FL_ACT_REVERSING replaces the old CO_FL_REVERSED. It is put only for
  active connect on reversal and removed once accept is done.

This allows identifying a connection as reversed during its whole
lifetime. This should be useful to extend reverse connect.
2023-11-16 17:53:31 +01:00
Christopher Faulet
e5cffa8ace MINOR: connection: Add a CTL flag to notify mux it should wait for reads again
MUX_SUBS_RECV ctl flag is added to instruct the mux it should wait for read
events. This flag will be pretty useful to handle abortonclose option.
2023-11-14 11:01:51 +01:00
Frédéric Lécaille
9021e8935e MINOR: quic: Maximum congestion control window for each algo
Make all the congestion control algorithms support the maximum congestion
control window set by configuration. There is nothing special to explain. For
each algo, each time the window is incremented it is also bounded.
2023-11-13 17:53:18 +01:00
Frédéric Lécaille
028a55a1d0 MINOR: quic: Add a max window parameter to congestion control algorithms
Add a new ->max_cwnd member to bind_conf struct to store the maximum
congestion control window value for each QUIC binding.
Modify the "quic-cc-algo" keyword parsing to add an optional parameter
to its value: the maximum congestion window value between parentheses
as follows:

      ex: quic-cc-algo cubic(10m)

This value must be bounded, greater than 10k and smaller than 1g.
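In context, on a QUIC bind line (illustrative values):

  bind quic4@:443 ssl crt /etc/haproxy/site.pem alpn h3 quic-cc-algo cubic(4m)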
2023-11-13 17:53:18 +01:00
Aurelien DARRAGON
64e0b63442 BUG/MEDIUM: server: invalid address (post)parsing checks
This bug was introduced with 29b76ca ("BUG/MEDIUM: server/log: "mode log"
after server keyword causes crash ")

Indeed, we cannot safely rely on addr_proto being set when str2sa_range()
returns in parse_server() (even if SRV_PARSE_PARSE_ADDR is set), because
proto lookup might be bypassed when FQDN addresses are involved.

Unfortunately, the above patch wrongly assumed that proto would always
be set when SRV_PARSE_PARSE_ADDR was passed to parse_server() (so when
str2sa_range() was called), resulting in invalid postparsing checks being
performed, which could as well lead to crashes with log backends
("mode log" set) because some postparsing init was skipped as a result of
proto not being set and this wasn't expected later in the init code.

To fix this, we now make use of the previous patch to perform server's
address compatibility checks on hints that are always set when
str2sa_range() successfully returns.

For log backend, we're also adding a complementary test to check if the
address family is of expected type, else we report an error, plus we're
moving the postinit logic in log api since _srv_check_proxy_mode() is
only meant to check proxy mode compatibility and we were abusing it.

This patch depends on:
 - "MINOR: tools: make str2sa_range() directly return type hints"

No backport required unless 29b76ca gets backported.
2023-11-10 17:49:57 +01:00
Aurelien DARRAGON
12582eb8e5 MINOR: tools: make str2sa_range() directly return type hints
str2sa_range() already allows the caller to provide <proto> in order to
get a pointer on the protocol matching with the string input thanks to
5fc9328a ("MINOR: tools: make str2sa_range() directly return the protocol")

However, as stated into the commit message, there is a trick:
   "we can fail to return a protocol in case the caller
    accepts an fqdn for use later. This is what servers do and in this
    case it is valid to return no protocol"

In this case, we're unable to return protocol because the protocol lookup
depends on both the [proto type + xprt type] and the [family type] to be
known.

While family type might not be directly resolved when fqdn is involved
(because family type might be discovered using DNS queries), proto type
and xprt type are already known. As such, the caller might be interested
in knowing those address related hints even if the address family type is
not yet resolved and thus the matching protocol cannot be looked up.

Thus in this patch we add the optional net_addr_type (custom type)
argument to str2sa_range to enable the caller to check the protocol type
and transport type when the function succeeds.
2023-11-10 17:49:57 +01:00
Willy Tarreau
a13f8425f0 MINOR: task/debug: make task_queue() and task_schedule() possible callers
It's common to see process_stream() being woken up by wake_expired_tasks
in the profiling output, without knowing which timeout was set to cause
this. By making it possible to record the call places of task_queue()
and task_schedule(), and by making wake_expired_tasks() explicitly not
replace it, we'll be able to know which task_queue() or task_schedule()
was triggered for a given wakeup.

For example below:
  process_stream                51200   311.4ms   6.081us   34.59s    675.6us <- run_tasks_from_lists@src/task.c:659 task_queue
  process_stream                19227   70.00ms   3.640us   9.813m    30.62ms <- sc_notify@src/stconn.c:1136 task_wakeup
  process_stream                 6414   102.3ms   15.95us   8.093m    75.70ms <- stream_new@src/stream.c:578 task_wakeup

It's visible that it's the run_tasks_from_lists() which in fact applies
on the task->expire returned by the ->process() function itself.
2023-11-09 17:24:00 +01:00
Willy Tarreau
0eb0914dba MINOR: task/debug: explicitly support passing a null caller to wakeup functions
This is used for tracing and profiling. By permitting to have a NULL
caller, we allow a caller to explicitly pass zero to state that the
current caller must not be replaced. This will soon be used by
wake_expired_tasks() to avoid replacing a caller in the expire loop.
2023-11-09 17:24:00 +01:00
Amaury Denoyelle
bb28215d9b MEDIUM: quic: define an accept queue limit
QUIC connections are pushed manually into a dedicated listener queue
when they are ready to be accepted. This happens after handshake
finalization or on 0-RTT packet reception. Listener is then woken up to
dequeue them with listener_accept().

This patch accounts for the number of connections currently stored in
the accept queue. If a certain limit is reached, INITIAL packets are
dropped on reception to prevent further QUIC connection allocations.
This should help preserve system resources.

This limit is automatically derived from the listener backlog. Half of
its value is reserved for handshakes and the other half for accept
queues. By default, backlog is equal to maxconn, which guarantees that
there can't be more than maxconn connections in handshake or waiting
to be accepted.
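An illustrative way to raise this limit on a QUIC frontend (values are
arbitrary; the limit follows the configured backlog as described above):

  frontend fe_quic
      bind quic4@:443 ssl crt /etc/haproxy/site.pem alpn h3
      backlog 20000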
2023-11-09 16:24:00 +01:00
Amaury Denoyelle
3df6a60113 MEDIUM: quic: limit handshake per listener
Implement a limit per listener for concurrent number of QUIC
connections. When reached, INITIAL packets for new connections are
automatically dropped until the number of handshakes is reduced.

The limit value is automatically based on listener backlog, which itself
defaults to maxconn.

This feature is important to ensure CPU and memory resources are not
consumed if too many handshake attempts are started in parallel.

Special care is taken if a connection is released before handshake
completion. In this case, the counter must be decremented. This requires
ensuring that member <qc.state> is set early in qc_new_conn() before any
quic_conn_release() invocation.
2023-11-09 16:23:52 +01:00
Amaury Denoyelle
278808915b MINOR: quic: reduce half open counters scope
Accounting is implemented for half-open connections, which represent QUIC
connections waiting for handshake completion. When a certain limit is
reached, the Retry mechanism is automatically activated prior to
instantiating new connections.

The issue with this behavior is that two notions are mixed: the QUIC
connection handshake phase and Retry, which is a mechanism against
amplification attacks. As such, only peer address validation should be
taken into account to activate Retry protection.

This patch chooses to reduce the scope of half_open_conn. Only
connections waiting to validate the peer address are now accounted for.
Most notably, connections instantiated with a validated Retry token
check are not accounted.

One impact of this patch is that it should prevent activating the Retry
mechanism too early, in particular if multiple handshakes are
too slow. Another limitation should be implemented to protect against
this scenario.
2023-11-09 16:23:52 +01:00
Amaury Denoyelle
d38bb7f8a7 MEDIUM: quic: adjust address validation
When a new QUIC connection is created, server considers peer address as
not yet validated. The server must limit its sending up to 3 times the
content already received. This is a defensive measure to avoid flooding
a remote host victim of address spoofing.

This patch adjusts the condition to consider the peer address as
validated. Two conditions are now considered:
* successful handling of a received HANDSHAKE packet. This was already
  done before although implemented in a different way.
* validation of a Retry token. This was not considered prior to this patch
  despite the RFC recommendation.

This patch also adjusts how a connection is internally labelled as using
a validated peer address. Before, the above conditions were checked via
quic_peer_validated_addr(). Now, a flag QUIC_FL_CONN_PEER_VALIDATED_ADDR
is set to label this. It already existed prior to this patch but was
only used for quic_cc_conn. This should now be more explicit.
2023-11-09 16:23:52 +01:00
Frédéric Lécaille
0016dbaef4 BUG/MEDIUM: quic: Possible crashes during secrets allocations (heavy load)
This bug could be reproduced with -dMfail option and detected by libasan.
During the TLS secrets allocations, when failed, quic_tls_ctx_secs_free()
is called. It resets the already initialized secrets. Some were detected
as initialized when not, or with a non initialized length, which leads
to big "memset(0)" detected by libsasan.

Ensure that all the secrets are really initialized with correct lengths.

No need to be backported.
2023-11-09 10:32:31 +01:00
Frédéric Lécaille
4cfae3ac01 MINOR: quic: release the TLS context asap from quic_conn_release()
There was no reason not to release the TLS/SSL QUIC connection context
as soon as possible from quic_conn_release(), before allocating a "closing
connection" connection (quic_cc_conn struct).
2023-11-09 10:32:31 +01:00
Frédéric Lécaille
3a8dd48e30 MEDIUM: quic: Heavy task mode with non contiguously bufferized CRYPTO data
This patch sets the handshake task in heavy task mode when receiving out-of-order
CRYPTO data which results in in-order bufferized CRYPTO data. This is done
thanks to a non-contiguous buffer and from qc_handle_crypto_frm() after having
potentially bufferized CRYPTO data in this buffer.
qc_treat_rx_crypto_frms() is no longer called from qc_treat_rx_pkts(); instead,
this is where the task is set in heavy task mode. Consequently,
it is the job of qc_ssl_provide_all_quic_data(), a newly implemented function,
to call qc_treat_rx_crypto_frms() directly to provide the in-order bufferized
CRYPTO data to the TLS stack. As this function releases the non-contiguous
buffer for the CRYPTO data, if possible, there is no need to do that from
qc_treat_rx_crypto_frms() anymore.
2023-11-09 10:32:31 +01:00
Frédéric Lécaille
94d20be138 MEDIUM: quic: Heavy task mode during handshake
Add a new pool for the CRYPTO data frames received in order.
Add ->rx.crypto_frms list to each encryption level to store such frames
when they are received in order from qc_handle_crypto_frm().
Also set the handshake task (qc_conn_io_cb()) in heavy task mode from
this function after having received such frames. When this task
detects that it is set in heavy mode, it calls qc_ssl_provide_all_quic_data()
newly implemented function to provide the CRYPTO data to the TLS task.
Modify quic_conn_enc_level_uninit() to release these CRYPTO frames
when releasing the encryption level they are in relation with.
2023-11-09 10:32:31 +01:00
Christopher Faulet
84d26bcf3f MINOR: stconn/mux-h2: Use a iobuf flag to report EOI to consumer side during FF
The IOBUF_FL_EOI iobuf flag is now set by the producer to notify the consumer
that the end of input was reached. Thanks to this flag, we can remove the
ugly hack in h2_done_ff() that tested the opposite SE flags.

Of course, for now, it works and it is good enough. But we must keep in mind
that EOI is always forwarded from the producer side to the consumer side in
this case. But if this changes, a new CO_RFL_ flag will have to be added to
instruct the producer whether it can forward EOI or not.
2023-11-08 21:14:07 +01:00
Christopher Faulet
4be0c7c655 MEDIUM: stconn/muxes: Loop on data fast-forwarding to forward at least a buffer
In the mux-to-mux data forwarding, we now try, as far as possible, to send at
least a buffer. Of course, if the consumer side is congested or if nothing
more can be received, we leave. But the idea is to retry to fast-forward
data if less than a buffer was forwarded. It is only performed for buffer
fast-forwarding, not splicing.

The idea behind this patch is to optimise the forwarding, when a first
forward was performed to complete a buffer with some existing data. In this
case, the amount of data forwarded is artificially limited because we are
using a non-empty buffer. But without this limitation, it is highly probable
that a full buffer could have been sent. And indeed, with an H2 client, a
significant improvement was observed during our tests.

To do so, .done_fastfwd() callback function must be able to deal with
interim forwards. Especially for the H2 mux, to remove H2_SF_NOTIFIED flags
on the H2S on the last call only. Otherwise, the H2 stream can be blocked by
itself because it is in the send_list. IOBUF_FL_INTERIM_FF iobuf flag is
used to notify the consumer it is not the last call. This flag is then
removed on the last call.
2023-11-08 21:14:07 +01:00
Willy Tarreau
89c6b67a82 BUG/MEDIUM: pool: fix releasable pool calculation when overloaded
In 2.6-dev1, the method used to decide how many pool entries could be
released at once was revisited to support releases in batches. This was
done with commits 91a8e28f9 ("MINOR: pool: add a function to estimate
how many may be released at once") and 361e31e3f ("MEDIUM: pool: compute
the number of evictable entries once per pool").

The first commit takes care of the possible inconsistency between the
moment the allocated count and the used count are read, but unfortunately
fixed it the wrong way, by adjusting "used" to match "alloc" whenever it
was lower (i.e. almost always). This results in a nasty case which is that
as soon as the allocated value becomes higher than the estimated count of
needed entries, we end up returning pool->minavail, which causes very
small batches to be released, starting from commit 1513c5479 ("MEDIUM:
pools: release cached objects in batches").

The problem was further amplified in 2.9-dev3 with commit 7bf829ace
("MAJOR: pools: move the shared pool's free_list over multiple buckets")
because it now becomes possible for a thread to allocate from one bucket
and release into a few other different ones, causing an accumulation of
entries in that bucket.

The fix is trivial, simply adjust the alloc counter if the used one is
higher, before performing operations.

This must be backported to 2.6.
2023-11-08 17:12:49 +01:00
Amaury Denoyelle
6f9b65f952 BUG/MEDIUM: quic: fix sslconns on quic_conn alloc failure
QUIC connections are accounted inside the global sslconns counter. As with QUIC
actconn, it suffered from a similar issue if an intermediate allocation
failed inside qc_new_conn().

Fix this similarly by moving increment operation inside qc_new_conn().
Increment and error path are now centralized and much easier to
validate.

The consequences are similar to the actconn fix: on memory allocation failure,
global sslconns may wrap, this time blocking any future QUIC or SSL
connections on the process.

This must be backported up to 2.6.
2023-11-07 14:06:02 +01:00
Christopher Faulet
62812b2a1d DOC: stconn: Improve comments about lra and fsb usage
Recent fixes have shown that <lra> and <fsb> uses were not pretty clear. So
let's try to improve the documentation about these values, especially when
<lra> is updated and how to use it.
2023-11-07 10:41:11 +01:00
Christopher Faulet
d247152ec2 BUG/MEDIUM: Don't apply a max value on room_needed in sc_need_room()
In sc_need_room(), we compute the maximum room that can be requested to
restart reading, to be sure to be able to unblock the SC, at worst when the
buffer is emptied. Here, the buffer reserve is considered, but it is an issue.

Counting the reserve can lead to a wicked bug with the H1 multiplexer, when a
small amount of data is found at the end of the HTX buffer. In this case,
to not wrap, the H1 mux requests more room. It is an optimization to be able to
resync the buffer with the consumer side and to be able to perform zero-copy
transfers. However, if this amount of data is smaller than the reserve and
if the consumer is congested, we fall into a loop because the wrong value is
used to request more room. The H1 mux continues to pretend there is not
enough space in the buffer, while the effective requested value is lower
than the free space in the buffer. While the consumer is congested and does
not consume these data, there is no way to stop the loop.

We can fix the function by removing the buffer reserve from the
computation. But it remains a dangerous decision to apply a max value to
room_needed. It is safer to require that the caller set a correct value. For
now, this is true. But in the end, it is totally unexpected to wait for more
room than an empty buffer can contain.

This patch must be backported to 2.8.
2023-11-07 10:35:38 +01:00
Christopher Faulet
4a2660aa45 BUG/MEDIUM: stconn: Don't report rcv/snd expiration date if SC cannot expire
When receive or send expiration date of a stream-connector is retrieved, we
now automatically check if it may expire. If not, TICK_ETERNITY is returned.

The expiration dates of the frontend and backend stream-connectors are used
to compute the stream expiration date. This operation is performed at 2
places: at the end of process_stream() and in sc_notify() if the stream is
not woken up.

With this patch, there are no special changes for process_stream() because it
was already handled. It makes things a little simpler. However, it fixes
sc_notify() by avoiding erroneously computing an expiration date in the
past. This highly reduces the stream wakeups when there is contention on the
consumer side.

The bug was introduced with the commit 8073094bf ("NUG/MEDIUM: stconn:
Always update stream's expiration date after I/O"). It was an error to
unconditionally set the stream expiration date, without testing blocking
conditions on both SCs.

This patch must be backported to 2.8.
2023-11-07 10:30:01 +01:00
Christopher Faulet
141b489291 BUG/MEDIUM: stconn: Report send activity during mux-to-mux fast-forward
When data are directly forwarded from a mux to the opposite one, we must not
forget to report send activity when data are successfully sent, or report a
blocked send when data are blocked. It is important because otherwise, if
the transfer is quite long, longer than the client or server timeout, an
error may be triggered because the write timeout is reached.

The H1, H2 and PT muxes are concerned. To fix the issue, the done_fastfwd()
callback now returns the amount of data consumed. This way it is possible
to update/reset the FSB data accordingly.

No backport needed.
2023-11-07 10:30:01 +01:00
Alexander Stephan
6f4bfed3a2 MINOR: server: Add parser support for set-proxy-v2-tlv-fmt
This commit introduces a generic server-side parsing of type-value pair
arguments and allocation of a TLV list via a new keyword called
set-proxy-v2-tlv-fmt.

This allows us to 1) forward any TLV type with the help of fc_pp_tlv, and
2) more generally, send out any TLV type and value via a log format expression.
To have this fully working, the connection will need to be updated in
a follow-up commit to actually respect the new server TLV list.

default-server support has also been implemented.
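An illustrative server line forwarding a custom TLV received on the frontend
connection (the 0xE1 type is just an example value):

  server srv1 192.0.2.10:443 send-proxy-v2 set-proxy-v2-tlv-fmt(0xE1) %[fc_pp_tlv(0xE1)]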
2023-11-04 04:56:59 +01:00
Aurelien DARRAGON
5158c0ff69 MEDIUM: stktable/peers: "write-to" local table on peer updates
In this patch, we add the possibility to declare on a table definition
("table" in peer section, or "stick-table" in proxy section) that we
want the remote/peer updates on that table to be pushed on a local
haproxy table in addition to the source table.

Consider this example:

  |peers mypeers
  |        peer local 127.0.0.1:3334
  |        peer clust 127.0.0.1:3333
  |        table t1.local type string size 10m store server_id,server_key expire 30s
  |        table t1.clust type string size 10m store server_id,server_key write-to mypeers/t1.local expire 30s

With this setup, we consider that haproxy uses t1.local as the cache/local
table for read and write operations, and that t1.clust is a remote table
containing data processed from t1.local and similar tables from other
haproxy peers in a cluster setup. The t1.clust table will be used to
refresh the local/cache one via the "write-to" statement.

What will happen is that every time haproxy sees entry updates for
the t1.clust table, it will overwrite the t1.local table with fresh data and
will update the entry expiration timer. If the t1.local entry doesn't exist
yet (key doesn't exist), it will automatically create it. Note that only
types that cannot be used for arithmetic ops will be handled, and this is
to prevent processed values from the remote table from interfering with
computations based on values from the local table (ie: prevent
cumulative counters from growing indefinitely).

"write-to" will only push supported types if they both exist in the source
and the target table. Be careful with server_id and server_key storage
because they are often declared implicitly when referencing a table in
sticking rules but it is required to declare them explicitly for them to
be pushed between a remote and a local table through "write-to" option.

Also note that the "write-to" target table should have the same type as
the source one, and that the key length should be strictly equal,
otherwise haproxy will raise an error due to the tables being
incompatibles. A table that is already being written to cannot be used
as a source table for a "write-to" target.

Thanks to this patch, it will now be possible to use sticking rules in
peer cluster context by using a local table as a local cache which
will be automatically refreshed by one or multiple remote table(s).

This commit depends on:
 - "MINOR: stktable: stktable_init() sets err_msg on error"
 - "MINOR: stktable: check if a type should be used as-is"
2023-11-03 17:30:30 +01:00
Aurelien DARRAGON
db0cb54f81 MINOR: stktable: check if a type should be used as-is
stick table types now have an extra bit named 'as_is' that allows us to
check if such type should be used as-is or if it may be involved in
arithmetic operations such as counters. This can be useful since those
types are not common and may require specific handling.

e.g.: stktable_data_types[data_type].as_is will be set to 1 if the type
cannot be used in arithmetic operations.
2023-11-03 17:30:30 +01:00
Aurelien DARRAGON
b8c19f877a MINOR: stktable: stktable_init() sets err_msg on error
stktable_init() now sets err_msg when error occurs so that caller is able
to precisely report the cause of the failure.
2023-11-03 17:30:30 +01:00
Aurelien DARRAGON
b9c0b039c8 MINOR: proxy/stktable: add resolve_stick_rule helper function
Simplify stick and store sticktable proxy rules postparsing by adding
a sticking rule entry resolve (postparsing) function.

This will ease code maintenance.
2023-11-03 17:30:30 +01:00
Ruei-Bang Chen
7a1ec235cd MINOR: sample: Add fetcher for getting all cookie names
This new fetcher can be used to extract the list of cookie names from
Cookie request header or from Set-Cookie response header depending on
the stream direction. There is an optional argument that can be used
as the delimiter (which is assumed to be the first character of the
argument) between cookie names. The default delimiter is comma (,).

Note that we will treat the Cookie request header as a semi-colon
separated list of cookies and each Set-Cookie response header as
a single cookie and extract the cookie names accordingly.
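An illustrative use, assuming the sample fetch is exposed as req.cook_names
(the exact fetch name is not stated above):

  http-request set-header X-Cookie-Names %[req.cook_names]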
2023-11-03 09:57:06 +01:00
Amaury Denoyelle
4a89dba6d5 MEDIUM: quic: count quic_conn for global sslconns
Similar to the previous commit, which checks for maxconn before allocating
a QUIC connection, this patch checks for maxsslconn at the same step.
This is necessary as a QUIC connection cannot run without an SSL context.

This should be backported up to 2.6. It relies on the following patch :
  "BUG/MINOR: ssl: use a thread-safe sslconns increment"
2023-10-26 15:35:58 +02:00
Amaury Denoyelle
7735cf3854 MEDIUM: quic: count quic_conn instance for maxconn
Increment actconn and check the maxconn limit when a quic_conn is
instantiated. This is necessary because prior to this patch, quic_conn
instances were not counted. The global actconn was only incremented after
the handshake had been completed and the connection structure was
allocated.

The increment is done using increment_actconn() on INITIAL packet
parsing if a new connection is about to be created. If the limit is
reached, the allocation is cancelled and the INITIAL packet is dropped.

The decrement is done under quic_conn_release(). This means that
quic_cc_conn instances are not taken into account. This seems safe
enough because quic_cc_conn are only used for minimal usage.

The counterpart of this change is that maxconn must not be checked a
second time when listener_accept() is done over a QUIC connection. For
this, a new bind_conf flag BC_O_XPRT_MAXCONN is set for listeners when
maxconn is already counted by the lower layer. For the moment, it is
positioned only for QUIC listeners.

Without this patch, the haproxy process could suffer from heavy memory/CPU
load if the number of concurrent handshakes is high.

This patch is not considered a bug fix per se. However, it has a major
benefit to protect against too many QUIC handshakes. As such, it should
be backported up to 2.6. For this, it relies on the following patch :
  "MINOR: frontend: implement a dedicated actconn increment function"
2023-10-26 15:35:56 +02:00
Amaury Denoyelle
350f8b0c07 BUG/MINOR: ssl: use a thread-safe sslconns increment
Each time a new SSL context is allocated, global.sslconns is
incremented. If global.maxsslconn is reached, the allocation is
cancelled.

This procedure was not entirely thread-safe due to the check and
increment operations being conducted at different stages. This could lead to
global.maxsslconn being slightly exceeded when several threads allocate SSL
contexts while sslconns is near the limit.

To fix this, use a CAS operation in a do/while loop. This code is
similar to the actconn/maxconn increment for connection.

A new function increment_sslconn() is defined for this operation. For
the moment, only SSL code is using it. However, it is expected that QUIC
will also use it to count QUIC connections as SSL ones.

This should be backported to all stable releases. Note that prior to the
2.6, sslconns was outside of global struct, so this commit should be
slightly adjusted.
2023-10-26 15:25:07 +02:00
Amaury Denoyelle
fffd435bbd MINOR: frontend: implement a dedicated actconn increment function
When a new frontend connection is instantiated, the actconn global counter
is incremented. If the global maxconn value is reached, the connection is
cancelled. This ensures that system limits are under control.

Prior to this patch, the atomic check/increment operations were done
directly into listener_accept(). Move them in a dedicated function
increment_actconn() in frontend module. This will be useful when QUIC
connections will be counted in actconn counter.
2023-10-26 15:18:48 +02:00
Willy Tarreau
96bb99a87d DEBUG: pools: detect that malloc_trim() is in progress
Now when calling ha_panic() with a thread still under malloc_trim(),
we'll set a new tainted flag to easily report it, and the output
trace will report that this condition happened and will suggest to
use no-memory-trimming to avoid it in the future.
2023-10-25 15:48:02 +02:00
Willy Tarreau
26a6481f00 DEBUG: lua: add tainted flags for stuck Lua contexts
William suggested that since we can detect the presence of Lua in the
stack, let's combine it with stuck detection to set a new pair of flags
indicating a stuck Lua context and a stuck Lua shared context.

Now, executing an infinite loop in a Lua sample fetch function with
yield disabled crashes with tainted=0xe40 if loaded from a lua-load
statement, or tainted=0x640 from a lua-load-per-thread statement.

In addition, at the end of the panic dump, we can check if Lua was
seen stuck and emit recommendations about lua-load-per-thread and
the choice of dependencies depending on the presence of threads
and/or shared context.
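
For reference, the two loading modes mentioned above are selected from the
global section; a minimal sketch (the script paths are hypothetical):

    global
        lua-load            /etc/haproxy/shared_ctx.lua
        lua-load-per-thread /etc/haproxy/per_thread_ctx.lua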
2023-10-25 15:48:02 +02:00
Willy Tarreau
46bbb3a33b DEBUG: add a tainted flag when ha_panic() is called
This will make it easier to know that the panic function was called,
for the occasional case where the dump crashes and/or the stack is
corrupted and not much exploitable. Now at least it will be sufficient
to check the tainted value to know that someone called ha_panic(), and
it will also be usable to condition extra analysis.
2023-10-25 15:48:02 +02:00
Aurelien DARRAGON
66795bd721 MINOR: connection: add conn_pr_mode_to_proto_mode() helper func
This function allows one to safely map a proxy mode to its corresponding proto_mode.

This will allow for easier code maintenance and prevent mixups between
proxy mode and proto mode.
2023-10-25 11:59:27 +02:00
Aurelien DARRAGON
29b76cae47 BUG/MEDIUM: server/log: "mode log" after server keyword causes crash
In 9a74a6c ("MAJOR: log: introduce log backends"), a mistake was made:
it was assumed that the proxy mode was already known during server
keyword parsing in parse_server() function, but this is wrong.

Indeed, "mode log" can be declared late in the proxy section. Due to this,
a simple config like this will cause the process to crash:

   |backend test
   |
   |  server name 127.0.0.1:8080
   |  mode log

In order to fix this, we relax some checks in _srv_parse_init() and store
the address protocol from str2sa_range() in the server struct, then we set
up a postparsing function that is called after config parsing to finish
the server checks/initialization that depend on the proxy mode being
known. We achieve this by checking the PR_CAP_LB capability of the parent
proxy to know if we're in such a case where the effective proxy mode is
not yet known (it is assumed that other, implicit, proxies don't provide
this possibility and thus don't suffer from this constraint).

Only then, if the capability is not found, we immediately perform the
server checks that depend on the proxy mode, else the check is postponed
and it will automatically be performed during postparsing thanks to the
REGISTER_POST_SERVER_CHECK() hook.

Note that we remove the SRV_PARSE_IN_LOG_BE flag because it was introduced
in the above commit and it is no longer relevant.

No backport needed unless 9a74a6c gets backported.
2023-10-25 11:59:27 +02:00
Willy Tarreau
55d2fc0c02 DEBUG: mux-h2/flags: fix list of h2c flags used by the flags decoder
The two recent commits below each added one flag to h2c but omitted to
update the __APPEND_FLAG macro used by dev/flags so they are not
properly decoded:

  3dd963b35 ("BUG/MINOR: mux-h2: fix http-request and http-keep-alive timeouts again")
  68d02e5fa ("BUG/MINOR: mux-h2: make up other blocked streams upon removal from list")

This can be backported along with these commits.
2023-10-25 11:44:54 +02:00
Amaury Denoyelle
f76e94d231 MINOR: backend: refactor insertion in avail conns tree
Define a new function srv_add_to_avail_list(). This function is used to
centralize connection insertion in available tree. It reuses a BUG_ON()
statement to ensure the connection is not present in the idle list.
2023-10-25 10:33:06 +02:00
Amaury Denoyelle
f70cf28539 MINOR: listener: forbid most keywords for reverse HTTP bind
Reverse HTTP bind is very specific in that it relies on a server to
initiate the connection. All connection settings are defined on the server
line and ignored from the bind line.

Before this patch, most keywords were silently ignored. This could
result in a configuration doing unexpected things from the user's point
of view. To improve this situation, add a new 'rhttp_ok' field in the
bind_kw structure. If not set, the keyword is forbidden on a reverse
bind line and will cause a fatal config error.

For the moment, only the following keywords are usable with a reverse
bind: 'id', 'name' and 'nbconn'.

This change is safe as it's already forbidden to mix reverse and
standard addresses on the same bind line.
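
As a purely illustrative sketch (the backend/server names are hypothetical
and the rhttp@ address form is assumed from the reverse HTTP documentation),
a reverse bind restricted to the allowed keywords could look like:

    frontend fe_reverse
        bind rhttp@be_edge/edge1 id 10 name rev1 nbconn 4

    backend be_edge
        server edge1 192.0.2.10:443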
2023-10-20 17:28:08 +02:00
Amaury Denoyelle
e05edf71df MINOR: cfgparse: rename "rev@" prefix to "rhttp@"
'rev@' was used to specify a bind/server used with reverse HTTP
transport. This notation was deemed not explicit enough. Rename it
'rhttp@' instead.
2023-10-20 14:44:37 +02:00
Amaury Denoyelle
9d4c7c1151 MINOR: server: convert @reverse to rev@ standard format
Remove the recently introduced '@reverse' notation for HTTP reverse
servers. Instead, reuse the 'rev@' prefix already defined for bind
lines.
2023-10-20 14:44:37 +02:00
Amaury Denoyelle
3222047a14 MINOR: listener: add nbconn kw for reverse connect
Previously, the maxconn keyword was reused for a specific purpose on
reverse HTTP binds, to specify the number of connections to actively
pre-establish. To avoid confusion, introduce a new dedicated keyword
'nbconn' which is specific to reverse HTTP binds.

This new keyword is forbidden for non-reverse listeners. A fatal error is
emitted during config parsing if this rule is not respected. It's safe
because it's also forbidden to mix standard and reverse addresses on the
same bind line.

Internally, the nbconn value will be reassigned to the 'maxconn' member of
the bind_conf structure. This ensures that the listener layer will
automatically re-enable the preconnect task each time a connection is closed.
2023-10-20 14:44:37 +02:00
Amaury Denoyelle
37d7e52cc6 MINOR: cfgparse: forbid mixing reverse and standard listeners
Reverse HTTP listeners are very specific and share only a very limited
subset of keywords with other listeners. As such, it is probably
meaningless to mix standard and reverse addresses on the same bind line.
This patch emits a fatal error during configuration parsing if this is
the case.
2023-10-20 14:44:37 +02:00
Christopher Faulet
60e7116be0 BUG/MEDIUM: peers: Fix synchro for huge number of tables
The number of updates sent at once was limited to avoid looping too long
to emit updates when the buffer size is huge or when the number of sync
tables is huge. The limit can be configured and is set to 200 by default.
However, this fix introduced a bug. It is impossible to synchronize two
peers if the number of tables is higher than this limit. Thus by default,
it is not possible to sync two peers if there are more than 200 tables to
sync.

Technically speaking, a teaching process is finished if we loop over all
tables with no new update messages sent. Because we are limited at each
call, the loop is split across several calls. However, the restart point
for the next loop is always the last table for which we emitted an update
message. Thus with more tables than the limit, the loop never reaches the
end point.

Worse, in conjunction with the bug fixed by "BUG/MEDIUM: peers: Be sure to
always refresh recconnect timer in sync task", it is possible to trigger the
watchdog because the applets may be woken up in a loop and leave requesting
more room while their buffer is empty.

To fix the issue, restart conditions for a teaching loop were changed. If
the teach process is interrupted, we now save the restart point, called
stop_local_table. It is the last evaluated table on the previous loop. This
restart point is reset when the teach process is finished.

In addition, the updates_sent variable in peer_send_msgs() was renamed to
updates to avoid ambiguities. Indeed, the variable is incremented whether
messages were sent or not.

This patch must be backported as far as 2.6.
2023-10-20 14:32:12 +02:00
Willy Tarreau
3dd963b35f BUG/MINOR: mux-h2: fix http-request and http-keep-alive timeouts again
Stefan Behte reported that since commit f279a2f14 ("BUG/MINOR: mux-h2:
refresh the idle_timer when the mux is empty"), the http-request and
http-keep-alive timeouts don't work anymore on H2. Before this patch,
and since 3e448b9b64 ("BUG/MEDIUM: mux-h2: make sure control frames do
not refresh the idle timeout"), they would only be refreshed after stream
frames were sent (HEADERS or DATA) but the patch above that adds more
refresh points broke these so they don't expire anymore as long as
there's some activity.

We cannot just revert the fix since it also addressed an issue by which
sometimes the timeout would trigger too early and provoke truncated
responses. The right approach here is in fact to only refresh the idle
timer once the mux buffer has been flushed of any such stream frames.

In order to achieve this, we're now setting a flag on the connection
whenever we write a stream frame, and we consider that flag when deciding
to refresh the idle timer after the buffer is emptied. This way we'll only
clear that flag once the buffer is empty and there were stream data in it,
not if there were no such stream data. In theory it remains possible to
leave the flag on if some control data is appended after the buffer and it's
never cleared, but in practice it's not a problem as a buffer will always
get sent in large blocks when the window opens. Even a large buffer should
be emptied once in a while as control frames will not fill it as much as
data frames could.

Given the patch above was backported as far as 2.6, this patch should
also be backported as far as 2.6.
2023-10-18 17:17:58 +02:00
Willy Tarreau
91ed52976c MINOR: dgram: allow to set rcv/sndbuf for dgram sockets as well
tune.rcvbuf.client and tune.rcvbuf.server are not suitable for shared
dgram sockets because they're per connection so their units are not the
same. However, QUIC's listener and log servers are not connected and
take per-thread or per-process traffic where a socket log buffer might
be too small, causing undesirable packet losses and retransmits in the
case of QUIC. This essentially manifests in listener mode with new
connections taking a lot of time to set up under heavy traffic due to
the small queues causing delays. Let's add a few new settings allowing
these shared socket sizes to be set on the frontend and backend side (the
naming reminds us that these are per-frontend/backend and not per
client/server, hence not per connection).
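
A minimal sketch of how such settings might look (the keyword names,
assumed here to be tune.rcvbuf.frontend/backend and
tune.sndbuf.frontend/backend, and the sizes are illustrative):

    global
        tune.rcvbuf.frontend 1048576
        tune.sndbuf.backend  1048576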
2023-10-18 17:01:19 +02:00
Christopher Faulet
203211f4cb REORG: stconn/muxes: Rename init step in fast-forwarding
Instead of speaking of an initialisation stage for each data
fast-forwarding, we now use the 'negotiate' term. Thus the init_ff/init_fastfwd
functions were renamed nego_ff/nego_fastfwd.
2023-10-18 12:46:55 +02:00
Christopher Faulet
023564b685 MINOR: global: Add an option to disable the zero-copy forwarding
The zero-copy forwarding, or mux-to-mux forwarding, is a way to
fast-forward data without using the channels' buffers. Data are transferred
from one mux to the other. Kernel splicing is an optimization of the
zero-copy forwarding. But it can also use normal buffers (though not channel
ones). This way, it could be possible to fast-forward data with muxes not
supporting kernel splicing (H2 and H3 muxes) but also with applets.

However, this mode can introduce regressions or bugs in the future (just like
the kernel splicing). Thus, it could be useful to disable this optimization. To
do so, in the configuration, the global tune setting
'tune.disable-zero-copy-forwarding' may be set in a global section, or the
'-dZ' command line parameter may be used to start HAProxy. Of course, this
also disables the kernel splicing.
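
As a sketch, the optimization may then be disabled either from the
configuration or at startup (the config path is just an example):

    global
        tune.disable-zero-copy-forwarding

    # or equivalently on the command line:
    #   haproxy -f /etc/haproxy/haproxy.cfg -dZ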
2023-10-17 18:51:13 +02:00
Christopher Faulet
322d660d08 MINOR: tree-wide: Only rely on co_data() to check channel emptyness
Because the channel_is_empty() function now only checks the channel's
buffer, we can remove it and rely on co_data() instead. Of course, all tests
must be inverted.

channel_is_empty() is thus removed.
2023-10-17 18:51:13 +02:00
Christopher Faulet
20c463955d MEDIUM: channel: don't look at iobuf to report an empty channel
It is important to split channels and I/O buffers. When data are pushed in
an I/O buffer, we consider them as forwarded. The channel never sees
them. Fast-forwarded data are now handled in the SE only.
2023-10-17 18:51:13 +02:00
Christopher Faulet
2d80eb5b7a MEDIUM: mux-h1: Add fast-forwarding support
The H1 multiplexer now implements callback functions to produce and consume
fast-forwarded data.
2023-10-17 18:51:13 +02:00
Christopher Faulet
91f1c5519a MEDIUM: raw-sock: Specifiy amount of data to send via snd_pipe callback
When data were sent using kernel splicing, we tried to send all data
with no restriction. Most of the time this is valid. However, because the
payload representation may differ between the producer and the consumer, it
is important to be able to specify how much data to send via splicing.

Of course, for performance reasons, it is important to maximize the amount
of data sent via splicing at each call. However, on edge cases, this can
now be limited.
2023-10-17 18:51:13 +02:00
Christopher Faulet
7ffb7624fe MINOR: connection: Remove mux callbacks about splicing
The kernel splicing support was totally removed while waiting for the
mux-to-mux fast-forward implementation, so the corresponding mux callbacks
can be removed now.
2023-10-17 18:51:13 +02:00
Christopher Faulet
8b89fe3d8f MINOR: stconn: Temporarily remove kernel splicing support
mux-to-mux fast-forwarding will be added. To avoid mixing it with splicing
and to simplify the commits, the kernel splicing support is removed from the
stconn. The CF_KERN_SPLICING flag is removed and the support is no longer
tested in process_stream().

In the stconn part, the rcv_pipe() callback function is no longer called.

Reg-tests scripts testing the kernel splicing are temporarily marked as
broken.
2023-10-17 18:51:13 +02:00
Christopher Faulet
242c6f0ded MINOR: connection: Add new mux callbacks to perform data fast-forwarding
To perform the mux-to-mux data fast-forwarding, 4 new callbacks were added
into the mux_ops structure. 2 callbacks will be used from the stconn to
fast-forward data. The 2 other callbacks will be used by the endpoint to
request an iobuf from the opposite endpoint.

 * fastfwd() callback function is used by a producer to forward data

 * resume_fastfwd() callback function is used by a consumer if some data are
   blocked in the iobuf, to resume the data forwarding.

 * init_fastfwd() must be used by an endpoint (the producer one), inside the
   fastfwd() callback, to request an iobuf from the opposite side (the consumer
   one).

 * done_fastfwd() must be used by an endpoint (the producer one) at the end
   of fastfwd() to notify the opposite endpoint (the consumer one) if data
   were forwarded or not.

This API is still under development, so it may evolve, especially when the
fast-forwarding is extended to applets.

2 helper functions were also added into the SE API to wrap the init_fastfwd()
and done_fastfwd() callback functions of the underlying endpoint.

For now, this API is unused and not implemented at all in muxes.
2023-10-17 18:51:13 +02:00
Christopher Faulet
1d68bebb70 MINOR: stconn: Extend iobuf to handle a buffer in addition to a pipe
It is unused for now, but the iobuf structure now owns a pointer to a
buffer. This buffer will be used to perform mux-to-mux fast-forwarding when
splicing is not supported or unusable. This pointer should be filled by an
endpoint to let the opposite one forward data.

Extra fields, in addition to the buffer, are mandatory because the buffer
may already contain some data. The ".offset" field may be used as the
position from which to start copying data. Finally, the amount of data
copied into this buffer must be saved in the ".data" field.

Some flags are also added to prepare the next changes. And helper stconn
functions are updated to also count data in the buffer. For a first
implementation, it is not planned to handle data in the buffer and in the
pipe at the same time. But it will be possible to do so.
2023-10-17 18:51:13 +02:00
Christopher Faulet
e52519ac83 MINOR: stconn: Start to introduce mux-to-mux fast-forwarding notion
Instead of talking about kernel splicing at stconn/sedesc level, we now try
to talk about mux-to-mux fast-forwarding. To do so, 2 functions were added
to know if there are fast-forwarded data and to retrieve this amount of
data. Of course, for now, there is only data in a pipe.

In addition, some flags were renamed to reflect this notion. Note the
channel's documentation was not updated yet.
2023-10-17 18:51:13 +02:00
Christopher Faulet
8bee0dcd7d MEDIUM: stconn/channel: Move pipes used for the splicing in the SE descriptors
The pipes used to store data when kernel splicing is in use are moved into
the SE descriptors. For now, it is just a simple replacement but there is a
major difference with the pipes in the channel. The data are pushed into the
consumer's pipe whereas they used to be pushed into the producer's pipe. So
it means the request data are now pushed into the pipe of the backend SE
descriptor and response data are pushed into the pipe of the frontend SE
descriptor.

The idea is to hide the pipe from the channel/SC side and to be able to
handle fast-forwarding in a pipe but also in a buffer. To do so, the pipe is
inside a new entity, called iobuf. This entity will be extended.
2023-10-17 18:51:13 +02:00
Willy Tarreau
68d02e5fa9 BUG/MINOR: mux-h2: make up other blocked streams upon removal from list
An interesting issue was met when testing the mux-to-mux forwarding code.

In order to preserve fairness, in h2_snd_buf() if other streams are waiting
in send_list or fctl_list, the stream that is attempting to send also goes
to its list, and will be woken up by h2_process_mux() or h2_send() when
some space is released. But on rare occasions, there are only a few (or
even a single) streams waiting in this list, and these streams are just
quickly removed because of a timeout or a quick h2_detach() that calls
h2s_destroy(). In this case there's no event to wake up the other waiting
stream in its list, and processing will possibly only resume after some
client WINDOW_UPDATE frames or even new streams, so usually it doesn't
last too long and is not very noticeable, which is why it was left like
that for so long. In addition, measures have shown that in heavy
network-bound benchmarks, this exact situation happens on less than 1% of
the streams (and reached 4% with mux-to-mux).

The fix here consists in replacing these LIST_DEL_INIT() calls on
h2s->list with a function call that checks if other streams were queued
to the send_list recently, and if so, also tries to resume them by
calling h2_resume_each_sending_h2s(). The detection of late additions
is made via a new flag on the connection, H2_CF_WAIT_INLIST, which is set
when a stream is queued due to other streams being present, and which is
cleared when this function is called.

It is particularly difficult to reproduce this case which is particularly
timing-dependent, but in a constrained environment, a test involving 32
conns of 20 streams each, all downloading a 10 MB object previously
showed a limitation of 17 Gbps with lots of idle CPU time, and now
filled the cable at 25 Gbps.

This should be backported to all versions where it applies.
2023-10-17 16:43:44 +02:00
Aurelien DARRAGON
94d0f77deb MINOR: server: introduce "log-bufsize" kw
"log-bufsize" may now be used for a log server (in a log backend) to
configure the bufsize of implicit ring associated to the server (which
defaults to BUFSIZE).
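
A minimal sketch (server name, address and size are hypothetical, and the
udp@ address assumes a datagram log target):

    backend be_syslog
        mode log
        server s1 udp@192.0.2.1:514 log-bufsize 65536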
2023-10-13 10:05:07 +02:00
Aurelien DARRAGON
b30bd7adba MEDIUM: log/balance: support for the "hash" lb algorithm
The hash lb algorithm can be configured with the "log-balance hash <cnv_list>"
directive. With this algorithm, the user specifies a converter list with
<cnv_list>.

The produced log message will be passed as-is to the provided converter
list, and the resulting hash will be used to select the log server that
will receive the log message.
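
A minimal sketch, assuming a converter list made of a single hash
converter (names and addresses are hypothetical):

    backend be_logs
        mode log
        log-balance hash crc32
        server s1 udp@192.0.2.1:514
        server s2 udp@192.0.2.2:514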
2023-10-13 10:05:06 +02:00
Aurelien DARRAGON
7251344748 MINOR: sample: add sample_process_cnv() function
Split sample_process() into 2 parts in order to be able to only process
the converter part of a sample expression from an existing input sample
struct passed as a parameter.
2023-10-13 10:05:06 +02:00
Aurelien DARRAGON
a7563158f7 MINOR: lbprm: support for the "none" hash-type function
Allow the use of the "none" hash-type function so that the key resulting
from the sample expression is directly used as the hash.

This can be useful to do the hashing manually using available hashing
converters, or even custom ones, and then inform haproxy that it can
directly rely on the sample expression result, which is explicitly handled
as an integer in this case.
2023-10-13 10:05:06 +02:00
Aurelien DARRAGON
9a74a6cb17 MAJOR: log: introduce log backends
Using "mode log" in a backend section turns the proxy in a log backend
which can be used to log-balance logs between multiple log targets
(udp or tcp servers)

log backends can be used as regular log targets using the log directive
with "backend@be_name" prefix, like so:

  | log backend@mybackend local0

A log backend will distribute log messages to servers according to the
log load-balancing algorithm that can be set using the "log-balance"
option from the log backend section. For now, only the roundrobin
algorithm is supported and set by default.
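
Putting it together, a minimal sketch could look like this (names and
addresses are hypothetical, and the udp@ server addresses assume datagram
log targets):

    backend mylogs
        mode log
        server s1 udp@192.0.2.1:514
        server s2 udp@192.0.2.2:514

    frontend fe_main
        mode http
        bind :8080
        log backend@mylogs local0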
2023-10-13 10:05:06 +02:00
Aurelien DARRAGON
e58a9b4baf MINOR: sink: add sink_new_from_srv() function
This helper function can be used to create a new sink from an existing
server struct (and thus existing proxy as well), in order to spare some
resources when possible.
2023-10-13 10:05:06 +02:00
Aurelien DARRAGON
5c0d1c1a74 MEDIUM: sink: inherit from caller fmt in ring_write() when rings didn't set one
Implicit rings were automatically forced to the parent logger format, but
this was done upon ring creation.

This is quite restrictive because we might want to choose the desired
format right before generating the log header (ie: when producing the
log message), depending on the logger (log directive) that is
responsible for the log message, and with the current logic this is not
possible. (To this day, we still have a dedicated implicit ring per log
directive, but this might change.)

In ring_write(), we check if the sink->fmt is specified:
 - defined: we use it since it is the most precise format
   (ie: for named rings)
 - undefined: then we fallback to the format from the logger

With this change, implicit rings' format is now set to UNSPEC upon
creation. This is safe because the log header building function
automatically enforces the "raw" format when UNSPEC is set. And since
logger->format also defaults to "raw", no change of default behavior
should be expected.
2023-10-13 10:05:06 +02:00
Aurelien DARRAGON
6dad0549a5 MEDIUM: log/sink: simplify log header handling
Introduce log_header struct to easily pass log header data between
functions and use that to simplify the logic around log header
handling.

While at it, some outdated comments were updated as well.

No change in behavior should be expected.
2023-10-13 10:05:06 +02:00
Aurelien DARRAGON
a9b185f34e MEDIUM: log: introduce log target
Log targets were immediately embedded in the logger struct (previously
named logsrv) and could not be used outside of this context.

In this patch, we're introducing log_target type with the associated
helper functions so that it becomes possible to declare and use log
targets outside of loggers scope.
2023-10-13 10:05:06 +02:00
Aurelien DARRAGON
18da35c123 MEDIUM: tree-wide: logsrv struct becomes logger
When 'log' directive was implemented, the internal representation was
named 'struct logsrv', because the 'log' directive would directly point
to the log target, which used to be a (UDP) log server exclusively at
that time, hence the name.

But things have become more complex, since today the 'log' directive can
point to ring targets (implicit, or named) for example.

Indeed, a 'log' directive no longer references the "final" server to
which the log will be sent, but instead it describes which log API and
parameters to use for transporting the log messages to the proper log
destination.

So now the term 'logsrv' is rather confusing and prevents us from
introducing a new level of abstraction because it would be mixed
with logsrv.

So in order to better designate this 'log' directive, and make it more
generic, we chose the word 'logger' which now replaces logsrv everywhere
it was used in the code (including related comments).

This is internal rewording, so no functional change should be expected
on user-side.
2023-10-13 10:05:06 +02:00
Amaury Denoyelle
7d76ffb2a4 BUG/MINOR: quic: fix qc.cids access on quic-conn fail alloc
The CIDs tree is now allocated dynamically since the following commit:
  276697438d
  MINOR: quic: Use a pool for the connection ID tree.

This could cause a crash if qc_new_conn() is interrupted due to an
intermediate failed allocation. When freeing all connection members,
free_quic_conn_cids() is used. However, this function does not support a
NULL cids tree.

To fix this, simply check whether cids is NULL in the free_quic_conn_cids()
prologue.

This bug was reproduced using -dMfail.

No need to backport.
2023-10-13 08:52:16 +02:00
Willy Tarreau
5798b5bb14 BUG/MAJOR: connection: make sure to always remove a connection from the tree
Since commit 5afcb686b ("MAJOR: connection: purge idle conn by last usage")
in 2.9-dev4, the test on conn->toremove_list added to conn_get_idle_flag()
in 2.8 by commit 3a7b539b1 ("BUG/MEDIUM: connection: Preserve flags when a
conn is removed from an idle list") becomes misleading. Indeed, now both
toremove_list and idle_list are shared by a union since the presence in
these lists is mutually exclusive. However, in conn_get_idle_flag() we
check for the presence in the toremove_list to decide whether or not to
delete the connection from the tree. This test now fails because instead
it sees the presence in the idle or safe list via the union, and concludes
the element must not be removed. Thus the element remains in the tree and
can be found later after the connection is released, causing crashes that
Tristan reported in issue #2292.

The following config is sufficient to reproduce it with 2 threads:

   defaults
        mode http
        timeout client 5s
        timeout server 5s
        timeout connect 1s

   listen front
        bind :8001
        server next 127.0.0.1:8002

   frontend next
        bind :8002
        timeout http-keep-alive 1
        http-request redirect location /

Sending traffic with a few concurrent connections and some short timeouts
suffices to instantly crash it after ~10k reqs:

   $ h2load -t 4 -c 16 -n 10000 -m 1 -w 1 http://0:8001/

With Amaury we analyzed the conditions in which the function is called
in order to figure out a better condition for the test, and concluded that
->toremove_list is never filled there, so we can safely remove that part
from the test and just move the flag retrieval back to what it was prior
to the 2.8 patch above. Note that the patch is not reverted though, as
the parts that would drop the unexpected flags removal are unchanged.

This patch must NOT be backported. The code in 2.8 works correctly, it's
only the change in 2.9 that makes it misbehave.
2023-10-12 14:20:03 +02:00
Amaury Denoyelle
f59f8326f9 REORG: quic: cleanup traces definition
Move all QUIC trace definitions from quic_conn.h to quic_trace-t.h. Also
move the trace_quic macro definition into quic_trace.h to avoid multiple
definitions. This forces all QUIC source files that rely on traces to
include it, while reducing the size of quic_conn.h.
2023-10-11 14:15:31 +02:00
Frédéric Lécaille
bd83b6effb BUG/MINOR: quic: Avoid crashing with unsupported cryptographic algos
This bug was detected when compiling haproxy against the aws-lc TLS stack
during QUIC interop runner tests. Some algorithms could be negotiated by haproxy
through the TLS stack but not fully supported by haproxy's QUIC implementation.
This led tls_aead() to return NULL (same thing for tls_md() and tls_hp()).
As these functions' return values were never checked, this could trigger
segfaults.

To fix this, the connection is closed as soon as possible with a
handshake_failure(40) TLS alert. Note that as the TLS stack successfully
negotiates an algorithm, it provides haproxy with CRYPTO data before entering
the ->set_encryption_secrets() callback. This is why this callback
(ha_set_encryption_secrets() on haproxy side) is modified to release all
the CRYPTO frames before triggering a CONNECTION_CLOSE with a TLS alert. This is
done by calling qc_release_pktns_frms() for all the packet number spaces.
Modify some quic_tls_keys_hexdump to avoid crashes when the ->aead or ->hp EVP_CIPHER
are NULL.
Modify qc_release_pktns_frms() to do nothing if the packet number space passed
as parameter is not initialized.

This bug does not impact the QUIC TLS compatibility mode (USE_QUIC_OPENSSL_COMPAT).

Thank you to @ilia-shipitsin for having reported this issue in GH #2309.

Must be backported as far as 2.6.
2023-10-11 11:52:22 +02:00
William Lallemand
deed2b6077 BUILD: ssl: enable keylog for WolfSSL
Enable the keylog feature when linking against a WolfSSL library which
has the 'HAVE_SECRET_CALLBACK' define.

Only supports <= TLSv1.2 secret dump.
2023-10-09 21:34:25 +02:00
William Lallemand
9a4c53d96c CLEANUP: ssl: remove compat functions for openssl < 1.0.0
The openssl-compat.h file has some functions which were implemented in
order to provide compatibility with openssl < 1.0.0. Most of them were
there to support the 0.9.8 version, but we don't support this version anymore.

This patch removes the deprecated code from openssl-compat.h
2023-10-09 17:27:53 +02:00
William Lallemand
1918bcbc12 BUILD: ssl: enable keylog for awslc
AWSLC supports SSL_CTX_set_keylog_callback(); this patch enables the
build with the keylog feature for this library.
2023-10-09 16:17:30 +02:00
William Lallemand
4428ac4f70 BUILD: ssl: add 'secure_memcmp' converter for WolfSSL and awslc
CRYPTO_memcmp is supported by both awslc and wolfssl, let's add
support for the 'secure_memcmp' converter to the build.
2023-10-09 15:44:50 +02:00
William Lallemand
bf426eecd7 BUILD: ssl: add 'ssl_c_r_dn' fetch for WolfSSL
WolfSSL supports SSL_get0_verified_chain() so we can activate this
feature.
2023-10-09 15:09:47 +02:00
William Lallemand
d75bc06bdc BUILD: ssl: enable 'ciphersuites' for WolfSSL
WolfSSL supports setting the 'ciphersuites', so let's enable the keyword
for it.
2023-10-09 14:56:43 +02:00
Willy Tarreau
1e3422e6b0 BUG/MEDIUM: actions: always apply a longest match on prefix lookup
Many actions take arguments after a parenthesis. When this happens, they
have to be tagged in the parser with KWF_MATCH_PREFIX so that a sub-word
is sufficient (since by default the whole block including the parenthesis
is taken).

The problem with this is that the parser stops on the first match. This
was OK years ago when there were very few actions, but over time new ones
were added and many actions are the prefix of another one (e.g. "set-var"
is the prefix of "set-var-fmt"). And what happens in this case is that the
first word is picked. Most often that doesn't cause trouble because such
similar-looking actions involve the same custom parser, so the wrong
selection of the first entry actually results in the correct parser being
used anyway and the error being silently hidden.

But it gets worse when prefixes are accidentally declared in multiple
files, because in this case it will solely depend on the object file link
order: if the longest name appears first, it will be properly detected,
but if it appears last, its other prefix will be detected and might very
well not be related at all and use a distinct parser. And this is random
enough to make some actions succeed or fail depending on the build options
that affect the linkage order. Worse: what if a keyword is the prefix of
another one, with a different parser but a compatible syntax? It could
seem to work by accident but not do the expected operations.

The correct solution is to always look for the longest matching name.
This way the correct keyword will always be matched and used and there
will be no risk of randomly picking the wrong one anymore.

This fix must be backported to the relevant stable releases.
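
As an illustration of the ambiguity being fixed (both actions exist; the
variable names and expressions are only examples), "set-var" is a strict
prefix of "set-var-fmt" and both take a parenthesized argument:

    http-request set-var(txn.host) req.hdr(host)
    http-request set-var-fmt(txn.req_line) %[method] %[path]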
2023-10-06 17:06:44 +02:00
Christopher Faulet
a633338b55 BUG/MEDIUM: stconn: Fix comparison sign in sc_need_room()
sc_need_room() function may be called with a negative value. In this case,
the intent is to be notified if any space was made in the channel buffer. In
the function, we get the min between the requested room and the maximum
possible room in the buffer, considering it may be an HTX buffer.

However this max value is unsigned and leads to an unsigned comparison,
casting the negative value to an unsigned value. Of course, in this case,
this always leads to the wrong result. This bug seems to have no effect but
it is hard to be sure.

To fix the issue, we take care to respect the requested room sign by casting
the max value to a signed integer.

This patch must be backported to 2.8.
2023-10-06 15:34:31 +02:00
Aurelien DARRAGON
205d480d9f MINOR: sink: refine forward_px usage
Now forward_px only serves as a hint to know if a proxy was created
specifically for the sink, in which case the sink is responsible for it.

Everywhere forward_px was used in an appctx context, get the parent proxy
from the sft->srv instead.

This finally permits getting rid of the double link dependency between
sink and proxy.
2023-10-06 15:34:31 +02:00
Willy Tarreau
90fa2eaa15 MINOR: haproxy: permit to register features during boot
The regtests are using the "feature()" predicate but this one can only
rely on build-time options. It would be nice if some runtime-specific
options could be detected at boot time so that regtests could more
flexibly adapt to what is supported (capabilities, splicing, etc).

Similarly, certain features that are currently enabled with USE_XXX
could also be automatically detected at build time using ifdefs and
would simplify the configuration, but then we'd lose the feature
report in the feature list which is convenient for regtests.

This patch makes sure that haproxy -vv shows the variable's contents
and not the macro's contents, and adds a new hap_register_feature()
to allow the code to register a new keyword.
2023-10-06 11:40:02 +02:00
Remi Tricot-Le Breton
a5e96425a2 MEDIUM: cache: Add "Origin" header to secondary cache key
This patch adds a hash of the Origin header to the cache's secondary key.
This makes it possible to store and manage responses that have a
"Vary: Origin" header in the cache when vary support is enabled.
This cannot be considered as a means to manage CORS requests though; it
only processes the Origin header and hashes the presented value without
any form of URI normalization.
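
As a sketch of the intended use case (assuming vary support is enabled in
the cache section with the existing process-vary option; names and sizes
are hypothetical):

    cache static_cache
        total-max-size 64
        process-vary on

Responses carrying "Vary: Origin" can then be stored, the presented Origin
value being hashed into the secondary cache key.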

This need was expressed by Philipp Hossner in GitHub issue #251.

Co-Authored-by: Philipp Hossner <philipp.hossner@posteo.de>
2023-10-05 10:53:54 +02:00
William Lallemand
45174e4fdc BUILD: quic: allow USE_QUIC to work with AWSLC
This patch fixes the build with AWSLC and USE_QUIC=1. This is only meant
to make the build possible for now; it is not feature complete.

The set_encryption_secrets callback has been split into set_read_secret
and set_write_secret.

Missing features:

- 0RTT was disabled.
- TLS1_3_CK_CHACHA20_POLY1305_SHA256, TLS1_3_CK_AES_128_CCM_SHA256 were disabled
- clienthello callback is missing, certificate selection could be
  limited (RSA + ECDSA at the same time)
2023-10-04 16:55:19 +02:00
Christopher Faulet
f32e28eddc MINOR: mux-h1: Add flags if outgoing msg contains a header about its payload
If a "Content-length" or "Transfer-Encoding; chunked" headers is found or
inserted in an outgoing message, a specific flag is now set on the H1
stream. H1S_F_HAVE_CLEN is set for "Content-length" header and
H1S_F_HAVE_CHNK for "Transfer-Encoding: chunked".

This will be useful to properly format outgoing messages, even if one of
these headers was removed by hand (with no update of the message meta-data).
2023-10-04 15:34:18 +02:00
Amaury Denoyelle
bd001ff346 MINOR: backend: refactor specific source address allocation
Refactor alloc_bind_address() function which is used to allocate a
sockaddr if a connection to a target server relies on a specific source
address setting.

The main objective of this change is to be able to use this function
outside of backend module, namely for preconnections using a reverse
server. As such, this function is now exported globally.

For reverse connect, there is no stream instance. As such, the function
parts which relied on it were reduced to the minimum. Now, the stream is
only used if a non-static address is configured, which is useful for
usesrc client|clientip|hdr_ip. These options make no sense for reverse
connect, so it should be safe to use the same function.
2023-10-03 17:49:12 +02:00
Amaury Denoyelle
2ac5d9a657 MINOR: quic: handle perm error on bind during runtime
Improve EACCES permission errors encountered when using a QUIC connection
socket at runtime:

* The first occurrence of the error on the process will generate a log
  warning. This should prevent users from using a privileged port
  without the mandatory access rights.

* Socket mode will automatically fall back to listener socket for the
  receiver instance. This requires duplicating the settings from the
  bind_conf to the receiver instance to support configurations with
  multiple addresses on the same bind line.
2023-10-03 16:52:02 +02:00
Amaury Denoyelle
3ef6df7387 MINOR: quic: define quic-socket bind setting
Define a new bind option quic-socket :
  quic-socket [ connection | listener ]

This new setting works in conjunction with the existing global
configuration tune.quic.socket-owner and reuses the same semantics.

The purpose of this setting is to allow disabling connection socket
usage on listener instances individually. This will notably be useful
to deactivate it when a fatal permission error is encountered on bind()
at runtime.
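
A minimal sketch of a per-bind override (the certificate path and address
are hypothetical):

    frontend fe_quic
        mode http
        bind quic4@:443 ssl crt /etc/haproxy/site.pem alpn h3 quic-socket listener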
2023-10-03 16:49:26 +02:00
Willy Tarreau
7c69c9b51f BUG/MAJOR: plock: fix major bug in pl_take_w() introduced with EBO
When EBO was brought to pl_take_w() by plock commit 60d750d ("plock: use
EBO when waiting for readers to leave in take_w() and stow()"), a mistake
was made: the mask against which the current value of the lock is tested
excludes the first reader like in stow(), but it must not because it was
just obtained via an ldadd() which means that it doesn't count itself.

The problem this causes is that if there is exactly one reader when a
writer grabs the lock, the writer will not wait for it to leave before
starting its operations.

The solution consists in checking for any reader in the IF. However the
mask passed to pl_wait_unlock_*() must still exclude the lowest bit as
it's verified after a subsequent load.

Kudos to Remi Tricot-Le Breton for reporting and bisecting this issue
with a reproducer.

No backport is needed since this was brought in 2.9-dev3 with commit
8178a5211 ("MAJOR: threads/plock: update the embedded library again").
The code is now on par with plock commit ada70fe.
2023-10-03 08:28:12 +02:00
Amaury Denoyelle
337c71423f MINOR: connection: define mux flag for reverse support
Add a new MUX flag MX_FL_REVERSABLE. This value is used to indicate that
a MUX instance supports connection reversal. For the moment, only the HTTP/2
multiplexer is flagged with it.

This allows to dynamically check if reversal can be completed during MUX
installation. This will allow relaxing the configuration requirements for
'tcp-request session attach-srv', which currently cannot be mixed with
non-http/2 listener instances, even if used conditionally with an
ACL.
2023-09-29 18:09:08 +02:00
Amaury Denoyelle
ac1164de7c MINOR: connection: define error for reverse connect
Define a new connection error code CO_ER_REVERSE. This will be used
to report an issue which happens on a connection targeted for reversal
before the reverse process is completed.
2023-09-29 18:08:26 +02:00
Emeric Brun
3c250cb847 Revert "BUG/MEDIUM: quic: missing check of dcid for init pkt including a token"
This reverts commit 072e774939.

Doing h2load with h3 tests we notice this behavior:

Client ---- INIT no token SCID = a , DCID = A ---> Server (1)
Client <--- RETRY+TOKEN DCID = a, SCID = B    ---- Server (2)
Client ---- INIT+TOKEN SCID = a , DCID = B    ---> Server (3)
Client <--- INIT DCID = a, SCID = C           ---- Server (4)
Client ---- INIT+TOKEN SCID = a, DCID = C     ---> Server (5)

With (5) dropped by haproxy due to token validation.

Indeed the previous patch adds the SCID of the retry packet sent to the
AAD used to cipher the token. It was useful to validate that the next INIT
packets including the token are sent by the client using the newly
provided SCID as DCID, as mentioned in RFC 9000.
But this stateless information is lost on received INIT packets
following the first outgoing INIT packet from the server, because
the client is also supposed to re-use a second time the latest
received SCID for its new DCID. This breaks the token validation
on those last packets and they will be dropped by haproxy.

It was discussed there:
https://mailarchive.ietf.org/arch/msg/quic/7kXVvzhNCpgPk6FwtyPuIC6tRk0/

To sum up: it is not the role of the server to verify the re-use of the
retry's SCID as DCID in further client INIT packets.

The previous patch must be reverted in all versions where it was
backported (presumably as far as 2.6).
2023-09-29 09:27:22 +02:00
Willy Tarreau
d956db6638 CLEANUP: stream: remove the now unused stream_dump() function
It was superseded by strm_dump_to_buffer() which provides much more
complete information and supports anonymizing.
2023-09-29 09:20:27 +02:00
Willy Tarreau
c185bc4656 MEDIUM: stream: now provide full stream dumps in case of loops
When a stream is caught looping, we produce some output to help figure
out its internal state and explain why it's looping. The problem is that
this debug output is quite old and the info it provides is quite
insufficient to debug a modern process, and since such bugs happen only
once or twice a year the situation doesn't improve.

On the other hand the output of "show sess all" is extremely detailed
and kept up to date with code evolutions since it's a heavily used
debugging tool.

This commit replaces the call to the totally outdated stream_dump() with
a call to strm_dump_to_buffer(), and removes the filters dump since they
are already emitted there, and it now produces much more exploitable
output:

  [ALERT]    (5936) : A bogus STREAM [0x7fa8dc02f660] is spinning at 5653514 calls per second and refuses to die, aborting now! Please report this error to developers:
  0x7fa8dc02f660: [28/Sep/2023:09:53:08.811818] id=2 proto=tcpv4 source=127.0.0.1:58306
     flags=0xc4a, conn_retries=0, conn_exp=<NEVER> conn_et=0x000 srv_conn=0x133f220, pend_pos=(nil) waiting=0 epoch=0x1
     frontend=public (id=2 mode=http), listener=? (id=1) addr=127.0.0.1:4080
     backend=public (id=2 mode=http) addr=127.0.0.1:61932
     server=s1 (id=1) addr=127.0.0.1:7443
     task=0x7fa8dc02fa40 (state=0x01 nice=0 calls=5749559 rate=5653514 exp=3s tid=1(1/1) age=1s)
     txn=0x7fa8dc02fbf0 flags=0x3000 meth=1 status=-1 req.st=MSG_DONE rsp.st=MSG_RPBEFORE req.f=0x4c rsp.f=0x00
     scf=0x7fa8dc02f5f0 flags=0x00000482 state=EST endp=CONN,0x7fa8dc02b4b0,0x05004001 sub=1 rex=58s wex=<NEVER>
         h1s=0x7fa8dc02b4b0 h1s.flg=0x100010 .sd.flg=0x5004001 .req.state=MSG_DONE .res.state=MSG_RPBEFORE
          .meth=GET status=0 .sd.flg=0x05004001 .sc.flg=0x00000482 .sc.app=0x7fa8dc02f660
          .subs=0x7fa8dc02f608(ev=1 tl=0x7fa8dc02fae0 tl.calls=0 tl.ctx=0x7fa8dc02f5f0 tl.fct=sc_conn_io_cb)
          h1c=0x7fa8dc0272d0 h1c.flg=0x0 .sub=0 .ibuf=0@(nil)+0/0 .obuf=0@(nil)+0/0 .task=0x7fa8dc0273f0 .exp=<NEVER>
         co0=0x7fa8dc027040 ctrl=tcpv4 xprt=RAW mux=H1 data=STRM target=LISTENER:0x12840c0
         flags=0x00000300 fd=32 fd.state=20 updt=0 fd.tmask=0x2
     scb=0x7fa8dc02fb30 flags=0x00001411 state=EST endp=CONN,0x7fa8dc0300c0,0x05000001 sub=1 rex=58s wex=<NEVER>
         h1s=0x7fa8dc0300c0 h1s.flg=0x4010 .sd.flg=0x5000001 .req.state=MSG_DONE .res.state=MSG_RPBEFORE
          .meth=GET status=0 .sd.flg=0x05000001 .sc.flg=0x00001411 .sc.app=0x7fa8dc02f660
          .subs=0x7fa8dc02fb48(ev=1 tl=0x7fa8dc02feb0 tl.calls=2 tl.ctx=0x7fa8dc02fb30 tl.fct=sc_conn_io_cb)
          h1c=0x7fa8dc02ff00 h1c.flg=0x80000000 .sub=1 .ibuf=0@(nil)+0/0 .obuf=0@(nil)+0/0 .task=0x7fa8dc030020 .exp=<NEVER>
         co1=0x7fa8dc02fcd0 ctrl=tcpv4 xprt=RAW mux=H1 data=STRM target=SERVER:0x133f220
         flags=0x10000300 fd=33 fd.state=10421 updt=0 fd.tmask=0x2
     req=0x7fa8dc02f680 (f=0x1840000 an=0x8000 pipe=0 tofwd=0 total=79)
         an_exp=<NEVER> buf=0x7fa8dc02f688 data=(nil) o=0 p=0 i=0 size=0
         htx=0xc18f60 flags=0x0 size=0 data=0 used=0 wrap=NO extra=0
     res=0x7fa8dc02f6d0 (f=0x80000000 an=0x1400000 pipe=0 tofwd=0 total=0)
         an_exp=<NEVER> buf=0x7fa8dc02f6d8 data=(nil) o=0 p=0 i=0 size=0
         htx=0xc18f60 flags=0x0 size=0 data=0 used=0 wrap=NO extra=0
    call trace(10):
    |       0x59f2b7 [0f 0b 0f 1f 80 00 00 00]: stream_dump_and_crash+0x1f7/0x2bf
    |       0x5a0d71 [e9 af e6 ff ff ba 40 00]: process_stream+0x19f1/0x3a56
    |       0x68d7bb [49 89 c7 4d 85 ff 74 77]: run_tasks_from_lists+0x3ab/0x924
    |       0x68e0b4 [29 44 24 14 8b 4c 24 14]: process_runnable_tasks+0x374/0x6d6
    |       0x656f67 [83 3d f2 75 84 00 01 0f]: run_poll_loop+0x127/0x5a8
    |       0x6575d7 [48 8b 1d 42 50 5c 00 48]: main+0x1b22f7
    | 0x7fa8e0f35e45 [64 48 89 04 25 30 06 00]: libpthread:+0x7e45
    | 0x7fa8e0e5a4af [48 89 c7 b8 3c 00 00 00]: libc:clone+0x3f/0x5a

Note that the output is subject to the global anon key so that IPs and
object names can be anonymized if required. It could make sense to
backport this and the few related previous patches next time such an
issue is reported.
2023-09-29 09:20:27 +02:00
Willy Tarreau
5743eeea88 MINOR: stream: make stream_dump() always multi-line
There used to be two working modes for this function, a single-line one
and a multi-line one, the difference being made on the "eol" argument
which could contain either a space or an LF (and with the prefix being
adjusted accordingly). Let's get rid of the single-line mode as it's
what limits the output contents because it's difficult to produce
exploitable structured data this way. It was only used in the rare case
of spinning streams and applets and these are the ones lacking info. Now
a spinning stream produces:

[ALERT]    (3511) : A bogus STREAM [0x227e7b0] is spinning at 5581202 calls per second and refuses to die, aborting now! Please report this error to developers:
  strm=0x227e7b0,c4a src=127.0.0.1 fe=public be=public dst=s1
  txn=0x2041650,3000 txn.req=MSG_DONE,4c txn.rsp=MSG_RPBEFORE,0
  rqf=1840000 rqa=8000 rpf=80000000 rpa=1400000
  scf=0x24af280,EST,482 scb=0x24af430,EST,1411
  af=(nil),0 sab=(nil),0
  cof=0x7fdb28026630,300:H1(0x24a6f60)/RAW((nil))/tcpv4(33)
  cob=0x23199f0,10000300:H1(0x24af630)/RAW((nil))/tcpv4(32)
  filters={}
  call trace(11):
  (...)
2023-09-29 09:20:27 +02:00
Willy Tarreau
48b2233d36 CLEANUP: freq_ctr: make all freq_ctr readers take a const
Since 2.4-dev18 with commit b4476c6a8 ("CLEANUP: freq_ctr: make
arguments of freq_ctr_total() const"), most of the freq_ctr readers
should be fine with a const, except that they were not updated to
reflect this and they continue to force variable on some functions
that call them. Let's update this. This could even be backported if
needed.
2023-09-29 09:20:27 +02:00
Vladimir Vdovin
f8b81f6eb7 MINOR: support for http-request set-timeout client
Added set-timeout for the frontend side of the session, so it can be used
to set custom per-client timeouts if needed. Added cur_client_timeout to
fetch client timeout samples.
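
A minimal sketch of both additions (the ACL and the log-format usage are
purely illustrative):

    frontend fe_main
        mode http
        bind :8080
        http-request set-timeout client 30s if { path_beg /slow }
        log-format "%ci:%cp cur_to=%[cur_client_timeout]"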
2023-09-28 08:49:22 +02:00
Amaury Denoyelle
b9bb3b932c MINOR: proto_reverse_connect: emit log for preconnect
Add reporting using send_log() for the preconnect operation. This is minimal
to ensure we understand the current status of a listener in active reverse
connect.

To limit the logging quantity, only important transitions are considered.
This requires implementing a minimal state machine as a new field in the
receiver structure.

Here are the logs produced :
* Initiating : first time preconnect is enabled on a listener
* Error : last preconnect attempt interrupted on a connection error
* Reaching maxconn : all necessary connections were reversed and are
  operational on a listener
2023-09-22 17:21:53 +02:00
Amaury Denoyelle
1f43fb71be MINOR: proto_reverse_connect: refactor preconnect failure
When a connection is freed during preconnect before reversal, the error
must be notified to the listener to remove any connection reference and
rearm a new preconnect attempt. Currently, this can occur through 2 code
paths :
* conn_free() called directly by H2 mux
* error during conn_create_mux(). For this case, connection is flagged
  with CO_FL_ERROR and reverse_connect task is woken up. The process
  task handler is then responsible to call conn_free() for such
  connection.

Duplicated steps were done both in conn_free() and the process task
handler. These are now removed. To facilitate code maintenance, the
dedicated operations have been centralized in a new function
rev_notify_preconn_err() which is called by conn_free().
2023-09-22 16:43:36 +02:00
Emeric Brun
27b2fd2e06 MINOR: quic: handle external extra CIDs generator.
This patch adds the ability to externalize and customize the code
of the computation of extra CIDs after the first one was derived from
the ODCID.

This is to prepare interoperability with extra components such as
different QUIC proxies or routers for instance.

To do so, the patch defines two function callbacks:
- the first one to compute a 64-bit hash from the first generated CID
  (itself continuing to be derived from the ODCID). The resulting hash is
  stored into the 'quic_conn'; 64 bits is chosen large enough to be able
  to store an entire haproxy CID.
- the second callback re-uses the previously computed hash to derive
  an extra CID using the custom algorithm. If not set, haproxy will
  continue to choose a randomized CID value.

Those two functions also have the 'cluster_secret' passed as an argument:
this way, it is usable for obfuscation or ciphering.
2023-09-22 10:32:14 +02:00
Aurelien DARRAGON
acb7d8a89c MINOR: pattern: fix pat_{parse,match}_ip() function comments
Function comments were outdated, probably because they have not been
updated during the previous refactors.

Fixing comments to better reflect the current behavior.

This may be backported up to 2.2, or even 2.0 by slightly adapting the
patch (in 2.0, such functions are documented in proto/pattern.h)
2023-09-21 09:50:55 +02:00
Willy Tarreau
cbbee15462 CLEANUP: ring: rename the ring lock "RING_LOCK" instead of "LOGSRV_LOCK"
The ring lock was initially mostly used for the logs and used to inherit
its name in lock stats. Now that it's exclusively used by rings, let's
rename it accordingly.
2023-09-20 21:38:33 +02:00
Willy Tarreau
cec8b42cb3 MEDIUM: logs: atomically check and update the log sample index
The log server lock is pretty visible in perf top when using log samples
because it's taken for each server in turn while trying to validate and
update the log server's index. Let's change this for a CAS, since we have
the index and the range at hand now. This allows us to remove the logsrv
lock.

The test on 4 servers now shows a 3.7 times improvement thanks to much
lower contention. Without log sampling a test producing 4.4M logs/s
delivers 4.4M logs/s at 21 CPUs used, everything spent in the kernel.
After enabling 4 samples (1:4, 2:4, 3:4 and 4:4), the throughput would
previously drop to 1.13M log/s with 37 CPUs used and 75% spent in
process_send_log(). Now with this change, 4.25M logs/s are emitted,
using 26 CPUs and 22% in process_send_log(). That's a 3.7x throughput
improvement for a 30% global CPU usage reduction, but in practice it
mostly shows that the performance drop caused by having samples is much
less noticeable (each of the 4 servers has its index updated for each
log).

Note that in order to even avoid incrementing an index for each log srv
that is consulted, it would be more convenient to have a single index
per frontend and apply the modulus on each log server in turn to see if
the range has to be updated. It would then only perform one write per
range switch. However the place where this is done doesn't have access
to a frontend, so some changes would need to be performed for this, and
it would require to update the current range independently in each
logsrv, which is not necessarily easier since we don't know yet if we
can commit it.
2023-09-20 21:38:33 +02:00
Willy Tarreau
e00470378b MINOR: logs: use a single index to store the current range and index
By using a single long long to store both the current range and the
next index, we'll make it possible to perform atomic operations instead
of locking. Let's only regroup them for now under a new "curr_rg_idx".
The upper word is the range, the lower is the index.
2023-09-20 21:38:33 +02:00
Willy Tarreau
3f1284560f MINOR: log: remove the unused curr_idx in struct smp_log_range
This index is useless because it only serves to know when the global
index reached the end, while the global one already knows it. Let's
just drop it and perform the test on the global range.

It was verified with the following config that the first server continues
to take 1/10 of the traffic, the 2nd one 2/10, the 3rd one 3/10 and the
4th one 4/10:

    log 127.0.0.1:10001 sample 1:10 local0
    log 127.0.0.1:10002 sample 2,5:10 local0
    log 127.0.0.1:10003 sample 3,7,9:10 local0
    log 127.0.0.1:10004 sample 4,6,8,10:10 local0
2023-09-20 21:38:33 +02:00
Willy Tarreau
4351364700 MINOR: logs: clarify the check of the log range
The test of the log range is not very clear, in part due to the
reuse of the "curr_idx" name that happens at two levels. The call
to in_smp_log_range() applies to the smp_info's index to which 1 is
added: it verifies that the next index is still within the current
range.

Let's just have a local variable "next_index" in process_send_log()
that gets assigned the next index (current+1) and compare it to the
current range's boundaries. This makes the test much clearer. We can
then simply remove in_smp_log_range() that's no longer needed.
2023-09-20 21:38:33 +02:00
Willy Tarreau
6cbb5a057b Revert "MAJOR: import: update mt_list to support exponential back-off"
This reverts commit c618ed5ff4.

The list iterator is broken. As found by Fred, running QUIC single-
threaded shows that only the first connection is accepted because the
accepter relies on the element being initialized once detached (which
is expected and matches what MT_LIST_DELETE_SAFE() used to do before).
However while doing this in the quic_sock code seems to work, doing it
inside the macro show total breakage and the unit test doesn't work
anymore (random crashes). Thus it looks like the fix is not trivial,
let's roll this back for the time it will take to fix the loop.
2023-09-15 17:13:43 +02:00
Willy Tarreau
e3b2704e26 BUG/MINOR: freq_ctr: fix possible negative rate with the scaled API
In 1.9 with commit 627505d36 ("MINOR: freq_ctr: add swrate_add_scaled()
to work with large samples") we got the ability to indicate when adding
some values that they represent a number of samples. However there is an
issue in the calculation which is that the number of samples that is
added to the sum before the division in order to avoid fading away too
fast, is multiplied by the scale. The problem it causes is that this is
done in the negative part of the expression, and that as soon if the sum
of old_sum and v*s is too small (e.g. zero), we end up with a negative
value of -s.

This is visible in "show pools" which occasionally report a very large
value on "needed_avg" since 2.9, though the bug has been there for longer.
Indeed in 2.9 since they're hashed in buckets, it suffices that any
thread got one such error once for the sum to be wrong. One possible
impact is memory usage not shrinking after a short burst due to pools
refraining from releasing objects, believing they don't have enough.

This must be backported to all versions. Note that the opportunistic
version can be dropped before 2.8.
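
As a simplified numerical illustration of the failure mode (this is not the
actual swrate_add_scaled() code, just the shape of the arithmetic):

    #include <stdio.h>

    int main(void)
    {
        int old_sum = 0, v = 0, n = 16, s = 8;

        /* rounding compensation scaled by 's' in the subtracted term: ~ -s */
        int buggy = old_sum + v * s - (old_sum + v * s + s * (n - 1)) / n;
        /* unscaled compensation: the sum stays at 0 */
        int fixed = old_sum + v * s - (old_sum + v * s + n - 1) / n;

        printf("buggy=%d fixed=%d\n", buggy, fixed); /* buggy=-7 fixed=0 */
        return 0;
    }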
2023-09-14 11:09:07 +02:00
Willy Tarreau
c618ed5ff4 MAJOR: import: update mt_list to support exponential back-off
The new mt_list code supports exponential back-off on conflict, which
is important for use cases where there is contention on a large number
of threads. The API evolved a little bit and required some updates:

  - mt_list_for_each_entry_safe() is now in upper case to explicitly
    show that it is a macro, and only uses the back element, doesn't
    require a secondary pointer for deletes anymore.

  - MT_LIST_DELETE_SAFE() doesn't exist anymore, instead one just has
    to set the list iterator to NULL so that it is not re-inserted
    into the list and the list is spliced there. One must be careful
    because it was usually performed before freeing the element. Now
    instead the element must be nulled before the continue/break.

  - MT_LIST_LOCK_ELT() and MT_LIST_UNLOCK_ELT() have always been
    unclear. They were replaced by mt_list_cut_around() and
    mt_list_connect_elem() which more explicitly detach the element
    and reconnect it into the list.

  - MT_LIST_APPEND_LOCKED() was only in haproxy so it was left as-is
    in list.h. It may however possibly benefit from being upstreamed.

This required tiny adaptations to event_hdl.c and quic_sock.c. The
test case was updated and the API doc added. Note that in order to
keep include files small, the struct mt_list definition remains in
list-t.h (part of the internal API) and was ifdef'd out in mt_list.h.
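
As a purely hypothetical usage sketch of the new deletion pattern (the
"struct item" type, "my_list" and must_drop() are made up, and the exact
macro arguments are assumed from the description above):

    struct item *it;
    struct mt_list back;

    MT_LIST_FOR_EACH_ENTRY_SAFE(it, &my_list, list, back) {
        if (must_drop(it)) {
            struct item *tmp = it;

            it = NULL;   /* leave the element detached: replaces the
                          * former MT_LIST_DELETE_SAFE() */
            free(tmp);
            continue;
        }
    }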

A test on QUIC with both quictls 1.1.1 and wolfssl 5.6.3 on ARM64 with
80 threads shows a drastic reduction of CPU usage thanks to this and
the refined memory barriers. Please note that the CPU usage on OpenSSL
3.0.9 is significantly higher due to the excessive use of atomic ops
by openssl, but 3.1 is only slightly above 1.1.1 though:

  - before: 35 Gbps, 3.5 Mpps, 7800% CPU
  - after:  41 Gbps, 4.2 Mpps, 2900% CPU
2023-09-13 11:50:33 +02:00
Frédéric Lécaille
84757e32e6 BUG/MEDIUM: quic: quic_cc_conn ->cntrs counters unreachable
This bug arrived with this commit in 2.9-dev3:

    MEDIUM: quic: Allow the quic_conn memory to be asap released.

When sending packets from quic_cc_conn_io_cb(), e.g. when the quic_conn
object has been released and replaced by a lighter one (quic_cc_conn),
some counters may have to be incremented. But they were not reachable
because they were not shared between the quic_conn and quic_cc_conn structs.

To fix this, one only has to move the ->cntrs counters from quic_conn to
the QUIC_CONN_COMMON struct, which is shared between both quic_conn and
quic_cc_conn.

Thank you to Tristan for having reported this in GH #2247.

No need to backport.
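
A rough sketch of the resulting layout (simplified; the field types other
than the shared ->cntrs member are illustrative):

    /* fields shared by quic_conn and quic_cc_conn, including the counters */
    #define QUIC_CONN_COMMON                 \
        struct {                             \
            struct quic_counters *cntrs;     \
        }

    struct quic_conn {
        QUIC_CONN_COMMON;        /* shared part first in both structs */
        /* ... full connection state ... */
    };

    struct quic_cc_conn {
        QUIC_CONN_COMMON;
        /* ... reduced closing state ... */
    };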
2023-09-12 18:13:36 +02:00
Willy Tarreau
efc46dede9 DEBUG: pools: inspect pools on fatal error and dump information found
It's a bit frustrating sometimes to see pool checks catch a bug but not
provide exploitable information without a core.

Here we're adding a function "pool_inspect_item()" which is called just
before aborting in pool_check_pattern() and POOL_DEBUG_CHECK_MARK() and
which will display the error type, the pool's pointer and name, and will
try to check if the item's tag matches the pool, and if not, will iterate
over all pools to see if one would be a better candidate, then will try
to figure the last known caller and possibly other likely candidates if
the pool's tag is not sufficiently trusted. This typically helps better
diagnose corruption in use-after-free scenarios, or freeing to a pool
that differs from the one the object was allocated from, and will also
indicate calling points that may help figure where an object was last
released or allocated. The info is printed on stderr just before the
backtrace.

For example, the recent off-by-one test in the PPv2 changes would have
produced the following output in vtest logs:

  ***  h1    debug|FATAL: pool inconsistency detected in thread 1: tag mismatch on free().
  ***  h1    debug|  caller: 0x62bb87 (conn_free+0x147/0x3c5)
  ***  h1    debug|  pool: 0x2211ec0 ('pp_tlv_256', size 304, real 320, users 1)
  ***  h1    debug|Tag does not match. Possible origin pool(s):
  ***  h1    debug|  tag: @0x2565530 = 0x2216740 (pp_tlv_128, size 176, real 192, users 1)
  ***  h1    debug|Recorded caller if pool 'pp_tlv_128':
  ***  h1    debug|  @0x2565538 (+0184) = 0x62c76d (conn_recv_proxy+0x4cd/0xa24)

A mismatch in the allocated/released pool is already visible, and the
callers confirm it once resolved, where the allocator indeed allocates
from pp_tlv_128 and conn_free() releases to pp_tlv_256:

  $ addr2line -spafe ./haproxy <<< $'0x62bb87\n0x62c76d'
  0x000000000062bb87: conn_free at connection.c:568
  0x000000000062c76d: conn_recv_proxy at connection.c:1177
2023-09-11 15:46:14 +02:00
Willy Tarreau
f6bee5a50b DEBUG: pools: make pool_check_pattern() take a pointer to the pool
This will be useful to report detailed bug traces.
2023-09-11 15:19:49 +02:00
Willy Tarreau
e92e96b00f DEBUG: pools: pass the caller pointer to the check functions and macros
In preparation for more detailed pool error reports, let's pass the
caller pointers to the check functions. This will be useful to produce
messages indicating where the issue happened.
2023-09-11 15:19:49 +02:00
Willy Tarreau
baf2070421 DEBUG: pools: always record the caller for uncached allocs as well
When recording the caller of a pool_alloc(), we currently store it only
when the object comes from the cache and never when it comes from the
heap. There's no valid reason for this except that the caller's pointer
was not passed to pool_alloc_nocache(), so it used to set NULL there.
Let's just pass it down the chain.
2023-09-11 15:19:49 +02:00
Willy Tarreau
4a18d9e560 REORG: cpuset: move parse_cpu_set() and parse_cpumap() to cpuset.c
These ones were still in cfgparse.c but they're not specific to the
config at all and may actually be used even when parsing cpu list
entries in /sys. Better move them where they can be reused.
2023-09-08 16:25:19 +02:00
Willy Tarreau
5119109e3f MINOR: cpuset: dynamically allocate cpu_map
cpu_map is 8.2kB/entry and there's one such entry per group, that's
~520kB total. In addition, the init code is still in haproxy.c enclosed
in ifdefs. Let's make this a dynamically allocated array in the cpuset
code and remove that init code.

Later we may even consider reallocating it once the number of threads
and groups is known, in order to shrink it a little bit, as the typical
setup with a single group will only need 8.2kB, thus saving half a MB
of RAM. This would require that the upper bound is placed in a variable
though.
2023-09-08 16:25:19 +02:00
Willy Tarreau
1f2433fb6a MINOR: tools: add function read_line_to_trash() to read a line of a file
This function takes as input a printf format for the file name, making
it particularly suitable for /proc or /sys entries which take a lot of
numbers. It also automatically trims the trailing CR and/or LF chars.
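
A possible usage sketch (the return convention and the use of the global
trash chunk are assumptions here):

    /* read e.g. /sys/devices/system/cpu/cpu3/topology/core_id */
    if (read_line_to_trash("%s/cpu%d/topology/core_id",
                           "/sys/devices/system/cpu", cpu) > 0)
        core_id = atoi(trash.area);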
2023-09-08 16:25:19 +02:00
Frédéric Lécaille
e3e218b98e CLEANUP: quic: Remove useless free_quic_tx_pkts() function.
This function is defined but no longer used since this commit:
    BUG/MAJOR: quic: Really ignore malformed ACK frames.
2023-09-08 10:17:25 +02:00
Frédéric Lécaille
292dfdd78d BUG/MINOR: quic: Wrong cluster secret initialization
The function generate_random_cluster_secret() which initializes the cluster secret
when not supplied by configuration is buggy. There is a 1/256 chance that the
cluster secret string is empty.

To fix this, when the cluster secret is defined by configuration, it is stored
as the first 128 bits of its own SHA1 (160-bit) digest. If this is not the
case, it is initialized with a 128-bit random value. This way, the cluster
secret is always initialized.

As the cluster secret is always initialized, there are several tests which
are from now on useless. This patch removes such tests (if(global.cluster_secret))
in the QUIC code part and at parsing time: no need to check that a cluster
secret was initialized with "quic-force-retry" option.

Must be backported as far as 2.6.
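
A hedged sketch of the resulting initialization logic (function and variable
names are illustrative; only the 128-bit truncation / random fallback idea
comes from this patch):

    #include <string.h>
    #include <openssl/rand.h>
    #include <openssl/sha.h>

    static void init_cluster_secret(unsigned char secret[16],
                                    const char *cfg, size_t cfg_len)
    {
        if (cfg_len) {
            unsigned char digest[SHA_DIGEST_LENGTH];

            SHA1((const unsigned char *)cfg, cfg_len, digest);
            memcpy(secret, digest, 16);     /* keep the first 128 bits */
        }
        else {
            RAND_bytes(secret, 16);         /* always non-empty */
        }
    }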
2023-09-08 09:50:58 +02:00
William Lallemand
15e591b6e0 MINOR: ssl: add support for 'curves' keyword on server lines
This patch implements the 'curves' keyword on server lines as well as
the 'ssl-default-server-curves' keyword in the global section.

It also adds the keyword to the server line in the ssl_curves reg-test.

These keywords allow the configuration of the curves list for a server.
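
An illustrative configuration excerpt using the new keywords (the curve
list shown is just an example):

    global
        ssl-default-server-curves X25519:P-256

    backend be_tls
        server srv1 192.0.2.10:443 ssl verify none curves X25519:P-256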
2023-09-07 23:29:10 +02:00
Willy Tarreau
28ff1a5d56 MINOR: tasks/stats: report the number of niced tasks in "show info"
We currently know the number of tasks in the run queue that are niced,
and we don't expose it. It's too bad because it can give a hint about
what share of the load is relevant. For example if one runs a Lua
script that was purposely reniced, or if a stats page or the CLI is
hammered with slow operations, seeing them appear there can help
identify what part of the load is not caused by the traffic, and
improve monitoring systems or autoscalers.
2023-09-06 17:44:44 +02:00
Remi Tricot-Le Breton
e03d060aa3 MINOR: cache: Change hash function in default normalizer used in case of "vary"
When building the secondary signature for cache entries when vary is
enabled, the referer part of the signature was a simple crc32 of the
first referer header.
This patch changes it to a 64-bit hash based on the xxhash algorithm with a
random seed built during init. This will prevent "malicious" hash
collisions between entries of the cache.
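
A minimal sketch of the described hashing (the seed variable name is
assumed; XXH64() is the one-shot xxhash API):

    #include "xxhash.h"

    static XXH64_hash_t referer_seed;   /* filled with random bytes at init */

    static XXH64_hash_t hash_referer(const char *ref, size_t len)
    {
        return XXH64(ref, len, referer_seed);
    }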
2023-09-06 16:11:31 +02:00
Aurelien DARRAGON
d9b81e5b49 MEDIUM: log/sink: make logsrv postparsing more generic
We previously had postparsing logic only for logsrv sinks, but now we need
to perform this operation on logsrvs directly instead of sinks, to prepare
for additional postparsing logic that is not sink-specific.

To do this, we migrated post_sink_resolve() and sink_postresolve_logsrvs()
to their postresolve_logsrvs() and postresolve_logsrv_list() equivalents.

Then, we split postresolve_logsrv_list() so that the sink-only logic stays
in sink.c (sink_resolve_logsrv_buffer() function), and the "generic"
target part stays in log.c as resolve_logsrv().

Error messages formatting was preserved as far as possible but some slight
variations are to be expected.
As for the functional aspect, no change should be expected.
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
969e212c66 MINOR: log: add dup_logsrv() helper function
Ease code maintenance by introducing the dup_logsrv() helper function to
properly duplicate an existing logsrv struct.
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
d499485aa9 MINOR: sink: simplify post_sink_resolve function
Simplify post_sink_resolve() function to reduce code duplication and
make it easier to maintain.
2023-09-06 16:06:39 +02:00
Aurelien DARRAGON
5b295ff409 MINOR: ring: add a function to compute max ring payload
Add a helper function to the ring API to compute the maximum payload
length that could fit into the ring based on ring size.
2023-09-06 16:06:39 +02:00
Christopher Faulet
3ec156f027 BUG/MEDIUM: applet: Fix API for function to push new data in channels buffer
All applets only check the -1 error value (need room) for applet_put*
functions while the underlying functions may also return -2 if the input is
closed or -3 if the data length is invalid. It means applets already handle
other cases by their own.

The API should be fixed but for now, to ease backports, we only fix
applet_put* functions to always return -1 on error. This way, at least for
the applets point of view, the API is consistent.

This patch should be backported to 2.8. Probably not further. Except if we
suspect it could fix a bug.
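
The fix boils down to normalizing the return value; a simplified
illustration (not the real applet_put* code):

    /* lower layer may report -1 (need room), -2 (closed), -3 (bad length) */
    static int applet_put_wrapper(int lower_ret)
    {
        return (lower_ret < 0) ? -1 : lower_ret;
    }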
2023-09-06 09:29:27 +02:00
Frédéric Lécaille
fb4294be55 BUG/MINOR: quic: Wrong RTT computation (srtt and rrt_var)
Because several variable values (rtt_var, srtt) were stored as multiples of
their real values, some calculations were less accurate than expected.

Stop storing 4*rtt_var values, and 8*srtt values.
Adjust all the impacted statements.

Must be backported as far as 2.6.
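
For reference, the plain (unscaled) smoothing this converges to follows the
usual RFC 6298-style update; a sketch, not the exact HAProxy code:

    /* values in microseconds */
    static void rtt_update(unsigned int *srtt, unsigned int *rtt_var,
                           unsigned int rtt)
    {
        if (!*srtt) {                       /* first sample */
            *srtt = rtt;
            *rtt_var = rtt / 2;
        }
        else {
            unsigned int diff = *srtt > rtt ? *srtt - rtt : rtt - *srtt;

            *rtt_var = (3 * *rtt_var + diff) / 4;  /* 3/4 rttvar + 1/4 |srtt-rtt| */
            *srtt    = (7 * *srtt + rtt) / 8;      /* 7/8 srtt   + 1/8 rtt */
        }
    }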
2023-09-05 17:14:51 +02:00
William Lallemand
d90d3bf894 MINOR: global: export the display_version() symbol
Export the display_version() function which can be used elsewhere than
in haproxy.c
2023-09-05 15:24:39 +02:00
Willy Tarreau
86854dd032 MEDIUM: threads: detect excessive thread counts vs cpu-map
This detects when there are more threads bound via cpu-map than CPUs
enabled in cpu-map, or when there are more total threads than the total
number of CPUs available at boot (for unbound threads) and configured
for bound threads. In this case, a warning is emitted explaining the
problems it will cause and how to address the situation.

Note that some configurations will not be detected as faulty because
the algorithmic complexity to resolve all arrangements grows in O(N!).
This means that having 3 threads on 2 CPUs and one thread on 2 CPUs
will not be detected as it's 4 threads for 4 CPUs. But at least configs
such as T0:(1,4) T1:(1,4) T2:(2,4) T3:(3,4) will not trigger a warning
since they're valid.
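
For instance, the last arrangement above would be written as (illustrative
excerpt):

    global
        nbthread 4
        cpu-map 1/1 1 4
        cpu-map 1/2 1 4
        cpu-map 1/3 2 4
        cpu-map 1/4 3 4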
2023-09-04 19:39:17 +02:00
Willy Tarreau
8357f950cb MEDIUM: threads: detect incomplete CPU bindings
It's very easy to mess up with some cpu-map directives and to leave
some thread unbound. Let's add a test that checks that either all
threads are bound or none are bound, but that we do not face the
intermediary situation where some are pinned and others are left
wandering around, possibly on the same CPUs as bound ones.

Note that this should not be backported, or maybe turned into a
notice only, as it appears that it will easily catch invalid
configs and that may break updates for some users.
2023-09-04 19:39:17 +02:00
Willy Tarreau
e65f54cf96 MINOR: cpuset: centralize a reliable bound cpu detection
Till now the CPUs that were bound were only retrieved in
thread_cpus_enabled() in order to count the number of CPUs allowed,
and it relied on arch-specific code.

Let's slightly arrange this into ha_cpuset_detect_bound() that
reuses the ha_cpuset struct and the accompanying code. This makes
the code much clearer without having to carry along some arch-specific
stuff out of this area.

Note that the macOS-specific code used in thread.c only counts online
CPUs but does not retrieve a mask, so for now we can't infer anything
from it and can't implement it.

In addition and more importantly, this function is reliable in that
it will only return a value when the detection is accurate, and will
not return incomplete sets on operating systems where we don't have
an exact list, such as online CPUs.
2023-09-04 19:39:17 +02:00
Willy Tarreau
d3ecc67a01 MINOR: cpuset: add ha_cpuset_or() to bitwise-OR two CPU sets
This operation was not implemented and will be needed later.
2023-09-04 19:39:17 +02:00
Willy Tarreau
eb10567254 MINOR: cpuset: add ha_cpuset_isset() to check for the presence of a CPU in a set
This function will be convenient to test for the presence of a given CPU
in a set.
2023-09-04 19:39:17 +02:00
Willy Tarreau
17a7baca07 BUILD: bug: make BUG_ON() void to avoid a rare warning
When building without threads, the recently introduced BUG_ON(tid != 0)
turns into a constant expression that evaluates to 0 and that is not used,
resulting in this warning:

  src/connection.c: In function 'conn_free':
  src/connection.c:584:3: warning: statement with no effect [-Wunused-value]

This is because the whole thing is declared as an expression for clarity.
Make it return void to avoid this. No backport is needed.
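
A simplified illustration of the idea (this is not HAProxy's real BUG_ON()
definition):

    #include <stdio.h>
    #include <stdlib.h>

    /* casting the whole expression to void keeps -Wunused-value quiet
     * when the macro degenerates to a constant */
    #define BUG_ON(cond) \
        ((void)((cond) ? (fprintf(stderr, "BUG: %s\n", #cond), abort(), 0) : 0))

    int main(void)
    {
        BUG_ON(0);   /* no warning even when compiled with -Wunused-value */
        return 0;
    }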
2023-09-04 19:38:51 +02:00
Andrew Hopkins
b3f94f8b3b BUILD: ssl: Build with new cryptographic library AWS-LC
This adds a new Makefile option, USE_OPENSSL_AWSLC, and updates the
documentation with instructions to use HAProxy with AWS-LC.

Update the type of the OCSP callback retrieved with
SSL_CTX_get_tlsext_status_cb with the actual type for
libcrypto versions greater than 1.0.2. This doesn't affect
OpenSSL which casts the callback to void* in SSL_CTX_ctrl.
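
An illustrative build invocation (paths are examples):

    make TARGET=linux-glibc USE_OPENSSL_AWSLC=1 \
         SSL_INC=/opt/aws-lc/include SSL_LIB=/opt/aws-lc/lib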
2023-09-04 18:19:18 +02:00
Christopher Faulet
b50a471adb BUG/MEDIUM: stconn: Don't block sends if there is a pending shutdown
For the same reason than the previous patch, we must not block the sends
when there is a pending shutdown. In other words, we must consider the sends
are allowed when there is a pending shutdown.

This patch must slowly be backported as far as 2.2. It should partially fix
issue #2249.
2023-09-01 14:18:26 +02:00
Willy Tarreau
844a3bc25b MEDIUM: checks: implement a queue in order to limit concurrent checks
The progressive adoption of OpenSSL 3 and its abysmal handshake
performance have started to reveal situations where it simply isn't
possible anymore to successfully run health checks on many servers,
because between the moment all the checks are started and the moment
the handshake finally completes, the timeout has expired!

This also has consequences on production traffic which gets
significantly delayed as well, all that for lots of checks. While it's
possible to increase the check delays, it doesn't solve everything as
checks still take a huge amount of time to converge in such conditions.

Here we take a different approach by making it possible to enforce a
maximum number of concurrent checks per thread and by implementing an
ordered queue. Thanks to this, if a thread about to start a check has reached
its limit, it will add the check at the end of a queue and it will be
processed once another check is finished. This proves to be extremely
efficient, with all checks completing in a reasonable amount of time
and not being disturbed by the rest of the traffic from other checks.
They're just cycling slower, but at the speed the machine can handle.

One must understand however that if some complex checks perform multiple
exchanges, they will take a check slot for all the required duration.
This is why the limit is not enforced by default.

Tests on SSL show that a limit of 5-50 checks per thread on local
servers gives excellent results already, so that could be a good starting
point.
2023-09-01 14:00:04 +02:00
Willy Tarreau
cfc0bceeb5 MEDIUM: checks: search more aggressively for another thread on overload
When the current check is overloaded (more running checks than the
configured limit), we'll try more aggressively to find another thread.
Instead of just opportunistically looking for one half as loaded, now if
the current thread has more than 1% more active checks than another one,
or has more than a configured limit of concurrent running checks, it will
search for a more suitable thread among 3 other random ones in order to
migrate the check there. The number of migrations remains very low (~1%)
and the checks load very fair across all threads (~1% as well). The new
parameter is called tune.max-checks-per-thread.
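
An illustrative excerpt enabling the limit (the value is just an example):

    global
        tune.max-checks-per-thread 20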
2023-09-01 08:26:06 +02:00
Willy Tarreau
00de9e0804 MINOR: checks: maintain counters of active checks per thread
Let's keep two check counters per thread:
  - one for "active" checks, i.e. checks that are no more sleeping
    and are assigned to the thread. These include sleeping and
    running checks ;

  - one for "running" checks, i.e. those which are currently
    executing on the thread.

By doing so, we'll be able to spread the health checks load a bit better
and refrain from sending too many at once per thread. The counters are
atomic since a migration increments the target thread's active counter.
These numbers are reported in "show activity", which makes it possible to
check, per thread and globally, how many checks are currently pending and
running on the system.

Ideally, we should only consider checks in the process of establishing
a connection since that's really the expensive part (particularly with
OpenSSL 3.0). But the inner layers are really not suitable to doing
this. However knowing the number of active checks is already a good
enough hint.
2023-09-01 08:26:06 +02:00