Released version 2.5-dev11 with the following main changes :
- DEV: coccinelle: Add strcmp.cocci
- CLEANUP: Apply strcmp.cocci
- CI: Add `permissions` to GitHub Actions
- CI: Clean up formatting in GitHub Action definitions
- MINOR: add ::1 to predefined LOCALHOST acl
- CLEANUP: assorted typo fixes in the code and comments
- CLEANUP: Consistently `unsigned int` for bitfields
- MEDIUM: resolvers: lower-case labels when converting from/to DNS names
- MEDIUM: resolvers: replace bogus resolv_hostname_cmp() with memcmp()
- MINOR: jwt: Empty the certificate tree during deinit
- MINOR: jwt: jwt_verify returns negative values in case of error
- MINOR: jwt: Do not rely on enum order anymore
- BUG/MEDIUM: stream: Keep FLT_END analyzers if a stream detects a channel error
- MINOR: httpclient/cli: access should be only done from expert mode
- DOC: management: doc about the CLI httpclient
- BUG/MEDIUM: tcpcheck: Properly catch early HTTP parsing errors
- BUG/MAJOR: dns: tcp session can remain attached to a list after a free
- BUG/MAJOR: dns: attempt to lock globaly for msg waiter list instead of use barrier
- CLEANUP: dns: always detach the appctx from the dns session on release
- DEBUG: dns: add a few more BUG_ON at sensitive places
- BUG/MAJOR: resolvers: add other missing references during resolution removal
- CLEANUP: resolvers: do not export resolv_purge_resolution_answer_records()
- BUILD: resolvers: avoid a possible warning on null-deref
- BUG/MEDIUM: resolvers: always check a valid item in query_list
- CLEANUP: always initialize the answer_list
- CLEANUP: resolvers: simplify resolv_link_resolution() regarding requesters
- CLEANUP: resolvers: replace all LIST_DELETE with LIST_DEL_INIT
- MEDIUM: resolvers: use a kill list to preserve the list consistency
- MEDIUM: resolvers: remove the last occurrences of the "safe" argument
- BUG/MEDIUM: checks: fix the starting thread for external checks
- MEDIUM: resolvers: replace the answer_list with a (flat) tree
- MEDIUM: resolvers: hash the records before inserting them into the tree
- BUG/MAJOR: buf: fix varint API post- vs pre- increment
- OPTIM: resolvers: move the eb32 node before the data in the answer_item
- MINOR: list: add new macro LIST_INLIST_ATOMIC()
- OPTIM: dns: use an atomic check for the list membership
- BUG/MINOR: task: do not set TASK_F_USR1 for no reason
- BUG/MINOR: mux-h2: do not prevent from sending a final GOAWAY frame
- MINOR: connection: add a new CO_FL_WANT_DRAIN flag to force drain on close
- MINOR: mux-h2: perform a full cycle shutdown+drain on close
- CLEANUP: resolvers: get rid of single-iteration loop in resolv_get_ip_from_response()
- MINOR: quic: Increase the size of handshake RX UDP datagrams
- BUG/MEDIUM: lua: fix memory leaks with realloc() on non-glibc systems
- MINOR: memprof: report the delta between alloc and free on realloc()
- MINOR: memprof: add one pointer size to the size of allocations
- BUILD: fix compilation on NetBSD
- MINOR: backend: add traces for idle connections reuse
- BUG/MINOR: backend: fix improper insert in avail tree for always reuse
- MINOR: backend: improve perf with tcp proxies skipping idle conns
- MINOR: connection: remove unneeded memset 0 for idle conns
Remove the zeroing of an idle connection node when it is removed from a
tree. This is not needed and should slightly improve the performance of
idle connection usage. Besides, the zeroing breaks the memory poisoning
feature.
Skip the connection hash calculation in connect_server() when reuse must
not be used: this is the case for TCP proxies. This should result in
slightly better performance for this use case.
In connect_server(), if http-reuse always is set, the backend connection
is inserted into the available tree as soon as it is created. However,
the connection hash field is only set later, at the end of the function.
This seems to have no impact, as the connection hash field is always set
before a lookup. However, this is not a proper usage of the ebmb API.
Fix this by setting the connection hash field before the insertion into
the avail tree.
This must be backported up to 2.4.
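The corrected ordering can be sketched as follows (illustrative struct
and names, not haproxy's exact ones; with ebmb nodes the key bytes
directly follow the node):

    #include <stdint.h>
    #include <import/ebmbtree.h>  /* ebtree memory-block nodes bundled with haproxy */

    /* Illustrative node: node.key is a zero-length array, so the bytes
     * of <hash> placed right after it serve as the lookup key. */
    struct hash_entry_sk {
            struct ebmb_node node;
            int64_t hash;             /* aliased by node.key */
    };

    /* The ebmb API requires the key to be valid *before* insertion: the
     * insert walks the tree comparing key bytes, so a key written after
     * ebmb_insert() leaves the node misplaced and lookups may miss it. */
    static void insert_avail_sk(struct eb_root *avail,
                                struct hash_entry_sk *e, int64_t hash)
    {
            e->hash = hash;                                /* 1: set the key */
            ebmb_insert(avail, &e->node, sizeof(e->hash)); /* 2: then insert */
    }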
Add traces in connect_server() to debug idle connection reuse. They are
attached to the stream trace module, as it is already in use in
backend.c via the TRACE_SOURCE macro.
Include <sys/time.h> to fix a compilation error with struct timeval in
some files, as documented in 'man 7 system_data_types'. The build error
was reported on NetBSD 9.2.
This should be backported up to 2.2.
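For illustration, the portable pattern looks like this (a trivial
sketch; the fix itself only adds the include):

    /* Per 'man 7 system_data_types', struct timeval is declared in
     * <sys/time.h>. Relying on another header to pull it in happens to
     * work with glibc but fails on stricter libcs such as NetBSD's. */
    #include <sys/time.h>

    static long tv_to_ms(const struct timeval *tv)
    {
            return tv->tv_sec * 1000L + tv->tv_usec / 1000L;
    }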
The current model causes an issue when trying to spot memory leaks,
because malloc(0) or realloc(0) do not count as allocations since we only
account for the application-usable size. This is the problem that made
issue #1406 not appear as a leak.
What we're doing now is to account for one extra pointer (the one that
memory allocators usually place before the returned area), so that a
malloc(0) will properly account for 4 or 8 bytes. We don't need something
exact, we just need something non-zero so that a realloc(X) followed by a
realloc(0) without a free() gives a small non-zero result.
It was verified that the results are stable including in the presence
of lots of malloc/realloc/free as happens when stressing Lua.
It would make sense to backport this to 2.4 as it helps in bug reports.
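The accounting idea can be sketched like this (illustrative name, not
the actual memprof code):

    #include <stddef.h>

    /* Count the allocator's hidden per-block header (roughly one
     * pointer) on top of the usable size, so that malloc(0) and
     * realloc(ptr, 0) still record a small non-zero amount and
     * imbalances remain visible. */
    static inline size_t accounted_size(size_t requested)
    {
            return requested + sizeof(void *);
    }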
realloc() calls are painful to analyse because they have two non-zero
columns, and trying to spot a leaking one requires a bit of scripting.
Let's simply append the delta at the end of the line when both alloc and
free are non-null.
It would be useful to backport this to 2.4 to help with bug reports.
In issue #1406, Lev Petrushchak reported a nasty memory leak on Alpine
since haproxy 2.4 when using Lua, that memory profiling didn't detect.
After inspecting the code and Lua's code, it appeared that Lua's default
allocator does an explicit free() on size zero, while since 2.4 commit
d36c7fa5e ("MINOR: lua: simplify hlua_alloc() to only rely on realloc()"),
haproxy only calls realloc(ptr,0), which performs a free() on glibc but
not on other systems, as it's not required by POSIX...
This patch reinstalls the explicit test for nsize==0 to call free().
Thanks to Lev for the very documented report, and to Tim for the links
to a musl thread on the same subject that confirms the diagnostic.
This must be backported to 2.4.
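The resulting allocator shape can be sketched as follows (simplified
from the fix described above; the function name is illustrative):

    #include <stdlib.h>

    /* Lua allocator callback: ud is user data, ptr the current block,
     * osize its old size, nsize the requested new size. */
    static void *hlua_alloc_sketch(void *ud, void *ptr,
                                   size_t osize, size_t nsize)
    {
            (void)ud;
            (void)osize;

            if (nsize == 0) {
                    /* explicit free: realloc(ptr, 0) is not guaranteed
                     * to release the block (glibc does, musl does not) */
                    free(ptr);
                    return NULL;
            }
            return realloc(ptr, nsize); /* realloc(NULL, n) acts as malloc(n) */
    }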
Some browsers may send Initial packets with sizes greater than 1252 bytes
(QUIC_INITIAL_IPV4_MTU). Let us increase this size limit up to 2048 bytes.
Also use this size for "max_udp_payload_size" transport parameter to limit
the size of the datagrams we want to receive.
In issue #1424, Coverity reports that the loop increment is unreachable,
which is true: the list_for_each_entry() was replaced with a for loop,
but the loop was not needed in the first place and was merely used as a
convenient construct for a single-iteration lookup. Let's get rid of all
this now and replace the loop with an "if" statement.
While in H1 we can usually close quickly, in H2 a client might be sending
window updates or anything while we're sending a GOAWAY, and any data
pending in the socket buffers at the moment the close() is performed on
the socket results in the output data being lost and an RST being
emitted. One example where this happens easily is with h2spec, which
randomly reports connection resets when waiting for a GOAWAY while
haproxy sends it, as seen in issue #1422. With h2spec it's not window
updates that are causing this but the fact that h2spec has to upload the
payload that comes with invalid frames to accommodate various
implementations, and does that in two different segments. When haproxy
aborts on the invalid frame header, the payload has not yet been
received, causing an RST to be sent.
Here we're dealing with this in three steps:
- we perform a shutdown(WR) on the connection to forcefully push pending
data on a front connection after the xprt is shut and closed;
- we drain pending data;
- then we close.
This totally solves the issue with h2spec, and the extra cost is very
low, especially if we consider that H2 connections are not set up and
torn down often. This issue was never observed with regular clients,
most likely because this pattern does not happen in regular traffic.
After more testing it could make sense to backport this, at least to
avoid reporting errors on h2spec tests.
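The sequence can be illustrated on a plain socket (a simplified sketch
of the three steps, not the actual mux/xprt code; a real drain would
keep reading until the peer's FIN):

    #include <sys/socket.h>
    #include <unistd.h>

    static void close_after_goaway_sk(int fd)
    {
            char buf[4096];

            /* 1: push our pending output (the GOAWAY) to the peer */
            shutdown(fd, SHUT_WR);

            /* 2: drain whatever the client already sent, so the kernel
             *    does not turn unread data into an RST on close() */
            while (recv(fd, buf, sizeof(buf), MSG_DONTWAIT) > 0)
                    ;

            /* 3: then close */
            close(fd);
    }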
Sometimes we'd like to do our best to drain pending data before closing,
in order to spare the peer the risk of receiving an RST on close.
This adds a new connection flag CO_FL_WANT_DRAIN that is used to
trigger a call to conn_ctrl_drain() from conn_ctrl_close(), and the
sock_drain() function ignores fd_recv_ready() if this flag is set,
in order to catch the latest data. It's not used for now.
Some checks were added by commit 9a3d3fcb5 ("BUG/MAJOR: mux-h2: Don't try
to send data if we know it is no longer possible") to make sure we don't
loop forever trying to send data that cannot leave. But one of the
conditions there is not correct, the one relying on H2_CS_ERROR2. Indeed,
this state indicates that the error code was serialized into the mux
buffer, and since the test is placed before trying to send the data to
the socket, if the output buffer only contains a GOAWAY frame, the mux
may refrain from sending it and may close without sending anything. It's
not dramatic, as GOAWAY reports connection errors in situations where
delivery is not even certain, but it's cleaner to make sure the error
is properly sent, and it avoids upsetting h2spec, as seen in github
issue #1422.
Given that the patch above was backported as far as 1.8, this patch will
also have to be backported that far.
Thanks to Ilya for reporting this one.
This application-specific flag was added in 2.4-dev by commit 6fa8bcdc7
("MINOR: task: add an application specific flag to the state: TASK_F_USR1")
to help preserve the idle connections status across wakeup calls. While
the code to do this was OK for tasklets, it was wrong for tasks, as in an
effort not to lose it when setting the RUNNING flag (which tasklets don't
have), it ended up being unconditionally set. It just happens that for now
no regular tasks use it, only tasklets.
This fix makes sure we always atomically perform (state & flags | running)
there, using a CAS. It also does it for tasklets because it was possible
to lose some such flags if set by another thread, even though this should
not happen with current code. In order to make the code more readable (and
avoid the previous mistake of repeated flags in the bit field), a new
TASK_PERSISTENT aggregate was declared in task.h for this.
In practice the CAS is cheap here because task states are stable or
convergent so the loop will almost never be taken.
This should be backported to 2.4.
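The operation can be sketched with C11 atomics (flag values and names
are illustrative, not haproxy's):

    #include <stdatomic.h>

    #define TASK_RUNNING_SK    0x0001U
    #define TASK_F_USR1_SK     0x0100U
    #define TASK_PERSISTENT_SK (TASK_F_USR1_SK /* | other sticky flags */)

    /* Atomically set RUNNING while keeping only the persistent flags;
     * everything else is dropped in the same indivisible step. */
    static unsigned int task_set_running_sk(_Atomic unsigned int *state)
    {
            unsigned int old = atomic_load(state);
            unsigned int new;

            /* task states are stable or convergent, so this loop is
             * almost never taken more than once */
            do {
                    new = (old & TASK_PERSISTENT_SK) | TASK_RUNNING_SK;
            } while (!atomic_compare_exchange_weak(state, &old, new));

            return old;
    }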
The crash that was fixed by commit 7045590d8 ("BUG/MAJOR: dns: attempt
to lock globaly for msg waiter list instead of use barrier") was now
completely analysed and confirmed to be partially a result of the
debugging code added to LIST_INLIST(), which was looking at both
pointers and their reciprocals, and that, if used in a concurrent
context, could perfectly return false if a neighbor was being added or
removed while the current one didn't change, allowing the LIST_APPEND
to fail.
As the LIST API was not designed to be used in a concurrent context,
we should not rely on LIST_INLIST() but on the newly introduced
LIST_INLIST_ATOMIC().
This patch simply reverts the commit above to switch to the new test,
saving a lock during potentially long operations. It was verified that
the check doesn't fail anymore.
It is unclear what the performance impact of the fix above could be in
some contexts. If any performance regression is observed, it could make
sense to backport this patch, along with the previous commit introducing
the LIST_INLIST_ATOMIC() macro.
This macro is similar to LIST_INLIST() except that it is guaranteed to
perform the test atomically, so that even if LIST_INLIST() is instrumented
with debugging code to perform extra consistency checks, it will not fail
when used in the context of barriers and atomic ops.
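The gist of the macro can be sketched as follows (illustrative, not the
exact definition; it uses the gcc/clang builtin atomics that haproxy
wraps):

    struct list_sk {
            struct list_sk *n, *p;
    };

    /* Membership is inferred from a single atomically-read pointer: a
     * detached element points to itself. Because only one pointer is
     * examined, concurrent insertions or removals of neighbours cannot
     * make the test transiently report "not in a list". */
    #define LIST_INLIST_ATOMIC_SK(el) \
            (__atomic_load_n(&(el)->n, __ATOMIC_ACQUIRE) != (el))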
perf top shows that we spend a lot of time trying to read item->type in
the lookup loop, because the node is placed after the very long name,
so when the node is found, no data is in the cache yet. Let's simply
move the node up in the struct. This makes the CPU usage of
resolv_validate_dns_response() drop by 4 points.
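The layout change can be illustrated like this (hypothetical struct; the
real answer item has more fields):

    #include <import/eb32tree.h>  /* haproxy's bundled 32-bit ebtree nodes */

    struct answer_item_sk {
            struct eb32_node link;    /* hot: touched while walking the tree */
            int type;                 /* hot: read as soon as a node matches,
                                       * now on the same cache line as the node */
            unsigned char name[256];  /* cold: the very long name moved last */
    };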
A bogus test in b_get_varint(), b_put_varint(), b_peek_varint() shifts
the end of the buffer by one byte. Since the bug is the same in the read
and write functions, the buffer contents remain compatible, which explains
why this bug was not detected earlier. But if the buffer ends on an
aligned address or page, it can result in a one-byte overflow which will
typically cause a crash or an inconsistent behavior.
This API is only used by rings (e.g. for traces and boot messages) and
by DNS responses, so the probability to hit it is extremely low, but a
crash on boot was observed.
This must be backported to 2.2.
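The class of bug can be illustrated on a generic varint reader (a
simplified sketch using LEB128-style encoding, not haproxy's exact
format):

    #include <stdint.h>

    /* Correct form: the bound is checked with the current pointer
     * before each read. A bogus variant such as "while (ptr <= end)"
     * or a post-increment in the test shifts the end by one byte and
     * can read past an aligned buffer end, as described above. */
    static int get_varint_sk(const uint8_t *ptr, const uint8_t *end,
                             uint64_t *out)
    {
            uint64_t v = 0;
            unsigned shift = 0;

            while (ptr < end) {
                    uint8_t b = *ptr++;

                    v |= (uint64_t)(b & 0x7f) << shift;
                    if (!(b & 0x80)) {
                            *out = v;
                            return 1;   /* complete varint decoded */
                    }
                    shift += 7;
            }
            return 0;                   /* truncated input */
    }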
We're using an XXH32() on the record to insert it into or look it up from
the tree. This way we don't change the rest of the code, the comparisons
are still made on all fields and the next node is visited on mismatch. This
also makes it possible to keep using round-robin between identical nodes.
Just doing this is sufficient to see the CPU usage go down from ~60-70% to
4% at ~2k DNS requests per second for a farm with 300 servers. A larger
config with 12 backends of 2000 servers each shows ~8-9% CPU for 6-10000
DNS requests per second.
It would probably be possible to go further with multiple levels of indexing
but it's not worth it, and it's important to remember that tree nodes take
space (the struct answer_list went from 576 to 600 bytes).
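A sketch of the indexing scheme (illustrative function; XXH32() is the
xxHash function bundled with haproxy, and eb32_insert() keeps duplicate
keys in insertion order):

    #include <stddef.h>
    #include <import/eb32tree.h>
    #include <import/xxhash.h>

    /* Hash the serialized record into the 32-bit key and insert it.
     * Lookups jump to the matching key and still compare all fields to
     * rule out collisions; duplicates stay in insertion order, which
     * preserves the round-robin distribution. */
    static void index_record_sk(struct eb_root *tree, struct eb32_node *node,
                                const void *rec, size_t len)
    {
            node->key = XXH32(rec, len, 0);   /* seed 0 */
            eb32_insert(tree, node);
    }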
With SRV records, a huge amount of time is spent looking for records
by walking long lists. It is possible to reduce this by indexing values
in trees instead. However the whole code relies a lot on the list
ordering, and even implements some round-robin on it to distribute IP
addresses to servers.
This patch starts carefully by replacing the list with an eb32 tree
that is still used like a list, with a constant key 0. Since ebtrees
preserve insertion order for duplicates, the tree walk visits the nodes
in the exact same order it did with the lists. This allows the required
infrastructure to be implemented without changing the behavior.
When cleaning up the code to remove most explicit task masks in commit
beeabf531 ("MINOR: task: provide 3 task_new_* wrappers to simplify the
API"), a mistake was made with the external checks, where the call does
task_new_on(1) instead of task_new_on(0) due to confusion with the
previous mask 1.
No backport is needed as that's only 2.5-dev.
This one was used to indicate whether the callee had to follow a
particularly safe code path when removing resolutions. Since the code
now uses a kill list, this is not needed anymore.
When scanning resolution.curr it's possible to try to free some
resolutions which will themselves result in freeing other ones. If
one of these other ones is exactly the next one in the list, the list
walk visits deleted nodes and causes memory corruption, double-frees
and so on. The approach taken using the "safe" argument to some
functions seems to work but it's extremely brittle, as it requires
carefully checking all call paths from process_resolvers() and setting
the argument to 1 there to refrain from deleting entries, so the bug
is very likely to come back after some tiny changes to this code.
A variant was tried, checking at various places that the current task
corresponds to process_resolvers() but this is also quite brittle even
though a bit less.
This patch uses another approach which consists in carefully unlinking
elements from the list and deferring their removal by placing them on a
kill list instead of deleting them synchronously. The real benefit here
is that the complexity only has to be placed where the complications
are.
A thread-local list is fed with elements to be deleted before scanning
the resolutions, and it's flushed at the end by picking the first one
until the list is empty. This way we never dereference the next element
and do not care about its presence or not in the list. One function,
resolv_unlink_resolution(), is exported and used outside, so it had to
be modified to use this list as well. Internal code has to use
_resolv_unlink_resolution() instead.
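The pattern can be sketched with haproxy's list macros (illustrative
struct and function names; THREAD_LOCAL comes from haproxy's
compiler.h):

    #include <stdlib.h>
    #include <haproxy/list.h>

    /* Illustrative resolution entry, not the real struct */
    struct resolution_sk {
            struct list list;
            /* ... payload ... */
    };

    /* The walker never frees elements directly; it parks them on a
     * thread-local kill list, and the flush only ever picks the first
     * element, so no "next" pointer can go stale. */
    static THREAD_LOCAL struct list death_row = LIST_HEAD_INIT(death_row);

    static void unlink_resolution_sk(struct resolution_sk *res)
    {
            LIST_DEL_INIT(&res->list);            /* leave the walked list */
            LIST_APPEND(&death_row, &res->list);  /* defer the deletion */
    }

    static void flush_death_row_sk(void)
    {
            while (!LIST_ISEMPTY(&death_row)) {
                    struct resolution_sk *res =
                            LIST_ELEM(death_row.n, struct resolution_sk *, list);

                    LIST_DEL_INIT(&res->list);
                    free(res);
            }
    }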
The code as it is uses crossed lists between many elements, and at
many places the code relies on list iterators or emptiness checks,
which does not work with only LIST_DELETE. Further, it is quite
difficult to place debugging code and checks in the current situation,
and gdb is helpless.
This code replaces all LIST_DELETE calls with LIST_DEL_INIT so that
it becomes possible to trust the lists.
This function allocates requesters by hand for each and every type. This
is complex and error-prone, and it doesn't even initialize the list part,
leaving dangling pointers that complicate debugging.
This patch introduces a new function resolv_get_requester() that either
returns the current pointer if valid or tries to allocate a new one and
links it to its destination. Then it makes use of it in the function
above to clean it up quite a bit. This makes it possible to remove
complicated but unneeded tests.
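The get-or-create pattern can be sketched as follows (simplified
signature and illustrative struct, not the real one):

    #include <stdlib.h>
    #include <haproxy/list.h>

    /* Illustrative requester, not the real struct */
    struct requester_sk {
            struct list list;
            /* ... type-specific fields ... */
    };

    /* Return the existing pointer if set, otherwise allocate one and
     * initialize its list part so no dangling pointers are left behind. */
    static struct requester_sk *get_requester_sk(struct requester_sk **slot)
    {
            if (*slot)
                    return *slot;
            *slot = calloc(1, sizeof(**slot));
            if (*slot)
                    LIST_INIT(&(*slot)->list);
            return *slot;
    }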
Similar to the previous patch, the answer's list was only initialized the
first time it was added to a list, leading to bogus outdated pointers
appearing when debugging code is added around it to watch it. Let's make
sure it's always initialized upon allocation.
The query_list is physically stored in the struct resolution itself,
so we have a list head whose single item is stored in the same
structure. But the list is first initialized in
resolv_validate_dns_response(), while it's scanned in
resolv_process_responses() later, after calling the former. First,
this results in crashes as soon as the code is instrumented a little
bit for debugging, as elements from a previous incarnation can appear.
But in addition to this, the presence of an element is checked by
verifying that the return of LIST_NEXT() is not NULL, while it is
never NULL, even for an empty list, resulting in bugs or crashes
if the number of responses does not match the list's contents. This
is easily triggered by testing for the list's non-emptiness outside of
the function.
Let's make sure the list is always correct, i.e. it's initialized to
an empty list when the structure is allocated, elements are checked by
first verifying the list is not empty, they are deleted once checked,
and in any case at the end, so that there are no dangling pointers.
This should be backported, but only as long as the patch fits without
modifications, as adaptations can be risky there given that bugs tend
to hide each other.
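The corrected pattern can be sketched like this (illustrative struct and
function, using haproxy's list macros):

    #include <haproxy/list.h>

    struct query_sk {
            struct list list;
            /* ... parsed query fields ... */
    };

    /* Presence is tested with LIST_ISEMPTY() (a LIST_NEXT() result is
     * never NULL on a circular list), and each element is detached once
     * processed so nothing dangles after the function returns. */
    static void consume_queries_sk(struct list *query_list)
    {
            while (!LIST_ISEMPTY(query_list)) {
                    struct query_sk *q =
                            LIST_ELEM(query_list->n, struct query_sk *, list);

                    /* ... check the query ... */
                    LIST_DEL_INIT(&q->list);
            }
    }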
Depending on the code that precedes the loop, gcc may emit this warning:
src/resolvers.c: In function 'resolv_process_responses':
src/resolvers.c:1009:11: warning: potential null pointer dereference [-Wnull-dereference]
1009 | if (query->type != DNS_RTYPE_SRV && flags & DNS_FLAG_TRUNCATED) {
| ~~~~~^~~~~~
However, after careful checking, r_res->header.qdcount is exclusively 1
when reaching this place, which forces the for() loop to enter for at
least one iteration, and <query> to be set. Thus there's no code path
leading to a null deref. It's possibly just because the assignment is
too far away and the compiler cannot figure out that the condition
always holds. Let's just mark it to please the compiler.
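haproxy carries a macro for such cases (ALREADY_CHECKED()); the
underlying trick, sketched here with an illustrative name, launders the
pointer through an empty asm statement so gcc can no longer track its
provenance:

    /* Laundering <p> through an empty asm makes its value opaque to the
     * optimizer: gcc must assume it was checked and stops emitting the
     * false -Wnull-dereference warning, at no runtime cost. */
    #define MARK_AS_CHECKED_SK(p) __asm__("" : "=rm"(p) : "0"(p))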
This code is dangerous enough that we certainly don't want external code
to ever approach it, let's not export unnecessary functions like this one.
It was made static and a comment was added about its purpose.
There is a fundamental design bug in the resolvers code which is that
a list of active resolutions is being walked to try to delete outdated
entries, and that the code responsible for removing them also removes
other elements, including the next one which will be visited by the
list iterator. This randomly causes a use-after-free condition leading
to crashes, infinite loops and various other issues such as random memory
corruption.
A first fix for this was brought by commit 0efc0993e
("BUG/MEDIUM: resolvers: Don't release resolution from a requester
callbacks"). While preparing for more fixes, some code was factored by
commit 11c6c3965 ("MINOR: resolvers: Clean server in a dedicated function
when removing a SRV item"), which inadvertently passed "0" as the "safe"
argument all the time instead of always using "safe", missing one case
of removal protection. This patch reintroduces the correct argument.
This must be backported with all fixes above.
Cc: Christopher Faulet <cfaulet@haproxy.com>
A few places have been caught triggering late bugs recently, always cases
of use-after-free because a freed element was still found in one of the
lists. This patch adds a few checks for such elements in dns_session_free()
before the final pool_free() and dns_session_io_handler() before adding
elements to lists to make sure they remain consistent. They do not trigger
anymore now.
When dns_session_release() calls dns_session_free(), it was shown that
it might still be attached there:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00000000006437d7 in dns_session_free (ds=0x7f895439e810) at src/dns.c:768
768 BUG_ON(!LIST_ISEMPTY(&ds->ring.waiters));
[Current thread is 1 (Thread 0x7f895bbe2700 (LWP 31792))]
(gdb) bt
#0 0x00000000006437d7 in dns_session_free (ds=0x7f895439e810) at src/dns.c:768
#1 0x0000000000643ab8 in dns_session_release (appctx=0x7f89545a4ff0) at src/dns.c:805
#2 0x000000000062e35a in si_applet_release (si=0x7f89545a5550) at include/haproxy/stream_interface.h:236
#3 0x000000000063150f in stream_int_shutw_applet (si=0x7f89545a5550) at src/stream_interface.c:1697
#4 0x0000000000640ab8 in si_shutw (si=0x7f89545a5550) at include/haproxy/stream_interface.h:437
#5 0x0000000000643103 in dns_session_io_handler (appctx=0x7f89545a4ff0) at src/dns.c:725
#6 0x00000000006d776f in task_run_applet (t=0x7f89545a5100, context=0x7f89545a4ff0, state=81924) at src/applet.c:90
#7 0x000000000068b82b in run_tasks_from_lists (budgets=0x7f895bbbf5c0) at src/task.c:611
#8 0x000000000068c258 in process_runnable_tasks () at src/task.c:850
#9 0x0000000000621e61 in run_poll_loop () at src/haproxy.c:2636
#10 0x0000000000622328 in run_thread_poll_loop (data=0x8d7440 <ha_thread_info+64>) at src/haproxy.c:2807
#11 0x00007f895c54a06b in start_thread () from /lib64/libpthread.so.0
#12 0x00007f895bf3772f in clone () from /lib64/libc.so.6
(gdb) p &ds->ring.waiters
$1 = (struct list *) 0x7f895439e8a8
(gdb) p ds->ring.waiters
$2 = {
n = 0x7f89545a5078,
p = 0x7f89545a5078
}
(gdb) p ds->ring.waiters->n
$3 = (struct list *) 0x7f89545a5078
(gdb) p *ds->ring.waiters->n
$4 = {
n = 0x7f895439e8a8,
p = 0x7f895439e8a8
}
Let's always detach it before freeing so that it remains possible to
check the dns_session's ring before releasing it, and possibly catch
bugs.
The barrier is insufficient here to protect the waiters list, as we can
definitely catch situations where ds->waiter shows an inconsistency
whereby the element is not attached when entering the "if" block yet
is already attached by the time we attach it later.
This patch uses a larger lock to maintain consistency. Without it the
code would crash in 30-180 minutes under heavy stress, always showing
the same problem (ds->waiter->n->p != &ds->waiter). Now it seems to
always resist, suggesting that this was indeed the problem.
This will have to be backported to 2.4.
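The enlarged critical section can be sketched as follows (illustrative
lock label, struct shapes and field names, not the real dns code):

    #include <haproxy/list.h>
    #include <haproxy/thread.h>

    struct dns_stream_server_sk {
            HA_SPINLOCK_T lock;
            struct list waiters;
    };

    struct dns_session_sk {
            struct list waiter;
    };

    /* The membership test and the append must sit in one critical
     * section: with only a barrier, another thread may attach the
     * element between the check and the LIST_APPEND, which is exactly
     * the inconsistency (ds->waiter->n->p != &ds->waiter) seen above. */
    static void queue_waiter_sk(struct dns_stream_server_sk *dss,
                                struct dns_session_sk *ds)
    {
            HA_SPIN_LOCK(DNS_LOCK, &dss->lock);
            if (!LIST_INLIST(&ds->waiter))
                    LIST_APPEND(&dss->waiters, &ds->waiter);
            HA_SPIN_UNLOCK(DNS_LOCK, &dss->lock);
    }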
When using TCP, after a session release and free, the session can remain
attached to the list of sessions with a response message waiting for
a commit (ds->waiter). This results in a use-after-free of this
session.
Also, on some error paths and after a free, a session could remain
attached to the lists of available idle/free sessions (ds->list).
This patch ensures the session is removed from those external lists
before a free.
This patch should be backported to all versions including DNS over TCP
(i.e. as far as 2.4).
When an HTTP response is parsed, early parsing errors are not properly
handled. When this error is reported by the multiplexer, nothing is copied
into the input buffer. The HTX message remains empty but the
HTX_FL_PARSING_ERROR flag is set. In addition CS_FL_EOI is set on the
conn-stream. This last flag must be handled to prevent subscription for
receive events. Otherwise, in the best case, a L7 timeout error is
reported. But a transient loop is also possible if a shutdown is received
because the multiplexer notifies the check of the event while the check
never handles it and waits for more data.
Now, if CS_FL_EOI flag is set on the conn-stream, expect rules are
evaluated. Any error must be handled there.
Thanks to @kazeburo for his valuable report.
This patch should fix the issue #1420. It must be backported at least to
2.4. On 2.3 and 2.2, there is no loop but the wrong error is reported (empty
response instead of invalid one). Thus it may also be backported as far as
2.2.
If a channel error (READ_ERROR|READ_TIMEOUT|WRITE_ERROR|WRITE_TIMEOUT) is
detected by the stream, in process_stream(), FLT_END analyzers must be
preserved. It is important to be sure to end filter analysis and be able
to release the stream.
First, filters may release some resources when FLT_END analyzers are
called. Then, the CF_FLT_ANALYZE flag is used to sync the end of analysis
for the request and the response. If the FLT_END analyzer is ignored on a
channel, this may block the other side and freeze the stream.
This patch must be backported to all stable versions.
Replace the test based on the enum value of the algorithm by an explicit
switch statement in case someone reorders it for some reason (while
still managing not to break the regtest).
In order for all the error return values to be distributed on the same
side (instead of surrounding the success return code), the return values
for errors other than a simple verification failure are switched to
negative values. This way the result of the jwt_verify converter can be
compared strictly to 1 as well as relative to 0 (any return value <= 0
is an error).
The documentation was also modified to discourage conversion of the
return value into a boolean (which would definitely not work).
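The convention can be sketched like this (values are examples, not the
exact haproxy enum):

    /* Success is the only positive value, a plain signature mismatch is
     * 0, and every other failure is negative: "ret == 1" tests success
     * and "ret <= 0" catches all errors, whereas a boolean conversion
     * would wrongly treat the negative codes as true. */
    enum jwt_vrfy_sk {
            JWT_VRFY_INTERNAL_ERR_SK  = -2,  /* e.g. allocation failure */
            JWT_VRFY_UNMANAGED_ALG_SK = -1,  /* algorithm not supported */
            JWT_VRFY_KO_SK            =  0,  /* signature does not match */
            JWT_VRFY_OK_SK            =  1,  /* token verified */
    };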
resolv_hostname_cmp() is bogus: it is applied to labels and not plain
names, but doesn't make any distinction between length prefixes and
characters, so it runs the label lengths through tolower() as well.
The only reason it doesn't break is that labels cannot be larger than
63 bytes, and that no common character encoding has upper-case letters
among byte values 0-63 that could be turned into a different value via
tolower().
Now that all labels are stored in lower case, we don't need to burn
CPU cycles in tolower() at run time and can use memcmp() instead of
resolv_hostname_cmp(). This results in a ~22% lower CPU usage on large
farms using SRV records:
before:
18.33% haproxy [.] resolv_validate_dns_response
10.58% haproxy [.] process_resolvers
10.28% haproxy [.] resolv_hostname_cmp
7.50% libc-2.30.so [.] tolower
46.69% total
after:
24.73% haproxy [.] resolv_validate_dns_response
7.78% libc-2.30.so [.] __memcmp_avx2_movbe
3.65% haproxy [.] process_resolvers
36.16% total
The whole code relies on performing case-insensitive comparison on
lookups, which is extremely inefficient.
Let's make sure that all labels to be looked up or sent are first
converted to lower case. Doing so is also the opportunity to eliminate
an inefficient memcpy() in resolv_dn_label_to_str() that essentially
runs over a few unaligned bytes at once. As a side note, that call
was dangerous because it relied on a sign-extended size taken from a
string that had to be sanitized first.
This is tagged medium because while this is 100% safe, it may cause
visible changes on the wire at the packet level and trigger bugs in
test programs.
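The normalization itself boils down to a loop of this shape (an
illustrative sketch over the DNS wire format; the length-prefix bytes
themselves are skipped, not lowered):

    #include <ctype.h>
    #include <stddef.h>

    /* Lower-case every label of an encoded DNS name in place: <dn> is a
     * sequence of [length][label bytes]... terminated by a zero length. */
    static void dn_tolower_sk(unsigned char *dn, size_t len)
    {
            size_t i = 0;

            while (i < len && dn[i]) {
                    size_t l = dn[i++];          /* label length prefix */

                    for (; l && i < len; l--, i++)
                            dn[i] = tolower(dn[i]);
            }
    }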