haproxy

mirror of http://git.haproxy.org/git/haproxy.git/ synced 2025-02-17 19:16:56 +00:00

Author	SHA1	Message	Date
William Lallemand	dc1c0a169c	MINOR: cli: add an 'echo' command Add an echo command to write text over the CLI output.	2024-10-24 17:20:57 +02:00
William Lallemand	944a224358	MINOR: cli: remove non-printable characters from 'debug dev fd' When using 'debug dev fd', the output of laddr and raddr can contain some garbage. This patch replaces any control or non-printable character by a '.'.	2024-10-24 16:45:11 +02:00
Willy Tarreau	4adb2d864d	MINOR: debug: do not limit backtraces to stuck threads Historically for size limitation reasons, we would only dump the backtrace of stuck threads. The problem is that when a panic is triggered, or for other reasons, we have no backtrace, which effectively limits it to the watchdog timer. It's also visible in "show threads" which used to report backtraces for all threads in 2.4 and displays none nowadays, making its use much more limited. A first approach could be to just dump the thread that triggers the panic (in addition to stuck threads). But that remains quite limited since "show threads" would still display nothing. This patch takes a better approach consisting in dumping all non-idle threads. This way the output is less polluted than with the older approach (no need to dump all those waiting in the poller), and all active threads are visible, in panics as well as in "show threads". As such, the CLI command "debug dev panic" now dumps backtraces again. This is already a benefit which will ease testing of various locations against the ability to resolve useful symbols.	2024-10-24 16:12:46 +02:00
Willy Tarreau	e5fccfe0b6	MINOR: debug: store important pointers in post_mortem Dealing with a core and a stripped executable is a pain when it comes to finding pools, proxies or thread contexts. Let's put a pointer to these heads and arrays in the post_mortem struct for easier location. Other critical lists like this could possibly benefit from being added later. Here we now have: - tgroup_info - thread_info - tgroup_ctx - thread_ctx - pools - proxies Example: $ objdump -h haproxy|grep post 34 _post_mortem 000014b0 0000000000cfd400 0000000000cfd400 008fc400 2**8 (gdb) set $pm=(struct post_mortem*)0x0000000000cfd400 (gdb) p $pm->tgroup_ctx[0] $8 = { threads_harmless = 254, threads_idle = 254, stopping_threads = 0, timers = { b = {0x0, 0x0} }, niced_tasks = 0, __pad = 0xf5662c <ha_tgroup_ctx+44> "", __end = 0xf56640 <ha_tgroup_ctx+64> "" } (gdb) info thr Id Target Id Frame * 1 Thread 0x7f9e7706a440 (LWP 21169) 0x00007f9e76a9c868 in raise () from /lib64/libc.so.6 2 Thread 0x7f9e76a60640 (LWP 21175) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6 3 Thread 0x7f9e7613d640 (LWP 21176) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6 4 Thread 0x7f9e7493a640 (LWP 21179) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6 5 Thread 0x7f9e7593c640 (LWP 21177) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6 6 Thread 0x7f9e7513b640 (LWP 21178) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6 7 Thread 0x7f9e6ffff640 (LWP 21180) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6 8 Thread 0x7f9e6f7fe640 (LWP 21181) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6 (gdb) p/x $pm->thread_info[0].pth_id $12 = 0x7f9e7706a440 (gdb) p/x $pm->thread_info[1].pth_id $13 = 0x7f9e76a60640 (gdb) set $px = *$pm->proxies while ($px != 0) printf "%#lx %s served=%u\n", $px, $px->id, $px->served set $px = ($px)->next end 0x125eda0 GLOBAL served=0 0x12645b0 stats served=0 0x1266940 comp served=0 0x1268e10 comp_bck served=0 0x1260cf0 <OCSP-UPDATE> served=0 0x12714c0 <HTTPCLIENT> served=0	2024-10-24 16:12:46 +02:00
Willy Tarreau	93c3f2a0b4	MINOR: debug: place the post_mortem struct in its own section. Placing it in its own section will ease its finding, particularly in gdb which is too dumb to find anything in memory. Now it will be sufficient to issue this: $ gdb -ex "info files" -ex "quit" ./haproxy core 2>/dev/null |grep _post_mortem 0x0000000000cfd300 - 0x0000000000cfe780 is _post_mortem or this: $ objdump -h haproxy|grep post 34 _post_mortem 00001480 0000000000cfd300 0000000000cfd300 008fc300 2**8 to spot the symbol's address. Then it can be read this way: (gdb) p (struct post_mortem *)0x0000000000cfd300	2024-10-24 16:12:46 +02:00
Willy Tarreau	989b02e193	MINOR: debug: place a magic pattern at the beginning of post_mortem In order to ease finding of the post_mortem struct in core dumps, let's make it start with a recognizable pattern of exactly 32 chars (to preserve alignment): "POST-MORTEM STARTS HERE+7654321\0" It can then be found like this from gdb: (gdb) find 0x000000012345678, 0x0000000100000000, 'P','O','S','T','-','M','O','R','T','E','M' 0xcfd300 <post_mortem> 1 pattern found. Or easier with any other more practical tool (who has ever used "find" in gdb, given that it cannot iterate over maps and is 100% useless?).	2024-10-24 16:12:46 +02:00
Willy Tarreau	fba48e1c40	MINOR: pools: export the pools variable We want it to be accessible from debuggers for inspection and it's currently unavailable. Let's start by exporting it as a first step.	2024-10-24 16:12:46 +02:00
Willy Tarreau	db76949cff	CLEANUP: mux-h2: remove the unused "full" variable in h2_frt_transfer_data() During the 11th and 12th iterations of the development cycle for the H2 auto rx window, several approaches were attempted to figure out whether another buffer could be allocated or not. One of them consisted in looping back to the beginning of the function requesting a new buffer slot and getting one if the buffer was either apparently or confirmed full. The latest one consisted in directly allocating the next buffer from the two places where it's found to be proven full, instead of checking with the now defunct h2s_may_get_rxbuf() if we were allowed to get one and loop. That approach was retained. In this case the "full" variable is no longer needed, so let's get rid of it because the construct looks bogus and confuses coverity (and possibly code readers as the intent is unclear compared to the code).	2024-10-24 16:12:46 +02:00
Willy Tarreau	f163cbfb7f	BUILD: debug: silence a build warning with threads disabled Commit `091de0f9b2` ("MINOR: debug: slightly change the thread_dump_pointer signification") caused the following warning to be emitted when threads are disabled: src/debug.c: In function 'ha_thread_dump_one': src/debug.c:359:9: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing] Let's just disguise the pointer to silence it. It should be backported where the patch above was backported, since it was part of a series aiming at making thread dumps more exploitable from core dumps.	2024-10-24 16:12:46 +02:00
William Lallemand	5db761f709	MINOR: mworker/cli: 'show proc debug' for old workers Add FD details for old workers in 'show proc debug'.	2024-10-24 14:47:28 +02:00
William Lallemand	b49ddae21b	MINOR: mworker/cli: remove comment line for program when useless Remove the '# programs' line from the 'show proc' output when there are no programs.	2024-10-24 14:39:41 +02:00
William Lallemand	84640aaa2a	MINOR: mworker/cli: add 'debug' to 'show proc' This patch adds a 'debug' parameter to the 'show proc' command of the master CLI. It allows showing debug details about the processes. Example: echo 'show proc debug' | socat /tmp/master.sock - #<PID> <type> <reloads> <uptime> <version> <ipc_fd[0]> <ipc_fd[1]> 391999 master 0 [failed: 0] 0d00h00m02s 3.1-dev10-b9095a-63 5 6 # workers 392001 worker 0 0d00h00m02s 3.1-dev10-b9095a-63 3 -1 # programs	2024-10-24 14:23:27 +02:00
Christopher Faulet	362de90f3e	BUG/MINOR: stconn: Don't disable 0-copy FF if EOS was reported on consumer side There is no reason to disable the 0-copy data forwarding if an end-of-stream was reported on the consumer side. Indeed, the consumer will send data in this case. So there is no reason to check the read side here. This patch may be backported as far as 2.9.	2024-10-24 12:07:50 +02:00
Christopher Faulet	5970c6abec	BUG/MINOR: http-ana: Fix wrong client abort reports during responses forwarding When the response forwarding is aborted, we must not report a client abort if an EOS was seen on the client side. Only aborts performed by the stream must be considered. This bug was introduced when the SHUTR was split into 2 flags. This patch must be backported as far as 2.8.	2024-10-24 12:07:50 +02:00
Christopher Faulet	fbc3de6e9e	BUG/MEDIUM: stconn: Report blocked send if sends are blocked by an error When some data must be sent to the endpoint but an error was previously reported, nothing is performed and we leave. But, in this case, the SC is not notified that the sends are blocked. It is indeed an issue if the endpoint reports an error after consuming all data from the SC. In the endpoint the outgoing data are trashed because of the error, but on the SC, everything was sent, even if an error was also reported. Because of this bug, it is possible to have outgoing data blocked at the SC level but without any write timeout armed. In some cases, this may lead to blocking conditions where the stream is never closed. So now, when outgoing data cannot be sent because a previous error was triggered, a blocked send is reported. This way, it is possible to report a write timeout. This patch should fix the issue #2754. It must be backported as far as 2.8.	2024-10-24 11:46:33 +02:00
Amaury Denoyelle	7a02fcaf20	BUG/MEDIUM: server: fix race on servers_list during server deletion Each server is inserted in a global list named servers_list on new_server(). This list is then only used to finalize servers initialization after parsing. On dynamic server creation, there is no issue as new_server() is under thread isolation. However, when a server is deleted after its refcount reached zero, srv_drop() removes it from servers_list without lock protection. In the long term, this can cause list corruption and crashes, especially if multiple adjacent servers are removed in parallel. To fix this, convert servers_list to a mt_list. This should not impact performance as servers_list is not used during runtime outside of server creation/deletion. This should fix github issue #2733. Thanks to Chris Staite who first found the issue here. This must be backported up to 2.6.	2024-10-24 11:35:57 +02:00
Amaury Denoyelle	116178563c	BUG/MINOR: server: fix dynamic server leak with check on failed init If a dynamic server is added with check or agent-check, its refcount is incremented after server keyword parsing. However, if add server fails at a later stage, refcount is only decremented once, which prevents the server from being fully released. This causes a leak with a server which is detached from most of the lists but still exists in the system. This bug is considered minor as only a few conditions may cause a failure in add server after check/agent-check initialization. This is the case if there is a naming collision or the dynamic ID cannot be generated. To fix this, simply decrement server refcount on add server error path if either check and/or agent-check are flagged as activated. This bug is related to github issue #2733. Thanks to Chris Staite who first found the leak. This must be backported up to 2.6.	2024-10-24 11:35:57 +02:00
Valentine Krasnobaeva	ddb829bb51	MINOR: mworker/cli: split mworker_cli_proxy_create There are two parts in mworker_cli_proxy_create(): allocating and setting up the MASTER proxy, and allocating and setting up servers on ipc_fd[0] of the sockpairs shared with workers. So, let's split mworker_cli_proxy_create() into two functions respectively. Each of them takes **errmsg as an argument to write an error message, which may be triggered by some subcalls. The content of this errmsg makes it possible to extend the final alert message shown to the user if these new functions fail. The main goal of this split is to allow moving these two parts independently in the future and to make the haproxy initialization code in haproxy.c more transparent.	2024-10-24 11:32:20 +02:00
Valentine Krasnobaeva	a0d727e069	CLEANUP: mworker: clean mworker_reexec Before refactoring the master-worker architecture, resources to set up the master CLI for the new worker process (shared sockpair, entry in proc_list) were created in init() before parsing the configuration and binding listening sockets. So, the master had to clean up the new worker's resources during its re-exec in case it failed at some initialization step before the fork. Now the fork happens very early and the worker parses its configuration by itself. If it fails during the initialization stage, all cleanups (deleting the fds of the shared sockpair, proc_list cleanup) are performed in the SIGCHLD handler upon catching the SIGCHLD corresponding to this new worker. So, there is no longer any need to call mworker_cleanup_proc() in mworker_reexec(). As for mworker_cleanlisteners(), there is no longer any need to call this function either. The master now parses only the "global" and "program" sections, so it allocates only the MASTER proxy, which is stopped in mworker_reexec() by mworker_cli_proxy_stop(). Let's keep the definitions of mworker_cleanlisteners() and mworker_cleanup_proc() in mworker.c for the moment. We may reuse parts of their code later.	2024-10-24 11:32:20 +02:00
Valentine Krasnobaeva	4db0f69527	BUG/MINOR: mworker: show worker warnings in startup logs As the master-worker fork now happens at an early init stage and the worker then parses its configuration and performs all initialization steps, let's duplicate the startup logs ring for it, just before the moment when it enters its polling loop. The startup logs ring content is shown as an output of the "reload" master CLI command and we should be able to dump worker initialization logs there. Log messages are written to the startup logs ring only when mode MODE_STARTING is set (see print_message()). So, to be able to keep the last worker alerts in the startup logs, let's withdraw MODE_STARTING and reset the user messages context just before entering the polling loop. This fix does not need to be backported as it is a part of previous patches from this version, which refactor the master-worker architecture.	2024-10-24 11:32:20 +02:00
Valentine Krasnobaeva	5ee266b745	MINOR: error: simplify startup_logs_init_shm This patch simplifies the code of startup_logs_init_shm(). We no longer re-exec the master process twice after each reload to free the unused memory that it had to allocate because it parsed all configuration sections. So, there is no longer any need to keep the SHM fd open between the first and the next reloads. We can completely remove HAPROXY_STARTUPLOGS_FD. In step_init_1() we continue to call startup_logs_init_shm() to open the SHM and to allocate the startup logs ring area within it. In master-worker mode, the worker duplicates the initial startup logs ring after sending its READY state to the master. Sharing the same ring between two processes until the worker finishes its initialization allows the worker's startup logs to be shown in the master CLI output. During the next reload, the master process should free the memory allocated for the ring structure. Then after the execvp() it will reopen and map the SHM area again and reallocate the ring structure.	2024-10-24 11:32:20 +02:00
Valentine Krasnobaeva	e9c8e0efc9	MINOR: mworker: stop MASTER proxy listener on worker mcli sockpair After sending its "READY" status worker should not keep the access to MASTER proxy, thus, it shouldn't be able to send any other commands further to master process. To achieve this, let's stop in master context master CLI listener attached on the sockpair shared with worker. We do this just after receiving the worker's status message.	2024-10-24 11:32:20 +02:00
Valentine Krasnobaeva	3a5b28e00c	BUG/MINOR: mworker/cli: show master startup logs in recovery mode When the master enters recovery mode after an unsuccessful reload, HAPROXY_LOAD_SUCCESS should be set to 0. This way cli_io_handler_show_cli_sock() can dump in the master CLI the warnings and alerts saved in the startup logs ring. No need to backport this fix, as this is related to the previous patches in this version to refactor the master-worker architecture.	2024-10-24 11:32:20 +02:00
Willy Tarreau	401fb0e87a	MINOR: activity/memprofile: show per-DSO stats On systems where many libs are loaded, it's hard to track suspected leaks. Having a per-DSO summary makes it more convenient. That's what we're doing here by summarizing all calls per DSO before showing the total.	2024-10-24 10:49:21 +02:00
Christopher Faulet	c91745e3a4	BUG/MINOR: mux-h1: Fix conditions on pipe in some COUNT_IF() The previous commit contains a bug in some COUNT_IF() relying on the pipe inside the IOBUF. We must take care to have a pipe before checking its size. No backport needed.	2024-10-24 09:50:16 +02:00
Christopher Faulet	7e60928c9c	DEBUG: mux-h1: Add debug counters to track errors with in/out pending data Debug counters were added on all connection errors when pending data remain blocked in the input or output buffers. The same is performed when the H1C is released, when the connection is closed and when a timeout is reached. The idea is to be able to count all cases where data are lost, especially the outgoing ones.	2024-10-24 08:18:55 +02:00
Willy Tarreau	1eb31d30fe	Revert "OPTIM: mux-h2: make h2_send() report more accurate wake up conditions" This reverts commit `9fbc01710a`. In 3.1-dev10, commit `9fbc01710a` ("OPTIM: mux-h2: make h2_send() report more accurate wake up conditions") leveraged the more accurate distinction between demux and recv to decide when to wake the tasklet up after a send. But other cases are needed. When we just need to wake the processing task up so that it itself wakes up other streams, for example because these ones are blocked. Indeed, a temporarily blocked stream may block other ones, which will never be woken up if the demux has nothing to do. In an ideal world we would check all cases where blocking flags were dropped. However it looks like this case after a send is probably the only one that deserves waking up the connection again. It's likely that in practice the MUX_MFULL flag was dropped and that it was that one that was blocking the send. In addition, dealing with these cases was not sufficient, as one case was encountered where dbuf was empty, subs=0, short_read still present while in FRH state... and the timeouts were still there (easily found with halog -tcn cD at a rate of 1-2 every 2 minutes roughly). Interestingly, in a dump, some MBUF_HAS_DATA were seen on an empty mbuf, so it means that certain conditions must be taken very carefully in the wakeup conditions. So overall this indicates that there remain subtle inconsistencies that this optimization is sensitive to. It may have to be revisited later but for now better revert it. No backport is needed. 
Annex: - first dump showing a dependency on WAIT_INLIST after h2_send(): 0x6dc2800: [23/Oct/2024:18:07:22.861247] id=1696 proto=tcpv4 flags=0x100c4a, conn_retries=0, conn_exp=<NEVER> conn_et=0x000 srv_conn=0x597a900, pend_pos=(nil) waiting=0 epoch=0 frontend=public (id=2 mode=http), listener=SSL (id=5) backend=gitweb-haproxy (id=6 mode=http) task=0x6e1d090 (state=0x00 nice=0 calls=23 rate=0 exp=2s tid=0(1/0) age=57s) txn=0x6e3f7c0 flags=0x43000 meth=1 status=200 req.st=MSG_DONE rsp.st=MSG_DATA req.f=0x4c rsp.f=0x2e scf=0x6dc33a0 flags=0x00002482 ioto=1m state=EST endp=CONN,0x6dc6c20,0x40405001 sub=3 rex=<NEVER> wex=3s rto=3s wto=3s iobuf.flags=0x00000000 .pipe=0 .buf=0@(nil)+0/0 h2s=0x6dc6c20 h2s.id=59 .st=HCR .flg=0x7001 .rxwin=32712 .rxbuf.c=0 .t=0@(nil)+0/0 .h=0@(nil)+0/0 .sc=0x6dc33a0(.flg=0x00002482 .app=0x6dc2800) .sd=0x6e83fd0(.flg=0x40405001) .subs=0x6dc33b8(ev=3 tl=0x6e22a20 tl.calls=10 tl.ctx=0x6dc33a0 tl.fct=sc_conn_io_cb) h2c=0x6e66570 h2c.st0=FRH .err=0 .maxid=77 .lastid=-1 .flg=0x2000e00 .nbst=2 .nbsc=2 .nbrcv=0 .glitches=0 .fctl_cnt=0 .send_cnt=2 .tree_cnt=2 .orph_cnt=0 .sub=1 .dsi=77 .dbuf=0@(nil)+0/0 .mbuf=[4..4\|32],h=[0@(nil)+0/0],t=[0@(nil)+0/0] .task=0x6dbdc60 .exp=<NEVER> co0=0x7f84881614b0 ctrl=tcpv4 xprt=SSL mux=H2 data=STRM target=LISTENER:0x2acb7c0 flags=0x80000300 fd=19 fd.state=121 updt=0 fd.tmask=0x1 scb=0x2a8da90 flags=0x00001211 ioto=1m state=EST endp=CONN,0x6e5a530,0x106c0001 sub=0 rex=<NEVER> wex=<NEVER> rto=3s wto=<NEVER> iobuf.flags=0x00000000 .pipe=0 .buf=0@(nil)+0/0 h1s=0x6e5a530 h1s.flg=0x14094 .sd.flg=0x106c0001 .req.state=MSG_DONE .res.state=MSG_DATA .meth=GET status=200 .sd.flg=0x106c0001 .sc.flg=0x00001211 .sc.app=0x6dc2800 .subs=(nil) h1c=0x7f84880f5f40 h1c.flg=0x80000020 .sub=0 .ibuf=32704@0x6ddef30+16262/32768 .obuf=0@(nil)+0/0 .task=0x6e131d0 .exp=<NEVER> co1=0x7f8488172b70 ctrl=tcpv4 xprt=RAW mux=H1 data=STRM target=SERVER:0x597a900 flags=0x00000300 fd=31 fd.state=10122 updt=0 fd.tmask=0x1 filters={0x6e49f30="cache 
store filter", 0x6e67ad0="compression filter"} req=0x6dc2828 (f=0x21840000 an=0x48000 tofwd=0 total=224) an_exp=<NEVER> buf=0x6dc2830 data=(nil) o=0 p=0 i=0 size=0 htx=0x104d2c0 flags=0x0 size=0 data=0 used=0 wrap=NO extra=0 res=0x6dc2870 (f=0xa0040000 an=0x24000000 tofwd=0 total=309982) an_exp=<NEVER> buf=0x6dc2878 data=0x6dceef0 o=16333 p=16333 i=16435 size=32768 htx=0x6dceef0 flags=0x0 size=32720 data=16333 used=1 wrap=NO extra=0 ----------------------------------- strm.flg 0x100c4a SF_SRV_REUSED SF_HTX SF_REDIRECTABLE SF_CURR_SESS SF_BE_ASSIGNED SF_ASSIGNED task.state 0 0 txn.meth 1 GET txn.flg 0x43000 TX_NOT_FIRST TX_CACHE_COOK TX_CACHEABLE txn.req.flg 0x4c HTTP_MSGF_BODYLESS HTTP_MSGF_VER_11 HTTP_MSGF_XFER_LEN txn.rsp.flg 0x2e HTTP_MSGF_COMPRESSING HTTP_MSGF_VER_11 HTTP_MSGF_XFER_LEN HTTP_MSGF_TE_CHNK f.sc.flg 0x2482 SC_FL_SND_EXP_MORE SC_FL_RCV_ONCE SC_FL_WONT_READ SC_FL_EOI f.sc.sd.flg 0x40405001 SE_FL_HAVE_NO_DATA SE_FL_MAY_FASTFWD_CONS SE_FL_EOI SE_FL_NOT_FIRST SE_FL_T_MUX f.h2s.flg 0x7001 H2_SF_HEADERS_RCVD H2_SF_OUTGOING_DATA H2_SF_HEADERS_SENT H2_SF_ES_RCVD f.h2s.sd.flg 0x40405001 SE_FL_HAVE_NO_DATA SE_FL_MAY_FASTFWD_CONS SE_FL_EOI SE_FL_NOT_FIRST SE_FL_T_MUX f.h2c.flg 0x2000e00 H2_CF_MBUF_HAS_DATA H2_CF_DEM_IN_PROGRESS H2_CF_DEM_SHORT_READ H2_CF_WAIT_INLIST f.co.flg 0x80000300 CO_FL_XPRT_TRACKED CO_FL_XPRT_READY CO_FL_CTRL_READY f.co.fd.st 0x121 FD_POLL_IN FD_EV_READY_W FD_EV_ACTIVE_R b.sc.flg 0x1211 SC_FL_SND_NEVERWAIT SC_FL_NEED_ROOM SC_FL_NOHALF SC_FL_ISBACK b.sc.sd.flg 0x106c0001 SE_FL_WAIT_DATA SE_FL_MAY_FASTFWD_CONS SE_FL_MAY_FASTFWD_PROD SE_FL_WANT_ROOM SE_FL_RCV_MORE SE_FL_T_MUX b.h1s.sd.flg 0x106c0001 SE_FL_WAIT_DATA SE_FL_MAY_FASTFWD_CONS SE_FL_MAY_FASTFWD_PROD SE_FL_WANT_ROOM SE_FL_RCV_MORE SE_FL_T_MUX b.h1s.flg 0x14094 H1S_F_HAVE_CLEN H1S_F_HAVE_O_CONN H1S_F_NOT_FIRST H1S_F_WANT_KAL H1S_F_RX_CONGESTED b.h1c.flg 0x80000020 H1C_F_IS_BACK H1C_F_IN_FULL b.co.flg 0x300 CO_FL_XPRT_READY CO_FL_CTRL_READY b.co.fd.st 0x278a FD_POLL_OUT FD_POLL_PRI 
FD_POLL_IN FD_EV_ERR_RW FD_EV_READY_R 0x2008 req.flg 0x21840000 CF_FLT_ANALYZE CF_DONT_READ CF_AUTO_CONNECT CF_WROTE_DATA req.ana 0x48000 AN_REQ_FLT_END AN_REQ_HTTP_XFER_BODY req.htx.flg 0 0 res.flg 0xa0040000 CF_ISRESP CF_FLT_ANALYZE CF_WROTE_DATA res.ana 0x24000000 AN_RES_FLT_END AN_RES_HTTP_XFER_BODY res.htx.flg 0 0 ----------------------------------- - second example of stuck connection after properly checking for WAIT_INLIST as well: 0x73438d0: [23/Oct/2024:18:46:57.235709] id=3963 proto=tcpv4 flags=0x100c4a, conn_retries=0, conn_exp=<NEVER> conn_et=0x000 srv_conn=0x5dd3f50, pend_pos=(nil) waiting=0 epoch=0x13 p_stc=25 p_req=29 p_res=29 p_prp=29 frontend=public (id=2 mode=http), listener=SSL (id=5) backend=gitweb-haproxy (id=6 mode=http) task=0x72a13e0 (state=0x00 nice=0 calls=24 rate=0 exp=7s tid=0(1/0) age=53s) txn=0x7287260 flags=0x43000 meth=1 status=200 req.st=MSG_DONE rsp.st=MSG_DATA req.f=0x4c rsp.f=0x2e scf=0x729e520 flags=0x00042082 ioto=1m state=EST endp=CONN,0x737ffd0,0x4040d001 sub=2 rex=<NEVER> wex=46s rto=46s wto=46s iobuf.flags=0x00000000 .pipe=0 .buf=0@(nil)+0/0 h2s=0x737ffd0 h2s.id=57 .st=HCR .flg=0x7001 .rxwin=32712 .rxbuf.c=0 .t=0@(nil)+0/0 .h=0@(nil)+0/0 .sc=0x729e520(.flg=0x00042082 .app=0x73438d0) .sd=0x72afd50(.flg=0x4040d001) .subs=0x729e538(ev=2 tl=0x72af760 tl.calls=10 tl.ctx=0x729e520 tl.fct=sc_conn_io_cb) h2c=0x72555a0 h2c.st0=FRH .err=0 .maxid=77 .lastid=-1 .flg=0x60e00 .nbst=1 .nbsc=1 .nbrcv=0 .glitches=0 .fctl_cnt=0 .send_cnt=1 .tree_cnt=1 .orph_cnt=0 .sub=0 .dsi=77 .dbuf=0@(nil)+0/0 .mbuf=[2..2\|32],h=[0@(nil)+0/0],t=[0@(nil)+0/0] .task=0x725e660 .exp=<NEVER> co0=0x7378e00 ctrl=tcpv4 xprt=SSL mux=H2 data=STRM target=LISTENER:0x2f24800 flags=0x80040300 fd=23 fd.state=1122 updt=0 fd.tmask=0x1 scb=0x2ee74c0 flags=0x00001211 ioto=1m state=EST endp=CONN,0x7287190,0x106c0001 sub=0 rex=<NEVER> wex=<NEVER> rto=46s wto=<NEVER> iobuf.flags=0x00000000 .pipe=0 .buf=0@(nil)+0/0 h1s=0x7287190 h1s.flg=0x14094 .sd.flg=0x106c0001 
.req.state=MSG_DONE .res.state=MSG_DATA .meth=GET status=200 .sd.flg=0x106c0001 .sc.flg=0x00001211 .sc.app=0x73438d0 .subs=(nil) h1c=0x7373920 h1c.flg=0x80000020 .sub=0 .ibuf=32704@0x7272700+318/32768 .obuf=0@(nil)+0/0 .task=0x729e700 .exp=<NEVER> co1=0x72f5290 ctrl=tcpv4 xprt=RAW mux=H1 data=STRM target=SERVER:0x5dd3f50 flags=0x00000300 fd=19 fd.state=10122 updt=0 fd.tmask=0x1 filters={0x728f1f0="cache store filter" [3], 0x728fea0="compression filter" [28]} req=0x73438f8 (f=0x21840000 an=0x48000 tofwd=0 total=224) an_exp=<NEVER> buf=0x7343900 data=(nil) o=0 p=0 i=0 size=0 htx=0x105f440 flags=0x0 size=0 data=0 used=0 wrap=NO extra=0 res=0x7343940 (f=0xa0040000 an=0x24000000 tofwd=0 total=359574) an_exp=<NEVER> buf=0x7343948 data=0x72b1b30 o=16333 p=16333 i=16435 size=32768 htx=0x72b1b30 flags=0x8 size=32720 data=16333 used=1 wrap=NO extra=0 ----------------------------------- strm.flg 0x100c4a SF_SRV_REUSED SF_HTX SF_REDIRECTABLE SF_CURR_SESS SF_BE_ASSIGNED SF_ASSIGNED task.state 0 0 txn.meth 1 GET txn.flg 0x43000 TX_NOT_FIRST TX_CACHE_COOK TX_CACHEABLE txn.req.flg 0x4c HTTP_MSGF_BODYLESS HTTP_MSGF_VER_11 HTTP_MSGF_XFER_LEN txn.rsp.flg 0x2e HTTP_MSGF_COMPRESSING HTTP_MSGF_VER_11 HTTP_MSGF_XFER_LEN HTTP_MSGF_TE_CHNK f.sc.flg 0x42082 SC_FL_EOS SC_FL_SND_EXP_MORE SC_FL_WONT_READ SC_FL_EOI f.sc.sd.flg 0x4040d001 SE_FL_HAVE_NO_DATA SE_FL_MAY_FASTFWD_CONS SE_FL_EOS SE_FL_EOI SE_FL_NOT_FIRST SE_FL_T_MUX f.h2s.flg 0x7001 H2_SF_HEADERS_RCVD H2_SF_OUTGOING_DATA H2_SF_HEADERS_SENT H2_SF_ES_RCVD f.h2s.sd.flg 0x4040d001 SE_FL_HAVE_NO_DATA SE_FL_MAY_FASTFWD_CONS SE_FL_EOS SE_FL_EOI SE_FL_NOT_FIRST SE_FL_T_MUX f.h2c.flg 0x60e00 H2_CF_END_REACHED H2_CF_RCVD_SHUT H2_CF_MBUF_HAS_DATA H2_CF_DEM_IN_PROGRESS H2_CF_DEM_SHORT_READ f.co.flg 0x80040300 CO_FL_XPRT_TRACKED CO_FL_SOCK_RD_SH CO_FL_XPRT_READY CO_FL_CTRL_READY f.co.fd.st 0x1122 FD_POLL_HUP FD_POLL_IN FD_EV_READY_W FD_EV_READY_R b.sc.flg 0x1211 SC_FL_SND_NEVERWAIT SC_FL_NEED_ROOM SC_FL_NOHALF SC_FL_ISBACK b.sc.sd.flg 0x106c0001 
SE_FL_WAIT_DATA SE_FL_MAY_FASTFWD_CONS SE_FL_MAY_FASTFWD_PROD SE_FL_WANT_ROOM SE_FL_RCV_MORE SE_FL_T_MUX	2024-10-23 19:17:10 +02:00
Willy Tarreau	a1d0e58b06	BUILD: spoe: fix build warning on older gcc around sub-struct initialization gcc-4.8 is unhappy with the cfg_file initialization: src/flt_spoe.c: In function 'parse_spoe_flt': src/flt_spoe.c:2202:9: warning: missing braces around initializer [-Wmissing-braces] struct cfgfile cfg_file = {0}; ^ src/flt_spoe.c:2202:9: warning: (near initialization for 'cfg_file.list') [-Wmissing-braces] This is due to the embedded list member. Initializing it to empty like we do almost everywhere else makes it happy. No backport is needed as this was changed in 3.1-dev5 only.	2024-10-23 15:12:59 +02:00
Aurelien DARRAGON	b5b40a9843	BUG/MEDIUM: connection/http-reuse: fix address collision on unhandled address families As described in GH #2765, there were situations where http connections would be re-used for requests to different endpoints, which is obviously unexpected. In GH #2765, this occurred with httpclient and UNIX socket combination, but later code analysis revealed that while disabling http reuse on httpclient proxy helped, it didn't fix the underlying issue since it was found that conn_calculate_hash_sockaddr() didn't take into account families such as AF_UNIX or AF_CUST_SOCKPAIR, and because of that the sock_addr part of the connection wasn't hashed. To properly fix the issue, let's explicitly handle UNIX (both regular and ABNS) and AF_CUST_SOCKPAIR families, so that the destination address is properly hashed. To prevent this bug from re-appearing: when the family isn't known, instead of doing nothing like before, let's fall back to a generic (suboptimal) hashing which hashes the whole sockaddr_storage struct. As a workaround, http-reuse may be disabled on impacted proxies (unfortunately this doesn't help for httpclient since reuse policy defaults to safe and cannot be modified from the config). It should be backported to all stable versions. Shout out to @christopherhibbert for having reported the issue and provided a trivial reproducer. [ada: prior to 3.0, ctx adjt is required because conn_hash_update()'s prototype is slightly different]	2024-10-23 11:48:16 +02:00
Willy Tarreau	b74fb1325e	MINOR: sample: add the "when" converter to condition some expressions Sometimes it would be desirable to include some debugging output only under certain conditions, but the end of the transfer is too late to apply some rules. Here we take the approach of making a converter ("when") that takes a condition among an arbitrary list, and decides whether or not to let the input sample pass through based on the condition. This allows for example to log debugging information only when an error was encountered during the processing (sort of an extension of dontlog-normal). The conditions are quite limited (stopping, error, normal, toapplet, forwarded, processed) and can be negated. The converter can also be chained to use more complex conditions. A suggested example will be: # log "dbg={-}" when fine, or "dbg={... debug info ...}" on error: log-format "$HAPROXY_HTTP_LOG_FMT dbg={%[bs.debug_str,when(!normal)]}"	2024-10-22 20:13:00 +02:00
Willy Tarreau	19e4ec43b9	MINOR: filters: add per-filter call counters The idea here is to record how many times a filter is being called on a stream. We're incrementing the same counter all along, regardless of the type of event, since the purpose is essentially to detect one that might be misbehaving. The number of calls is reported in "show sess all" next to the filter name. It may also help detect suboptimal processing. For example compressing 1GB shows 138k calls to the compression filter, which is roughly two calls per buffer. Maybe we wake up with incomplete buffers and compress less. That's left for a future analysis.	2024-10-22 20:13:00 +02:00
Willy Tarreau	37d5c6fe3a	MINOR: stream: maintain per-stream counters of the number of passes on code Process_stream() is a complex function and a few times some loops were either witnessed or suspected. Each time this happens it's extremely difficult to figure out why because it involves combinations of analysers, filters, errors etc. Let's at least maintain a set of 4 counters per stream that report the number of times we've been through each of the 4 most important blocks (stconn changes, request analysers, response analysers, and propagation of changes down). These ones are stored in the stream and reported in "show sess all", just like they will be reported in panic dumps.	2024-10-22 20:13:00 +02:00
Christopher Faulet	ce314cfb39	MINOR: mux-h1: Add support of the debug string for logs Now it is possible to have info about the front and back H1 multiplexers. For instance: <134>Oct 22 18:10:46 haproxy[3841864]: 127.0.0.1:44280 [22/Oct/2024:18:10:43.265] front-http back-http/www 0/0/-1/-1/3082 503 217 - - SC-- 1/1/0/0/3 0/0 "GET / HTTP/1.1" fs=< h1s=0x13b6f10 h1s.flg=0x14010 .sd.flg=0x50404601 .req.state=MSG_DONE .res.state=MSG_DONE .meth=GET status=503 .sd.flg=0x50404601 .sc.flg=0x00034482 .sc.app=0x11e4c30 .subs=(nil) h1c.flg=0x0 .sub=0 .ibuf=0@(nil)+0/0 .obuf=0@(nil)+0/0 .task=0x1337d10 .exp=<NEVER> conn.flg=0x80000300> bs=< h1s=0x13bb400 h1s.flg=0x100010 .sd.flg=0x10400001 .req.state=MSG_RQBEFORE .res.state=MSG_RPBEFORE .meth=UNKNOWN status=0 .sd.flg=0x10400001 .sc.flg=0x0003c007 .sc.app=0x11e4c30 .subs=(nil) h1c.flg=0x80000000 .sub=0 .ibuf=0@(nil)+0/0 .obuf=0@(nil)+0/0 .task=0x12ba610 .exp=<NEVER> conn.flg=0x5c0300> To have this log message, the log-format must be set to: log-format "$HAPROXY_HTTP_LOG_FMT fs=<%[fs.debug_str]> bs=<%[bs.debug_str]>"	2024-10-22 18:21:28 +02:00
Christopher Faulet	35ab9b8c6d	DEBUG: mux-h1: Add debug counters to track some errors Debug counters are added to track errors about a wrong payload length during the message formatting (on the sending path). Aborts are also concerned. Connection shutdowns and errors occurring before the end of the message was reached are now tracked. On the sending path, shutdowns performed before the whole message was forwarded are tracked too.	2024-10-22 17:39:32 +02:00
Christopher Faulet	c8aecc393b	DEBUG: stream: Add debug counters to track some client/server aborts Not all aborts are tracked for now but only those a bit ambiguous. Mainly, aborts during the data forwarding are concerned. Those triggered during the request or the response analysis are easier to analyze with the stream termination state.	2024-10-22 16:46:37 +02:00
Christopher Faulet	19b736a5fb	CLEANUP: stream: remove outdated comments Comments added during a refactoring session were still there while they are now totally useless. So let's remove them.	2024-10-22 16:14:15 +02:00
Christopher Faulet	7dc930d231	BUG/MINOR: stconn: Pretend the SE have more data to deliver on abortonclose When abortonclose option is enabled on the backend, at the SC level, we must still pretend the SE has more data to deliver to be able to receive the EOS. It must be performed at 2 places: * When the backend is set and the connection is requested. It is when the option is seen for the first time. * After a receive attempt, if the EOI flag is set on the sedesc. Otherwise, when an abort is detected by the mux, the SC is not notified. This patch should fix the issue #2764. This bug probably exists in all stable versions but is only visible since `bca5e1423` ("OPTIM: stconn: Don't pretend mux have more data to deliver on EOI/EOS/ERROR"). So I suggest not backporting it for now, except if the commit above is backported.	2024-10-22 11:16:24 +02:00
Christopher Faulet	ded28f6e5c	BUG/MEDIUM: mux-h2: Remove H2S from send list if data are sent via 0-copy FF When data are sent via the zero-copy data forwarding, in h2_done_ff, we must be sure to remove the H2 stream from the send list if something is sent. It was only performed if no blocking condition was encountered. But we must also do it if something is sent. Otherwise the transfer may be blocked till timeout. This patch must be backported as far as 2.9.	2024-10-22 08:00:32 +02:00
Christopher Faulet	529e4f36a3	BUG/MEDIUM: stats-html: Never dump more data than expected during 0-copy FF During the zero-copy data forwarding, the caller specifies the maximum amount of data the producer may push. However, the HTML stats applet does not use it and can fill all the free space in the buffer. It is especially an issue when the consumer is limited by a flow control, like the H2, because we may emit a too large DATA frame in this case. It is especially visible with big buffers (for instance 32k). In the early age of zero-copy data forwarding, the caller was responsible for passing a properly resized buffer. During the different refactoring steps, this has changed but the HTML stats applet was not updated accordingly. To fix the bug, the buffer used to dump the HTML page is resized to be sure not too much data are dumped. This patch should solve the issue #2757. It must be backported to 3.0.	2024-10-22 08:00:32 +02:00
Willy Tarreau	f2c415cec1	MINOR: debug: add "debug dev counters" to list code counters Issuing "debug dev counters" on the CLI will now scan all existing counters, and report their count, type, location, function name, the condition and an optional comment passed to the macro. The command takes a number of arguments: - "show": this is the default, it will just list the counters - "reset": will reset the matching counters instead of listing them - "all": by default, only non-zero counters are listed. With "all", they are all listed - "bug": restrict the reset or dump to counters of type "BUG" (BUG_ON usually) - "chk": restrict the reset or dump to counters of type "CHK" (CHECK_IF) - "cnt": restrict the reset or dump to counters of type "CNT" (COUNT_IF) The types may be cumulated, and the options entered in any order. Here's an example of the output of "debug dev counters show all bug": Count Type Location function(): "condition" [comment] 0 BUG ring.h:114 ring_dup(): "max > ring_size(dst)" 0 BUG vecpair.h:223 vp_getblk_ofs(): "ofs >= v1->len + v2->len" 0 BUG buf.h:395 b_add(): "b->data + count > b->size" 0 BUG buf.h:106 b_room(): "b->data > b->size" 0 BUG task.h:328 _task_queue(): "(ulong)caller & 1" 0 BUG task.h:324 _task_queue(): "task->tid != tid" 0 BUG task.h:313 _task_queue(): "(ulong)caller & 1" (...) This is expected to be convenient combined with the use and abuse of COUNT_IF() at select locations.	2024-10-21 19:17:55 +02:00
Willy Tarreau	da66c42f65	MINOR: debug: add a new debug macro COUNT_IF() This macro works exactly like BUG_ON() except that it never logs anything nor crashes, it only implements an atomic counter that is incremented on every call. This can be used to count a number of unlikely events that are worth checking at run time on setups showing unusual and unreproducible behaviors.	2024-10-21 19:14:07 +02:00
Willy Tarreau	776fd03509	MEDIUM: debug: add match counters for BUG_ON/WARN_ON/CHECK_IF These macros do not always kill the process, and sometimes it would be nice to know if some match or not, and how many times (especially for the CHECK_IF one). This commit adds a new section "dbg_cnt" made of structs that contain function name, file name, line number, check type, condition and match count. A new macro __DBG_COUNT() adds one to the counter, and is placed inside _BUG_ON() and _BUG_ON_ONCE(). It's worth noting that the exact type of the check is not very precise but in practice we don't care, as most checks will cause the process to die anyway unless they're of type _BUG_ON_ONCE() (used by CHECK_IF by default). All of this is limited to !defined(USE_OBSOLETE_LINKER) because we're creating a section, thus we need a modern linker to be able to scan this section later. Doing so adds ~50kB to the executable due to the ~1266 BUG_ON() and others placed there. That's not huge in comparison to the visibility it can provide.	2024-10-21 19:14:07 +02:00
Willy Tarreau	8844ed2009	CLEANUP: debug: make the BUG_ON() macros check the condition in the outer one The BUG_ON() macros are made of two levels so as to resolve the condition to a string. However this doesn't offer much flexibility for performing other operations when the condition is validated, so let's adjust them so that the condition is checked in the outer macro and the operations are performed in the inner one.	2024-10-21 18:17:25 +02:00
Amaury Denoyelle	68c8c91023	BUG/MINOR: mux-quic: do not close STREAM with empty FIN if no data sent A stream may be shut without any HTX EOM reported to report a proper closure. This is the case for QCS instances flagged with QC_SF_UNKNOWN_PL_LENGTH. Shut is performed with an empty FIN emission instead of a RESET_STREAM. This has been implemented since the following patch : `24962dd178` BUG/MEDIUM: mux-quic: do not emit RESET_STREAM for unknown length However, in case of HTTP/3, an empty FIN should only be done after a full message is emitted, which requires at least a HEADERS frame. If an empty FIN is emitted without it, client may interpret this as invalid and close the connection. To prevent this, fallback to a RESET_STREAM emission if no data were emitted on the stream. This was reproduced using ngtcp2-client with 10% loss (-r 0.1) on a remote host, with httpterm request "/?s=100k&C=1&b=0&P=400". An error ERR_H3_FRAME_UNEXPECTED is returned by ngtcp2-client when the bug occurs. Note that this change is incomplete. The message validity depends solely on the application protocol in use. As such, a new app_ops callback should be implemented to ensure the stream is closed accordingly. However, this first patch ensures that at least HTTP/3 case is valid while keeping a minimal backport process. This should be backported up to 2.8.	2024-10-21 11:24:38 +02:00
Amaury Denoyelle	b200d3d80b	MINOR: mux-quic: simplify sending of empty STREAM FIN An empty STREAM frame can be emitted by QUIC MUX to notify about a delayed FIN when there is no data left to transmit. This requires a tedious comparison on stream offset in qmux_ctrl_send() to ensure an empty stream frame is not always considered as retransmitted, which is necessary to locally close the QCS instance. Simplify this by unsubscribing from the streamdesc layer when the QCS is locally closed on FIN transmission notification. This prevents all future retransmitted frames from being reported to the QCS instance, especially any potentially retransmitted empty FIN.	2024-10-21 11:21:07 +02:00
Valentine Krasnobaeva	af1d170122	BUG/MINOR: mworker: fix mworker-max-reloads parser Before this patch, when a wrong argument was provided in the configuration for the mworker-max-reloads keyword, the parser showed the errors below on stderr: [WARNING] (1820317) : config : parsing [haproxy.cfg:154] : (null)parsing [haproxy.cfg:154] : 'mworker-max-reloads' expects an integer argument. In the case where two arguments were provided by mistake instead of one, this also triggered a buggy error message: [ALERT] (1820668) : config : parsing [haproxy.cfg:154] : 'mworker-max-reloads' cannot handle unexpected argument '45'. [WARNING] (1820668) : config : parsing [haproxy.cfg:154] : (null) So, as 'mworker-max-reloads' is parsed in discovery mode by the master process, let's now align its parser with all the others that can be called in this mode. This way, when there are too many args or the argument isn't a valid integer, we return proper error codes to the global section parser and messages are formatted properly. This fix should be backported to all stable versions.	2024-10-21 10:46:58 +02:00
Ilya Shipitsin	8a1aabb133	CI: modernize macos builds to macos-15 macos-15 support was announced a few months ago: https://github.com/github/roadmap/issues/986	2024-10-21 07:54:38 +02:00
Ilya Shipitsin	50cf89ad5c	CI: bump development builds explicitely to Ubuntu 24.04 Initially we agreed to split builds into "latest" for the development branch and fixed 22.04 for stable branches. It got broken when the "latest" label migrated from ubuntu-22 to ubuntu-24 because of the build cache: the cache key is built using the runner label, and it was not prepared to use the same "latest" cache from ubuntu 22 on ubuntu 24. To make things clear, let's stick explicitly to ubuntu 24.	2024-10-21 07:54:35 +02:00
Ilya Shipitsin	b6491ab19f	CI: prepare Coverity build for Ubuntu 24 PCRE2 is recommended, PCRE was chosen for no reason. GHA Ubuntu 22 images include both libs, but recent Ubuntu 24 does not. Let us prepare for Ubuntu 24	2024-10-21 07:54:32 +02:00
Willy Tarreau	9aa86b9dbd	BUILD: mux-h2/traces: fix build on 32-bit due to size of the DATA frame Commit `cf3fe1eed` ("MINOR: mux-h2/traces: print the size of the DATA frames") added the size of the DATA frame to the traces. Unfortunately it uses ullong instead of ulong to cast a pointer, which breaks the build on 32-bit platforms. Let's just switch it to ulong which works on both.	2024-10-21 04:17:59 +02:00
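
The COUNT_IF() entry above (da66c42f65) describes a macro that only increments an atomic counter when its condition matches, without logging or crashing. A minimal sketch of that idea follows, assuming C11 atomics. The names `my_dbg_cnt`, `MY_COUNT_IF`, `my_dbg_cnt_read` and `count_negative` are hypothetical and are not HAProxy's definitions; in particular, the sketch passes the counter explicitly instead of placing it in a dedicated linker section as the commits describe.

```c
#include <stdatomic.h>

/* Hypothetical per-site counter record, loosely modeled on the fields the
 * commit log lists (file, line, condition, match count). Not HAProxy's struct.
 */
struct my_dbg_cnt {
    const char *file;   /* source file of the check */
    int line;           /* source line of the check */
    const char *cond;   /* condition as a string */
    atomic_uint count;  /* number of times the condition matched */
};

/* Hypothetical macro: bump the counter when cond is true, do nothing else. */
#define MY_COUNT_IF(cnt, cond)                                   \
    do {                                                         \
        if (cond)                                                \
            atomic_fetch_add_explicit(&(cnt)->count, 1,          \
                                      memory_order_relaxed);     \
    } while (0)

/* Read the current match count. */
static unsigned int my_dbg_cnt_read(struct my_dbg_cnt *cnt)
{
    return atomic_load_explicit(&cnt->count, memory_order_relaxed);
}

/* Illustrative use: count how many entries of v[] are negative. */
static unsigned int count_negative(const int *v, int n)
{
    struct my_dbg_cnt cnt = { __FILE__, __LINE__, "v[i] < 0", 0 };

    for (int i = 0; i < n; i++)
        MY_COUNT_IF(&cnt, v[i] < 0);
    return my_dbg_cnt_read(&cnt);
}
```

Relaxed ordering is chosen here because the counter is only a statistic and does not need to synchronize other memory accesses; that is a design assumption of this sketch, not something stated in the commit messages.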

23250 Commits