Commit Graph

18473 Commits

Author SHA1 Message Date
Willy Tarreau
77acaf5af5 MINOR: flags: add a new file to host flag dumping macros
The "flags" utility is useful but painful to maintain up to date. This
commit aims at providing a low-maintenance solution to keep flags up to
date, by proposing some macros that build a string from a set of flags
in a way that requires the least possible verbosity.

The idea will be to add an inline function dedicated to this just after
the flags declaration, and enumerate the flags one is interested in, and
that function will fill a string based on them.

Placing this inside the type files allows both haproxy and external tools
like "flags" to use it, but comes with a few constraints. First, the
files will be slightly less readable if these functions are huge, so they
need to stay as compact as possible. Second, the function will need
anprintf() and we don't want to include stdio.h in type files as it
proved to be particularly heavy and to cause definition headaches in
the past.

As such the file here only contains a macro enclosed in #ifdef EOF (that
is defined in stdio), and provides an alternate empty one when no stdio
is defined. This way it's the caller that has to include stdio first or
it won't get anything back, and in practice the locations relying on
this always have it.

The macro has to be used in 3 steps:
  - prologue: dumps 0 and exits if the value is zero
  - flags: the macro can be recursively called and it will push the
    flag from bottom to top so that they appear in the same order as
    today without requiring to be declared the other way around
  - epilogue: dump remaining flags that were not identified

The macro was arranged so that a single character can be used with no
other argument to declare all flags at once. Example:

  #define _(n, ...) __APPEND_FLAG(buf, len, del, flg, n, #n, __VA_ARGS__)
     _(0);
     _(X_FLAG1, _(X_FLAG2, _(X_FLAG3, _(X_FLAG4))));
     _(~0);
  #undef _

Existing files will have to be updated to rely on it, and more files
could come soon.
2022-09-09 14:47:31 +02:00
Willy Tarreau
20273ceec0 DEV: flags: add missing CO_FL_FDLESS connection flag
This was added in 2.6 by commit c78a9698e ("MINOR: connection: add a new
flag CO_FL_FDLESS on fd-less connections") but forgotten in flags.c.
This must be backported to 2.6.
2022-09-09 14:46:15 +02:00
Willy Tarreau
c7ac17412b DEV: flags: fix usage message to reflect available options
The proposed decoding options were not updated after the changes in 2.6,
let's fix that by taking the names from the existing declaration. This
should be backported to 2.6.
2022-09-09 14:26:29 +02:00
Ilya Shipitsin
b0ab121da1 CI: cirrus-ci: bump FreeBSD image to 13-1
we use FreeBSD binary packages that we rebuilt for FreeBSD-13.1

Newer FreeBSD version for package zstd:
To ignore this error set IGNORE_OSVERSION=yes
- package: 1301000
- running kernel: 1300139
2022-09-09 13:30:17 +02:00
Matthias Wirth
eea152ee68 BUG/MINOR: signals/poller: ensure wakeup from signals
Add self-wake in signal_handler() to fix a race condition with a signal
coming in between checking signal_queue_len and entering polling sleep.

The changes in commit 43c891dda ("BUG/MINOR: signals/poller: set the
poller timeout to 0 when there are signals") were insufficient.

Move the signal_queue_len check from the poll implementations to
run_poll_loop() to keep that logic in one place.

The poll loops are terminated either by the parameter wake being set or
wake up due to a write to their poller_wr_pipe by wake_thread() in
signal_handler().

This fixes issue #1841.

Must be backported in every stable version.
2022-09-09 11:15:22 +02:00
Frédéric Lécaille
ef2d2340e6 BUILD: udp-perturb: Add a make target for udp-perturb tool
This is only to rely on make to build this tool.
2022-09-08 20:47:28 +02:00
Frédéric Lécaille
192093b581 MINOR: dev/udp: Apply the corruption to both directions
Harden the UDP datagram corruption applying it on both sides. This approaches
the conditions of some tests run by the QUIC interop runner developed by
Marten Seeman.
2022-09-08 20:38:59 +02:00
Frédéric Lécaille
3dd79d378c MINOR: h3: Send the h3 settings with others streams (requests)
This is the ->finalize application callback which prepares the unidirectional STREAM
frames for h3 settings and wakeup the mux I/O handler to send them. As haproxy is
at the same time always waiting for the client request, this makes haproxy
call sendto() to send only about 20 bytes of stream data. Furthermore in case
of heavy loss, this give less chances to short h3 requests to succeed.

Drawback: as at this time the mux sends its streams by their IDs ascending order
the stream 0 is always embedded before the unidirectional stream 3 for h3 settings.
Nevertheless, as these settings may be lost and received after other h3 request
streams, this is permitted by the RFC.

Perhaps there is a better way to do. This will have to be checked with Amaury.

Must be backported to 2.6.
2022-09-08 18:04:58 +02:00
Frédéric Lécaille
befcf7031d MINOR: h3: Missing connection argument for a TRACE_LEAVE() argument
This should help in debbuging issues to be able to associate this trace to a
QUIC connection.

Must be backported to 2.6.
2022-09-08 18:04:58 +02:00
Frédéric Lécaille
2eb5faa2ad MINOR: h3: Add the quic_conn object to h3 traces
This is very useful to associate h3 traces to a QUIC connection when debugging.

Must be backported to 2.6.
2022-09-08 18:04:58 +02:00
Frédéric Lécaille
1c725aa9cd BUG/MINOR: h3: Crash when h3 trace verbosity is "minimal"
This was due to a missing check in h3_trace() about the first argument
presence (connection) and h3_parse_settings_frm() which calls TRACE_LEAVE()
without any argument. Then this argument was dereferenced.

Must be backported to 2.6
2022-09-08 18:04:58 +02:00
Frédéric Lécaille
3c1b81fdd7 BUG/MINOR: quic: Trace fix about packet number space information.
<qc> variable was confused with <qel>. The consequence was that it was
always the same packet number space which was displayed: the first one (or
the Initial packet number space).

Must be backported to 2.6.
2022-09-08 18:04:58 +02:00
Frédéric Lécaille
bb995eafc7 BUG/MINOR: quic: Speed up the handshake completion only one time
It is possible to speed up the handshake completion but only one time
by connection as mentionned in RFC 9002 "6.2.3. Speeding up Handshake Completion".
Add a flag to prevent this process to be run several times
(see https://www.rfc-editor.org/rfc/rfc9002#name-speeding-up-handshake-compl).

Must be backported to 2.6.
2022-09-08 18:04:58 +02:00
William Lallemand
43c891dda0 BUG/MINOR: signals/poller: set the poller timeout to 0 when there are signals
When receiving a signal before entering the poller, and without any
activity in the process, the poller will be entered with a timeout
calculated without checking the signals.

Since commit 4f59d3 ("MINOR: time: increase the minimum wakeup interval
to 60s") the issue is much more visible because it could be stuck for
60s.

When in mworker mode, if a worker quits and the SIGCHLD signal deliver
at the right time to the master, this one could be stuck for the time of
the timeout.

This should fix issue #1841

Must be backported in every stable version.
2022-09-08 17:46:31 +02:00
Willy Tarreau
e86bc35672 MINOR: activity/cli: support sorting task profiling by total CPU time
The new "bytime" sorting criterion uses the reported CPU time instead of
the usage. This is convenient to spot tasks that are mostly reponsible
for the CPU usage in a running process. It supports both the detailed
and the aggregated format. The output looks like this:

> show profiling tasks bytime
Tasks activity:
  function                      calls   cpu_tot   cpu_avg   lat_tot   lat_avg
  qc_io_cb                     117739   1.961m    999.1us   37.45s    318.1us <- h3_snd_buf@src/h3.c:1084 tasklet_wakeup
  process_stream              7376273   1.384m    11.26us   1.013h    494.2us <- stream_new@src/stream.c:563 task_wakeup
  process_stream              8104400   1.133m    8.389us   1.130h    502.0us <- sc_notify@src/stconn.c:1209 task_wakeup
  qc_io_cb                      43280   45.76s    1.057ms   13.95s    322.3us <- qc_stream_desc_ack@src/quic_stream.c:128 tasklet_wakeup
  h1_io_cb                   11025715   24.82s    2.251us   5.406m    29.42us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup
  quic_conn_app_io_cb          312861   23.86s    76.27us   2.373s    7.584us <- qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after
  qc_io_cb                      37063   12.65s    341.4us   6.409s    172.9us <- qc_treat_acked_tx_frm@src/xprt_quic.c:1695 tasklet_wakeup
  h1_io_cb                    4783520   11.79s    2.463us   1.419h    1.068ms <- conn_subscribe@src/connection.c:732 tasklet_wakeup
  sc_conn_io_cb              12269693   11.51s    938.0ns   2.117h    621.2us <- sc_app_chk_rcv_conn@src/stconn.c:762 tasklet_wakeup
  sc_conn_io_cb               6479006   10.94s    1.689us   7.984m    73.93us <- h1_wake_stream_for_recv@src/mux_h1.c:2600 tasklet_wakeup
  qc_io_cb                      12011   10.72s    892.5us   2.120s    176.5us <- qcc_release_remote_stream@src/mux_quic.c:1200 tasklet_wakeup
  h2_io_cb                     246423   6.225s    25.26us   56.52s    229.4us <- h2_snd_buf@src/mux_h2.c:6712 tasklet_wakeup
  h2_io_cb                     137744   6.076s    44.11us   16.59s    120.4us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup
  quic_lstnr_dghdlr            323575   3.062s    9.462us   3.424m    634.9us <- quic_lstnr_dgram_dispatch@src/quic_sock.c:255 tasklet_wakeup
  sc_conn_io_cb               1206939   1.616s    1.338us   27.62m    1.373ms <- qcs_notify_send@src/mux_quic.c:529 tasklet_wakeup
  h2_io_cb                     212370   251.2ms   1.182us   6.476s    30.49us <- h2c_restart_reading@src/mux_h2.c:856 tasklet_wakeup
  h1_io_cb                      44109   197.0ms   4.466us   31.89s    723.0us <- h1_takeover@src/mux_h1.c:4085 tasklet_wakeup
  quic_conn_app_io_cb            3029   87.59ms   28.92us   999.0ms   329.8us <- qc_process_timer@src/xprt_quic.c:4635 tasklet_wakeup
  task_run_applet                  40   35.77ms   894.3us   4.407ms   110.2us <- sc_applet_create@src/stconn.c:489 appctx_wakeup
  task_run_applet                  18   27.36ms   1.520ms   19.56us   1.086us <- sc_app_chk_snd_applet@src/stconn.c:996 appctx_wakeup
  sc_conn_io_cb                  2186   11.76ms   5.377us   963.0ms   440.5us <- h1_wake_stream_for_send@src/mux_h1.c:2610 tasklet_wakeup
  qc_io_cb                          8   9.880ms   1.235ms   5.871ms   733.9us <- qcs_consume@src/mux_quic.c:800 tasklet_wakeup
  quic_conn_io_cb                   4   5.951ms   1.488ms   38.85us   9.713us <- qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after
  qc_io_cb                        101   4.975ms   49.26us   13.91ms   137.8us <- qc_process_timer@src/xprt_quic.c:4602 tasklet_wakeup
  h1_io_cb                       2186   1.809ms   827.0ns   720.2ms   329.5us <- sock_conn_iocb@src/sock.c:849 tasklet_wakeup
  qc_process_timer               3031   1.735ms   572.0ns   1.153s    380.3us <- wake_expired_tasks@src/task.c:344 task_wakeup
  accept_queue_process            359   1.362ms   3.793us   80.32ms   223.7us <- listener_accept@src/listener.c:1099 tasklet_wakeup
  quic_conn_app_io_cb               2   921.1us   460.6us   203.1us   101.5us <- qc_xprt_start@src/xprt_quic.c:7122 tasklet_wakeup
  h1_timeout_task                2618   526.8us   201.0ns   1.121s    428.4us <- h1_release@src/mux_h1.c:1087 task_wakeup
  process_resolvers               316   283.3us   896.0ns   14.96ms   47.33us <- wake_expired_tasks@src/task.c:429 task_drop_running
  sc_conn_io_cb                   420   235.6us   560.0ns   116.7ms   277.8us <- h2s_notify_recv@src/mux_h2.c:1298 tasklet_wakeup
  qc_idle_timer_task                1   225.5us   225.5us   506.0ns   506.0ns <- wake_expired_tasks@src/task.c:344 task_wakeup
  accept_queue_process             36   153.0us   4.250us   5.834ms   162.1us <- accept_queue_process@src/listener.c:165 tasklet_wakeup
  sc_conn_io_cb                    18   54.05us   3.003us   11.50us   638.0ns <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup
  h2_io_cb                          6   38.88us   6.480us   2.089ms   348.2us <- h2_do_shutw@src/mux_h2.c:4656 tasklet_wakeup
  srv_cleanup_idle_conns           54   37.72us   698.0ns   14.21ms   263.1us <- wake_expired_tasks@src/task.c:429 task_drop_running
  sc_conn_io_cb                    50   32.86us   657.0ns   28.83ms   576.5us <- qcs_notify_recv@src/mux_quic.c:519 tasklet_wakeup
  qc_io_cb                          2   30.25us   15.12us   6.093us   3.046us <- qc_init@src/mux_quic.c:2057 tasklet_wakeup
  srv_cleanup_toremove_conns        1   27.16us   27.16us   905.6us   905.6us <- srv_cleanup_idle_conns@src/server.c:5948 task_wakeup
  task_run_applet                  39   19.61us   502.0ns   818.7us   20.99us <- run_tasks_from_lists@src/task.c:652 task_drop_running
  quic_accept_run                   2   15.46us   7.727us   305.5us   152.8us <- quic_accept_push_qc@src/quic_sock.c:458 tasklet_wakeup
  h2_timeout_task                  32   12.91us   403.0ns   4.207ms   131.5us <- h2_release@src/mux_h2.c:1191 task_wakeup
  quic_conn_app_io_cb               1   9.645us   9.645us   1.445us   1.445us <- qc_process_timer@src/xprt_quic.c:4589 tasklet_wakeup

> show profiling tasks bytime aggr
Tasks activity:
  function                      calls   cpu_tot   cpu_avg   lat_tot   lat_avg
  qc_io_cb                     212301   3.147m    889.5us   1.009m    285.2us
  process_stream             15503573   2.519m    9.747us   2.148h    498.7us
  h1_io_cb                   15916733   36.95s    2.321us   1.535h    347.1us
  quic_conn_app_io_cb          318845   24.21s    75.92us   3.410s    10.70us
  sc_conn_io_cb              20037058   24.19s    1.207us   2.737h    491.8us
  h2_io_cb                     596543   12.55s    21.04us   1.326m    133.4us
  quic_lstnr_dghdlr            326624   3.094s    9.473us   3.462m    635.9us
  task_run_applet                 100   64.43ms   644.3us   5.285ms   52.85us
  quic_conn_io_cb                   4   5.951ms   1.488ms   38.85us   9.713us
  qc_process_timer               3061   1.750ms   571.0ns   1.162s    379.5us
  accept_queue_process            396   1.521ms   3.840us   86.16ms   217.6us
  h1_timeout_task                2618   526.8us   201.0ns   1.121s    428.4us
  process_resolvers               319   286.0us   896.0ns   16.82ms   52.73us
  qc_idle_timer_task                1   225.5us   225.5us   506.0ns   506.0ns
  srv_cleanup_idle_conns           54   37.72us   698.0ns   14.21ms   263.1us
  srv_cleanup_toremove_conns        1   27.16us   27.16us   905.6us   905.6us
  quic_accept_run                   2   15.46us   7.727us   305.5us   152.8us
  h2_timeout_task                  32   12.91us   403.0ns   4.207ms   131.5us
2022-09-08 16:38:10 +02:00
Willy Tarreau
dc89b1806c MINOR: activity/cli: support aggregating task profiling outputs
By default we now dump stats between caller and callee, but by
specifying "aggr" on the command line, stats get aggregated by
callee again as it used to be before the feature was available.
It may sometimes be helpful when comparing total call counts,
though that's about all.
2022-09-08 16:32:17 +02:00
Willy Tarreau
64435aaa85 MINOR: tasks/activity: improve the caller-callee activity hash
The previous dump already showed that the "other" category was getting
a few entries. Let's proceed like for the memory profiling, by scanning
a limited range of adjacent slots to find a spare one (16 max). That's
pretty fast since close and likely prefetched and the comparison is
cheap. The new dump now shows up to 45 entries below without "other":

Now:
Tasks activity:
  function                      calls   cpu_tot   cpu_avg   lat_tot   lat_avg
  task_run_applet                  22   34.56ms   1.571ms   1.145ms   52.04us <- sc_applet_create@src/stconn.c:489 appctx_wakeup
  task_run_applet                  21   11.11us   529.0ns   2.590ms   123.3us <- run_tasks_from_lists@src/task.c:652 task_drop_running
  task_run_applet                   5   7.715ms   1.543ms   2.186us   437.0ns <- sc_app_chk_snd_applet@src/stconn.c:996 appctx_wakeup
  accept_queue_process            345   3.129ms   9.068us   72.84ms   211.1us <- listener_accept@src/listener.c:1099 tasklet_wakeup
  accept_queue_process             32   113.0us   3.529us   3.070ms   95.94us <- accept_queue_process@src/listener.c:165 tasklet_wakeup
  sc_conn_io_cb               5026032   3.037s    604.0ns   17.47m    208.5us <- sc_app_chk_rcv_conn@src/stconn.c:762 tasklet_wakeup
  sc_conn_io_cb               4361192   7.626s    1.748us   3.179m    43.74us <- h1_wake_stream_for_recv@src/mux_h1.c:2600 tasklet_wakeup
  sc_conn_io_cb                178293   275.4ms   1.544us   2.740m    922.0us <- qcs_notify_send@src/mux_quic.c:529 tasklet_wakeup
  sc_conn_io_cb                  2561   15.84ms   6.185us   1.036s    404.4us <- h1_wake_stream_for_send@src/mux_h1.c:2610 tasklet_wakeup
  sc_conn_io_cb                   453   261.4us   577.0ns   86.79ms   191.6us <- h2s_notify_recv@src/mux_h2.c:1298 tasklet_wakeup
  sc_conn_io_cb                    89   50.05us   562.0ns   100.7ms   1.131ms <- qcs_notify_recv@src/mux_quic.c:519 tasklet_wakeup
  sc_conn_io_cb                     8   19.04us   2.379us   472.5us   59.06us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup
  process_resolvers                50   57.50us   1.149us   1.116ms   22.32us <- wake_expired_tasks@src/task.c:429 task_drop_running
  srv_cleanup_idle_conns            8   5.669us   708.0ns   216.6us   27.08us <- wake_expired_tasks@src/task.c:429 task_drop_running
  process_stream              4599847   48.79s    10.61us   16.92m    220.7us <- sc_notify@src/stconn.c:1209 task_wakeup
  process_stream              4530081   52.82s    11.66us   14.92m    197.6us <- stream_new@src/stream.c:563 task_wakeup
  process_stream                   15   201.7us   13.45us   31.58ms   2.105ms <- sc_app_chk_snd_conn@src/stconn.c:857 task_wakeup
  h1_io_cb                    7861205   18.22s    2.317us   2.408m    18.38us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup
  h1_io_cb                     474763   1.379s    2.905us   6.578m    831.4us <- conn_subscribe@src/connection.c:732 tasklet_wakeup
  h1_io_cb                      34830   38.64ms   1.109us   18.85s    541.2us <- h1_takeover@src/mux_h1.c:4085 tasklet_wakeup
  h1_io_cb                       2561   2.150ms   839.0ns   674.4ms   263.3us <- sock_conn_iocb@src/sock.c:849 tasklet_wakeup
  h1_timeout_task                2634   588.5us   223.0ns   890.5ms   338.1us <- h1_release@src/mux_h1.c:1087 task_wakeup
  h2_timeout_task                  16   7.519us   469.0ns   1.146ms   71.63us <- h2_release@src/mux_h2.c:1191 task_wakeup
  h2_io_cb                      99601   2.212s    22.21us   19.33s    194.1us <- h2_snd_buf@src/mux_h2.c:6712 tasklet_wakeup
  h2_io_cb                      79777   146.6ms   1.837us   3.529s    44.24us <- h2c_restart_reading@src/mux_h2.c:856 tasklet_wakeup
  h2_io_cb                      60698   2.259s    37.21us   4.704s    77.50us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup
  h2_io_cb                          5   36.90us   7.380us   2.045ms   409.0us <- h2_do_shutw@src/mux_h2.c:4656 tasklet_wakeup
  qc_io_cb                      26595   8.007s    301.1us   4.261s    160.2us <- qc_treat_acked_tx_frm@src/xprt_quic.c:1695 tasklet_wakeup
  qc_io_cb                       7921   5.284s    667.1us   2.171s    274.1us <- qc_stream_desc_ack@src/quic_stream.c:128 tasklet_wakeup
  qc_io_cb                       6229   5.851s    939.3us   1.856s    297.9us <- h3_snd_buf@src/h3.c:1084 tasklet_wakeup
  qc_io_cb                        994   699.1ms   703.3us   174.9ms   176.0us <- qcc_release_remote_stream@src/mux_quic.c:1200 tasklet_wakeup
  qc_io_cb                         65   9.883ms   152.0us   13.33ms   205.1us <- qc_process_timer@src/xprt_quic.c:4602 tasklet_wakeup
  qc_io_cb                          1   293.5us   293.5us   105.9us   105.9us <- qcs_consume@src/mux_quic.c:800 tasklet_wakeup
  qc_io_cb                          1   10.87us   10.87us   3.307us   3.307us <- qc_init@src/mux_quic.c:2057 tasklet_wakeup
  quic_conn_io_cb                   2   2.531ms   1.265ms   2.839us   1.419us <- qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after
  quic_conn_app_io_cb           61392   2.620s    42.67us   268.0ms   4.365us <- qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after
  quic_conn_app_io_cb             408   10.56ms   25.88us   124.0ms   303.8us <- qc_process_timer@src/xprt_quic.c:4635 tasklet_wakeup
  quic_conn_app_io_cb               2   15.61us   7.806us   103.2us   51.59us <- qc_process_timer@src/xprt_quic.c:4589 tasklet_wakeup
  quic_conn_app_io_cb               1   410.6us   410.6us   11.52us   11.52us <- qc_xprt_start@src/xprt_quic.c:7122 tasklet_wakeup
  quic_lstnr_dghdlr             62716   409.2ms   6.523us   21.81s    347.8us <- quic_lstnr_dgram_dispatch@src/quic_sock.c:255 tasklet_wakeup
  qc_process_timer                410   245.4us   598.0ns   238.5ms   581.7us <- wake_expired_tasks@src/task.c:344 task_wakeup
  quic_accept_run                   1   7.711us   7.711us   82.28us   82.28us <- quic_accept_push_qc@src/quic_sock.c:458 tasklet_wakeup
2022-09-08 16:25:36 +02:00
Willy Tarreau
3d4cdb198c MEDIUM: tasks/activity: combine the called function with the caller
Now instead of getting aggregate stats per called function, we have
them per function AND per call place. The "byaddr" sort considers
the function pointer first, then the call count, so that dominant
callers of a given callee are instantly spotted. This allows to get
sorted outputs like this:

Tasks activity:
  function                      calls   cpu_tot   cpu_avg   lat_tot   lat_avg
  h1_io_cb                   17357952   40.91s    2.357us   4.849m    16.76us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup
  sc_conn_io_cb              10357182   6.297s    607.0ns   27.93m    161.8us <- sc_app_chk_rcv_conn@src/stconn.c:762 tasklet_wakeup
  process_stream              9891131   1.809m    10.97us   53.61m    325.2us <- sc_notify@src/stconn.c:1209 task_wakeup
  process_stream              9823934   1.887m    11.52us   48.31m    295.1us <- stream_new@src/stream.c:563 task_wakeup
  sc_conn_io_cb               9347863   16.59s    1.774us   6.143m    39.43us <- h1_wake_stream_for_recv@src/mux_h1.c:2600 tasklet_wakeup
  h1_io_cb                     501344   1.848s    3.686us   6.544m    783.2us <- conn_subscribe@src/connection.c:732 tasklet_wakeup
  sc_conn_io_cb                239717   492.3ms   2.053us   3.213m    804.3us <- qcs_notify_send@src/mux_quic.c:529 tasklet_wakeup
  h2_io_cb                     173019   4.204s    24.30us   40.95s    236.7us <- h2_snd_buf@src/mux_h2.c:6712 tasklet_wakeup
  h2_io_cb                     149487   424.3ms   2.838us   14.63s    97.87us <- h2c_restart_reading@src/mux_h2.c:856 tasklet_wakeup
  other                        101893   4.626s    45.40us   14.84s    145.7us
  quic_lstnr_dghdlr             94389   614.0ms   6.504us   30.54s    323.6us <- quic_lstnr_dgram_dispatch@src/quic_sock.c:255 tasklet_wakeup
  quic_conn_app_io_cb           92205   3.735s    40.51us   390.9ms   4.239us <- qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after
  qc_io_cb                      50355   19.01s    377.5us   10.65s    211.4us <- qc_treat_acked_tx_frm@src/xprt_quic.c:1695 tasklet_wakeup
  h1_io_cb                      44427   155.0ms   3.489us   21.50s    484.0us <- h1_takeover@src/mux_h1.c:4085 tasklet_wakeup
  qc_io_cb                       9018   4.924s    546.0us   3.084s    342.0us <- qc_stream_desc_ack@src/quic_stream.c:128 tasklet_wakeup
  h1_timeout_task                3236   1.172ms   362.0ns   1.119s    345.9us <- h1_release@src/mux_h1.c:1087 task_wakeup
  h1_io_cb                       2804   7.974ms   2.843us   1.980s    706.0us <- sock_conn_iocb@src/sock.c:849 tasklet_wakeup
  sc_conn_io_cb                  2804   33.44ms   11.92us   2.597s    926.2us <- h1_wake_stream_for_send@src/mux_h1.c:2610 tasklet_wakeup
  qc_io_cb                       2623   2.669s    1.017ms   1.347s    513.5us <- h3_snd_buf@src/h3.c:1084 tasklet_wakeup
  qc_process_timer                662   526.4us   795.0ns   1.081s    1.633ms <- wake_expired_tasks@src/task.c:344 task_wakeup
  quic_conn_app_io_cb             648   12.62ms   19.47us   225.7ms   348.2us <- qc_process_timer@src/xprt_quic.c:4635 tasklet_wakeup
  accept_queue_process            286   1.571ms   5.494us   72.55ms   253.7us <- listener_accept@src/listener.c:1099 tasklet_wakeup
  process_resolvers               176   157.8us   896.0ns   7.835ms   44.52us <- wake_expired_tasks@src/task.c:429 task_drop_running
  qc_io_cb                        167   10.71ms   64.12us   32.47ms   194.4us <- qc_process_timer@src/xprt_quic.c:4602 tasklet_wakeup
  sc_conn_io_cb                   123   80.05us   650.0ns   50.35ms   409.4us <- qcs_notify_recv@src/mux_quic.c:519 tasklet_wakeup
  h2_timeout_task                  32   30.69us   958.0ns   9.038ms   282.4us <- h2_release@src/mux_h2.c:1191 task_wakeup
  task_run_applet                  24   33.79ms   1.408ms   5.838ms   243.3us <- sc_applet_create@src/stconn.c:489 appctx_wakeup
  accept_queue_process             17   56.34us   3.314us   7.505ms   441.5us <- accept_queue_process@src/listener.c:165 tasklet_wakeup
  srv_cleanup_toremove_conns       16   1.133ms   70.81us   5.685ms   355.3us <- srv_cleanup_idle_conns@src/server.c:5948 task_wakeup
  srv_cleanup_idle_conns           16   74.57us   4.660us   2.797ms   174.8us <- wake_expired_tasks@src/task.c:429 task_drop_running
  quic_conn_app_io_cb              12   786.9us   65.58us   2.042ms   170.1us <- qc_process_timer@src/xprt_quic.c:4589 tasklet_wakeup
  sc_conn_io_cb                     9   20.55us   2.283us   2.475ms   275.0us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup
  h2_io_cb                          8   34.12us   4.265us   1.784ms   223.0us <- h2_do_shutw@src/mux_h2.c:4656 tasklet_wakeup
  task_run_applet                   4   6.615ms   1.654ms   2.306us   576.0ns <- sc_app_chk_snd_applet@src/stconn.c:996 appctx_wakeup
  quic_conn_io_cb                   4   4.278ms   1.069ms   6.469us   1.617us <- qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after
  qc_io_cb                          2   20.81us   10.40us   4.943us   2.471us <- qc_init@src/mux_quic.c:2057 tasklet_wakeup
  quic_conn_app_io_cb               2   752.9us   376.4us   63.97us   31.99us <- qc_xprt_start@src/xprt_quic.c:7122 tasklet_wakeup
  quic_accept_run                   2   13.84us   6.920us   172.8us   86.42us <- quic_accept_push_qc@src/quic_sock.c:458 tasklet_wakeup
  qc_idle_timer_task                2   295.0us   147.5us   8.761us   4.380us <- wake_expired_tasks@src/task.c:344 task_wakeup
  qc_io_cb                          1   867.1us   867.1us   812.8us   812.8us <- qcs_consume@src/mux_quic.c:800 tasklet_wakeup

... and calls sorted by address like this:

Tasks activity:
  function                      calls   cpu_tot   cpu_avg   lat_tot   lat_avg
  task_run_applet                  23   32.73ms   1.423ms   5.837ms   253.8us <- sc_applet_create@src/stconn.c:489 appctx_wakeup
  task_run_applet                   4   6.615ms   1.654ms   2.306us   576.0ns <- sc_app_chk_snd_applet@src/stconn.c:996 appctx_wakeup
  accept_queue_process            285   1.566ms   5.495us   72.49ms   254.3us <- listener_accept@src/listener.c:1099 tasklet_wakeup
  accept_queue_process             17   56.34us   3.314us   7.505ms   441.5us <- accept_queue_process@src/listener.c:165 tasklet_wakeup
  sc_conn_io_cb              10357182   6.297s    607.0ns   27.93m    161.8us <- sc_app_chk_rcv_conn@src/stconn.c:762 tasklet_wakeup
  sc_conn_io_cb               9347863   16.59s    1.774us   6.143m    39.43us <- h1_wake_stream_for_recv@src/mux_h1.c:2600 tasklet_wakeup
  sc_conn_io_cb                239717   492.3ms   2.053us   3.213m    804.3us <- qcs_notify_send@src/mux_quic.c:529 tasklet_wakeup
  sc_conn_io_cb                  2804   33.44ms   11.92us   2.597s    926.2us <- h1_wake_stream_for_send@src/mux_h1.c:2610 tasklet_wakeup
  sc_conn_io_cb                   123   80.05us   650.0ns   50.35ms   409.4us <- qcs_notify_recv@src/mux_quic.c:519 tasklet_wakeup
  sc_conn_io_cb                     9   20.55us   2.283us   2.475ms   275.0us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup
  process_resolvers               159   145.9us   917.0ns   7.823ms   49.20us <- wake_expired_tasks@src/task.c:429 task_drop_running
  srv_cleanup_idle_conns           16   74.57us   4.660us   2.797ms   174.8us <- wake_expired_tasks@src/task.c:429 task_drop_running
  srv_cleanup_toremove_conns       16   1.133ms   70.81us   5.685ms   355.3us <- srv_cleanup_idle_conns@src/server.c:5948 task_wakeup
  process_stream              9891130   1.809m    10.97us   53.61m    325.2us <- sc_notify@src/stconn.c:1209 task_wakeup
  process_stream              9823933   1.887m    11.52us   48.31m    295.1us <- stream_new@src/stream.c:563 task_wakeup
  h1_io_cb                   17357952   40.91s    2.357us   4.849m    16.76us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup
  h1_io_cb                     501344   1.848s    3.686us   6.544m    783.2us <- conn_subscribe@src/connection.c:732 tasklet_wakeup
  h1_io_cb                      44427   155.0ms   3.489us   21.50s    484.0us <- h1_takeover@src/mux_h1.c:4085 tasklet_wakeup
  h1_io_cb                       2804   7.974ms   2.843us   1.980s    706.0us <- sock_conn_iocb@src/sock.c:849 tasklet_wakeup
  h1_timeout_task                3236   1.172ms   362.0ns   1.119s    345.9us <- h1_release@src/mux_h1.c:1087 task_wakeup
  h2_timeout_task                  32   30.69us   958.0ns   9.038ms   282.4us <- h2_release@src/mux_h2.c:1191 task_wakeup
  h2_io_cb                     173019   4.204s    24.30us   40.95s    236.7us <- h2_snd_buf@src/mux_h2.c:6712 tasklet_wakeup
  h2_io_cb                     149487   424.3ms   2.838us   14.63s    97.87us <- h2c_restart_reading@src/mux_h2.c:856 tasklet_wakeup
  h2_io_cb                          8   34.12us   4.265us   1.784ms   223.0us <- h2_do_shutw@src/mux_h2.c:4656 tasklet_wakeup
  qc_io_cb                      50355   19.01s    377.5us   10.65s    211.4us <- qc_treat_acked_tx_frm@src/xprt_quic.c:1695 tasklet_wakeup
  qc_io_cb                       9018   4.924s    546.0us   3.084s    342.0us <- qc_stream_desc_ack@src/quic_stream.c:128 tasklet_wakeup
  qc_io_cb                       2623   2.669s    1.017ms   1.347s    513.5us <- h3_snd_buf@src/h3.c:1084 tasklet_wakeup
  qc_io_cb                        167   10.71ms   64.12us   32.47ms   194.4us <- qc_process_timer@src/xprt_quic.c:4602 tasklet_wakeup
  qc_io_cb                          2   20.81us   10.40us   4.943us   2.471us <- qc_init@src/mux_quic.c:2057 tasklet_wakeup
  qc_io_cb                          1   867.1us   867.1us   812.8us   812.8us <- qcs_consume@src/mux_quic.c:800 tasklet_wakeup
  qc_idle_timer_task                2   295.0us   147.5us   8.761us   4.380us <- wake_expired_tasks@src/task.c:344 task_wakeup
  quic_conn_io_cb                   4   4.278ms   1.069ms   6.469us   1.617us <- qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after
  quic_conn_app_io_cb           92205   3.735s    40.51us   390.9ms   4.239us <- qc_lstnr_pkt_rcv@src/xprt_quic.c:6184 tasklet_wakeup_after
  quic_conn_app_io_cb             648   12.62ms   19.47us   225.7ms   348.2us <- qc_process_timer@src/xprt_quic.c:4635 tasklet_wakeup
  quic_conn_app_io_cb              12   786.9us   65.58us   2.042ms   170.1us <- qc_process_timer@src/xprt_quic.c:4589 tasklet_wakeup
  quic_conn_app_io_cb               2   752.9us   376.4us   63.97us   31.99us <- qc_xprt_start@src/xprt_quic.c:7122 tasklet_wakeup
  quic_lstnr_dghdlr             94389   614.0ms   6.504us   30.54s    323.6us <- quic_lstnr_dgram_dispatch@src/quic_sock.c:255 tasklet_wakeup
  qc_process_timer                662   526.4us   795.0ns   1.081s    1.633ms <- wake_expired_tasks@src/task.c:344 task_wakeup
  quic_accept_run                   2   13.84us   6.920us   172.8us   86.42us <- quic_accept_push_qc@src/quic_sock.c:458 tasklet_wakeup
  other                        101892   4.626s    45.40us   14.84s    145.7us

It already becomes visible that some tasks have different very costs
depending where they're called (e.g. process_stream). The method used
to wake them up is also shown. Applets are handled specially and shown
as appctx_wakeup.
2022-09-08 16:21:22 +02:00
Willy Tarreau
41e701e2c1 DEBUG: quic: export the few task handlers that often appear in task dumps
The following task/tasklet handlers often appear in "show profiling tasks"
but were not resolved since static:

 qc_io_cb, quic_conn_app_io_cb, process_timer,
 quic_accept_run, qc_idle_timer_task

This commit simply exports them so they can be resolved now. "process_timer"
which was a bit too generic and renamed to qc_process_timer.
2022-09-08 16:13:38 +02:00
Willy Tarreau
0fbc16cfb9 DEBUG: resolvers: unstatify process_resolvers() to make it appear in profiling
The function appears like this in "show profiling tasks", so let's export
it:

  function       calls  cpu_tot  cpu_avg  lat_tot  lat_avg
  main+0x1463f0     92  77.28us  839.0ns  2.018ms  21.93us <- wake_expired_tasks@src/task.c:429 task_drop_running
2022-09-08 16:13:38 +02:00
Willy Tarreau
a3423873fe CLEANUP: activity: make the number of sched activity entries more configurable
This removes all the hard-coded 8-bit and 256 entries to use a pair of
macros instead so that we can more easily experiment with larger table
sizes if needed.
2022-09-08 14:55:09 +02:00
Willy Tarreau
a9a2384612 CLEANUP: sched: remove duplicate code in run_tasks_from_list()
Now that ->wake_date is common to tasks and tasklets, we don't need
anymore to carry a duplicate control block to read and update it for
tasks and tasklets. And given that this code was present early in the
if/else fork between tasks and tasklets, taking it out of the block
allows to move the task part into a more visible "else" branch that
also allows to factor the epilogue that resets th_ctx->current and
updates profile_entry->cpu_time, which also used to be duplicated.

Overall, doing just that saved 253 bytes in the function, or ~1/6,
which is not bad considering that it's on a hot path. And the code
got much ore readable.
2022-09-08 14:30:38 +02:00
Willy Tarreau
e0e6d81460 CLEANUP: task: move tid and wake_date into the common part
There used to be one tid for tasklets and a thread_mask for tasks. Since
2.7, both tasks and tasklets now use a tid (albeit with a very slight
semantic difference for the negative value), to in order to limit code
duplication and to ease debugging it makes sense to move tid into the
common part. One limitation is that it will leave a hole in the structure,
but we now have the wake_date that is always present and can move there as
well to plug the hole.

This results in something overall pretty clean (and cleaner than before),
with the low-level stuff (state,tid,process,context) appearing first, then
the caller stuff (caller,wake_date,calls,debug) next, and finally the
type-specific stuff (rq/wq/expire/nice).
2022-09-08 14:30:38 +02:00
Willy Tarreau
2830d282e5 DEBUG: task: simplify the caller recording in DEBUG_TASK
Instead of storing an index that's swapped at every call, let's use the
two pointers as a shifting history. Now we have a permanent "caller"
field that records the last caller, and an optional prev_caller in the
debug section enabled by DEBUG_TASK that keeps a copy of the previous
caller one. This way, not only it's much easier to follow what's
happening during debugging, but it saves 8 bytes in the struct task in
debug mode and still keeps it under 2 cache lines in nominal mode, and
this will finally be usable everywhere and later in profiling.

The caller_idx was also used as a hint that the entry was freed, in order
to detect wakeup-after-free. This was changed by setting caller to -1
instead and preserving its value in caller[1].

Finally, the operations were made atomic. That's not critical but since
it's used for debugging and race conditions represent a significant part
of the issues in multi-threaded mode, it seems wise to at least eliminate
some possible factors of faulty analysis.
2022-09-08 14:30:38 +02:00
Willy Tarreau
8d71abf0cd DEBUG: applet: instrument appctx_wakeup() to log the caller's location
appctx_wakeup() relies on task_wakeup(), but since it calls it from a
function, the calling place is always appctx_wakeup() itself, which is
not very useful.

Let's turn it to a macro so that we can log the location of the caller
instead. As an example, the cli_io_handler() which used to be seen as
this:

  (gdb) p *appctx->t.debug.caller[0]
  $10 = {
    func = 0x9ffb78 <__func__.37996> "appctx_wakeup",
    file = 0x9b336a "include/haproxy/applet.h",
    line = 110,
    what = 1 '\001',
    arg8 = 0 '\000',
    arg32 = 0
  }

Now shows the more useful:

  (gdb) p *appctx->t.debug.caller[0]
  $6 = {
    func = 0x9ffe80 <__func__.38641> "sc_app_chk_snd_applet",
    file = 0xa00320 "src/stconn.c",
    line = 996,
    what = 6 '\006',
    arg8 = 0 '\000',
    arg32 = 0
  }
2022-09-08 14:30:38 +02:00
Willy Tarreau
e08af9a0f4 DEBUG: task: use struct ha_caller instead of arrays of file:line
This reduces the task struct by 8 bytes, reduces the code size a little
bit by simplifying the calling convention (one argument dropped), and
as a bonus provides the function name in the caller.
2022-09-08 14:30:38 +02:00
Willy Tarreau
d2b2ad902b DEBUG: task: define a series of wakeup types for tasks and tasklets
The WAKEUP_* values will be used to report how a task/tasklet was woken
up, and task_wakeup_type_str() wlil report the associated function name.
2022-09-08 14:30:16 +02:00
Willy Tarreau
d96d214b4c CLEANUP: debug: use struct ha_caller for memstat
The memstats code currently defines its own file/function/line number,
type and extra pointer. We don't need to keep them separate and we can
easily replace them all with just a struct ha_caller. Note that the
extra pointer could be converted to a pool ID stored into arg8 or
arg32 and be dropped as well, but this would first require to define
IDs for pools (which we currently do not have).
2022-09-08 14:19:15 +02:00
Willy Tarreau
7f2f1f294c MINOR: debug: add struct ha_caller to describe a calling location
The purpose of this structure is to assemble all constant parts of a
generic calling point for a specific event. These ones are created by
the compiler as a static const element outside of the code path, so
they cost nothing in terms of CPU, and a pointer to that descriptor
can be passed to the place that needs it. This is very similar to what
is being done for the mem_stat stuff.

This will be useful to simplify and improve DEBUG_TASK.
2022-09-08 14:19:15 +02:00
Willy Tarreau
4c1bc01f31 CLEANUP: activity: make taskprof use ptr_hash()
There's no more point using a different hash function here, xxh64 is
of course better distributed but we really don't care so let's unify
the code.
2022-09-08 14:19:15 +02:00
Willy Tarreau
245d32fe8f CLEANUP: activity: make memprof use the generic ptr_hash() function
There's no need to keep a local version of that function anymore.
2022-09-08 14:19:15 +02:00
Willy Tarreau
4a3907617f MINOR: tools: add generic pointer hashing functions
There are a few places where it's convenient to hash a pointer to compute
a statistics bucket. Here we're basically reusing the hash that was used
by memory profiling with a minor update that the multiplier was corrected
to be prime and stand by its promise to have equal numbers of 1 and 0,
and that 32-bit platforms won't lose range anymore.

A two-pointer variant was also added.
2022-09-08 14:19:15 +02:00
Willy Tarreau
6a28a30efa MINOR: tasks: do not keep cpu and latency times in struct task
It was a mistake to put these two fields in the struct task. This
was added in 1.9 via commit 9efd7456e ("MEDIUM: tasks: collect per-task
CPU time and latency"). These fields are used solely by streams in
order to report the measurements via the lat_ns* and cpu_ns* sample
fetch functions when task profiling is enabled. For the rest of the
tasks, this is pure CPU waste when profiling is enabled, and memory
waste 100% of the time, as the point where these latencies and usages
are measured is in the profiling array.

Let's move the fields to the stream instead, and have process_stream()
retrieve the relevant info from the thread's context.

The struct task is now back to 120 bytes, i.e. almost two cache lines,
with 32 bit still available.
2022-09-08 14:19:15 +02:00
Willy Tarreau
beee600491 BUG/MINOR: stream/sched: take into account CPU profiling for the last call
When task profiling is enabled, the reported CPU time for short requests
and responses (e.g. redirect) is always zero in the logs, because
process_stream() is only called once and the CPU time is measured after
it returns. This is particuarly annoying when dealing with denies and in
general anything that deals with parasitic traffic because it can be
difficult to figure where the CPU is spent.

The solution taken in this patch consists in having process_stream()
update the cpu time itself before logging and quitting. It's very simple.
It will not take into account the time taken to produce the log nor
freeing the stream, but that's marginal compared to always logging zero.
The task's wake_date is also reset so that the scheduler doesn't have to
perform these operations again. This is dependent on the following patch:

   MINOR: sched: store the current profile entry in the thread context

It should be backported to 2.6 as it does help for troubleshooting.
2022-09-08 14:19:15 +02:00
Willy Tarreau
1efddfa6bf MINOR: sched: store the current profile entry in the thread context
The profile entry that corresponds to the current task/tasklet being
profiled is now stored into the thread's context. This will allow it
to be accessed from the tasks themselves. This is needed for an upcoming
fix.
2022-09-08 14:19:15 +02:00
Willy Tarreau
62b5b96bcc BUG/MINOR: sched: properly account for the CPU time of dying tasks
When task profiling is enabled, the scheduler can measure and report
the cumulated time spent in each task and their respective latencies. But
this was wrong for tasks with few wakeups as well as for self-waking ones,
because the call date needed to measure how long it takes to process the
task is retrieved in the task itself (->wake_date was turned to the call
date), and we could face two conditions:
  - a new wakeup while the task is executing would reset the ->wake_date
    field before returning and make abnormally low values being reported;
    that was likely the case for taskrun_applet for self-waking applets;

  - when the task dies, NULL is returned and the call date couldn't be
    retrieved, so that CPU time was not being accounted for. This was
    particularly visible with process_stream() which is usually called
    only twice per request, and whose time was systematically halved.

The cleanest solution here is to keep in mind that the scheduler already
uses quite a bit of local context in th_ctx, and place the intermediary
values there so that they cannot vanish. The wake_date has to be reset
immediately once read, and only its copy is used along the function. Note
that this must be done both for tasks and tasklet, and that until recently
tasklets were also able to report wrong values due to their sole dependency
on TH_FL_TASK_PROFILING between tests.

One nice benefit for future improvements is that such information will now
be available from the task without having to be stored into the task itself
anymore.

Since the tasklet part was computed on wrapping 32-bit arithmetics and
the task one was on 64-bit, the values were now consistently moved to
32-bit as it's already largely sufficient (4s spent in a task is more
than twice what the watchdog would tolerate). Some further cleanups might
be necessary, but the patch aimed at staying minimal.

Task profiling output after 1 million HTTP request previously looked like
this:

  Tasks activity:
    function                      calls   cpu_tot   cpu_avg   lat_tot   lat_avg
    h1_io_cb                    2012338   4.850s    2.410us   12.91s    6.417us
    process_stream              2000136   9.594s    4.796us   34.26s    17.13us
    sc_conn_io_cb               2000135   1.973s    986.0ns   30.24s    15.12us
    h1_timeout_task                 137      -         -      2.649ms   19.34us
    accept_queue_process             49   152.3us   3.107us   321.7yr   6.564yr
    main+0x146430                     7   5.250us   750.0ns   25.92us   3.702us
    srv_cleanup_idle_conns            1   559.0ns   559.0ns   918.0ns   918.0ns
    task_run_applet                   1      -         -      2.162us   2.162us

  Now it looks like this:
  Tasks activity:
    function                      calls   cpu_tot   cpu_avg   lat_tot   lat_avg
    h1_io_cb                    2014194   4.794s    2.380us   13.75s    6.826us
    process_stream              2000151   20.01s    10.00us   36.04s    18.02us
    sc_conn_io_cb               2000148   2.167s    1.083us   32.27s    16.13us
    h1_timeout_task                 198   54.24us   273.0ns   3.487ms   17.61us
    accept_queue_process             52   158.3us   3.044us   409.9us   7.882us
    main+0x1466e0                    18   16.77us   931.0ns   63.98us   3.554us
    srv_cleanup_toremove_conns        8   282.1us   35.26us   546.8us   68.35us
    srv_cleanup_idle_conns            3   149.2us   49.73us   8.131us   2.710us
    task_run_applet                   3   268.1us   89.38us   11.61us   3.871us

Note the two-fold difference on process_stream().

This feature is essentially used for debugging so it has extremely limited
impact. However it's used quite a bit more in bug reports and it would be
desirable that at least 2.6 gets this fix backported. It depends on at least
these two previous patches which will then also have to be backported:

     MINOR: task: permanently enable latency measurement on tasklets
     CLEANUP: task: rename ->call_date to ->wake_date
2022-09-08 14:19:15 +02:00
Willy Tarreau
04e50b3d32 CLEANUP: task: rename ->call_date to ->wake_date
This field is misnamed because its real and important content is the
date the task was woken up, not the date it was called. It temporarily
holds the call date during execution but this remains confusing. In
fact before the latency measurements were possible it was indeed a call
date. Thus is will now be called wake_date.

This change is necessary because a subsequent fix will require the
introduction of the real call date in the thread ctx.
2022-09-08 14:19:15 +02:00
Willy Tarreau
768c2c5678 MINOR: task: permanently enable latency measurement on tasklets
When tasklet latency measurement was enabled in 2.4 with commit b2285de04
("MINOR: tasks: also compute the tasklet latency when DEBUG_TASK is set"),
the feature was conditionned on DEBUG_TASK because the field would add 8
bytes to the struct tasklet.

This approach was not a very good idea because the struct ends on an int
anyway thus it does finish with a 32-bit hole regardless of the presence
of this field. What is true however is that adding it turned a 64-byte
struct to 72-byte when caller debugging is enabled.

This patch revisits this with a minor change. Now only the lowest 32
bits of the call date are stored, so they always fit in the remaining
hole, and this allows to remove the dependency on DEBUG_TASK. With
debugging off, we're now seeing a 48-byte struct, and with debugging
on it's exactly 64 bytes, thus still exactly one cache line. 32 bits
allow a latency of 4 seconds on a tasklet, which already indicates a
completely dead process, so there's no point storing the upper bits at
all. And even in the event it would happen once in a while, the lost
upper bits do not really add any value to the debug reports. Also, now
one tasklet wakeup every 4 billion will not be sampled due to the test
on the value itself. Similarly we just don't care, it's statistics and
the measurements are not 9-digit accurate anyway.
2022-09-08 14:19:15 +02:00
Willy Tarreau
0fae3a0360 BUG/MINOR: task: make task_instant_wakeup() work on a task not a tasklet
There's a subtle (harmless) bug in task_instant_wakeup(). As it uses
some tasklet code instead of some task code, the debug part also acts
on the tasklet equivalent, and the call_date is only set when DEBUG_TASK
is set instead of inconditionally like with tasks. As such, without this
debugging macro, call dates are not updated for tasks woken this way.

There isn't any impact yet because this function was introduced in 2.6 to
solve certain classes of issues and is not used yet, and in the worst case
it would only affect the reported latency time.

This may be backported to 2.6 in case a future fix would depend on it but
currently will not fix existing code.
2022-09-08 14:19:15 +02:00
Willy Tarreau
f27acd961e BUG/MINOR: task: always reset a new tasklet's call date
The tasklet's call date was not reset, so if profiling was enabled while
some tasklets were in the run queue, their initial random value could be
used to preload a bogus initial latency value into the task profiling bin.
Let's just zero the initial value.

This should be backported to 2.4 as it was brought with initial commit
b2285de04 ("MINOR: tasks: also compute the tasklet latency when DEBUG_TASK
is set"). The impact is very low though.
2022-09-08 14:19:15 +02:00
Frédéric Lécaille
3122c75fd1 BUG/MINOR: quic: Wrong connection ID to thread ID association
To work, quic_pin_cid_to_tid() must set cid[0] to a value with <target_id>
as <global.nbthread> modulo. For each integer n, (n - (n % m)) + d has always
d as modulo m (with d < m).

So, this statement seemed correct:

     cid[0] = cid[0] - (cid[0] % global.nbthread) + target_tid;

except when n wraps or when another modulo is applied to the addition result.
Here, for 8bit modulo arithmetic, if m does not divides 256, this cannot
works for values which wraps when we increment them by d.
For instance n=255 m=3 and d=1 the formula result is 0 (should be d).

To fix this, we first limit c[0] to 255 - <target_id> to prevent c[0] from wrapping.

Thank you to @esb for having reported this issue in GH #1855.

Must be backported to 2.6
2022-09-07 15:59:43 +02:00
Frédéric Lécaille
614742b79c MINOR: quic: No TRACE_LEAVE() in retrieve_qc_conn_from_cid()
This macro was confused with TRACE_ENTER().

Should be backported to 2.6.
2022-09-07 15:59:43 +02:00
Frédéric Lécaille
449804e27d MINOR: quic: Add traces about sent or resent TX frames
Very useful to help in debugging issues, especially during retransmissions.

Should be backported to 2.6
2022-09-07 15:59:29 +02:00
William Lallemand
70a6e637b4 MINOR: quic: add QUIC support when no client_hello_cb
Add QUIC support to the ssl_sock_switchctx_cbk() variant used only when
no client_hello_cb is available.

This could be used with libreSSL implementation of QUIC for example.
It also works with quictls when HAVE_SSL_CLIENT_HELLO_CB is removed from
openss-compat.h
2022-09-07 11:33:28 +02:00
William Lallemand
373ce73695 BUILD: quic: fix the #ifdef in ssl_quic_initial_ctx()
As done on with ssl_sock_initial_ctx(), cleanup the ifdef for the
client_hello_cb and the no anti replay.
2022-09-07 11:11:59 +02:00
William Lallemand
4b7938d160 BUILD: ssl: fix the ifdef mess in ssl_sock_initial_ctx
ssl_sock_initial_ctx uses the wrong #ifdef to check the availability of
the client_hello_cb.

Cleanup the #ifdef, add comments and indentation.
2022-09-07 10:54:17 +02:00
William Lallemand
e6ec626ac5 BUILD: quic: enable early data only with >= openssl 1.1.1
Disable the early data in the QUIC code when not built with openssl >=
1.1.1.

LibreSSL 3.6.0 is impacted.
2022-09-07 09:33:46 +02:00
William Lallemand
d2be9d4c48 BUILD: quic: temporarly ignore chacha20_poly1305 for libressl
LibreSSL does not implement EVP_chacha20_poly1305() with EVP_CIPHER but
uses the EVP_AEAD API instead:

https://man.openbsd.org/EVP_AEAD_CTX_init

This patch disables this cipher for libreSSL for now.
2022-09-07 09:33:46 +02:00
William Lallemand
844009d77a BUILD: ssl: fix ssl_sock_switchtx_cbk when no client_hello_cb
When building HAProxy with USE_QUIC and libressl 3.6.0, the
ssl_sock_switchtx_cbk symbol is not found because libressl does not
implement the client_hello_cb.

A ssl_sock_switchtx_cbk version for the servername callback is available
but wasn't exported correctly.
2022-09-07 09:33:46 +02:00
William Lallemand
6d74e179ee BUILD: quic: add some ifdef around the SSL_ERROR_* for libressl
SSL_ERROR_WANT_ASYNC, SSL_ERROR_WANT_ASYNC_JOB and
SSL_ERROR_WANT_CLIENT_HELLO_CB does not seems supported by libressl.
2022-09-07 09:33:46 +02:00