Commit Graph

11544 Commits

Author SHA1 Message Date
Tim Duesterhus
9576ab7640 MINOR: ist: Add struct ist istdup(const struct ist)
istdup() performs the equivalent of strdup() on a `struct ist`.
2020-03-05 19:53:12 +01:00
Tim Duesterhus
ed5263739b CLEANUP: Use isttest() and istfree()
This adjusts a few locations to make use of `isttest()` and `istfree()`.
2020-03-05 19:52:07 +01:00
Tim Duesterhus
35005d01d2 MINOR: ist: Add struct ist istalloc(size_t) and void istfree(struct ist*)
`istalloc` allocates memory and returns an `ist` with the size `0` that points
to this allocation.

`istfree` frees the pointed memory and clears the pointer.
2020-03-05 19:52:07 +01:00
Tim Duesterhus
e296d3e5f0 MINOR: ist: Add int isttest(const struct ist)
`isttest` returns whether the `.ptr` is non-null.
2020-03-05 19:52:07 +01:00
Tim Duesterhus
241e29ef9c MINOR: ist: Add IST_NULL macro
`IST_NULL` is equivalent to an `struct ist` with `.ptr = NULL` and
`.len = 0`.
2020-03-05 19:52:07 +01:00
Willy Tarreau
6cbe62b858 MINOR: debug: add CLI command "debug dev write" to write an arbitrary size
This command is used to produce an arbitrary amount of data on the
output. It can be used to test the CLI's state machine as well as
the internal parts related to applets an I/O. A typical test consists
in asking for all sizes from 0 to 16384:

  $ (echo "prompt;expert-mode on";for i in {0..16384}; do
     echo "debug dev write $i"; done) | socat - /tmp/sock1 | wc -c
  134258738

A better test would consist in first waiting for the response before
sending a new request.

This command is not restricted to the admin since it's harmless.
2020-03-05 17:20:15 +01:00
Willy Tarreau
d04a2a6654 BUG/MINOR: ssl-sock: do not return an uninitialized pointer in ckch_inst_sni_ctx_to_sni_filters
There's a build error reported here:
   c9c6cdbf9c/checks

It's just caused by an inconditional assignment of tmp_filter to
*sni_filter without having been initialized, though it's harmless because
this return pointer is not used when fcount is NULL, which is the only
case where this happens.

No backport is needed as this was brought today by commit 38df1c8006
("MINOR: ssl/cli: support crt-list filters").
2020-03-05 16:26:12 +01:00
Willy Tarreau
c9c6cdbf9c DOC: fix incorrect indentation of http_auth_*
These ones were incorrectly indented and thus not displayed optimally
in the HTML version. This addresses issue #533.
2020-03-05 16:05:05 +01:00
William Lallemand
cfca1422c7 MINOR: ssl: reach a ckch_store from a sni_ctx
It was only possible to go down from the ckch_store to the sni_ctx but
not to go up from the sni_ctx to the ckch_store.

To allow that, 2 pointers were added:

- a ckch_inst pointer in the struct sni_ctx
- a ckckh_store pointer in the struct ckch_inst
2020-03-05 11:28:42 +01:00
William Lallemand
38df1c8006 MINOR: ssl/cli: support crt-list filters
Generate a list of the previous filters when updating a certificate
which use filters in crt-list. Then pass this list to the function
generating the sni_ctx during the commit.

This feature allows the update of the crt-list certificates which uses
the filters with "set ssl cert".

This function could be probably replaced by creating a new
ckch_inst_new_load_store() function which take the previous sni_ctx list as
an argument instead of the char **sni_filter, avoiding the
allocation/copy during runtime for each filter. But since are still
handling the multi-cert bundles, it's better this way to avoid code
duplication.
2020-03-05 11:27:53 +01:00
Willy Tarreau
f4629a5346 BUG/MINOR: connection/debug: do not enforce !event_type on subscribe() anymore
When building with DEBUG_STRICT, there are still some BUG_ON(events&event_type)
in the subscribe() code which are not welcome anymore since we explicitly
permit to wake the caller up on readiness. This causes some regtests to fail
since 2c1f37d353 ("OPTIM: mux-h1: subscribe rather than waking up at a few
other places") when built with this option.

No backport is needed, this is 2.2-dev.
2020-03-05 07:46:33 +01:00
Tim Duesterhus
2825b4b0ca MINOR: stream: Use stream_generate_unique_id
This patch replaces the ad-hoc generation of stream's `unique_id` values
by calls to `stream_generate_unique_id`.
2020-03-05 07:23:00 +01:00
Tim Duesterhus
127a74dd48 MINOR: stream: Add stream_generate_unique_id function
Currently unique IDs for a stream are generated using repetitive code in
multiple locations, possibly allowing for inconsistent behavior.
2020-03-05 07:23:00 +01:00
Tim Duesterhus
5fcec84c58 REGTEST: Add unique-id reg-test
This reg-test verifies the following behavior:

1. That unique IDs are stable (i.e. the bug fixed in 530408f976)
2. That unique IDs can use values from the HTTP request (see https://www.mail-archive.com/haproxy@formilux.org/msg36436.html)
2020-03-05 07:23:00 +01:00
Willy Tarreau
2c1f37d353 OPTIM: mux-h1: subscribe rather than waking up at a few other places
This is another round of conversion from a blind tasklet_wakeup() to
a more careful subscribe(). It has significantly improved the number
of function calls per HTTP request (/?s=1k/t=20) :
                        before    after
 tasklet_wakeup:          3         2
 conn_subscribe:          3         2
 h1_iocb:                 3         2
 h1_process:              3         2
 h1_parse_msg_hdrs:       4         3
 h1_rcv_buf:              5         3
 h1_send:                 5         4
 h1_subscribe:            2         1
 h1_wake_stream_for_send: 5         4
 http_wait_for_request:   2         1
 process_stream:          3         2
 si_cs_io_cb:             4         2
 si_cs_process:           3         1
 si_cs_rcv:               5         3
 si_sync_send:            2         1
 si_update_both:          2         1
 stream_int_chk_rcv_conn: 3         2
 stream_int_notify:       3         1
 stream_release_buffers:  9         4
2020-03-04 19:29:12 +01:00
Willy Tarreau
6f95f6e111 OPTIM: connection: disable receiving on disabled events when the run queue is too high
In order to save a lot on syscalls, we currently don't disable receiving
on a file descriptor anymore if its handler was already woken up. But if
the run queue is huge and the poller collects a lot of events, this causes
excess wakeups which take CPU time which is not used to flush these tasklets.
This patch simply considers the run queue size to decide whether or not to
stop receiving. Tests show that by stopping receiving when the run queue
reaches ~16 times its configured size, we can still hold maximal performance
in extreme situations like maxpollevents=20k for runqueue_depth=2, and still
totally avoid calling epoll_event under moderate load using default settings
on keep-alive connections.
2020-03-04 19:29:12 +01:00
Willy Tarreau
8de5c4fa15 MEDIUM: connection: only call ->wake() for connect() without I/O
We used to call ->wake() for any I/O event for which there was no
subscriber. But this is a problem because this causes massive wake()
storms since we disabled fd_stop_recv() to save syscalls.

The only reason for the io_available condition is to detect that an
asynchronous connect() just finished and will not be handled by any
registered event handler. Since we now properly handle synchronous
connects, we can detect this situation by the fact that we had a
success on conn_fd_check() and no requested I/O took over.
2020-03-04 19:29:12 +01:00
Willy Tarreau
4c69cff438 MINOR: tcp/uxst/sockpair: only ask for I/O when really waiting for a connect()
Now that the stream-interface properly handles synchonous connects, there
is no more reason for subscribing and doing nothing.
2020-03-04 19:29:12 +01:00
Willy Tarreau
ada4c5806b MEDIUM: stream-int: make sure to try to immediately validate the connection
In the rare case of immediate connect() (unix sockets, socket pairs, and
occasionally TCP over the loopback), it is counter-productive to subscribe
for sending and then getting immediately back to process_stream() after
having passed through si_cs_process() just to update the connection. We
already know it is established and it doesn't have any handshake anymore
so we just have to complete it and return to process_stream() with the
stream_interface in the SI_ST_RDY state. In this case, process_stream will
simply loop back to the beginning to synchronize the state and turn it to
SI_ST_EST/ASS/CLO/TAR etc.

This will save us from having to needlessly subscribe in the connect()
code, something which in addition cannot work with edge-triggered pollers.
2020-03-04 19:29:12 +01:00
Willy Tarreau
667fefdc90 BUG/MEDIUM: connection: stop polling for sending when the event is ready
With commit 065a025610 ("MEDIUM: connection: don't stop receiving events
in the FD handler") we disabled a number of fd_stop_* in conn_fd_handler(),
in order to wait for their respective handlers to deal with them. But it
is not correct to do that for the send direction, as we may very well
have nothing to send. This is visible when connecting in TCP mode to
a server with no data to send, there's nobody anymore to disable the
polling for the send direction.

And it is logical, on the recv() path we know the system has data to
deliver and that some code will be in charge of it. On the send
direction we simply don't know if it was the result of a successful
connect() or if there is still something to send. In any case we
almost never fill the network buffer on a single send() after being
woken up by the system, so disabling the FD immediately or much later
will not change the number of operations.

No backport is needed, this is 2.2-dev.
2020-03-04 19:29:12 +01:00
Willy Tarreau
6f6d96de19 BUILD: makefile: do not modify the build options during make reg-tests
I'm quite fed up with having to rebuild everything from scratch after each
and every "make reg-tests", especially during bisects. The only reason for
this is that there are no build options passed to make for reg-tests, which
modifies the .build_opts file again, resulting in a change upon next build.
Let's just keep this file out of the dependency check for make reg-tests.
2020-03-04 19:29:12 +01:00
Miroslav Zagorac
86e106e1fc CLEANUP: contrib/spoa_example: Fix several typos
This patch can be backported as far as 1.8.
2020-03-04 15:30:00 +01:00
Willy Tarreau
109201fc5c BUILD: tools: rely on __ELF__ not USE_DL to enable use of dladdr()
The approach was wrong. USE_DL is for the makefile to know if it's required
to append -ldl at link time. Some platforms do not need it (and in fact do
not have it) yet they have a working dladdr(). The real condition is related
to ELF. Given that due to Lua, all platforms that require -ldl already have
USE_DL set, let's replace USE_DL with __ELF__ here and consider that dladdr
is always needed on ELF, which basically is already the case.
2020-03-04 12:04:07 +01:00
Willy Tarreau
9133e48f2a BUILD: tools: unbreak resolve_sym_name() on non-GNU platforms
resolve_sym_name() doesn't build when USE_DL is set on non-GNU platforms
because "Elf(W)" isn't defined. Since it's only used for dladdr1(), let's
refactor all this so that we can completely ifdef out that part on other
platforms. Now we have a separate function to perform the call depending
on the platform and it only returns the size when available.
2020-03-04 12:04:07 +01:00
Willy Tarreau
a91b7946bd MINOR: debug: dump the whole trace if we can't spot the starting point
Instead of special-casing the use of the symbol resolving to decide
whether to dump a partial or complete trace, let's simply start over
and dump everything when we reach the end after having found nothing.
It will be more robust against dirty traces as well.
2020-03-04 12:04:07 +01:00
Willy Tarreau
899e5f69a1 MINOR: debug: use our own backtrace function on clang+x86_64
A test on FreeBSD with clang 4 to 8 produces this on a call to a
spinning loop on the CLI:

  call trace(5):
  |       0x53e2bc [eb 16 48 63 c3 48 c1 e0]: wdt_handler+0x10c
  |    0x800e02cfe [e8 5d 83 00 00 8b 18 8b]: libthr:pthread_sigmask+0x53e

with our own function it correctly produces this:

  call trace(20):
  |       0x53e2dc [eb 16 48 63 c3 48 c1 e0]: wdt_handler+0x10c
  |    0x800e02cfe [e8 5d 83 00 00 8b 18 8b]: libthr:pthread_sigmask+0x53e
  |    0x800e022bf [48 83 c4 38 5b 41 5c 41]: libthr:pthread_getspecific+0xdef
  | 0x7ffffffff003 [48 8d 7c 24 10 6a 00 48]: main+0x7fffffb416f3
  |    0x801373809 [85 c0 0f 84 6f ff ff ff]: libc:__sys_gettimeofday+0x199
  |    0x801373709 [89 c3 85 c0 75 a6 48 8b]: libc:__sys_gettimeofday+0x99
  |    0x801371c62 [83 f8 4e 75 0f 48 89 df]: libc:gettimeofday+0x12
  |       0x51fa0a [48 89 df 4c 89 f6 e8 6b]: ha_thread_dump_all_to_trash+0x49a
  |       0x4b723b [85 c0 75 09 49 8b 04 24]: mworker_cli_sockpair_new+0xd9b
  |       0x4b6c68 [85 c0 75 08 4c 89 ef e8]: mworker_cli_sockpair_new+0x7c8
  |       0x532f81 [4c 89 e7 48 83 ef 80 41]: task_run_applet+0xe1

So let's add clang+x86_64 to the list of platforms that will use our
simplified version. As a bonus it will not require to link with
-lexecinfo on FreeBSD and will work out of the box when passing
USE_BACKTRACE=1.
2020-03-04 12:04:07 +01:00
Willy Tarreau
13faf16e1e MINOR: debug: improve backtrace() on aarch64 and possibly other systems
It happens that on aarch64 backtrace() only returns one entry (tested
with gcc 4.7.4, 5.5.0 and 7.4.1). Probably that it refrains from unwinding
the stack due to the risk of hitting a bad pointer. Here we can use
may_access() to know when it's safe, so we can actually unwind the stack
without taking risks. It happens that the faulting function (the one
just after the signal handler) is not listed here, very likely because
the signal handler uses a special stack and did not create a new frame.

So this patch creates a new my_backtrace() function in standard.h that
either calls backtrace() or does its own unrolling. The choice depends
on HA_HAVE_WORKING_BACKTRACE which is set in compat.h based on the build
target.
2020-03-04 12:04:07 +01:00
Willy Tarreau
cdd8074433 MINOR: debug: report the number of entries in the backtrace
It's useful to get an indication of unresolved stuff or memory
corruption to have the apparent depth of the stack trace in the
output, especially if we dump nothing.
2020-03-04 12:02:27 +01:00
Willy Tarreau
e58114e0e5 MINOR: wdt: do not depend on USE_THREAD
There is no reason for restricting the use of the watchdog to threads
anymore, as it works perfectly without threads as well.
2020-03-04 12:02:27 +01:00
Willy Tarreau
d6f1966543 MEDIUM: wdt: fall back to CLOCK_REALTIME if CLOCK_THREAD_CPUTIME is not available
At least FreeBSD has a fully functional CLOCK_THREAD_CPUTIME but it
cannot create a timer on it. This is not a problem since our timer is
only used to measure each thread's usage using now_cpu_time_thread().
So by just replacing this clock with CLOCK_REALTIME we allow such
platforms to periodically call the wdt and check the thread's CPU usage.
The consequence is that even on a totally idle system there will still
be a few extra periodic wakeups, but the watchdog becomes usable there
as well.
2020-03-04 12:02:27 +01:00
Willy Tarreau
c0bbdc196d BUILD: Makefile: include librt before libpthread
Statically building on for i386/x86_64 on linux+glibc 2.18 fails in rt with
undefined references to pthread_attr_init and a few others. Let's just swap
the two libs in order to fix this.
2020-03-04 12:02:27 +01:00
Willy Tarreau
7259fa2b89 BUG/MINOR: wdt: do not return an error when the watchdog couldn't be enabled
On operating systems not supporting to create a timer on
POSIX_THREAD_CPUTIME we emit a warning but we return an error so the
process fails to start, which is absurd. Let's return a success once
the warning is emitted instead.

This may be backported to 2.1 and 2.0.
2020-03-04 12:02:27 +01:00
Emmanuel Hocdet
842e94ee06 MINOR: ssl: add "ca-verify-file" directive
It's only available for bind line. "ca-verify-file" allows to separate
CA certificates from "ca-file". CA names sent in server hello message is
only compute from "ca-file". Typically, "ca-file" must be defined with
intermediate certificates and "ca-verify-file" with certificates to
ending the chain, like root CA.

Fix issue #404.
2020-03-04 11:53:11 +01:00
Willy Tarreau
0214b45a61 MINOR: debug: call backtrace() once upon startup
Calling backtrace() will access libgcc at runtime. We don't want to do
it after the chroot, so let's perform a first call to have it ready in
memory for later use.
2020-03-04 06:01:40 +01:00
Willy Tarreau
f5b4e064dc MEDIUM: debug: add support for dumping backtraces of stuck threads
When a panic() occurs due to a stuck thread, we'll try to dump a
backtrace of this thread if the config directive USE_BACKTRACE is
set (which is the case on linux+glibc). For this we use the
backtrace() call provided by glibc and iterate the pointers through
resolve_sym_name(). In order to minimize the output (which is limited
to one buffer), we only do this for stuck threads, and we start the
dump above ha_panic()/ha_thread_dump_all_to_trash(), and stop when
meeting known points such as main/run_tasks_from_list/run_poll_loop.

If enabled without USE_DL, the dump will be complete with no details
except that pointers will all be given relative to main, which is
still better than nothing.

The new USE_BACKTRACE config option is enabled by default on glibc since
it has been present for ages. When it is set, the export-dynamic linker
option is enabled so that all non-static symbols are properly resolved.
2020-03-03 18:40:03 +01:00
Willy Tarreau
cf12f2ee66 MINOR: cli: make "show fd" rely on resolve_sym_name()
This way we can drop all hard-coded iocb matching.
2020-03-03 18:19:04 +01:00
Willy Tarreau
2e89b0930b MINOR: debug: use resolve_sym_name() to dump task handlers
Now in "show threads", the task/tasklet handler will be resolved
using this function, which will provide more detailed results and
will still support offsets to main for unresolved symbols.
2020-03-03 18:19:04 +01:00
Willy Tarreau
eb8b1ca3eb MINOR: tools: add resolve_sym_name() to resolve function pointers
We use various hacks at a few places to try to identify known function
pointers in debugging outputs (show threads & show fd). Let's centralize
this into a new function dedicated to this. It already knows about the
functions matched by "show threads" and "show fd", and when built with
USE_DL, it can rely on dladdr1() to resolve other functions. There are
some limitations, as static functions are not resolved, linking with
-rdynamic is mandatory, and even then some functions will not necessarily
appear. It's possible to do a better job by rebuilding the whole symbol
table from the ELF headers in memory but it's less portable and the gains
are still limited, so this solution remains a reasonable tradeoff.
2020-03-03 18:18:40 +01:00
Willy Tarreau
762fb3ec8e MINOR: tools: add new function dump_addr_and_bytes()
This function dumps <n> bytes from <addr> in hex form into buffer <buf>
enclosed in brackets after the address itself, formatted on 14 chars
including the "0x" prefix. This is meant to be used as a prefix for code
areas. For example: "0x7f10b6557690 [48 c7 c0 0f 00 00 00 0f]: "
It relies on may_access() to know if the bytes are dumpable, otherwise "--"
is emitted. An optional prefix is supported.
2020-03-03 17:46:37 +01:00
Willy Tarreau
55a6c4f34d BUILD: tools: remove obsolete and conflicting trace() from standard.c
Since commit 4c2ae48375 ("MINOR: trace: implement a very basic trace()
function") merged in 2.1, trace() is an inline function. It must not
appear in standard.c anymore and may break build depending on includes.

This can be backported to 2.1.
2020-03-03 17:46:37 +01:00
Willy Tarreau
27d00c0167 MINOR: task: export run_tasks_from_list
This will help refine debug traces.
2020-03-03 15:26:10 +01:00
Willy Tarreau
3ebd55ee51 MINOR: haproxy: export run_poll_loop
This will help refine debug traces.
2020-03-03 15:26:10 +01:00
Willy Tarreau
1827845a3d MINOR: haproxy: export main to ease access from debugger
Better just export main instead of declaring it as extern, it's cleaner
and may be usable elsewhere.
2020-03-03 15:26:10 +01:00
Willy Tarreau
82aafc4a0f BUG/MEDIUM: debug: make the debug_handler check for the thread in threads_to_dump
It happens that just sending the debug signal to the process makes on
thread wait for its turn while nobody wants to dump. We need to at
least verify that a dump was really requested for this thread.

This can be backported to 2.1 and 2.0.
2020-03-03 08:31:34 +01:00
Willy Tarreau
516853f1cc MINOR: debug: report the task handler's pointer relative to main
Often in crash dumps we see unknown function pointers. Let's display
them relative to main, that helps quite a lot figure the function
from an executable, for example:

  (gdb) x/a main+645360
  0x4c56a0 <h1_timeout_task>:     0x2e6666666666feeb

This could be backported to 2.0.
2020-03-03 07:04:42 +01:00
Willy Tarreau
7d9421deca MINOR: tools: make sure to correctly check the returned 'ms' in date2std_log
In commit 4eee38a ("BUILD/MINOR: tools: fix build warning in the date
conversion functions") we added some return checks to shut build
warnings but the last test is useless since the tested pointer is not
updated by the last call to utoa_pad() used to convert the milliseconds.
It turns out the original code from 2012 already skipped this part,
probably in order to avoid the risk of seeing a millisecond field not
belonging to the 0-999 range. Better keep the check and put the code
into stricter shape.

No backport is needed. This fixes issue #526.
2020-02-29 09:08:02 +01:00
Willy Tarreau
77e463f729 BUG/MINOR: arg: don't reject missing optional args
Commit 80b53ffb1c ("MEDIUM: arg: make make_arg_list() stop after its
own arguments") changed the way we detect the empty list because we
cannot stop by looking up the closing parenthesis anymore, thus for
the first missing arg we have to enter the parsing loop again. And
there, finding an empty arg means we go to the empty_err label, where
it was not initially planned to handle this condition. This results
in %[date()] to fail while %[date] works. Let's simply check if we've
reached the minimally supported args there (it used to be done during
the function entry).

Thanks to Jérôme for reporting this issue. No backport is needed,
this is 2.2-dev2+ only.
2020-02-28 16:41:29 +01:00
Willy Tarreau
493d9dc6ba MEDIUM: mux-h1: do not blindly wake up the tasklet at end of request anymore
Since commit "MEDIUM: connection: make the subscribe() call able to wakeup
if ready" we have the guarantee that the tasklet will be woken up if
subscribing to a connection for an even that's ready. Since we have too
many tasklet_wakeup() calls in mux-h1, let's now use this property to
improve the situation a bit.

With this change, no syscall count changed, however the number of useless
calls to some functions significantly went down. Here are the differences
for the test below (100k req), in number of calls per request :

  $ ./h1load -n 100000 -t 4 -c 1000 -T 20 -F 127.0.0.1:8001/?s=1k/t=20

                           before   after  change   note
  tasklet_wakeup:           3        1      -66%
  h1_io_cb:                 4        3      -25%
  h1_send:                  6.7      5.4    -19%
  h1_wake:                  0.73     0.44   -39%
  h1_process:               4.7      3.4    -27%
  h1_wake_stream_for_send:  6.7      5.5    -18%
  si_cs_process             3.7      3.4     -7.8%
  conn_fd_handler           2.7      2.4    -10%
  raw_sock_to_buf:          4        2      -50%
  pool_free:                4        2      -50%    from failed rx calls

Note that the situation could be further improved by having muxes lazily
subscribe to Rx events in case the FD is already being polled. However
this requires deeper changes to implement a LAZY_RECV subscribe mode,
and to replace the FD's active bit by 3 states representing the desired
action to perform on the FD during the update, among NONE (no need to
change), POLL (can't proceed without), and STOP (buffer full). This
would only impact Rx since on Tx we know what we have to send. The
savings to expect from this might be more visible with splicing and/or
when dealing with many connections having long think times.
2020-02-28 16:17:09 +01:00
Willy Tarreau
065a025610 MEDIUM: connection: don't stop receiving events in the FD handler
The remaining epoll_ctl() calls are exclusively caused by the disagreement
between conn_fd_handler() and the mux receiving the data: the fd handler
wants to stop after having woken up the tasklet, then the mux after
receiving data wants to receive again. Given that they don't happen in
the same poll loop when there are many FDs, this causes a lot of state
changes.

As suggested by Olivier, if the task is already scheduled for running,
we don't need to disable the event because it's in the run queue, poll()
cannot stop, and reporting it again will be harmless. What *might*
happen however is that a sampling-based poller like epoll() would report
many times the same event and has trouble getting others behind. But if
it would happen, it would still indicate the run queue has plenty of
pending operations, so it would in fact only displace the problem from
the poller to the run queue, which doesn't seem to be worse (and in
fact we do support priorities while the poller does not).

By doing this change, the keep-alive test with 1k conns and 100k reqs
completely gets rid of the per-request epoll_ctl changes, while still
not causing extra recvfrom() :

  $ ./h1load -n 100000 -t 4 -c 1000 -T 20 -F 127.0.0.1:8001/?s=1k/t=20

  200000 sendto 1
  200000 recvfrom 1
   10762 epoll_wait 1
    3664 epoll_ctl 1
    1999 recvfrom -1

In close mode, it didn't change anything, we're still in the optimal
case (2 epoll per connection) :

  $ ./h1load -n 100000 -r 1 -t 4 -c 1000 -T 20 -F 127.0.0.1:8001/?s=1k/t=20

  203764 epoll_ctl 1
  200000 sendto 1
  200000 recvfrom 1
    6091 epoll_wait 1
    2994 recvfrom -1
2020-02-28 16:17:09 +01:00
Willy Tarreau
7e59c0a5e1 MEDIUM: connection: make the subscribe() call able to wakeup if ready
There's currently an internal API limitation at the connection layer
regarding conn_subscribe(). We must not subscribe if we haven't yet
met EAGAIN or such a condition, so we sometimes force ourselves to read
in order to meet this condition and being allowed to call subscribe.
But reading cannot always be done (e.g. at the end of a loop where we
cannot afford to retrieve new data and start again) so we instead
perform a tasklet_wakeup() of the requester's io_cb. This is what is
done in mux_h1 for example. The problem with this is that it forces
a new receive when we're not necessarily certain we need one. And if
the FD is not ready and was already being polled, it's a useless
wakeup.

The current patch improves the connection-level subscribe() so that
it really manipulates the polling if the FD is marked not-ready, but
instead schedules the argument tasklet for a wakeup if the FD is
ready. This guarantees we'll wake this tasklet up in any case once the
FD is ready, either immediately or after polling.

By doing so, a test on pure close mode shows we cut in half the number
of epoll_ctl() calls and almost eliminate failed recvfrom():

  $ ./h1load -n 100000 -r 1 -t 4 -c 1000 -T 20 -F 127.0.0.1:8001/?s=1k/t=20

  before:
   399464 epoll_ctl 1
   200007 recvfrom 1
   200000 sendto 1
   100000 recvfrom -1
     7508 epoll_wait 1

  after:
   205739 epoll_ctl 1
   200000 sendto 1
   200000 recvfrom 1
     6084 epoll_wait 1
     2651 recvfrom -1

On keep-alive there is no change however.
2020-02-28 16:17:09 +01:00