OPTIM: mux-h2: use tasklet_wakeup_after() in h2s_notify_recv()

This reduces the average wakeup latency of sc_conn_io_cb() from 1900 to
51us. L2 cache misses went down from 1.4 to 1.2 billion for 20k requests,
but performance did not improve. Also, there are situations where we must
not perform such wakeups: they may only be done from h2_io_cb, hence the
test on the next_tasklet pointer (or on the current task being h2_io_cb)
and the pointer's reset when leaving h2_io_cb. In practice, all callers of
h2s_close() or h2s_destroy() can reach that code, and this includes
functions called from outside h2_io_cb such as h2_detach, h2_snd_buf,
h2_shut, etc.

Another test with 40 concurrent connections transferring 40k 1MB objects
at different concurrency levels from 1 to 80 also showed a 21% drop in L2
cache misses and a 2% performance improvement:

Before:
   329,510,887,528  instructions
    50,907,966,181  branches
       843,515,912  branch-misses
     2,753,360,222  cache-misses
    19,306,172,474  L1-icache-load-misses
    17,321,132,742  L1-dcache-load-misses
       951,787,350  LLC-load-misses

      44.660469000 seconds user
      62.459354000 seconds sys

   => avg perf: 373 MB/s

After:
   331,310,219,157  instructions
    51,343,396,257  branches
       851,567,572  branch-misses
     2,183,369,149  cache-misses
    19,129,827,134  L1-icache-load-misses
    17,441,877,512  L1-dcache-load-misses
       906,923,115  LLC-load-misses

      42.795458000 seconds user
      62.277983000 seconds sys

   => avg perf: 380 MB/s

With small requests, it is the L1 and L3 cache misses that went down, by
3% and 7% respectively, and performance went up by 3%.

Willy Tarreau 2024-10-12 12:43:34 +02:00
parent 04ce6536e1
commit fcab647613
1 changed file with 12 additions and 2 deletions

@@ -96,6 +96,8 @@ struct h2c {
 	struct list blocked_list; /* list of streams blocked for other reasons (e.g. sfctl, dep) */
 	struct buffer_wait buf_wait; /* wait list for buffer allocations */
 	struct wait_event wait_event; /* To be used if we're waiting for I/Os */
+	struct list *next_tasklet; /* which applet to wake up next (NULL by default) */
 };
@@ -1218,6 +1220,7 @@ static int h2_init(struct connection *conn, struct proxy *prx, struct session *s
 	h2c->proxy = prx;
 	h2c->task = NULL;
 	h2c->wait_event.tasklet = NULL;
+	h2c->next_tasklet = NULL;
 	h2c->shared_rx_bufs = NULL;
 	h2c->idle_start = now_ms;
 	if (tick_isset(h2c->timeout)) {
@@ -1491,7 +1494,11 @@ static void __maybe_unused h2s_notify_recv(struct h2s *h2s)
 {
 	if (h2s->subs && h2s->subs->events & SUB_RETRY_RECV) {
 		TRACE_POINT(H2_EV_STRM_WAKE, h2s->h2c->conn, h2s);
-		tasklet_wakeup(h2s->subs->tasklet);
+		if (h2s->h2c->next_tasklet ||
+		    (th_ctx->current && th_ctx->current->process == h2_io_cb))
+			h2s->h2c->next_tasklet = tasklet_wakeup_after(h2s->h2c->next_tasklet, h2s->subs->tasklet);
+		else
+			tasklet_wakeup(h2s->subs->tasklet);
 		h2s->subs->events &= ~SUB_RETRY_RECV;
 		if (!h2s->subs->events)
 			h2s->subs = NULL;
@@ -4763,10 +4770,13 @@ struct task *h2_io_cb(struct task *t, void *ctx, unsigned int state)
 	/* If we were in an idle list, we want to add it back into it,
 	 * unless h2_process() returned -1, which mean it has destroyed
 	 * the connection (testing !ret is enough, if h2_process() wasn't
-	 * called then ret will be 0 anyway.
+	 * called then ret will be 0 anyway. Otherwise we reset the next
+	 * tasklet to disable instant wakeups from external callers.
 	 */
 	if (ret < 0)
 		t = NULL;
+	else
+		h2c->next_tasklet = NULL;
 	if (!ret && conn_in_list) {
 		struct server *srv = objt_server(conn->target);