MINOR: listener: refine the default MAX_ACCEPT from 64 to 4

The maximum number of connections accepted at once by a thread for a single
listener used to default to 64 divided by the number of processes, but the
tasklet-based model is much more scalable and benefits from smaller values.
Experimentation has shown that 4 gives the highest accept rate across all
thread counts, with 3 and 5 coming very close, as shown below (HTTP/1
connections forwarded per second at multi-accept 4 and 64):

 ac\thr|    1     2    4     8     16
 ------+------------------------------
      4|   80k  106k  168k  270k  336k
     64|   63k   89k  145k  230k  274k

Some tests were also conducted on SSL and absolutely no change was observed.

The value was placed into a define because it used to be spread all over the
code.

It might be useful at some point to backport this to 2.3 and 2.2 to help
those who observed some performance regressions from 1.6.
Author: Willy Tarreau
Date:   2021-02-19 15:50:27 +01:00
parent 4327d0ac00
commit 66161326fd
5 changed files with 29 additions and 12 deletions


@@ -2403,14 +2403,15 @@ tune.lua.service-timeout <timeout>
 tune.maxaccept <number>
   Sets the maximum number of consecutive connections a process may accept in a
   row before switching to other work. In single process mode, higher numbers
-  give better performance at high connection rates. However in multi-process
-  modes, keeping a bit of fairness between processes generally is better to
-  increase performance. This value applies individually to each listener, so
-  that the number of processes a listener is bound to is taken into account.
-  This value defaults to 64. In multi-process mode, it is divided by twice
-  the number of processes the listener is bound to. Setting this value to -1
-  completely disables the limitation. It should normally not be needed to tweak
-  this value.
+  used to give better performance at high connection rates, though this is not
+  the case anymore with the multi-queue. This value applies individually to
+  each listener, so that the number of processes a listener is bound to is
+  taken into account. This value defaults to 4 which showed best results. If a
+  significantly higher value was inherited from an ancient config, it might be
+  worth removing it as it will both increase performance and lower response
+  time. In multi-process mode, it is divided by twice the number of processes
+  the listener is bound to. Setting this value to -1 completely disables the
+  limitation. It should normally not be needed to tweak this value.
 
 tune.maxpollevents <number>
   Sets the maximum amount of events that can be processed at once in a call to


@@ -170,6 +170,22 @@
 #define MAX_POLL_EVENTS 200
 #endif
 
+// The maximum number of connections accepted at once by a thread for a single
+// listener. It used to default to 64 divided by the number of processes but
+// the tasklet-based model is much more scalable and benefits from smaller
+// values. Experimentation has shown that 4 gives the highest accept rate for
+// all thread values, and that 3 and 5 come very close, as shown below (HTTP/1
+// connections forwarded per second at multi-accept 4 and 64):
+//
+//  ac\thr|    1     2    4     8     16
+//  ------+------------------------------
+//       4|   80k  106k  168k  270k  336k
+//      64|   63k   89k  145k  230k  274k
+//
+#ifndef MAX_ACCEPT
+#define MAX_ACCEPT 4
+#endif
+
 // the max number of tasks to run at once. Tests have shown the following
 // number of requests/s for 1 to 16 threads (1c1t, 1c2t, 2c4t, 4c8t, 4c16t):
 //


@@ -3431,7 +3431,7 @@ out_uri_auth_compat:
 			if (curproxy->options & PR_O_TCP_NOLING)
 				listener->options |= LI_O_NOLINGER;
 			if (!listener->maxaccept)
-				listener->maxaccept = global.tune.maxaccept ? global.tune.maxaccept : 64;
+				listener->maxaccept = global.tune.maxaccept ? global.tune.maxaccept : MAX_ACCEPT;
 
 			/* we want to have an optimal behaviour on single process mode to
 			 * maximize the work at once, but in multi-process we want to keep


@@ -131,7 +131,7 @@ struct task *accept_queue_process(struct task *t, void *context, unsigned short
 	/* if global.tune.maxaccept is -1, then max_accept is UINT_MAX. It
 	 * is not really illimited, but it is probably enough.
 	 */
-	max_accept = global.tune.maxaccept ? global.tune.maxaccept : 64;
+	max_accept = global.tune.maxaccept ? global.tune.maxaccept : MAX_ACCEPT;
 	for (; max_accept; max_accept--) {
 		conn = accept_queue_pop_sc(ring);
 		if (!conn)


@@ -3923,7 +3923,7 @@ int cfg_parse_log_forward(const char *file, int linenum, char **args, int kwm)
 		}
 	}
 	list_for_each_entry(l, &bind_conf->listeners, by_bind) {
-		l->maxaccept = global.tune.maxaccept ? global.tune.maxaccept : 64;
+		l->maxaccept = global.tune.maxaccept ? global.tune.maxaccept : MAX_ACCEPT;
 		l->accept = session_accept_fd;
 		l->analysers |= cfg_log_forward->fe_req_ana;
 		l->default_target = cfg_log_forward->default_target;
@@ -3991,7 +3991,7 @@ int cfg_parse_log_forward(const char *file, int linenum, char **args, int kwm)
 	}
 	list_for_each_entry(l, &bind_conf->listeners, by_bind) {
 		/* the fact that the sockets are of type dgram is guaranteed by str2receiver() */
-		l->maxaccept = global.tune.maxaccept ? global.tune.maxaccept : 64;
+		l->maxaccept = global.tune.maxaccept ? global.tune.maxaccept : MAX_ACCEPT;
 		l->rx.iocb = syslog_fd_handler;
 		global.maxsock++;
 	}
}