MINOR: fd: don't scan the full fdtab on all threads

During tests, it's pretty visible that with many threads and a large
number of FDs, the process may take time to be ready. The reason is that
the full fdtab array is scanned by each and every thread at boot in
fd_reregister_all() in order to make each thread-local poller adopt the
FDs that are relevant to it. The problem is that with 1-2M FDs and 64+
threads, this starts to represent quite a number of loops, and the fdtab
array usually doesn't entirely fit in the CPU's L3 cache, causing extra
memory accesses. It's particularly visible when issuing debugging
commands to the CLI, because usually the first one fails while the CPU
is at 100% for half a second (which is also socat's timeout).

A quick test with this:

    global
       stats socket /tmp/sock1 level admin mode 666
       stats timeout 1h
       maxconn 2000000

And the following script started in another window:

    while ! time socat -t5 - /tmp/sock1 <<< "show version";do date -Ins;done

shows that it takes 1.58s for the socat instance that succeeds on an
Ampere Altra with 80 cores. This requires changing the timeout (which
defaults to half a second), otherwise it returns nothing. In addition,
it also means that during reloads, some CPU spikes will be noticed.

Adding a prefetch of the current FD + 16 improves the startup time by
30%, but that's far from being sufficient.

In practice all of this is performed at boot time, a moment at which we
know that extremely few FDs are registered (basically just the
listeners), so FD numbers are usually very low and the rest of the table
is scanned for no benefit. Ideally, knowing upfront how many FDs we have
should be sufficient.

A first approach would consist in counting the entries on a single
thread before registering pollers. It's not necessarily efficient and
would take time anyway.

This patch takes a different approach. It consists in keeping a
thread-local max ("fd_highest") that is updated whenever fd_insert() is
called with a larger number. Of course this is not correct once all
threads have started, but it will remain valid during boot since the
same value is used during startup and is cloned for each thread, and no
scheduling happens anywhere during this period, so that all threads are
aware of the highest FD they've seen registered, even if it had been
done in some init code, and this without having to deal with a shared
variable.

Here on the test platform, the script gets its response in 10ms vs
1580ms before.
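
The bounded-scan idea can be sketched in isolation as below. This is only
a minimal single-threaded illustration, not HAProxy's code: `struct fdent`,
`fd_insert_sketch()`, `scan_owned()` and `MAXSOCK` are hypothetical
stand-ins, and the sketch makes no attempt to model how the value reaches
other threads. The loop bound chosen here is inclusive so that the highest
inserted FD itself is visited.

```c
#include <assert.h>
#include <stddef.h>

#define MAXSOCK 65536                 /* hypothetical table size */

struct fdent { void *owner; };        /* stand-in for one fdtab[] entry */

static struct fdent fdtab[MAXSOCK];   /* zero-initialized: no owners yet */
static _Thread_local int fd_highest = -1; /* highest FD inserted by this thread */

/* Simplified stand-in for fd_insert(): record the owner and raise the
 * thread-local bound when a larger FD number shows up.
 */
static void fd_insert_sketch(int fd, void *owner)
{
    assert(fd >= 0 && fd < MAXSOCK);  /* out-of-range FD would be a bug */
    fdtab[fd].owner = owner;
    if (fd > fd_highest)
        fd_highest = fd;
}

/* Count owned entries in [0, last] and report through *visited how many
 * slots had to be examined to do so.
 */
static int scan_owned(int last, long *visited)
{
    int fd, owned = 0;

    *visited = 0;
    for (fd = 0; fd <= last; fd++) {
        (*visited)++;
        if (fdtab[fd].owner)
            owned++;
    }
    return owned;
}
```

After inserting FDs 4, 7 and 5, `scan_owned(fd_highest, &visited)` finds
the same 3 owners as `scan_owned(MAXSOCK - 1, &visited)` but examines 8
slots instead of 65536, which is the whole point when only a few listeners
exist at boot.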
parent a5c5a68454
commit 75b335abc7
@@ -48,6 +48,7 @@ extern struct polled_mask *polled_mask;
 
 extern THREAD_LOCAL int *fd_updt; // FD updates list
 extern THREAD_LOCAL int fd_nbupdt; // number of updates in the list
+extern THREAD_LOCAL int fd_highest;// highest FD known by the current thread
 
 extern int poller_wr_pipe[MAX_THREADS];
 
@@ -466,6 +467,19 @@ static inline void fd_insert(int fd, void *owner, void (*iocb)(int fd), int tgid
 	if ((global.tune.options & GTUNE_FD_ET) && iocb == sock_conn_iocb)
 		newstate |= FD_ET_POSSIBLE;
 
+	/* We must update fd_highest to reflect the highest known FD for this
+	 * thread. It's important to note that it's not necessarily the highest
+	 * FD the thread will see, it's the highest FD that was inserted by
+	 * this thread or by the main thread. The purpose is essentially to
+	 * let all threads know the highest known FD at boot, that will be
+	 * cloned into each thread, in order to limit the work range for init
+	 * functions such as fork_poller() and fd_reregister_all(). Keeping the
+	 * value thread-local substantially limits the cost, since after a few
+	 * thousand calls the value will just stop changing.
+	 */
+	if (unlikely(fd > fd_highest))
+		fd_highest = fd;
+
 	/* This must never happen and would definitely indicate a bug, in
 	 * addition to overwriting some unexpected memory areas.
 	 */
5 src/fd.c
@@ -112,6 +112,7 @@ volatile struct fdlist update_list[MAX_TGROUPS]; // Global update list
 
 THREAD_LOCAL int *fd_updt = NULL; // FD updates list
 THREAD_LOCAL int fd_nbupdt = 0; // number of updates in the list
+THREAD_LOCAL int fd_highest = -1; // highest FD known by the current thread
 THREAD_LOCAL int poller_rd_pipe = -1; // Pipe to wake the thread
 int poller_wr_pipe[MAX_THREADS] __read_mostly; // Pipe to wake the threads
 
@@ -836,7 +837,7 @@ void fd_reregister_all(int tgrp, ulong mask)
 {
 	int fd;
 
-	for (fd = 0; fd < global.maxsock; fd++) {
+	for (fd = 0; fd < fd_highest; fd++) {
 		if (!fdtab[fd].owner)
 			continue;
 
@@ -1271,7 +1272,7 @@ int list_pollers(FILE *out)
 int fork_poller()
 {
 	int fd;
-	for (fd = 0; fd < global.maxsock; fd++) {
+	for (fd = 0; fd < fd_highest; fd++) {
 		if (fdtab[fd].owner) {
 			HA_ATOMIC_OR(&fdtab[fd].state, FD_CLONED);
 		}