This registers a mapping of threads to groups by enumerating for each thread
what group it belongs to, and marking the group as assigned. It takes care of
checking for redefinitions, overlaps, and holes. It supports both individual
numbers and ranges. The thread group is referenced from the thread config.
This creates a struct tgroup_info which knows the thread ID of the first
thread in a group, and the number of threads in it. For now there's only
one thread group supported in the configuration, but it may be forced to
other values for development purposes by defining MAX_TGROUPS, and it's
enabled even when threads are disabled and will need to remain accessible
during boot to keep a simple enough internal API.
For the purpose of easing the configurations which do not specify a thread
group, we're starting group numbering at 1 so that thread group 0 can be
"undefined" (i.e. for "bind" lines or when binding tasks).
The goal will be to later move there some global items that must be
made per-group.
We want to make sure that the current thread_info accessed via "ti" will
remain constant, so that we don't accidentally place new variable parts
there and so that the compiler knows that info retrieved from there is
not expected to have changed between two function calls.
Only a few init locations had to be adjusted to use the array and the
rest is unaffected.
The last 3 fields were 3 list heads that are per-thread, and which are:
- the pool's LRU head
- the buffer_wq
- the streams list head
Moving them into thread_ctx completes the removal of dynamic elements
from the struct thread_info. Now all these dynamic elements are packed
together at a single place for a thread.
The TI_FL_STUCK flag is manipulated by the watchdog and scheduler
and describes the apparent life/death of a thread so it changes
all the time and it makes sense to move it to the thread's context
for an active thread.
The "thread_info" name was initially chosen to store all info about
threads but since we now have a separate per-thread context, there is
no point keeping some of its elements in the thread_info struct.
As such, this patch moves prev_cpu_time, prev_mono_time and idle_pct to
thread_ctx, into the thread context, with the scheduler parts. Instead
of accessing them via "ti->" we now access them via "th_ctx->", which
makes more sense as they're totally dynamic, and will be required for
future evolutions. There's no room problem for now, the structure still
has 84 bytes available at the end.
The scheduler contains a lot of stuff that is thread-local and not
exclusively tied to the scheduler. Other parts (namely thread_info)
contain similar thread-local context that ought to be merged with
it but that is even less related to the scheduler. However moving
more data into this structure isn't possible since task.h is high
level and cannot be included everywhere (e.g. activity) without
causing include loops.
In the end, it appears that the task_per_thread represents most of
the per-thread context defined with generic types and should simply
move to tinfo.h so that everyone can use them.
The struct was renamed to thread_ctx and the variable "sched" was
renamed to "th_ctx". "sched" used to be initialized manually from
run_thread_poll_loop(), now it's initialized by ha_set_tid() just
like ti, tid, tid_bit.
The memset() in init_task() was removed in favor of a bss initialization
of the array, so that other subsystems can put their stuff in this array.
Since the tasklet array has TL_CLASSES elements, the TL_* definitions
was moved there as well, but it's not a problem.
The vast majority of the change in this patch is caused by the
renaming of the structures.
We used to remap SI_TKILL to SI_LWP when SI_TKILL was not available
(e.g. FreeBSD) but that's ugly and since we need this only in a single
switch/case block in wdt.c it's even simpler and cleaner to perform the
two tests there, so let's do this.
The watchdog timer had no more reason for being shared with the struct
thread_info since the watchdog is the only user now. Let's remove it
from the struct and move it to a static array in wdt.c. This removes
some ifdefs and the need for the ugly mapping to empty_t that might be
subject to a cast to a long when compared to TIMER_INVALID. Now timer_t
is not known outside of wdt.c and clock.c anymore.
This removes the knowledge of clockid_t from anywhere but clock.c, thus
eliminating a source of includes burden. The unused clock_id field was
removed from thread_info, and the definition setting of clockid_t was
removed from compat.h. The most visible change is that the function
now_cpu_time_thread() now takes the thread number instead of a tinfo
pointer.
The code that deals with timer creation for the WDT was moved to clock.c
and is called with the few relevant arguments. This removes the need for
awareness of clock_id from wdt.c and as such saves us from having to
share it outside. The timer_t is also known only from both ends but not
from the public API so that we don't have to create a fake timer_t
anymore on systems which do not support it (e.g. macos).
This was previously open-coded in run_thread_poll_loop(). Now that
we have clock.c dedicated to such stuff, let's move the code there
so that we don't need to keep such ifdefs nor to depend on the
clock_id.
Instead of fiddling with before_poll and after_poll in
activity_count_runtime(), the function is now called by
clock_entering_poll() which passes it the number of microseconds
spent working. This allows to remove all calls to
activity_count_runtime() from the pollers.
The entering_poll/leaving_poll/measure_idle functions that were hard
to classify and used to move to various locations have now been placed
into clock.c since it's precisely about time-keeping. The functions
were renamed to clock_*. The samp_time and idle_time values are now
static since there is no reason for them to be read from outside.
There is currently a problem related to time keeping. We're mixing
the functions to perform calculations with the os-dependent code
needed to retrieve and adjust the local time.
This patch extracts from time.{c,h} the parts that are solely dedicated
to time keeping. These are the "now" or "before_poll" variables for
example, as well as the various now_*() functions that make use of
gettimeofday() and clock_gettime() to retrieve the current time.
The "tv_*" functions moved there were also more appropriately renamed
to "clock_*".
Other parts used to compute stolen time are in other files, they will
have to be picked next.
It was brough by an unneeded addition of a local variable after a test
in commit f7f53afcf ("BUILD/MEDIUM: tcp: set-mark setting support for
FreeBSD."). No backport needed.
Remove the quic_conn from the receiver connection_ids tree on
quic_conn_free. This fixes a crash due to dangling references in the
tree after a quic connection release.
This operation must be conducted under the listener lock. For this
reason, the quic_conn now contains a reference to its attached listener.
Following include reorganzation, there is some missing include files for
task.h when compiling with DEBUG_TASK :
- activity.h for task_profiling_mask
- time.h for now_mono_time()
This is present since the following commit
d8b325c748
REORG: task: uninline the loop time measurement code
No need to backport this.
These ones are rarely used or only to waste CPU cycles waiting, and are
the last ones requiring system includes in thread.h. Let's uninline them
and move them to thread.c.
This removes the thread identifiers from struct thread_info and moves
them only in static array in thread.c since it's now the only file that
needs to touch it. It's also the only file that needs to include
pthread.h, beyond haproxy.c which needs it to start the poll loop. As
a result, much less system includes are needed and the LoC reduced by
around 3%.
haproxy.c still has to deal with pthread-specific low-level stuff that
is OS-dependent. We should not have to deal with this there, and we do
not need to access pthread anywhere else.
Let's move these 3 functions to thread.c and keep empty inline ones for
when threads are disabled.
It's not needed to inline it at all (one call per loop) and it introduces
dependencies, let's move it to fd.c.
Removing the few remaining includes that came with it further reduced
by ~0.2% the LoC and the build time is now below 6s.
TV_ETERNITY, TV_ETERNITY_MS and MAX_DELAY_MS may be configured and
ought to be in defaults.h so that they can be inherited from everywhere
without including time.h and could also be redefined if neede
(particularly for MAX_DELAY_MS).
It's pointless to inline this, it's called exactly once per poll loop,
and it depends on time.h which is quite deep. Let's move that to task.c
along with sched_report_idle().
The remaining large functions are those allocating/initializing and
occasionally freeing connections, conn_streams and sockaddr. Let's
move them to connection.c. In fact, cs_free() is the only one-liner
but let's move it along with the other ones since a call will be
small compared to the rest of the work done there.
The following inlined functions are particularly large (and probably not
inlined at all by the compiler), and together represent roughly half of
the file, while they're used at most once per connection. They were moved
to connection.c.
conn_upgrade_mux_fe, conn_install_mux_fe, conn_install_mux_be,
conn_install_mux_chk, conn_delete_from_tree, conn_init, conn_new,
conn_free
No need to include the full tree management code, type files only
need the definitions. Doing so reduces the whole code size by around
3.6% and the build time is down to just 6s.
ebtree is one piece using a lot of inlines and each tree root or node
definition needed by many of our structures requires to parse and
compile all these includes, which is large and painfully slow. Let's
move the very basic definitions to their own file and include it from
ebtree.h.
The following functions are quite heavy and have no reason to be kept
inlined:
srv_release_conn, srv_lookup_conn, srv_lookup_conn_next,
srv_add_to_idle_list
They were moved to server.c. It's worth noting that they're a bit
at the edge between server and connection and that maybe we could
create an idle-conn file for these in the near future.
We do not really need to have them inlined, and having xxhash.h included
by connection.h results in this 4700-lines file being processed 101 times
over the whole project, which accounts for 13.5% of the total size!
Additionally, half of the functions are only needed from connection.c.
Let's move the functions there and get rid of the painful include.
The build time is now down to 6.2s just due to this.
The hash type stored everywhere is XXH64_hash_t, which annoyingly forces
everyone to include the huge xxhash file. We know it's an uint64_t because
that's its purpose and the type is only made to abstract it on machines
where uint64_t is not availble. Let's switch the type to uint64_t
everywhere and avoid including xxhash from the type file.
Plenty of includes were present there only for struct pointers resulting
in them being used from many other places. The LoC reduced again by more
than 1% by cleaning this.
This one is expensive in code size because it comes with xxhash.h at a
low level of dependency that's inherited at plenty of places, and for
a function does doesn't benefit from inlining and could possibly even
benefit from not being inline given that it's large and called from the
scheduler.
Moving it to activity.c reduces the LoC by 1.2% and the binary size by
~1kB.
This function has no reason for being inlined, it's called from non
critical places (once in pollers), is quite large and comes with
dependencies (time and freq_ctr). Let's move it to acitvity.c. That's
another 0.4% less LoC to build.
These are ticks, not timeval, and they're a cause for plenty of files
including time.h just to access now_ms that's only used with ticks
functions. Let's move them over there.
The idle time calculation stuff was moved to task.h by commit 6dfab112e
("REORG: sched: move idle time calculation from time.h to task.h") but
these two variables that are only maintained by task.{c,h} were still
left in time.{c,h}. They have to move as well.
These two counters were the only ones not in the global struct, while
the SSL freq counters or the req counts are already in it, this forces
stats.c to include ssl_sock just to know about them. Let's move them
over there with their friends. This reduces from 408 to 384 the number
of includes of opensslconf.h.
This one has nothing to do with ssl_sock as it manipulates the struct
server only. Let's move it to server.c and remove unneeded dependencies
on ssl_sock.h. This further reduces by 10% the number of includes of
opensslconf.h and by 0.5% the number of compiled lines.
This one doesn't use anything from an SSL context, it only checks the
type of the transport layer of a connection, thus it belongs to
connection.h. This is particularly visible due to all the ifdefs
around it in various call places.
This is exactly the same as for listeners, servers only include
openssl-compat to provide the SSL_CTX type to use as two pointers to
contexts, and to detect if NPN, ALPN, and cipher suites are supported,
and save up to 5 pointers in the ssl_ctx struct if not supported. This
is pointless, as these ones have all been supported for about a decade,
and including this file comes with a long dependency chain that impacts
lots of other files. The ctx was made a void*.
Now the build time was significantly reduced, from 9.2 to 8.1 seconds,
thanks to opensslconf.h being included "only" 456 times instead of 2424
previously!
The total number of lines of code compiled was reduced by 15%.