Commit Graph

20438 Commits

Author SHA1 Message Date
Willy Tarreau
e1f1ba13c2 WIP/MEDIUM: thread: order CPUs before picking the first ones
WIP: still not true because the number of threads is not necessarily
equal to the number of CPUs we're bound to. For now the node id
remains the sole criterion used on this test. Also, we should only
bind CPUs that are either:
  - not excluded at boot when !cpu_map_configured()
  - mapped when cpu_map_configured()
and not offline of course.

Now when starting a default number of threads, instead of limiting
ourselves to the first 32 or 64 CPUs that appear in the list, we'll
take the same number after having sorted them by capacity and vicinity.
This means that setups which have a single declared NUMA node and CPUs
spread between sockets before threads will at least have a chance to
bind only to threads of related cores and to avoid using the second
package by default.

The goal will now be to improve on this selection.
2023-07-21 17:53:05 +02:00
Willy Tarreau
d7e21999c8 MINOR: cpuset: implement a sorting mechanism for CPU perf/topo
The new function cpu_optimize_topology() sorts a topology array by
advertised capacity (cpu_capacity, SMT count) and vicinity (cluster
ID, L3 cache ID etc). The purpose is to place the big cores first on
heterogenous architectures, and then sort them so that groups can
be lineraly composed from adjacent CPUs.
2023-07-21 17:53:05 +02:00
Willy Tarreau
ec8eb37361 WIP/MEDIUM: cfgparse: remote numa & thread-count detection
Not needed anymore since already done before landing here.

NOTE that the cmp_hw_cpus() function is here!
2023-07-21 17:53:05 +02:00
Willy Tarreau
404ff1c9d0 WIP: haproxy: call thread_detect_count() even without cpu affinity 2023-07-21 17:53:05 +02:00
Willy Tarreau
2fcfbf7873 WIP: thread: make thread_detect_count() also work without CPU affinity 2023-07-21 17:53:05 +02:00
Willy Tarreau
a2aee8c495 WIP: start to move numa detection
we reimplement automatic binding to the first NUMA node.
2023-07-21 17:53:05 +02:00
Willy Tarreau
a31da151e3 WIP: reimplement automatic thread setting
We continue to default to the max number of available threads and 1
tgroup by default, with the limit. This normally allows to get rid
of that test in check_config_validity().
2023-07-21 17:53:05 +02:00
Willy Tarreau
263b2228e2 EXP: thrd: start to detect thread groups and threads min/max
By mutually refining the thread count and group count, we can try
to detect the most suitable setup for the current machine. Taskset
is implicitly handled correctly. tgroups automatically adapt to the
configured number of threads. cpu-map manages to limit tgroups to
the smallest supported value. In fact this could even work without
the rest of the changes for now, just ignoring CPU locations.

Also it could be useful to calculate a "distance" to next CPU for each
of them so that when trying to create multiple groups we could cut around
the largest ones and evaluate for certain distances how many groups that
would provide.

Last, the per-core capacity and number of siblings would be useful to
figure which groups to consider first.
2023-07-21 17:53:05 +02:00
Willy Tarreau
b253fd556b WIP: cfgparse: comment out detection 2023-07-21 17:53:05 +02:00
Willy Tarreau
83b0887e10 WIP: cfgparse: fill the NUMA node IDs
The ID is the one retrieved from the directory name.

Now the current situation makes no sense anymore. Let's rework it:

  - call cpu_detect_topology() from haproxy.c:2251, just before calling
    check_config_validity(). At this point we know tgroups, threads,
    and cpumap if they were explicitly set.

  - the function must use cpumap[] if set to indicate bound CPUs,
    otherwise fall back to getaffinity().
    NOTE: better set UNBOUND rather than BOUND so that when we don't
          know they're all there ?

  - the function can now check the online map only for bound CPUs.
    Should probably also set OFFLINE rather than online.

  - can now check the CPU+NUMA topology for all bound+online

Then config_check_validity() should call thread_adjust_to_cpu() which
will consider the number of threads, of groups etc.
2023-07-21 17:53:05 +02:00
Willy Tarreau
3280195d0c WIP: cfgparse: detect CPU topology and order all CPUs by vicinity
This detects package, NUMA node, l3, core cluster, L2, thread set, L1,
and orders them accordingly. For now groups are not made yet, but it
should be possible to compose them using a linear scan now.

Note there's still a DEBUG dump of what was discovered at boot!

TODO: integrate NUMA node numbering (ignored for now)

wip: cfgparse: use ha_cpu_topo now

wip: cfgparse: no need to clear HA_CPU_F_BOUND anymore

already done in cpuset at boot.

wip: cfgparse: drop useless err from parse_cpu_set

WIP: cfgparse: temporarily hide online detect code and list all cpus
2023-07-21 17:53:05 +02:00
Willy Tarreau
6a62a480b4 WIP: DEBUG: use ./sys/devices temporarily 2023-07-21 17:53:05 +02:00
Willy Tarreau
9c316a528c MINOR: cfgparse: use already known offline CPU information
No need to reparse cpu/online, let's just rely on the info we learned
previously about offline CPUs.
2023-07-21 17:53:05 +02:00
Willy Tarreau
0aba375027 MINOR: cfgparse: move the binding detection into numa_detect_topology()
For now the function refrains from detecting the CPU topology when a
restrictive taskset or cpu-map was already performed on the process,
and it's documented as such, the reason being that until we're able
to automatically create groups, better not change user settings. But
we'll need to be able to detect bound CPUs and to process them as
desired by the user, so we now need to move that detection into the
function itself. It changes nothing to the logic, just gives more
freedom to the function.
2023-07-21 17:53:05 +02:00
Willy Tarreau
f064e524d4 MINOR: cpuset: add NUMA node identification to CPUs on FreeBSD
With this patch we're also NUMA node IDs to each CPU when the info is
found. The code is highly inspired from the one in commit f5d48f8b3
("MEDIUM: cfgparse: numa detect topology on FreeBSD."), the difference
being that we're just setting the value in ha_cpu_topo[].
2023-07-21 17:53:05 +02:00
Willy Tarreau
6a839c3dfc MINOR: cpuset: add NUMA node identification to CPUs on Linux
With this patch we're also assigning NUMA node IDs to each CPU when one
is found. The code is highly inspired from the one in commit b56a7c89a
("MEDIUM: cfgparse: detect numa and set affinity if needed") that already
did the job, except that it could be simplified since we're just collecting
info to fill the ha_cpu_topo[] array.
2023-07-21 17:53:05 +02:00
Willy Tarreau
0c3a4c9733 MINOR: cpuset: add CPU topology detection for linux
This uses the publicly available information from /sys to figure the cache
and package arrangements between logical CPUs and fill ha_cpu_topo[], as
well as their SMT capabilities and relative capacity for those which expose
this. The functions clearly have to be OS-specific.
2023-07-21 17:52:36 +02:00
Willy Tarreau
ef5d4ab8da MINOR: cpuset: try to detect offline cpus at boot
When possible, the offline CPUs are detected at boot and their OFFLINE
flag is set in the ha_cpu_topo[] array. When the detection is not
possible (e.g. not linux, /sys not mounted etc), we just mark none of
them as being offline, as we don't want to infer wrong info that could
hinder automatic CPU placement detection.
2023-07-21 16:25:07 +02:00
Willy Tarreau
fb2311518f MINOR: cpuset: add detection of online CPUs on FreeBSD
On FreeBSD we can detect online CPUs at least by doing the bitwise-OR of
the CPUs of all domains, so we're using this and adding this detection
to ha_cpuset_detect_online(). If we find simpler later, we can always
rework it, but it's reasonably inexpensive since we only check existing
domains.
2023-07-21 16:25:07 +02:00
Willy Tarreau
6bef0d7298 MINOR: cpuset: add detection of online CPUs on Linux
This adds a generic function ha_cpuset_detect_online() which for now
only supports linux via /sys. It fills a cpuset with the list of online
CPUs that were detected (or returns a failure).
2023-07-21 16:25:07 +02:00
Willy Tarreau
3a0e03bd4c MINOR: thread: rely on the cpuset functions to count bound CPUs
let's just clean up the thread_cpus_enabled() code a little bit
by removing the OS-specific code and rely on ha_cpuset_detect_bound()
instead. On macos we continue to use sysconf() for now.
2023-07-21 16:25:07 +02:00
Willy Tarreau
3f83b78961 MINOR: cpuset: update CPU topology from excluded CPUs at boot
Now before trying to resolve the thread assignment to groups, we detect
which CPUs are not bound at boot so that we can mark them with
HA_CPU_F_EXCLUDED. This will be useful to better know on which CPUs we
can count later. Note that we purposely ignore cpu-map here as we
don't know how threads and groups will map to cpu-map entries, hence
which CPUs will really be used.

It's important to proceed this way so that when we have no info we
assume they're all available.
2023-07-21 16:25:07 +02:00
Willy Tarreau
e9fd787b96 MINOR: cpuset: allocate and initialize the ha_cpu_topo array.
This does the bare minimum to allocate and initialize a global
ha_cpu_topo array for the number of supported CPUs and release
it at deinit time.
2023-07-21 16:25:06 +02:00
Willy Tarreau
b6556995bf MINOR: cpuset: add ha_cpu_topo definition
This structure will be used to store information about each CPU's
topology (package ID, L3 cache ID, NUMA node ID etc). This will be used
in conjunction with CPU affinity setting to try to perform a mostly
optimal binding between threads and CPU numbers by default.
2023-07-21 16:25:03 +02:00
Willy Tarreau
f14975c74a MINOR: cpuset: centralize bound cpu detection
Till now the CPUs that were bound were only retrieved in
thread_cpus_enabled() in order to count the number of CPUs allowed,
and it relied on arch-specific code.

Let's slightly arrange this into ha_cpuset_detect_bound() that
reuses the ha_cpuset struct and the accompanying code. This makes
the code much clearer without having to carry along some arch-specific
stuff out of this area.

Note that the macos-specific code used in thread.c to only count
online CPUs but not retrieve a mask, so for now we can't infer
anything from it and can't implement it.
2023-07-20 15:36:11 +02:00
Willy Tarreau
b5809a4a52 REORG: cpuset: move parse_cpu_set() and parse_cpumap() to cpuset.c
These ones were still in cfgparse.c but they're not specific to the
config at all and may actually be used even when parsing cpu list
entries in /sys. Better move them where they can be reused.
2023-07-20 15:36:11 +02:00
Willy Tarreau
69d2ff7078 MINOR: cpuset: add ha_cpuset_or() to bitwise-OR two CPU sets
This operation was not implemented and will be needed later.
2023-07-20 15:36:11 +02:00
Willy Tarreau
f2af7a5d20 MINOR: cpuset: add ha_cpuset_isset() to check for the presence of a CPU in a set
This function will be convenient to test for the presence of a given CPU
in a set.
2023-07-20 15:36:11 +02:00
Willy Tarreau
c982e4ca4f MINOR: cpuset: dynamically allocate cpu_map
cpu_map is 8.2kB/entry and there's one such entry per group, that's
~520kB total. In addition, the init code is still in haproxy.c enclosed
in ifdefs. Let's make this a dynamically allocated array in the cpuset
code and remove that init code.

Later we may even consider reallocating it once the number of threads
and groups is known, in order to shrink it a little bit, as the typical
setup with a single group will only need 8.2kB, thus saving half a MB
of RAM. This would require that the upper bound is placed in a variable
though.
2023-07-20 15:36:11 +02:00
Willy Tarreau
c11d09ac81 MINOR: cfgparse: use read_line_from_trash() to read from /sys
It's easier to use this function now to natively support variable
fields in the file's path. This also removes read_file_from_trash()
that was only used here and was static.
2023-07-20 15:36:11 +02:00
Willy Tarreau
3c3261c676 MINOR: tools: add function read_line_to_trash() to read a line of a file
This function takes on input a printf format for the file name, making
it particularly suitable for /proc or /sys entries which take a lot of
numbers. It also automatically trims the trailing CR and/or LF chars.
2023-07-20 15:36:11 +02:00
Willy Tarreau
c5c1ea6930 MEDIUM: init: initialize the trash earlier
More and more utility function rely on the trash while most of the init
code doesn't have access to it because it's initialized very late (in
PRE_CHECK for the initial one). It's a pool, and it purposely supports
being reallocated, so let's initialize it in STG_POOL so that early
STG_INIT code can at least use it.
2023-07-20 15:36:11 +02:00
Willy Tarreau
8a4466bd15 MEDIUM: cfgparse: assign NUMA affinity to cpu-maps
Do not force affinity on the process, instead let's just apply it to
cpu-map, it will automatically be used later in the init process. We
can do this because we know that cpu-map was not set when we're using
this detection code.

This is much saner, as we don't need to manipulate the process' affinity
at this point in time, and just update the info that the user omitted to
set by themselves, which guarantees a better long-term consistency with
the documented feature.
2023-07-20 15:33:18 +02:00
Willy Tarreau
6ecabb3f35 CLEANUP: config: make parse_cpu_set() return documented values
parse_cpu_set() stopped returning the undocumented -1 which was a
leftover from an earlier attempt, changed from ulong to int since
it only returns a success/failure and no more a mask. Thus it must
not return -1 and its callers must only test for != 0, as is
documented.
2023-07-20 11:01:09 +02:00
Willy Tarreau
f54d8c6457 CLEANUP: cpuset: remove the unused proc_t1 field in cpu_map
This field used to store the cpumap of the first thread in a group, and
was used till 2.4 to hold some default settings, after which it was no
longer used. Let's just drop it.
2023-07-20 11:01:09 +02:00
Willy Tarreau
c955659906 BUG/MINOR: init: set process' affinity even in foreground
The per-process CPU affinity settings are only applied during forking,
which means that cpu-map are ignored when running in foreground (e.g.
haproxy started with -db). This is historic due to the original semantics
of a process array, but isn't documented and causes surprises when trying
to debug affinity settings.

Let's make sure the setting is applied to the workers themselves even
in foreground. This may be backported to 2.6 though it is really not
important. If backported, it also depends on previous commit:

  BUG/MINOR: cpuset: remove the bogus "proc" from the cpu_map struct
2023-07-20 11:01:09 +02:00
Willy Tarreau
151f9a2808 BUG/MINOR: cpuset: remove the bogus "proc" from the cpu_map struct
We're currently having a problem with the porting from cpu_map from
processes to thread-groups as it happened in 2.7 with commit 5b09341c0
("MEDIUM: cpu-map: replace the process number with the thread group
number"), though it seems that it has deeper roots even in 2.0 and
that it was progressively made worng over time.

The issue stems in the way the per-process and per-thread cpu-sets were
employed over time. Originally only processes were supported. Then
threads were added after an optional "/" and it was documented that
"cpu-map 1" is exactly equivalent to "cpu-map 1/all" (this was clarified
in 2.5 by commit 317804d28 ("DOC: update references to process numbers
in cpu-map and bind-process").

The reality is different: when processes were still supported, setting
"cpu-map 1" would apply the mask to the process itself (and only when
run in the background, which is not documented either and is also a
bug for another fix), and would be combined with any possible per-thread
mask when calculating the threads' affinity, possibly resulting in empty
sets. However, "cpu-map 1/all" would only set the mask for the threads
and not the process. As such the following:

    cpu-map 1 odd
    cpu-map 1/1-8 even

would leave no CPU while doing:

    cpu-map 1/all odd
    cpu-map 1/1-8 even

would allow all CPUs.

While such configs are very unlikely to ever be met (which is why this
bug is tagged minor), this is becoming quite more visible while testing
automatic CPU binding during 2.9 development because due to this bug
it's much more common to end up with incorrect bindings.

This patch fixes it by simply removing the .proc entry from cpu_map and
always setting all threads' maps. The process is no longer arbitrarily
bound to the group 1's mask, but in case threads are disabled, we'll
use thread 1's mask since it contains the configured CPUs.

This fix should be backported at least to 2.6, but no need to insist if
it resists as it's easier to break cpu-map than to fix an unlikely issue.
2023-07-20 11:01:09 +02:00
Willy Tarreau
67c99db0a7 BUG/MINOR: config: do not detect NUMA topology when cpu-map is configured
As documented, the NUMA auto-detection is not supposed to be used when
the CPU affinity was set either by taskset (already checked) or by a
cpu-map directive. However this check was missing, so that configs
having cpu-map entries would still first bind to a single node. In
practice it has no impact on correct configs since bindings will be
replaced. However for those where the cpu-map directive are not
exhaustive it will have the impact of binding those threads to one node,
which disagrees with the doc (and makes future evolutions significantly
more complicated).

This could be backported to 2.4 where numa-cpu-mapping was added, though
if nobody encountered this by then maybe we should only focus on recent
versions that are more NUMA-friendly (e.g. 2.8 only). This patch depends
on this previous commit that brings the function we rely on:

   MINOR: cpuset: add cpu_map_configured() to know if a cpu-map was found
2023-07-20 11:01:09 +02:00
Willy Tarreau
7134417613 MINOR: cpuset: add cpu_map_configured() to know if a cpu-map was found
Since we'll soon want to adjust the "thread-groups" degree of freedom
based on the presence of cpu-map, we first need to be able to detect
if cpu-map was used. This function scans all cpu-map sets to detect if
any is present, and returns true accordingly.
2023-07-20 11:01:09 +02:00
Daan van Gorkum
f034139bc0 MINOR: lua: Allow reading "proc." scoped vars from LUA core.
This adds the "core.get_var()" method allow the reading
of "proc." scoped variables outside of TXN or HTTP/TCPApplet.

Fixes: #2212
Signed-off-by: Daan van Gorkum <djvg@djvg.net>
2023-07-20 10:55:28 +02:00
Christopher Faulet
083f917fe2 BUG/MINOR: h1-htx: Return the right reason for 302 FCGI responses
A FCGI response may contain a "Location" header with no status code. In this
case a 302-Found HTTP response must be returned to the client. However,
while the status code is indeed 302, the reason is wrong. "Found" must be
set instead of "Moved Temporarily".

This patch must be backported as far as 2.2. With the commit e3e4e0006
("BUG/MINOR: http: Return the right reason for 302"), this should fix the
issue #2208.
2023-07-20 09:51:00 +02:00
firexinghe
bfff46f411 BUG/MINOR: hlua: add check for lua_newstate
Calling lual_newstate(Init main lua stack) in the hlua_init_state()
function, the return value of lua_newstate() may be NULL (for example
in case of OOM). In this case, L will be NULL, and then crash happens
in lua_getextraspace(). So, we add a check for lua_newstate.

This should be backported at least to 2.4, maybe further.
2023-07-19 10:16:14 +02:00
Emeric Brun
c0456f45c8 BUILD: quic: fix warning during compilation using gcc-6.5
Building with gcc-6.5:

src/quic_conn.c: In function 'send_retry':
src/quic_conn.c:6554:2: error: dereferencing type-punned pointer will break
strict-aliasing rules [-Werror=strict-aliasing]
   *((uint32_t *)((unsigned char *)&buf[i])) = htonl(qv->num);

This patch use write_n32 to set the value.

This could be backported until v2.6
2023-07-19 08:58:55 +02:00
Frédéric Lécaille
e5a17b0bc0 BUG/MINOR: quic: Unckecked encryption levels availability
This bug arrived with this commit:

   MEDIUM: quic: Dynamic allocations of QUIC TLS encryption levels

It is possible that haproxy receives a late Initial packet after it has
released its Initial or Handshake encryption levels. In this case
it must not try to retransmit packets from such encryption levels to
speed up the handshake completion.

No need to backport.
2023-07-18 11:50:31 +02:00
Ilya Shipitsin
f7dcceccc9 CI: explicitely highlight VTest result section if there's something
it turned out that people miss VTest result section because it is not highlighted,
let us fix that
2023-07-17 15:56:53 +02:00
Ilya Shipitsin
ddedefcaaa CI: add naming convention documentation
branches "haproxy-" stand for stable branches, otherwise development
2023-07-17 15:56:52 +02:00
Mariam John
00b7b49a46 MEDIUM: ssl: new sample fetch method to get curve name
Adds a new sample fetch method to get the curve name used in the
key agreement to enable better observability. In OpenSSLv3, the function
`SSL_get_negotiated_group` returns the NID of the curve and from the NID,
we get the curve name by passing the NID to OBJ_nid2sn. This was not
available in v1.1.1. SSL_get_curve_name(), which returns the curve name
directly was merged into OpenSSL master branch last week but will be available
only in its next release.
2023-07-17 15:45:41 +02:00
Christopher Faulet
e3e4e00063 BUG/MINOR: http: Return the right reason for 302
Because of a cut/paste error, the wrong reason was returned for 302
code. The 301 reason was returned instead. Thus now, "Found" is returned for
302, instead of "Moved Permanently".

This pathc should fix the issue 2208. It must be backported to all stable
versions.
2023-07-17 11:14:10 +02:00
Christopher Faulet
b982fc2177 BUG/MINOR: sample: Fix wrong overflow detection in add/sub conveters
When "add" or "sub" conveters are used, an overflow detection is performed.
When 2 negative integers are added (or a positive integer is substracted to
a positive one), we take care to not exceed the low limit (LLONG_MIN) and
when 2 positive integers are added, we take care to not exceed the high
limit (LLONG_MAX).

However, because of a missing 'else' statement, if there is no overflow in
the first case, we fall back on the second check (the one for positive adds)
and LLONG_MAX is returned. It means that most of time, when 2 negative
integers are added (or a positive integer is substracted to a negative one),
LLONG_MAX is returned.

This patch should solve the issue #2216. It must be backported to all stable
versions.
2023-07-17 11:14:10 +02:00
Christopher Faulet
46e5876035 DOC: config: Fix fc_src description to state the source address is returned
A typo in the "fc_src" description was fixed. This sample returns the
original source IP address and not the destination one.

This patch should be backported as far as 2.6.
2023-07-17 11:11:39 +02:00