Commit Graph

6610 Commits

Author SHA1 Message Date
Willy Tarreau
d5cae6a0c7 MINOR: stick-table: change the API of the function used to calculate the shard
The function used to calculate the shard number currently requires a
stktable_key on input for this. Unfortunately, it happens that peers
currently miss this calculation and they do not provide stktable_key
at all, instead they're open-coding all the low-level stick-table work
(hence why it's missing). Thus we'll need to be able to calculate the
shard number in keys coming from peers as well but the current API does
not make it possible.

This commit addresses this by inverting the order where the length and
the shard number are used. Now the low-level function is independent on
stksess and stktable_key, it takes a table, pointer and length and does
all the job. The upper function takes care of the type and key to get
the its length, and is for use only from stick-table code.

This doesn't change anything except that the low-level one will be usable
from outside (hence why it's exported now).
2022-11-29 18:06:42 +01:00
Amaury Denoyelle
d64a26f023 CLEANUP: ncbuf: inline small functions
ncbuf API relies on lot of small functions. Mark these functions as
inline to reduce call invocations and facilitate compiler optimizations
to reduce code size.

This should be backported up to 2.6.
2022-11-29 15:14:39 +01:00
Willy Tarreau
56460ee52a MINOR: stick-table: store a per-table hash seed and use it
Instead of using memcpy() to concatenate the table's name to the key when
allocating an stksess, let's compute once for all a per-table seed at boot
time and use it to calculate the key's hash. This saves two memcpy() and
the usage of a chunk, it's always nice in a fast path.

When tested under extreme conditions with a 80-byte long table name, it
showed a 1% performance increase.
2022-11-28 18:58:06 +01:00
Willy Tarreau
63b5b33ba8 CLEANUP: stick-table: fill alignment holes in the stktable struct
There were two 32-bit holes in the stktable struct surrounding 32-bit
words, so let's just reorder them a little bit to address the issue.
2022-11-28 18:49:55 +01:00
William Lallemand
0a2d63236c BUG/MINOR: ssl: shut the ca-file errors emitted during httpclient init
With an OpenSSL library which use the wrong OPENSSLDIR, HAProxy tries to
load the OPENSSLDIR/certs/ into @system-ca, but emits a warning when it
can't.

This patch fixes the issue by allowing to shut the error when the SSL
configuration for the httpclient is not explicit.

Must be backported in 2.6.
2022-11-24 19:14:19 +01:00
Uriah Pollock
3cbf09ed64 MEDIUM: ssl: add minimal WolfSSL support with OpenSSL compatibility mode
This adds a USE_OPENSSL_WOLFSSL option, wolfSSL must be used with the
OpenSSL compatibility layer. This must be used with USE_OPENSSL=1.

WolfSSL build options:

   ./configure --prefix=/opt/wolfssl --enable-haproxy

HAProxy build options:

  USE_OPENSSL=1 USE_OPENSSL_WOLFSSL=1 WOLFSSL_INC=/opt/wolfssl/include/ WOLFSSL_LIB=/opt/wolfssl/lib/ ADDLIB='-Wl,-rpath=/opt/wolfssl/lib'

Using at least the commit 54466b6 ("Merge pull request #5810 from
Uriah-wolfSSL/haproxy-integration") from WolfSSL. (2022-11-23).

This is still to be improved, reg-tests are not supported yet, and more
tests are to be done.

Signed-off-by: William Lallemand <wlallemand@haproxy.org>
2022-11-24 11:29:03 +01:00
Uriah Pollock
79320cb074 BUILD: quic: use openssl-compat.h instead of openssl/ssl.h
Replace the include of openssl/ssl.h by openssl-compat.h.

Signed-off-by: William Lallemand <wlallemand@haproxy.org>
2022-11-24 11:29:03 +01:00
Willy Tarreau
946d370d22 BUILD: flags: really restrict the cases where flags are exposed
A number of internal flags started to be exposed to external programs
at the location of their definition since commit 77acaf5af ("MINOR:
flags: add a new file to host flag dumping macros"). This allowed the
"flags" utility to decode many more of them and always correctly. The
condition to expose them was to rely on the preliminary definition of
EOF that indicates that stdio is already included. But this was a
wrong approach. It only guarantees that snprintf() can safely be used
but still causes large functions to be built. But stdio is often
included before some of these includes, so these heavy inline functions
actually have to be compiled in many cases. The result is that the
build time significantly increased, especially with fast compilers
like gcc -O0 which took +50% or TCC which took +100%!

This patch addresses the problem by instead relying on an explicit
macro HA_EXPOSE_FLAGS that the calling program must explicitly define
before including these files. flags.c does this and that's all. The
previous build time is now restored with a speed up of 20 to 50%
depending on the build options.
2022-11-24 08:32:27 +01:00
Willy Tarreau
08093cc0fa CLEANUP: tools: do not needlessly include xxhash nor cli from tools.h
These includes brought by commit 9c76637ff ("MINOR: anon: add new macros
and functions to anonymize contents") resulted in an increase of exactly
20% of the number of lines to build. These include are not needed there,
only tools.c needs xxhash.h.
2022-11-24 08:30:48 +01:00
Willy Tarreau
4d46638540 BUILD: compiler: include compiler's definitions before ours
Building with TCC caused a warning on __attribute__() being redefined,
because we do define it on compilers that don't have it, but we didn't
include the compiler's definitions first to leave it a chance to expose
its definitions. The correct way to do this would be to include
sys/cdefs.h but we currently don't include it explicitly and a few
reports on the net mention some platforms where it could be missing
by default. Let's use inttypes.h instead, it always causes it (or its
equivalent) to be included and we know it's present on supported
platforms since we already depend on it.

No backport is needed.
2022-11-24 08:30:48 +01:00
Willy Tarreau
fc50b9dd14 BUG/MAJOR: sched: protect task during removal from wait queue
The issue addressed by commit fbb934da9 ("BUG/MEDIUM: stick-table: fix
a race condition when updating the expiration task") is still present
when thread groups are enabled, but this time it lies in the scheduler.

What happens is that a task configured to run anywhere might already
have been queued into one group's wait queue. When updating a stick
table entry, sometimes the task will have to be dequeued and requeued.

For this a lock is taken on the current thread group's wait queue lock,
but while this is necessary for the queuing, it's not sufficient for
dequeuing since another thread might be in the process of expiring this
task under its own group's lock which is different. This is easy to test
using 3 stick tables with 1ms expiration, 3 track-sc rules and 4 thread
groups. The process crashes almost instantly under heavy traffic.

One approach could consist in storing the group number the task was
queued under in its descriptor (we don't need 32 bits to store the
thread id, it's possible to use one short for the tid and another
one for the tgrp). Sadly, no safe way to do this was figured, because
the race remains at the moment the thread group number is checked, as
it might be in the process of being changed by another thread. It seems
that a working approach could consist in always having it associated
with one group, and only allowing to change it under this group's lock,
so that any code trying to change it would have to iterately read it
and lock its group until the value matches, confirming it really holds
the correct lock. But this seems a bit complicated, particularly with
wait_expired_tasks() which already uses upgradable locks to switch from
read state to a write state.

Given that the shared tasks are not that common (stick-table expirations,
rate-limited listeners, maybe resolvers), it doesn't seem worth the extra
complexity for now. This patch takes a simpler and safer approach
consisting in switching back to a single wq_lock, but still keeping
separate wait queues. Given that shared wait queues are almost always
empty and that otherwise they're scanned under a read lock, the
contention remains manageable and most of the time the lock doesn't
even need to be taken since such tasks are not present in a group's
queue. In essence, this patch reverts half of the aforementionned
patch. This was tested and confirmed to work fine, without observing
any performance degradation under any workload. The performance with
8 groups on an EPYC 74F3 and 3 tables remains twice the one of a
single group, with the contention remaining on the table's lock first.

No backport is needed.
2022-11-22 09:10:08 +01:00
Willy Tarreau
c21a187ec0 MINOR: server/idle: make the next_takeover index per-tgroup
In order to evenly pick idle connections from other threads, there is
a "next_takeover" index in the server, that is incremented each time
a connection is picked from another thread, and indicates which one to
start from next time.

With thread groups this doesn't work well because the index is the same
regardless of the group, and if a group has more threads than another,
there's even a risk to reintroduce an imbalance.

This patch introduces a new per-tgroup storage in servers which, for now,
only contains an instance of this next_takeover index. This way each
thread will now only manipulate the index specific to its own group, and
the takeover will become fair again. More entries may come soon.
2022-11-21 19:21:07 +01:00
Willy Tarreau
9dc231a6b2 BUG/MINOR: server/idle: at least use atomic stores when updating max_used_conns
In 2.2, some idle conns usage metrics were added by commit cf612a045
("MINOR: servers: Add a counter for the number of currently used
connections."), which mentioned that the operation doesn't need to be
atomic since we're not seeking exact values. This is true but at least
we should use atomic stores to make sure not to cause invalid values
to appear on archs that wouldn't guarantee atomicity when writing an
int, such as writing two 16-bit words. This is pretty unlikely on our
targets but better keep the code safe against this.

This may be backported as far as 2.2.
2022-11-21 19:21:07 +01:00
Willy Tarreau
2fba08faec MINOR: cli/pools: add sorting capabilities to "show pools"
The "show pools" command is used a lot for debugging but didn't get much
love over the years. This patch brings new capabilities:
  - sorting the output by pool names to ese their finding ("byname").
  - sorting the output by reverse item size to spot the biggest ones("bysize")
  - sorting the output by reverse number of allocated bytes ("byusage")

The last one (byusage) also omits displaying the ones with zero allocation.

In addition, an optional max number of output entries may be passed so as
to dump only the N most relevant ones.
2022-11-21 10:14:52 +01:00
Ilya Shipitsin
ace3da8dd4 CLEANUP: quic: replace "choosen" with "chosen" all over the code
Some variables were set as "choosen" instead of "chosen", this is dedicated
spelling fix
2022-11-21 09:22:28 +01:00
Frédéric Lécaille
74b5f7b31b BUG/MAJOR: quic: Crash after discarding packet number spaces
This previous patch was not sufficient to prevent haproxy from
crashing when some Handshake packets had to be inspected before being possibly
retransmitted:

     "BUG/MAJOR: quic: Crash upon retransmission of dgrams with several packets"

This patch introduced another issue: access to packets which have been
released because still attached to others (in the same datagram). This was
the case for instance when discarding the Initial packet number space before
inspecting an Handshake packet in the same datagram through its ->prev or
member in our case.

This patch implements quic_tx_packet_dgram_detach() which detaches a packet
from the adjacent ones in the same datagram to be called when ackwowledging
a packet (as done in the previous commit) and when releasing its memory. This
was, we are sure the released packets will not be accessed during retransmissions.

Thank you to @gabrieltz for having reported this issue in GH #1903.

Must be backported to 2.6.
2022-11-20 18:35:46 +01:00
Frdric Lcaille
814645f42f BUG/MAJOR: quic: Crash upon retransmission of dgrams with several packets
As revealed by some traces provided by @gabrieltz in GH #1903 issue,
there are clients (chrome I guess) which acknowledge only one packet among others
in the same datagram. This is the case for the first datagram sent by a QUIC haproxy
listener made an Initial packet followed by an Handshake one. In this identified
case, this is the Handshake packet only which is acknowledged. But if the
client is able to respond with an Handshake packet (ACK frame) this is because
it has successfully parsed the Initial packet. So, why not also acknowledging it?
AFAIK, this is mandatory. On our side, when restransmitting this datagram, the
Handshake packet was accessed from the Initial packet after having being released.

Anyway. There is an issue on our side. Obviously, we must not expect an
implementation to respect the RFC especially when it want to build an attack ;)

With this simple patch for each TX packet we send, we also set the previous one
in addition to the next one. When a packet is acknowledged, we detach the next one
and the next one in the same datagram from this packet, so that it cannot be
resent when resending these packets (the previous one, in our case).

Thank you to @gabrieltz for having reported this issue.

Must be backported to 2.6.
2022-11-19 04:56:55 +01:00
Christopher Faulet
037e3f8735 MINOR: cfgparse: Always check the section position
In diag mode, the section position is checked and a warning is emitted if a
global section is defined after any non-global one. Now, this check is
always performed. But the warning is still only emitted in diag mode. In
addition, the result of this check is now stored in a global variable, to be
used from anywhere.

The aim of this patch is to be able to restrict usage of some global
directives to the very first global sections. It will be useful to avoid
undefined behaviors. Indeed, some config parts may depend on global settings
and it is a problem if these settings are changed after.
2022-11-18 16:03:45 +01:00
Christopher Faulet
62138aab3e MINOR: mux-h1: Rely on a H1S flag to know a WS key was found or not
h1_process_mux() is written to allow partial headers formatting. For now,
all headers are forwarded in one time. But it is still good to keep this
ability at the H1 mux level. So we must rely on a H1S flag instead of a
local variable to know a WebSocket key was found in headers to be able to
generate a key if necessary.

There is no reason to backport this patch.
2022-11-17 14:33:15 +01:00
Christopher Faulet
ab79b321d6 MEDIUM: mux-fcgi: Introduce flags to deal with connection read/write errors
Similarly to the H1 and H2 multiplexers, FCFI_CF_ERR_PENDING is now used to
report an error when we try to send data and FCGI_CF_ERROR to report an
error when we try to read data. In other funcions, we rely on these flags
instead of connection ones. Only FCGI_CF_ERROR is considered as a final
error.  FCGI_CF_ERR_PENDING does not block receive attempt.

In addition, FCGI_CF_EOS flag was added. we rely on it to test if a read0
was received or not.
2022-11-17 14:33:15 +01:00
Christopher Faulet
68ee7845cf CLEANUP: mux-h2: Remove unused fields in h2c structures
Some fields in h2c structures are not used: .mfl, .mft and .mff. Just remove
them.

.msi field is also removed. It is tested but never set, except when a H2
connection is initialized. It also means h2c_mux_busy() function is useless
because it always returns 0 (.msi is always -1). And thus, by transitivity,
H2_CF_DEM_MBUSY is also useless because it is never set. So .msi field,
h2c_mux_busy() function and H2C_MUX_BUSY flag are removed.
2022-11-17 14:33:15 +01:00
Christopher Faulet
ff7925dce0 MEDIUM: mux-h2: Introduce flags to deal with connection read/write errors
Similarly to the H1 multiplexer, H2_CF_ERR_PENDING is now used to report an
error when we try to send data and H2_CF_ERROR to report an error when we
try to read data. In other funcions, we rely on these flags instead of
connection ones. Only H2_CF_ERROR is considered as a final error.
H2_CF_ERR_PENDING does not block receive attempt.

In addition, we rely on H2_CF_RCVD_SHUT flag to test if a read0 was received
or not.
2022-11-17 14:33:15 +01:00
Christopher Faulet
31da34d1e7 MEDIUM: mux-h1: Don't report a final error whe a message is aborted
When the H1 connection is aborted, we no longer set a final error. To do so,
the flag H1C_F_ABORTED was added. For now, it is only set when a error is
detected on the H1 stream. Idea is to use ERR_PENDING/ERROR for upgoing
errors and ABRT_PENDING/ABRTED for downgoing errors.
2022-11-17 14:33:15 +01:00
Christopher Faulet
b3de5e5084 CLEANUP: mux-h1: Reorder H1 connection flags to avoid holes 2022-11-17 14:33:15 +01:00
Christopher Faulet
fc473a6453 MEDIUM: mux-h1: Rely on the H1C to deal with shutdown for reads
read0 is now handled with a H1 connection flag (H1C_F_EOS). Corresponding
flag was removed on the H1 stream and we fully rely on the SE descriptor at
the stream level.

Concretly, it means we rely on the H1 connection flags instead of the
connection one. H1C_F_EOS is only set in h1_recv() or h1_rcv_pipe() after a
read if a read0 was detected.
2022-11-17 14:33:15 +01:00
Christopher Faulet
bef8900cd6 MINOR: mux-h1: Add flag on H1 stream to deal with internal errors
A new error is added on H1 stream to deal with internal errors. For now,
this error is only reported when we fail to create a stream-connector. This
way, the error is reported at the H1 stream level and not the H1 connection
level.
2022-11-17 14:33:14 +01:00
Christopher Faulet
56a499475f CLEANUP: mux-h1: Rename H1C_F_ERR_PENDING into H1C_F_ABRT_PENDING
H1C_F_ERR_PENDING flags will be used to refactor error handling at the H1
connection level. It will be used to notify error during sends. Thus, the
flag to notify an error must be sent before closing the connection is now
named H1C_F_ABRT_PENDING.

This introduce a naming convertion: ERROR must be used to notify upper layer
of an event at the lower ones while ABORT must be used in the opposite
direction.
2022-11-17 14:33:14 +01:00
Christopher Faulet
4e72b172d7 MEDIUM: mux-h1: Handle H1C states via its state field instead of H1C_F_ST_*
The H1 connection state is now handled in a dedicated state. H1C_F_ST_*
flags are removed. All states are now exclusives. It is easier to know the
H1 connection states. It is alive, or usable, if it is not CLOSING or
CLOSED. It is CLOSING if it should be closed ASAP but a stream is still
attached and/or the output buffer is not empty. CLOSED is used when the H1
connection is ready to be closed. Other states are quite easy to understand.

There is no special changes in the H1 connection behavior. Except in
h1_send(). When a CLOSING connection is CLOSED, the function now reports an
activity. In addition, when an embryonic H1 stream is aborted, it is
destroyed. This way, the H1 connection can be switched to CLOSED state.
2022-11-17 14:33:14 +01:00
Christopher Faulet
ef93be2a7b MINOR: mux-h1: Add a dedicated enum to deal with H1 connection state
The H1 connection state will be handled is a dedicated field. To do so,
h1_cs enum was added. The different states are more or less equivalent to
H1C_F_ST_* flags:

 * H1_CS_IDLE      <=> H1C_F_ST_IDLE
 * H1_CS_EMBRYONIC <=> H1C_F_ST_EMBRYONIC
 * H1_CS_UPGRADING <=> H1C_F_ST_ATTACHED && !H1C_F_ST_READY
 * H1_CS_RUNNING   <=> H1C_F_ST_ATTACHED && H1C_F_ST_READY
 * H1_CS_CLOSING   <=> H1C_F_ST_SHUTDOWN && (H1C_F_ST_ATTACHED || b_data(&h1c->ibuf))
 * H1_CS_CLOSED    <=> H1C_F_ST_SHUTDOWN && !H1C_F_ST_ATTACHED && !b_data(&h1c->ibuf)

In addition, in this patch, the h1_is_alive() and h1_close() function are
added. The first one will be used to know if a H1 connection is alive or
not. The second one will be used to set the connection in CLOSING or CLOSED
state, depending on the output buffer state and if there is still a H1
stream or not.

For now, the H1 connection state is not used.
2022-11-17 14:33:14 +01:00
Christopher Faulet
71abc0cfd5 CLEANUP: mux-h1: Rename H1C_F_ST_ERROR and H1C_F_ST_SILENT_SHUT flags
_ST_ part is removed from these 2 flags because they don't reflect a
state. In addition, the H1 connection state will be handled in a dedicated
enum.
2022-11-17 14:33:14 +01:00
Christopher Faulet
7fcbcc0e4c CLEANUP: mux-h1; Rename H1S_F_ERROR flag into H1S_F_ERROR_MASK
In fact, H1S_F_ERROR is not a flag but a mask. So rename it to make it
clear.
2022-11-17 14:33:14 +01:00
Willy Tarreau
2fd6dbfb0d BUILD: makefile: move the compiler option detection stuff to compiler.mk
There's quite a large barely readable functions block in the makefile
dedicated to compiler option support. It provides no value here and
makes it harder to find user-configurable stuff, so let's move it to
include/make/compiler.mk to keep the makefile a bit cleaner. It's better
to keep the options themselves in the makefile however.
2022-11-17 10:56:35 +01:00
Willy Tarreau
8b5a998c9c BUILD: makefile: use $(cmd_MAKE) in quiet mode
It's better to see "make" entering a subdir than seeing nothing, so
let's use a command name for make. Since make 3.81, "+" needs to be
prepended in front of the command to pass the job server to the subdir.
2022-11-17 10:56:35 +01:00
Willy Tarreau
8dd672523f BUILD: makefile: move default verbosity settings to include/make/verbose.mk
The $(Q), $(V), $(cmd_xx) handling needs to be reused in sub-project
makefiles and it's a pain to maintain inside the main makefile. Let's
just move that into a new subdir include/make/ with a dedicated file
"verbose.mk". It slightly cleans up the makefile in addition.
2022-11-17 10:56:35 +01:00
Willy Tarreau
a58af5b0a1 MINOR: dynbuf: switch allocation and release to macros to better track users
When building with DEBUG_MEM_STATS, we only see b_alloc() and b_free() as
users of the "buffer" pool, because all call places rely on these more
convenient functions. It's annoying because it makes it very hard to see
which parts of the code are consuming buffers.

By switching the b_alloc() and b_free() inline functions to macros, we
can now finally track the users of struct buffer, e.g:

  mux_h1.c:513            P_FREE  size:   1275002880  calls:     38910  size/call:  32768 buffer
  mux_h1.c:498           P_ALLOC  size:   1912438784  calls:     58363  size/call:  32768 buffer
  stream.c:763            P_FREE  size:   4121493504  calls:    125778  size/call:  32768 buffer
  stream.c:759            P_FREE  size:   2061697024  calls:     62918  size/call:  32768 buffer
  stream.c:742           P_ALLOC  size:   3341123584  calls:    101963  size/call:  32768 buffer
  stream.c:632            P_FREE  size:   1275068416  calls:     38912  size/call:  32768 buffer
  stream.c:631            P_FREE  size:    637435904  calls:     19453  size/call:  32768 buffer
  channel.h:850          P_ALLOC  size:   4116480000  calls:    125625  size/call:  32768 buffer
  channel.h:850          P_ALLOC  size:       720896  calls:        22  size/call:  32768 buffer
  dynbuf.c:55             P_FREE  size:        65536  calls:         2  size/call:  32768 buffer

Let's do this since it doesn't change anything for the output code
(beyond adding the call places). Interestingly the code even got
slightly smaller now.
2022-11-16 11:44:26 +01:00
Willy Tarreau
f7c475df5c MINOR: pool/debug: create a new pool_alloc_flag() macro
This macro just serves as an intermediary for __pool_alloc() and forwards
the flag. When DEBUG_MEM_STATS is set, it will be used to collect all
pool allocations including those which need to pass an explicit flag.

It's now used by b_alloc() which previously couldn't be tracked by
DEBUG_MEM_STATS, causing some free() calls to have no corresponding
allocations.
2022-11-16 11:44:26 +01:00
Willy Tarreau
91d31c9e1c OPTIM: ebtree: make ebmb_insert_prefix() keep a copy the new node's key
Similarly to the previous patch, it's better to keep a local copy of
the new node's key instead of accessing it every time. This slightly
reduces the code's size in the descent and further improves the load
time to 7.45s.
2022-11-15 09:37:09 +01:00
Willy Tarreau
bf13e53964 OPTIM: ebtree: make ebmb_insert_prefix() keep a copy the new node's pfx
looking at a perf profile while loading a conf with a huge map, it
appeared that there was a hot spot on the access to the new node's
prefix, which is unexpectedly being reloaded for each visited node
during the tree descent. Better keep a copy of it because with large
trees that don't fit into the L3 cache the memory bandwidth is scarce.
Doing so reduces the load time from 8.0 to 7.5 seconds.
2022-11-15 09:37:09 +01:00
Willy Tarreau
e98d385819 MINOR: deinit: add a "quick-exit" option to bypass the deinit step
Once in a while we spot a bug in the deinit code that is complex,
especially when it has to deal with incomplete initializations, and the
ability to bypass this step has regularly been raised. In addition for
fast-reloading setups it could theoretically save some time. Tests have
shown that very large configs can barely save ~100-150ms by skipping the
deinit step. However the ability not to crash if a bug is encountered can
occasionally help.

This patch adds an option to do exactly this. It's obviously not enabled
by default and the documentation discourages from using it, but this might
be useful in the future.
2022-11-15 09:37:09 +01:00
Willy Tarreau
6342714052 CLEANUP: stick-table: remove the unused table->exp_next
The ->exp_next field of the stick-table was probably useful in 1.5 but
it currently only carries a copy of what the future value of the table's
task's expire value will be, while it's systematically copied over there
immediately after being assigned. As such it provides exactly a local
variable. Let's remove it, as it costs atomic operations.
2022-11-14 18:20:38 +01:00
Remi Tricot-Le Breton
e239e4938d BUG/MINOR: ssl: Fix potential overflow
Coverity raised a potential overflow issue in these new functions that
work on unsigned long long objects. They were added in commit 9b25982
"BUG/MEDIUM: ssl: Verify error codes can exceed 63".

This patch needs to be backported alongside 9b25982.
2022-11-14 15:30:54 +01:00
Willy Tarreau
7ed0597ce8 BUILD: sample: use __fallthrough in smp_is_rw() and smp_dup()
This avoids three build warnings when preprocessing happens before compiling
with gcc >= 7.
2022-11-14 11:14:02 +01:00
Willy Tarreau
1f344c0f30 BUILD: compiler: define a __fallthrough statement for switch/case
When the code is preprocessed first and compiled later, such as when
built under distcc, a lot of fallthrough warnings are emitted because
the preprocessor has already stripped the comments.

As an alternative, a "fallthrough" attribute was added with the same
compilers as those which started to emit those warnings. However it's
not portable to older compilers. Let's just define a __fallthrough
statement that corresponds to this attribute on supported compilers
and only switches to the classical empty do {} while (0) on other ones.

This way the code will support being cleaned up using __fallthrough.
2022-11-14 11:14:02 +01:00
Willy Tarreau
2b080f713f BUILD: compiler: add a default definition for __has_attribute()
It happens that gcc since 5.x has this macro which is only mentioned
once in the doc, associated with __builtin_has_attribute(). Clang had
it at least since 3.0. In addition it validates #ifdef when present,
so it's easy to detect it. Here we're providing a fallback to another
macro __has_attribute_<name> so that it's possible to define that macro
to the value 1 for older compilers when the attribute is supported.
2022-11-14 11:14:02 +01:00
Willy Tarreau
08e09f0b3c BUILD: compiler: add a macro to detect if another one is set and equals 1
In order to simplify compiler-specific checks, we'll need to check if some
attributes exist. In order to ease declarations, we'll only focus on those
that exist and will set them to 1. Let's first add a macro aimed at doing
this. Passed a macro name in argument, it will return 1 if the macro is
defined and equals 1, otherwise it will return 0. This is based on the
concatenation of the macro's value with a name to form the name of a macro
which contains one comma, resulting in some other macros arguments being
shifted by one when the macro is defined. As such it's only a matter of
pushing both a 1 and a 0 and picking the correct argument to see the
desired one. It was verified to work since at least gcc-3.4 so it should
be portable enough.
2022-11-14 11:14:02 +01:00
Willy Tarreau
71de04134e IMPORT: slz: define and use a __fallthrough statement for switch/case
When the code is preprocessed first and compiled later, such as when
built under distcc, the "fall through" comments are dropped and warnings
are emitted. Let's use the alternative "fallthrough" attribute instead,
that is supported by versions of gcc and clang that also produce this
warning.

This is libslz upstream commit 0fdf8ae218f3ecb0b7f22afd1a6b35a4f94053e2
2022-11-14 11:14:02 +01:00
Dridi Boukelmoune
4bd53c397c IMPORT: slz: mention the potential header in slz_finish()
There may be 2 or 10 bytes sent respectively for zlib and gzip.

This is libslz upstream commit de1cac155ac730ba0491a6c866a510760c01fa9b
2022-11-14 11:14:02 +01:00
Willy Tarreau
eab4256a9c IMPORT: xxhash: update xxHash to version 0.8.1
This is the latest released version and a minor update on top of the
current one (0.8.0). It addresses a few build issues (some for which
patches were already backported), and particularly the fallthrough
issue by using an attribute instead of a comment.
2022-11-14 11:14:02 +01:00
Willy Tarreau
eedcea8b90 BUILD: debug: remove unnecessary quotes in HA_WEAK() calls
HA_WEAK() is supposed to take a symbol in argument, not a string, since
the asm statements it produces already quote the argument. Having it
quoted twice doesn't work on older compilers and was the only reason
why DEBUG_MEM_STATS didn't work on older compilers.
2022-11-14 11:12:49 +01:00
Amaury Denoyelle
24e9961a8f MINOR: cli: define usermsgs print context
CLI 'add server' handler relies on usermsgs_ctx to display errors in
internal function on CLI output. This may be also extended to other
handlers.

However, to not clutter stderr from another contextes, usermsgs_ctx must
be resetted when it is not needed anymore. This operation cannot be
conducted in the CLI parse handler as display is conducted after it.

To achieve this, define new CLI states CLI_ST_PRINT_UMSG /
CLI_ST_PRINT_UMSGERR. Their principles is nearly identical to states for
dynamic messages printing.
2022-11-10 16:42:47 +01:00
Amaury Denoyelle
56f50a03b7 CLEANUP: cli: rename dynamic error printing state
Rename CLI_ST_PRINT_FREE to CLI_ST_PRINT_DYNERR.

Most notably, this highlights that this is reserved to error printing.

This is done to ensure consistency between CLI_ST_PRINT/CLI_ST_PRINT_DYN
and CLI_ST_PRINT_ERR/CLI_ST_PRINT_DYNERR. The name is also consistent
with the function cli_dynerr() which activates it.
2022-11-10 16:42:47 +01:00
William Lallemand
960fb74cae MEDIUM: ssl: {ca,crt}-ignore-err can now use error constant name
The ca-ignore-err and crt-ignore-err directives are now able to use the
openssl X509_V_ERR constant names instead of the numerical values.

This allow a configuration to survive an OpenSSL upgrade, because the
numerical ID can change between versions. For example
X509_V_ERR_INVALID_CA was 24 in OpenSSL 1 and is 79 in OpenSSL 3.

The list of errors must be updated when a new major OpenSSL version is
released.
2022-11-10 13:28:37 +01:00
Remi Tricot-Le Breton
9b25982716 BUG/MEDIUM: ssl: Verify error codes can exceed 63
The CRT and CA verify error codes were stored in 6 bits each in the
xprt_st field of the ssl_sock_ctx meaning that only error code up to 63
could be stored. Likewise, the ca-ignore-err and crt-ignore-err options
relied on two unsigned long longs that were used as bitfields for all
the ignored error codes. On the latest OpenSSL1.1.1 and with OpenSSLv3
and newer, verify errors have exceeded this value so these two storages
must be increased. The error codes will now be stored on 7 bits each and
the ignore-err bitfields are replaced by a big enough array and
dedicated bit get and set functions.

It can be backported on all stable branches.

[wla: let it be tested a little while before backport]
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
2022-11-10 11:45:48 +01:00
Ilya Shipitsin
4a689dad03 CLEANUP: assorted typo fixes in the code and comments
This is 32nd iteration of typo fixes
2022-10-30 17:17:56 +01:00
Amaury Denoyelle
735b44f5df MINOR: quic: add counter for interrupted reception
Add a new counter "quic_rxbuf_full". It is incremented each time
quic_sock_fd_iocb() is interrupted on full buffer.

This should help to debug github issue #1903. It is suspected that
QUIC receiver buffers are full which in turn cause quic_sock_fd_iocb()
to be called repeatedly resulting in a high CPU consumption.
2022-10-27 18:35:42 +02:00
Amaury Denoyelle
bbb1c68508 BUG/MINOR: quic: fix subscribe operation
Subscribing was not properly designed between quic-conn and quic MUX
layers. Align this as with in other haproxy components : <subs> field is
moved from the MUX to the quic-conn structure. All mention of qcc MUX is
cleaned up in quic_conn_subscribe()/quic_conn_unsubscribe().

Thanks to this change, ACK reception notification has been simplified.
It's now unnecessary to check for the MUX existence before waking it.
Instead, if <subs> quic-conn field is set, just wake-up the upper layer
tasklet without mentionning MUX. This should probably be extended to
other part in quic-conn code.

This should be backported up to 2.6.
2022-10-26 18:18:26 +02:00
Frdric Lcaille
36d1565640 MINOR: peers: Support for peer shards
Add "shards" new keyword for "peers" section to configure the number
of peer shards attached to such secions. This impact all the stick-tables
attached to the section.
Add "shard" new "server" parameter to configure the peers which participate to
all the stick-tables contents distribution. Each peer receive the stick-tables updates
only for keys with this shard value as distribution hash. The "shard" value
is stored in ->shard new server struct member.
cfg_parse_peers() which is the function which is called to parse all
the lines of a "peers" section is modified to parse the "shards" parameter
stored in ->nb_shards new peers struct member.
Add srv_parse_shard() new callback into server.c to pare the "shard"
parameter.
Implement stksess_getkey_hash() to compute the distribution hash for a
stick-table key as the 64-bits xxhash of the key concatenated to the stick-table
name. This function is called by stksess_setkey_shard(), itself
called by the already implemented function which create a new stick-table
key (stksess_new()).
Add ->idlen new stktable struct member to store the stick-table name length
to not have to compute it each time a stick-table key hash is computed.
2022-10-24 10:55:53 +02:00
Amaury Denoyelle
7941ead3aa MINOR: quic: display unknown error sendto counter on stat page
This patch complete the previous incomplete commit. The new counter
sendto_err_unknown is now displayed on stats page/CLI show stats.

This is related to github issue #1903.

This should be backported up to 2.6.
2022-10-24 10:52:59 +02:00
Amaury Denoyelle
1d9f170edd MINOR: quic: do not crash on unhandled sendto error
Remove ABORT_NOW() statement on unhandled sendto error. Instead use a
dedicated counter sendto_err_unknown to report these cases.

If we detect increment of this counter, strace can be used to detect
errno value :
  $ strace -p $(pidof haproxy) -f -e trace=sendto -Z

This should be backported up to 2.6.

This should help to debug github issue #1903.
2022-10-24 10:18:44 +02:00
Amaury Denoyelle
176174f7e4 BUG/MINOR: mux-quic: complete flow-control for uni streams
Max stream data was not enforced and respect for local/remote uni
streams. Previously, qcs instances incorrectly reused the limit defined
from bidirectional ones.

This is now fixed. Two fields are added in qcc structure connection :
* value for local flow control to enforce on remote uni streams
* value for remote flow control to respect on local uni streams

These two values can be reused to properly initialized msd field of a
qcs instance in qcs_new(). The rest of the code is similar.

This must be backported up to 2.6.
2022-10-21 17:31:18 +02:00
Aurelien DARRAGON
e951c3435c MINOR: list: adding MT_LIST_APPEND_LOCKED macro
adding a new mt macro: MT_LIST_APPEND_LOCKED.

This macro may be used to append an item to an existing
list, like MT_LIST_APPEND.

But here the item will be forced into locked/busy state
prior to appending, so that it is already referenced
in the list while still preventing concurrent accesses
until we decide to unlock it.

The macro returns a struct mt_list "np", that is needed
at unlock time using regular MT_LIST_UNLOCK_ELT macro.
2022-10-21 16:26:27 +02:00
Aurelien DARRAGON
18c284c126 DOC/MINOR: list: fixing MT_LIST_LOCK_ELT macro documentation
MT_LIST_LOCK_ELT macro was documented with an ambiguous
usage restriction, implying that concurrent list deletion
was not supported.

But it seems that either the code has evolved, or the comment is
wrong because the locking behavior implemented here is exactly
the same one used in MT_LIST_DELETE, and no such restriction is
described for MT_LIST_DELETE.

I made some tests to make sure concurrent MT_LIST_DELETE (or deletion
from mt_list_for_each_entry_safe) don't cause unexepected results.

At the present time, this macro is not used, this fix only
targets upcoming developments that might rely on this.

No backport needed.
2022-10-21 16:26:27 +02:00
Aurelien DARRAGON
bcaa401646 MINOR: list: fixing typo in MT_LIST_LOCK_ELT
A minor typo was made in MT_LIST_LOCK_ELT, preventing
haproxy from compiling if MT_LIST_LOCK_ELT is
used in the code.

Today, the macro is unused, and that's the reason why
the typo has remained unnoticed for such a long time.

Fixing it so it can be used in upcoming developments.

No backport required.
2022-10-21 16:26:27 +02:00
William Lallemand
bb581423b3 BUG/MEDIUM: httpclient/lua: crash when the lua task timeout before the httpclient
When the lua task finished  before the httpclient that are associated to
it, there is a risk that the httpclient try to task_wakeup() the lua
task which does not exist anymore.

To fix this issue the httpclient used in a lua task are stored in a
list, and the httpclient are destroyed at the end of the lua task.

Must be backported in 2.5 and 2.6.
2022-10-20 18:47:15 +02:00
Amaury Denoyelle
deb7c87f55 MINOR: quic: define first packet flag
Received packets treatment has some difference regarding if this is the
first one or not of the encapsulating datagram. Previously, this was set
via a function argument. Simplify this by defining a new Rx packet flag
named QUIC_FL_RX_PACKET_DGRAM_FIRST.

This change does not have functional impact. It will simplify API when
qc_lstnr_pkt_rcv() is broken into several functions : their number of
arguments will be reduced thanks to this patch.

This should be backported up to 2.6.
2022-10-19 18:12:56 +02:00
Amaury Denoyelle
845169da58 MINOR: quic: extend pn_offset field from quic_rx_packet
pn_offset field was only set if header protection cannot be removed.
Extend the usage of this field : it is now set everytime on packet
parsing in qc_lstnr_pkt_rcv().

This change helps to clean up API of Rx functions by removing
unnecessary variables and function argument.

This change has no functional impact. It is a part of a refactoring
series on qc_lstnr_pkt_rcv(). The objective is facilitate integration of
FD-owned socket patches.

This should be backported up to 2.6.
2022-10-19 18:12:56 +02:00
Amaury Denoyelle
0eae57273b MINOR: quic: add version field on quic_rx_packet
Add a new field version on quic_rx_packet structure. This is set on
header parsing in qc_lstnr_pkt_rcv() function.

This change has no functional impact. It is a part of a refactoring
series on qc_lstnr_pkt_rcv(). The objective is facilitate integration of
FD-owned socket patches.

This should be backported up to 2.6.
2022-10-19 18:12:56 +02:00
Willy Tarreau
f5a0c8abf5 MEDIUM: quic: respect the threads assigned to a bind line
Right now the QUIC thread mapping derives the thread ID from the CID
by dividing by global.nbthread. This is a problem because this makes
QUIC work on all threads and ignores the "thread" directive on the
bind lines. In addition, only 8 bits are used, which is no more
compatible with the up to 4096 threads we may have in a configuration.

Let's modify it this way:
  - the CID now dedicates 12 bits to the thread ID
  - on output we continue to place the TID directly there.
  - on input, the value is extracted. If it corresponds to a valid
    thread number of the bind_conf, it's used as-is.
  - otherwise it's used as a rank within the current bind_conf's
    thread mask so that in the end we still get a valid thread ID
    for this bind_conf.

The extraction function now requires a bind_conf in order to get the
group and thread mask. It was better to use bind_confs now as the goal
is to make them support multiple listeners sooner or later.
2022-10-13 18:08:05 +02:00
William Lallemand
eba6a54cd4 MINOR: logs: startup-logs can use a shm for logging the reload
When compiled with USE_SHM_OPEN=1 the startup-logs are now able to use
an shm which is used to keep the logs when switching to mworker wait
mode. This allows to keep the failed reload logs.

When allocating the startup-logs at first start of the process, haproxy
will do a shm_open with a unique path using the PID of the process, the
file is unlink immediatly so we don't let unwelcomed files be. The fd
resulting from this shm is stored in the HAPROXY_STARTUPLOGS_FD
environment variable so it can be mmap again when switching to wait
mode.

When forking children, the process is copying the mmap to a a mallocated
ring so we never share the same memory section between the master and
the workers. When switching to wait mode, the shm is not used anymore as
it is also copied to a mallocated structure.

This allow to use the "show startup-logs" command over the master CLI,
to get the logs of the latest startup or reload. This way the logs of
the latest failed reload are also kept.

This is only activated on the linux-glibc target for now.
2022-10-13 16:50:22 +02:00
William Lallemand
35df34223b MINOR: buffers: split b_force_xfer() into b_cpy() and b_force_xfer()
Split the b_force_xfer() into b_ncat() and b_force_xfer().

The previous b_force_xfer() implementation was basically a copy with a
b_del on the src buffer. Keep this implementation to make b_ncat(), and
just call b_ncat() + b_del() into b_force_xfer().
2022-10-13 16:45:28 +02:00
William Lallemand
9e4ead3095 MINOR: ring: ring_cast_from_area() cast from an allocated area
Cast an unified ring + storage area to a ring from area, without
reinitializing the data buffer. Reinitialize the waiters and the lock.

It helps retrieving a previously allocated ring, from an mmap for
example.
2022-10-13 16:45:28 +02:00
Amaury Denoyelle
1cba8d60f3 CLEANUP: quic: improve naming for rxbuf/datagrams handling
QUIC datagrams are read from a random thread. They are then redispatch
to the connection thread according to the first packet DCID. These
operations are implemented through a special buffer designed to avoid
locking.

Refactor this code with the following changes :
* <rxbuf> type is renamed <quic_receiver_buf>. Its list element is also
  renamed to highligh its attach point to a receiver.
* <quic_dgram> and <quic_receiver_buf> definition are moved to
  quic_sock-t.h. This helps to reduce the size of quic_conn-t.h.
* <quic_dgram> list elements are renamed to highlight their attach point
  into a <quic_receiver_buf> and a <quic_dghdlr>.

This should be backported up to 2.6.
2022-10-13 11:06:48 +02:00
Amaury Denoyelle
8c4d062d25 CLEANUP: quic: remove unused rxbufs member in receiver
rxbuf is the structure used to store QUIC datagrams and redispatch them
to the connection thread.

Each receiver manages a list of rxbuf. This was stored both as an array
and a mt_list. Currently, only mt_list is needed so removed <rxbufs>
member from receiver structure.

This should be backported up to 2.6.
2022-10-13 11:05:41 +02:00
Frédéric Lécaille
e1a49cfd4d MINOR: quic: Split the secrets key allocation in two parts
Implement quic_tls_secrets_keys_alloc()/quic_tls_secrets_keys_free() to allocate
the memory for only one direction (RX or TX).
Modify ha_quic_set_encryption_secrets() to call these functions for one of this
direction (or both). So, for now on we can rely on the value of the secret keys
to know if it was derived.
Remove QUIC_FL_TLS_SECRETS_SET flag which is no more useful.
Consequently, the secrets are dumped by the traces only if derived.

Must be backported to 2.6.
2022-10-13 10:12:03 +02:00
Frédéric Lécaille
4aa7d8197a BUG/MINOR: quic: Stalled 0RTT connections with big ClientHello TLS message
This issue was reproduced with -Q picoquic client option to split a big ClientHello
message into two Initial packets and haproxy as server without any knowledged of
any previous ORTT session (restarted after a firt 0RTT session). The ORTT received
packets were removed from their queue when the second Initial packet was parsed,
and the QUIC handshake state never progressed and remained at Initial state.

To avoid such situations, after having treated some Initial packets we always
check if there are ORTT packets to parse and we never remove them from their
queue. This will be done after the hanshake is completed or upon idle timeout
expiration.

Also add more traces to be able to analize the handshake progression.

Tested with ngtcp2 and picoquic

Must be backported to 2.6.
2022-10-13 10:12:03 +02:00
Frédéric Lécaille
9f9263ed13 MINOR: quic: Use a non-contiguous buffer for RX CRYPTO data
Implement quic_get_ncbuf() to dynamically allocate a new ncbuf to be attached to
any quic_cstream struct which needs such a buffer. Note that there is no quic_cstream
for 0RTT encryption level. quic_free_ncbuf() is added to release the memory
allocated for a non-contiguous buffer.

Modify qc_handle_crypto_frm() to call this function and allocate an ncbuf for
crypto data which are not received in order. The crypto data which are received in
order are not buffered but provide to the TLS stack (calling qc_provide_cdata()).

Modify qc_treat_rx_crypto_frms() which is called after having provided the
in order received crypto data to the TLS stack to provide again the remaining
crypto data which has been buffered, if possible (if they are in order). Each time
buffered CRYPTO data were consumed, we try to release the memory allocated for
the non-contiguous buffer (ncbuf).
Also move rx.crypto.offset quic_enc_level struct member to rx.offset quic_cstream
struct member.

Must be backported to 2.6.
2022-10-13 10:12:03 +02:00
Frédéric Lécaille
7e3f7c47e9 MINOR: quic: New quic_cstream object implementation
Add new quic_cstream struct definition to implement the CRYPTO data stream.
This is a simplication of the qcs object (QUIC streams) for the CRYPTO data
without any information about the flow control. They are not attached to any
tree, but to a QUIC encryption level, one by encryption level except for
the early data encryption level (for 0RTT). A stream descriptor is also allocated
for each CRYPTO data stream.

Must be backported to 2.6
2022-10-13 10:12:03 +02:00
Willy Tarreau
d114f4a68f MEDIUM: checks: spread the checks load over random threads
The CPU usage pattern was found to be high (5%) on a machine with
48 threads and only 100 servers checked every second That was
supposed to be only 100 connections per second, which should be very
cheap. It was figured that due to the check tasks unbinding from any
thread when going back to sleep, they're queued into the shared queue.

Not only this requires to manipulate the global queue lock, but in
addition it means that all threads have to check the global queue
before going to sleep (hence take a lock again) to figure how long
to sleep, and that they would all sleep only for the shortest amount
of time to the next check, one would pick it and all other ones would
go down to sleep waiting for the next check.

That's perfectly visible in time-to-first-byte measurements. A quick
test consisting in retrieving the stats page in CSV over a 48-thread
process checking 200 servers every 2 seconds shows the following tail:

  percentile   ttfb(ms)
  99.98        2.43
  99.985       5.72
  99.99       32.96
  99.995     82.176
  99.996     82.944
  99.9965    83.328
  99.997      83.84
  99.9975    84.288
  99.998      85.12
  99.9985    86.592
  99.999         88
  99.9995    89.728
  99.9999   100.352

One solution could consist in forcefully binding checks to threads at
boot time, but that's annoying, will cause trouble for dynamic servers
and may cause some skew in the load depending on some server patterns.

Instead here we take a different approach. A check remains bound to its
thread for as long as possible, but upon every wakeup, the thread's load
is compared with another random thread's load. If it's found that that
other thread's load is less than half of the current one's, the task is
bounced to that thread. In order to prevent that new thread from doing
the same, we set a flag "CHK_ST_SLEEPING" that indicates that it just
woke up and we're bouncing the task only on this condition.

Tests have shown that the initial load was very unfair before, with a few
checks threads having a load of 15-20 and the vast majority having zero.
With this modification, after two "inter" delays, the load is either zero
or one everywhere when checks start. The same test shows a CPU usage that
significantly drops, between 0.5 and 1%. The same latency tail measurement
is much better, roughly 10 times smaller:

  percentile   ttfb(ms)
  99.98        1.647
  99.985       1.773
  99.99        4.912
  99.995        8.76
  99.996        8.88
  99.9965      8.944
  99.997       9.016
  99.9975      9.104
  99.998       9.224
  99.9985      9.416
  99.999         9.8
  99.9995      10.04
  99.9999     10.432

In fact one difference here is that many threads work while in the past
they were waking up and going down to sleep after having perturbated the
shared lock. Thus it is anticipated that this will scale way smoother
than before. Under strace it's clearly visible that all threads are
sleeping for the time it takes to relaunch a check, there's no more
thundering herd wakeups.

However it is also possible that in some rare cases such as very short
check intervals smaller than a scheduler's timeslice (such as 4ms),
some users might have benefited from the work being concentrated on
less threads and would instead observe a small increase of apparent
CPU usage due to more total threads waking up even if that's for less
work each and less total work. That's visible with 200 servers at 4ms
where show activity shows that a few threads were overloaded and others
doing nothing. It's not a problem, though as in practice checks are not
supposed to eat much CPU and to wake up fast enough to represent a
significant load anyway, and the main issue they could have been
causing (aside the global lock) is an increase last-percentile latency.
2022-10-12 21:49:30 +02:00
Christopher Faulet
c8db114afc MINOR: flags/mux-fcgi: Decode FCGI connection and stream flags
The new functions fconn_show_flags() and fstrm_show_flags() decode the flags
state into a string, and are used by dev/flags:

$ /dev/flags/flags fconn 0x3100
fconn->flags = FCGI_CF_GET_VALUES | FCGI_CF_KEEP_CONN | FCGI_CF_MPXS_CONNS

./dev/flags/flags fstrm  0x3300
fstrm->flags = FCGI_SF_WANT_SHUTW | FCGI_SF_WANT_SHUTR | FCGI_SF_OUTGOING_DATA | FCGI_SF_BEGIN_SENT
2022-10-12 17:10:41 +02:00
Christopher Faulet
3965aa7494 REORG: mux-fcgi: Extract flags and enums into mux_fcgi-t.h
The same was performed for the H2 and H1 multiplexers. FCGI connection and
stream flags are moved in a dedicated header file. It will be mainly used to
be able to decode mux-fcgi flags from the flags utility.

In this patch, we move the flags and enums to mux_fcgi-t.h, as well as the
two state decoding inline functions.
2022-10-12 17:10:37 +02:00
Willy Tarreau
dbae89e09c MEDIUM: stick-table: always use atomic ops to requeue the table's task
We're generalizing the change performed in previous commit "MEDIUM:
stick-table: requeue the expiration task out of the exclusive lock"
to stktable_requeue_exp() so that it can also be used by callers of
__stktable_store(). At the moment there's still no visible change
since it's still called under the write lock. However, the previous
code in stitable_touch_with_exp() was updated to use this function.
2022-10-12 14:19:05 +02:00
Willy Tarreau
8d3c3336f9 MEDIUM: stick-table: make stksess_kill_if_expired() avoid the exclusive lock
stream_store_counters() calls stksess_kill_if_expired() for each active
counter. And this one takes an exclusive lock on the table before
checking if it has any work to do (hint: it almost never has since it
only wants to delete expired entries). However a lock is still neeed for
now to protect the ref_cnt, but we can do it atomically under the read
lock.

Let's change the mechanism. Now what we do is to check out of the lock
if the entry is expired. If it is, we take the write lock, expire it,
and decrement the refcount. Otherwise we just decrement the refcount
under a read lock. With this change alone, the config based on 3
trackers without the previous patches saw a 2.6x improvement, but here
it doesn't yet change anything because some heavy contention remains
on the lookup part.
2022-10-12 14:19:05 +02:00
Willy Tarreau
9f5cb435b6 MINOR: stick-table: move the write lock inside stktable_touch_with_exp()
Taking the write lock prior to entering that function is a problem
because this function is full of conditions that most of the time can
lead to eliminating the lock.

This commit first moves the write lock inside the function and passes
the extra argument required to implement stktable_touch_remote() and
stktable_touch_local(). It also renames the function to remove the
underscores since there's no other variant and it's exported under
this name (probably an old rename that was not propagated). The code
was stressed under 48 threads using 3 trackers on the same table. It
already shows a tiny 3% improvement from 187k to 193k rps.
2022-10-12 14:19:05 +02:00
Willy Tarreau
76642223f0 MEDIUM: stick-table: switch the table lock to rwlock
Right now a spinlock is used, but most accesses are for reads, so let's
switch the lock to an rwlock and switch all accesses to exclusive locks
for now. There should be no visible difference at this point.
2022-10-12 14:19:05 +02:00
Willy Tarreau
f6a42c3a37 MINOR: freq_ctr: use the thread's local time whenever possible
Right now when dealing with freq_ctr updates, we're using the process-
wide monotinic time, and accessing it is expensive since every thread
needs to update it, so this adds some contention. However we don't need
it all the time, the thread's local time is most of the time strictly
equal to the global time, and may be off by one millisecond when the
global time is switched to the next one by another thread, and in this
case we don't want to use the local time because it would risk to cause
a rotation of the counter. But that's precisely the condition we're
already relying on for the slow path!

What this patch does is to add a check for the period against the
local time prior to anything else, and immediately return after
updating the counter if still within the period, otherwise fall back
to the existing code. Given that the function starts to inflate a bit,
it was split between s very short inline part that does the hot path,
and the slower fallback that's in a cold function. It was measured that
on a 24-CPU machine it was called ~0.003% of the time.

The resulting improvement sits between 2 and 3% at 500k req/s tracking
an http_req_rate counter.
2022-10-12 14:19:05 +02:00
Willy Tarreau
b13044cc1a MINOR: plock: support disabling exponential back-off
The new macro PLOCK_DISABLE_EBO may be defined to disable exponential
backoff. This can be useful to more easily spot functions that cause
contention. In this case the CPU will be spent inside the functions
themselves instead of the pl_wait_unlock_{long,int}() functions, making
them easier to spot using "perf top" even if that causes a significant
degradation of the thread scalability.
2022-10-12 14:19:05 +02:00
Willy Tarreau
cab054bbf9 CLEANUP: quic/receiver: remove the now unused tx_qring list
The tx_qrings[] and tx_qring_list in the receiver are not used
anymore since commit f2476053f ("MINOR: quic: replace custom buf on Tx
by default struct buffer"), the only place where they're referenced
was in quic_alloc_tx_rings_listener(), which by the way implies that
these were not even freed on exit.

Let's just remove them. This should be backported to 2.6 since the
commit above also was.
2022-10-11 08:40:38 +02:00
Amaury Denoyelle
97ecc7a8ea MEDIUM: quic: retrieve frontend destination address
Retrieve the frontend destination address for a QUIC connection. This
address is retrieve from the first received datagram and then stored in
the associated quic-conn.

This feature relies on IP_PKTINFO or affiliated flags support on the
socket. This flag is set for each QUIC listeners in
sock_inet_bind_receiver(). To retrieve the destination address,
recvfrom() has been replaced by recvmsg() syscall. This operation and
parsing of msghdr structure has been extracted in a wrapper quic_recv().

This change is useful to finalize the implementation of 'dst' sample
fetch. As such, quic_sock_get_dst() has been edited to return local
address from the quic-conn. As a best effort, if local address is not
available due to kernel non-support of IP_PKTINFO, address of the
listener is returned instead.

This should be backported up to 2.6.
2022-10-10 11:48:27 +02:00
Amaury Denoyelle
2ed840015f MINOR: quic: limit usage of ssl_sock_ctx in favor of quic_conn
Continue on the cleanup of QUIC stack and components.

quic_conn uses internally a ssl_sock_ctx to handle mandatory TLS QUIC
integration. However, this is merely as a convenience, and it is not
equivalent to stackable ssl xprt layer in the context of HTTP1 or 2.

To better emphasize this, ssl_sock_ctx usage in quic_conn has been
removed wherever it is not necessary : namely in functions not related
to TLS. quic_conn struct now contains its own wait_event for tasklet
quic_conn_io_cb().

This should be backported up to 2.6.
2022-10-05 11:08:32 +02:00
Willy Tarreau
922a907926 MINOR: fd: add a new function to only raise RLIMIT_NOFILE
In issue #1866 an issue was reported under docker, by which a user cannot
lower the number of FD needed. It looks like a restriction imposed in this
environment, but it results in an error while it ought not have to in the
case of shrinking.

This patch adds a new function raise_rlim_nofile() that takes the desired
new setting, compares it to the current one, and only calls setrlimit() if
one of the values in the new setting is larger than the older one. As such
it will continue to emit warnings and errors in case of failure to raise
the limit but will never shrink it.

This patch is only preliminary to another one, but will have to be
backported where relevant (likely only 2.6).
2022-10-04 08:38:47 +02:00
Amaury Denoyelle
92fa63f735 CLEANUP: quic: create a dedicated quic_conn module
xprt_quic module was too large and did not reflect the true architecture
by contrast to the other protocols in haproxy.

Extract code related to XPRT layer and keep it under xprt_quic module.
This code should only contains a simple API to communicate between QUIC
lower layer and connection/MUX.

The vast majority of the code has been moved into a new module named
quic_conn. This module is responsible to the implementation of QUIC
lower layer. Conceptually, it overlaps with TCP kernel implementation
when comparing QUIC and HTTP1/2 stacks of haproxy.

This should be backported up to 2.6.
2022-10-03 16:25:17 +02:00
Amaury Denoyelle
a2639383ec CLEANUP: quic: remove duplicated varint code from xprt_quic.h
There was some identical code between xprt_quic and quic_enc modules.
This concerns helper on QUIC varint type. Keep only the version in
quic_enc file : this should help to reduce dependency on xprt_quic
module.

Note that quic_max_int_by_size() has been removed and is replaced by the
identical quic_max_int().

This should be backported up to 2.6.
2022-10-03 16:25:17 +02:00
Amaury Denoyelle
5c25dc5bfd CLEANUP: quic: fix headers
Clean up quic sources by adjusting headers list included depending
on the actual dependency of each source file.

On some occasion, xprt_quic.h was removed from included list. This is
useful to help reducing the dependency on this single file and cleaning
up QUIC haproxy architecture.

This should be backported up to 2.6.
2022-10-03 16:25:17 +02:00
Amaury Denoyelle
f3c40f83fb BUG/MINOR: quic: adjust quic_tls prototypes
Two prototypes in quic_tls module were not identical to the actual
function definition.

* quic_tls_decrypt2() : the second argument const attribute is not
  present, to be able to use it with EVP_CIPHER_CTX_ctlr(). As a
  consequence of this change, token field of quic_rx_packet is now
  declared as non-const.

* quic_tls_generate_retry_integrity_tag() : the second argument type
  differ between the two. Adjust this by fixing it to as unsigned char
  to match EVP_EncryptUpdate() SSL function.

This situation did not seem to have any visible effect. However, this is
clearly an undefined behavior and should be treated as a bug.

This should be backported up to 2.6.
2022-10-03 16:25:17 +02:00
Amaury Denoyelle
a19bb6f0b2 CLEANUP: quic: remove global var definition in quic_tls header
Some variables related to QUIC TLS were defined in a header file : their
definitions are now moved properly in the implementation file, with only
declarations in the header.

This should be backported up to 2.6.
2022-10-03 16:25:17 +02:00
Willy Tarreau
406efb96d1 BUG/MINOR: backend: only enforce turn-around state when not redispatching
In github issue #1878, Bart Butler reported observing turn-around states
(1 second pause) after connection retries going to different servers,
while this ought not happen.

In fact it does happen because back_handle_st_cer() enforces the TAR
state for any algo that's not round-robin. This means that even leastconn
has it, as well as hashes after the number of servers changed.

Prior to doing that, the call to stream_choose_redispatch() has already
had a chance to perform the correct choice and to check the algo and
the number of retries left. So instead we should just let that function
deal with the algo when needed (and focus on deterministic ones), and
let the former just obey. Bart confirmed that the fixed version works
as expected (no more delays during retries).

This may be backported to older releases, though it doesn't seem very
important. At least Bart would like to have it in 2.4 so let's go there
for now after it has cooked a few weeks in 2.6.
2022-10-03 15:04:55 +02:00
Willy Tarreau
8522348482 BUG/MAJOR: conn-idle: fix hash indexing issues on idle conns
Idle connections do not work on 32-bit machines due to an alignment issue
causing the connection nodes to be indexed with their lower 32-bits set to
zero and the higher 32 ones containing the 32 lower bitss of the hash. The
cause is the use of ebmb_node with an aligned data, as on this platform
ebmb_node is only 32-bit aligned, leaving a hole before the following hash
which is a uint64_t:

  $ pahole -C conn_hash_node ./haproxy
  struct conn_hash_node {
        struct ebmb_node           node;                 /*     0    20 */

        /* XXX 4 bytes hole, try to pack */

        int64_t                    hash;                 /*    24     8 */
        struct connection *        conn;                 /*    32     4 */

        /* size: 40, cachelines: 1, members: 3 */
        /* sum members: 32, holes: 1, sum holes: 4 */
        /* padding: 4 */
        /* last cacheline: 40 bytes */
  };

Instead, eb64 nodes should be used when it comes to simply storing a
64-bit key, and that is what this patch does.

For backports, a variant consisting in simply marking the "hash" member
with a "packed" attribute on the struct also does the job (tested), and
might be preferable if the fix is difficult to adapt. Only 2.6 and 2.5
are affected by this.
2022-10-03 12:06:36 +02:00
Erwan Le Goas
d78693178c MINOR: cli: correct commentary and replace 'set global-key' name
Correct a commentary in in include/haproxy/global-t.h and include/haproxy/tools.h
Replace the CLI command 'set global-key <key>' by 'set anon global-key <key>' in
order to find it easily when you don't remember it, the recommandation can guide
you when you just tap 'set anon'.

No backport needed, except if anonymization mechanism is backported.
2022-09-29 10:53:15 +02:00
Erwan Le Goas
f30c5d7666 MINOR: config: Add option line when the configuration file is dumped
Add an option to dump the number lines of the configuration file when
it's dumped. Other options can be easily added. Options are separated
by ',' when tapping the command line:
'./haproxy -dC[key],line -f [file]'

No backport needed, except if anonymization mechanism is backported.
2022-09-29 10:53:15 +02:00
Erwan Le Goas
5eef1588a1 MINOR: tools: modify hash_ipanon in order to use it in cli
Add a parameter hasport to return a simple hash or ipstring when
ipstring has no port. Doesn't hash if scramble is null. Add
option PA_O_PORT_RESOLVE to str2sa_range. Add a case UNIX.
Those modification permit to use hash_ipanon in cli section
in order to dump the same anonymization of address in the
configuration file and with CLI.

No backport needed, except if anonymization mechanism is backported.
2022-09-29 10:53:14 +02:00