Commit Graph

6593 Commits

William Lallemand
46bea1c616 BUILD: peers: peers-t.h depends on stick-table-t.h
peers-t.h uses "struct stktable" as well as STKTABLE_DATA_TYPES, which
are defined in stick-table-t.h. It works by accident because
stick-table-t.h was always included before it, but this could provoke
build issues with EXTRA code.
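
As an illustration of the kind of fix involved (the guard name and include
path below are assumptions, not the actual file contents), the header simply
pulls in the types it uses:

    /* include/haproxy/peers-t.h (sketch) */
    #ifndef _HAPROXY_PEERS_T_H
    #define _HAPROXY_PEERS_T_H

    #include <haproxy/stick-table-t.h>  /* struct stktable, STKTABLE_DATA_TYPES */

    /* ... peers type definitions relying on struct stktable ... */

    #endif /* _HAPROXY_PEERS_T_H */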

To be backported as far as 2.2.
2022-12-16 15:51:44 +01:00
Aurelien DARRAGON
5594184190 MINOR: stats: introduce stats field ctx
Add a new value in stats ctx: field.
Implement field support in line dumping parent functions
stats_print_proxy_field_json() and stats_dump_proxy_to_buffer().

This will allow child dumping functions to support partial line dumping
when needed, i.e. when the dumping buffer is exhausted: do a partial send
and wait for a new buffer to finish the dump. Thanks to the field ctx, the
function can resume dumping where it left off on the previous
(unterminated) invocation.
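
As an illustration of how such a saved field index is meant to be used, here
is a small self-contained sketch (made-up names, not the actual stats code):

    #include <stdio.h>
    #include <string.h>

    /* Illustrative only: resume a partial line dump from a saved field
     * index. <ctx_field> plays the role of the new "field" value in ctx. */
    static int dump_line(char *out, size_t outsz, const char *const *fields,
                         int nb_fields, int *ctx_field)
    {
        size_t used = strlen(out);
        int i;

        for (i = *ctx_field; i < nb_fields; i++) {
            size_t need = strlen(fields[i]) + 2;   /* field + ',' + '\0' */
            if (used + need > outsz) {
                *ctx_field = i;  /* buffer exhausted: partial send, resume here */
                return 0;
            }
            used += snprintf(out + used, outsz - used, "%s,", fields[i]);
        }
        *ctx_field = 0;          /* line fully dumped, reset for the next one */
        return 1;
    }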
2022-12-15 16:53:49 +01:00
Amaury Denoyelle
15f3cc4b38 MINOR: http: extract content-length parsing from H2
Extract function h2_parse_cont_len_header() in the generic HTTP module.
This allows it to be reused by all HTTP/x parsers. The function is now
available as http_parse_cont_len_header().

Most notably, this will be reused in the next bugfix for the H3 parser.
This is necessary to check that the content-length header matches the
length of DATA frames.

Thus, it must be backported to 2.6.
2022-12-14 11:34:18 +01:00
Amaury Denoyelle
dbf6ad470b BUG/MINOR: quic: properly handle alloc failure in qc_new_conn()
qc_new_conn() is used to allocate a quic_conn instance and its various
internal members. If one allocation fails, quic_conn_release() is used
to cleanup things.

For the moment, pool_zalloc() is used, which ensures that all content is
null. However, some members must be initialized to special values
to be able to use quic_conn_release() safely. This is the case for
the quic_conn lists and its tasklet.

Also, some quic_conn internal allocation functions were doing their own
cleanup on failure without resetting the pointers to NULL. This caused an
issue with quic_conn_release() which also frees these members. To fix this,
these functions now only return an error without cleanup. It is the
caller's responsibility to free the allocated content, which is done via
quic_conn_release().
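
The resulting pattern can be illustrated with a small, self-contained sketch
(made-up names, not the actual quic_conn code): every member starts in a
state the single release function can handle, so one cleanup path covers
every failure point.

    #include <stdlib.h>

    struct conn_sketch {
        char *rxbuf;
        char *txbuf;
        /* in the real code, list heads and the tasklet pointer also need a
         * safe initial value before the first allocation that may fail */
    };

    static void conn_release(struct conn_sketch *c)
    {
        if (!c)
            return;
        free(c->rxbuf);   /* free(NULL) is a no-op, so partial init is safe */
        free(c->txbuf);
        free(c);
    }

    static struct conn_sketch *conn_new(void)
    {
        struct conn_sketch *c = calloc(1, sizeof(*c)); /* members start NULL */

        if (!c)
            return NULL;
        c->rxbuf = malloc(4096);
        if (!c->rxbuf)
            goto err;
        c->txbuf = malloc(4096);
        if (!c->txbuf)
            goto err;
        return c;
     err:
        conn_release(c);  /* single cleanup path, valid at any stage */
        return NULL;
    }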

Without this patch, an allocation failure in qc_new_conn() would often
result in a segfault. This was reproduced easily using fail-alloc at 10%.

This should be backported up to 2.6.
2022-12-12 11:44:34 +01:00
Cedric Paillet
e06e31ea3b MINOR: promex: introduce haproxy_backend_agg_check_status
This patch introduces the haproxy_backend_agg_check_status metric,
as intended in 42d7c402d, but with the right data source.

This patch could be backported as far as 2.4.
2022-12-09 10:54:48 +01:00
Cedric Paillet
7d6644e689 BUG/MINOR: promex: create haproxy_backend_agg_server_status
haproxy_backend_agg_server_check_status currently aggregates
haproxy_server_status instead of haproxy_server_check_status.
We deprecate this and create a new one,
haproxy_backend_agg_server_status, to clarify what it really
does.

This patch could be backported as far as 2.4.
2022-12-09 10:54:27 +01:00
Willy Tarreau
9192d20f02 MINOR: pools: make DEBUG_UAF a runtime setting
Since the massive pools cleanup that happened in 2.6, the pools
architecture was made quite more hierarchical and many alternate code
blocks could be moved to runtime flags set by -dM. One of them had not
been converted by then, DEBUG_UAF. It's not much more difficult actually,
since it only acts on a pair of function indirections on the slow path
(OS-level allocator) and a default setting for the cache activation.

This patch adds the "uaf" setting to the options permitted in -dM so
that it now becomes possible to set or unset UAF at boot time without
recompiling. This is particularly convenient, because every 3 months on
average, developers ask a user to recompile haproxy with DEBUG_UAF to
understand a bug. Now it will not be needed anymore; instead the user
will only have to disable pools and enable uaf using -dMuaf. Note that
-dMuaf only disables previously enabled pools, but it remains possible
to re-enable caching by specifying it afterwards, as in -dMuaf,cache.
A few tests with this mode show that it can be an interesting combination
which catches significantly fewer UAF cases but does so with much less
overhead, so it might be compatible with some high-traffic deployments.

The change is very small and isolated. It could be helpful to backport
this at least to 2.7 once confirmed not to cause build issues on exotic
systems, and even to 2.6 a bit later as this has proven to be useful
over time, and could be even more if it did not require a rebuild. If
a backport is desired, the following patches are needed as well:

  CLEANUP: pools: move the write before free to the uaf-only function
  CLEANUP: pool: only include pool-os from pool.c not pool.h
  REORG: pool: move all the OS specific code to pool-os.h
  CLEANUP: pools: get rid of CONFIG_HAP_POOLS
  DEBUG: pool: show a few examples in -dMhelp
2022-12-08 18:54:59 +01:00
Willy Tarreau
4da51bd190 CLEANUP: pools: get rid of CONFIG_HAP_POOLS
This one was set in defaults.h only when neither DEBUG_NO_POOLS nor
DEBUG_UAF were set. This was not the most convenient location to look
for it, and it was only used in pool.c to decide on the initial value
of POOL_DBG_NO_CACHE.

Let's just use DEBUG_NO_POOLS || DEBUG_UAF directly on this flag and
get rid of the intermediary condition. This also has the benefit of
removing a double inversion, which is always nice for understanding.
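
A sketch of the simplified initialization (illustrative values and layout,
not the exact pool.c code; POOL_DBG_NO_CACHE's value is an assumption here):

    #define POOL_DBG_NO_CACHE  0x00000001   /* illustrative flag value */

    /* initial debugging flags: derived directly from the two debug macros,
     * without the intermediate CONFIG_HAP_POOLS double negation */
    unsigned int pool_debugging =
    #if defined(DEBUG_NO_POOLS) || defined(DEBUG_UAF)
        POOL_DBG_NO_CACHE |
    #endif
        0;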
2022-12-08 17:45:08 +01:00
Willy Tarreau
a95636682d REORG: pool: move all the OS specific code to pool-os.h
Till now pool-os used to contain a mapping from pool_{alloc,free}_area()
to pool_{alloc,free}_area_uaf() in case of DEBUG_UAF, or the regular
malloc-based function. And the *_uaf() functions were in pool.c. But
since 2.4 with the first cleanup of the pools, there have been no more
calls to pool_{alloc,free}_area() from anywhere but pool.c, from exactly
one place each. As such, there's no more need to keep the *_uaf() variants
apart in pool.c; we can inline them into pool-os.h and leave all the OS
stuff there, with pool.c calling either one based on DEBUG_UAF. This is
cleaner, with fewer round trips between both files, and easier to find.
2022-12-08 17:32:57 +01:00
Willy Tarreau
76a97a98ca CLEANUP: pool: only include pool-os from pool.c not pool.h
There's no need for the low-level pool functions to be known from all
callers anymore, they're only used by pool.c. Let's reduce the amount
of header files processed.
2022-12-08 17:32:40 +01:00
Willy Tarreau
5ab3c61932 BUILD: atomic: atomic.h may need compiler.h on ARMv8.2-a
We get a build error in ncbuf.c when building for ARMv8.2-a because ncbuf
has minimal includes and among them bug.h which includes atomic.h. Atomic.h
may use "forceinline" without including compiler.h, hence the build error.
It was verified that adding it doesn't inflate the total headers.

Since all other C files include api.h which already covers this, there's
no real need to backport this. The issue was already there in 2.3 though.
2022-12-08 08:36:24 +01:00
Aurelien DARRAGON
7d541a91ec BUG/MINOR: checks: restore legacy on-error fastinter behavior
With the previous commit, 9e080bf ("BUG/MINOR: checks: make sure fastinter
is used even on forced transitions"), on-error
mark-down|sudden-death|fail-check now work as expected.

However, on-error fastinter remains broken because srv_getinter(), used in
the above commit to check the expiration date, won't return the fastinter
interval if the server health is maxed out (which is the case with
on-error fastinter mode).

To fix this, we introduce a check flag named CHK_ST_FASTINTER.
This flag is set when on-error is triggered. This way we can force
srv_getinter() to return the fastinter interval whenever the flag is set.
The flag is automatically cleared as soon as the new check task expiry is
recalculated in process_chk_conn().
This restores the original behavior prior to d114f4a ("MEDIUM: checks:
spread the checks load over random threads").
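
A simplified sketch of the idea (the flag value, struct and function names
are illustrative, not the actual haproxy definitions):

    #define CHK_ST_FASTINTER_SK  0x0800   /* hypothetical flag bit */

    struct check_sketch {
        unsigned int state;
        int health, rise, fall;
        int inter, fastinter;
    };

    static int getinter_sketch(const struct check_sketch *check)
    {
        /* on-error was triggered: always honor the fastinter interval,
         * even if the health counter is maxed out */
        if (check->state & CHK_ST_FASTINTER_SK)
            return check->fastinter ? check->fastinter : check->inter;

        if (check->health == check->rise + check->fall - 1)
            return check->inter;   /* fully healthy: regular interval */

        return check->fastinter ? check->fastinter : check->inter;
    }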

It must be backported to 2.7 along with the aforementioned commits.
2022-12-07 17:03:55 +01:00
Ilya Shipitsin
5fa29b8a74 CLEANUP: assorted typo fixes in the code and comments
This is the 34th iteration of typo fixes.
2022-12-07 09:08:18 +01:00
Aurelien DARRAGON
22f82f81e5 MINOR: server/event_hdl: add support for SERVER_UP and SERVER_DOWN events
We're using srv_update_status() as the only event source for UP/DOWN server
events in an attempt to simplify the support for these 2 events.

It seems srv_update_status() is the common path for server state changes
anyway.

Tested with server state updated from various sources:
  - the cli
  - server-state file (maybe we could disable this, or at least not publish
  in the global event queue in the future, if it results in slower startup
  for setups relying on huge server state files)
  - dns records (ie: srv template)
  (again, this could be fine-tuned to only publish in the server-specific
  subscriber list and no longer in the global subscription list if mass dns
  updates tend to slow down srv_update_status())
  - normal checks and observe checks (HCHK_STATUS_HANA)
  (same as above, if check-related state update storms are expected)
  - lua scripts
  - html stats page (admin mode)
2022-12-06 10:22:07 +01:00
Aurelien DARRAGON
129ecf441f MINOR: server/event_hdl: add support for SERVER_ADD and SERVER_DEL events
Basic support for ADD and DEL server events is added through this commit:
	SERVER_ADD is published on dynamic server addition through cli.
	SERVER_DEL is published on dynamic server deletion through cli.

This work depends on:
	"MINOR: event_hdl: add event handler base api"
	"MINOR: server: add srv->rid (revision id) value"
2022-12-06 10:22:07 +01:00
Aurelien DARRAGON
745ce8e8ad MINOR: stats: add server revision id support
Make use of the new srv->rid value in stats.

The stat is referred to as ST_F_SRID; it is now used in the
stats_fill_sv_stats function in order to be included in csv and json
stats dumps.

Moreover, "rid: $value" will be displayed next to the server puid in the
html stats page if "stats show-legend" is specified in the stats frontend
(as a mouse-hover tooltip).

Depends on the following commit:
	"MINOR: server: add srv->rid (revision id) value"
2022-12-06 10:22:06 +01:00
Aurelien DARRAGON
61e3894dfe MINOR: server: add srv->rid (revision id) value
With the current design, we could not distinguish between a previously
existing, deleted server and a new server reusing the deleted server's
name/id.

This can cause some confusion when auditing stats/events/logs,
because the new server will look similar to the old one.

To address this, we're adding a new value in the server structure: rid.

The rid (revision id) is an unsigned 32-bit value that is set upon
server creation. Its value is derived from a global counter that starts
at 0 and is incremented each time one or multiple server deletions are
followed by a server addition (meaning that old name/id reuse could occur).

Thanks to this revision id, it is now easy to tell whether the server
we're looking at is the same as before or if it has been deleted and
re-added in the meantime.
(combining server name/id + server revision id yields a process-wide unique
identifier)
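
The counter logic described above can be sketched as follows (illustrative
names, not the actual server code):

    /* global counter, starts at 0 */
    static unsigned int srv_rid_counter;
    /* set when one or more servers have been deleted since the last add */
    static int srv_was_deleted;

    /* called on dynamic server deletion */
    static void srv_mark_deleted(void)
    {
        srv_was_deleted = 1;
    }

    /* called on server creation to assign its revision id */
    static unsigned int srv_next_rid(void)
    {
        if (srv_was_deleted) {
            srv_rid_counter++;   /* name/id reuse is now possible: new revision */
            srv_was_deleted = 0;
        }
        return srv_rid_counter;
    }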
2022-12-06 10:22:06 +01:00
Amaury Denoyelle
d3083c9df9 MINOR: quic: reconnect quic-conn socket on address migration
UDP addresses may change over time for a QUIC connection. When using a
quic-conn owned socket, we have to detect address changes to break the
bind/connect association on the socket.

For the moment, when a change is detected, the QUIC connection socket is
closed and a new one is opened. In the future, we may improve this by
trying to keep the original socket and re-execute only the bind/connect
syscalls.

This change is part of quic-conn owned socket implementation.
It may be backported to 2.7 after a period of observation.
2022-12-02 14:45:43 +01:00
Amaury Denoyelle
7c9fdd9c3a MEDIUM: quic: move receive out of FD handler to quic-conn io-cb
This change is the second part of reception on the QUIC connection socket.
All operations inside the FD handler have been delayed to the quic-conn
tasklet via the new function qc_rcv_buf().

With this change, buffer management on reception has been simplified. It
is now possible to use a local buffer inside qc_rcv_buf() instead of
quic_receiver_buf().

This change is part of quic-conn owned socket implementation.
It may be backported to 2.7 after a period of observation.
2022-12-02 14:45:43 +01:00
Amaury Denoyelle
5b41486b7f MEDIUM: quic: use quic-conn socket for reception
Try to use the quic-conn socket for reception if it is allocated. For
this, the socket is inserted in the fdtab. This will call the new
handler quic_conn_io_cb() which is responsible for processing the recv()
system call. It will reuse the datagram dispatch for simplicity. However,
this is guaranteed to be called on the quic-conn thread, so it will be
more efficient to use a dedicated buffer. This will be implemented in
another commit.

This patch should improve performance by reducing contention on the
receiver socket. However, more gain can be obtained once the datagram
dispatch operation is skipped.

Older quic_sock_fd_iocb() is renamed to quic_lstnr_sock_fd_iocb() to
emphasize its usage for the receiver socket.

This change is part of quic-conn owned socket implementation.
It may be backported to 2.7 after a period of observation.
2022-12-02 14:45:43 +01:00
Amaury Denoyelle
40909dfec5 MINOR: quic: allocate a socket per quic-conn
Allocate a quic-conn owned socket if possible. This requires that it is
activated in the haproxy configuration. Also, this is done only if the
local address is known, so it depends on support for IP_PKTINFO.

For the moment this socket is not used. This causes QUIC support to be
broken as received datagrams are not read. This commit will be completed
by a following patch to support the recv operation on the newly allocated
socket.

This change is part of quic-conn owned socket implementation.
It may be backported to 2.7 after a period of observation.
2022-12-02 14:45:43 +01:00
Amaury Denoyelle
75839a44e7 MINOR: quic: startup detect for quic-conn owned socket support
To be able to use individual sockets for QUIC connections, we rely on
the OS network stack which must support UDP sockets binding on the same
local address.

Add a detection code for this feature, executed on startup. When the
first QUIC listener socket is bound, a test socket is created and
bound on the same address. If the bind call fails, we consider that
it's impossible to use individual sockets for QUIC connections.
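
A self-contained sketch of such a startup probe (illustrative only, not the
actual haproxy code):

    #include <sys/socket.h>
    #include <unistd.h>

    /* Try to bind a second UDP socket on the address the first QUIC
     * listener is bound to. Returns 1 if the OS supports it. */
    static int quic_test_sock_per_conn(const struct sockaddr *addr,
                                       socklen_t len)
    {
        int ok = 0;
        int one = 1;
        int fd = socket(addr->sa_family, SOCK_DGRAM, 0);

        if (fd < 0)
            return 0;
        /* the listener socket is assumed to use the same reuse options */
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
    #ifdef SO_REUSEPORT
        setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));
    #endif
        if (bind(fd, addr, len) == 0)
            ok = 1;   /* several UDP sockets may share this address */
        close(fd);
        return ok;
    }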

A new global option GTUNE_QUIC_SOCK_PER_CONN is defined. If the startup
detection fails, this value is reset from the global options. For the
moment, there is no code to activate the option: this will come in a
follow-up patch with the introduction of a new configuration option.

This change is part of quic-conn owned socket implementation.
It may be backported to 2.7 after a period of observation.
2022-12-02 14:45:43 +01:00
Amaury Denoyelle
eec0b3c1bd MINOR: quic: detect connection migration
Detect connection migration attempted by the client. This is done by
comparing addresses stored in quic-conn with src/dest addresses of the
UDP datagram.

A new function qc_handle_conn_migration() has been added. For the
moment, no operation is conducted and the function will be completed
during the connection migration implementation. The only notable thing is
the increment of a new counter, "quic_conn_migration_done".

This should be backported up to 2.7.
2022-12-02 14:45:43 +01:00
Amaury Denoyelle
21e611dc89 MINOR: tools: add port for ipcmp as optional criteria
Complete the ipcmp() function with a new argument <check_port>. If this
argument is true, the function will compare port values in addition to IP
addresses and return true only if both are identical.
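
For illustration, here is an equivalent comparison restricted to IPv4,
mirroring the described <check_port> behaviour (this is not the haproxy
ipcmp() implementation):

    #include <netinet/in.h>

    /* compare two IPv4 addresses, and also the ports when <check_port>
     * is non-zero */
    static int ipv4_equal(const struct sockaddr_in *a,
                          const struct sockaddr_in *b, int check_port)
    {
        if (a->sin_addr.s_addr != b->sin_addr.s_addr)
            return 0;
        if (check_port && a->sin_port != b->sin_port)
            return 0;
        return 1;
    }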

This commit will simplify QUIC connection migration detection. As such,
it should be backported to 2.7.
2022-12-02 14:45:43 +01:00
Amaury Denoyelle
8687b63c69 MINOR: quic: extract datagram parsing code
Extract the individual datagram parsing code outside of the datagram list
loop in quic_lstnr_dghdlr(). This is moved into a new function named
quic_dgram_parse().

To complete this change, quic_lstnr_dghdlr() has been moved into the
quic_sock source file: it belongs to the QUIC socket lower layer and is
directly called by quic_sock_fd_iocb().

This commit will ease the implementation of the quic-conn owned socket.
The new function quic_dgram_parse() will be easily usable after a receive
operation done on the quic-conn IO-cb.

This should be backported up to 2.7.
2022-12-02 14:45:43 +01:00
Amaury Denoyelle
518c98f150 MINOR: quic: remove qc from quic_rx_packet
The quic_rx_packet struct had a reference to the quic_conn instance. This
is useless as the qc instance is always passed as a function argument. In
fact, pkt.qc is used only in qc_pkt_decrypt() on key update, even though
qc is also passed as an argument.

Simplify this by removing the qc field from the quic_rx_packet structure
definition. Also clean up the qc_pkt_decrypt() documentation and interface
to align it with other quic-conn related functions.

This should be backported up to 2.7.
2022-12-02 14:45:43 +01:00
William Lallemand
52ddd99940 MEDIUM: ssl: rename the struct "cert_key_and_chain" to "ckch_data"
Rename the structure "cert_key_and_chain" to "ckch_data" in order to
avoid confusion with the store, which is often called "ckchs".

The "cert_key_and_chain *ckch" members were renamed to "ckch_data *data",
so we now have store->data instead of ckchs->ckch.

Marked medium because it changes the API.
2022-12-02 11:48:30 +01:00
Aurelien DARRAGON
68e692da02 MINOR: event_hdl: add event handler base api
Adding base code to provide subscribe/publish API for internal
events processing.

event_hdl provides two complementary APIs, both are implemented
in src/event_hdl.c and include/haproxy/event_hdl{-t.h,.h}:

	One API targeting developers that want to register event handlers
	that will be notified on specific events.
	(SUBSCRIBE)

	One API targeting developers that want to notify registered handlers
	about an event.
	(PUBLISH)

This feature is being considered to address the following scenarios:
	- mailers code refactoring (getting rid of deprecated
	tcp-check ruleset implementation)
	- server events from lua code (registering user defined
	lua function that is executed with relevant data when a
	server is dynamically added/removed or on server state change)
	- providing a stable and easy-to-use API for upcoming
	developments that rely on specific events to perform actions
	(e.g.: resource cleanup when a server is deleted from haproxy)

At this time though, we don't have many use cases in mind in addition to
server event handling, but the API is aimed at being multipurpose
so that new event families, with their own particularities, can be
easily implemented afterwards, hopefully without requiring breaking
changes to the API.

Moreover, you should know that the API was not designed to cope well
with high-rate event publishing, mostly because publishing means iterating
over an unsorted subscriber list. So it won't scale well as the subscriber
list grows, but this is intentional in order to keep the code simple and
versatile.

Instead, it is assumed that events implemented using this API
should be periodic events, and that events related to critical
io/networking processing should be handled using
dedicated facilities anyway.
(After all, this is meant to be a general purpose event API)
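
To make the publish/subscribe idea above concrete, here is a tiny
self-contained sketch in plain C (none of these names belong to the
haproxy event_hdl API): publishing walks an unsorted subscriber list and
invokes each handler synchronously, which is why high-rate publishing is
discouraged.

    struct event_data { int type; const void *ptr; };
    typedef void (*event_cb)(const struct event_data *data);

    #define MAX_SUBS 8
    static event_cb subs[MAX_SUBS];
    static int nb_subs;

    static int subscribe(event_cb cb)
    {
        if (nb_subs >= MAX_SUBS)
            return 0;
        subs[nb_subs++] = cb;
        return 1;
    }

    static void publish(const struct event_data *data)
    {
        for (int i = 0; i < nb_subs; i++)
            subs[i](data);   /* SYNC mode: runs in the publisher's context */
    }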

Apart from being easily extensible, one of the main goals of this API is
to make subscriber code as simple and safe as possible.

This is done by offering multiple event handling modes:
	- SYNC mode:
		publishing code directly
		leverages handler code (callback function)
		and handler code has direct access to "live" event data
		(pointers mostly, along with lock hints/context
		so that accessing data pointers can be done properly)
	- normal ASYNC mode:
		handler is executed in a backward compatible way with sync mode,
		so that it is easy to switch from and to SYNC/ASYNC mode.
		Only here the handler has access to "offline" event data, and
		not "live" data (ptrs) so that data consistency is guaranteed.
		By offline, you should understand a "snapshot" of relevant data
		at the time of the event, so that the handler can consume it
		later (even if the associated resource is not valid anymore)
	- advanced ASYNC mode
		same as normal ASYNC mode, but here handler is not a function
		that is executed with event data passed as argument: handler is a
		user defined tasklet that is notified when event occurs.
		The tasklet may consume pending events and associated data
		through its own message queue.

ASYNC mode should be considered first if you don't rely on live event
data and you want to make sure that your code has the lowest impact
possible on publisher code. (ie: you don't want to break stuff)

Internal API documentation will follow:
	You will find more details about the notions we briefly touched on here.
2022-12-02 09:40:52 +01:00
Willy Tarreau
eaded987ee [RELEASE] Released version 2.8-dev0
Released version 2.8-dev0 with the following main changes :
    - MINOR: version: mention that it's development again
2022-12-01 15:25:34 +01:00
Willy Tarreau
989c55dc2f MINOR: version: mention that it's development again
This essentially reverts d705b85a4a.
2022-12-01 15:24:10 +01:00
Willy Tarreau
d705b85a4a MINOR: version: mention that it's stable now
This version will be maintained up to around Q1 2024. The INSTALL file
also mentions it.
2022-12-01 15:15:24 +01:00
Stefan Eissing
b82296c10e BUILD: quic: allow build with USE_QUIC and USE_OPENSSL_WOLFSSL
WolfSSL does not implement the TLS1_3_CK_AES_128_CCM_SHA256 cipher, nor
the SSL_ERROR_WANT_ASYNC, SSL_ERROR_WANT_ASYNC_JOB and
SSL_ERROR_WANT_CLIENT_HELLO_CB error codes.

This patch disables them for WolfSSL.

Signed-off-by: William Lallemand <wlallemand@haproxy.org>
2022-11-30 17:38:27 +01:00
Ilya Shipitsin
6f86eaae4f CLEANUP: assorted typo fixes in the code and comments
This is the 33rd iteration of typo fixes.
2022-11-30 14:02:36 +01:00
Willy Tarreau
d5cae6a0c7 MINOR: stick-table: change the API of the function used to calculate the shard
The function used to calculate the shard number currently requires a
stktable_key on input. Unfortunately, it happens that peers currently miss
this calculation and do not provide a stktable_key at all; instead they
open-code all the low-level stick-table work (hence why it's missing).
Thus we'll need to be able to calculate the shard number for keys coming
from peers as well, but the current API does not make it possible.

This commit addresses this by inverting the order in which the length and
the shard number are used. Now the low-level function is independent of
stksess and stktable_key; it takes a table, a pointer and a length and
does all the job. The upper function takes care of the type and key to
get its length, and is for use only from stick-table code.
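
The split can be illustrated with a self-contained sketch (names and the
stand-in hash are placeholders, not the actual stick-table code):

    #include <stddef.h>
    #include <string.h>

    struct tbl_sketch { unsigned int shards; };

    /* low level: independent of key/session types, usable from peers too */
    static unsigned int shard_from_ptr(const struct tbl_sketch *t,
                                       const void *ptr, size_t len)
    {
        const unsigned char *p = ptr;
        unsigned int h = 0;       /* trivial stand-in for the real hash */

        while (len--)
            h = h * 31 + *p++;
        return t->shards ? h % t->shards : 0;
    }

    /* upper level: knows the key type, derives its length, for
     * stick-table code only */
    static unsigned int shard_from_key(const struct tbl_sketch *t,
                                       const char *string_key)
    {
        return shard_from_ptr(t, string_key, strlen(string_key));
    }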

This doesn't change anything except that the low-level one will be usable
from outside (hence why it's exported now).
2022-11-29 18:06:42 +01:00
Amaury Denoyelle
d64a26f023 CLEANUP: ncbuf: inline small functions
The ncbuf API relies on a lot of small functions. Mark these functions as
inline to reduce call invocations and facilitate compiler optimizations
to reduce code size.

This should be backported up to 2.6.
2022-11-29 15:14:39 +01:00
Willy Tarreau
56460ee52a MINOR: stick-table: store a per-table hash seed and use it
Instead of using memcpy() to concatenate the table's name to the key when
allocating an stksess, let's compute once and for all a per-table seed at
boot time and use it to calculate the key's hash. This saves two memcpy()
calls and the usage of a chunk, which is always nice in a fast path.
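
A sketch of the idea using the xxhash API (the include path and field names
are assumptions, not the actual stick-table code):

    #include <string.h>
    #define XXH_INLINE_ALL
    #include "xxhash.h"   /* haproxy bundles its own copy of xxhash */

    struct table_sketch {
        const char *id;                  /* table name */
        unsigned long long hash_seed;    /* computed once at boot */
    };

    static void table_init_seed(struct table_sketch *t)
    {
        t->hash_seed = XXH64(t->id, strlen(t->id), 0);
    }

    /* no more name+key concatenation: the name is folded into the seed */
    static unsigned long long key_hash(const struct table_sketch *t,
                                       const void *key, size_t len)
    {
        return XXH64(key, len, t->hash_seed);
    }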

When tested under extreme conditions with a 80-byte long table name, it
showed a 1% performance increase.
2022-11-28 18:58:06 +01:00
Willy Tarreau
63b5b33ba8 CLEANUP: stick-table: fill alignment holes in the stktable struct
There were two 32-bit holes in the stktable struct surrounding 32-bit
words, so let's just reorder them a little bit to address the issue.
2022-11-28 18:49:55 +01:00
William Lallemand
0a2d63236c BUG/MINOR: ssl: shut the ca-file errors emitted during httpclient init
With an OpenSSL library which uses the wrong OPENSSLDIR, HAProxy tries to
load OPENSSLDIR/certs/ into @system-ca, but emits a warning when it
can't.

This patch fixes the issue by allowing the error to be silenced when the
SSL configuration for the httpclient is not explicit.

Must be backported to 2.6.
2022-11-24 19:14:19 +01:00
Uriah Pollock
3cbf09ed64 MEDIUM: ssl: add minimal WolfSSL support with OpenSSL compatibility mode
This adds a USE_OPENSSL_WOLFSSL option; wolfSSL must be used with the
OpenSSL compatibility layer. This must be used with USE_OPENSSL=1.

WolfSSL build options:

   ./configure --prefix=/opt/wolfssl --enable-haproxy

HAProxy build options:

  USE_OPENSSL=1 USE_OPENSSL_WOLFSSL=1 WOLFSSL_INC=/opt/wolfssl/include/ WOLFSSL_LIB=/opt/wolfssl/lib/ ADDLIB='-Wl,-rpath=/opt/wolfssl/lib'

Using at least the commit 54466b6 ("Merge pull request #5810 from
Uriah-wolfSSL/haproxy-integration") from WolfSSL. (2022-11-23).

This is still to be improved: reg-tests are not supported yet, and more
tests are to be done.

Signed-off-by: William Lallemand <wlallemand@haproxy.org>
2022-11-24 11:29:03 +01:00
Uriah Pollock
79320cb074 BUILD: quic: use openssl-compat.h instead of openssl/ssl.h
Replace the include of openssl/ssl.h by openssl-compat.h.

Signed-off-by: William Lallemand <wlallemand@haproxy.org>
2022-11-24 11:29:03 +01:00
Willy Tarreau
946d370d22 BUILD: flags: really restrict the cases where flags are exposed
A number of internal flags started to be exposed to external programs
at the location of their definition since commit 77acaf5af ("MINOR:
flags: add a new file to host flag dumping macros"). This allowed the
"flags" utility to decode many more of them and always correctly. The
condition to expose them was to rely on the preliminary definition of
EOF that indicates that stdio is already included. But this was a
wrong approach. It only guarantees that snprintf() can safely be used
but still causes large functions to be built. But stdio is often
included before some of these includes, so these heavy inline functions
actually have to be compiled in many cases. The result is that the
build time significantly increased, especially with fast compilers
like gcc -O0 which took +50% or TCC which took +100%!

This patch addresses the problem by instead relying on an explicit
macro HA_EXPOSE_FLAGS that the calling program must explicitly define
before including these files. flags.c does this and that's all. The
previous build time is now restored with a speed up of 20 to 50%
depending on the build options.
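
A simplified sketch of the guard pattern (illustrative, not the actual
headers):

    /* in a shared *-t.h header: heavy decoders are only compiled when the
     * consumer explicitly asks for them */
    #ifdef HA_EXPOSE_FLAGS
    static inline const char *show_flags_sketch(unsigned int flg)
    {
        return flg ? "SOME_FLAG" : "";   /* placeholder for the real decoder */
    }
    #endif

    /* and the flags utility (the only consumer) defines the macro before
     * including those headers:
     *     #define HA_EXPOSE_FLAGS
     *     ... #include the relevant *-t.h files ...
     */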
2022-11-24 08:32:27 +01:00
Willy Tarreau
08093cc0fa CLEANUP: tools: do not needlessly include xxhash nor cli from tools.h
These includes, brought by commit 9c76637ff ("MINOR: anon: add new macros
and functions to anonymize contents"), resulted in an increase of exactly
20% in the number of lines to build. These includes are not needed there;
only tools.c needs xxhash.h.
2022-11-24 08:30:48 +01:00
Willy Tarreau
4d46638540 BUILD: compiler: include compiler's definitions before ours
Building with TCC caused a warning on __attribute__() being redefined,
because we do define it on compilers that don't have it, but we didn't
include the compiler's definitions first to leave it a chance to expose
its definitions. The correct way to do this would be to include
sys/cdefs.h but we currently don't include it explicitly and a few
reports on the net mention some platforms where it could be missing
by default. Let's use inttypes.h instead, it always causes it (or its
equivalent) to be included and we know it's present on supported
platforms since we already depend on it.

No backport is needed.
2022-11-24 08:30:48 +01:00
Willy Tarreau
fc50b9dd14 BUG/MAJOR: sched: protect task during removal from wait queue
The issue addressed by commit fbb934da9 ("BUG/MEDIUM: stick-table: fix
a race condition when updating the expiration task") is still present
when thread groups are enabled, but this time it lies in the scheduler.

What happens is that a task configured to run anywhere might already
have been queued into one group's wait queue. When updating a stick
table entry, sometimes the task will have to be dequeued and requeued.

For this a lock is taken on the current thread group's wait queue lock,
but while this is necessary for the queuing, it's not sufficient for
dequeuing since another thread might be in the process of expiring this
task under its own group's lock which is different. This is easy to test
using 3 stick tables with 1ms expiration, 3 track-sc rules and 4 thread
groups. The process crashes almost instantly under heavy traffic.

One approach could consist in storing the group number the task was
queued under in its descriptor (we don't need 32 bits to store the
thread id, it's possible to use one short for the tid and another
one for the tgrp). Sadly, no safe way to do this was figured, because
the race remains at the moment the thread group number is checked, as
it might be in the process of being changed by another thread. It seems
that a working approach could consist in always having it associated
with one group, and only allowing to change it under this group's lock,
so that any code trying to change it would have to iterately read it
and lock its group until the value matches, confirming it really holds
the correct lock. But this seems a bit complicated, particularly with
wait_expired_tasks() which already uses upgradable locks to switch from
read state to a write state.

Given that the shared tasks are not that common (stick-table expirations,
rate-limited listeners, maybe resolvers), it doesn't seem worth the extra
complexity for now. This patch takes a simpler and safer approach
consisting in switching back to a single wq_lock, but still keeping
separate wait queues. Given that shared wait queues are almost always
empty and that otherwise they're scanned under a read lock, the
contention remains manageable and most of the time the lock doesn't
even need to be taken since such tasks are not present in a group's
queue. In essence, this patch reverts half of the aforementioned
patch. This was tested and confirmed to work fine, without observing
any performance degradation under any workload. The performance with
8 groups on an EPYC 74F3 and 3 tables remains twice the one of a
single group, with the contention remaining on the table's lock first.

No backport is needed.
2022-11-22 09:10:08 +01:00
Willy Tarreau
c21a187ec0 MINOR: server/idle: make the next_takeover index per-tgroup
In order to evenly pick idle connections from other threads, there is
a "next_takeover" index in the server, that is incremented each time
a connection is picked from another thread, and indicates which one to
start from next time.

With thread groups this doesn't work well because the index is the same
regardless of the group, and if a group has more threads than another,
there's even a risk of reintroducing an imbalance.

This patch introduces a new per-tgroup storage in servers which, for now,
only contains an instance of this next_takeover index. This way each
thread will now only manipulate the index specific to its own group, and
the takeover will become fair again. More entries may come soon.
2022-11-21 19:21:07 +01:00
Willy Tarreau
9dc231a6b2 BUG/MINOR: server/idle: at least use atomic stores when updating max_used_conns
In 2.2, some idle conns usage metrics were added by commit cf612a045
("MINOR: servers: Add a counter for the number of currently used
connections."), which mentioned that the operation doesn't need to be
atomic since we're not seeking exact values. This is true but at least
we should use atomic stores to make sure not to cause invalid values
to appear on archs that wouldn't guarantee atomicity when writing an
int, such as writing two 16-bit words. This is pretty unlikely on our
targets, but it's better to keep the code safe against this.
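
For illustration, an equivalent of the fix using C11 atomics (haproxy uses
its own HA_ATOMIC_* wrappers; the names below are made up):

    #include <stdatomic.h>

    struct srv_counters_sketch {
        _Atomic unsigned int max_used_conns;
    };

    /* a relaxed atomic store is enough: only torn writes are a concern
     * here, not ordering or exactness */
    static void update_max_used(struct srv_counters_sketch *s,
                                unsigned int curr)
    {
        if (curr > atomic_load_explicit(&s->max_used_conns,
                                        memory_order_relaxed))
            atomic_store_explicit(&s->max_used_conns, curr,
                                  memory_order_relaxed);
    }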

This may be backported as far as 2.2.
2022-11-21 19:21:07 +01:00
Willy Tarreau
2fba08faec MINOR: cli/pools: add sorting capabilities to "show pools"
The "show pools" command is used a lot for debugging but didn't get much
love over the years. This patch brings new capabilities:
  - sorting the output by pool names to ease finding them ("byname").
  - sorting the output by reverse item size to spot the biggest ones ("bysize")
  - sorting the output by reverse number of allocated bytes ("byusage")

The last one (byusage) also omits displaying the ones with zero allocation.

In addition, an optional max number of output entries may be passed so as
to dump only the N most relevant ones.
2022-11-21 10:14:52 +01:00
Ilya Shipitsin
ace3da8dd4 CLEANUP: quic: replace "choosen" with "chosen" all over the code
Some variables were named "choosen" instead of "chosen"; this is a
dedicated spelling fix.
2022-11-21 09:22:28 +01:00
Frédéric Lécaille
74b5f7b31b BUG/MAJOR: quic: Crash after discarding packet number spaces
The previous patch was not sufficient to prevent haproxy from crashing
when some Handshake packets had to be inspected before possibly being
retransmitted:

     "BUG/MAJOR: quic: Crash upon retransmission of dgrams with several packets"

This patch introduced another issue: accesses to packets which had been
released but were still attached to others (in the same datagram). This was
the case for instance when discarding the Initial packet number space before
inspecting a Handshake packet in the same datagram through its ->prev
member, in our case.

This patch implements quic_tx_packet_dgram_detach(), which detaches a packet
from the adjacent ones in the same datagram, to be called when acknowledging
a packet (as done in the previous commit) and when releasing its memory. This
way, we are sure the released packets will not be accessed during
retransmissions.
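
An illustrative sketch of such a detach operation (the struct, field and
function names are assumptions, not the actual quic_conn code):

    #include <stddef.h>

    struct tx_pkt_sketch {
        struct tx_pkt_sketch *prev;  /* previous packet in the datagram */
        struct tx_pkt_sketch *next;  /* next packet in the datagram */
    };

    /* unlink a packet from its neighbours coalesced in the same datagram,
     * so that releasing it cannot leave dangling ->prev/->next references */
    static void pkt_dgram_detach(struct tx_pkt_sketch *pkt)
    {
        if (pkt->prev)
            pkt->prev->next = pkt->next;
        if (pkt->next)
            pkt->next->prev = pkt->prev;
        pkt->prev = pkt->next = NULL;
    }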

Thank you to @gabrieltz for having reported this issue in GH #1903.

Must be backported to 2.6.
2022-11-20 18:35:46 +01:00
Frédéric Lécaille
814645f42f BUG/MAJOR: quic: Crash upon retransmission of dgrams with several packets
As revealed by some traces provided by @gabrieltz in GH #1903 issue,
there are clients (chrome I guess) which acknowledge only one packet among
others in the same datagram. This is the case for the first datagram sent by
a QUIC haproxy listener, made of an Initial packet followed by a Handshake
one. In this identified case, it is the Handshake packet only which is
acknowledged. But if the client is able to respond with a Handshake packet
(ACK frame), this is because it has successfully parsed the Initial packet.
So, why not also acknowledge it? AFAIK, this is mandatory. On our side, when
retransmitting this datagram, the Handshake packet was accessed from the
Initial packet after having been released.

Anyway, there is an issue on our side. Obviously, we must not expect an
implementation to respect the RFC, especially when it wants to build an attack ;)

With this simple patch, for each TX packet we send, we also set the previous
one in addition to the next one. When a packet is acknowledged, we detach the
previous one and the next one in the same datagram from this packet, so that
it cannot be resent when resending these packets (the previous one, in our
case).

Thank you to @gabrieltz for having reported this issue.

Must be backported to 2.6.
2022-11-19 04:56:55 +01:00