Commit Graph

6900 Commits

Author SHA1 Message Date
Amaury Denoyelle d6646dddcc MINOR: quic: finalize affinity change as soon as possible
During accept, a quic-conn is rebound to a new thread. This process is
done in two steps:
* first on the original thread via qc_set_tid_affinity()
* then on the newly assigned thread via qc_finalize_affinity_rebind()

Most quic_conn operations (I/O tasklet, task and quic_conn FD socket
read) are reactivated only after the second step. However, there is a
possibility that datagrams are handled before it via quic_dgram_parse()
when using listener sockets. This does not seem to cause any issue but
this may cause unexpected behavior in the future.

To simplify this, qc_finalize_affinity_rebind() will be called both by
qc_xprt_start() and quic_dgram_parse(). Only one invocation will be
performed thanks to the new flag QUIC_FL_CONN_AFFINITY_CHANGED.
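
A minimal sketch of the run-once guard (the flag test is inferred from this
description; the real function does more):

    void qc_finalize_affinity_rebind(struct quic_conn *qc)
    {
        /* set by qc_set_tid_affinity(); the first caller clears it so
         * the second invocation becomes a no-op */
        if (!(qc->flags & QUIC_FL_CONN_AFFINITY_CHANGED))
            return;
        qc->flags &= ~QUIC_FL_CONN_AFFINITY_CHANGED;

        /* ... wake up the I/O tasklet/task and re-enable FD reading ... */
    }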

This should be backported up to 2.7.
2023-04-26 17:50:16 +02:00
Amaury Denoyelle 24962dd178 BUG/MEDIUM: mux-quic: do not emit RESET_STREAM for unknown length
Some HTX responses may not always contain an EOM block. For example this
is the case if the content-length header is missing from the HTTP server
response. Stream termination is thus signaled to the QUIC mux via the shutw
callback. However, this is interpreted unconditionally as an early
close by the mux, with a RESET_STREAM emission. Most of the time, QUIC
clients report this as an error.

To fix this, check if htx.extra is set to HTX_UNKOWN_PAYLOAD_LENGTH for
a qcs instance. If true, shutw will never be used to emit a
RESET_STREAM. Instead, the stream will be closed properly with a FIN
STREAM frame. If all data were already transferred, an empty STREAM frame
is sent.
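
A hedged sketch of the resulting shutw decision (the helper names below are
illustrative, not the actual mux-quic API):

    /* called on shutw: decide between a clean close and an abort */
    if (qcs_htx_unknown_payload_length(qcs)) {
        /* no content-length: this is not an early close; finish with a
         * FIN STREAM frame (empty if all data was already sent) */
        qcs_close_with_fin(qcs);
    }
    else {
        /* genuine early close: abort with RESET_STREAM */
        qcs_send_reset(qcs);
    }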

This fix may help with GitHub issue #2004 where the Chrome browser stops
using QUIC after receiving RESET_STREAM frames.

This issue was reported by Vladimir Zakharychev. Thanks to him for his
help and testing. It was also reproduced locally using httpterm with the
query string "/?s=1k&b=0&C=1".

This should be backported up to 2.7.
2023-04-26 17:50:09 +02:00
Willy Tarreau 543e2544ca DEBUG: crash using an invalid opcode on aarch64 instead of an invalid access
On aarch64 there's also a guaranteed invalid instruction, called UDF,
which even supports an optional 16-bit immediate operand:

   https://developer.arm.com/documentation/ddi0596/2021-12/Base-Instructions/UDF--Permanently-Undefined-?lang=en

It's conveniently encoded as 4 zeroes (when the operand is zero). It's
unclear when support for it was added into GAS, if at all; even a
not-so-old 2.27 doesn't know about it. Let's byte-encode it.
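
A sketch of the byte-encoding, assuming GCC-style inline asm (the macro
name is illustrative; the actual macro in the tree may differ):

    #if defined(__aarch64__)
    /* UDF #0: permanently-undefined instruction, emitted as raw bytes
     * because older GAS versions don't know the mnemonic */
    #define ha_crash_now() __asm__ volatile(".byte 0x00, 0x00, 0x00, 0x00")
    #endif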

Tested on an A72 and works as expected.
2023-04-25 19:53:39 +02:00
Willy Tarreau 77787ec9bc DEBUG: crash using an invalid opcode on x86/x86_64 instead of an invalid access
BUG_ON() calls currently trigger a segfault. This is more convenient
than abort() as it doesn't rely on any function call or signal handler
and never causes non-unwindable stacks when opening cores. But it adds
quite some confusion in bug reports, which are rightfully tagged "segv"
yet do not instantly allow one to distinguish real segvs (e.g. null
derefs) from code asserts.

Some CPU architectures offer various crashing methods. On x86 we have
INT3 (0xCC), which stops into the debugger, and UD0/UD1/UD2. INT3 looks
appealing but for whatever reason (maybe signal handling somewhere) it
loses the last call point in the stack, making backtraces unusable. UD2
has the merit of being only 2 bytes and causing an invalid instruction,
which almost never happens normally, so it's easily distinguishable.
Here it was defined as a macro so that the line number in the core
matches the one where the BUG_ON() macro is called, and the debugger
shows the last frame exactly at its calling point.
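
A minimal sketch of the idea, assuming GCC-style inline asm (the macro
name is illustrative):

    #if defined(__i386__) || defined(__x86_64__)
    /* UD2 (0x0F 0x0B): 2-byte guaranteed-invalid opcode; being a macro,
     * the SIGILL is reported on the BUG_ON() caller's own line */
    #define ha_crash_now() __asm__ volatile("ud2")
    #endif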

E.g. when calling "debug dev bug":

Program terminated with signal SIGILL, Illegal instruction.
  #0  debug_parse_cli_bug (args=<optimized out>, payload=<optimized out>, appctx=<optimized out>, private=<optimized out>) at src/debug.c:408
  408             BUG_ON(one > zero);
  [Current thread is 1 (Thread 0x7f7a660cc1c0 (LWP 14238))]
  (gdb) bt
  #0  debug_parse_cli_bug (args=<optimized out>, payload=<optimized out>, appctx=<optimized out>, private=<optimized out>) at src/debug.c:408
  #1  debug_parse_cli_bug (args=<optimized out>, payload=<optimized out>, appctx=<optimized out>, private=<optimized out>) at src/debug.c:402
  #2  0x000000000061a69f in cli_parse_request (appctx=appctx@entry=0x181c0160) at src/cli.c:832
  #3  0x000000000061af86 in cli_io_handler (appctx=0x181c0160) at src/cli.c:1035
  #4  0x00000000006ca2f2 in task_run_applet (t=0x181c0290, context=0x181c0160, state=<optimized out>) at src/applet.c:449
2023-04-25 18:51:10 +02:00
Amaury Denoyelle d5f03cd576 CLEANUP: quic: rename frame variables
Rename all frame variables with the suffix _frm. This helps to
differentiate frame instances from other internal objects.

This should be backported up to 2.7.
2023-04-24 15:35:22 +02:00
Amaury Denoyelle 888c5f283a CLEANUP: quic: rename frame types with an explicit prefix
Each frame type used in the quic_frame union has been renamed with the
prefix "qf_". This helps to differentiate frame types from other
internal objects.

This should be backported up to 2.7.
2023-04-24 15:35:03 +02:00
Willy Tarreau 7310164b2c MINOR: listener: add a new global tune.listener.default-shards setting
This new setting accepts "by-process", "by-group" and "by-thread" and
will dictate how listeners will be sharded by default when nothing is
specified. While the default remains "by-process", "by-group" should be
much more efficient with many threads, while not changing anything for
single-group setups.
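
A hedged configuration sketch (the directive and its values come from this
commit; the surrounding settings are illustrative only):

    global
        nbthread 16
        tune.listener.default-shards by-group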
2023-04-23 09:46:15 +02:00
Willy Tarreau f1003ea7fa MINOR: protocol: perform a live check for SO_REUSEPORT support
When testing if a protocol supports SO_REUSEPORT, we're now able to
verify if the OS really supports it. While it may be supported at
build time, it may have been blocked in a container for example, so
we'd rather check its actual runtime status.
2023-04-23 09:46:15 +02:00
Willy Tarreau b073573c10 MINOR: sock: add a function to check for SO_REUSEPORT support at runtime
The new function _sock_supports_reuseport() will be used to check if a
protocol type supports SO_REUSEPORT or not. This will be useful to verify
that shards can really work.
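
A minimal sketch of such a runtime probe, assuming POSIX sockets (the real
_sock_supports_reuseport() may differ in detail):

    #include <sys/socket.h>
    #include <unistd.h>

    static int sock_probe_reuseport(int family, int type)
    {
        int one = 1, ret = 0;
        int fd = socket(family, type, 0);

        if (fd < 0)
            return 0;
        /* build-time support is not enough: a container policy may
         * still reject the option, so try it for real */
        if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) == 0)
            ret = 1;
        close(fd);
        return ret;
    }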
2023-04-23 09:46:15 +02:00
Willy Tarreau 8a5e6f4cca MINOR: protocol: add a function to check if some features are supported
The new function protocol_supports_flag() checks the protocol flags
to verify if some features are supported, but will support being
extended to refine the tests. Let's use it to check for REUSEPORT.
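
A plausible shape for the helper (the body is assumed, not copied from the
tree):

    static inline int protocol_supports_flag(const struct protocol *proto,
                                             unsigned int flag)
    {
        /* all requested feature bits must be present */
        return (proto->flags & flag) == flag;
    }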
2023-04-23 09:46:15 +02:00
Willy Tarreau 785b89f551 MINOR: protocol: move the global reuseport flag to the protocols
Some protocols support SO_REUSEPORT and others do not. Some have such a
limitation in the kernel, and others in haproxy itself (e.g. sock_unix
cannot support multiple bindings since each one will unbind the previous
one). Also it's really protocol-dependent and not just family-dependent
because on Linux for some time it was supported for TCP and not UDP.

Let's move the definition to the protocols instead. Now it's preset in
tcp/udp/quic when SO_REUSEPORT is defined, and is otherwise left unset.
The enabled() config condition test validates IPv4 (generally sufficient),
and -dR / noreuseport disables it for all protocols at once.
2023-04-23 09:46:15 +02:00
Willy Tarreau 65df7e028d MINOR: protocol: add a flags field to store info about protocols
We'll use these flags to know if some protocols are supported, and if
so, with what options/extensions. Reuseport will move there for example.
Two functions were added to globally set/clear a flag.
2023-04-23 09:46:15 +02:00
Willy Tarreau da0d2cb698 MINOR: proxy: make proxy_type_str() recognize peers sections
Now proxy_type_str() will emit "peers section" when the mode is set to
peers, so as to ease sharing more code between peers and proxies.
2023-04-23 09:46:15 +02:00
Willy Tarreau f6a8444f55 REORG: listener: move the bind_conf's thread setup code to listener.c
What used to be only two lines to apply a mask in a loop in
check_config_validity() grew into a 130-line block that performs deeply
listener-specific operations that do not have their place there anymore.
In addition it's worth noting that the peers code still doesn't support
shards nor being bound to more than one group, which is a second reason
for moving that code to its own function. Nothing was changed except
recreating the missing variables from the bind_conf itself (the fe only).
2023-04-23 09:46:15 +02:00
Willy Tarreau 4c538df28c CLEANUP: protocol: move the nb_receivers to plug a hole in protocol
This field forces an unaligned hole between two list heads. Let's move
it up where it will be more easily combined with other fields. In
addition, turn it into an unsigned int while it's still not used.
2023-04-23 09:46:15 +02:00
Willy Tarreau 798d6b4124 CLEANUP: protocol: move the l3_addrlen to plug a hole in proto_fam
There's a two-byte hole in proto_fam after sock_family, let's move the
l3_addrlen there as a ushort. Note that contrary to what the comment
says, it's still not used by hash algorithms, though it could be.
2023-04-23 09:46:15 +02:00
Willy Tarreau df4051cd58 BUILD: proto_tcp: export the correct names for proto_tcpv[46]
The exported names were not correct (missing the 'v').
2023-04-23 09:46:15 +02:00
Willy Tarreau 968a4f34fc BUILD: sock_inet: forward-declare struct receiver
Including sock_inet.h without receiver-t.h causes build failures due to
struct receiver not being defined. Let's just forward-declare it.
2023-04-23 09:46:15 +02:00
Ilya Shipitsin ccf8012f28 CLEANUP: assorted typo fixes in the code and comments
This is the 36th iteration of typo fixes.
2023-04-23 09:44:53 +02:00
Tim Duesterhus 3a8c63d48d MINOR: Make `tasklet_free()` safe to be called with `NULL`
Make this freeing function safe, like other freeing functions are as discussed
in GitHub issue #2126.
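
The usual NULL-safe pattern, sketched (internals simplified):

    void tasklet_free(struct tasklet *t)
    {
        if (!t)
            return;   /* freeing NULL is now a harmless no-op */
        /* ... detach from any run queue, then release ... */
        pool_free(pool_head_tasklet, t);
    }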
2023-04-23 00:28:25 +02:00
Willy Tarreau ff18504d73 MINOR: listener: make sure to avoid ABA updates in per-thread index
One limitation of the current thread index mechanism is that if the
values are assigned multiple times to the same thread and the index
loops, it can match the old value again, which will not prevent a
competing thread from finishing its CAS and assigning traffic to a
thread that's not the optimal one. The probability is low but the
solution is simple enough: implement an update counter in the high
bits of the index to force a mismatch in this case (assuming we don't
try to cover for extremely unlikely cases where the update counter
loops while the index remains equal). In order to improve the situation
a little bit, the index is now a ulong so that in 32 bits we have 8 bits
of counter and in 64 bits we have 40 bits.
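
A minimal sketch of the idea (bit widths from the 64-bit case above; the
macro and variable names are illustrative, not the actual code):

    #define IDX_BITS  24                       /* low bits: thread index */
    #define IDX_MASK  ((1UL << IDX_BITS) - 1)  /* high 40 bits: counter  */

    ulong old, new;

    do {
        old = HA_ATOMIC_LOAD(&idx);
        /* bump the counter so that even a recycled thread index cannot
         * make a stale value compare equal (no ABA) */
        new = ((old & ~IDX_MASK) + (1UL << IDX_BITS)) | (thr & IDX_MASK);
    } while (!HA_ATOMIC_CAS(&idx, &old, new));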
2023-04-21 17:41:26 +02:00
Willy Tarreau e6f5ab5afa MINOR: listener: make accept_queue index atomic
There has always been a race when checking the length of an accept queue
to determine which one is more loaded than another, because the head and
tail are read at two different moments. This is not required: we can merge
them as two 16 bit numbers inside a single 32-bit index that is always
accessed atomically. This way we read both values at once and always have
a consistent measurement.
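
A sketch of the packed index (field and macro names illustrative):

    /* head in the low 16 bits, tail in the high 16 bits: one atomic load
     * yields a consistent pair, hence a consistent queue length */
    unsigned int idx  = HA_ATOMIC_LOAD(&ring->idx);
    unsigned int head = idx & 0xffff;
    unsigned int tail = idx >> 16;
    unsigned int len  = (tail - head + ACCEPT_QUEUE_SIZE) % ACCEPT_QUEUE_SIZE;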
2023-04-21 17:41:26 +02:00
Willy Tarreau e4c36aa8a1 MINOR: receiver: add RX_F_MUST_DUP to indicate that an rx must be duped
The purpose of this new flag will be to mark that some listeners
duplicate their reference's FD instead of trying to setup a completely
new listener from scratch. This will be used when multiple groups want
to listen to the same socket, via multiple FDs.
2023-04-21 17:41:26 +02:00
Willy Tarreau aae1810b4d MINOR: receiver: add a struct shard_info to store info about each shard
In order to create multiple receivers for one multi-group shard, we'll
need some more info about the shard. Here we store:
  - the number of groups (= number of receivers)
  - the number of threads (will be used for accept LB)
  - pointer to the reference rx (to get the FD and to find all threads)
  - pointers to the other members (to iterate over all threads)
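
A hedged sketch of a struct matching the list above (field names inferred
from this description, not the actual layout):

    struct shard_info {
        unsigned int nbgroups;                  /* groups (= receivers)  */
        unsigned int nbthreads;                 /* threads, for accept LB */
        struct receiver *ref;                   /* reference rx: FD, threads */
        struct receiver *members[MAX_TGROUPS];  /* all members of the shard */
    };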

For now since there's only one group per shard it remains simple. The
listener deletion code already takes care of removing the current
member from its shards list and moving others' reference to the last
one if it was their reference (so as to avoid O(n^2) updates during
ordered deletes).

Since the vast majority of setups will not use multi-group shards, we
try to save memory usage by only allocating the shard_info when it is
needed, so the principle here is that a receiver with shard_info==NULL
is alone and doesn't share its socket with another group.

Various approaches were considered and tests show that the management
of the listeners during boot makes it easier to just attach to or
detach from a shard_info and automatically allocate it if it does not
exist, which is what is being done here.

For now the attach code is not called, but detach is already called
on delete.
2023-04-21 17:41:26 +02:00
Willy Tarreau 84fe1f479b MINOR: listener: support another thread dispatch mode: "fair"
This new algorithm for rebalancing incoming connections to multiple
threads is simpler: instead of considering the threads' load, it will
only cycle through all of them, offering a fair share of the traffic to
each thread. It may be well suited for short-lived connections but is
also convenient for very large thread counts, where it's not certain
that the least loaded thread will always be found.
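
A sketch of the "fair" pick (the counter name and surrounding code are
illustrative; the actual dispatch code is more involved):

    /* plain round-robin across the group's threads: ignore per-thread
     * load and hand each new connection to the next thread in turn */
    unsigned int t = HA_ATOMIC_FETCH_ADD(&l->rr_idx, 1) % nb_threads;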
2023-04-21 17:41:26 +02:00
Willy Tarreau 6a4d48b736 MINOR: quic_sock: index li->per_thr[] on local thread id, not global one
There's a li_per_thread array in each listener for use with QUIC
listeners. Since thread groups were introduced, this array can be
allocated too large because global.nbthread entries are allocated for each
listener, while no more than MIN(nbthread,MAX_THREADS_PER_GROUP)
may be used by a single listener. This was because the global thread
ID was used as the index instead of the local ID (since a listener may
only be used by a single group). Let's just switch to the local ID and
reduce the allocated size.
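
The change, in essence (ti->ltid being the group-local thread id; the ctx
variable is illustrative):

    /* before: sized and indexed by the global thread id */
    ctx = &li->per_thr[tid];
    /* after: sized by the group and indexed by the local id */
    ctx = &li->per_thr[ti->ltid];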
2023-04-21 17:41:26 +02:00
Willy Tarreau 77d37b07b1 MINOR: quic: support migrating the listener as well
When migrating a quic_conn to another thread, we may need to also
switch the listener if the thread belongs to another group. When
this happens, the freshly created connection will already have the
target listener, so let's just pick it from the connection and use
it in qc_set_tid_affinity(). Note that it will be the caller's
responsibility to guarantee this.
2023-04-21 17:41:26 +02:00
Aurelien DARRAGON 76e255520f MINOR: server: pass adm and op cause to srv_update_status()
Operational and administrative state change causes are not propagated
through srv_update_status(); instead, they are directly consumed within
the function to provide additional info during the call when required.

Thus, there is no valid reason for keeping adm and op causes within
the server struct. We are wasting space and keeping unneeded complexity.

We now explicitly pass the change type (operational or administrative) and
the associated cause to srv_update_status() so that no extra storage is
needed, since those values are only relevant to srv_update_status().
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON 1746b56e68 MINOR: server: change srv_op_st_chg_cause storage type
This one is greatly inspired by "MINOR: server: change adm_st_chg_cause storage type".

While looking at current srv_op_st_chg_cause usage, it was clear that
the struct needed some cleanup, since some leftovers from asynchronous
server state change updates were left behind, resulting in some useless
code duplication and making the whole thing harder to maintain.

Two observations were made:

- by tracking down srv_set_{running, stopped, stopping} usage,
  we can see that the <reason> argument is always a fixed statically
  allocated string.
- check-related state change context (duration, status, code...) is
  not used anymore since srv_append_status() directly extracts the
  values from the server->check. This is pure legacy from when
  the state changes were applied asynchronously.

To prevent code duplication and useless string copies, and to make the
reason/cause more exportable, we now store it as an enum and provide the
srv_op_st_chg_cause() function to fetch the related description string.
HEALTH and AGENT causes (check related) are now explicitly identified to
make consumers like srv_append_op_chg_cause() able to fetch checks info
from the server itself if they need to.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON f3b48a808e MINOR: server: srv_append_status refacto
srv_append_status() has become a swiss-knife function over time.
It is used from server code and also from checks code, with various
inputs and distinct code paths, making it very hard to guess the
actual behavior of the function (the resulting string output).

To simplify the logic behind it, we're dividing it into multiple contextual
functions that take simple inputs and do explicit things, making them
more predictable and easier to maintain.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON 9b1ccd7325 MINOR: server: change adm_st_chg_cause storage type
Even though it doesn't look like it at first glance, this is more like
a cleanup than an actual code improvement:

Given that srv->adm_st_chg_cause has been used to exclusively store
static strings ever since it was implemented, we make the choice to
store it as an enum instead of a fixed-size string within the server
struct.

This will allow us to save some space in the server struct, and will make
it more easily exportable (ie: to event handlers) because of the
reduced memory footprint during handling and the ability to later get
the corresponding human-readable message when it's explicitly needed.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON e9314fb7a7 MINOR: event_hdl: provide event->when for advanced handlers
For advanced async handlers only
(Registered using EVENT_HDL_ASYNC_TASK() macro):

event->when is provided as a struct timeval and fetched from 'date'
haproxy global variable.

Thanks to 'when', related event consumers will be able to timestamp
events, even if they don't work in real-time or near real-time.
Indeed, unlike sync or normal async handlers, advanced async handlers
could purposely delay the consumption of pending events, which means
that the date wouldn't be accurate if computed directly from within
the handler.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON ebf58e991a MINOR: event_hdl: dynamically allocated event data members
Add the ability to provide a cleanup function for event data passed
via the publishing function.

One use case could be the need to provide valid pointers in the safe
section of the data struct.
The cleanup function will be automatically called with data (or a copy of
data) as argument when all handlers have consumed the event, which provides
an easy way to release some memory or decrement refcounts to resources that
were provided through the data struct.
The data struct itself must not be freed by the cleanup function; that is
handled by the API.

This would allow passing large (allocated) data blocks through the data
struct while keeping data struct size under the EVENT_HDL_ASYNC_EVENT_DATA
size limit.

To do so, when publishing an event, where we would currently do:

        struct event_hdl_cb_data_new_family event_data;

        /* safe data, available from both sync and async contexts
	 * may not use pointers to short-living resources
	 */
        event_data.safe.my_custom_data = x;

        /* unsafe data, only available from sync contexts */
        event_data.unsafe.my_unsafe_data = y;

        /* once data is prepared, we can publish the event */
        event_hdl_publish(NULL,
                          EVENT_HDL_SUB_NEW_FAMILY_SUBTYPE_1,
                          EVENT_HDL_CB_DATA(&event_data));

We could do:

        struct event_hdl_cb_data_new_family event_data;

        /* safe data, available from both sync and async contexts
	 * may not use pointers to short-living resources,
	 * unless EVENT_HDL_CB_DATA_DM is used to ensure pointer
	 * consistency (ie: refcount)
	 */
        event_data.safe.my_custom_static_data = x;
	event_data.safe.my_custom_dynamic_data = malloc(1);

        /* unsafe data, only available from sync contexts */
        event_data.unsafe.my_unsafe_data = y;

        /* once data is prepared, we can publish the event */
        event_hdl_publish(NULL,
                          EVENT_HDL_SUB_NEW_FAMILY_SUBTYPE_1,
                          EVENT_HDL_CB_DATA_DM(&event_data, data_new_family_cleanup));

With the data_new_family_cleanup function, which would look like this:

      void data_new_family_cleanup(const void *data)
      {
              const struct event_hdl_cb_data_new_family *event_data = data;

              /* some data members require specific cleanup once the event
               * is consumed
               */
              free(event_data->safe.my_custom_dynamic_data);
              /* don't ever free data! it is not ours */
      }

Not sure if this feature will become relevant in the future, so I prefer not
to mention it in the doc for now.

But given that the implementation is trivial and does not put a burden
on the existing API, it's a good thing to have it there, just in case.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON 147691fd83 CLEANUP: event_hdl: fix comment typo about _sync assertion
Fixing a comment relating to the EVENT_HDL_ASSERT_SYNC macro, where a
typo was made and the comment was lacking some context.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON 363ef4daa7 CLEANUP: event_hdl: updating obsolete comment for EVENT_HDL_CB_DATA
EVENT_HDL_CB_DATA macro comments were not updated during the API
refactor, fixing that.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON 8273bfc639 BUG/MINOR: event_hdl: don't waste 1 event subtype slot
The ESUB_INDEX(n) index macro is used exclusively with n > 0.
Fix it so that it starts numbering at 1 instead of 2.

This way, we don't waste a subtype slot in event_hdl_sub_type
struct, and we comply with the structure comments about max
supported event subtypes (currently set at 16).

If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON a63f4903c9 MINOR: server/event_hdl: prepare for upcoming refactors
This commit does nothing that ought to be mentioned, except that
it adds missing comments and slightly moves some function calls
out of "sensitive" code in preparation of some server code refactors.
2023-04-21 14:36:45 +02:00
Aurelien DARRAGON d714213862 MINOR: server/event_hdl: add proxy_uuid to event_hdl_cb_data_server
Expose the proxy_uuid variable in the event_hdl_cb_data_server struct to
overcome the proxy_name fixed-length limitation.

proxy_uuid may be used by the handler to perform proxy lookups.
This should be preferred over lookups relying on proxy_name.
(proxy_name is suitable for printing / logging purposes but not for
ID lookups since it has a maximum fixed length)
2023-04-21 14:36:45 +02:00
Frédéric Lécaille 0ed94032b2 MINOR: quic: Do not allocate too many ack ranges
Limit the maximum number of ack ranges to QUIC_MAX_ACK_RANGES(32).

Must be backported to 2.6 and 2.7.
2023-04-19 11:36:54 +02:00
Frédéric Lécaille 4b2627beae BUG/MINOR: quic: Stop removing ACK ranges when building packets
Since this commit:

    BUG/MINOR: quic: Possible wrapped values used as ACK tree purging limit.

There are more chances that ack ranges may be removed from their trees when
building a packet. It is preferable to impose a limit on these trees. This
will be the subject of a next commit.

For now, it is sufficient to stop deleting ack ranges from their trees.
Remove quic_ack_frm_reduce_sz() and quic_rm_last_ack_ranges() which were
there to do that.
Make qc_frm_len() support ACK frames and call it to ensure an ACK frame
may be added to a packet before building it.

Must be backported to 2.6 and 2.7.
2023-04-19 11:36:54 +02:00
Aurelien DARRAGON 2a9764baae CLEANUP: hlua: avoid confusion between internal timers and tick based timers
Not all hlua "time" variables use the same time logic.

hlua->wake_time relies on ticks since it's meant to be used in conjunction
with task scheduling. Thus, it should be stored as a signed int and
manipulated using the tick API.
A few comments were added about that to prevent mixups with the hlua
internal timer API, which doesn't rely on the ticks API.
2023-04-19 11:03:31 +02:00
Aurelien DARRAGON da9503ca9a MEDIUM: hlua: reliable timeout detection
For non-yieldable lua handlers (converters, fetches or yield
incompatible lua functions), the current timeout detection relies on
the now_ms thread-local variable.

But within non-yieldable contexts, now_ms won't be updated unless we do
it ourselves (being momentarily stuck in the lua context, we won't
re-enter the polling loop, which is responsible for clock updates).

To circumvent this, clock_update_date(0, 1) was manually performed right
before now_ms is read for the timeout checks.

But this fails to work consistently because, if no other concurrent
thread periodically runs clock_update_global_date() (which does happen if
we're the only active thread, i.e. nbthread=1 or low traffic), our
clock_update_date() call won't reliably update our local now_ms variable.
Moreover, clock_update_date() is not the right tool for this anyway, as
it was initially meant to be used from the polling context.
Using it could have a negative impact on other threads relying on now_ms
to be stable (because clock_update_date() performs a global clock update
from time to time).

-> Introducing the hlua multipurpose timer, which is internally based on
now_cpu_time_fast() and provides per-thread consistent clock readings.

Thanks to this new hlua timer API, hlua timeout logic is less error-prone
and more robust.

This allows the timeout detection to work as expected for both yieldable
and non-yieldable lua handlers.

This patch depends on commit "MINOR: clock: add now_cpu_time_fast() function"

While this could theoretically be backported to all stable versions,
it is advisable to avoid backports unless we're confident enough
since it could cause slight behavior changes (timing related) in
existing setups.
2023-04-19 11:03:31 +02:00
Aurelien DARRAGON df188f145b MINOR: clock: add now_cpu_time_fast() function
Same as now_cpu_time(), but for fast queries (less accurate).
It relies on now_cpu_time(), with now_mono_time_fast() used
as a cache expiration hint to prevent now_cpu_time() from being
called too often, since it is known to be quite expensive.
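
A sketch of the caching scheme (the validity window and variable names are
assumptions, not the actual code):

    static THREAD_LOCAL uint64_t cached_cpu_time;
    static THREAD_LOCAL uint64_t cache_valid_until;

    uint64_t now_cpu_time_fast(void)
    {
        uint64_t mono = now_mono_time_fast();

        if (mono >= cache_valid_until) {
            /* cache expired: take one expensive precise reading */
            cached_cpu_time   = now_cpu_time();
            cache_valid_until = mono + 1000000ULL; /* ~1 ms, assumed */
        }
        return cached_cpu_time;
    }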

Depends on commit "MINOR: clock: add now_mono_time_fast() function"
2023-04-19 11:03:31 +02:00
Aurelien DARRAGON 07cbd8e074 MINOR: clock: add now_mono_time_fast() function
Same as now_mono_time(), but for fast queries (less accurate).
It relies on the coarse clock source (also known as the fast clock source
on some systems).

It falls back to now_mono_time() if the coarse source is not supported on
the system.
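
A minimal sketch assuming POSIX clock_gettime() and the Linux coarse clock
id:

    #include <time.h>

    uint64_t now_mono_time_fast(void)
    {
    #if defined(CLOCK_MONOTONIC_COARSE)
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC_COARSE, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
    #else
        /* coarse source not supported: precise source as fallback */
        return now_mono_time();
    #endif
    }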
2023-04-19 11:03:31 +02:00
Amaury Denoyelle 0783a7b08e MINOR: listener: remove unneeded local accept flag
Remove the receiver RX_F_LOCAL_ACCEPT flag. This was used by the QUIC
protocol before thread rebinding was supported by the quic_conn layer.

This should be backported up to 2.7 after the previous patch has also
been taken.
2023-04-18 17:09:34 +02:00
Amaury Denoyelle 739de3f119 MINOR: quic: properly finalize thread rebinding
When a quic_conn instance is rebound on a new thread, its tasks and
tasklet are destroyed and new ones created. Its socket is also migrated
to a new thread, which stops reception on it.

To properly reactivate a quic_conn after rebind, wake up its tasks and
tasklet if they were active before thread rebind. Also reactivate
reading on the socket FD. These operations are implemented in a new
function, qc_finalize_affinity_rebind().

This should be backported up to 2.7 after a period of observation.
2023-04-18 17:09:02 +02:00
Amaury Denoyelle 25174d51ef MEDIUM: quic: implement thread affinity rebinding
Implement a new function qc_set_tid_affinity(). This function is
responsible for rebinding a quic_conn instance to a new thread.

This operation consists mostly of releasing the existing tasks and tasklet
and allocating new instances on the new thread. If the quic_conn uses
its own socket, it is also migrated to the new thread. The migration
is finally completed by updating the CID TID to the new thread. After
this step, the connection is accessible to the new thread and cannot be
accessed anymore on the old one without risking a race condition.

To ensure rebinding is either done completely or not at all, tasks and
tasklet are pre-allocated before all operations. If this fails, an error
is returned and rebinding is not done.
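
The all-or-nothing pre-allocation, sketched (error handling simplified;
the variable names are illustrative):

    struct task *t = task_new_on(new_tid);
    struct tasklet *tl = tasklet_new();

    if (!t || !tl) {
        if (t)
            task_destroy(t);
        tasklet_free(tl);  /* NULL-safe, see the tasklet_free() commit above */
        return -1;         /* rebind refused: nothing was changed */
    }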

To destroy the older tasklet, its context is set to NULL before waking it
up. In I/O callbacks, a new function qc_process() is used to check the
context and free the tasklet if it is NULL.

The thread rebinding can cause a race condition if the older thread's
quic_dghdlrs::dgrams list contains datagrams for the connection after
rebinding is done. To prevent this, quic_rx_pkt_retrieve_conn() always
checks if the packet CID is still associated to the current thread or
not. In the latter case, no connection is returned and the new thread is
returned instead, to allow redispatching the datagram to the new thread
in a thread-safe way.

This should be backported up to 2.7 after a period of observation.
2023-04-18 17:08:34 +02:00
Amaury Denoyelle 1304d19dee MINOR: quic: delay post handshake frames after accept
When the QUIC handshake is completed on our side, some frames are prepared
to be sent:
* HANDSHAKE_DONE
* several NEW_CONNECTION_ID with CIDs allocated

This step was previously executed in quic_conn_io_cb() directly after
CRYPTO frames parsing. This patch delays it to be completed after
accept. Special care has been taken to ensure it is still functional
with 0-RTT activated.

For the moment, this patch should have no impact. However, when
quic_conn thread migration on accept is implemented, it will be
easier to remap only one CID to the new thread. New CIDs will be
allocated after migration on the new thread.

This should be backported up to 2.7 after a period of observation.
2023-04-18 17:08:28 +02:00
Amaury Denoyelle a66e04338e MINOR: protocol: define new callback set_affinity
Define a new protocol callback, set_affinity. This function is used
during listener_accept() to notify about a rebind on a new thread just
before pushing the connection onto the selected thread's queue. If the
callback fails, accept is done locally.

This change will be useful for protocols with state allocated before
accept is done. For the moment, only the QUIC protocol is concerned. This
will allow rebinding the quic_conn to a new thread depending on its
load.
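
A sketch of the call site in listener_accept() (the callback signature and
variable names shown here are assumed):

    /* give the protocol a chance to follow the thread change; on
     * failure, fall back to accepting on the local thread */
    if (proto->set_affinity && proto->set_affinity(conn, new_tid) < 0)
        new_tid = tid;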

This should be backported up to 2.7 after a period of observation.
2023-04-18 16:54:52 +02:00
Amaury Denoyelle 1e959ad522 MINOR: quic: remove TID encoding in CID
CIDs were moved from a per-thread list to a global list instance. The
TID encoding is thus not needed anymore.

This should be backported up to 2.7 after a period of observation.
2023-04-18 16:54:31 +02:00