Operational and administrative state change causes are not propagated
through srv_update_status(), instead they are directly consumed within
the function to provide additional info during the call when required.
Thus, there is no valid reason for keeping adm and op causes within
server struct. We are wasting space and keeping uneeded complexity.
We now exlicitly pass change type (operational or administrative) and
associated cause to srv_update_status() so that no extra storage is
needed since those values are only relevant from srv_update_status().
This one is greatly inspired by "MINOR: server: change adm_st_chg_cause storage type".
While looking at current srv_op_st_chg_cause usage, it was clear that
the struct needed some cleanup since some leftovers from asynchronous server
state change updates were left behind and resulted in some useless code
duplication, and making the whole thing harder to maintain.
Two observations were made:
- by tracking down srv_set_{running, stopped, stopping} usage,
we can see that the <reason> argument is always a fixed statically
allocated string.
- check-related state change context (duration, status, code...) is
not used anymore since srv_append_status() directly extracts the
values from the server->check. This is pure legacy from when
the state changes were applied asynchronously.
To prevent code duplication, useless string copies and make the reason/cause
more exportable, we store it as an enum now, and we provide
srv_op_st_chg_cause() function to fetch the related description string.
HEALTH and AGENT causes (check related) are now explicitly identified to
make consumers like srv_append_op_chg_cause() able to fetch checks info
from the server itself if they need to.
srv_append_status() has become a swiss-knife function over time.
It is used from server code and also from checks code, with various
inputs and distincts code paths, making it very hard to guess the
actual behavior of the function (resulting string output).
To simplify the logic behind it, we're dividing it in multiple contextual
functions that take simple inputs and do explicit things, making them
more predictable and easier to maintain.
Even though it doesn't look like it at first glance, this is more like
a cleanup than an actual code improvement:
Given that srv->adm_st_chg_cause has been used to exclusively store
static strings ever since it was implemented, we make the choice to
store it as an enum instead of a fixed-size string within server
struct.
This will allow to save some space in server struct, and will make
it more easily exportable (ie: event handlers) because of the
reduced memory footprint during handling and the ability to later get
the corresponding human-readable message when it's explicitly needed.
For advanced async handlers only
(Registered using EVENT_HDL_ASYNC_TASK() macro):
event->when is provided as a struct timeval and fetched from 'date'
haproxy global variable.
Thanks to 'when', related event consumers will be able to timestamp
events, even if they don't work in real-time or near real-time.
Indeed, unlike sync or normal async handlers, advanced async handlers
could purposely delay the consumption of pending events, which means
that the date wouldn't be accurate if computed directly from within
the handler.
Add the ability to provide a cleanup function for event data passed
via the publishing function.
One use case could be the need to provide valid pointers in the safe
section of the data struct.
Cleanup function will be automatically called with data (or copy of data)
as argument when all handlers consumed the event, which provides an easy
way to release some memory or decrement refcounts to ressources that were
provided through the data struct.
data in itself may not be freed by the cleanup function, it is handled
by the API.
This would allow passing large (allocated) data blocks through the data
struct while keeping data struct size under the EVENT_HDL_ASYNC_EVENT_DATA
size limit.
To do so, when publishing an event, where we would currently do:
struct event_hdl_cb_data_new_family event_data;
/* safe data, available from both sync and async contexts
* may not use pointers to short-living resources
*/
event_data.safe.my_custom_data = x;
/* unsafe data, only available from sync contexts */
event_data.unsafe.my_unsafe_data = y;
/* once data is prepared, we can publish the event */
event_hdl_publish(NULL,
EVENT_HDL_SUB_NEW_FAMILY_SUBTYPE_1,
EVENT_HDL_CB_DATA(&event_data));
We could do:
struct event_hdl_cb_data_new_family event_data;
/* safe data, available from both sync and async contexts
* may not use pointers to short-living resources,
* unless EVENT_HDL_CB_DATA_DM is used to ensure pointer
* consistency (ie: refcount)
*/
event_data.safe.my_custom_static_data = x;
event_data.safe.my_custom_dynamic_data = malloc(1);
/* unsafe data, only available from sync contexts */
event_data.unsafe.my_unsafe_data = y;
/* once data is prepared, we can publish the event */
event_hdl_publish(NULL,
EVENT_HDL_SUB_NEW_FAMILY_SUBTYPE_1,
EVENT_HDL_CB_DATA_DM(&event_data, data_new_family_cleanup));
With data_new_family_cleanup func which would look like this:
void data_new_family_cleanup(const void *data)
{
const struct event_hdl_cb_data_new_family *event_data = ptr;
/* some data members require specific cleanup once the event
* is consumed
*/
free(event_data.safe.my_custom_dynamic_data);
/* don't ever free data! it is not ours */
}
Not sure if this feature will become relevant in the future, so I prefer not
to mention it in the doc for now.
But given that the implementation is trivial and does not put a burden
on the existing API, it's a good thing to have it there, just in case.
ESUB_INDEX(n) index macro is used exclusively with n > 0
Fixing it so that it starts numbering at 1 instead of 2.
This way, we don't waste a subtype slot in event_hdl_sub_type
struct, and we comply with the structure comments about max
supported event subtypes (currently set at 16).
If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
This commit does nothing that ought to be mentioned, except that
it adds missing comments and slighty moves some function calls
out of "sensitive" code in preparation of some server code refactors.
Expose proxy_uuid variable in event_hdl_cb_data_server struct to
overcome proxy_name fixed length limitation.
proxy_uuid may be used by the handler to perform proxy lookups.
This should be preferred over lookups relying proxy_name.
(proxy_name is suitable for printing / logging purposes but not for
ID lookups since it has a maximum fixed length)
Since this commit:
BUG/MINOR: quic: Possible wrapped values used as ACK tree purging limit.
There are more chances that ack ranges may be removed from their trees when
building a packet. It is preferable to impose a limit to these trees. This
will be the subject of the a next commit to come.
For now on, it is sufficient to stop deleting ack range from their trees.
Remove quic_ack_frm_reduce_sz() and quic_rm_last_ack_ranges() which were
there to do that.
Make qc_frm_len() support ACK frames and calls it to ensure an ACK frame
may be added to a packet before building it.
Must be backported to 2.6 and 2.7.
Not all hlua "time" variables use the same time logic.
hlua->wake_time relies on ticks since its meant to be used in conjunction
with task scheduling. Thus, it should be stored as a signed int and
manipulated using the tick api.
Adding a few comments about that to prevent mixups with hlua internal
timer api which doesn't rely on the ticks api.
For non yieldable lua handlers (converters, fetches or yield
incompatible lua functions), current timeout detection relies on now_ms
thread local variable.
But within non-yieldable contexts, now_ms won't be updated if not by us
(because we're momentarily stuck in lua context so we won't
re-enter the polling loop, which is responsible for clock updates).
To circumvent this, clock_update_date(0, 1) was manually performed right
before now_ms is being read for the timeout checks.
But this fails to work consistently, because if no other concurrent
threads periodically run clock_update_global_date(), which do happen if
we're the only active thread (nbthread=1 or low traffic), our
clock_update_date() call won't reliably update our local now_ms variable
Moreover, clock_update_date() is not the right tool for this anyway, as
it was initially meant to be used from the polling context.
Using it could have negative impact on other threads relying on now_ms
to be stable. (because clock_update_date() performs global clock update
from time to time)
-> Introducing hlua multipurpose timer, which is internally based on
now_cpu_time_fast() that provides per-thread consistent clock readings.
Thanks to this new hlua timer API, hlua timeout logic is less error-prone
and more robust.
This allows the timeout detection to work as expected for both yieldable
and non-yieldable lua handlers.
This patch depends on commit "MINOR: clock: add now_cpu_time_fast() function"
While this could theorically be backported to all stable versions,
it is advisable to avoid backports unless we're confident enough
since it could cause slight behavior changes (timing related) in
existing setups.
Same as now_cpu_time(), but for fast queries (less accurate)
Relies on now_cpu_time() and now_mono_time_fast() is used
as a cache expiration hint to prevent now_cpu_time() from being
called too often since it is known to be quite expensive.
Depends on commit "MINOR: clock: add now_mono_time_fast() function"
Same as now_mono_time(), but for fast queries (less accurate)
Relies on coarse clock source (also known as fast clock source on
some systems).
Fallback to now_mono_time() if coarse source is not supported on the system.
Remove the receiver RX_F_LOCAL_ACCEPT flag. This was used by QUIC
protocol before thread rebinding was supported by the quic_conn layer.
This should be backported up to 2.7 after the previous patch has also
been taken.
When a quic_conn instance is rebinded on a new thread its tasks and
tasklet are destroyed and new ones created. Its socket is also migrated
to a new thread which stop reception on it.
To properly reactivate a quic_conn after rebind, wake up its tasks and
tasklet if they were active before thread rebind. Also reactivate
reading on the socket FD. These operations are implemented on a new
function qc_finalize_affinity_rebind().
This should be backported up to 2.7 after a period of observation.
Implement a new function qc_set_tid_affinity(). This function is
responsible to rebind a quic_conn instance to a new thread.
This operation consists mostly of releasing existing tasks and tasklet
and allocating new instances on the new thread. If the quic_conn uses
its owned socket, it is also migrated to the new thread. The migration
is finally completed with updated the CID TID to the new thread. After
this step, the connection is thus accessible to the new thread and
cannot be access anymore on the old one without risking race condition.
To ensure rebinding is either done completely or not at all, tasks and
tasklet are pre-allocated before all operations. If this fails, an error
is returned and rebiding is not done.
To destroy the older tasklet, its context is set to NULL before wake up.
In I/O callbacks, a new function qc_process() is used to check context
and free the tasklet if NULL.
The thread rebinding can cause a race condition if the older thread
quic_dghdlrs::dgrams list contains datagram for the connection after
rebinding is done. To prevent this, quic_rx_pkt_retrieve_conn() always
check if the packet CID is still associated to the current thread or
not. In the latter case, no connection is returned and the new thread is
returned to allow to redispatch the datagram to the new thread in a
thread-safe way.
This should be backported up to 2.7 after a period of observation.
When QUIC handshake is completed on our side, some frames are prepared
to be sent :
* HANDSHAKE_DONE
* several NEW_CONNECTION_ID with CIDs allocated
This step was previously executed in quic_conn_io_cb() directly after
CRYPTO frames parsing. This patch delays it to be completed after
accept. Special care have been taken to ensure it is still functional
with 0-RTT activated.
For the moment, this patch should have no impact. However, when
quic_conn thread migration on accept will be implemented, it will be
easier to remap only one CID to the new thread. New CIDs will be
allocated after migration on the new thread.
This should be backported up to 2.7 after a period of observation.
Define a new protocol callback set_affinity. This function is used
during listener_accept() to notify about a rebind on a new thread just
before pushing the connection on the selected thread queue. If the
callback fails, accept is done locally.
This change will be useful for protocols with state allocated before
accept is done. For the moment, only QUIC protocol is concerned. This
will allow to rebind the quic_conn to a new thread depending on its
load.
This should be backported up to 2.7 after a period of observation.
CIDs were moved from a per-thread list to a global list instance. The
TID-encoded is thus non needed anymore.
This should be backported up to 2.7 after a period of observation.
Previously, quic_connection_id were stored in a per-thread tree list.
Datagram were first dispatched to the correct thread using the encoded
TID before a tree lookup was done.
Remove these trees and replace it with a global trees list of 256
entries. A CID is using the list index corresponding to its first byte.
On datagram dispatch, CID is lookup on its tree and TID is retrieved
using new member quic_connection_id.tid. As such, a read-write lock
protects each list instances. With 256 entries, it is expected that
contention should be reduced.
A new structure quic_cid_tree served as a tree container associated with
its read-write lock. An API is implemented to ensure lock safety for
insert/lookup/delete operation.
This patch is a step forward to be able to break the affinity between a
CID and a TID encoded thread. This is required to be able to migrate a
quic_conn after accept to select thread based on their load.
This should be backported up to 2.7 after a period of observation.
Remove <tid> member in quic_conn. This is moved to quic_connection_id
instance.
For the moment, this change has no impact. Indeed, qc.tid reference
could easily be replaced by tid as all of this work was already done on
the connection thread. However, it is planified to support quic_conn
thread migration in the future, so removal of qc.tid will simplify this.
This should be backported up to 2.7.
ODCID are never stored in the CID tree. Instead, we store our generated
CID which is directly derived from the CID using a hash function. This
operation is done via quic_derive_cid().
Previously, generated CID was returned as a 64-bits integer. However,
this is cumbersome to convert as an array of bytes which is the most
common CID representation. Adjust this by modifying return type to a
quic_cid struct.
This should be backported up to 2.7.
qc_parse_hd_form() is the function used to parse the first byte of a
packet and return its type and version. Its API has been simplified with
the following changes :
* extra out paremeters are removed (long_header and version). All infos
are now stored directly in quic_rx_packet instance
* a new dummy version is declared in quic_versions array with a 0 number
code. This can be used to match Version negotiation packets.
* a new default packet type is defined QUIC_PACKET_TYPE_UNKNOWN to be
used as an initial value.
Also, the function has been exported to an include file. This will be
useful to be able to reuse on quic-sock to parse the first packet of a
datagram.
This should be backported up to 2.7.
Two different structs exists for QUIC connection ID :
* quic_connection_id which represents a full CID with its sequence
number
* quic_cid which is just a buffer with a length. It is contained in the
above structure.
To better differentiate them, rename all quic_connection_id variable
instances to "conn_id" by contrast to "cid" which is used for quic_cid.
This should be backported up to 2.7.
QUIC_LOCK label is never used. Indeed, lock usage is minimal on QUIC as
every connection is pinned to its owned thread.
This should be backported up to 2.7.
SC_FL_EOS flag is added to report the end-of-stream at the SC level. It will
be used to distinguish end of stream reported by the endoint, via the
SE_FL_EOS flag, and the abort triggered by the stream, via the
SC_FL_ABRT_DONE flag.
In this patch, the flag is defined and is systematically tested everywhere
SC_FL_ABRT_DONE is tested. It should be safe because it is never set.
The flag SC_FL_ERROR is added to ack errors on the endpoint. When
SE_FL_ERROR flag is detected on the SE descriptor, the corresponding is set
on the SC. Idea is to avoid, as far as possible, to manipulated the SE
descriptor in upper layers and know when an error in the endpoint is handled
by the SC.
For now, this flag is only set and cleared but never tested.
Here again, it is just a flag renaming. In SC flags, there is no longer
shutdown for reads but aborts. For now this flag is set when a read0 is
detected. It is of couse not accurate. This will be changed later.
After the flag renaming, it is now the turn for the channel function to be
renamed and moved in the SC scope. channel_shutw_now() is replaced by
sc_schedule_shutdown(). The request channel is replaced by the front SC and
the response is replace by the back SC.
Because shutowns for reads are now considered as aborts, the shudowns for
writes can now be considered as shutdowns. Here it is just a flag
renaming. SC_FL_SHUTW_NOW is renamed SC_FL_SHUT_WANTED.
After the flag renaming, it is now the turn for the channel function to be
renamed and moved in the SC scope. channel_shutr_now() is replaced by
sc_schedule_abort(). The request channel is replaced by the front SC and the
response is replace by the back SC.
Most of calls to channel_abort() are associated to a call to
channel_auto_close(). Others are in areas where the auto close is the
default. So, it is now systematically enabled when an abort is performed on
a channel, as part of channel_abort() function.
Add the number of packet losts and the maximum congestion control window computed
by the algorithms to "show quic".
Same thing for the traces of existent congestion control algorithms.
Must be backported to 2.7 and 2.6.
Instead of artificially setting the shards count to MAX_THREAD when
"by-thread" is used, let's reserve special values for symbolic names
so that we can add more in the future. For now we use value -1 for
"by-thread", which requires to turn the type to signed int but it was
already used as such everywhere anyway.
fd_migrate_on() can be used to migrate an existing FD to any thread, even
one belonging to a different group from the current one and from the
caller's. All that is needed is to make sure the FD is still valid when
the operation is performed (which is the case when such operations happen).
This is potentially slightly expensive since it locks the tgid during the
delicate operation, but it is normally performed only from an owning
thread to offer the FD to another one (e.g. reassign a better thread upon
accept()).
In order to permit to migrate FDs from one thread group to another,
we'll need to be able to set a TGID that is compatible with no other
thread group. Either we use a special value or we dedicate a special
bit. Given that we already have way more bits than needed, let's just
sacrifice the topmost one to serve as a lock bit, indicating the tgid
is not valid anymore. This will make all fd_grab_tgid() fail to grab
it.
The new fd_lock_tgid() function now tries to assign a locked tgid to
an idle FD, and fd_unlock_tgid() simply drops the lock bit, revealing
the target tgid.
For now it's still unused so it must not have any effect.
fd_claim_tgid() uses a CAS to set the desired TGID on the FD. It's only
called from fd_insert() where in the vast majority of cases, the tgid
and refcount are zero before the call. However the loop was optimized
for the case where it was equal to the desired TGID, systematically
causing one extra round in the loop there. Better start assuming a
zero value.
We're only checking for 0, 1, or >1 groups enabled there, and we'll soon
need to be more precise and know quickly which groups are non-empty.
Let's just replace the count with a mask of enabled groups. This will
allow to quickly spot the presence of any such group in a set.
Alert when the len argument of a stick table type contains incorrect
characters.
Replace atol by strtol.
Could be backported in every maintained versions.
Once in a while we introduce an sprintf() or strncat() function by
accident. These ones are particularly dangerous and must never ever
be used because the only way to use them safely is at least as
complicated if not more, than their safe counterparts. By redefining
a few of these functions with an attribute_warning() we can deliver a
message to the developer who is tempted to use them. This commit does
it for strcat(), strcpy(), strncat(), sprintf(), vsprintf(). More could
come later if needed, such as strtok() and maybe a few others, but these
are less common.
__attribute__((deprecated)) is convenient to discourage from using
something deprecated, but gcc >= 4.3 provides __attribute__((warning(x)))
that allows to display a specific warning if something is used. This is
particularly convenient to give indications when some API parts need to
be adapted. Let's just define it as a macro that falls back to the older
deprecated attribute when not available.
It's supported on clang 14 as well but works differently and errors
out when redefined (while the main purpose precisely is to add such a
redefinition). Thus instead on clang we use deprecated(msg) which is
OK. See https://github.com/llvm/llvm-project/issues/56519
It appeared that __has_attribute() doesn't work on gcc 4.4 and older
because the concatenation of __has_attribute##x isn't resolved as a one
before being passed to __equals_1() which immediately concatenates it to
comma_for_one. We first need to pass it through an extra layer to resolve
this name to a value. The new version was tested with gcc 4.2 to 11.3.
This may be backported though it's pretty minor.
Add code so that compression can be used for requests as well.
New compression keywords are introduced :
"direction" that specifies what we want to compress. Valid values are
"request", "response", or "both".
"type-req" and "type-res" define content-type to be compressed for
requests and responses, respectively. "type" is kept as an alias for
"type-res" for backward compatibilty.
"algo-req" specifies the compression algorithm to be used for requests.
Only one algorithm can be provided.
"algo-res" provides the list of algorithm that can be used to compress
responses. "algo" is kept as an alias for "algo-res" for backward
compatibility.
Make provision for being able to store both compression algorithms and
content-types to compress for both requests and responses. For now only
the responses one are used.
Make provision for storing the compression algorithm and the compression
context twice, one for requests, and the other for responses. Only the
response ones are used for now.
In the commit 2954bcc1e (BUG/MINOR: http-ana: Don't switch message to DATA
when waiting for payload), the HTTP message flags were extended and don't
fit anymore in an unsigned char. So, we must use an unsigned integer now. It
is not a big deal because there was already a 6-bytes hole in the structure,
just after the flags. Now, there are a 3-bytes hold before.
This patch should fix the issue #2105. It is 2.8-specific, no backport
needed.
Previously, ODCID were concatenated with the client address. This was
done to prevent a collision between two endpoints which used the same
ODCID.
Thanks to the two previous patches, first connection generated CID is
now directly derived from the client ODCID using a hash function which
uses the client source address from the same purpose. Thus, it is now
unneeded to concatenate client address to <odcid> quic-conn member.
This change allows to simplify the quic_cid structure management and
reduce its size which is important as it is embedded several times in
various structures such as quic_conn and quic_rx_packet.
This should be backported up to 2.7.
First connection CID generation has been altered. It is now directly
derived from client ODCID since previous commit :
commit 162baaff7a
MINOR: quic: derive first DCID from client ODCID
This patch removes the ODCID tree which is now unneeded. On connection
lookup via CID, if a DCID is not found the hash derivation is performed
for an INITIAL/0-RTT packet only. In case a client has used multiple
times an ODCID, this will allow to retrieve our generated DCID in the
CID tree without storing the ODCID node.
The impact of this two combined patch is that it may improve slightly
haproxy memory footprint by removing a tree node from quic_conn
structure. The cpu calculation induced by hash derivation should only be
performed only a few times per connection as the client will start to
use our generated CID as soon as it received it.
This should be backported up to 2.7.
HTTP_MSGF_EXPECT_CHECKED is now set on the request message to know the
"Expect: " header was already handled, if any. The flag is set from the
moment we try to handle the header to send a "100-continue" response,
whether it was found or not.
This way, when we are waiting for the request payload, thanks to this flag,
we only try to handle "Expect: " header only once. Before it was performed
by changing the message state from BODY to DATA. But this has some side
effects and it is no accurate. So, it is better to rely on a flag to do so.
Now that the event handler API is pretty mature, we can expose it in
the lua API.
Introducing the core.event_sub(<event_types>, <cb>) lua function that
takes an array of event types <event_types> as well as a callback
function <cb> as argument.
The function returns a subscription <sub> on success.
Subscription <sub> allows you to manage the subscription from anywhere
in the script.
To this day only the sub->unsub method is implemented.
The following event types are currently supported:
- "SERVER_ADD": when a server is added
- "SERVER_DEL": when a server is removed from haproxy
- "SERVER_DOWN": server states goes from up to down
- "SERVER_UP": server states goes from down to up
As for the <cb> function: it will be called when one of the registered
event types occur. The function will be called with 3 arguments:
cb(<event>,<data>,<sub>)
<event>: event type (string) that triggered the function.
(could be any of the types used in <event_types> when registering
the subscription)
<data>: data associated with the event (specific to each event family).
For "SERVER_" family events, server details such as server name/id/proxy
will be provided.
If the server still exists (not yet deleted), a reference to the live
server is provided to spare you from an additionnal lookup if you need
to have direct access to the server from lua.
<sub> refers to the subscription. In case you need to manage it from
within an event handler.
(It refers to the same subscription that the one returned from
core.event_sub())
Subscriptions are per-thread: the thread that will be handling the
event is the one who performed the subscription using
core.event_sub() function.
Each thread treats events sequentially, it means that if you have,
let's say SERVER_UP, then SERVER_DOWN in a short timelapse, then your
cb function will first be called with SERVER_UP, and once you're done
handling the event, your function will be called again with SERVER_DOWN.
This is to ensure event consitency when it comes to logging / triggering
logic from lua.
Your lua cb function may yield if needed, but you're pleased to process
the event as fast as possible to prevent the event queue from growing up
To prevent abuses, if the event queue for the current subscription goes
over 100 unconsumed events, the subscription will pause itself
automatically for as long as it takes for your handler to catch up.
This would lead to events being missed, so a warning will be emitted in
the logs to inform you about that. This is not something you want to let
happen too often, it may indicate that you subscribed to an event that
is occurring too frequently or/and that your callback function is too
slow to keep up the pace and you should review it.
If you want to do some parallel processing because your callback
functions are slow: you might want to create subtasks from lua using
core.register_task() from within your callback function to perform the
heavy job in a dedicated task and allow remaining events to be processed
more quickly.
Please check the lua documentation for more information.
Adding alternative findserver() functions to be able to perform an
unique match based on name or puid and by leveraging revision id (rid)
to make sure the function won't match with a new server reusing the
same name or puid of the "potentially deleted" server we were initially
looking for.
For example, if you were in the position of finding a server based on
a given name provided to you by a different context:
Since dynamic servers were implemented, between the time the name was
picked and the time you will perform the findserver() call some dynamic
server deletion/additions could've been performed in the mean time.
In such cases, findserver() could return a new server that re-uses the
name of a previously deleted server. Depending on your needs, it could
be perfectly fine, but there are some cases where you want to lookup
the original server that was provided to you (if it still exists).
While working on event handling from lua, the need for a pause/resume
function to temporarily disable a subscription was raised.
We solve this by introducing the EHDL_SUB_F_PAUSED flag for
subscriptions.
The flag is set via _pause() and cleared via _resume(), and it is
checked prior to notifying the subscription in publish function.
Pause and Resume functions are also available for via lookups for
identified subscriptions.
If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
Use event_hdl_async_equeue_size() in advanced async task handler to
get the near real-time event queue size.
By near real-time, you should understand that the queue size is not
updated during element insertion/removal, but shortly before insertion
and shortly after removal, so the size should reflect the approximate
queue size at a given time but should definitely not be used as a
unique source of truth.
If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
Add event_hdl_async_equeue_isempty() to check is the event queue is
empty from an advanced async task handler.
If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
advanced async mode (EVENT_HDL_ASYNC_TASK) provided full support for
custom tasklets registration.
Due to the similarities between tasks and tasklets, it may be useful
to use the advanced mode with an existing task (not a tasklet).
While the API did not explicitly disallow this usage, things would
get bad if we try to wakeup a task using tasklet_wakeup() for notifying
the task about new events.
To make the API support both custom tasks and tasklets, we use the
TASK_IS_TASKLET() macro to call the proper waking function depending
on the task's type:
- For tasklets: we use tasklet_wakeup()
- For tasks: we use task_wakeup()
If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
soft-stop was not explicitly handled in event_hdl API.
Because of this, event_hdl was causing some leaks on deinit paths.
Moreover, a task responsible for handling events could require some
additional cleanups (ie: advanced async task), and as the task was not
protected against abort when soft-stopping, such cleanup could not be
performed unless the task itself implements the required protections,
which is not optimal.
Consider this new approach:
'jobs' global variable is incremented whenever an async subscription is
created to prevent the related task from being aborted before the task
acknowledges the final END event.
Once the END event is acknowledged and freed by the task, the 'jobs'
variable is decremented, and the deinit process may continue (including
the abortion of remaining tasks not guarded by the 'jobs' variable).
To do this, a new global mt_list is required: known_event_hdl_sub_list
This list tracks the known (initialized) subscription lists within the
process.
sub_lists are automatically added to the "known" list when calling
event_hdl_sub_list_init(), and are removed from the list with
event_hdl_sub_list_destroy().
This allows us to implement a global thread-safe event_hdl deinit()
function that is automatically called on soft-stop thanks to signal(0).
When event_hdl deinit() is initiated, we simply iterate against the known
subscription lists to destroy them.
event_hdl_subscribe_ptr() was slightly modified to make sure that a sub_list
may not accept new subscriptions once it is destroyed (removed from the
known list)
This can occur between the time the soft-stop is initiated (signal(0)) and
haproxy actually enters in the deinit() function (once tasks are either
finished or aborted and other threads already joined).
It is safe to destroy() the subscription list multiple times as long
as the pointer is still valid (ie: first on soft-stop when handling
the '0' signal, then from regular deinit() path): the function does
nothing if the subscription list is already removed.
We partially reverted "BUG/MINOR: event_hdl: make event_hdl_subscribe thread-safe"
since we can use parent mt_list locking instead of a dedicated lock to make
the check gainst duplicate subscription ID.
(insert_lock is not useful anymore)
The check in itself is not changed, only the locking method.
sizeof(event_hdl_sub_list) slightly increases: from 24 bits to 32bits due
to the additional mt_list struct within it.
With that said, having thread-safe list to store known subscription lists
is a good thing: it could help to implement additional management
logic for subcription lists and could be useful to add some stats or
debugging tools in the future.
If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
event_hdl_sub_list_init() and event_hdl_sub_list_destroy() don't expect
to be called with a NULL argument (to use global subscription list
implicitly), simply because the global subscription list init and
destroy is internally managed.
Adding BUG_ON() to detect such invalid usages, and updating some comments
to prevent confusion around these functions.
If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
List insertion in event_hdl_subscribe() was not thread-safe when dealing
with unique identifiers. Indeed, in this case the list insertion is
conditional (we check for a duplicate, then we insert). And while we're
using mt lists for this, the whole operation is not atomic: there is a
race between the check and the insertion.
This could lead to the same ID being registered multiple times with
concurrent calls to event_hdl_subscribe() on the same ID.
To fix this, we add 'insert_lock' dedicated lock in the subscription
list struct. The lock's cost is nearly 0 since it is only used when
registering identified subscriptions and the lock window is very short:
we only guard the duplicate check and the list insertion to make the
conditional insertion "atomic" within a given subscription list.
This is the only place where we need the lock: as soon as the item is
properly inserted we're out of trouble because all other operations on
the list are already thread-safe thanks to mt lists.
A new lock hint is introduced: LOCK_EHDL which is dedicated to event_hdl
The patch may seem quite large since we had to rework the logic around
the subscribe function and switch from simple mt_list to a dedicated
struct wrapping both the mt_list and the insert_lock for the
event_hdl_sub_list type.
(sizeof(event_hdl_sub_list) is now 24 instead of 16)
However, all the changes are internal: we don't break the API.
If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
rid is stored as a uint32_t within struct server, but it was stored as
a signed int within the server event data struct.
Switching from signed int to uint32_t in event_hdl_cb_data_server struct
to make sure it won't overflow.
If 129ecf441 ("MINOR: server/event_hdl: add support for SERVER_ADD and SERVER_DEL events")
is being backported, then this commit should be backported with it.
This patch proposes to enumerate servers using internal HAProxy list.
Also, remove the flag SRV_F_NON_PURGEABLE which makes the server non
purgeable each time Lua uses the server.
Removing reg-tests/cli_delete_server_lua.vtc since this test is no
longer relevant (we don't set the SRV_F_NON_PURGEABLE flag anymore)
and we already have a more generic test:
reg-tests/server/cli_delete_server.vtc
Co-authored-by: Aurelien DARRAGON <adarragon@haproxy.com>
When HAproxy is loaded with a lot of frontends/backends (tested with 300k),
it is slow to start and it uses a lot of memory just for indexing backends
in the lua tables.
This patch uses the internal frontend/backend index of HAProxy in place of
lua table.
HAProxy startup is now quicker as each frontend/backend object is created
on demand and not at init.
This has to come with some cost: the execution of Lua will be a little bit
slower.
Two lua init function seems to return something useful, but it
is not the case. The function "hlua_concat_init" seems to return
a failure status, but the function never fails. The function
"hlua_fcn_reg_core_fcn" seems to return a number of elements in
the stack, but it is not the case.
We recently discovered a bug which affects dynamic server deletion:
When a server is deleted, it is removed from the "visible" server list.
But as we've seen in previous commit
("MINOR: server: add SRV_F_DELETED flag"), it can still be accessed by
someone who keeps a reference on it (waiting for the final srv_drop()).
Throughout this transient state, server ptr is still valid (may be
dereferenced) and the flag SRV_F_DELETED is set.
However, as the server is not part of server list anymore, we have
an issue: srv->next pointer won't be updated anymore as the only place
where we perform such update is in cli_parse_delete_server() by
iterating over the "visible" server list.
Because of this, we cannot guarantee that a server with the
SRV_F_DELETED flag has a valid 'next' ptr: 'next' could be pointing
to a fully removed (already freed) server.
This problem can be easily demonstrated with server dumping in
the stats:
server list dumping is performed in stats_dump_proxy_to_buffer()
The function can be interrupted and resumed later by design.
ie: output buffer is full: partial dump and finish the dump after
the flush
This is implemented by calling srv_take() on the server being dumped,
and only releasing it when we're done with it using srv_drop().
(drop can be delayed after function resume if buffer is full)
While the function design seems OK, it works with the assumption that
srv->next will still be valid after the function resumes, which is
not true. (especially if multiple servers are being removed in between
the 2 dumping attempts)
In practice, this did not cause any crash yet (at least this was not
reported so far), because server dumping is so fast that it is very
unlikely that multiple server deletions make their way between 2
dumping attempts in most setups. But still, this is a problem that we
need to address because some upcoming work might depend on this
assumption as well and for the moment it is not safe at all.
========================================================================
Here is a quick reproducer:
With this patch, we're creating a large deletion window of 3s as soon
as we reach a server named "t2" while iterating over the list.
This will give us plenty of time to perform multiple deletions before
the function is resumed.
| diff --git a/src/stats.c b/src/stats.c
| index 84a4f9b6e..15e49b4cd 100644
| --- a/src/stats.c
| +++ b/src/stats.c
| @@ -3189,11 +3189,24 @@ int stats_dump_proxy_to_buffer(struct stconn *sc, struct htx *htx,
| * Temporarily increment its refcount to prevent its
| * anticipated cleaning. Call free_server to release it.
| */
| + struct server *orig = ctx->obj2;
| for (; ctx->obj2 != NULL;
| ctx->obj2 = srv_drop(sv)) {
|
| sv = ctx->obj2;
| + printf("sv = %s\n", sv->id);
| srv_take(sv);
| + if (!strcmp("t2", sv->id) && orig == px->srv) {
| + printf("deletion window: 3s\n");
| + thread_idle_now();
| + thread_harmless_now();
| + sleep(3);
| + thread_harmless_end();
| +
| + thread_idle_end();
| +
| + goto full; /* simulate full buffer */
| + }
|
| if (htx) {
| if (htx_almost_full(htx))
| @@ -4353,6 +4366,7 @@ static void http_stats_io_handler(struct appctx *appctx)
| struct channel *res = sc_ic(sc);
| struct htx *req_htx, *res_htx;
|
| + printf("http dump\n");
| /* only proxy stats are available via http */
| ctx->domain = STATS_DOMAIN_PROXY;
|
Ok, we're ready, now we start haproxy with the following conf:
global
stats socket /tmp/ha.sock mode 660 level admin expose-fd listeners thread 1-1
nbthread 2
frontend stats
mode http
bind *:8081 thread 2-2
stats enable
stats uri /
backend farm
server t1 127.0.0.1:1899 disabled
server t2 127.0.0.1:18999 disabled
server t3 127.0.0.1:18998 disabled
server t4 127.0.0.1:18997 disabled
And finally, we execute the following script:
curl localhost:8081/stats&
sleep .2
echo "del server farm/t2" | nc -U /tmp/ha.sock
echo "del server farm/t3" | nc -U /tmp/ha.sock
This should be enough to reveal the issue, I easily manage to
consistently crash haproxy with the following reproducer:
http dump
sv = t1
http dump
sv = t1
sv = t2
deletion window = 3s
[NOTICE] (2940566) : Server deleted.
[NOTICE] (2940566) : Server deleted.
http dump
sv = t2
sv = �����U
[1] 2940566 segmentation fault (core dumped) ./haproxy -f ttt.conf
========================================================================
To fix this, we add prev_deleted mt_list in server struct.
For a given "visible" server, this list will contain the pending
"deleted" servers references that point to it using their 'next' ptr.
This way, whenever this "visible" server is going to be deleted via
cli_parse_delete_server() it will check for servers in its
'prev_deleted' list and update their 'next' pointer so that they no
longer point to it, and then it will push them in its
'next->prev_deleted' list to transfer the update responsibility to the
next 'visible' server (if next != NULL).
Then, following the same logic, the server about to be removed in
cli_parse_delete_server() will push itself as well into its
'next->prev_deleted' list (if next != NULL) so that it may still use its
'next' ptr for the time it is in transient removal state.
In srv_drop(), right before the server is finally freed, we make sure
to remove it from the 'next->prev_deleted' list so that 'next' won't
try to perform the pointers update for this server anymore.
This has to be done atomically to prevent 'next' srv from accessing a
purged server.
As a result:
for a valid server, either deleted or not, 'next' ptr will always
point to a non deleted (ie: visible) server.
With the proposed fix, and several removal combinations (including
unordered cli_parse_delete_server() and srv_drop() calls), I cannot
reproduce the crash anymore.
Example tricky removal sequence that is now properly handled:
sv list: t1,t2,t3,t4,t5,t6
ops:
take(t2)
del(t4)
del(t3)
del(t5)
drop(t3)
drop(t4)
drop(t5)
drop(t2)
Set the SRV_F_DELETED flag when server is removed from the cli.
When removing a server from the cli (in cli_parse_delete_server()),
we update the "visible" server list so that the removed server is no
longer part of the list.
However, despite the server being removed from "visible" server list,
one could still access the server data from a valid ptr (ie: srv_take())
Deleted flag helps detecting when a server is in transient removal
state: that is, removed from the list, thus not visible but not yet
purged from memory.
The purpose of this patch is only a one-to-one replacement, as far as
possible.
CF_SHUTR(_NOW) and CF_SHUTW(_NOW) flags are now carried by the
stream-connecter. CF_ prefix is replaced by SC_FL_ one. Of course, it is not
so simple because at many places, we were testing if a channel was shut for
reads and writes in same time. To do the same, shut for reads must be tested
on one side on the SC and shut for writes on the other side on the opposite
SC. A special care was taken with process_stream(). flags of SCs must be
saved to be able to detect changes, just like for the channels.
If OPENSSL_NO_DEPRECATED is set, we get a 'error: ‘RSA_PKCS1_PADDING’
undeclared' when building jwt.c. The symbol is not deprecated, we are
just missing an include.
This was raised in GitHub issue #2098.
It does not need to be backported.
This very old bug is there since the first implementation of newreno congestion
algorithm implementation. This was a very bad idea to put a state variable
into quic_cc_algo struct which only defines the congestion control algorithm used
by a QUIC listener, typically its type and its callbacks.
This bug could lead to crashes since BUG_ON() calls have been added to each algorithm
implementation. This was revealed by interop test, but not very often as there was
not very often several connections run at the time during these tests.
Hopefully this was also reported by Tristan in GH #2095.
Move the congestion algorithm state to the correct structures which are private
to a connection (see cubic and nr structs).
Must be backported to 2.7 and 2.6.
This algorithm does nothing except initializing the congestion control window
to a fixed value. Very smart!
Modify the QUIC congestion control configuration parser to support this new
algorithm. The congestion control algorithm must be set as follows:
quic-cc-algo nocc-<cc window size(KB))
For instance if "nocc-15" is provided as quic-cc-algo keyword value, this
will set a fixed window of 15KB.
Reuse the idle timeout task to delay the acknowledgments. The time of the
idle timer expiration is for now on stored in ->idle_expire. The one to
trigger the acknowledgements is stored in ->ack_expire.
Add QUIC_FL_CONN_ACK_TIMER_FIRED new connection flag to mark a connection
as having its acknowledgement timer been triggered.
Modify qc_may_build_pkt() to prevent the sending of "ack only" packets and
allows the connection to send packet when the ack timer has fired.
It is possible that acks are sent before the ack timer has triggered. In
this case it is cancelled only if ACK frames are really sent.
The idle timer expiration must be set again when the ack timer has been
triggered or when it is cancelled.
Must be backported to 2.7.
Dump variables displayed by TRACE_ENTER() or TRACE_LEAVE() by calls to TRACE_PROTO().
No more variables are displayed by the two former macros. For now on, these information
are accessible from proto level.
Add new calls to TRACE_PROTO() at important locations in relation whith QUIC transport
protocol.
When relevant, try to prefix such traces with TX or RX keyword to identify the
concerned subpart (transmission or reception) of the protocol.
Must be backported to 2.7.
As now_ms may wrap, one must use the ticks API to protect the cubic congestion
control algorithm implementation from side effects due to this.
Furthermore to make the cubic congestion control algorithm more readable and easy
to maintain, adding a new state ("in recovery period" QUIC_CC_ST_RP new enum) helps
in reaching this goal. Implement quic_cc_cubic_rp_cb() which is the callback for
this new state.
Must be backported to 2.7 and 2.6.
When a proxy enters the STOPPED state, it will no longer accept new
connections.
However, it doesn't mean that it's completely inactive yet: it will
still be able to handle already pending / keep-alive connections,
thus finishing ongoing work before effectively stopping.
be_usable_srv(), which is used by nbsrv converter and sample fetch,
will return 0 if the proxy is either stopped or disabled.
nbsrv behaves this way since it was originally implemented in b7e7c4720
("MINOR: Add nbsrv sample converter").
(Since then, multiple refactors were performed around this area, but
the current implementation still follows the same logic)
It was found that if nbsrv is used in a proxy section to perform routing
logic, unexpected decisions are being made when nbsrv is used on a proxy
with STOPPED state, since in-flight requests will suffer from nbsrv
returning 0 instead of the current number of usable servers which may
still process existing connections.
For instance, this can happen during process soft-stop, or even when
stopping the proxy from the cli / lua.
To fix this: we now make sure be_usable_srv() always returns the
current number of usable servers, unless the proxy is explicitly
disabled (from the config, not at runtime)
This could be backported up to 2.6.
For older versions, the need for a backport should be evaluated first.
--
Note for 2.4: proxy flags did not exist, it was implemented with fd10ab5e
("MINOR: proxy: Introduce proxy flags to replace disabled bitfield")
For 2.2: STOPPED and DISABLED states were not separated, so we have no
easy way to apply the fix anyway.
This commit adds a new argument to smp_fetch_url_param
that makes the parameter key comparison case-insensitive.
Several levels of callers were modified to pass this info.
Since eb77824 ("MEDIUM: proxy: remove the deprecated "grace" keyword"),
stop_time is never set, so the related code in manage_proxy() is not
relevant anymore.
Removing code that refers to p->stop_time, since it was probably
overlooked.
This patch follows this commit which was not sufficient:
BUG/MINOR: quic: Missing STREAM frame data pointer updates
Indeed, after updating the ->offset field, the bit which informs the
frame builder of its presence must be systematically set.
This bug was revealed by the following BUG_ON() from
quic_build_stream_frame() :
bug condition "!!(frm->type & 0x04) != !!stream->offset.key" matched at src/quic_frame.c:515
This should fix the last crash occured on github issue #2074.
Must be backported to 2.6 and 2.7.
Instead of reporting the inaccurate "malloc_trim() support" on -vv, let's
report the case where the memory allocator was actively replaced from the
one used at build time, as this is the corner case we want to be cautious
about. We also put a tainted bit when this happens so that it's possible
to detect it at run time (e.g. the user might have inherited it from an
environment variable during a reload operation).
The now unused is_trim_enabled() function was finally dropped.
As reported by Miroslav in commit d8a97d8f6 ("BUG/MINOR: illegal use of
the malloc_trim() function if jemalloc is used") there are still occasional
cases where it's discovered that malloc_trim() is being used without its
suitability being checked first. This is a problem when using another
incompatible allocator. But there's a class of use cases we'll never be
able to cover, it's dynamic libraries loaded from Lua. In order to address
this more reliably, we now define our own malloc_trim() that calls the
previous one after checking that the feature is supported and that the
allocator is the expected one. This way child libraries that would call
it will also be safe.
The function is intentionally left defined all the time so that it will
be possible to clean up some code that uses it by removing ifdefs.
In the event that HAProxy is linked with the jemalloc library, it is still
shown that malloc_trim() is enabled when executing "haproxy -vv":
..
Support for malloc_trim() is enabled.
..
It's not so much a problem as it is that malloc_trim() is called in the
pat_ref_purge_range() function without any checking.
This was solved by setting the using_default_allocator variable to the
correct value in the detect_allocator() function and before calling
malloc_trim() it is checked whether the function should be called.
Building without thread support was broken in 2.8-dev2 with commit
7e70bfc8c ("MINOR: threads: add a thread_harmless_end() version that
doesn't wait") that forgot to define the function for the threadless
cases. No backport is needed.
b_alloc() is used to allocate a buffer. We can provoke fault injection
based on forced memory allocation failures using -dMfail on the command
line, but we know that the buffer_wait list is a bit weak and doesn't
always recover well. As such, submitting buffer allocation to such a
treatment seriously limits the usefulness of -dMfail which cannot really
be used for other purposes. Let's just disable it for buffers for now.
Despite having replaced the SSL BIOs to use our own raw_sock layer, we
still didn't exploit the CO_SFL_MSG_MORE flag which is pretty useful to
avoid sending incomplete packets. It's particularly important for SSL
since the extra overhead almost guarantees that each send() will be
followed by an incomplete (and often odd-sided) segment.
We already have an xprt_st set of flags to pass info to the various
layers, so let's just add a new one, SSL_SOCK_SEND_MORE, that is set
or cleared during ssl_sock_from_buf() to transfer the knowledge of
CO_SFL_MSG_MORE. This way we can recover this information and pass
it to raw_sock.
This alone is sufficient to increase by ~5-10% the H2 bandwidth over
SSL when multiple streams are used in parallel.
This patch follows this one which was not sufficient:
"BUG/MINOR: quic: Missing STREAM frame length updates"
Indeed, it is not sufficient to update the ->len and ->offset member
of a STREAM frame to move it forward. The data pointer must also be updated.
This is not done by the STREAM frame builder.
Must be backported to 2.6 and 2.7.
It's cheaper and cleaner than using br_count()==1 given that it just compares
two indexes, and that a ring having a single buffer is in a special case where
it is between empty and used up-to-1. In other words it's not congested.
The commit 5e1b0e7bf ("BUG/MEDIUM: connection: Clear flags when a conn is
removed from an idle list") introduced a regression. CO_FL_SAFE_LIST and
CO_FL_IDLE_LIST flags are used when the connection is released to properly
decrement used/idle connection counters. if a connection is idle, these
flags must be preserved till the connection is really released. It may be
removed from the list but not immediately released. If these flags are lost
when it is finally released, the current number of used connections is
erroneously decremented. If means this counter may become negative and the
counters tracking the number of idle connecitons is not decremented,
suggesting a leak.
So, the above commit is reverted and instead we improve a bit the way to
detect an idle connection. The function conn_get_idle_flag() must now be
used to know if a connection is in an idle list. It returns the connection
flag corresponding to the idle list if the connection is idle
(CO_FL_SAFE_LIST or CO_FL_IDLE_LIST) or 0 otherwise. But if the connection
is scheduled to be removed, 0 is also returned, regardless the connection
flags.
This new function is used when the connection is temporarily removed from
the list to be used, mainly in muxes.
This patch should fix#2078 and #2057. It must be backported as far as 2.2.
Instead of having a dedicated httpclient instance and its own code
decorrelated from the actual auto update one, the "update ssl
ocsp-response" will now use the update task in order to perform updates.
Since the cli command allows to update responses that were never
included in the auto update tree, a new flag was added to the
certificate_ocsp structure so that the said entry can be inserted into
the tree "by hand" and it won't be reinserted back into the tree after
the update process is performed. The 'update_once' flag "stole" a bit
from the 'fail_count' counter since it is the one less likely to reach
UINT_MAX among the ocsp counters of the certificate_ocsp structure.
This new logic required that every certificate_ocsp entry contained all
the ocsp-related information at all time since entries that are not
supposed to be configured automatically can still be updated through the
cli. The logic of the ssl_sock_load_ocsp was changed accordingly.
This one is printed as the iocb in the "show fd" output, and arguably
this wasn't very convenient as-is:
293 : st=0x000123(cl heopI W:sRa R:sRA) ref=0 gid=1 tmask=0x8 umask=0x0 prmsk=0x8 pwmsk=0x0 owner=0x7f488487afe0 iocb=0x50a2c0(main+0x60f90)
Let's unstatify it and export it so that the symbol can now be resolved
from the various points that need it.
In environments where SYSTEM_MAXCONN is defined when compiling, the
master will use this value instead of the original minimal value which
was set to 100. When this happens, the master process could allocate
RAM excessively since it does not need to have an high maxconn. (For
example if SYSTEM_MAXCONN was set to 100000 or more)
This patch fixes the issue by using the new define MASTER_MAXCONN which
define a default maxconn of 100 for the master process.
Must be backported as far as 2.5.
As mentioned in commit 237e6a0d6 ("BUG/MAJOR: fd/thread: fix race between
updates and closing FD"), a race was found during stress tests involving
heavy backend connection reuse with many competing closes.
Here the problem is complex. The analysis in commit f69fea64e ("MAJOR:
fd: get rid of the DWCAS when setting the running_mask") that removed
the DWCAS in 2.5 overlooked a few races.
First, a takeover from thread1 could happen just after fd_update_events()
in thread2 validates it holds the tmask bit in the CAS loop. Since thread1
releases running_mask after the operation, thread2 will succeed the CAS
and both will believe the FD is theirs. This does explain the occasional
crashes seen with h1_io_cb() being called on a bad context, or
sock_conn_iocb() seeing conn->subs vanish after checking it. This issue
can be addressed using a DWCAS in both fd_takeover() and fd_update_events()
as it was before the patch above but this is not portable to all archs and
is not easy to adapt for those lacking it, due to some operations still
happening only on individual masks after the thread groups were added.
Second, the checks after fd_clr_running() for the current thread being
the last one is not sufficient: at the exact moment the operation
completes, another thread may also set and drop the running bit and see
itself as alone, and both can call _fd_close_orphan() in parallel. In
order to prevent this from happening, we cannot rely on the absence of
others, we need an explicit flag indicating that the FD must be closed.
One approach that was attempted consisted in playing with the thread_mask
but that was not reliable since it could still match between the late
deletion and the early insertion that follows. Instead, a new FD flag
was added, FD_MUST_CLOSE, that exactly indicates that the call to
_fd_delete_orphan() must be done. It is set by fd_delete(), and
atomically cleared by the first one which checks it, and which is the
only one to call _fd_delete_orphan().
With both points addressed, there's no more visible race left:
- takeover() only happens under the connection list's lock and cannot
compete with fd_delete() since fd_delete() must first remove the
connection from the list before deleting the FD. That's also why it
doesn't need to call _fd_delete_orphan() when dropping its running
bit.
- takeover() sets its running bit then atomically replaces the thread
mask, so that until that's done, it doesn't validate the condition
to end the synchonization loop in fd_update_events(). Once it's OK,
the previous thread's bit is lost, and this is checked for in
fd_update_events()
- fd_update_events() can compete with fd_delete() at various places
which are explained above. Since fd_delete() clears the thread mask
as after setting its running bit and after setting the FD_MUST_CLOSE
bit, the synchronization loop guarantees that the thread mask is seen
before going further, and that once it's seen, the FD_MUST_CLOSE flag
is already present.
- fd_delete() may start while fd_update_events() has already started,
but fd_delete() must hold a bit in thread_mask before starting, and
that is checked by the first test in fd_update_events() before setting
the running_mask.
- the poller's _update_fd() will not compete against _fd_delete_orphan()
nor fd_insert() thanks to the fd_grab_tgid() that's always done before
updating the polled_mask, and guarantees that we never pretend that a
polled_mask has a bit before the FD is added.
The issue is very hard to reproduce and is extremely time-sensitive.
Some tests were required with a 1-ms timeout with request rates
closely matching 1 kHz per server, though certain tests sometimes
benefitted from saturation. It was found that adding the following
slowdown at a few key places helped a lot and managed to trigger the
bug in 0.5 to 5 seconds instead of tens of minutes on a 20-thread
setup:
{ volatile int i = 10000; while (i--); }
Particularly, placing it at key places where only one of running_mask
or thread_mask is set and not the other one yet (e.g. after the
synchronization loop in fd_update_events or after dropping the
running bit) did yield great results.
Many thanks to Olivier Houchard for this expert help analysing these
races and reviewing candidate fixes.
The patch must be backported to 2.5. Note that 2.6 does not have tgid
in FDs, and that it requires a change of output on fd_clr_running() as
we need the previous bit. This is provided by carefully backporting
commit d6e1987612 ("MINOR: fd: make fd_clr_running() return the previous
value instead"). Tests have shown that the lack of tgid is a showstopper
for 2.6 and that unless a better workaround is found, it could still be
preferable to backport the minimum pieces required for fd_grab_tgid()
to 2.6 so that it stays stable long.
This bug arrived with this commit:
b5a8020e9 MINOR: quic: RETIRE_CONNECTION_ID frame handling (RX)
and was revealed by h3 interop tests with clients like s2n-quic and quic-go
as noticed by Amaury.
Indeed, one must check that the CID matching the sequence number provided by a received
RETIRE_CONNECTION_ID frame does not match the DCID of the packet.
Remove useless ->curr_cid_seq_num member from quic_conn struct.
The sequence number lookup must be done in qc_handle_retire_connection_id_frm()
to check the validity of the RETIRE_CONNECTION_ID frame, it returns the CID to be
retired into <cid_to_retire> variable passed as parameter to this function if
the frame is valid and if the CID was not already retired
Must be backported to 2.7.
Since the following commit :
commit fb375574f9
MINOR: quic: mark quic-conn as jobs on socket allocation
quic-conn instances are marked as jobs. This prevent haproxy process to
stop while there is transfer in progress. To not delay process
termination, idle connections are woken up through their MUX instances
to be able to release them immediately.
However, there is no mechanism to wake up quic connections left on
closing or draining state. This means that haproxy process termination
is delayed until every closing quic connections timer has expired.
To improve this, a new function quic_handle_stopping() is called when
haproxy process is stopping. It simply wakes up the idle timer task of
all connections in the global closing list. These connections will thus
be released immediately to not interrupt haproxy process stopping.
This should be backported up to 2.7.
When a CONNECTION_CLOSE is emitted or received, a QUIC connection enters
respectively in draining or closing state. These states are a loose
equivalent of TCP TIME_WAIT. No data can be exchanged anymore but the
connection is maintained during a certain timer to handle packet
reordering or loss.
A new global list has been defined for QUIC connections in
closing/draining state inside thread_ctx structure. Each time a
connection enters in one of this state, it will be moved from the
default global list to the new closing list.
The objective of this patch is to quickly filter connections on
closing/draining. Most notably, this will be used to wake up these
connections and avoid that haproxy process stopping is delayed by them.
A dedicated function qc_detach_th_ctx_list() has been implemented to
transfer a quic-conn from one list instance to the other. This takes
care of back-references attach to a quic-conn instance in case of a
running "show quic".
This should be backported up to 2.7.
Modify quic_transport_params_dump() and others function relative to the
transport parameters value dump from TRACE() to make their output more
compact.
Add call to quic_transport_params_dump() to dump the transport parameters
from "show quic" CLI command.
Must be backported to 2.7.
Add QUIC_FL_RX_PACKET_SPIN_BIT new RX packet flag to mark an RX packet as having
the spin bit set. Idem for the connection with QUIC_FL_CONN_SPIN_BIT flag.
Implement qc_handle_spin_bit() to set/unset QUIC_FL_CONN_SPIN_BIT for the connection
as soon as a packet number could be deciphered.
Modify quic_build_packet_short_header() to set the spin bit when building
a short packet header.
Validated by quic-tracker spin bit test.
Must be backported to 2.7.
Add ->curr_cid_seq_num new quic_conn struct frame to store the connection
ID sequence number currently used by the connection.
Implement qc_handle_retire_connection_id_frm() to handle this RX frame.
Implement qc_retire_connection_seq_num() to remove a connection ID from its
sequence number.
Implement qc_build_new_connection_id_frm to allocate a new NEW_CONNECTION_ID
frame from a CID.
Modify qc_parse_pkt_frms() which parses the frames of an RX packet to handle
the case of the RETIRE_CONNECTION_ID frame.
Must be backported to 2.7.
Add ->next_cid_seq_num new member to quic_conn struct to store the next
connection ID to be used to alloacated a connection ID.
It is initialized to 0 from qc_new_conn() which initializes a connection.
Modify new_quic_cid() to use this variable each time it is called without
giving the possibility to the caller to pass the sequence number for the
connection to be allocated.
Modify quic_build_post_handshake_frames() to use ->next_cid_seq_num
when building NEW_CONNECTION_ID frames after the hanshake has been completed.
Limit the number of connection IDs provided to the peer to the minimum
between 4 and the value it sent with active_connection_id_limit transport
parameter. This includes the connection ID used by the connection to send
this new connection IDs.
Must be backported to 2.7.
Dump the secret used to derive the next one during a key update initiated by the
client and dump the resulted new secret and the new key and iv to be used to
decryption Application level packets.
Also add a trace when the key update is supposed to be initiated on haproxy side.
This has already helped in diagnosing an issue evealed by the key update interop
test with xquic as client.
Must be backported to 2.7.
When a STREAM frame is re-emitted, it will point to the same stream
buffer as the original one. If an ACK is received for either one of
these frame, the underlying buffer may be freed. Thus, if the second
frame is declared as lost and schedule for retransmission, we must
ensure that the underlying buffer is still allocated or interrupt the
retransmission.
Stream buffer is stored as an eb_tree indexed by the stream ID. To avoid
to lookup over a tree each time a STREAM frame is re-emitted, a lost
STREAM frame is flagged as QUIC_FL_TX_FRAME_LOST.
In most cases, this code is functional. However, there is several
potential issues which may cause a segfault :
- when explicitely probing with a STREAM frame, the frame won't be
flagged as lost
- when splitting a STREAM frame during retransmission, the flag is not
copied
To fix both these cases, QUIC_FL_TX_FRAME_LOST flag has been converted
to a <dup> field in quic_stream structure. This field is now properly
copied when splitting a STREAM frame. Also, as this is now an inner
quic_frame field, it will be copied automatically on qc_frm_dup()
invocation thus ensuring that it will be set on probing.
This issue was encounted randomly with the following backtrace :
#0 __memmove_avx512_unaligned_erms ()
#1 0x000055f4d5a48c01 in memcpy (__len=18446698486215405173, __src=<optimized out>,
#2 quic_build_stream_frame (buf=0x7f6ac3fcb400, end=<optimized out>, frm=0x7f6a00556620,
#3 0x000055f4d5a4a147 in qc_build_frm (buf=buf@entry=0x7f6ac3fcb5d8,
#4 0x000055f4d5a23300 in qc_do_build_pkt (pos=<optimized out>, end=<optimized out>,
#5 0x000055f4d5a25976 in qc_build_pkt (pos=0x7f6ac3fcba10,
#6 0x000055f4d5a30c7e in qc_prep_app_pkts (frms=0x7f6a0032bc50, buf=0x7f6a0032bf30,
#7 qc_send_app_pkts (qc=0x7f6a0032b310, frms=0x7f6a0032bc50) at src/quic_conn.c:4184
#8 0x000055f4d5a35f42 in quic_conn_app_io_cb (t=0x7f6a0009c660, context=0x7f6a0032b310,
This should fix github issue #2051.
This should be backported up to 2.6.
When adding a new certificate through the CLI and appending it to a
crt-list with the 'ocsp-update' option set, the new certificate would
not be added to the OCSP response update list.
The only thing that was missing was the copy of the ocsp_update mode
from the ssl_bind_conf into the ckch_store's object.
An extra wakeup of the update task also needed to happen in case the
newly inserted entry needs to be updated before the next wakeup of the
task.
This patch does not need to be backported.
The minimum and maximum delays between two automatic updates of a given
OCSP response can now be set via global options. It allows to limit the
update rate of OCSP responses for configurations that use many frontend
certificates with the ocsp-update option set if the updates are deemed
too costly.
In order to have some information about the frontend certificate when
dumping the contents of the ocsp update tree from the cli, we could
either keep a reference to a ckch_store in the certificate_ocsp
structure, which might cause some dangling reference problems, or
simply copy the path to the certificate in the ocsp response structure.
This latter solution was chosen because of its simplicity.
Those new specific error codes will enable to know a bit better what
went wrong during and OCSP update process. They will come to use in
future sample fetches as well as in debugging means (via the cli or
future traces).
Implement qc_notify_send(). This function is responsible to notify the
upper layer subscribed on SUB_RETRY_SEND if sending condition are back
to normal.
For the moment, this patch has no functional change as only congestion
window room is checked before notifying the upper layer. However, this
will be extended when poller subscribe of socket on sendto() error will
be implemented. qc_notify_send() will thus be responsible to ensure that
all condition are met before wake up the upper layer.
This should be backported up to 2.7.
Send is conducted through qc_send_ppkts() for a QUIC connection. There
is two types of error which can be encountered on sendto() or affiliated
syscalls :
* transient error. In this case, sending is simulated with the remaining
data and retransmission process is used to have the opportunity to
retry emission
* fatal error. If this happens, the connection should be closed as soon
as possible. This is done via qc_kill_conn() function. Until this
patch, only ECONNREFUSED errno was considered as fatal.
Modify the QUIC send API to be able to differentiate transient and fatal
errors more easily. This is done by fixing the return value of the
sendto() wrapper qc_snd_buf() :
* on fatal error, a negative error code is returned. This is now the
case for every errno except EAGAIN, EWOULDBLOCK, ENOTCONN, EINPROGRESS
and EBADF.
* on a transient error, 0 is returned. This is the case for the listed
errno values above and also if a partial send has been conducted by
the kernel.
* on success, the return value of sendto() syscall is returned.
This commit will be useful to be able to handle transient error with a
quic-conn owned socket. In this case, the socket should be subscribed to
the poller and no simulated send will be conducted.
This commit allows errno management to be confined in the quic-sock
module which is a nice cleanup.
On a final note, EBADF should be considered as fatal. This will be the
subject of a next commit.
This should be backported up to 2.7.
thread_set_first_group() and thread_set_first_tmask() were modified
and renamed to instead return the number and mask of the nth group.
Passing zero continues to return the first one, but it will be more
convenient to use this way when building shards.
The listeners have a thr_conn[] array indexed on the thread number that
is used during connection redispatching to know what threads are the least
loaded. Since we introduced thread groups, and based on the fact that a
listener may only belong to one group, there's no point storing counters
for all threads, we just need to store them for all threads in the group.
Doing so reduces the struct listener from 1500 to 632 bytes. This may be
backported to 2.7 to save a bit of resources.
As for the H1 and H2 stream, the QUIC stream now states it does not expect
data from the server as long as the request is unfinished. The aim is the
same. We must be sure to not trigger a read timeout on server side if the
client is still uploading data.
From the moment the end of the request is received and forwarded to upper
layer, the QUIC stream reports it expects to receive data from the opposite
endpoint. This re-enables read timeout on the server side.
When the endpoint (applet or mux) is now willing to consume data while it
said it wouldn't, a send activity is reported. Indeed, the writes was
blocked because of the endpoint. It is now ready to consume outgoing
data. So an send activity must be reported to reset corresponding timers.
Concretly, when the flag SE_FL_WONT_CONSULE is removed, a send activity is
reported.
Since the previous patch, the ring's offset is not used anymore. The
haring utility remains backward-compatible since it can trust the
buffer element that's at the beginning of the map and which still
contains all the valid data.
We are simply renaming pause_listener() to suspend_listener() to prevent
confusion around listener pausing.
A suspended listener can be in two differents valid states:
- LI_PAUSED: the listener is effectively paused, it will unpause on
resume_listener()
- LI_ASSIGNED (not bound): the listener does not support the LI_PAUSED
state, so it was unbound to satisfy the suspend request, it will
correcly re-bind on resume_listener()
Besides that, we add the LI_F_SUSPENDED flag to mark suspended listeners in
suspend_listener() and unmark them in resume_listener().
We're also adding li_suspend proxy variable to track the number of currently
suspended listeners:
That is, the number of listeners that were suspended through suspend_listener()
and that are either in LI_PAUSED or LI_ASSIGNED state.
Counter is increased on successful suspend in suspend_listener() and it is
decreased on successful resume in resume_listener()
--
Backport notes:
-> 2.4 only, as "MINOR: proxy/listener: support for additional PAUSED state"
was not backported:
Replace this:
| /* PROXY_LOCK is require
| proxy_cond_resume(px);
By this:
| ha_warning("Resumed %s %s.\n", proxy_cap_str(px->cap), px->id);
| send_log(px, LOG_WARNING, "Resumed %s %s.\n", proxy_cap_str(px->cap), px->id);
-> 2.6 and 2.7 only, as "MINOR: listener: make sure we don't pause/resume" was
custom patched:
Replace this:
|@@ -253,6 +253,7 @@ struct listener {
|
| /* listener flags (16 bits) */
| #define LI_F_FINALIZED 0x0001 /* listener made it to the READY||LIMITED||FULL state at least once, may be suspended/resumed safely */
|+#define LI_F_SUSPENDED 0x0002 /* listener has been suspended using suspend_listener(), it is either is LI_PAUSED or LI_ASSIGNED state */
|
| /* Descriptor for a "bind" keyword. The ->parse() function returns 0 in case of
| * success, or a combination of ERR_* flags if an error is encountered. The
By this:
|@@ -222,6 +222,7 @@ struct li_per_thread {
|
| #define LI_F_QUIC_LISTENER 0x00000001 /* listener uses proto quic */
| #define LI_F_FINALIZED 0x00000002 /* listener made it to the READY||LIMITED||FULL state at least once, may be suspended/resumed safely */
|+#define LI_F_SUSPENDED 0x00000004 /* listener has been suspended using suspend_listener(), it is either is LI_PAUSED or LI_ASSIGNED state */
|
| /* The listener will be directly referenced by the fdtab[] which holds its
| * socket. The listener provides the protocol-specific accept() function to
Some listeners are kept in LI_ASSIGNED state but are not supposed to be
started since they were bypassed on initial startup (eg: in protocol_bind_all()
or in enable_listener()...)
Introduce the LI_F_FINALIZED flag: when the variable is non
zero it means that the listener made it past the LI_LISTEN state (finalized)
at least once so we can safely pause / resume. This way we won't risk starting
a previously bypassed listener which never made it that far and thus was not
expected to be lazy-started by accident.
As listener_pause() and listener_resume() are currently partially broken, such
unexpected lazy-start won't happen. But we're trying to restore pause() and
resume() behavior so this patch will be required before going any further.
We had to re-introduce listeners 'flags' struct member since it was recently
moved into bind_conf struct. But here we do have a legitimate need for these
listener-only flags.
This should only be backported if explicitly required by another commit.
--
Backport notes:
-> 2.4 and 2.5:
The 2-bytes hole we're using in the current patch does not apply, let's
use the 4-byte hole located under the 'option' field.
Replace this:
|@@ -226,7 +226,8 @@ struct li_per_thread {
| struct listener {
| enum obj_type obj_type; /* object type = OBJ_TYPE_LISTENER */
| enum li_state state; /* state: NEW, INIT, ASSIGNED, LISTEN, READY, FULL */
|- /* 2-byte hole here */
|+ uint16_t flags; /* listener flags: LI_F_* */
| int luid; /* listener universally unique ID, used for SNMP */
| int nbconn; /* current number of connections on this listener */
| unsigned int thr_idx; /* thread indexes for queue distribution : (t2<<16)+t1 */
By this:
|@@ -209,6 +209,8 @@ struct listener {
| short int nice; /* nice value to assign to the instantiated tasks */
| int luid; /* listener universally unique ID, used for SNMP */
| int options; /* socket options : LI_O_* */
|+ uint16_t flags; /* listener flags: LI_F_* */
|+ /* 2-bytes hole here */
| __decl_thread(HA_RWLOCK_T lock);
|
| struct fe_counters *counters; /* statistics counters */
-> 2.4 only:
We need to adjust some contextual lines.
Replace this:
|@@ -477,7 +478,7 @@ int pause_listener(struct listener *l, int lpx, int lli)
| if (!lli)
| HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
|
|- if (l->state <= LI_PAUSED)
|+ if (!(l->flags & LI_F_FINALIZED) || l->state <= LI_PAUSED)
| goto end;
|
| if (l->rx.proto->suspend)
By this:
|@@ -477,7 +478,7 @@ int pause_listener(struct listener *l, int lpx, int lli)
| !(proc_mask(l->rx.settings->bind_proc) & pid_bit))
| goto end;
|
|- if (l->state <= LI_PAUSED)
|+ if (!(l->flags & LI_F_FINALIZED) || l->state <= LI_PAUSED)
| goto end;
|
| if (l->rx.proto->suspend)
And this:
|@@ -535,7 +536,7 @@ int resume_listener(struct listener *l, int lpx, int lli)
| if (MT_LIST_INLIST(&l->wait_queue))
| goto end;
|
|- if (l->state == LI_READY)
|+ if (!(l->flags & LI_F_FINALIZED) || l->state == LI_READY)
| goto end;
|
| if (l->rx.proto->resume)
By this:
|@@ -535,7 +536,7 @@ int resume_listener(struct listener *l, int lpx, int lli)
| !(proc_mask(l->rx.settings->bind_proc) & pid_bit))
| goto end;
|
|- if (l->state == LI_READY)
|+ if (!(l->flags & LI_F_FINALIZED) || l->state == LI_READY)
| goto end;
|
| if (l->rx.proto->resume)
-> 2.6 and 2.7 only:
struct listener 'flags' member still exists, let's use it.
Remove this from the current patch:
|@@ -226,7 +226,8 @@ struct li_per_thread {
| struct listener {
| enum obj_type obj_type; /* object type = OBJ_TYPE_LISTENER */
| enum li_state state; /* state: NEW, INIT, ASSIGNED, LISTEN, READY, FULL */
|- /* 2-byte hole here */
|+ uint16_t flags; /* listener flags: LI_F_* */
| int luid; /* listener universally unique ID, used for SNMP */
| int nbconn; /* current number of connections on this listener */
| unsigned int thr_idx; /* thread indexes for queue distribution : (t2<<16)+t1 */
Then, replace this:
|@@ -251,6 +250,9 @@ struct listener {
| EXTRA_COUNTERS(extra_counters);
| };
|
|+/* listener flags (16 bits) */
|+#define LI_F_FINALIZED 0x0001 /* listener made it to the READY||LIMITED||FULL state at least once, may be suspended/resumed safely */
|+
| /* Descriptor for a "bind" keyword. The ->parse() function returns 0 in case of
| * success, or a combination of ERR_* flags if an error is encountered. The
| * function pointer can be NULL if not implemented. The function also has an
By this:
|@@ -221,6 +221,7 @@ struct li_per_thread {
| };
|
| #define LI_F_QUIC_LISTENER 0x00000001 /* listener uses proto quic */
|+#define LI_F_FINALIZED 0x00000002 /* listener made it to the READY||LIMITED||FULL state at least once, may be suspended/resumed safely */
|
| /* The listener will be directly referenced by the fdtab[] which holds its
| * socket. The listener provides the protocol-specific accept() function to
There is a need for a small difference between resuming and relaxing
a listener.
When resuming, we expect that the listener may completely resume, this includes
unpausing or rebinding if required.
Resuming a listener is a best-effort operation: no matter the current state,
try our best to bring the listener up to the LI_READY state.
There are some cases where we only want to "relax" listeners that were
previously restricted using limit_listener() or listener_full() functions.
Here we don't want to ressucitate listeners, we're simply interested in
cancelling out the previous restriction.
To this day, listener_resume() on a unbound listener is broken, that's why
the need for this wasn't felt yet.
But we're trying to restore historical listener_resume() behavior, so we better
prepare for this by introducing an explicit relax_listener() function that
only does what is expected in such cases.
This commit depends on:
- "MINOR: listener/api: add lli hint to listener functions"
Add listener lock hint (AKA lli) to (stop/resume/pause)_listener() functions.
All these functions implicitely take the listener lock when they are called:
It could be useful to be able to call them while already holding the lock, so
we're adding lli hint to make them take the lock only when it is missing.
This should only be backported if explicitly required by another commit
--
-> 2.4 and 2.5 common backport notes:
These 2 commits need to be backported first:
- 187396e34 "CLEANUP: listener: function comment typo in stop_listener()"
- a57786e87 "BUG/MINOR: listener: null pointer dereference suspected by
coverity"
-> 2.4 special backport notes:
In addition to the previously mentionned dependencies, the patch needs to be
slightly adapted to match the corresponding contextual lines:
Replace this:
|@@ -471,7 +474,8 @@ int pause_listener(struct listener *l, int lpx)
| if (!lpx && px)
| HA_RWLOCK_WRLOCK(PROXY_LOCK, &px->lock);
|
|- HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
|+ if (!lli)
|+ HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
|
| if (l->state <= LI_PAUSED)
| goto end;
By this:
|@@ -471,7 +474,8 @@ int pause_listener(struct listener *l, int lpx)
| if (!lpx && px)
| HA_RWLOCK_WRLOCK(PROXY_LOCK, &px->lock);
|
|- HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
|+ if (!lli)
|+ HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
|
| if ((global.mode & (MODE_DAEMON | MODE_MWORKER)) &&
| !(proc_mask(l->rx.settings->bind_proc) & pid_bit))
Replace this:
|@@ -169,7 +169,7 @@ void protocol_stop_now(void)
| HA_SPIN_LOCK(PROTO_LOCK, &proto_lock);
| list_for_each_entry(proto, &protocols, list) {
| list_for_each_entry_safe(listener, lback, &proto->receivers, rx.proto_list)
|- stop_listener(listener, 0, 1);
|+ stop_listener(listener, 0, 1, 0);
| }
| HA_SPIN_UNLOCK(PROTO_LOCK, &proto_lock);
| }
By this:
|@@ -169,7 +169,7 @@ void protocol_stop_now(void)
| HA_SPIN_LOCK(PROTO_LOCK, &proto_lock);
| list_for_each_entry(proto, &protocols, list) {
| list_for_each_entry_safe(listener, lback, &proto->receivers, rx.proto_list)
| if (!listener->bind_conf->frontend->grace)
|- stop_listener(listener, 0, 1);
|+ stop_listener(listener, 0, 1, 0);
| }
| HA_SPIN_UNLOCK(PROTO_LOCK, &proto_lock);
Replace this:
|@@ -2315,7 +2315,7 @@ void stop_proxy(struct proxy *p)
| HA_RWLOCK_WRLOCK(PROXY_LOCK, &p->lock);
|
| list_for_each_entry(l, &p->conf.listeners, by_fe)
|- stop_listener(l, 1, 0);
|+ stop_listener(l, 1, 0, 0);
|
| if (!(p->flags & (PR_FL_DISABLED|PR_FL_STOPPED)) && !p->li_ready) {
| /* might be just a backend */
By this:
|@@ -2315,7 +2315,7 @@ void stop_proxy(struct proxy *p)
| HA_RWLOCK_WRLOCK(PROXY_LOCK, &p->lock);
|
| list_for_each_entry(l, &p->conf.listeners, by_fe)
|- stop_listener(l, 1, 0);
|+ stop_listener(l, 1, 0, 0);
|
| if (!p->disabled && !p->li_ready) {
| /* might be just a backend */
The half-closed timeout is now directly retrieved from the proxy
settings. There is no longer usage for the .hcto field in the stconn
structure. So let's remove it.
We now directly use the proxy settings to set the half-close timeout of a
stream-connector. The function sc_set_hcto() must be used to do so. This
timeout is only set when a shutw is performed. So it is not really a big
deal to use a dedicated function to do so.
We stop to use the channel's expiration dates to detect read and write
timeouts on the channels. We now rely on the stream-endpoint descriptor to
do so. All the stuff is handled in process_stream().
The stream relies on 2 helper functions to know if the receives or sends may
expire: sc_rcv_may_expire() and sc_snd_may_expire().
An endpoint should now set SE_FL_EXP_NO_DATA flag if it does not expect any
data from the opposite endpoint. This way, the stream will be able to
disable any read timeout on the opposite endpoint. Applets should use
applet_expect_no_data() and applet_expect_data() functions to set or clear
the flag. For now, only dns and sink forwarder applets are concerned.
The stream endpoint descriptor now owns two date, lra (last read activity) and
fsb (first send blocked).
The first one is updated every time a read activity is reported, including data
received from the endpoint, successful connect, end of input and shutdown for
reads. A read activity is also reported when receives are unblocked. It will be
used to detect read timeouts.
The other one is updated when no data can be sent to the endpoint and reset
when some data are sent. It is the date of the first send blocked by the
endpoint. It will be used to detect write timeouts.
Helper functions are added to report read/send activity and to retrieve lra/fsb
date.
Read and write timeouts (.rto and .wto) are now replaced by an unique
timeout, call .ioto. Since the recent refactoring on channel's timeouts,
both use the same value, the client timeout on client side and the server
timeout on the server side. Thus, this part may be simplified. Now it
represents the I/O timeout.
These timers are related to the I/O. Thus it is logical to move them into
the SE descriptor. The patch is a bit huge but it is just a
replacement. However it is error-prone.
From the stconn or the stream, helper functions are used to get, set or
reset these timers. This simplify the timers manipulations.
Read and write timeouts concerns the I/O. Thus, it is logical to move it into
the stconn. At the end, the stream is responsible to detect the timeouts. So
it is logcial to have these values in the stconn and not in the SE
descriptor. But it may change depending on the recfactoring.
So, now:
* scf->rto is used instead of req->rto
* scf->wto is used instead of res->wto
* scb->rto is used instead of res->rto
* scb->wto is used instead of req->wto