Implement active reverse connection initialization. This is done through
a new task stored in the receiver structure. This task is instantiated
via the bind callback and first woken up via the enable callback.
The task handler is split into two halves. In the first step, a new
connection is allocated and stored in the <pend_conn> member of the
receiver. This new client connection will proceed to connect using the
server instance referenced in the bind_conf.
Once the connect has succeeded and the HTTP/2 connection is ready for
exchanges after SETTINGS, the reverse_connect task is woken up. As
<pend_conn> is still set, the second half is executed, which only calls
listener_accept(). This in turn runs the accept_conn callback, which is
defined to return the pending connection.
The task is automatically requeued inside the accept_conn callback if the
bind maxconn is not yet reached. This makes it possible to specify how many
connections should be opened. Each connection is instantiated and reversed
serially, one by one, until maxconn is reached.
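Below is a minimal sketch of this two-step handler; all names in it are
assumptions made for illustration, not the actual symbols:

    struct pconn;                        /* the pending (reversed) connection */

    struct rev_rx {
        struct pconn *pend_conn;         /* NULL until the first step has run */
    };

    struct pconn *start_reverse_connect(struct rev_rx *rx); /* first half: connect out */
    void run_listener_accept(struct rev_rx *rx);            /* second half: accept path */

    static void reverse_connect_step(struct rev_rx *rx)
    {
        if (!rx->pend_conn) {
            /* first wakeup: allocate the client connection and start the
             * connect towards the server referenced in the bind_conf
             */
            rx->pend_conn = start_reverse_connect(rx);
            return; /* the task is woken up again once SETTINGS is done */
        }
        /* second wakeup: <pend_conn> is ready, run the accept path whose
         * accept_conn callback simply returns the pending connection
         */
        run_listener_accept(rx);
    }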
conn_free() has been modified to handle the failure case where a reverse
connection fails before being accepted. In this case, no session exists to
notify about the failure. Instead, the reverse_connect task is requeued with
a 1 second delay, giving time to fix a possible network issue. This allows a
new reverse connection attempt.
Note that for the moment connection rebinding after accept is disabled
for simplicity. Extra operations are required to migrate an existing
connection and its stack to a new thread; this will be implemented
later.
Implement parsing for "rev@" addresses on the bind line. During config
parsing, the server name is stored on the bind_conf.
Several new callbacks are defined on the reverse_connect protocol to
complete parsing. The listen callback is used to retrieve the server
instance from the bind_conf server name. If found, the server instance
is stored on the receiver. Checks are implemented to ensure that only the
HTTP/2 protocol is used by the server.
Listener functions must follow a common locking pattern:
1. Get the proxy's lock if necessary
2. Get the protocol's lock if necessary
3. Get the listener's lock if necessary
We must take care to respect this order to avoid any ABBA issue. However, an
issue was introduced in the commit bcad7e631 ("MINOR: listener: add
relax_listener() function"). relax_listener() gets the lisener's lock and if
resume_listener() is called, the proxy's lock is then acquired.
So to fix the issue, the proxy's lock is first acquired in relax_listener(),
if necessary.
This patch should fix the issue #2222. It must be backported as far as 2.4
because the above commit is marked to be backported there.
As discussed a few times over the years, it's quite difficult to know
how often we stop accepting connections because the global maxconn was
reached. This is not easy to know because when we reach the limit we
stop accepting but we don't know if incoming connections are pending,
so it's not possible to know how many were delayed just because of this.
However, an interesting equivalent metric consists in counting the number
of times an accepted incoming connection resulted in the limit being
reached. I.e. "we've accepted the last one for now". That doesn't imply
any other one got delayed but it's a factual indicator that something
might have been delayed. And by counting the number of such events, it
becomes easier to know whether some limits need to be adjusted because
they're reached often, or if it's exceptionally rare.
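For illustration, a minimal sketch of how such an event counter could be
maintained (names and the exact accounting point are assumptions):

    #include <stdatomic.h>

    static atomic_int  actconn;             /* currently accepted connections */
    static atomic_long maxconn_reached_cnt; /* times an accept just filled the limit */

    static void account_accept(int global_maxconn)
    {
        /* +1 accounts for the connection we just accepted */
        if (atomic_fetch_add(&actconn, 1) + 1 >= global_maxconn)
            atomic_fetch_add(&maxconn_reached_cnt, 1);
    }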
The metric is reported as a counter in show info and on the stats page
in the info section right next to "maxconn".
Fixes #2031
quoting Willy Tarreau:
"Originally the listeners were intended to work without a bind_conf
(e.g. for FTP processing) hence these tests, but over time the
bind_conf has become omnipresent"
This new setting accepts "by-process", "by-group" and "by-thread" and
will dictate how listeners will be sharded by default when nothing is
specified. While the default remains "by-process", "by-group" should be
much more efficient with many threads, while not changing anything for
single-group setups.
Now that we're able to run listeners on any set of groups, we don't need
to maintain a special case about the stats socket anymore. It used to be
forced to group 1 only so as to avoid startup failures in case several
groups were configured, but if it's done now, it will automatically bind
the needed FDs to have one per group, so this is no longer an issue.
The new function protocol_supports_flag() checks the protocol flags
to verify whether some features are supported, and it can later be
extended to refine these tests. Let's use it to check for REUSEPORT.
Now if multiple shards are explicitly requested, and the listener's
protocol doesn't support SO_REUSEPORT, sharding is disabled, which will
result in the socket being automatically duped if needed. A warning is
emitted when this happens. If "shards by-group" or "shards by-thread"
are used, these will automatically be turned down to 1 since we want
this to be easily possible using -dR on the command line without having
to adjust the config. For "by-thread", a diag warning will be emitted to
help troubleshoot possible performance issues.
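As a rough illustration of the kind of check involved (simplified signature
and illustrative flag value, not the real ones):

    #define PROTO_F_REUSEPORT_SUPPORTED 0x00000001  /* illustrative bit value */

    /* returns non-zero when all requested feature bits are advertised */
    static inline int proto_supports(unsigned int proto_flags, unsigned int flag)
    {
        return (proto_flags & flag) == flag;
    }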
The listeners in peers sections were still not handling the thread
groups correctly. Shards were silently ignored and if a listener was bound
to more than one group, it would simply fail. Now we can call the
dedicated function to resolve all this and possibly create the missing
extra listeners.
bind_complete_thread_setup() was adjusted to use the proxy_type_str()
instead of writing "proxy" at the only place where this word was still
hard-coded so that we continue to speak about peers sections when
relevant.
What used to be only two lines to apply a mask in a loop in
check_config_validity() grew into a 130-line block that performs deeply
listener-specific operations that do not have their place there anymore.
In addition it's worth noting that the peers code still doesn't support
shards nor being bound to more than one group, which is a second reason
for moving that code to its own function. Nothing was changed except
recreating the missing variables from the bind_conf itself (the fe only).
By comparing the local thread's load with the least loaded thread's
load, we can further improve the fairness and at the same time also
improve locality since it allows a small ratio of connections not to
be migrated. This is visible on CPU usage with long connections on
very large thread counts (224) and high bandwidth (200G). The cost
of checking the local thread's load remains fairly low so there's no
reason not to do this. We continue to update the index if we select
the local thread, because it means that the two other threads were
both more loaded so we'd rather find better ones.
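The decision roughly boils down to the following sketch (not the actual
code):

    /* only migrate the incoming connection when the best remote candidate is
     * strictly less loaded than the local thread; otherwise keep it local.
     */
    static int choose_thread(unsigned int local_load, int local_thr,
                             unsigned int best_load, int best_thr)
    {
        return (best_load < local_load) ? best_thr : local_thr;
    }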
One limitation of the current thread index mechanism is that if the
values are assigned multiple times to the same thread and the index
loops, it can match the old value again, which will not prevent a
competing thread from finishing its CAS and assigning traffic to a
thread that's not the optimal one. The probability is low but the
solution is simple enough and consists in implementing an update
counter in the high bits of the index to force a mismatch in this
case (assuming we don't try to cover for extremely unlikely cases
where the update counter loops while the index remains equal). So
let's do that. In order to improve the situation a little bit, we
now set the index to a ulong so that in 32 bits we have 8 bits of
counter and in 64 bits we have 40 bits.
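As a sketch of the packing and CAS involved (the 24-bit index width matches
the 8/40-bit counter split above; everything else is illustrative):

    #include <stdatomic.h>

    #define IDX_BITS 24
    #define IDX_MASK ((1UL << IDX_BITS) - 1)

    static _Atomic unsigned long thr_idx;  /* [update counter | 24-bit index] */

    static int try_update_index(unsigned long new_idx)
    {
        unsigned long old  = atomic_load(&thr_idx);
        unsigned long next = ((old & ~IDX_MASK) + (1UL << IDX_BITS)) /* bump counter */
                             | (new_idx & IDX_MASK);

        /* a competing update bumps the counter, so this CAS fails even if
         * <new_idx> equals the index that was stored before.
         */
        return atomic_compare_exchange_strong(&thr_idx, &old, next);
    }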
During heavy accept competition, the CAS will occasionally fail and
we'll have to go through all the calculation again. While the first
two loops look heavy, they're almost never taken so they're quite
cheap. However the rest of the operation is heavy because we have to
consult connection counts and queue indexes for other threads, so
better double-check if the index is still valid before continuing.
Tests show that it's more efficient to retry half-way like this.
Instead of seeing each listener use its own thr_idx, let's use the same
for all those from a shard. It should provide more accurate and smoother
thread allocation.
Till now threads were assigned in listener_accept() to other threads of
the same group only, using a single group mask. Now that we have all the
relevant info (array of listeners of the same shard), we can spread the
thr_idx to cover all assigned groups. The thread indexes now contain the
group number in their upper bits, and the indexes run over the whole list
of threads, all groups included.
One particular subtlety here is that switching to a thread from another
group also means switching the group, hence the listener. As such, when
changing the group we need to update the connection's owner to point to
the listener of the same shard that is bound to the target group.
There has always been a race when checking the length of an accept queue
to determine which one is more loaded than another, because the head and
tail are read at two different moments. This is not necessary: we can merge
them as two 16-bit numbers inside a single 32-bit index that is always
accessed atomically. This way we read both values at once and always have
a consistent measurement.
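A sketch of the packed index (illustrative names):

    #include <stdatomic.h>
    #include <stdint.h>

    static _Atomic uint32_t ring_idx;   /* (head << 16) | tail, updated atomically */

    static unsigned int ring_len(void)
    {
        uint32_t idx  = atomic_load(&ring_idx);   /* head and tail read at once */
        uint16_t head = idx >> 16;
        uint16_t tail = idx & 0xffff;

        return (uint16_t)(head - tail);           /* always a consistent length */
    }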
In order to create multiple receivers for one multi-group shard, we'll
need some more info about the shard. Here we store:
- the number of groups (= number of receivers)
- the number of threads (will be used for accept LB)
- pointer to the reference rx (to get the FD and to find all threads)
- pointers to the other members (to iterate over all threads)
For now since there's only one group per shard it remains simple. The
listener deletion code already takes care of removing the current
member from its shards list and moving others' reference to the last
one if it was their reference (so as to avoid O(N^2) updates during
ordered deletes).
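The stored information roughly corresponds to a structure like the following
sketch (field names and the bound are illustrative):

    struct receiver;                       /* only forward-declared here */

    #define SHARD_MAX_GROUPS 16            /* illustrative bound */

    struct shard_info_sketch {
        unsigned int nbgroups;             /* number of members (receivers) */
        unsigned int nbthreads;            /* total threads, used for accept LB */
        struct receiver *ref;              /* reference rx: owns the FD, reaches all threads */
        struct receiver *members[SHARD_MAX_GROUPS]; /* all members, to iterate over threads */
    };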
Since the vast majority of setups will not use multi-group shards, we
try to save memory usage by only allocating the shard_info when it is
needed, so the principle here is that a receiver with shard_info==NULL is
alone and doesn't share its socket with another group.
Various approaches were considered and tests show that the management
of the listeners during boot makes it easier to just attach to or
detach from a shard_info and automatically allocate it if it does not
exist, which is what is being done here.
For now the attach code is not called, but detach is already called
on delete.
This new algorithm for rebalancing incoming connections to multiple
threads is simpler: instead of considering the threads' load, it will
only cycle through all of them, offering a fair share of the traffic to
each thread. It may be well suited for short-lived connections but is
also convenient for very large thread counts where it's not certain
that the least loaded thread will always be found.
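In essence the algorithm reduces to something like this sketch:

    #include <stdatomic.h>

    static _Atomic unsigned int rr_counter;

    /* hand each new connection to the next thread in turn, ignoring load */
    static int next_thread(int nbthreads)
    {
        return atomic_fetch_add(&rr_counter, 1) % nbthreads;
    }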
There's a li_per_thread array in each listener for use with QUIC
listeners. Since thread groups were introduced, this array can be
allocated too large because global.nbthread entries are allocated for each
listener, while no more than MIN(nbthread, MAX_THREADS_PER_GROUP)
may be used by a single listener. This was because the global thread
ID is used as the index instead of the local ID (since a listener may
only be used by a single group). Let's just switch to local ID and
reduce the allocated size.
Remove the receiver RX_F_LOCAL_ACCEPT flag. This was used by QUIC
protocol before thread rebinding was supported by the quic_conn layer.
This should be backported up to 2.7 after the previous patch has also
been taken.
Define a new protocol callback set_affinity. This function is used
during listener_accept() to notify about a rebind on a new thread just
before pushing the connection on the selected thread queue. If the
callback fails, accept is done locally.
This change will be useful for protocols with state allocated before
accept is done. For the moment, only QUIC protocol is concerned. This
will allow rebinding the quic_conn to a new thread depending on its
load.
This should be backported up to 2.7 after a period of observation.
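As a hedged sketch of how such a hook may be consulted from the accept path
(reduced, illustrative types and names):

    struct conn_sketch;                     /* stands for the incoming connection */

    struct proto_hooks {
        /* returns < 0 when the rebind to <new_tid> cannot be performed */
        int (*set_affinity)(struct conn_sketch *conn, int new_tid);
    };

    /* returns 1 when the connection may be pushed to the remote thread queue,
     * 0 when it must be accepted locally instead.
     */
    static int may_redispatch(const struct proto_hooks *p,
                              struct conn_sketch *conn, int new_tid)
    {
        if (p->set_affinity && p->set_affinity(conn, new_tid) < 0)
            return 0;
        return 1;
    }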
Setting "shards by-group" will create one shard per thread group. This
can often be a reasonable tradeoff between a single one that can be
suboptimal on CPUs with many cores, and too many that will eat a lot
of file descriptors. It was shown to provide good results on a 224
thread machine, with a distribution that was even smoother than the
system's since here it can take into account the number of connections
per thread in the group. Depending on how popular it becomes, it could
even become the default setting in a future version.
Instead of artificially setting the shards count to MAX_THREAD when
"by-thread" is used, let's reserve special values for symbolic names
so that we can add more in the future. For now we use value -1 for
"by-thread", which requires to turn the type to signed int but it was
already used as such everywhere anyway.
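Something along these lines (a sketch; only the -1/"by-thread" mapping is
stated above):

    /* numeric settings use positive values; symbolic names get reserved
     * negative ones so more can be added later.
     */
    enum {
        SHARDS_BY_THREAD = -1,   /* "by-thread"; further names would take -2, -3, ... */
    };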
The listeners have a thr_conn[] array indexed on the thread number that
is used during connection redispatching to know which threads are the least
loaded. Since we introduced thread groups, and based on the fact that a
listener may only belong to one group, there's no point storing counters
for all threads, we just need to store them for all threads in the group.
Doing so reduces the struct listener from 1500 to 632 bytes. This may be
backported to 2.7 to save a bit of resources.
In 58651b42f ("MEDIUM: listener/proxy: make the listeners notify about
proxy pause/resume") we introduced the logic for pause/resume notify using
li_ready for pause and li_paused for resume.
Unfortunately, relying on li_paused for resume doesn't work reliably if we
resume a listener which is only made of receivers that are completely stopped.
For example, this could happen with receivers that don't support the
LI_PAUSED state like ABNS sockets.
This is especially true since pause_listener() was renamed to
suspend_listener() to better reflect its actual behavior in
("MINOR: listener: pause_listener() becomes suspend_listener())
To fix this, we now rely on the li_suspended state in resume_listener() to make
sure that suspend_listener() and resume_listener() notify messages are
consistent to each other:
"Proxy pause" is triggered when there are no more ready listeners.
"Proxy resume" is triggered when there are no more suspended listeners.
Also, we make use of the new PR_FL_PAUSED proxy flag to make sure we don't
report the same event twice.
This could be backported up to 2.4 after a reasonable observation
period to make sure that this change doesn't cause unwanted side-effects.
--
Backport notes:
This commit depends on:
- "MINOR: listener: pause_listener() becomes suspend_listener()"
-> 2.4 only, as "MINOR: proxy/listener: support for additional PAUSED state"
was not backported:
Replace this:
|+ if (px && !(px->flags & PR_FL_PAUSED) && !px->li_ready) {
| /* PROXY_LOCK is required */
| proxy_cond_pause(px);
| ha_warning("Paused %s %s.\n", proxy_cap_str(px->cap), px->id);
By this:
|+ if (px && !px->li_ready) {
| ha_warning("Paused %s %s.\n", proxy_cap_str(px->cap), px->id);
| send_log(px, LOG_WARNING, "Paused %s %s.\n", proxy_cap_str(px->cap), px->id);
| }
And this:
|+ if (px && (px->flags & PR_FL_PAUSED) && !px->li_suspended) {
| /* PROXY_LOCK is required */
| proxy_cond_resume(px);
| ha_warning("Resumed %s %s.\n", proxy_cap_str(px->cap), px->id);
By this:
|+ if (px && !px->li_suspended) {
| ha_warning("Resumed %s %s.\n", proxy_cap_str(px->cap), px->id);
| send_log(px, LOG_WARNING, "Resumed %s %s.\n", proxy_cap_str(px->cap), px->id);
| }
We are simply renaming pause_listener() to suspend_listener() to prevent
confusion around listener pausing.
A suspended listener can be in two different valid states:
- LI_PAUSED: the listener is effectively paused, it will unpause on
resume_listener()
- LI_ASSIGNED (not bound): the listener does not support the LI_PAUSED
state, so it was unbound to satisfy the suspend request, it will
correctly re-bind on resume_listener()
Besides that, we add the LI_F_SUSPENDED flag to mark suspended listeners in
suspend_listener() and unmark them in resume_listener().
We're also adding the li_suspended proxy variable to track the number of
currently suspended listeners:
That is, the number of listeners that were suspended through suspend_listener()
and that are either in LI_PAUSED or LI_ASSIGNED state.
The counter is increased on a successful suspend in suspend_listener() and
decreased on a successful resume in resume_listener().
--
Backport notes:
-> 2.4 only, as "MINOR: proxy/listener: support for additional PAUSED state"
was not backported:
Replace this:
| /* PROXY_LOCK is require
| proxy_cond_resume(px);
By this:
| ha_warning("Resumed %s %s.\n", proxy_cap_str(px->cap), px->id);
| send_log(px, LOG_WARNING, "Resumed %s %s.\n", proxy_cap_str(px->cap), px->id);
-> 2.6 and 2.7 only, as "MINOR: listener: make sure we don't pause/resume" was
custom patched:
Replace this:
|@@ -253,6 +253,7 @@ struct listener {
|
| /* listener flags (16 bits) */
| #define LI_F_FINALIZED 0x0001 /* listener made it to the READY||LIMITED||FULL state at least once, may be suspended/resumed safely */
|+#define LI_F_SUSPENDED 0x0002 /* listener has been suspended using suspend_listener(), it is either is LI_PAUSED or LI_ASSIGNED state */
|
| /* Descriptor for a "bind" keyword. The ->parse() function returns 0 in case of
| * success, or a combination of ERR_* flags if an error is encountered. The
By this:
|@@ -222,6 +222,7 @@ struct li_per_thread {
|
| #define LI_F_QUIC_LISTENER 0x00000001 /* listener uses proto quic */
| #define LI_F_FINALIZED 0x00000002 /* listener made it to the READY||LIMITED||FULL state at least once, may be suspended/resumed safely */
|+#define LI_F_SUSPENDED 0x00000004 /* listener has been suspended using suspend_listener(), it is either is LI_PAUSED or LI_ASSIGNED state */
|
| /* The listener will be directly referenced by the fdtab[] which holds its
| * socket. The listener provides the protocol-specific accept() function to
Since fc974887c ("MEDIUM: protocol: explicitly start the receiver before
the listener"), resume from LI_ASSIGNED state does not work anymore.
This is because the binding part has since been divided into 2 distinct
steps: first bind(), then listen().
This new logic was properly implemented in the startup sequence
through protocol_bind_all() but wasn't properly reflected in the
default_resume_listener() function.
Fixing default_resume_listener() to comply with the new logic.
This should help ABNS sockets to properly rebind in resume_listener()
after they have been stopped by pause_listener():
See Redmine:4475 for more context.
This commit depends on:
- "MINOR: listener: workaround for closing a tiny race between resume_listener() and stopping"
- "MINOR: listener: make sure we don't pause/resume bypassed listeners"
This could be backported up to 2.4 after a reasonable observation period to
make sure that this change doesn't cause unwanted side-effects.
In resume_listener(), proto->resume() errors were not properly handled:
the function kept flowing down as if no errors were detected.
Instead, we're performing an early return when such errors are detected to
prevent undefined behaviors.
This could be backported up to 2.4.
--
Backport notes:
This commit depends on:
- "MINOR: listener: make sure we don't pause/resume bypassed listeners"
-> 2.4 ... 2.7:
Replace this:
| if (l->bind_conf->maxconn && l->nbconn >= l->bind_conf->maxconn) {
| l->rx.proto->disable(l);
By this:
| if (l->maxconn && l->nbconn >= l->maxconn) {
| l->rx.proto->disable(l);
Legacy suspend() return value handling in pause_listener() has been altered
over time.
First with fb76bd5ca ("BUG/MEDIUM: listeners: correctly report pause() errors")
Then with e03204c8e ("MEDIUM: listeners: implement protocol level
->suspend/resume() calls")
We aim to restore the original function behavior and comply with the
resume_listener() function description.
This is required for resume_listener() and pause_listener() to work as a
whole.
Now, it is made explicit that pause_listener() may stop a listener if the
listener doesn't support the LI_PAUSED state (depending on the protocol
family, ie: ABNS sockets); in this case l->state will be set to LI_ASSIGNED
and this won't be considered an error.
This could be backported up to 2.4 after a reasonable observation period
to make sure that this change doesn't cause unwanted side-effects.
--
Backport notes:
This commit depends on:
- "MINOR: listener: make sure we don't pause/resume bypassed listeners"
-> 2.4: manual change required because "MINOR: proxy/listener: support
for additional PAUSED state" was not backported: the contextual patch
lines don't match.
Replace this:
| if (px && !px->li_ready) {
| /* PROXY_LOCK is required */
By this:
| if (px && !px->li_ready) {
| ha_warning("Paused %s %s.\n", proxy_cap_str(px->cap), px->id);
Some listeners are kept in LI_ASSIGNED state but are not supposed to be
started since they were bypassed on initial startup (eg: in protocol_bind_all()
or in enable_listener()...)
Introduce the LI_F_FINALIZED flag: when this flag is set, it means that the
listener made it past the LI_LISTEN state (finalized) at least once, so it
can safely be paused / resumed. This way we won't risk accidentally starting
a previously bypassed listener which never made it that far and thus was not
expected to be lazy-started.
As listener_pause() and listener_resume() are currently partially broken, such
unexpected lazy-start won't happen. But we're trying to restore pause() and
resume() behavior so this patch will be required before going any further.
We had to re-introduce the listener's 'flags' struct member since it was
recently moved into the bind_conf struct. But here we do have a legitimate
need for these
listener-only flags.
This should only be backported if explicitly required by another commit.
--
Backport notes:
-> 2.4 and 2.5:
The 2-byte hole we're using in the current patch does not apply, let's
use the 4-byte hole located under the 'option' field.
Replace this:
|@@ -226,7 +226,8 @@ struct li_per_thread {
| struct listener {
| enum obj_type obj_type; /* object type = OBJ_TYPE_LISTENER */
| enum li_state state; /* state: NEW, INIT, ASSIGNED, LISTEN, READY, FULL */
|- /* 2-byte hole here */
|+ uint16_t flags; /* listener flags: LI_F_* */
| int luid; /* listener universally unique ID, used for SNMP */
| int nbconn; /* current number of connections on this listener */
| unsigned int thr_idx; /* thread indexes for queue distribution : (t2<<16)+t1 */
By this:
|@@ -209,6 +209,8 @@ struct listener {
| short int nice; /* nice value to assign to the instantiated tasks */
| int luid; /* listener universally unique ID, used for SNMP */
| int options; /* socket options : LI_O_* */
|+ uint16_t flags; /* listener flags: LI_F_* */
|+ /* 2-bytes hole here */
| __decl_thread(HA_RWLOCK_T lock);
|
| struct fe_counters *counters; /* statistics counters */
-> 2.4 only:
We need to adjust some contextual lines.
Replace this:
|@@ -477,7 +478,7 @@ int pause_listener(struct listener *l, int lpx, int lli)
| if (!lli)
| HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
|
|- if (l->state <= LI_PAUSED)
|+ if (!(l->flags & LI_F_FINALIZED) || l->state <= LI_PAUSED)
| goto end;
|
| if (l->rx.proto->suspend)
By this:
|@@ -477,7 +478,7 @@ int pause_listener(struct listener *l, int lpx, int lli)
| !(proc_mask(l->rx.settings->bind_proc) & pid_bit))
| goto end;
|
|- if (l->state <= LI_PAUSED)
|+ if (!(l->flags & LI_F_FINALIZED) || l->state <= LI_PAUSED)
| goto end;
|
| if (l->rx.proto->suspend)
And this:
|@@ -535,7 +536,7 @@ int resume_listener(struct listener *l, int lpx, int lli)
| if (MT_LIST_INLIST(&l->wait_queue))
| goto end;
|
|- if (l->state == LI_READY)
|+ if (!(l->flags & LI_F_FINALIZED) || l->state == LI_READY)
| goto end;
|
| if (l->rx.proto->resume)
By this:
|@@ -535,7 +536,7 @@ int resume_listener(struct listener *l, int lpx, int lli)
| !(proc_mask(l->rx.settings->bind_proc) & pid_bit))
| goto end;
|
|- if (l->state == LI_READY)
|+ if (!(l->flags & LI_F_FINALIZED) || l->state == LI_READY)
| goto end;
|
| if (l->rx.proto->resume)
-> 2.6 and 2.7 only:
struct listener 'flags' member still exists, let's use it.
Remove this from the current patch:
|@@ -226,7 +226,8 @@ struct li_per_thread {
| struct listener {
| enum obj_type obj_type; /* object type = OBJ_TYPE_LISTENER */
| enum li_state state; /* state: NEW, INIT, ASSIGNED, LISTEN, READY, FULL */
|- /* 2-byte hole here */
|+ uint16_t flags; /* listener flags: LI_F_* */
| int luid; /* listener universally unique ID, used for SNMP */
| int nbconn; /* current number of connections on this listener */
| unsigned int thr_idx; /* thread indexes for queue distribution : (t2<<16)+t1 */
Then, replace this:
|@@ -251,6 +250,9 @@ struct listener {
| EXTRA_COUNTERS(extra_counters);
| };
|
|+/* listener flags (16 bits) */
|+#define LI_F_FINALIZED 0x0001 /* listener made it to the READY||LIMITED||FULL state at least once, may be suspended/resumed safely */
|+
| /* Descriptor for a "bind" keyword. The ->parse() function returns 0 in case of
| * success, or a combination of ERR_* flags if an error is encountered. The
| * function pointer can be NULL if not implemented. The function also has an
By this:
|@@ -221,6 +221,7 @@ struct li_per_thread {
| };
|
| #define LI_F_QUIC_LISTENER 0x00000001 /* listener uses proto quic */
|+#define LI_F_FINALIZED 0x00000002 /* listener made it to the READY||LIMITED||FULL state at least once, may be suspended/resumed safely */
|
| /* The listener will be directly referenced by the fdtab[] which holds its
| * socket. The listener provides the protocol-specific accept() function to
This is an alternative fix that tries to address the same issue as
d1ebee177 ("BUG/MINOR: listener: close tiny race between
resume_listener() and stopping") while allowing resume_listener() to be
more versatile.
Indeed, because of the previous fix, resume_listener() is not able to
rebind stopped listeners, and this breaks the original behavior that is
documented in the function description:
"If the listener was only in the assigned
state, it's totally rebound. This can happen if a pause() has completely
stopped it. If the resume fails, 0 is returned and an error might be
displayed."
With relax_listener(), we now make sure to check l->state under the
listener lock so we don't call resume_listener() when the conditions are not
met.
As such, concurrently stopped listeners may not be rebound using
relax_listener().
Note: the documented race can't happen since 1b927eb3c ("MEDIUM: proto: stop
protocols under thread isolation during soft stop"), but older versions are
concerned as 1b927eb3c was not marked for backports.
Moreover, the patch also prevents the race between protocol_pause_all() and
resuming from LIMITED or FULL states.
This commit depends on:
- "MINOR: listener: add relax_listener() function"
This should be backported with d1ebee177 up to 2.4
(d1ebee177 is marked to be backported for all stable versions but the current
patch does not apply for versions < 2.4)
There is a need for a small difference between resuming and relaxing
a listener.
When resuming, we expect that the listener may completely resume; this
includes unpausing or rebinding if required.
Resuming a listener is a best-effort operation: no matter the current state,
try our best to bring the listener up to the LI_READY state.
There are some cases where we only want to "relax" listeners that were
previously restricted using limit_listener() or listener_full() functions.
Here we don't want to resuscitate listeners, we're simply interested in
cancelling out the previous restriction.
To this day, listener_resume() on an unbound listener is broken, which is why
the need for this wasn't felt yet.
But we're trying to restore the historical listener_resume() behavior, so we'd
better prepare for this by introducing an explicit relax_listener() function that
only does what is expected in such cases.
This commit depends on:
- "MINOR: listener/api: add lli hint to listener functions"
Add a listener lock hint (AKA lli) to the (stop/resume/pause)_listener()
functions.
All these functions implicitly take the listener lock when they are called.
It could be useful to be able to call them while already holding the lock, so
we're adding the lli hint to make them take the lock only when it is missing.
This should only be backported if explicitly required by another commit.
--
-> 2.4 and 2.5 common backport notes:
These 2 commits need to be backported first:
- 187396e34 "CLEANUP: listener: function comment typo in stop_listener()"
- a57786e87 "BUG/MINOR: listener: null pointer dereference suspected by
coverity"
-> 2.4 special backport notes:
In addition to the previously mentioned dependencies, the patch needs to be
slightly adapted to match the corresponding contextual lines:
Replace this:
|@@ -471,7 +474,8 @@ int pause_listener(struct listener *l, int lpx)
| if (!lpx && px)
| HA_RWLOCK_WRLOCK(PROXY_LOCK, &px->lock);
|
|- HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
|+ if (!lli)
|+ HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
|
| if (l->state <= LI_PAUSED)
| goto end;
By this:
|@@ -471,7 +474,8 @@ int pause_listener(struct listener *l, int lpx)
| if (!lpx && px)
| HA_RWLOCK_WRLOCK(PROXY_LOCK, &px->lock);
|
|- HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
|+ if (!lli)
|+ HA_RWLOCK_WRLOCK(LISTENER_LOCK, &l->lock);
|
| if ((global.mode & (MODE_DAEMON | MODE_MWORKER)) &&
| !(proc_mask(l->rx.settings->bind_proc) & pid_bit))
Replace this:
|@@ -169,7 +169,7 @@ void protocol_stop_now(void)
| HA_SPIN_LOCK(PROTO_LOCK, &proto_lock);
| list_for_each_entry(proto, &protocols, list) {
| list_for_each_entry_safe(listener, lback, &proto->receivers, rx.proto_list)
|- stop_listener(listener, 0, 1);
|+ stop_listener(listener, 0, 1, 0);
| }
| HA_SPIN_UNLOCK(PROTO_LOCK, &proto_lock);
| }
By this:
|@@ -169,7 +169,7 @@ void protocol_stop_now(void)
| HA_SPIN_LOCK(PROTO_LOCK, &proto_lock);
| list_for_each_entry(proto, &protocols, list) {
| list_for_each_entry_safe(listener, lback, &proto->receivers, rx.proto_list)
| if (!listener->bind_conf->frontend->grace)
|- stop_listener(listener, 0, 1);
|+ stop_listener(listener, 0, 1, 0);
| }
| HA_SPIN_UNLOCK(PROTO_LOCK, &proto_lock);
Replace this:
|@@ -2315,7 +2315,7 @@ void stop_proxy(struct proxy *p)
| HA_RWLOCK_WRLOCK(PROXY_LOCK, &p->lock);
|
| list_for_each_entry(l, &p->conf.listeners, by_fe)
|- stop_listener(l, 1, 0);
|+ stop_listener(l, 1, 0, 0);
|
| if (!(p->flags & (PR_FL_DISABLED|PR_FL_STOPPED)) && !p->li_ready) {
| /* might be just a backend */
By this:
|@@ -2315,7 +2315,7 @@ void stop_proxy(struct proxy *p)
| HA_RWLOCK_WRLOCK(PROXY_LOCK, &p->lock);
|
| list_for_each_entry(l, &p->conf.listeners, by_fe)
|- stop_listener(l, 1, 0);
|+ stop_listener(l, 1, 0, 0);
|
| if (!p->disabled && !p->li_ready) {
| /* might be just a backend */
Instead of reading and storing a single group and a single mask for a
"thread" directive on a bind line, we now store the complete range in
a thread set that's stored in the bind_conf. The bind_parse_thread()
function now just calls parse_thread_set() to complete the current set,
which starts empty, and thread_resolve_group_mask() was updated to
support retrieving thread group numbers or absolute thread numbers
directly from the pre-filled thread_set, and continue to feed bind_tgroup
and bind_thread. The CLI parsers which were pre-initialized to set the
bind_tgroup to 1 cannot do it anymore as it would prevent one from
restricting the thread set. Instead check_config_validity() now detects
the CLI frontend and passes the info down to thread_resolve_group_mask()
that will automatically use only the group 1's threads for these
listeners. The same is done for the peers listeners for now.
At this step it's already possible to start with all previous valid
configs as well as extended ones supporting comma-delimited thread
sets. In addition the parser already accepts large ranges spanning
multiple groups, but since the underlying listeners infrastructure
is not ready, for now we're maintaining a specific check against this
at the higher level of the config validity check.
The patch is a bit large because thread resolution is performed in
multiple steps, so we need to adjust all of them at once to preserve
functional and technical consistency.
These two flags are entirely for internal use and are even per proxy
in practice since they're used for peers and CLI to indicate (for the
first one) that the listener(s) are not subject to connection limits,
and for the second that the listener(s) should not be stopped on
soft-stop. No need to keep them in the listeners, let's move them to
the bind_conf under names BC_O_UNLIMITED and BC_O_NOSTOP.
These are only set per bind line and used when creating a session, so
we can move them to the bind_conf under the names BC_O_ACC_PROXY and
BC_O_ACC_CIP respectively.
The accept callback directly derives from the upper layer, generally
it's session_accept_fd(). As such it's also defined per bind line
so it makes sense to move it there.
The maxconn is set per bind line so let's move it there. This might
possibly even slightly reduce inter-thread contention since this one
is read-mostly and it was stored next to nbconn which changes for
each connection setup or teardown.
Like for previous values, maxaccept is really per-bind_conf, so let's
move it there. Some frontends (peers, log) set it to 1 so the assignment
was slightly moved.
When bind_conf were created, some elements such as the analysers mask
ought to have moved there but that wasn't the case. Now that it's
getting clearer that bind_conf provides all binding parameters and
the listener is essentially a listener on an address, it's starting
to get really confusing to keep such parameters in the listener, so
let's move the mask to the bind_conf. We also take this opportunity
to pre-set the mask to the frontend's upon initialization. Now
several loops have one less argument to take care of.
A few loops waiting for threads to synchronize such as thread_isolate()
rightfully filter the thread masks via the threads_enabled field that
contains the list of enabled threads. However, it doesn't use an atomic
load on it. Before 2.7, the equivalent variables were marked as volatile
and were always reloaded. In 2.7 they're fields in ha_tgroup_ctx[], and
the risk that the compiler keeps them in a register inside a loop is not
null at all. In practice when ha_thread_relax() calls sched_yield() or
an x86 PAUSE instruction, it could be verified that the variable is
always reloaded. If these are avoided (e.g. an architecture providing
neither solution), it's visible in the asm code that the variables are not
reloaded. In this case, if a thread exits just between the moment the
two values are read, the loop could spin forever.
This patch adds the required _HA_ATOMIC_LOAD() on the relevant
threads_enabled fields. It must be backported to 2.7.
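A reduced illustration of the pattern, using C11 atomics instead of
HAProxy's macros:

    #include <stdatomic.h>

    static _Atomic unsigned long enabled_threads;  /* stands for the enabled-threads mask */

    static void wait_for_threads(unsigned long waited_mask)
    {
        /* the atomic load forces a fresh read on every iteration; a plain
         * read could be hoisted out of the loop and spin on a stale value
         * once one of the waited threads exits.
         */
        while (atomic_load(&enabled_threads) & waited_mask) {
            /* ha_thread_relax(): sched_yield() or a PAUSE instruction */
        }
    }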
Pierre Cheynier reported a very rare race condition on soft-stop in the
listeners. What happens is that if a previously limited listener is
being resumed by another thread finishing an accept loop, and at the
same time a soft-stop is performed, the soft-stop will turn the
listener's state to LI_INIT, and once the listener's lock is released,
resume_listener() in the second thread will try to resume this listener
which has an fd==-1, yielding a crash in listener_set_state():
FATAL: bug condition "l->rx.fd == -1" matched at src/listener.c:288
The reason is that resume_listener() only checks for LI_READY, but doesn't
consider being called with a non-initialized or a stopped listener. Let's
also make sure we don't try to resuscitate such a listener there.
This will have to be backported to all versions.
The global_listener_rwlock was introduced by recent commit 13e86d947
("BUG/MEDIUM: listener: Fix race condition when updating the global mngmt
task"), but it's declared static and is not used when threads are disabled,
thus causing a warning to be emitted in this case. Let's just condition it
to thread usage to shut the warning.
This will need to be backported where the patch above is backported.
At the end of manage_global_listener_queue(), the task expire date is set to
TICK_ETERNITY. Thus, it is useless to call task_queue() just after because
the function does nothing in this case.
It is pretty similar to fbb934da90 ("BUG/MEDIUM: stick-table: fix a race
condition when updating the expiration task"). When the global management
task is running, at the end of its process function, it resets the expire
date by setting it to TICK_ETERNITY. At the same time, a listener may reach a
global limit and decide to schedule the task. Thus it is possible to queue
the task and trigger the BUG_ON() on the expire date because its value was
set to TICK_ETERNITY in the meantime:
FATAL: bug condition "task->expire == 0" matched at src/task.c:310
call trace(12):
| 0x662de8 [b8 01 00 00 00 c6 00 00]: __task_queue+0xc7/0x11e
| 0x63b03f [48 b8 04 00 00 00 05 00]: main+0x2535e
| 0x63ed1a [e9 d2 fd ff ff 48 8b 45]: listener_accept+0xf72/0xfda
| 0x6a36d3 [eb 01 90 c9 c3 55 48 89]: sock_accept_iocb+0x82/0x87
| 0x6af22f [48 8b 05 ca f9 13 00 8b]: fd_update_events+0x35a/0x497
| 0x42a7a8 [89 45 d8 83 7d d8 02 75]: main-0x1eb539
| 0x6158fb [48 8b 05 e6 06 1c 00 64]: run_poll_loop+0x2e7/0x319
| 0x615b6c [48 8b 05 ed 65 1d 00 48]: main-0x175
| 0x7ffff775bded [e9 69 fe ff ff 48 8b 4c]: libc:+0x8cded
| 0x7ffff77e1370 [48 89 c7 b8 3c 00 00 00]: libc:+0x112370
To fix the bug, an RW lock is introduced to protect against the race
condition. A read lock is taken when the task is scheduled, in
listener_accept(), and a write lock is used at the end of the process
function to set the expire date to TICK_ETERNITY. This lock should not be
used very often, and most of the time by "readers", so the impact should be
really limited.
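The locking scheme roughly looks like the sketch below (plain pthread rwlock
for illustration; the real code uses HAProxy's own lock primitives):

    #include <pthread.h>

    static pthread_rwlock_t global_listener_rwlock = PTHREAD_RWLOCK_INITIALIZER;

    /* called from the accept path when a global limit is reached */
    static void schedule_global_task(void)
    {
        pthread_rwlock_rdlock(&global_listener_rwlock);
        /* set a valid expire date and queue the management task here */
        pthread_rwlock_unlock(&global_listener_rwlock);
    }

    /* called at the end of the task's process function */
    static void reset_global_task_expire(void)
    {
        pthread_rwlock_wrlock(&global_listener_rwlock);
        /* task->expire = TICK_ETERNITY; no reader can queue it concurrently */
        pthread_rwlock_unlock(&global_listener_rwlock);
    }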
This patch should fix the issue #1930. It must be backported as far as 1.8
with some cautions because the code has evolved a lot since then.
Please refer to GH #1859 for more info.
Coverity suspected improper proxy pointer handling.
Without the fix it is considered safe for the moment, but it might not
be the case in the future as we want to keep the ability to have
isolated listeners.
Making sure the stop_listener(), pause_listener(), resume_listener()
and listener_release() functions make proper use
of the px pointer in that context.
No need for backport except if multi-connection protocols (ie: FTP)
were to be backported as well.