haproxy

mirror of http://git.haproxy.org/git/haproxy.git/ synced 2025-04-23 15:35:40 +00:00

Author	SHA1	Message	Date
Willy Tarreau	64e6012eb9	MINOR: task: introduce work lists Sometimes we need to delegate some list processing to a function running on another thread. In this case the list element will simply be queued into a dedicated self-locked list and the task responsible for this list will be woken up, calling the associated function which will run over the list. This is what work_list does. Such lists will be dedicated to a limited type of work but will significantly ease such remote handling. A function is provided to create these per-thread lists, their tasks and to properly bind each task to a distinct thread, so that the caller only has to store the resulting pointer to the start of the structure. These structures should not be abused though as each head will consume 4 pointers per thread, hence 32 bytes per thread or 2 kB for 64 threads.	2019-07-12 09:07:48 +02:00
Willy Tarreau	bd20a9dd4e	BUG: tasks: fix bug introduced by latest scheduler cleanup In commit `86eded6c6` ("CLEANUP: tasks: rename task_remove_from_tasklet_list() to tasklet_remove_*") which consisted in removing the casts between tasks and tasklet, I was a bit too fast to believe that we only saw tasklets in this function since process_runnable_tasks() also uses it with tasks under a cast. So removing the bookkeeping on task_list_size was not appropriate. Bah, the joy of casts which hide the real thing... This patch does two things at once to address this mess once for all: - it restores the decrement of task_list_size when it's a real task, but moves it to process_runnable_task() since it's the only place where it's allowed to call it with a task - it moves the increment there as well and renames task_insert_into_tasklet_list() to tasklet_insert_into_tasklet_list() of obvious consistency reasons. This way the increment/decrement of task_list_size is made at the only places where the cast is enforced, so it has less risks to be missed. The comments on top of these functions were updated to reflect that they are only supposed to be used with tasklets and that the caller is responsible for keeping task_list_size up to date if it decides to enforce a task there. Now we don't have to worry anymore about how these functions work outside of the scheduler, which is better longterm-wise. Thanks to Christopher for spotting this mistake. No backport is needed.	2019-06-14 18:16:19 +02:00
Willy Tarreau	86eded6c69	CLEANUP: tasks: rename task_remove_from_tasklet_list() to tasklet_remove_* The function really only operates on tasklets, its arguments are always tasklets cast as tasks to match the function's type, to be cast back to a struct tasklet. Let's rename it to tasklet_remove_from_tasklet_list(), take a struct tasklet, and get rid of the undesired task casts.	2019-06-14 14:57:03 +02:00
Willy Tarreau	3cec0f94f3	BUG/MINOR: task: prevent schedulable tasks from starving under high I/O activity With both I/O and tasks in the same tasklet list, we now have a very smooth and responsive scheduler, providing a good fairness between I/O activities. With the lower layers relying on tasklet a lot (I/O wakeup, subscribe, etc), there may often be a large number of totally autonomous tasklets doing their business such as forwarding data between two muxes. But the task scheduler historically refrained from picking tasks from the priority-ordered run queue to put them into the tasklet list until this later had less than max_runqueue_depth entries. This was to make sure that low-latency, high-priority tasks would have an opportunity to be dequeued before others even if they arrive late. But the counter used for this is still the tasklet list size, which contains countless I/O events. This causes an unfairness between unbounded I/Os and bounded tasks, resulting for example in the CLI responding slower when forwarding 40 Gbps of HTTP traffic spread over a thousand of connections. A good solution consists in sticking to the initial intent of max_runqueue_depth which is to limit the number of tasks in the list (to maintain fairness between them) and not to limit the number of these tasks among tasklets. It just turns out that the task_list_size initially was this task counter and changed over time to be a tasklet list size. Let's simply refrain from updating it for pure tasklets so that it takes back its original role of counting real tasks as its name implies. With this change the CLI becomes instantly responsive under load again. This patch may possibly be backported to 1.9 though it requires some careful checks.	2019-06-14 09:16:51 +02:00
Olivier Houchard	cfbb3e6560	MEDIUM: tasks: Get rid of active_tasks_mask. Remove the active_tasks_mask variable, we can deduce if we've work to do by other means, and it is costly to maintain. Instead, introduce a new function, thread_has_tasks(), that returns non-zero if there's tasks scheduled for the thread, zero otherwise.	2019-05-29 21:53:37 +02:00
Willy Tarreau	ef28dc11e3	MINOR: task: turn the WQ lock to an RW_LOCK For now it's exclusively used as a write lock though, thus it remains 100% equivalent to the spinlock it replaces.	2019-05-28 19:15:44 +02:00
Willy Tarreau	29bf96d73d	MINOR: task: always reset curr_task when freeing a task or tasklet With the thread debugger it becomes visible that we can leave some wandering pointers for a while in curr_task, which is inappropriate. This patch addresses this by resetting curr_task to NULL before really freeing the area. This way it becomes safe even regarding signals.	2019-05-17 17:16:20 +02:00
Willy Tarreau	aa1e1be88f	MINOR: task: export global_task_mask It will be used in debugging functions and must be exported.	2019-05-16 18:02:03 +02:00
Willy Tarreau	f656279347	CLEANUP: task: remove unneeded tests before task_destroy() Since previous commit it's not needed anymore to test a task pointer before calling task_destory() so let's just remove these tests from the various callers before they become confusing. The function's arguments were also documented. The same should probably be done with tasklet_free() which involves a test in roughly half of the call places.	2019-05-07 19:08:16 +02:00
Dragan Dosen	75bc6d3e59	BUG/MEDIUM: tasks: fix possible segfault on task_destroy() Commit `3f795f7` ("MEDIUM: tasks: Merge task_delete() and task_free() into task_destroy().") replaced task_delete() and task_free() with a single function named task_destroy(). This patch adds a check for struct task* argument in function task_destroy() to prevent a possible segfault on NULL and also to make the function safer for use in other cases.	2019-05-07 18:58:52 +02:00
Willy Tarreau	9634e86dc7	CLEANUP: task: move the task_per_thread definition to task.h It's the second time I look for it and can't find it because it's not in the right file.	2019-04-30 14:36:47 +02:00
Olivier Houchard	3f795f76e8	MEDIUM: tasks: Merge task_delete() and task_free() into task_destroy(). task_delete() was never used without calling task_free() just after, and task_free() was only used on error pathes to destroy a just-created task, so merge them into task_destroy(), that will remove the task from the wait queue, and make sure the task is either destroyed immediately if it's not in the run queue, or destroyed when it's supposed to run.	2019-04-18 10:10:04 +02:00
Willy Tarreau	8c12e2f785	MINOR: task/thread: factor out a wake-up condition The wakeup condition in task_wakeup() is redundant as it is already validated by the CAS. Better move the __task_wakeup() call there, it also has the merit of being easier to audit this way. This also reduces the code size by around 1.8 kB : $ size haproxy-? text data bss dec hex filename 2153806 100208 1307676 3561690 3658da haproxy-1 2152094 100208 1307676 3559978 36522a haproxy-2	2019-04-17 22:15:58 +02:00
Willy Tarreau	a70bfaaf8b	BUG/MAJOR: task: make sure never to delete a queued task Commit `0c7a4b6` ("MINOR: tasks: Don't set the TASK_RUNNING flag when adding in the tasklet list.") revealed a hole in the way tasks may be freed : they could be removed while in the run queue when the TASK_QUEUED flag was present but not the TASK_RUNNING one. But it seems the issue was emphasized by commit `cde7902` ("MEDIUM: tasks: improve fairness between the local and global queues") though the code it replaces was already affected given how late the TASK_RUNNING flag was set after removal from the global queue. At the moment the task is picked from the global run queue, if it is the last one, the global run queue lock is dropped, and then the TASK_RUNNING flag was added. In the mean time another thread might have performed a task_free(), and immediately after, the TASK_RUNNING flag was re-added to the task, which was then added to the tasklet list. The unprotected window was extremely faint but does definitely exist and inconsistent task lists have been observed a few times during very intensive tests over the last few days. From this point various options are possible, the task might have been re-allocated while running, and assigned state 0 and/or state QUEUED while it was still running, resulting in the tast not being put back into the tree. This commit simply makes sure that tests on TASK_RUNNING before removing the task also cover TASK_QUEUED. It must be backported to 1.9 along with the previous ones touching that area.	2019-04-17 22:15:58 +02:00
Olivier Houchard	4a1be0c6d6	MEDIUM: tasks: No longer use rq.node.leaf_p as a lock. Now that we have the warranty that a task won't be added in the runqueue while the TASK_QUEUED or the TASK_RUNNING flag is set, don't bother trying to lock the task by setting leaf_p to 0x1 while inserting it in the runqueue or having it in the tasklet_list, as nobody else will attempt to add it.	2019-04-17 19:28:01 +02:00
Olivier Houchard	5c964f7b42	MINOR: tasks: Don't consider we can wake task with tasklet_wakeup(). In tasklet_wakeup(), don't bother checking if the tasklet is really a task, calling tasklet_wakeup() with a task is invalid.	2019-04-17 19:28:01 +02:00
Willy Tarreau	b038007ae8	BUG/MEDIUM: tasks: Make sure we set TASK_QUEUED before adding a task to the rq. Make sure we set TASK_QUEUED in every case before adding the task to the run queue. task_wakeup() now checks if either TASK_QUEUED or TASK_RUNNING is set, and if neither is set, add TASK_QUEUED and effectively add the task to the runqueue. No longer use __task_wakeup() anywhere except in task_wakeup(), always use task_wakeup() instead. With the old code, process_runnable_task() may re-add a task in the runqueue without setting the TASK_QUEUED flag, and there were race conditions that could lead to a task having the TASK_QUEUED flag but not in the runqueue, thus being unschedulable. This should be backported to 1.9.	2019-04-17 19:28:01 +02:00
Willy Tarreau	24f382f555	CLEANUP: task: do not export rq_next anymore This one hasn't been used anymore since the scheduler changes after 1.8 but it kept being exported and maintained up to date while it's always reset when scanning the trees. Let's stop exporting it and updating it.	2019-04-15 09:50:56 +02:00
Willy Tarreau	a33d39a1b1	CLEANUP: task: only perform a LIST_DEL() when the list is not empty In tasklet_free() we unconditionally perform a LIST_DEL() even when the list is empty, let's move the LIST_DEL() inside the matching block.	2019-03-25 18:10:53 +01:00
Willy Tarreau	e73256fd2a	BUG/MEDIUM: task/h2: add an idempotent task removal fucntion Previous commit `3ea351368` ("BUG/MEDIUM: h2: Remove the tasklet from the task list if unsubscribing.") uncovered an issue which needs to be addressed in the scheduler's API. The function task_remove_from_task_list() was initially designed to remove a task from the running tasklet list from within the scheduler, and had to be used in h2 to abort pending I/O events. However this function was not designed to be idempotent, occasionally causing a double removal from the tasklet list, with the second doing nothing but affecting the apparent tasks count and making haproxy use 100% CPU on some tests consisting in stopping the client during some transfers. The h2_unsubscribe() function can sometimes be called upon stream exit after an error where the tasklet was possibly already removed, so it. This patch does 2 things : - it renames task_remove_from_task_list() to __task_remove_from_tasklet_list() to discourage users from calling it. Also note the fix in the naming since it's a tasklet list and not a task list. This function is still uesd from the scheduler. - it adds a new, idempotent, task_remove_from_tasklet_list() function which does nothing if the task is already not in the tasklet list. This patch will need to be backported where the commit above is backported.	2019-03-25 18:02:54 +01:00
Olivier Houchard	1d7f37a2cb	BUG/MAJOR: tasks: Use the TASK_GLOBAL flag to know if we're in the global rq. In task_unlink_rq, to decide if we should logk the global runqueue lock, use the TASK_GLOBAL flag instead of relying on t->thread_mask being tid_bit, as it could be so while still being in the global runqueue if another thread woke that task for us. This should be backported to 1.9.	2019-03-14 16:19:11 +01:00
Olivier Houchard	4c28328572	MEDIUM: task: Use the new _HA_ATOMIC_* macros. Use the new _HA_ATOMIC_* macros and add barriers where needed.	2019-03-11 17:02:37 +01:00
Willy Tarreau	1e56c70cc9	OPTIM: task: limit the impact of memory barriers in taks_remove_from_task_list() In this function we end up with successive locked operations then a store barrier, and in addition the compiler has to emit less efficient code due to a longer jump. There's no need for absolutely updating the tasks_run_queue counter before clearing the task's leaf pointer, so let's swap the two operations and benefit from a single barrier as much as possible. This code is on the hot path and shows about half a percent of improvement with 8 threads.	2019-03-07 18:44:12 +01:00
Willy Tarreau	b238b12e98	MINOR: task: use LIST_DEL_INIT() to remove a task from the queue By using LIST_DEL_INIT() instead of LIST_DEL()+LIST_INIT() we manage to bump the peak connection rate by no less than 3% on 8 threads. The perf top profile shows much less contention in this area which suffered from the second reload.	2019-03-07 11:45:44 +01:00
Richard Russo	bc9d9844d5	BUG/MAJOR: fd/threads, task/threads: ensure all spin locks are unlocked Calculate if the fd or task should be locked once, before locking, and reuse the calculation when determing when to unlock. Fixes a race condition added in `87d54a9a` for fds, and `b20aa9ee` for tasks, released in 1.9-dev4. When one thread modifies thread_mask to be a single thread for a task or fd while a second thread has locked or is waiting on a lock for that task or fd, the second thread will not unlock it. For FDs, this is observable when a listener is polled by multiple threads, and is closed while those threads have events pending. For tasks, this seems possible, where task_set_affinity is called, but I did not observe it. This must be backported to 1.9.	2019-02-25 16:16:36 +01:00
Willy Tarreau	13afcb7ab3	BUG/MINOR: task: fix possibly missed event in inter-thread wakeups There's a very small but existing uncertainty window when waking another thread up where it is possible for task_wakeup() not to wake the other task up because it's still running while this once is in the process of finishing and loses its TASK_RUNNING flag. In this case the wakeup will be missed. The problem is that we have a single flag to store 3 states, since the transition from running to sleeping isn't atomic. Thus we need to have another flag to cover this part. This patch introduces TASK_QUEUED to mention that the task is already in the run queue, running or not. This bit will be removed while TASK_RUNNING is kept once dequeued, and will be used when removing TASK_RUNNING to check if the task has been requeued. It might be possible to slightly improve this but the occurrence rate is quite low and we don't really need to complexify the scheduler to optimize for a rare case. The impact with the current code is very low since we have few inter- thread wakeups. Most of them are caused by checks killing sessions. This must be backported to 1.9.	2019-01-28 15:03:04 +01:00
Olivier Houchard	09e498f1a1	BUG/MEDIUM: tasks: Decrement tasks_run_queue in tasklet_free(). If the tasklet is in the list, don't forget to decrement tasks_run_queue in tasklet_free(). This should be backported to 1.9.	2018-12-24 14:04:55 +01:00
William Lallemand	27f3fa56f5	BUG/MEDIUM: mworker: stop every tasks in the master The master is not supposed to run (at the moment) any task before the polling loop, the created tasks should be run only in the workers but in the master they should be disabled or removed. No backport needed.	2018-12-06 14:12:58 +01:00
Willy Tarreau	b6b3df3ed3	MEDIUM: initcall: use initcalls for a few initialization functions signal_init(), init_log(), init_stream(), and init_task() all used to only preset some values and lists. This needs to be done very early to provide a reliable interface to all other users. The calls used to be explicit in haproxy.c:init(). Now they're placed in initcalls at the STG_PREPARE stage. The functions are not exported anymore.	2018-11-26 19:50:32 +01:00
Willy Tarreau	9efd7456e0	MEDIUM: tasks: collect per-task CPU time and latency Right now we measure for each task the cumulated time spent waiting for the CPU and using it. The timestamp uses a 64-bit integer to report a nanosecond-level date. This is only enabled when "profiling.tasks" is enabled, and consumes less than 1% extra CPU on x86_64 when enabled. The cumulated processing time and wait time are reported in "show sess". The task's counters are also reset when an HTTP transaction is reset since the HTTP part pretends to restart on a fresh new stream. This will make sure we always report correct numbers for each request in the logs.	2018-11-22 15:44:21 +01:00
Willy Tarreau	8d8747abe0	OPTIM: tasks: group all tree roots per cache line Currently we have per-thread arrays of trees and counts, but these ones unfortunately share cache lines and are accessed very often. This patch moves the task-specific stuff into a structure taking a multiple of a cache line, and has one such per thread. Just doing this has reduced the cache miss ratio from 19.2% to 18.7% and increased the 12-thread test performance by 3%. It starts to become visible that we really need a process-wide per-thread storage area that would cover more than just these parts of the tasks. The code was arranged so that it's easy to move the pieces elsewhere if needed.	2018-10-15 19:06:13 +02:00
Willy Tarreau	b20aa9eef3	MAJOR: tasks: create per-thread wait queues Now we still have a main contention point with the timers in the main wait queue, but the vast majority of the tasks are pinned to a single thread. This patch creates a per-thread wait queue and queues a task to the local wait queue without any locking if the task is bound to a single thread (the current one) otherwise to the shared queue using locking. This significantly reduces contention on the wait queue. A test with 12 threads showed 11 ms spent in the WQ lock compared to 4.7 seconds in the same test without this change. The cache miss ratio decreased from 19.7% to 19.2% on the 12-thread test, and its performance increased by 1.5%. Another indirect benefit is that the average queue size is divided by the number of threads, which roughly removes log(nbthreads) levels in the tree and further speeds up lookups.	2018-10-15 19:04:40 +02:00
Olivier Houchard	931624a00b	BUG/MEDIUM: tasks: Don't forget to decrement task_list_size in tasklet_free(). In tasklet_free(), if we're currently in the runnable task list, don't forget to decrement taks_list_size, or it'll end up being to big, and we may not process tasks in the global runqueue.	2018-09-12 17:37:55 +02:00
Olivier Houchard	abedf5f6c3	BUG/MEDIUM: tasklets: Add the thread as active when waking a tasklet. Set the flag for the current thread in active_threads_mask when waking a tasklet, or we will never run it if no tasks are available. This is 1.9-specific, no backport is needed.	2018-08-21 18:06:33 +02:00
Olivier Houchard	5d18718c8f	MINOR: tasks: Allow tasklet_wakeup() to wakeup a task. Modify tasklet_wakeup() so that it handles a task as well, and inserts it directly into the tasklet list, making it effectively a tasklet. This should make future developments easier.	2018-08-16 17:29:53 +02:00
Olivier Houchard	9b03c0c9a7	MINOR: tasks: Make active_tasks_mask volatile. To be sure we have the relevant informations, make active_tasks_mask volatile	2018-07-26 19:09:50 +02:00
Olivier Houchard	77551ee8a7	BUG/MEDIUM: tasks: make __task_unlink_rq responsible for the rqueue size. As __task_wakeup() is responsible for increasing rqueue_local[tid]/global_rqueue_size, make __task_unlink_rq responsible for decreasing it, as process_runnable_tasks() isn't the only one that removes tasks from runqueues.	2018-07-26 16:33:29 +02:00
Olivier Houchard	76e45181b2	MINOR: tasks: Add a flag that tells if we're in the global runqueue. How that we have bits available in task->state, add a flag that tells if we're in the global runqueue or not.	2018-07-26 16:33:10 +02:00
Olivier Houchard	e17c2d3e57	MINOR: tasklets: Don't attempt to add a tasklet in the list twice. Don't try to add a tasklet to the run queue if it's already in there, or we might get an infinite loop.	2018-07-19 16:23:43 +02:00
Olivier Houchard	9ddaf794a8	MINOR: tasklet: Set process to NULL. Some consumers expect the process to be NULL when a tasklet it created, so do so.	2018-07-19 16:23:08 +02:00
Olivier Houchard	dcd6f3a597	MINOR: tasks: Make sure we correctly init and deinit a tasklet. Up until now, a tasklet couldn't be free'd while it was in the list, it is no longer the case, so make sure we remove it from the list before freeing it. To do so, we have to make sure we correctly initialize it, so use LIST_INIT, instead of setting the pointers to NULL.	2018-06-14 18:57:13 +02:00
Olivier Houchard	b1ca58b245	MINOR: tasks: Don't define rqueue if we're building without threads. To make sure we don't inadvertently insert task in the global runqueue, while only the local runqueue is used without threads, make its definition and usage conditional on USE_THREAD.	2018-06-06 16:35:12 +02:00
Olivier Houchard	e13ab8b3c6	BUG/MEDIUM: tasks: Use the local runqueue when building without threads. When building without threads enabled, instead of just using the global runqueue, just use the local runqueue associated with the only thread, as that's what is now expected for a single thread in prcoess_runnable_tasks(). This should fix haproxy when built without threads.	2018-06-06 16:34:52 +02:00
David Carlier	caa8a37ffe	MINOR: task: Fix a compiler warning by adding a cast. When calling HA_ATOMIC_CAS with a pointer as the target, the compiler expects a pointer as the new value, so give it one by casting 0x1 to (void *).	2018-06-04 17:43:12 +02:00
Thierry FOURNIER	9d5422a4b7	MINOR: task/notification: Is notifications registered ? This function returns true is some notifications are registered. This function is usefull for the following patch BUG/MEDIUM: lua/socket: Sheduling error on write: may dead-lock It should be backported in 1.6, 1.7 and 1.8	2018-05-31 10:58:41 +02:00
Olivier Houchard	09eeb7684d	BUG/MEDIUM: tasks: Don't forget to increase/decrease tasks_run_queue. Don't forget to increase tasks_run_queue when we're adding a task to the tasklet list, and to decrease it when we remove a task from a runqueue, or its value won't be accurate, and could lead to tasks not being executed when put in the global run queue. 1.9-dev only, no backport is needed.	2018-05-28 15:20:55 +02:00
Olivier Houchard	b0bdae7b88	MAJOR: tasks: Introduce tasklets. Introduce tasklets, lightweight tasks. They have no notion of priority, they are just run as soon as possible, and will probably be used for I/O later. For the moment they're used to replace the temporary thread-local list that was used in the scheduler. The first part of the struct is common with tasks so that tasks can be cast to tasklets and queued in this list. Once a task is in the tasklet list, it has its leaf_p set to 0x1 so that it cannot accidently be confused as not in the queue. Pure tasklets are identifiable by their nice value of -32768 (which is normally not possible).	2018-05-26 20:03:19 +02:00
Olivier Houchard	f6e6dc12cd	MAJOR: tasks: Create a per-thread runqueue. A lot of tasks are run on one thread only, so instead of having them all in the global runqueue, create a per-thread runqueue which doesn't require any locking, and add all tasks belonging to only one thread to the corresponding runqueue. The global runqueue is still used for non-local tasks, and is visited by each thread when checking its own runqueue. The nice parameter is thus used both in the global runqueue and in the local ones. The rare tasks that are bound to multiple threads will have their nice value used twice (once for the global queue, once for the thread-local one).	2018-05-26 19:27:29 +02:00
Olivier Houchard	9b36cb4a41	BUG/MEDIUM: task: Don't free a task that is about to be run. While running a task, we may try to delete and free a task that is about to be run, because it's part of the local tasks list, or because rq_next points to it. So flag any task that is in the local tasks list to be deleted, instead of run, by setting t->process to NULL, and re-make rq_next a global, thread-local variable, that is modified if we attempt to delete that task. Many thanks to PiBa-NL for reporting this and analysing the problem. This should be backported to 1.8.	2018-05-04 20:11:04 +02:00
Thierry FOURNIER	cb14688496	BUG/MEDIUM: lua/notification: memory leak The thread patches adds refcount for notifications. The notifications are used with the Lua cosocket. These refcount free the notifications when the session is cleared. In the Lua task case, it not have sessions, so the nofications are never cleraed. This patch adds a garbage collector for signals. The garbage collector just clean the notifications for which the end point is disconnected. This patch should be backported in 1.8	2017-12-10 19:38:58 +01:00

1 2

94 Commits