We may remove the thread's bit in active_tasks_mask despite tasks for that
thread still being present in the global runqueue. To fix that, introduce
global_tasks_mask, and set the corresponding bits when we add a task to the
runqueue.
We need to decrement requeue_size when we remove a task from rqueue_local,
not when we remove it from the task list, or we'd also decrement it for any
tasklet that was never in the rqueue in the first place.
Commit 09eeb76 ("BUG/MEDIUM: tasks: Don't forget to increase/decrease
tasks_run_queue.") addressed a count issue in the run queue and uncovered
another issue with the way the tasks are dequeued from the global run
queue. The number of tasks to pick is computed using an integer division,
which results in up to nbthread-1 tasks never being run (for instance, when
fewer tasks than threads remain, the division truncates to zero). The fix
simply consists in getting rid of the division and checking the task count
in the loop.
No backport is needed, this is 1.9-specific.
We can't just set t to NULL if it's a tasklet, or we'd have a hard time
accessing t->process, so just make sure we pass NULL as the first parameter
of t->process if it's a tasklet.
This should be a non-issue at this point, as tasklets aren't used yet.
To make sure we don't inadvertently insert tasks in the global runqueue
while only the local runqueue is used without threads, make its definition
and usage conditional on USE_THREAD.
We're taking tasks from the global runqueue based on the number of tasks
the thread already has in its local runqueue, but now that we have a task
list, we also have to take that into account.
When the task list was introduced, we bogusly lost the max_processed--
decrement; that means we would execute as many tasks as were present in the
list, and we would never set active_tasks_mask, so the thread would go to
sleep even if more tasks remained to be executed.
1.9-dev only, no backport is needed.
Introduce tasklets, lightweight tasks. They have no notion of priority,
they are just run as soon as possible, and will probably be used for I/O
later.
For the moment they're used to replace the temporary thread-local list
that was used in the scheduler. The first part of the struct is common
with tasks so that tasks can be cast to tasklets and queued in this list.
Once a task is in the tasklet list, it has its leaf_p set to 0x1 so that
it cannot accidentally be confused as not being in the queue.
Pure tasklets are identifiable by their nice value of -32768 (which is
normally not possible).
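As an illustration only, the layout described above could look like the
sketch below, assuming a shared TASK_COMMON-style head; the exact fields
and names may differ from the real structures:

    /* Sketch only: the head shared by tasks and tasklets, so that a task
     * can be cast to a tasklet and queued in the tasklet list. A pure
     * tasklet is recognized by its nice value of -32768. Names are
     * illustrative, not necessarily the real ones. */
    struct task;                 /* forward declaration for the callback */

    #define TASK_COMMON                                              \
        struct {                                                     \
            unsigned short state;                                    \
            short nice;          /* -32768 for a pure tasklet */     \
            unsigned int calls;                                      \
            struct task *(*process)(struct task *t, void *ctx,      \
                                    unsigned short state);           \
            void *context;                                           \
        }

    struct list { struct list *n, *p; };  /* minimal doubly-linked list */

    struct tasklet {
        TASK_COMMON;      /* must be first, shared with struct task */
        struct list list; /* run-as-soon-as-possible list, no priority */
    };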
A lot of tasks are run on one thread only, so instead of having them all
in the global runqueue, create a per-thread runqueue which doesn't require
any locking, and add all tasks belonging to only one thread to the
corresponding runqueue.
The global runqueue is still used for non-local tasks, and is visited
by each thread when checking its own runqueue. The nice parameter is
thus used both in the global runqueue and in the local ones. The rare
tasks that are bound to multiple threads will have their nice value
used twice (once for the global queue, once for the thread-local one).
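As an illustration only, a wakeup path following this scheme could look
like the sketch below; it uses a plain linked list instead of the real run
queue tree, and all names (task_per_thread, rqueue, rq_lock, tid, tid_bit)
are assumptions:

    #include <pthread.h>

    #define MAX_THREADS 64

    struct task { unsigned long thread_mask; struct task *next; };

    /* one lock-free queue per thread, plus one shared, locked queue */
    static struct { struct task *rqueue; } task_per_thread[MAX_THREADS];
    static struct task *rqueue;
    static pthread_spinlock_t rq_lock;     /* initialized at startup */

    static __thread int tid;               /* current thread id */
    static __thread unsigned long tid_bit; /* 1UL << tid */

    void task_wakeup_sketch(struct task *t)
    {
        if (t->thread_mask == tid_bit) {
            /* bound to this thread only: thread-local, no locking */
            t->next = task_per_thread[tid].rqueue;
            task_per_thread[tid].rqueue = t;
        } else {
            /* may run on other threads: use the global, locked queue */
            pthread_spin_lock(&rq_lock);
            t->next = rqueue;
            rqueue = t;
            pthread_spin_unlock(&rq_lock);
        }
    }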
In preparation for thread-specific runqueues, change the task API so that
the callback takes 3 arguments, the task itself, the context, and the state,
those were retrieved from the task before. This will allow these elements to
change atomically in the scheduler while the application uses the copied
value, and even to have NULL tasks later.
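For illustration, the callback shape described above could look like this
(the handler name and its body are purely illustrative):

    struct task;   /* opaque for this sketch */

    /* the scheduler passes copies of the context and state taken before
     * the call, so the handler does not re-read them from the task */
    struct task *process_demo(struct task *t, void *context, unsigned short state)
    {
        (void)context;
        (void)state;
        /* ... do the work using <context> and <state> ... */
        return t;   /* in this sketch, return the task to keep it alive */
    }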
While running a task, we may try to delete and free a task that is about to
be run, because it's part of the local tasks list, or because rq_next points
to it.
So flag any task that is in the local tasks list to be deleted instead of
run, by setting t->process to NULL, and make rq_next a global, thread-local
variable again, which is updated if we attempt to delete the task it points to.
Many thanks to PiBa-NL for reporting this and analysing the problem.
This should be backported to 1.8.
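A minimal sketch of this deferred-deletion idea, with illustrative names
and a plain list standing in for the real structures:

    #include <stdlib.h>

    struct task {
        struct task *(*process)(struct task *t);
        struct task *next;          /* local run list linkage */
    };

    static __thread struct task *rq_next;  /* next task the scheduler will run */

    void task_delete_sketch(struct task *t, int queued_locally)
    {
        if (queued_locally || t == rq_next) {
            t->process = NULL;      /* mark it; the scheduler frees it later */
            return;
        }
        free(t);                    /* not referenced by the scheduler: free now */
    }

    void run_one_sketch(struct task *t)
    {
        if (!t->process) {          /* was deleted while still queued */
            free(t);
            return;
        }
        t->process(t);
    }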
A number of counters have been added at special places helping better
understanding certain bug reports. These counters are maintained per
thread and are shown using "show activity" on the CLI. The "clear
counters" commands also reset these counters. The output is sent as a
single write(), which currently produces up to about 7 kB of data for
64 threads. If more counters are added, it may be necessary to write
into multiple buffers, or to reset the counters.
To backport to 1.8 to help collect more detailed bug reports.
We really don't want them to share the same cache line as they are
expected to be used in parallel. Adding a 64-byte alignment here shows
a performance increase of about 4.5% on task-intensive workloads with
2 to 4 threads.
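For illustration, this is the kind of change being referred to (the
variable names are made up; the attribute syntax is the GCC/Clang one):

    /* keep the two independently-accessed objects on separate cache lines */
    struct run_queue { unsigned int size; void *root; };

    static struct run_queue global_rq __attribute__((aligned(64)));
    static struct run_queue local_rq  __attribute__((aligned(64)));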
Very often when debugging, the current task's pointer isn't easy to
recover (eg: from a core file). Let's keep a copy of it, it will
likely help, especially with threads.
During the migration to the second version of the pools, the new
functions and pool pointers were all called "pool_something2()" and
"pool2_something". Now there's no more pool v1 code and it's a real
pain to still have to deal with this. Let's clean this up now by
removing the "2" everywhere, and by renaming the pool heads
"pool_head_something".
There is a small unprotected window for a task between the wait queue
and the run queue where a task could be woken up and destroyed at the
same time. What typically happens is that a timeout is reached at the
same time an I/O completes and wakes it up, and the I/O terminates the
task, causing a use after free in wake_expired_tasks() possibly causing
a crash and/or memory corruption :
      thread 1                                  thread 2
  (wake_expired_tasks)                      (stream_int_notify)
  HA_SPIN_UNLOCK(TASK_WQ_LOCK, &wq_lock);
                                            task_wakeup(task, TASK_WOKEN_IO);
                                            ...
                                            process_stream()
                                            stream_free()
                                            task_free()
                                            pool_free(task)
  task_wakeup(task, TASK_WOKEN_TIMER);
This case is reasonably easy to reproduce with a config using very short
server timeouts (100ms) and client timeouts (10ms), while injecting on
httpterm requesting medium sized objects (5kB) over SSL. All this is
easier done with more threads than allocated CPUs so that pauses can
happen anywhere and last long enough for process_stream() to kill the
task.
This patch inverts the lock and the wakeup(), but requires some changes
in process_runnable_tasks() to ensure we never try to grab the WQ lock
while having the RQ lock held. This means we have to release the RQ lock
before calling task_queue(), so we can't hold the RQ lock during the
loop and must take and drop it.
It seems that a different approach with the scope-aware trees could be
easier, but it would possibly not cover situations where a task is
allowed to run on multiple threads. The current solution covers it and
doesn't seem to have any measurable performance impact.
tasks_run_queue is the run queue size. It is a global variable. So it is
underoptimized because we may be led to consider there are active tasks for a
thread while in fact all active tasks are assigned to the other threads. So, in
such cases, the polling loop will be evaluated many more times than necessary.
Instead, we now check if the thread's bit is set in the bitfield active_tasks_mask.
Another change has been made in process_runnable_tasks. Now, we always limit the
number of tasks processed to 200.
This is specific to threads, no backport is needed.
A bitfield has been added to know if there are runnable tasks for a thread. When
a task is woken up, the bits corresponding to its thread_mask are set. When all
tasks for a thread have been evaluated without any wakeup, the thread is removed
from active ones by unsetting its tid_bit from the bitfield.
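For illustration, the bit manipulation described above can be sketched as
follows, using the names from the text and GCC/Clang atomic builtins (the
helper names are made up):

    static unsigned long active_tasks_mask;

    /* on task_wakeup(): mark every thread allowed to run this task */
    static inline void mark_active(unsigned long thread_mask)
    {
        __atomic_fetch_or(&active_tasks_mask, thread_mask, __ATOMIC_SEQ_CST);
    }

    /* when a thread evaluated all its tasks without any wakeup */
    static inline void mark_idle(unsigned long tid_bit)
    {
        __atomic_fetch_and(&active_tasks_mask, ~tid_bit, __ATOMIC_SEQ_CST);
    }

    /* in the polling loop: is there anything for this thread to run? */
    static inline int thread_has_tasks(unsigned long tid_bit)
    {
        return (__atomic_load_n(&active_tasks_mask, __ATOMIC_SEQ_CST) & tid_bit) != 0;
    }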
This macro should be used to declare variables or struct members depending on
the USE_THREAD compile option. It avoids the encapsulation of such declarations
between #ifdef/#endif. It is used to declare all lock variables.
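The macro's name isn't given above; a sketch of what it may look like, with
an illustrative name:

    #include <pthread.h>

    #ifdef USE_THREAD
    #define __decl_threads(decl) decl
    #else
    #define __decl_threads(decl)           /* the declaration disappears */
    #endif

    struct work_queue {
        void *head;
        __decl_threads(pthread_mutex_t lock;)  /* only present with threads */
    };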
My recent change in commit ce4e0aa ("MEDIUM: task: change the construction
of the loop in process_runnable_tasks()") was bogus as it used to keep the
rq_next across an unlock/lock sequence, occasionally leading to crashes for
tasks that are eligible to any thread. We must use the lookup call for each
new batch instead. The problem is easily triggered with such a configuration :
    global
        nbthread 4

    listen check
        mode http
        bind 0.0.0.0:8080
        redirect location /
        option httpchk GET /
        server s1 127.0.0.1:8080 check inter 1
        server s2 127.0.0.1:8080 check inter 1
Thanks to Olivier for diagnosing this one. No backport is needed.
The scheduler is complex and uses local queues to amortize the cost of
locks. But all this comes with a cost that is quite observable with
single-thread workloads.
The purpose of this patch is to reimplement the much simpler scheduler
for the case where threads are not used. The code is very small and
simple. It doesn't impact the multi-threaded performance at all, and
provides a nice 10% performance increase in single-thread by reaching
606kreq/s on the tests that showed 550kreq/s before.
process_runnable_tasks() needs to requeue or wake up tasks after
processing them in batches. By only refilling the existing ones, we
avoid revisiting the whole queue. The performance gain is measurable
starting with two threads, where the request rate climbs to 657k/s
compared to 644k.
This patch slightly rearranges the loop to pack the locked code a little
bit, and to try to concentrate accesses to the tree together to benefit
more from the cache.
It also fixes how the loop handles the right margin : now that it is
guaranteed that the retrieved nodes are filtered to only match the current
thread, we
don't need to rewind every 16 entries. Instead we can rewind each time we
reach the right margin again.
With this change, we now achieve the following performance for 10 H2 conns
each containing 100 streams :
    1 thread  : 550kreq/s
    2 threads : 644kreq/s
    3 threads : 598kreq/s
This function is sensitive, let's make it shorter by factoring out the
unlock and leave code. This reduced the function's size by a few tens
of bytes and increased the overall performance by about 1%.
Currently the task scheduler suffers from an O(n) lookup when
skipping tasks that are not for the current thread. The reason
is that eb32_lookup_ge() has no information about the current
thread so it always revisits many tasks for other threads before
finding its own tasks.
This is particularly visible with HTTP/2 since the number of
concurrent streams created at once causes long series of tasks
for the same stream in the scheduler. With only 10 connections
and 100 streams each, by running on two threads, the performance
drops from 640kreq/s to 11.2kreq/s! Lookup metrics show that for
only 200000 task lookups, 430 million skips had to be performed,
which means that on average, each lookup leads to 2150 nodes to
be visited.
This commit backports the principle of scope lookups for ebtrees
from the ebtree_v7 development tree. The idea is that each node
contains a mask indicating the union of the scopes for the nodes
below it, which is fed during insertion, and used during lookups.
Then during lookups, branches that do not contain any leaf matching
the requested scope are simply ignored. This perfectly matches a
thread mask, allowing a thread to only extract the tasks it cares
about from the run queue, and to always find them in O(log(n))
instead of O(n). Thus the scheduler uses tid_bit and
task->thread_mask as the ebtree scope here.
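As an illustration, the scheduler-side usage could look like the sketch
below, assuming the scope-aware eb32sc API (eb32sctree.h); the surrounding
names are made up:

    #include <eb32sctree.h>

    /* insertion: the task's thread mask becomes the node's scope */
    void enqueue_sketch(struct eb_root *rqueue, struct eb32sc_node *node,
                        unsigned int key, unsigned long thread_mask)
    {
        node->key = key;
        eb32sc_insert(rqueue, node, thread_mask);
    }

    /* lookup: branches whose scope doesn't intersect tid_bit are skipped,
     * so a thread finds its own tasks in O(log(n)) */
    struct eb32sc_node *dequeue_sketch(struct eb_root *rqueue,
                                       unsigned int start, unsigned long tid_bit)
    {
        return eb32sc_lookup_ge(rqueue, start, tid_bit);
    }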
Doing this has recovered most of the performance, as can be seen on
the test below with two threads, 10 connections, 100 streams each,
and 1 million requests total :
                                  Before    After    Gain
    test duration :                89.6s     4.73s     x19
    HTTP requests/s (DEBUG) :     11200     211300     x19
    HTTP requests/s (PROD) :      15900     447000     x28
    spin_lock time :               85.2s     0.46s    /185
    time per lookup :               13us      40ns    /325
Even when going to 6 threads (on 3 hyperthreaded CPU cores), the
performance stays around 284000 req/s, showing that the contention
is much lower.
A test showed that there's no benefit in using this for the wait queue
though.
It was a leftover from the last cleaning session; this mask applies
to threads and calling it process_mask is a bit confusing. It's the
same in fd, task and applets.
Two global locks have been added to protect, respectively, the run queue and the
wait queue. And a process mask has been added on each task. Like for FDs, this
mask is used to know which threads are allowed to process a task.
For many tasks, all threads are granted. And this must be your first intention
when you create a new task, unless you have a good reason to make a task sticky
on some threads. It is then the responsibility of the process callback to lock
what has to be locked in the task context.
Nevertheless, all tasks linked to a session must be sticky on the thread
creating the session. It is important that I/O handlers processing session FDs
and these tasks run on the same thread to avoid conflicts.
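As a sketch only, with made-up names (task_new_sketch, ALL_THREADS_MASK,
tid_bit), this is the kind of choice being described:

    #include <stdlib.h>

    #define ALL_THREADS_MASK  (~0UL)
    static __thread unsigned long tid_bit;   /* 1UL << current thread id */

    struct task { unsigned long thread_mask; /* ... */ };

    static struct task *task_new_sketch(unsigned long thread_mask)
    {
        struct task *t = calloc(1, sizeof(*t));
        if (t)
            t->thread_mask = thread_mask;   /* threads allowed to run it */
        return t;
    }

    void example(void)
    {
        /* default: any thread may process it */
        struct task *generic = task_new_sketch(ALL_THREADS_MASK);

        /* session task: sticky to the creating thread, so it runs on the
         * same thread as the session's I/O handlers */
        struct task *session_task = task_new_sketch(tid_bit);

        (void)generic; (void)session_task;
    }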
These notification management functions and structs are generic and
it will be better to move them to the common parts.
The notification management functions and structs have names
containing some "lua" references because they were written for
Lua. This patch also removes these references.
In order to allow calling task_wakeup on a running task:
- from within the task handler itself.
- in the future, from another thread.
The lookups on the runqueue and waitqueue are reworked
to prepare the multithread work.
If task_wakeup is called on a running task, the woken
message flags are saved in the 'pending_state' attribute of
the task. The real wakeup is postponed until the end of the handler's
processing and the woken messages are copied from pending_state
to the state attribute of the task.
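A minimal sketch of that deferred wakeup, with illustrative flag values and
helper names:

    #define TASK_RUNNING   0x0001   /* illustrative value */

    struct task {
        unsigned short state;
        unsigned short pending_state;   /* wakeups received while running */
    };

    static void __task_wakeup_sketch(struct task *t)
    {
        (void)t;   /* here the task would be (re)inserted in the run queue */
    }

    void task_wakeup_sketch(struct task *t, unsigned short flags)
    {
        if (t->state & TASK_RUNNING) {
            t->pending_state |= flags;  /* remember the wakeup for later */
            return;
        }
        t->state |= flags;
        __task_wakeup_sketch(t);
    }

    /* called by the scheduler once the handler returns */
    void task_after_run_sketch(struct task *t)
    {
        t->state &= ~TASK_RUNNING;
        if (t->pending_state) {
            t->state |= t->pending_state;  /* fold postponed flags back in */
            t->pending_state = 0;
            __task_wakeup_sketch(t);       /* now perform the real wakeup */
        }
    }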
It's important to note that this change will cause a very minor
(though measurable) performance loss but it is necessary to make
forward progress on a multi-threaded scheduler. Most users won't
ever notice.
<run_queue> is used to track the number of tasks in the run queue and
<run_queue_cur> is a copy used for reporting purposes. These counters have
been renamed, respectively, <tasks_run_queue> and <tasks_run_queue_cur>. So the
naming is consistent between tasks and applets.
[wt: needed for next fixes, backport to 1.7 and 1.6]
With HTTP/2, we'll have to support multiplexed streams. A stream is in
fact the largest part of what we currently call a session, it has buffers,
logs, etc.
In order to catch any error, this commit removes any reference to the
struct session and tries to rename most "session" occurrences in function
names to "stream" and "sess" to "strm" when that's related to a session.
The files stream.{c,h} were added and session.{c,h} removed.
The session will be reintroduced later and a few parts of the stream
will progressively be moved over there. It will more or less contain
only what we need in an embryonic session.
Sample fetch functions and converters will have to change a bit so
that they'll use an L5 (session) instead of what's currently called
"L4" which is in fact L6 for now.
Once all changes are completed, we should see approximately this :
L7 - http_txn
L6 - stream
L5 - session
L4 - connection | applet
There will be at most one http_txn per stream, and a same session will
possibly be referenced by multiple streams. A connection will point to
a session and to a stream. The session will hold all the information
we need to keep even when we don't yet have a stream.
Some more cleanup is needed because some code was already far from
being clean. The server queue management still refers to sessions at
many places while comments talk about connections. This will have to
be cleaned up once we have a server-side connection pool manager.
Stream flags "SN_*" still need to be renamed, it doesn't seem like
any of them will need to move to the session.
Commit 501260b ("MEDIUM: task: always ensure that the run queue is
consistent") introduced a skew in the scheduler : if a negatively niced
task is woken up, it can be inserted prior to the current index and will
be skipped as long as there is some activity among lower priority tasks.
The immediate effect is that it's not possible to get access to the stats
under full load until the load goes down.
This is because the rq_next constantly evolves within more recent
positions. The fix is simple, __task_wakeup() must empty rq_next. The
sad thing is that this issue was fixed during development and missed
during the commit. No backport is needed, this is purely 1.6 stuff.
Currently, HAProxy uses the functions "process_runnable_tasks" and
"wake_expired_tasks" to get the next task which can expire.
If a task is added with "task_schedule" or another method during
the execution of another task, the expiration of this new task
is not taken into account, and the execution of this task can
happen too late.
In practice, HAProxy does not seem to be sensitive to this bug.
This fix moves the call to process_runnable_tasks() before the timeout
calculation and ensures that all wakeups are processed together. Only
wake_expired_tasks() needs to return a timeout now.
As found by Thierry Fournier, if a task manages to kill another one and
if this other task is the next one in the run queue, we can do whatever
including crashing, because the scheduler restarts from the saved next
task. For now, there is no such concept of a task killing another one,
but with Lua it will come.
A solution consists in always performing the lookup of the first task in
the scheduler's loop, but it's expensive and costs around 2% of the
performance.
Another solution consists in keeping a global next run queue node and
ensuring that when this task gets removed, it updates this pointer to
the next one. This allows to simplify the code a bit and in the end to
slightly increase the performance (0.3-0.5%). The mechanism might still
be usable if we later migrate to a multi-threaded scheduler.
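A minimal sketch of that mechanism, assuming the eb32 tree API
(eb32tree.h); the variable name rq_next follows the text above:

    #include <eb32tree.h>

    static struct eb32_node *rq_next;   /* next node the scheduler will visit */

    static inline void task_unlink_rq_sketch(struct eb32_node *node)
    {
        if (rq_next == node)
            rq_next = eb32_next(node);  /* step past the node being removed */
        eb32_delete(node);
    }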
All files referencing the previous ebtree code were changed to point
to the new one in the ebtree directory. A makefile variable (EBTREE_DIR)
is also available to use files from another directory.
The ability to build the libebtree library temporarily remains disabled
because it can have an impact on some existing toolchains and does not
appear worth it in the medium term if we add support for multi-criteria
stickiness for instance.
I noticed that in __eb32_insert, if the tree is empty
(root->b[EB_LEFT] == NULL), the node.bit is not defined.
However in __task_queue there are checks:
- if (last_timer->node.bit < 0)
- if (task->wq.node.bit < last_timer->node.bit)
which might rely upon an undefined value.
This is how I see it:
1. We insert eb32_node in an empty wait queue tree for a task (called by
process_runnable_tasks()):
     Inserting into empty wait queue &task->wq = 0x72a87c8, last_timer
     pointer: (nil)
2. Then, we set the last timer to the same address:
     Setting last_timer: (nil) to: 0x72a87c8
3. We get a new task to be inserted in the queue (again called by
process_runnable_tasks()) , before the __task_unlink_wq() is called for
the previous task.
4. At this point, we still have last_timer set to 0x72a87c8, but since
it was inserted in an empty tree, its node.bit was never defined and the
checks above end up reading an undefined value.
The bug has no effect right now because the check for equality is still
made, so the next timer will still be queued at the right place anyway,
without any possible side-effect. But it's a pending bug waiting for a
small change somewhere to strike.
Iliya Polihronov
Cristian Ditoiu reported a major regression when testing 1.3.19 at
transfer.ro. It would crash within a few minutes while 1.3.15.10
was OK. He offered to help so we could run gdb and debug the crash
live. We finally found that the crash was the result of a regression
introduced by recent fix 814c978fb6
(task: fix possible timer drift after update) which makes it possible
for a tree walk to start from a detached task if this task has got
its timeout disabled due to a missing timeout.
The trivial fix below has been extensively tested and confirmed not
to crash anymore.
Special thanks to Cristian who spontaneously provided a lot of help
and trust to debug this issue which at first glance looked impossible
after reading the code and traces, but took less than an hour to spot
and fix when caught live in gdb ! That's really appreciated !
When the scheduler detected that a task was misplaced in the timer
queue, it used to place it right again. Unfortunately, it did not
check whether it would still call the new task from its new place.
This resulted in some tasks not getting called on timeout once in
a while, causing a minor drift for repetitive timers. This effect
was only observable with slow health checks and without any activity
because no other task would cause the scheduler to be immediately
called again.
In practice, it does not affect any real-world configuration, but
it's still better to fix it.
It's sometimes useful at least for statistics to keep a task count.
It's easy to do by forcing the rare task creators to always use the
same functions to create/destroy a task.
If a task wants to stay in the run queue, it is possible. It just
needs to wake itself up. We just want to ensure that a reniced
task will be processed at the right instant.
The top of a duplicate tree is not where bit == -1 but at the most
negative bit. This was causing tasks to be queued in reverse order
within duplicates. While this is not dramatic, it's incorrect and
might lead to longer than expected duplicate depths under some
circumstances.