haproxy

mirror of http://git.haproxy.org/git/haproxy.git/ synced 2024-12-21 20:00:17 +00:00

Author	SHA1	Message	Date
Willy Tarreau	d8aa21a611	CLEANUP: server: rename srv_cleanup_{idle,toremove}_connections() These function names are unbearably long, they don't even fit into the screen in "show profiling", let's trim the "_connections" to "_conns", which happens to match the name of the lists there.	2021-02-26 00:30:22 +01:00
Willy Tarreau	9205ab31d2	MINOR: ssl: mark the SSL handshake tasklet as heavy There's a fairness issue between SSL and clear text. A full end-to-end cleartext connection can require up to ~7.7 wakeups on average, plus 3.3 for the SSL tasklet, one of which is particularly expensive. So if we accept to process many handshakes taking 1ms each, we significantly increase the processing time of regular tasks just by adding an extra delay between their calls. Ideally in order to be fair we should have a 1:18 call ratio, but this requires a bit more accounting. With very little effort we can mark the SSL handshake tasklet as TASK_HEAVY until the handshake completes, and remove it once done. Doing so reduces from 14 to 3.0 ms the total response time experienced by HTTP clients running in parallel to 1000 SSL clients doing full handshakes in loops. Better, when tune.sched.low-latency is set to "on", the latency further drops to 1.8 ms. The tasks latency distribution explains pretty well what is happening: Without the patch: $ socat - /tmp/sock1 <<< "show profiling" Per-task CPU profiling : on # set profiling tasks {on|auto|off} Tasks activity: function calls cpu_tot cpu_avg lat_tot lat_avg ssl_sock_io_cb 2785375 19.35m 416.9us 5.401h 6.980ms h1_io_cb 1868949 9.853s 5.271us 4.829h 9.302ms process_stream 1864066 7.582s 4.067us 2.058h 3.974ms si_cs_io_cb 1733808 1.932s 1.114us 26.83m 928.5us h1_timeout_task 935760 - - 1.033h 3.975ms accept_queue_process 303606 4.627s 15.24us 16.65m 3.291ms srv_cleanup_toremove_connections452 64.31ms 142.3us 2.447s 5.415ms task_run_applet 47 5.149ms 109.6us 57.09ms 1.215ms srv_cleanup_idle_connections 34 2.210ms 65.00us 87.49ms 2.573ms With the patch: $ socat - /tmp/sock1 <<< "show profiling" Per-task CPU profiling : on # set profiling tasks {on|auto|off} Tasks activity: function calls cpu_tot cpu_avg lat_tot lat_avg ssl_sock_io_cb 3000365 21.08m 421.6us 20.30h 24.36ms h1_io_cb 2031932 9.278s 4.565us 46.70m 1.379ms process_stream 2010682 7.391s 3.675us 22.83m 681.2us si_cs_io_cb 1702070 1.571s 922.0ns 8.732m 307.8us h1_timeout_task 1009594 - - 17.63m 1.048ms accept_queue_process 339595 4.792s 14.11us 3.714m 656.2us srv_cleanup_toremove_connections779 75.42ms 96.81us 438.3ms 562.6us srv_cleanup_idle_connections 48 2.498ms 52.05us 178.1us 3.709us task_run_applet 17 1.738ms 102.3us 11.29ms 663.9us other 1 947.8us 947.8us 202.6us 202.6us => h1_io_cb() and process_stream() are divided by 6 while ssl_sock_io_cb() is multiplied by 4 And with low-latency on: $ socat - /tmp/sock1 <<< "show profiling" Per-task CPU profiling : on # set profiling tasks {on|auto|off} Tasks activity: function calls cpu_tot cpu_avg lat_tot lat_avg ssl_sock_io_cb 3000565 20.96m 419.1us 20.74h 24.89ms h1_io_cb 2019702 9.294s 4.601us 49.22m 1.462ms process_stream 2009755 6.570s 3.269us 1.493m 44.57us si_cs_io_cb 1997820 1.566s 783.0ns 2.985m 89.66us h1_timeout_task 1009742 - - 1.647m 97.86us accept_queue_process 494509 4.697s 9.498us 1.240m 150.4us srv_cleanup_toremove_connections1120 92.32ms 82.43us 463.0ms 413.4us srv_cleanup_idle_connections 70 2.703ms 38.61us 204.5us 2.921us task_run_applet 13 1.303ms 100.3us 85.12us 6.548us => process_stream() is divided by 100 while ssl_sock_io_cb() is multiplied by 4 Interestingly, the total HTTPS response time doesn't increase and even very slightly decreases, with an overall ~1% higher request rate. The net effect here is a redistribution of the CPU resources between internal tasks, and in the case of SSL, handshakes wait a bit more but everything after completes faster. This was made simple enough to be backportable if it helps some users suffering from high latencies in mixed traffic.	2021-02-26 00:26:03 +01:00
Willy Tarreau	74dea8caea	MINOR: task: limit the number of subsequent heavy tasks with flag TASK_HEAVY While the scheduler is priority-aware and class-aware, and consistently tries to maintain fairness between all classes, it doesn't make use of a fine execution budget to compensate for high-latency tasks such as TLS handshakes. This can result in many subsequent calls adding multiple milliseconds of latency between the various steps of other tasklets that don't even depend on this. An ideal solution would be to add a 4th queue, have all tasks announce their estimated cost upfront and let the scheduler maintain an auto-refilling budget to pick from the most suitable queue. But it turns out that a very simplified version of this already provides impressive gains with very tiny changes and could easily be backported. The principle is to reserve a new task flag "TASK_HEAVY" that indicates that a task is expected to take a lot of time without yielding (e.g. an SSL handshake typically takes 700 microseconds of crypto computation). When the scheduler sees this flag when queuing a tasklet, it will place it into the bulk queue. And during dequeuing, we accept only one of these in a full round. This means that the first one will be accepted, will not prevent other lower priority tasks from running, but if a new one arrives, then the queue stops here and goes back to the polling. This will allow collecting more important updates for other tasks that will be batched before the next call of a heavy task. Preliminary tests consisting of placing this flag on the SSL handshake tasklet show that response times under SSL stress fell from 14 ms before the patch to 3.0 ms with the patch, and even 1.8 ms if tune.sched.low-latency is set to "on".	2021-02-26 00:25:51 +01:00
Amaury Denoyelle	91e55ea3f3	BUG/MINOR: stats: fix compare of no-maint url suffix Only the first 3 characters are compared for ';no-maint' suffix in http_handle_stats. Fix it by doing a full match over the entire suffix. As a side effect, the ';norefresh' suffix matched the inaccurate comparison, so the maintenance servers were always hidden on the stats page in this case. no-maint suffix is present since commit `3e32036701` MINOR: stats: also support a "no-maint" show stat modifier It should be backported up to 2.3. This fixes github issue #1147.	2021-02-25 14:59:17 +01:00
Christopher Faulet	6c93c4ef08	CLEANUP: muxes: Remove useless if condition in show_fd function In H1, H2 and FCGI muxes, in the show_fd function, there is a duplicated test on the stream's subs field. This patch fixes the issue #1142. It may be backported as far as 2.2.	2021-02-25 10:07:24 +01:00
Christopher Faulet	456f45f301	MINOR: server-state: Don't load server-state file for serverless proxies Just a minor improvement. Proxies with no server are now ignored early. It may happen for listeners for instance.	2021-02-25 10:02:39 +01:00
Christopher Faulet	3e3d3be708	REORG: server-state: Move functions to deal with server-state in its own file All functions dealing with the server-state files are moved to server_state.c. srv_update_state() function was renamed to srv_state_srv_update().	2021-02-25 10:02:39 +01:00
Christopher Faulet	69beaa91d5	REORG: server: Export and rename some functions updating server info Some static functions are now exported and renamed to follow the same pattern of other exported functions. Here is the list : * update_server_fqdn: Renamed to srv_update_fqdn and exported * update_server_check_addr_port: renamed to srv_update_check_addr_port and exported * update_server_agent_addr_port: renamed to srv_update_agent_addr_port and exported * update_server_addr: renamed to srv_update_addr * update_server_addr_port: renamed to srv_update_addr_port * srv_prepare_for_resolution: exported This change is mandatory to move all functions dealing with the server-state files in a separate file.	2021-02-25 10:02:39 +01:00
Christopher Faulet	a67c6bf333	MEDIUM: server: Don't load server-state file if a line is corrupted This change is not huge but may have a visible impact for users. Now, if a line of a server-state file is corrupted, the whole file is ignored. A warning is emitted with the corrupted line number. In fact, there is no way to recover from a corrupted line. A line is considered as corrupted if it is too long (truncated line) or if it contains the wrong number of arguments. In both cases, it means the file was forged (or at least manually edited). It is safer to ignore it. Note for now, memory allocation errors are not reported and the corresponding line is silently ignored.	2021-02-25 10:02:39 +01:00
Christopher Faulet	d0a5e84c8d	MINOR: server: Parse and store server-state lines in a dedicated function Now, srv_state_parse_and_store_line() function is used to parse and store a line in a tree. It is used for global and local server-state files. This significantly simplifies the apply_server_state() function.	2021-02-25 10:02:39 +01:00
Christopher Faulet	5c37985149	MEDIUM: server: Use a tree to store local server-state lines Just like for the global server-state file, the lines of a local server-state file are now stored in a tree. This way, the file is fully parsed before loading the servers state. And with this change, global and local server-state files are now handled the same way. This will be the opportunity to factorize the code. It is also a good way to validate the file before loading any server state.	2021-02-25 10:02:39 +01:00
Christopher Faulet	2c1db104fb	MINOR: server: Move loading state of servers in a dedicated function The loop on the servers of a proxy to load the server states was moved in the function srv_state_px_update(). This simplifies the apply_server_state() function a bit. It is also mandatory to simplify the loading of local server-state file.	2021-02-25 10:02:39 +01:00
Christopher Faulet	f4d1da90c2	MINOR: server: Remove cached line from global server-state tree when found When a server for a given backend is found in the tree containing all lines of the global server-state file, the node is removed from the tree. It is useless to keep it longer. It is a small improvement, but it may also be useful to track the orphan lines (not used for now).	2021-02-25 10:02:39 +01:00
Christopher Faulet	ecfb9b9109	MEDIUM: server: Store parsed params of a server-state line in the tree Parsed parameters are now stored in the tree of server-state lines. This way, a line from the global server-state file is only parsed once. Before, it was parsed a first time to store it in the tree and one more time to load the server state. To do so, the server-state line object must be allocated before parsing a line. This means its size must no longer depend on the length of first parsed parameters (backend and server names). Thus the node type was changed to use a hashed key instead of a string.	2021-02-25 10:02:39 +01:00
Christopher Faulet	8a14b73ecf	MINOR: server: Be more strict when reading the version of a server-state file Now, we read a full line and expect to find only an integer on it. And if the line is empty or truncated, an error is returned. If the version is not valid, an error is also returned. This way, the first line is no longer partially read.	2021-02-25 10:02:39 +01:00
Christopher Faulet	8b4b6a0d63	CLEANUP: server: Use a local eb-tree to store lines of the global server-state file There is no reason to use a global variable to store the lines of the global server-state file. This tree is only used during the file parsing, as a line cache. Now the eb-tree is declared as a local variable in the apply_server_state() function.	2021-02-25 10:02:39 +01:00
Christopher Faulet	6d87c58fb4	CLEANUP: server: Rename state_line structure into server_state_line The structure used to store a server-state line in an eb-tree has a too generic name. Instead of state_line, the structure is renamed as server_state_line.	2021-02-25 10:02:39 +01:00
Christopher Faulet	fcb53fbb58	CLEANUP: server: Rename state_line node to node instead of name_name <state_line.name_name> field is a node in an eb-tree. Thus, instead of "name_name", we now use "node" to name this field. It is a more explicit name and not too strange.	2021-02-25 10:02:39 +01:00
Christopher Faulet	131b07be3c	MEDIUM: server: Refactor apply_server_state() to make it more readable The apply_server_state() function is really hard to read. Thus it was refactored to be more maintainable. First, a helper function is used to get the server-state file path. Some useless variables were removed and most of other variables were renamed to be more readable. The error messages are now prefixed to know the context (global vs per-proxy). Finally, the loop on the proxies list was simplified. This patch may seem a bit huge, but the changes are not so important.	2021-02-25 10:02:39 +01:00
Christopher Faulet	2a031ecd96	MINOR: server: Only fill one array when parsing a server-state line There is no reason to fill two parameter arrays in srv_state_parse_line() function. Now, only one array is used. The first 4 entries are just skipped when srv_update_state() is called.	2021-02-25 10:02:39 +01:00
Christopher Faulet	0bf268e184	MINOR: server: Be more strict on the server-state line parsing The srv_state_parse_line() function was rewritten to be more strict. First of all, it is possible to make the difference between an ignored line and a malformed one. Then, only blank characters (spaces and tabs) are now allowed as field separator. An error is reported for truncated lines or for lines with an unexpected number of arguments regarding the provided version. However, for now, errors are ignored by the caller, invalid lines are just skipped.	2021-02-25 10:02:39 +01:00
Willy Tarreau	2a54ffbf43	MINOR: task: make tasklet wakeup latency measurements more accurate First, we don't want to measure wakeup times if the call date had not been set before profiling was enabled at run time. And second, we may only collect the value before clearing the TASK_IN_LIST bit, otherwise another wakeup might happen on another thread and replace the call date we're about to use, hence artificially lower the wakeup times.	2021-02-25 09:44:16 +01:00
Willy Tarreau	b2285de049	MINOR: tasks: also compute the tasklet latency when DEBUG_TASK is set It is extremely useful to be able to observe the wakeup latency of some important I/O operations, so let's accept to inflate the tasklet struct by 8 extra bytes when DEBUG_TASK is set. With just this we have enough to get live reports like this: $ socat - /tmp/sock1 <<< "show profiling" Per-task CPU profiling : on # set profiling tasks {on|auto|off} Tasks activity: function calls cpu_tot cpu_avg lat_tot lat_avg si_cs_io_cb 8099492 4.833s 596.0ns 8.974m 66.48us h1_io_cb 7460365 11.55s 1.548us 2.477m 19.92us process_stream 7383828 22.79s 3.086us 18.39m 149.5us h1_timeout_task 4157 - - 348.4ms 83.81us srv_cleanup_toremove_connections751 39.70ms 52.86us 10.54ms 14.04us srv_cleanup_idle_connections 21 1.405ms 66.89us 30.82us 1.467us task_run_applet 16 1.058ms 66.13us 446.2us 27.89us accept_queue_process 7 34.53us 4.933us 333.1us 47.58us	2021-02-25 09:44:16 +01:00
Willy Tarreau	45499c56d3	MINOR: task: make grq_total atomic to move it outside of the grq_lock Instead of decrementing grq_total once per task picked from the global run queue, let's do it at once after the loop like we do for other counters. This simplifies the code everywhere. It is not expected to bring noticeable improvements however, since global tasks tend to be less common nowadays.	2021-02-25 09:44:16 +01:00
Willy Tarreau	c9afbb10f5	MINOR: task: don't decrement then increment the local run queue Now we don't need to decrement rq_total when we pick a task in the tree to immediately increment it again after installing it into the local list. Instead, we simply add to the local queue count the number of globally picked tasks. Avoiding this shows ~0.5% performance gains at 1Mreq/s (2M task switches/s).	2021-02-25 09:44:16 +01:00
Willy Tarreau	2b363ac092	MINOR: task: do not use __task_unlink_rq() from process_runnable_tasks() As indicated in previous commit, this function tries to guess which tree the task is in to figure what counters to update, while we already have that info in the caller. Let's just pick the relevant parts to place them in the caller.	2021-02-25 09:44:16 +01:00
Willy Tarreau	e7923c1d22	MINOR: task: split the counts of local and global tasks picked In process_runnable_tasks() we're still calling __task_unlink_rq() to pick a task, and this function tries to guess where to pick the task from and which counter to update while the caller's context already has everything. Worse, the number of local tasks is decremented then recredited, doubling the operations. In order to avoid this we first need to keep separate counters for local and global tasks that were picked. This is what this patch does.	2021-02-25 09:44:16 +01:00
Christopher Faulet	e071f0e6a4	MINOR: htx: Add function to reserve the max possible size for an HTX DATA block The function htx_reserve_max_data() should be used to get an HTX DATA block with the max possible size. A current block may be extended or a new one created, depending on the HTX message state. But the idea is to let the caller copy a bunch of data without requesting many new blocks. It is its responsibility to resize the block at the end, to set the final block size. This function will be used to parse messages with small chunks. Indeed, we can have more than 2700 1-byte chunks in a 16Kb of input data. So it is easy to understand how this function may help to improve the parsing of chunked messages.	2021-02-24 22:10:01 +01:00
Christopher Faulet	d127ffa9f4	BUG/MEDIUM: resolvers: Reset address for unresolved servers If the DNS resolution failed for a server, its ip address must be removed. Otherwise, the server is stopped but keeps its ip. This may be confusing when the servers state is retrieved on the CLI and it may lead to undefined behavior if HAProxy is configured to load its servers state from a file. This patch should be backported as far as 2.0.	2021-02-24 21:58:46 +01:00
Christopher Faulet	52d4d30109	BUG/MEDIUM: resolvers: Reset server address and port for obsolete SRV records When a SRV record expires, the ip/port assigned to the associated server are now removed. Otherwise, the server is stopped but keeps its ip/port while the server hostname is removed. It is confusing when the servers state is retrieved on the CLI and may be a problem if saved in a server-state file, because the reload may fail due to this inconsistency. Here is an example: * Declare a server template in a backend, using the resolver <dns> server-template test 2 _http._tcp.example.com resolvers dns check * 2 SRV records are announced with the corresponding additional records. Thus, 2 servers are filled. Here is the "show servers state" output : 2 frt 1 test1 192.168.1.1 2 64 0 1 2 15 3 4 6 0 0 0 http1.example.com 8001 _http._tcp.example.com 0 0 - - 0 2 frt 2 test2 192.168.1.2 2 64 0 1 1 15 3 4 6 0 0 0 http2.example.com 8002 _http._tcp.example.com 0 0 - - 0 * Then, one additional record is removed (or a SRV record is removed, the result is the same). Here is the new "show servers state" output : 2 frt 1 test1 192.168.1.1 2 64 0 1 38 15 3 4 6 0 0 0 http1.example.com 8001 _http._tcp.example.com 0 0 - - 0 2 frt 2 test2 192.168.1.2 0 96 0 1 19 15 3 0 14 0 0 0 - 8002 _http._tcp.example.com 0 0 - - 0 On reload, if a server-state file is used, this leads to undefined behaviors depending on the configuration. This patch should be backported as far as 2.0.	2021-02-24 21:58:45 +01:00
Baptiste Assmann	b4badf720c	BUG/MINOR: resolvers: new callback to properly handle SRV record errors When a SRV record was created, it used to register the regular server name resolution callbacks. That said, SRV records and regular server name resolution don't work the same way, especially regarding error management. This patch introduces a new callback to manage DNS errors related to the SRV queries. This fixes github issue #50. Backport status: 2.3, 2.2, 2.1, 2.0	2021-02-24 21:58:45 +01:00
Christopher Faulet	a331a1e8eb	BUG/MINOR: resolvers: Only renew TTL for SRV records with an additional record If no additional record is associated to a SRV record, its TTL must not be renewed. Otherwise the entry never expires. Thus once announced a first time, the entry remains blocked on the same IP/port except if a new announce replaces the old one. Now, the TTL is updated if a SRV record is received while a matching existing one is found with an additional record or when a new additional record is assigned to an existing SRV record. This patch should be backported as far as 2.2.	2021-02-24 21:58:45 +01:00
Christopher Faulet	9c246a4b6c	BUG/MINOR: resolvers: Fix condition to release received ARs if not assigned At the end of resolv_validate_dns_response(), if a received additional record is not assigned to an existing server record, it is released. But the condition to do so is buggy. If "answer_record" (the received AR) is not assigned, "tmp_record" is not a valid record object. It is just a dummy record "representing" the head of the record list. Now, the condition is far cleaner. This patch must be backported as far as 2.2.	2021-02-24 21:58:45 +01:00
Willy Tarreau	9c6dbf0eea	CLEANUP: task: split the large tasklet_wakeup_on() function in two This function has become large with the multi-queue scheduler. We need to keep the fast path and the debugging parts inlined, but the rest now moves to task.c just like was done for task_wakeup(). This has reduced the code size by 6kB due to less inlining of large parts that are always context-dependent, and as a side effect, has increased the overall performance by 1%.	2021-02-24 17:55:58 +01:00
Willy Tarreau	955a11ebfa	MINOR: task: move the allocated tasks counter to the per-thread struct The nb_tasks counter was still global and gets incremented and decremented for each task_new()/task_free(), and was read in process_runnable_tasks(). But it's only used for stats reporting, so doing this so often is pointless and expensive. Let's move it to the task_per_thread struct and have the stats sum it when needed.	2021-02-24 17:42:04 +01:00
Willy Tarreau	eeffb3df41	MINOR: task: limit the remote thread wakeup to the global runqueue only The test in __task_wakeup() to figure if the remote threads are sleeping doesn't make sense outside of the global runqueue test, since there are only two possibilities here: local runqueue or global runqueue, hence a sleeping thread is another one and can only happen when sending to the global run queue. Let's move the test inside the "if" block.	2021-02-24 17:42:04 +01:00
Willy Tarreau	018564eaa2	CLEANUP: task: move the tree root detection from __task_wakeup() to task_wakeup() Historically we used to call __task_wakeup() with a known tree root but this is not the case and the code has remained needlessly complicated with the root calculation in task_wakeup() passed in argument to __task_wakeup() which compares it again. Let's get rid of this and just move the detection code there. This eliminates some ifdefs and allows to simplify the test conditions quite a bit.	2021-02-24 17:42:04 +01:00
Willy Tarreau	1f3b1417b8	CLEANUP: tasks: use a less confusing name for task_list_size This one is systematically misunderstood due to its unclear name. It is in fact the number of tasks in the local tasklet list. Let's call it "tasks_in_list" to remove some of the confusion.	2021-02-24 17:42:04 +01:00
Willy Tarreau	2c41d77ebc	MINOR: tasks: do not maintain the rqueue_size counter anymore This one is exclusively used as a boolean nowadays and is non-zero only when the thread-local run queue is not empty. Better check the root tree's pointer and avoid updating this counter all the time.	2021-02-24 17:42:04 +01:00
Willy Tarreau	9c7b8085f4	MEDIUM: task: remove the tasks_run_queue counter and have one per thread This counter is solely used for reporting in the stats and is the hottest thread contention point to date. Moving it to the scheduler and having a separate one for the global run queue dramatically improves the performance, showing a 12% boost on the request rate on 16 threads! In addition, the thread debugging output which used to rely on rqueue_size was not totally accurate as it would only report task counts. Now we can return the exact thread's run queue length. It is also interesting to note that there are still a few other task/tasklet counters in the scheduler that are not efficiently updated because some cover a single area and others cover multiple areas. It looks like having a distinct counter for each of the following entries would help and would keep the code a bit cleaner: - global run queue (tree) - per-thread run queue (tree) - per-thread shared tasklets list - per-thread local lists Maybe even splitting the shared tasklets lists between pure tasklets and tasks instead of having the whole and tasks would simplify the code because there remain a number of places where several counters have to be updated.	2021-02-24 17:42:04 +01:00
Willy Tarreau	e3e648c92f	BUILD: dns: avoid a build warning when threads are disabled (dss unused) dns_session_release() only uses its struct dns_stream_server to access the lock, so a warning is emitted when threads are disabled. Let's mark it __maybe_unused.	2021-02-24 17:42:04 +01:00
Willy Tarreau	49de68520e	MEDIUM: streams: do not use the streams lock anymore The lock was still used exclusively to deal with the concurrency between the "show sess" release handler and a stream_new() or stream_free() on another thread. All other accesses made by "show sess" are already done under thread isolation. The release handler only requires to unlink its node when stopping in the middle of a dump (error, timeout etc). Let's just isolate the thread to deal with this case so that it's compatible with the dump conditions, and remove all remaining locking on the streams. This effectively kills the streams lock. The measured gain here is around 1.6% with 4 threads (374krps -> 380k).	2021-02-24 13:54:50 +01:00
Willy Tarreau	a698eb6739	MINOR: streams: use one list per thread instead of a global one The global streams list is exclusively used for "show sess", to look up a stream to shut down, and for the hard-stop. Having all of them in a single list is extremely expensive in terms of locking when using threads, with performance losses as high as 7% having been observed just due to this. This patch makes the list per-thread, since there's no need to have a global one in this situation. All call places just iterate over all threads. The most "invasive" change was in "show sess" where the end of list needs to go back to the beginning of next thread's list until the last thread is seen. For now the lock was maintained to keep the code auditable but a next commit should get rid of it. The observed performance gain here with only 4 threads is already 7% (350krps -> 374krps).	2021-02-24 13:53:20 +01:00
Willy Tarreau	5d533e2bad	MINOR: cli/streams: make "show sess" dump all streams till the new epoch Instead of placing the current stream at the end of the stream list when issuing a "show sess" on the CLI as was done in 2.2 with commit `c6e7a1b8e` ("MINOR: cli: make "show sess" stop at the last known session"), now we compare the listed stream's epoch with the dumping stream's and stop on more recent ones. This way we're certain to always only dump known streams at the moment we issue the dump command without having to modify the list. In theory we could miss some streams if more than 2^31 "show sess" requests are issued while an old stream remains present, but that's 68 years at 1 "show sess" per second and it's unlikely we'll keep a process, let alone a stream, that long. It could be verified that the count of dumped streams still matches the one before this change.	2021-02-24 12:12:51 +01:00
Willy Tarreau	b981318c11	MINOR: stream: add an "epoch" to figure which streams appeared when The "show sess" CLI command currently lists all streams and needs to stop at a given position to avoid dumping forever. Since 2.2 with commit `c6e7a1b8e` ("MINOR: cli: make "show sess" stop at the last known session"), a hack consists in unlinking the stream running the applet and linking it again at the current end of the list, in order to serve as a delimiter. But this forces the stream list to be global, which affects scalability. This patch introduces an epoch, which is a global 32-bit counter that is incremented by the "show sess" command, and which is copied by newly created streams. This way any stream can know whether any other one is newer or older than itself. For now it's only stored and not exploited.	2021-02-24 12:12:51 +01:00
Willy Tarreau	0d03825b93	BUG/MINOR: proxy: wake up all threads when sending the hard-stop signal The hard-stop event didn't wake threads up. In the past it wasn't an issue as the poll timeout was limited to 1 second, but since commit `4f59d3861` ("MINOR: time: increase the minimum wakeup interval to 60s") it has become a problem because old processes can remain live for up to one minute after the hard-stop-after delay. Let's just wake them up. This may be backported to older releases, though before 2.4 the extra delay was only one second.	2021-02-24 12:12:46 +01:00
Willy Tarreau	3f5dd2945c	BUG/MEDIUM: cli/shutdown sessions: make it thread-safe There's no locking around the lookup of a stream nor its shutdown when issuing "shutdown sessions" over the CLI so the risk of crashing the process is particularly high. Let's use a thread_isolate() there which is suitable for this task, and there are not that many alternatives. This must be backported to 1.8.	2021-02-24 11:11:06 +01:00
Willy Tarreau	92b887e20a	BUG/MEDIUM: proxy: use thread-safe stream killing on hard-stop When setting hard-stop-after, hard_stop() is called at the end to kill last pending streams. Unfortunately there's no locking there while walking over the streams list nor when shutting them down, so it's very likely that some old processes have been crashing or gone wild due to this. Let's use a thread_isolate() call for this as we don't have much other choice (and it happens once in the process' life, that's OK). This must be backported to 1.8.	2021-02-24 11:08:56 +01:00
Dragan Dosen	ec0a604f27	CLEANUP: vars: make smp_fetch_var() reuse vars_get_by_desc() They both do the same thing, so let's remove unneeded code duplication.	2021-02-23 17:23:53 +01:00
Dragan Dosen	14518f2305	BUG/MEDIUM: vars: make functions vars_get_by_{name,desc} thread-safe This patch adds a lock to functions vars_get_by_name() and vars_get_by_desc() to protect accesses to the list of variables. After the variable is fetched, a sample data is duplicated by using smp_dup() because the variable may be modified by another thread. This should be backported to all versions supporting vars along with "BUG/MINOR: sample: secure convs that accept base64 string and var name as args" which this patch depends on.	2021-02-23 17:22:46 +01:00
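
Commit 91e55ea3f3 above describes a prefix-length bug: comparing only the first 3 characters of ';no-maint' also accepts ';norefresh'. The following is a minimal sketch of that failure mode and of a full-suffix match. The helper names `match_first3()`, `match_full()` and `suffix_demo()` are hypothetical and are not taken from the HAProxy sources; the real code in http_handle_stats may be written differently.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper reproducing the buggy pattern: only the first
 * 3 bytes (";no") of the wanted suffix are compared.
 */
static int match_first3(const char *url_part, const char *wanted)
{
    return strncmp(url_part, wanted, 3) == 0;
}

/* Hypothetical helper for the fixed pattern: the comparison covers the
 * whole wanted suffix, so ";norefresh" no longer matches ";no-maint".
 */
static int match_full(const char *url_part, const char *wanted)
{
    return strncmp(url_part, wanted, strlen(wanted)) == 0;
}

/* Small self-check showing the difference between both helpers. */
static void suffix_demo(void)
{
    /* the short comparison wrongly treats ";norefresh" as ";no-maint" */
    assert(match_first3(";norefresh", ";no-maint"));
    /* the full comparison tells them apart */
    assert(!match_full(";norefresh", ";no-maint"));
    assert(match_full(";no-maint", ";no-maint"));
}
```

Calling `suffix_demo()` from a test program exercises both helpers. Deriving the length from the wanted string with `strlen()` rather than hard-coding it is one way to keep the length and the literal from drifting apart.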


10945 Commits
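
Commits b981318c11 and 5d533e2bad above describe a 32-bit epoch that lets "show sess" stop at streams created after the dump started, with a stated limit of 2^31 requests. A minimal sketch of such a wrap-tolerant comparison follows, assuming a signed 32-bit difference (which is what the 2^31 bound suggests). The names `epoch_is_newer()` and `count_dumpable()` are hypothetical and the array stands in for the real stream list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns non-zero if epoch <a> is more recent than
 * epoch <b>. The unsigned subtraction wraps modulo 2^32, and the result is
 * read as a signed value, so the answer stays correct across a counter
 * wrap as long as both epochs are less than 2^31 apart. The conversion to
 * int32_t is implementation-defined in C but behaves as two's complement
 * on common compilers.
 */
static int epoch_is_newer(uint32_t a, uint32_t b)
{
    return (int32_t)(uint32_t)(a - b) > 0;
}

/* Hypothetical dump loop: walks stream epochs in list order and stops at
 * the first stream that is more recent than the dumping stream. Returns
 * the number of streams that would be dumped.
 */
static size_t count_dumpable(const uint32_t *epochs, size_t n,
                             uint32_t dumper_epoch)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (epoch_is_newer(epochs[i], dumper_epoch))
            break;
    }
    return i;
}
```

For example, with stream epochs {7, 8, 9, 10, 11} and a dumping stream at epoch 9, the loop stops at the stream carrying epoch 10. The same holds when the counter wraps from 0xFFFFFFFF to 0, which a plain `a > b` comparison would get wrong.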