We already have my_ffsl() to find the lowest bit set in a word, and
this patch implements the search for the highest bit set in a word.
On x86 it uses the bsr instruction and on other architectures it
uses an efficient implementation.
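For illustration, the generic fallback could look like the following sketch (a dichotomic search; this is only an approximation, not the exact code of the patch) :

  /* Hedged sketch: return the 1-based position of the highest bit set
   * in <a>, or 0 if no bit is set. A dichotomic search keeps this cheap
   * on platforms without a bsr-like instruction.
   */
  static inline unsigned int my_flsl(unsigned long a)
  {
          unsigned int r = 1;

          if (!a)
                  return 0;
          if (sizeof(long) > 4 && (a >> 16 >> 16)) { a = a >> 16 >> 16; r += 32; }
          if (a & 0xffff0000UL) { a >>= 16; r += 16; }
          if (a & 0xff00)       { a >>= 8;  r += 8;  }
          if (a & 0xf0)         { a >>= 4;  r += 4;  }
          if (a & 0xc)          { a >>= 2;  r += 2;  }
          if (a & 0x2)          { a >>= 1;  r += 1;  }
          return r;
  }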
By picking two randoms following the P2C algorithm, we sometimes observe
asymmetric loads on bursts of small session counts. This is typically
what makes h2load take a bit of time to complete the last 100% because
if a thread gets two connections while the other ones only have one,
it takes twice the time to complete its work.
This patch proposes a modification of the p2c algorithm which seems
more suitable to this case : it mixes a rotating index with a random.
This way, we're certain that all threads are consulted in turn, and at
the same time we're not forced to use the ones we're giving a chance to.
This significantly increases the traffic rate. Now h2load shows faster
completion, and the average request rates on H2 and the TLS resume rate
increase by a bit more than 5% compared to pure p2c.
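As a rough illustration of the idea (the names below are assumptions, not the actual code) :

  #include <stdlib.h>

  /* Hedged sketch: one candidate thread comes from a rotating index so
   * that all threads are consulted in turn, the other from a random;
   * the least loaded of the two is kept, as in plain p2c.
   */
  static unsigned int pick_thread(unsigned int *rot_idx,
                                  const unsigned int *conn_count,
                                  unsigned int nb_threads)
  {
          unsigned int t1 = (*rot_idx)++ % nb_threads; /* rotating candidate */
          unsigned int t2 = rand() % nb_threads;       /* random candidate */

          return conn_count[t1] <= conn_count[t2] ? t1 : t2;
  }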
The index was placed into the struct bind_conf because 1) it's faster
there and 2) it's the best place to optimally distribute traffic among a
group of listeners. It's the only runtime-modified element there and
it will be quite cache-hot.
By using LIST_DEL_INIT() instead of LIST_DEL()+LIST_INIT() we manage
to bump the peak connection rate by no less than 3% on 8 threads.
The perf top profile shows much less contention in this area which
suffered from the second reload of the pointers.
It turns out that we call LIST_DEL+LIST_INIT very frequently and that
the compiler doesn't know what pointers get modified in the e->n->p
and e->p->n dance, so when LIST_INIT() is called, it reloads these
pointers, which is quite a bit of a mess in terms of performance.
This patch adds LIST_DEL_INIT() to perform the two operations at once
using local temporary variables so that the compiler knows these
pointers are left unaffected.
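The idea roughly looks like this (a sketch, not necessarily the exact macro added by the patch) :

  /* Hedged sketch: load n and p once into locals, unlink the element,
   * then point it to itself. The compiler never has to reload e->n and
   * e->p between the two steps.
   */
  struct list { struct list *n, *p; };

  #define LIST_DEL_INIT(el) ({                     \
          struct list *_el = (el);                 \
          struct list *_n  = _el->n;               \
          struct list *_p  = _el->p;               \
          _n->p = _p;                              \
          _p->n = _n;                              \
          _el->n = _el->p = _el;                   \
          _el;                                     \
  })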
This patch enables the part of this reg test which could not work due to a vtest
(formerly varnishtest) bug.
NOTE: You must have a vtest version which includes commit 4e43cc1 for this
script to succeed (see 4e43cc1fec for more information).
This patch adds "protobuf" protocol buffers specific converter wich
may used in combination with "ungrpc" as first converter to extract
a protocol buffers field value. It is simply implemented reusing
protobuf_field_lookup() which is the protocol buffers specific parser already
used by "ungrpc" converter which only parse a gRPC header in addition of
parsing protocol buffers message.
Update the documentation for this new "protobuf" converter.
We move the code responsible for parsing protocol buffers messages
inside gRPC messages from sample.c to include/proto/protocol_buffers.h
so that it can be reused to cascade the "ungrpc" converter.
In 84e417d8 ("MINOR: ssl: support Openssl 1.1.1 early callback for
switchctx") the code was extended to also support OpenSSL 1.1.1
(code already supported BoringSSL). A configuration check warning
was updated but with the wrong logic : the #ifdef needs a && instead
of an ||.
Reported in #54.
Should be backported to 1.8.
Emeric reports that when MAX_THREADS and/or MAX_PROCS are set to lower
values, referencing thread or process numbers higher than these limits
in cpu-map returns errors. This is annoying because these typically are
silent settings that are expected to be used only when set. Let's switch
back to LONGBITS for this limit.
Since the "wurfl" device detection engine was merged slightly more than
two years ago (2016-11-04), it never received a single fix nor update.
For almost two years it didn't receive even the minimal review or changes
needed to be compatible with threads, and it's remained build-broken for
about the last 9 months, consecutive to the last buffer API changes,
without anyone ever noticing! When asked on the list, nobody confirmed
using it :
https://www.mail-archive.com/haproxy@formilux.org/msg32516.html
And obviously nobody even cared to verify that it did still build. So we
are left with this broken code with no user and no maintainer. It might
even suffer from remotely exploitable vulnerabilities without anyone
being able to check if it presents any risk. It's a pain to update each
time there is an API change because it doesn't build as it depends on
external libraries that are not publicly accessible, leading to careful
blind changes. It slows down the whole project. This situation is not
acceptable at all.
It's time to cure the problem where it is. This patch removes all this
dead, non-buildable, non-working code. If anyone ever decides to use it,
which I seriously doubt based on history, it could be reintegrated, but
this time the following guarantees will be required :
- someone has to step up as a maintainer and have his name listed in
the MAINTAINERS file (I should have been more careful last time).
This person will take the sole blame for all issues and will be
responsible for fixing the bugs and incompatibilities affecting
this code, and for making it evolve to follow regular internal API
updates.
- support building on a standard distro with automated tools (i.e. no
more "click on this site, register your e-mail and download an
archive then figure how to place this into your build system").
Dummy libs are OK though as long as they allow the mainline code to
build and start.
- multi-threaded support must be fixed. I mean seriously, not worked
around with a check saying "please disable threads, we've been busy
fishing for the last two years".
This may be backported to 1.9 given that the code has never worked there
either, thus at least we're certain nobody will miss it.
From now on, "ungrpc" may take a second optional argument to provide
the protocol buffers types used to encode the field value to be extracted.
When absent, the field value is extracted as a binary sample which may then
be followed by other converters like "hex" which takes a binary sample as input.
When this second argument is a type which does not match the one found by "ungrpc",
this field is considered as not found even if present.
With this patch we also remove the useless "varint" and "svarint" converters.
Update the documentation about "ungrpc" converters.
Parsing protocol buffer fields always consists in skipping the field
when it is not the one looked up, or storing the field value when it is found.
So, with this patch we factorize a little bit the code for the "ungrpc" converter.
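For illustration, the skip-or-store step roughly follows the protocol buffers wire types (a hedged sketch; the helper names in the patch differ) :

  #include <stddef.h>
  #include <stdint.h>

  /* Hedged sketch: consume one field value, either skipping it or
   * storing its raw bytes when it is the one looked up. Wire types:
   * 0 = varint, 1 = 64-bit, 2 = length-delimited, 5 = 32-bit.
   * Returns the number of bytes consumed, or 0 on truncated input.
   */
  static size_t pbuf_field(const uint8_t *p, size_t len, int wire_type,
                           int store, const uint8_t **val, size_t *val_len)
  {
          size_t i = 0, flen;
          uint64_t l = 0;
          int shift = 0;

          switch (wire_type) {
          case 0: /* varint: MSB set means "one more byte follows" */
                  while (i < len && (p[i] & 0x80))
                          i++;
                  if (i >= len)
                          return 0;
                  flen = i + 1;
                  break;
          case 1: /* fixed 64-bit */
                  flen = 8;
                  break;
          case 2: /* length-delimited: a varint length then the payload */
                  while (i < len && (p[i] & 0x80)) {
                          l |= (uint64_t)(p[i] & 0x7f) << shift;
                          shift += 7;
                          i++;
                  }
                  if (i >= len)
                          return 0;
                  l |= (uint64_t)(p[i] & 0x7f) << shift;
                  flen = i + 1 + l;
                  break;
          case 5: /* fixed 32-bit */
                  flen = 4;
                  break;
          default:
                  return 0;
          }
          if (flen > len)
                  return 0;
          if (store) {
                  *val = p;
                  *val_len = flen;
          }
          return flen;
  }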
While the legacy code converts h2 to h1 and provides some control over
what is passed, in htx mode there is no such control and it is possible
to pass control chars and linear white spaces in the path, which are
possibly reencoded differently once passed to the H1 side.
HTX supports parse error reporting using a special flag. Let's check
the correctness of the :path pseudo header and report any anomaly in
the HTX flag.
Thanks to Jérôme Magnin for reporting this bug with a working reproducer.
This fix must be backported to 1.9 along with the two previous patches
("MINOR: htx: unconditionally handle parsing errors in requests or
responses" and "MINOR: mux-h2: always pass HTX_FL_PARSING_ERROR
between h2s and buf on RX").
In order to allow the H2 parser to report parsing errors, we must make
sure to always pass the HTX_FL_PARSING_ERROR flag from the h2s htx to
the conn_stream's htx.
The htx request and response processing functions currently only check
for HTX_FL_PARSING_ERROR on incomplete messages because that's how mux_h1
delivers these. However with H2 we have to detect some parsing errors in
the format of certain pseudo-headers (e.g. :path), so we do have a complete
message but we want to report an error.
Let's move the parse error check earlier so that it always triggers when
the flag is present. It was also moved for htx_wait_for_request_body()
since we definitely want to be able to abort processing such an invalid
request even if it appears complete, but it was not changed in the forward
functions so as not to truncate contents before the position of the first
error.
Well, that's becoming embarrassing. Now this fixes commit 4ef6801c
("BUG/MEDIUM: list: correct fix for LIST_POP_LOCKED's removal of last
element") which itself tried to fix commit 285192564. This fix only
works under low contention and was tested with the listener's queue.
With the idle conns it's obvious that it's still wrong since adding
more than one element to the list leaves a LLIST_BUSY pointer into
the list's head. This was visible when accumulating idle connections
in a server's list.
This new version of the fix almost goes back to the original code,
except that since then we addressed issues with expectedly idempotent
operations that were not. Now the code has been verified on paper again
and has survived 300 million connections spread over 4 threads.
This will have to be backported if the commit above is backported.
This patch simply extracts the code of smp_fetch_req_ungrpc() for "req.ungrpc"
from http_fetch.c to move it to sample.c with very few modifications.
Furthermore smp_fetch_body_buf(), which was used to fetch the body contents, is no longer needed.
Update the documentation for gRPC.
A crash in H2 was reported in issue #52. It turns out that there is a
small but existing race by which a conn_stream could detach itself
using h2_detach(), not being able to destroy the h2s due to pending
output data blocked by flow control, then upon next h2s activity
(transfer_data or trailers parsing), an ES flag may need to be turned
into a CS_FL_REOS bit, causing a dereference of a NULL stream. This is
a side effect of the fact that we still have a few places which
incorrectly depend on the CS flags, while these flags should only be
set by h2_rcv_buf() and h2_snd_buf().
All candidate locations along this path have been secured against this
risk, but the code should really evolve to stop depending on CS anymore.
This fix must be backported to 1.9 and possibly partially to 1.8.
Commit 26f6ae12c ("MAJOR: config: disable support for nbproc and nbthread
in parallel") revealed that there was accidently nbproc+nbthread in this
test while nbproc is the one expected. This likely is a leftover from a
previous attempt at reproducing the issue.
The global maxconn value is often a pain to configure :
- in development the user never has the permissions to increase the
rlim_cur value too high and gets warnings all the time ;
- in some production environments, users may have limited actions on
it or may only be able to act on rlim_fd_cur using ulimit -n. This
is sometimes particularly true in containers or whatever environment
where the user has no privilege to upgrade the limits.
- keeping config homogeneous between machines is even less easy.
We already had the ability to automatically compute maxconn from the
memory limits when they were set. This patch goes a bit further by also
computing the limit permitted by the configured limit on the number of
FDs. For this it simply reverses the rlim_fd_cur calculation to determine
maxconn based on the number of reserved sockets for listeners & checks,
the number of SSL engines and the number of pipes (absolute or relative).
This way it becomes possible to make maxconn always be the highest possible
value resulting in maxsock matching what was set using "ulimit -n", without
ever setting it. Note that we adjust to the soft limit, not the hard one,
since it's what is configured with ulimit -n. This allows users to also
limit to low values if needed.
Just like before, the calculated value is reported in verbose mode.
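The reversed computation is roughly the following (names and the exact set of reserved FDs are simplified here) :

  /* Hedged sketch: starting from the allowed number of FDs, subtract
   * the sockets reserved for listeners, checks and SSL engines, then
   * divide by the number of FDs each connection may need (two sides
   * plus possible pipes).
   */
  static int compute_maxconn_from_fds(int fd_limit, int reserved_socks,
                                      int ssl_engine_fds, int fds_per_conn)
  {
          int remain = fd_limit - reserved_socks - ssl_engine_fds;

          if (remain <= 0 || fds_per_conn <= 0)
                  return 0;
          return remain / fds_per_conn;
  }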
We'll need to know the global maxsock before the maxconn calculation.
Actually only two components were calculated too late, the peers FD
and the stats FD. Let's move them a few lines upward.
The default number of pipes is adjusted based on the sum of frontends
and backends maxconn/fullconn settings. Now that it is possible to have
a null maxconn on a frontend to indicate "unlimited" with commit
c8d5b95e6 ("MEDIUM: config: don't enforce a low frontend maxconn value
anymore"), the sum of maxconn may remain low and limited to the only
frontends/backends where this limit is set.
This patch considers this new unlimited case when doing the check, and
automatically switches to the default value which is maxconn/4 in this
case. All the calculation was moved to a distinct function for ease of
use. This function also supports returning unlimited (-1) when the
value depends on global.maxconn and this latter is not yet set.
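A hedged sketch of what such a function may look like (names are illustrative) :

  /* Return the default number of pipes. When at least one frontend has
   * an unlimited (null) maxconn, the sum is meaningless, so fall back
   * to global.maxconn/4, or return -1 (unlimited) when global.maxconn
   * is not yet set.
   */
  static int compute_default_pipes(long sum_maxconn, int unlimited_seen,
                                   int global_maxconn)
  {
          if (unlimited_seen) {
                  if (!global_maxconn)
                          return -1;           /* not known yet */
                  return global_maxconn / 4;
          }
          return sum_maxconn / 4;
  }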
When the master re-execs itself on reload, it doesn't restore the initial
rlim_fd_cur/rlim_fd_max values, which have been modified by the ulimit-n
or global maxconn directives. This is a problem, because if these values
were set really low it could prevent the process from restarting, and if
they were set very high, this could have some implications on the restart
time, or later on the computed maxconn.
Let's simply reset these values to the ones we had at boot to maintain
the system in a consistent state.
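A hedged sketch of the restore step before the re-exec (variable names are assumptions) :

  #include <sys/resource.h>

  /* Put back the boot-time RLIMIT_NOFILE values, saved by an early
   * getrlimit() call, so that the re-executed master starts clean.
   */
  static void restore_initial_fd_limits(rlim_t cur_at_boot, rlim_t max_at_boot)
  {
          struct rlimit limit;

          limit.rlim_cur = cur_at_boot;
          limit.rlim_max = max_at_boot;
          setrlimit(RLIMIT_NOFILE, &limit);
  }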
A backport could be performed to 1.9 and maybe 1.8. This patch depends on
the two previous ones.
It's not normal that external processes are run with high FD limits,
as quite often such processes (especially shell scripts) will iterate
over all FDs to close them. Ideally we should even provide a tunable
with the external-check directive to adjust this value, but at least
we need to restore it to the value that was active when starting
haproxy (before it was adjusted for maxconn). Additionally with very
low maxconn values causing rlim_fd_cur to be low, some heavy checks
could possibly fail. This was also mentioned in issue #45.
Currently the following config and scripts report this :
$ cat rlim.cfg
global
    maxconn 500000
    external-check

listen www
    bind :8001
    timeout client 5s
    timeout server 5s
    timeout connect 5s
    option external-check
    external-check command "$PWD/sleep1.sh"
    server local 127.0.0.1:80 check inter 1s
$ cat sleep1.sh
#!/bin/sh
/bin/sleep 0.1
echo -n "soft: ";ulimit -S -n
echo -n "hard: ";ulimit -H -n
# ./haproxy -db -f rlim.cfg
soft: 1000012
hard: 1000012
soft: 1000012
hard: 1000012
Now with the fix :
# ./haproxy -db -f rlim.cfg
soft: 1024
hard: 4096
soft: 1024
hard: 4096
This fix should be backported to stable versions but it depends on
"MINOR: global: keep a copy of the initial rlim_fd_cur and rlim_fd_max
values" and "BUG/MINOR: init: never lower rlim_fd_max".
If a ulimit-n value is set, we must not lower the rlim_max value if the
new value is lower, but only adjust the rlim_cur one. The effect is
that on very low values, this could prevent a master-worker reload, or
make an external check fail by lack of FDs.
This may be backported to 1.9 and earlier, but it depends on this patch
"MINOR: global: keep a copy of the initial rlim_fd_cur and rlim_fd_max
values".
Let's keep a copy of these initial values. They will be useful to
compute automatic maxconn, as well as to restore proper limits when
doing an execve() on external checks.
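For instance (a minimal sketch; the variable name is an assumption) :

  #include <sys/resource.h>

  /* Boot-time FD limits, saved once early in main() and never modified. */
  static struct rlimit limit_fd_at_boot;

  static void save_initial_fd_limits(void)
  {
          getrlimit(RLIMIT_NOFILE, &limit_fd_at_boot);
  }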
This patch implements a peer heartbeat feature to prevent any haproxy peer
from reconnecting too often, consuming sockets for nothing.
To do so, we add a new PEER_MSG_CTRL_HEARTBEAT message to the
PEER_MSG_CLASS_CONTROL class of peers control messages. A ->heartbeat field
is added to peer structs to store the heartbeat timeout value, which is handled
by the same function as ->reconnect to control the session timeouts. A 2-byte
heartbeat message is sent every 3s when no updates have to be sent. This way,
the peer which receives such a message is sure
the remote peer is still alive. So, it resets the ->reconnect peer session
timeout to its initial value (5s). This prevents any reconnection to an
already connected alive peer.
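For illustration, the emission side could look like this sketch (the byte values and the function are assumptions; only the message names come from the patch) :

  #include <sys/socket.h>

  /* Hedged sketch: the heartbeat is a 2-byte control message, one byte
   * for the class and one for the type, emitted when nothing was sent
   * for 3 seconds.
   */
  enum { PEER_MSG_CLASS_CONTROL = 0 };   /* assumed value */
  enum { PEER_MSG_CTRL_HEARTBEAT = 3 };  /* assumed value */

  static int peer_send_heartbeat(int fd)
  {
          const char msg[2] = { PEER_MSG_CLASS_CONTROL, PEER_MSG_CTRL_HEARTBEAT };

          return send(fd, msg, sizeof(msg), 0) == sizeof(msg);
  }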
Historically the default frontend's maxconn used to be quite low (2000),
which was sufficient two decades ago but often proved to be a problem
when users had purposely set the global maxconn value but forgot to set
the frontend's.
There is no point in keeping this arbitrary limit for frontends : when
the global maxconn is lower, it's already too high and when the global
maxconn is much higher, it becomes a limiting factor which causes trouble
in production.
This commit allows the value to be set to zero, which becomes the new
default value, to mean it's not directly limited, or in fact it's set
to the global maxconn. Since this operation used to be performed before
computing a possibly automatic global maxconn based on memory limits,
the calculation of the maxconn value and its propagation to the backends'
fullconn has now moved to a dedicated function, proxy_adjust_all_maxconn(),
which is called once the global maxconn is stabilized.
This comes with two benefits :
1) a configuration missing "maxconn" in the defaults section will not
limit itself to a magically hardcoded value but will scale up to the
global maxconn ;
2) when the global maxconn is not set and memory limits are used instead,
the frontends' maxconn automatically adapts, and the backends' fullconn
as well.
It is possible to update a frontend's maxconn from the CLI. Unfortunately
when doing this it scratches all listeners' maxconn values and sets them
all to the new frontend's value. This can be problematic when mixing
different traffic classes (bind to interface or private networks, etc).
Now that the listener's maxconn is allowed to remain unset, let's not
change these values when setting the frontend's maxconn. This way the
overall frontend's limit can be raised but if certain specific listeners
had their own value forced in the config, they will be preserved. This
makes more sense and is more in line with the principle of defaults
propagation.
It's pointless to always set and maintain l->maxconn because the accept
loop already enforces the frontend's limit anyway. Thus let's stop setting
this value by default and keep it to zero meaning "no limit". This way the
frontend's maxconn will be used by default. Of course if a value is set,
it will be enforced.
In an attempt to try to provide automatic maxconn settings, we need to
decorrelate a listener's backlog and maxconn so that these values can be
independent. This introduces a listener_backlog() function which retrieves
the backlog value from the listener's backlog, the frontend's, the
listener's maxconn, the frontend's or falls back to 1024. This
corresponds to what was done in cfgparse.c to force a value there except
the last fallback which was not set since the frontend's maxconn is always
known.
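The fallback order described above roughly gives (a sketch with simplified struct fields) :

  /* Minimal stand-in types for the sketch; the real structs differ. */
  struct proxy    { int backlog, maxconn; };
  struct listener { int backlog, maxconn; struct proxy *frontend; };

  static int listener_backlog(const struct listener *l)
  {
          if (l->backlog)
                  return l->backlog;
          if (l->frontend->backlog)
                  return l->frontend->backlog;
          if (l->maxconn)
                  return l->maxconn;
          if (l->frontend->maxconn)
                  return l->frontend->maxconn;
          return 1024;
  }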
As seen with Olivier, in the end the fix in commit 285192564 ("BUG/MEDIUM:
list: fix LIST_POP_LOCKED's removal of the last pointer") is wrong :
the code there was right, but the bug was triggered by another bug in
LIST_ADDQ_LOCKED() which doesn't properly update the list's head because
it inserts in the wrong order.
This will have to be backported if the commit above is backported.
We were not checking p->feconn nor the global actconn soon enough. In
older versions this could result in a frontend accepting more connections
than allowed by its maxconn or the global maxconn, exactly N-1 extra
connections where N is the number of threads, provided each of these
threads were running a different listener. But with the lock removal,
it became worse, the excess could be the listener's maxconn multiplied
by the number of threads. Among the nasty side effects was that LI_FULL
could be removed while the limit was still exceeded and in some cases the
polling on the socket was not re-enabled.
This commit takes care of updating and checking p->feconn and the global
actconn *before* processing the connection, so that the listener can be
turned off before accepting the socket if needed. This requires moving
some of the bookkeeping operations from the session to the listener, which
totally makes sense in this context.
Now the limits are properly respected, even if a listener's maxconn is
over a frontend's. This only applies on top of the listener lock removal
series and doesn't have to be backported.
There is a very difficult to reproduce race in the listener's accept
code, which is much easier to reproduce once connection limits are
properly enforced. It's an ABBA lock issue :
- the following functions take l->lock then lq_lock :
disable_listener, pause_listener, listener_full, limit_listener,
do_unbind_listener
- the following ones take lq_lock then l->lock :
resume_listener, dequeue_all_listeners
This is because __resume_listener() only takes the listener's lock
and expects to be called with lq_lock held. The problem can easily
happen when listener_full() and limit_listener() are called a lot
while in parallel another thread releases sessions for the same
listener using listener_release() which in turn calls resume_listener().
This scenario is more prevalent in 2.0-dev since the removal of the
accept lock in listener_accept(). However in 1.9 and before, a different
but extremely unlikely scenario can happen :
  thread1                           thread2
  ............................      enter listener_accept()
  limit_listener()
  ............................      long pause before taking the lock
  session_free()
    dequeue_all_listeners()
      lock(lq_lock) [1]
  ............................      try_lock(l->lock) [2]
        __resume_listener()
          spin_lock(l->lock) =>WAIT[2]
  ............................      accept()
                                    l->accept()
                                    nbconn==maxconn =>
                                      listener_full()
                                        state==LI_LIMITED =>
                                          lock(lq_lock) =>DEADLOCK[1]!
In practice it is almost impossible to trigger it because it requires
to limit both on the listener's maxconn and the frontend's rate limit,
at the same time, and to release the listener when the connection rate
goes below the limit between poll() returns the FD and the lock is
taken (a few nanoseconds). But maybe with threads competing on the
same core it has more chances to appear.
This patch removes the lq_lock and replaces it with a lockless queue
for the listener's wait queue (well, technically speaking a self-locked
queue) brought by commit a8434ec14 ("MINOR: lists: Implement locked
variations.") and its few subsequent fixes. This relieves us from the
need of the lq_lock and removes the deadlock. It also gets rid of the
distinction between __resume_listener() and resume_listener() since the
only difference was the lq_lock. All listener removals from the list
are now unconditional to avoid races on the state. It's worth noting
that the list used to never be initialized and that it used to work
only thanks to the state tests, so the initialization has now been
added.
This patch must carefully be backported to 1.9 and very likely 1.8.
It is mandatory to be careful about replacing all manipulations of
l->wait_queue, global.listener_queue and p->listener_queue.
Since LIST_DEL_LOCKED() and LIST_POP_LOCKED() now automatically reinitialize
the removed element, there's no need for keeping this LIST_INIT() call in the
idle connection code.
These operations previously used to return a "locked" element, which is
a constraint when multiple threads try to delete the same element, because
the second one will block indefinitely. Instead, let's make sure that both
LIST_DEL_LOCKED() and LIST_POP_LOCKED() always reinitialize the element
after deleting it. This ensures that the second thread will immediately
unblock and succeed with the removal. It also secures the pop vs delete
competition that may happen when trying to remove an element that's about
to be dequeued.
Commit a8434ec14 ("MINOR: lists: Implement locked variations.")
introduced locked lists which use the elements pointers as locks
for concurrent operations. Under heavy stress the lists occasionally
fail. The cause is a missing barrier at some points when updating
the list element and the head : nothing prevents the compiler (or
CPU) from updating the list head first before updating the element,
making another thread jump to a wrong location. This patch simply
adds the missing barriers before these two operations.
This will have to be backported if the commit above is backported.
There was a typo making the last updated pointer be the pre-last element's
prev instead of the last element's prev. It didn't show up during early
tests because the contention is very rare on this one and it's implicitly
recovered when updating the pointers to go to the next element, but it was
clearly visible in the listener_accept() tests by having all threads block
on LIST_POP_LOCKED() with n==p==LLIST_BUSY.
This will have to be backported if commit a8434ec14 ("MINOR: lists:
Implement locked variations.") is backported.
Commit a8434ec14 ("MINOR: lists: Implement locked variations.")
introduced locked lists which use the elements pointers as locks
for concurrent operations. A copy-paste typo in LIST_ADDQ_LOCKED()
causes corruption in the list in case the next pointer is already
held, as it restores the previous pointer into the next one. It
may impact the server pools.
This will have to be backported if the commit above is backported.
global.maxsock used to be augmented by the frontend's maxconn value
for each frontend listener, which is absurd when there are many
listeners in a frontend because the frontend's maxconn fixes an
upper limit to how many connections will be accepted on all of its
listeners anyway. What is needed instead is to add one to count the
listening socket.
In addition, the CLI's and peers' value was incremented twice, the
first time when creating the listener and the second time in the
main init code.
Let's now make sure we only increment global.maxsock by the required
amount of sockets. This means not adding maxconn for each listener,
and relying on the global values when they are correct.
Threads have long matured by now, still for most users their usage is
not trivial. It's about time to enable them by default on platforms
where we know the number of CPUs bound. This patch does this, it counts
the number of CPUs the process is bound to upon startup, and enables as
many threads by default. Of course, "nbthread" still overrides this, but
if it's not set the default behaviour is to start one thread per CPU.
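Counting the bound CPUs can be done roughly like this (a sketch; the actual code may use a different per-platform mechanism) :

  #define _GNU_SOURCE
  #include <sched.h>

  /* Hedged sketch: number of CPUs the process is bound to at startup,
   * used as the default thread count when "nbthread" is not set.
   */
  static int default_nbthread(void)
  {
          cpu_set_t set;

          if (sched_getaffinity(0, sizeof(set), &set) == 0)
                  return CPU_COUNT(&set);
          return 1; /* fallback when the affinity cannot be read */
  }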
The default number of threads is reported in "haproxy -vv". Simply using
"taskset -c" is now enough to adjust this number of threads so that there
is no more need for playing with cpu-map. And thanks to the previous
patches on the listener, the vast majority of configurations will not
need to duplicate "bind" lines with the "process x/y" statement anymore
either, so a simple config will automatically adapt to the number of
processors available.
tune.listener.multi-queue { on | off }
Enables ('on') or disables ('off') the listener's multi-queue accept which
spreads the incoming traffic to all threads a "bind" line is allowed to run
on instead of taking them for itself. This provides a smoother traffic
distribution and scales much better, especially in environments where threads
may be unevenly loaded due to external activity (network interrupts colliding
with one thread for example). This option is enabled by default, but it may
be forcefully disabled for troubleshooting or for situations where it is
estimated that the operating system already provides a good enough
distribution and connections are extremely short-lived.
It's important to monitor the accept queues to know if some incoming
connections had to be handled by their originating thread due to an
overflow. It's also important to be able to confirm thread fairness.
This patch adds "accq_pushed" to activity reporting, which reports
the number of connections that were successfully pushed into each
thread's queue, and "accq_full", which indicates the number of
connections that couldn't be pushed because the thread's queue was
full.
The idea is to redistribute an incoming connection to one of the
threads a bind_conf is bound to when there is more than one. We do this
using a random improved by the p2c algorithm : a random() call returns
two different thread numbers. We then compare their respective connection
count and the length of their accept queues, and pick the least loaded
one. We even use this deferred accept mechanism if the target thread
ends up being the local thread, because this maintains fairness between
all connections and tests show that it's about 1% faster this way,
likely due to cache locality. If the target thread's accept queue is
full, the connection is accepted synchronously by the current thread.
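The pick itself roughly looks like this (a hedged sketch; names are illustrative) :

  #include <stdlib.h>

  /* Derive two candidate threads from one random value, then keep the
   * one with the fewest connections, using the accept queue length as
   * a tie-break.
   */
  static int pick_target_thread(const int *conn_count, const int *queue_len,
                                int nb_threads)
  {
          unsigned int r = rand();
          int t1 = r % nb_threads;
          int t2 = (r / nb_threads) % nb_threads;

          if (conn_count[t1] != conn_count[t2])
                  return conn_count[t1] < conn_count[t2] ? t1 : t2;
          return queue_len[t1] <= queue_len[t2] ? t1 : t2;
  }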
There is one point where we can migrate a connection to another thread
without taking risk, it's when we accept it : the new FD is not yet in
the fd cache and no task was created yet. It's still possible to assign
it a different thread than the one which accepted the connection. The
only requirement for this is to have one accept queue per thread and
their respective processing tasks that have to be woken up each time
an entry is added to the queue.
This is a multiple-producer, single-consumer model. Entries are added
at the queue's tail and the processing task is woken up. The consumer
picks entries at the head and processes them in order. The accept queue
contains the fd, the source address, and the listener. Each entry of
the accept queue was rounded up to 64 bytes (one cache line) to avoid
cache aliasing because tests have shown that otherwise performance
suffers a lot (5%). A test has shown that it's important to have at
least 256 entries for the rings, as at 128 it's still possible to fill
them often at high loads on small thread counts.
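For illustration, the layout could look like this sketch (field names and the address storage are assumptions; only the ring size and the cache-line rounding come from the text above) :

  /* One ring per thread: multiple producers push at the tail, the owner
   * thread consumes at the head. Each entry is rounded up to one cache
   * line to avoid aliasing between producers.
   */
  #define ACCEPT_QUEUE_SIZE 256

  struct listener; /* opaque here */

  struct accept_queue_entry {
          struct listener *listener;    /* listener that accepted the fd */
          int              fd;          /* the freshly accepted socket */
          unsigned char    addr[48];    /* compact copy of the source address */
  } __attribute__((aligned(64)));

  struct accept_queue_ring {
          unsigned int head;            /* consumer index */
          unsigned int tail;            /* producer index */
          struct accept_queue_entry entry[ACCEPT_QUEUE_SIZE];
  };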
The processing task does almost nothing except calling the listener's
accept() function and updating the global session and SSL rate counters
just like listener_accept() does on synchronous calls.
At this point the accept queue is implemented but not used.
In order to quickly pick a thread ID when accepting a connection, we'll
need to know certain pre-computed values derived from the thread mask,
which are counts of bits per position multiples of 1, 2, 4, 8, 16 and
32. In practice it is sufficient to compute only the 4 first ones and
store them in the bind_conf. We update the count every time the
bind_thread value is adjusted.
The fields in the bind_conf struct have been moved around a little bit
to make it easier to group all thread bit values into the same cache
line.
The function used to return a thread number is bind_map_thread_id(),
and it maps a number between 0 and 31/63 to a thread ID between 0 and
31/63, starting from the left.
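A naive, loop-based equivalent of such a mapping would be the following (the real function uses the pre-computed counts above to avoid scanning the mask) :

  /* Hedged sketch: map <num> (0..31/63) to the ID of the <num>-th set
   * bit of <mask>, modulo the number of set bits.
   */
  static unsigned int map_thread_id(unsigned long mask, unsigned int num)
  {
          unsigned int bit, count = 0;
          unsigned int nbits = __builtin_popcountl(mask);

          if (!nbits)
                  return 0;
          num %= nbits;
          for (bit = 0; bit < 8 * sizeof(mask); bit++) {
                  if (!(mask & (1UL << bit)))
                          continue;
                  if (count++ == num)
                          return bit;
          }
          return 0;
  }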