haproxy

mirror of http://git.haproxy.org/git/haproxy.git/ synced 2025-01-20 12:40:46 +00:00

Author	SHA1	Message	Date
Willy Tarreau	e35c94a748	[MEDIUM] scheduler: get rid of the 4 trees thanks and use ebtree v4.1 Since we're now able to search from a precise expiration date in the timer tree using ebtree 4.1, we don't need to maintain 4 trees anymore. Not only does this simplify the code a lot, but it also ensures that we can always look 24 days back and ahead, which doubles the ability of the previous scheduler. Indeed, while based on absolute values, the timer tree is now relative to <now> as we can always search from <now>-31 bits. The run queue uses the exact same principle now, and is now simpler and a bit faster to process. With these changes alone, an overall 0.5% performance gain was observed. Tests were performed on the few wrapping cases and everything works as expected.	2009-03-21 10:25:14 +01:00
Willy Tarreau	5804434a0f	[MINOR] update ebtree to version 4.1 Ebtree version 4.1 brings lookup by ranges. This will be useful for the scheduler.	2009-03-21 10:23:36 +01:00
Willy Tarreau	8365f9335d	[CLEANUP] http: remove some commented out obsolete code in process_response	2009-03-15 23:11:49 +01:00
Willy Tarreau	86ef7dc98d	[MINOR] tcp_request: let the caller take care of errors and timeouts tcp_request is not meant to decide how an error or a timeout has to be handled. It must just apply its rules. Now that the error checks have been added to the session, we don't need to check them anymore in tcp_request_inspect(), which will only consider the shutdown which may be the result of such an error. That makes a lot more sense since tcp_request is not really waiting for a request.	2009-03-15 22:55:47 +01:00
Willy Tarreau	844553303d	[BUG] session: errors were not reported in termination flags in TCP mode In order to get termination flags properly updated, the session was relying a bit too much on http_return_srv_error() which is http-centric. A generic srv_error function was implemented in the session in order to catch all connection abort situations. It was then noticed that a request abort during a connection attempt was not reported, which is now fixed. Read and write errors/timeouts were not logged either. It was necessary to add those tests at 4 new locations. Now it looks like everything is correctly logged. Most likely some error checking code could now be removed from some analysers.	2009-03-15 22:34:05 +01:00
Willy Tarreau	a3780f2db8	[BUG] connect timeout is in the stream interface, not the buffer The connect timeout was not properly detected due to the fact that it was not correctly initialized. It must be set as the stream interface timeout, not the buffer's write timeout.	2009-03-15 21:49:00 +01:00
Willy Tarreau	5af24efee9	[CLEANUP] config: catch and report some possibly wrong rule ordering There are some configurations in which redirect rules are declared after use_backend rules. We can also find "block" rules after any of these ones. The processing sequence is : - block - redirect - use_backend So as of now we try to detect wrong ordering to warn the user about a possibly undesired behaviour.	2009-03-15 15:23:16 +01:00
Willy Tarreau	55bc0f8eb7	[MEDIUM] reverse internal proxy declaration order to match configuration People are regularly complaining that proxies are linked in reverse order when reading the stats. This is now definitely fixed because the proxy order is now fixed to match configuration order.	2009-03-15 14:51:53 +01:00
Willy Tarreau	d869b24119	[MINOR] tcp-inspect: permit the use of no-delay inspection Sometimes it may make sense to be able to immediately apply a verdict without waiting at all. It was not possible because no inspect-delay meant no inspection at all. This is now fixed.	2009-03-15 14:43:58 +01:00
Willy Tarreau	3cd9af228f	[MINOR] cfgparse: set backends to "balance roundrobin" by default When a backend has no LB algo specified and is not in dispatch, proxy nor transparent mode, use "balance roundrobin" by default instead of complaining. This will be particularly useful with stats and redirects.	2009-03-15 14:11:27 +01:00
Willy Tarreau	ff01a21ebe	[MINOR] cfgparse: some cleanups in the consistency checks Check for servers in health mode, for health mode in pure-backends. Some code has been refactored for better organization.	2009-03-15 13:46:16 +01:00
Willy Tarreau	787bbd9b7a	[MINOR] show errors: encode backslash as well as non-ascii characters These ones were not properly encoded, causing confusion on the output.	2009-03-12 08:18:33 +01:00
Willy Tarreau	c9619468ea	[BUG] stream_sock: write timeout must be updated when forwarding ! When data are forwarded between sockets, we must update the output socket's write timeout. This was forgotten, causing sessions to unexpectedly expire during long posts.	2009-03-09 22:40:57 +01:00
Willy Tarreau	6bf1736fb1	[BUILD] proto_http did not build on gcc-2.95 (again) move the DPRINTF below the local variable declarations. (cherry picked from commit `7b92db4cd5`) The patch accidentally got reverted.	2009-03-08 23:10:34 +01:00
Willy Tarreau	87bed62a92	[BUILD] build fixes for Solaris One build error in stream_sock.c when MSG_NOSIGNAL is not defined, and a warning in task.c.	2009-03-08 22:25:28 +01:00
Willy Tarreau	7c84bab879	[MEDIUM] rearrange forwarding condition to enable splice during analysis The forwarding condition was not very clear. We would only enable forwarding when send_max is zero, and we would only splice when no analyser is installed. In fact we want to enable forwarding when there is no analyser and we want to splice as soon as there is data to forward, regardless of the analysers.	2009-03-08 21:38:23 +01:00
Willy Tarreau	6f0aa476bd	[CLEANUP] buffer_flush() was misleading, rename it as buffer_erase	2009-03-08 20:33:29 +01:00
Willy Tarreau	ed066fae25	[CLEANUP] don't enable kernel splicing when socket is closed Splicing will not be used when the source socket is closed. Don't enable it uselessly.	2009-03-08 19:44:29 +01:00
Willy Tarreau	0be0ef9604	[OPTIM] do not re-check req buffer when only response has changed In process_session(), we used to re-run through all the evaluation loop when only the response had changed. Now we carefully check in this order : - changes to the stream interfaces (only SI_ST_DIS) - changes to the request buffer flags - changes to the response buffer flags And we branch to the appropriate section. This saves significant CPU cycles, which is important since process_session() is one of the major CPU eaters. The same changes have been applied to uxst_process_session().	2009-03-08 19:20:25 +01:00
Willy Tarreau	531cf0cf8d	[OPTIM] task: reduce the number of calls to task_queue() Most of the time, task_queue() will immediately return. By extracting the preliminary checks and putting them in an inline function, we can significantly reduce the number of calls to the function itself, and most of the tests can be optimized away due to the caller's context. Another minor improvement in process_runnable_tasks() consisted in taking benefit from the processor's branch prediction unit by making a special case of the process_session() callback which is by far the most common one. All this improved performance by about 1%, mainly during the call from process_runnable_tasks().	2009-03-08 16:35:27 +01:00
Willy Tarreau	d0a201b35c	[CLEANUP] task: distinguish between clock ticks and timers Timers are unsigned and used as tree positions. Ticks are signed and used as absolute date within current time frame. While the two are normally equal (except zero), it's important not to confuse them in the code as they are not interchangeable. We add two inline functions to turn each one into the other. The comments have also been moved to the proper location, as it was not easy to understand what was a tick and what was a timer unit.	2009-03-08 15:58:07 +01:00
Willy Tarreau	721fdbc381	[BUG] event_accept() must always wake the task up, even in health mode event_accept() did not wake the task up in health mode, so that mode was not working anymore.	2009-03-08 12:25:07 +01:00
Willy Tarreau	26c250683f	[MEDIUM] minor update to the task api: let the scheduler queue itself All the task callbacks had to requeue the task themselves, and update a global timeout. This was not convenient at all. Now the API has been simplified. The task callbacks only have to update their expire timer, and return either a pointer to the task or NULL if the task has been deleted. The scheduler will take care of requeuing the task at the proper place in the wait queue.	2009-03-08 09:38:41 +01:00
Willy Tarreau	4136522527	[OPTIM] displace tasks in the wait queue only if absolutely needed We don't need to remove then add tasks in the wait queue every time we update a timeout. We only need to do that when the new timeout is earlier than previous one. We can rely on wake_expired_tasks() to perform the proper checks and bounce the misplaced tasks in the rare case where this happens. The motivation behind this is that we very rarely hit timeouts, so we save a lot of CPU cycles by moving the tasks very rarely. This now means we can also find tasks with expiration date set to eternity in the queue, and that is not a problem.	2009-03-08 07:59:27 +01:00
Willy Tarreau	4726f53794	[OPTIM] task: don't unlink a task from a wait queue when waking it up In many situations, we wake a task on an I/O event, then queue it exactly where it was. This is a real waste because we delete/insert tasks into the wait queue for nothing. The only reason for this is that there was only one tree node in the task struct. By adding another tree node, we can have one tree for the timers (wait queue) and one tree for the priority (run queue). That way, we can have a task both in the run queue and wait queue at the same time. The wait queue now really holds timers, which is what it was designed for. The net gain is at least 1 delete/insert cycle per session, and up to 2-3 depending on the workload, since we save one cycle each time the expiration date is not changed during a wake up.	2009-03-08 07:59:18 +01:00
Willy Tarreau	1b8ca663a4	[BUG] task: fix handling of duplicate keys A bug was introduced with the ebtree-based scheduler. It seldom causes some timeouts to last longer than required if they hit an expiration date which is the same as the last queued date and is also part of a duplicate tree without being the top of the tree. In this case, the task will not be expired until after the duplicate tree has been flushed. It is easier to reproduce by setting a very short client timeout (1s) and sending connections and waiting for them to expire with the 408 status. Then in parallel, inject at about 1kh/s. The bug causes the connections to sometimes wait longer than 1s before timing out. The cause was the use of eb_insert_dup() on wrong nodes, as this function is designed to work only on the top of the dup tree. The solution consists in updating last_timer only when its bit is -1, and using it only if its bit is still -1 (top of a dup tree). The fix has not reduced performance because it only fixes the case where this bug could fire, which is extremely rare.	2009-03-08 07:57:47 +01:00
Willy Tarreau	39af0f663d	[BUG] rate-limit in defaults section was ignored Just a missing initialisation of the field when creating a proxy.	2009-03-07 11:53:44 +01:00
Willy Tarreau	2ade301505	[BUG] disable any analysers for monitoring requests We must not parse an HTTP request on a monitoring request. In fact, we should even create a dedicated monitoring analyser.	2009-03-06 19:16:39 +01:00
Willy Tarreau	3d8c5531d8	[OPTIM] freq_ctr: do not rotate the counters when reading It's easier to take the counter's age into account when consulting it than to rotate it first. It also saves some CPU cycles and avoids the multiply for outdated counters, finally saving CPU cycles here too when multiple operations need to read the same counter. The freq_ctr code has also shrunk by one third as a result of these optimizations.	2009-03-06 14:29:25 +01:00
Willy Tarreau	ec22b2c27a	[CLEANUP] remove last references to term_trace term_trace was very useful while reworking the lower layers but has almost completely been removed from every place it was referenced. Even the few remaining ones were not accurate, so it's better to completely remove those references and re-add them from scratch later if needed.	2009-03-06 13:07:40 +01:00
Willy Tarreau	9279562e2a	[BUG] switch server-side stream interface to close in case of abort In pure TCP mode, there is no response analyser to switch the server-side stream interface from INI to CLO when the output has been closed after an abort. This caused sessions to remain indefinitely active when they were aborted by the client during a TCP content analysis. The proper action is to switch the stream interface to the CLO state from INI when we have write enable and shutdown write.	2009-03-06 12:51:23 +01:00
Willy Tarreau	79584225e5	[OPTIM] rate-limit: cleaner behaviour on low rates and reduce consumption The rate-limit was applied to the smoothed value which does a special case for frequencies below 2 events per period. This caused irregular limitations when set to 1 session per second. The proper way to handle this is to compute the number of remaining events that can occur without reaching the limit. This is what has been added. It also has the benefit that the frequency calculation is now done once when entering event_accept(), before the accept() loop, and not once per accept() loop anymore, thus saving a few CPU cycles during very high loads. With this fix, rate limits of 1/s are perfectly respected.	2009-03-06 09:18:27 +01:00
Willy Tarreau	efcbc6e66d	[OPTIM] maintain_proxies: only wake up when the frontend will be ready It's not needed to try to check the frontend's freq counter every millisecond, we can precisely compute when to wake up.	2009-03-06 08:27:10 +01:00
Willy Tarreau	bb9251ed8f	[BUG] typo in timeout error reporting : report res and not err	2009-03-06 08:05:40 +01:00
Willy Tarreau	604e83097f	[BUG] interface binding: length must include the trailing zero The interface length passed to the setsockopt(SO_BINDTODEVICE) must include the trailing \0. Otherwise it will randomly fail.	2009-03-06 00:48:23 +01:00
Willy Tarreau	3a7d20781d	[MEDIUM] implement "rate-limit sessions" for the frontend The new "rate-limit sessions" statement sets a limit on the number of new connections per second on the frontend. As it is extremely accurate (about 0.1%), it is efficient at limiting resource abuse or DoS.	2009-03-05 23:48:25 +01:00
Willy Tarreau	079ff0a207	[MINOR] acl: add 2 new verbs: fe_sess_rate and be_sess_rate These new ACLs match frontend session rate and backend session rate. Examples are provided in the doc to explain how to use that in order to limit abuse of service.	2009-03-05 21:34:28 +01:00
Willy Tarreau	3a8efeb46d	[BUG] the "connslots" keyword was matched as "connlots" This bug has been lying there since the patch got merged.	2009-03-05 21:31:36 +01:00
Willy Tarreau	7f062c4193	[MEDIUM] measure and report session rate on frontend, backends and servers With this change, all frontends, backends, and servers maintain a session counter and a timer to compute a session rate over the last second. This value will be very useful because it varies instantly and can be used to check thresholds. This value is also reported in the stats in a new "rate" column.	2009-03-05 18:43:00 +01:00
Willy Tarreau	755905857a	[MINOR] add curr_sec_ms and curr_sec_ms_scaled for current second. Several algorithms will need to know the millisecond value within the current second. Instead of doing a divide every time it is needed, it's better to compute it when it changes, which is when now and now_ms are recomputed. curr_sec_ms_scaled is the same multiplied by 2^32/1000, which will be useful to compute some ratios based on the position within last second.	2009-03-05 16:56:16 +01:00
Willy Tarreau	defc52da95	[MINOR] errors dump must use user-visible date, not internal date.	2009-03-04 20:53:44 +01:00
Willy Tarreau	74808cb907	[MEDIUM] implement error dump on unix socket with "show errors" The new "show errors" command sent on a unix socket will dump all captured request and response errors for all proxies. It is also possible to restrict the dump to frontends and backends whose ID is passed as an optional parameter. The output provides information about frontend, backend, server, session ID, source address, error type, and error position along with a complete dump of the request or response which has caused the error. If a new error overwrites the one currently being reported, then the dump is aborted with a warning message, and processing goes on to the next error.	2009-03-04 15:53:18 +01:00
Willy Tarreau	f073a83b1d	[MEDIUM] store a complete dump of request and response errors in proxies Each proxy instance, either frontend or backend, now has some room dedicated to storing a complete dated request or response in case of parsing error. This will make it possible to consult errors in order to find the exact cause, which is particularly important for troubleshooting faulty applications.	2009-03-04 10:26:38 +01:00
Willy Tarreau	7552c031c0	[MINOR] ensure that http_msg_analyzer updates pointer to invalid char If an invalid character is encountered while parsing an HTTP message, we want to get buf->lr updated to reflect it. Along this change, a few useless __label__ declarations have been removed because they caused gcc to consume stack space without putting anything there.	2009-03-01 11:10:40 +01:00
Willy Tarreau	f49d1df25c	[BUG] global.tune.maxaccept must be limited even in mono-process mode On overloaded systems, it sometimes happens that hundreds or thousands of incoming connections are queued in the system's backlog, and all get dequeued at once. The problem is that when haproxy processes them and does not apply any limit, this can take some time and the internal date does not progress, resulting in wrong timer measures for all sessions. The most common effect of this is that all of these sessions report a large request time (around several hundreds of ms) which is in fact caused by the time spent accepting other connections. This might happen on shared systems when the machine swaps. For this reason, we finally apply a reasonable limit even in mono-process mode. Accepting 100 connections at once is fast enough for extreme cases and will not cause much trouble when the system is saturated.	2009-03-01 08:35:41 +01:00
Willy Tarreau	368480cf45	[BUG] the "source" keyword must first clear optional settings Problem reported by John Lauro. When "source ... usesrc ..." is set in the defaults section, it is not possible anymore to remove the "usesrc" part when declaring a more precise "source" in a backend. The only workaround was to declare it by server. We need to clear optional settings when declaring a new "source". The problem was the same with the "interface" declaration.	2009-03-01 08:27:21 +01:00
Willy Tarreau	7b92db4cd5	[BUILD] proto_http did not build on gcc-2.95 move the DPRINTF below the local variable declarations.	2009-02-24 10:48:35 +01:00
Willy Tarreau	38c99bcb98	[BUG] fix unix socket processing of interrupted output Unix socket processing was still quite buggy. It did not properly handle interrupted output due to a full response buffer. The fix mainly consists in not trying to prematurely enable write on the response buffer, just like the standard session works. This also gets the unix socket code closer to the standard session code handling.	2009-02-22 15:58:45 +01:00
Willy Tarreau	fd3828e263	[BUG] fix random memory corruption using "show sess" Commit `8a5c626e73` introduced the sessions dump on the unix socket. This implementation is buggy because it may try to link to the sessions list's head after the last session is removed with a backref. Also, for the LIST_ISEMPTY test to succeed, we have to proceed with LIST_INIT after LIST_DEL.	2009-02-22 15:17:24 +01:00
Vincenzo Farruggia	9b97cff1c2	[BUILD] Haproxy won't compile if DEBUG_FULL is defined As the subject says, when I try to compile haproxy with -DDEBUG_FULL it stops at the stream_sock.c file with: gcc -Iinclude -Wall -O2 -g -DDEBUG_FULL -DTPROXY -DENABLE_POLL -DENABLE_EPOLL -DENABLE_SEPOLL -DNETFILTER -DUSE_GETSOCKNAME -DCONFIG_HAPROXY_VERSION=\"1.3.15\" -DCONFIG_HAPROXY_DATE=\"2008/04/19\" -c -o src/stream_sock.o src/stream_sock.c src/stream_sock.c: In function 'stream_sock_chk_rcv': src/stream_sock.c:905: error: 'fd' undeclared (first use in this function) src/stream_sock.c:905: error: (Each undeclared identifier is reported only once src/stream_sock.c:905: error: for each function it appears in.) src/stream_sock.c:905: error: 'ob' undeclared (first use in this function) src/stream_sock.c: In function 'stream_sock_chk_snd': src/stream_sock.c:940: error: 'fd' undeclared (first use in this function) src/stream_sock.c:940: error: 'ib' undeclared (first use in this function) make: *** [src/stream_sock.o] Error 1 With this patch everything builds fine.	2009-02-04 22:46:19 +01:00
Krzysztof Piotr Oledzki	f39c71c981	[CRITICAL] fix server state tracking: it was O(n!) instead of O(n) Using the wrong operator (&& instead of &) causes DOWN->UP transition to take longer than it should and to produce a lot of redundant logs. With typical "track" usage (1-6 tracking servers) it shouldn't make a big difference but for heavily tracked servers this bug leads to hang with 100% CPU usage and extremely big log spam.	2009-02-04 22:39:03 +01:00
Willy Tarreau	0b9c02c861	[MEDIUM] implement bind-process to limit service presence by process The "bind-process" keyword lets the admin select which instances may run on which process (in multi-process mode). It makes it easier to more evenly distribute the load across multiple processes by avoiding having too many processes listen on the same IP:ports.	2009-02-04 22:05:05 +01:00
Willy Tarreau	c76721da57	[MEDIUM] add support for source interface binding at the server level Add support for "interface <name>" after the "source" statement on the server line.	2009-02-04 20:20:58 +01:00
Willy Tarreau	d53f96b3f0	[MEDIUM] add support for source interface binding Specifying "interface <name>" after the "source" statement allows one to bind to a specific interface for proxy<->server traffic. This makes it possible to use multiple links to reach multiple servers, and to force traffic to pass via an interface different from the one the system would have chosen based on the routing table.	2009-02-04 18:46:54 +01:00
Willy Tarreau	4e30ed73f4	[BUG] inform the user when root is expected but not set When a plain user runs haproxy as non-root but some options require root, let's inform him.	2009-02-04 18:02:48 +01:00
Willy Tarreau	5e6e204d1c	[MINOR] add support for bind interface name By appending "interface <name>" to a "bind" line, it is now possible to specifically bind to a physical interface name. Note that this currently only works on Linux and requires root privileges.	2009-02-04 17:19:29 +01:00
Willy Tarreau	0a3b9d90d3	[BUG] we must not exit if protocol binding only returns a warning Right now, protocol binding cannot return a warning, but once it can, we must not exit but just print the warning.	2009-02-04 17:05:23 +01:00
Krzysztof Piotr Oledzki	7b723efca3	[DOC] remove buggy comment for use_backend "early blocking based on ACLs" is definitely wrong here	2009-01-27 21:30:31 +01:00
Krzysztof Piotr Oledzki	52d522b566	[BUG] Fix listen & more of 2 couples <ip>:<port> Fix "listen www-mutualise 80.248.x.y1:80,80.248.x.y2:80,80.248.x.y3:80": [ALERT] 309/161509 (15450) : Invalid server address: '80.248.x.y1:80,80.248.x.y2' [ALERT] 309/161509 (15450) : Error reading configuration file : /etc/haproxy/haproxy.cfg Bug reported by Laurent Dolosor.	2009-01-27 21:00:18 +01:00
Willy Tarreau	3ab68cf0ae	[MEDIUM] splice: add the global "nosplice" option Setting "nosplice" in the global section will disable the use of TCP splicing (both tcpsplice and linux 2.6 splice). The same will be achieved using the "-dS" parameter on the command line.	2009-01-25 16:03:28 +01:00
Willy Tarreau	43b78999ec	[MEDIUM] move global tuning options to the global structure The global tuning options right now only concern the polling mechanisms, and they are not in the global struct itself. It's not very practical to add other options so let's move them to the global struct and remove types/polling.h which was not used for anything else.	2009-01-25 15:42:27 +01:00
Willy Tarreau	686ac828fa	[OPTIM] make global.maxpipes default to global.maxconn/4 when not specified global.maxconn/4 seems to be a good hint for global.maxpipes when that one must be guessed. If the limit is reached, it's still possible to set it manually in the configuration.	2009-01-25 14:06:58 +01:00
Willy Tarreau	a206fa9d5d	[STATS] report pipe usage in the statistics Pipe usage is reported in info and web stats including maxpipes, pipes_free, and pipes_used.	2009-01-25 14:02:00 +01:00
Willy Tarreau	3eba98aa57	[MEDIUM] splice: make use of pipe pools Using pipe pools makes pipe management a lot easier. It also makes it possible to remove quite a few #ifdefs in areas which depended on whether support for kernel splicing was present or not. The buffer now holds a pointer to a pipe structure which is always NULL except if there are still data in the pipe. When it needs to use that pipe, it dynamically allocates it from the pipe pool. When the data is consumed, the pipe is immediately released. That way, there is no need anymore to care about pipe closure upon session termination, nor about pipe creation when trying to use splice(). Another immediate advantage of this method is that it considerably reduces the number of pipes needed to use splice(). Tests have shown that even with 0.2 pipe per connection, almost all sessions can use splice(), because the same pipe may be used by several consecutive calls to splice().	2009-01-25 13:56:13 +01:00
Willy Tarreau	982b6e37e4	[MEDIUM] introduce pipe pools A new data type has been added : pipes. Some pre-allocated empty pipes are maintained in a pool for users such as splice which use them a lot for very short times. Pipes are allocated using get_pipe() and released using put_pipe(). Pipes which are released with pending data are immediately killed. The struct pipe is small (16 to 20 bytes) and may even be further reduced by unifying ->data and ->next. It would be nice to have a dedicated cleanup task which would watch for the pipes usage and destroy a few of them from time to time.	2009-01-25 13:49:53 +01:00
Willy Tarreau	98b306be65	[MEDIUM] splice: add hints to support older buggy kernels Kernels before 2.6.27.13 would have splice() return EAGAIN on shutdown. By adding a few tricks, we can deal with the situation. If splice() returns EAGAIN and the pipe is empty, then fall back to recv() which will be able to check if it's an end of connection or not. The advantage of this method is that it remains transparent for good kernels since there is no reason that epoll() will return EPOLLIN without anything to read, and even if it would happen, the recv() overhead on this check is minimal.	2009-01-25 11:11:32 +01:00
Willy Tarreau	afb4876778	[BUG] reserve some pipes for backends with splice enabled If splicing is enabled in a backend, we need to guess how many pipes will be needed. We used to rely on fullconn, but this leads to non-working splicing when fullconn is not specified. So we now fallback to global.maxconn.	2009-01-25 10:42:05 +01:00
Willy Tarreau	5bd8c376ad	[MAJOR] complete support for linux 2.6 kernel splicing This code provides support for linux 2.6 kernel splicing. This feature appeared in kernel 2.6.25, but initial implementations were awkward and buggy. A kernel >= 2.6.29-rc1 is recommended, as well as some optimization patches. Using pipes, this code is able to pass network data directly between sockets. The pipes are a bit annoying to manage (fd creation, release, ...) but finally work quite well. Preliminary tests show that on high bandwidths, there's a substantial gain (approx +50%, only +20% with kernel workarounds for corruption bugs). With 2000 concurrent connections, with Myricom NICs, haproxy now more easily achieves 4.5 Gbps for 1 process and 6 Gbps for two processes buffers. 8-9 Gbps are easily reached with smaller numbers of connections. We also try to splice out immediately after a splice in by taking advantage of the new ability for a data producer to notify the consumer that data are available. Doing this ensures that the data are immediately transferred between sockets without latency, and without having to re-poll. Performance on small packets has considerably increased due to this method. Earlier kernels return only one TCP segment at a time in non-blocking splice-in mode, while newer ones return as many segments as may fit in the pipe. To work around this limitation without hurting more recent kernels, we try to collect as much data as possible, but we stop when we believe we have read 16 segments, then we forward everything at once. It also ensures that even upon shutdown or EAGAIN the data will be forwarded. Some tricks were necessary because the splice() syscall does not distinguish between missing data and a full pipe; it always returns EAGAIN. The trick consists in stopping polling in case of EAGAIN and a non empty pipe. The receiver waits for the buffer to be empty before using the pipe. This is in order to avoid confusion between buffer data and pipe data.
The BF_EMPTY flag now covers the pipe too. Right now the code is disabled by default. It needs to be built with CONFIG_HAP_LINUX_SPLICE, and the instances intended to use splice() must have "option splice-response" (or option splice-request) enabled. It is probably desirable to keep a pool of pre-allocated pipes to avoid having to create them for every session. This will be worked on later. Preliminary tests show very good results, even with the kernel workaround causing one memcpy(). At 3000 connections, performance has moved from 3.2 Gbps to 4.7 Gbps.	2009-01-19 00:32:22 +01:00
Willy Tarreau	6b4aad4c1b	[MEDIUM] add definitions for Linux kernel splicing Some older libc don't define the splice() syscall, and some even define a wrong one. For this reason, we try our best to declare it correctly. These definitions still work with recent glibc.	2009-01-18 21:59:13 +01:00
Willy Tarreau	259de1b702	[MINOR] introduce structures required to support Linux kernel splicing When CONFIG_HAP_LINUX_SPLICE is defined, the buffer structure will be slightly enlarged to support information needed for kernel splicing on Linux. A first attempt consisted in putting this information into the stream interface, but in the long term, it appeared really awkward. This version puts the information into the buffer. The platform-dependent part is conditionally added and will only enlarge the buffers when compiled in. One new flag has also been added to the buffers: BF_KERN_SPLICING. It indicates that the application considers it is appropriate to use splicing to forward remaining data.	2009-01-18 21:56:21 +01:00
Willy Tarreau	66aa61f76b	[MEDIUM] splice: add configuration options and set global.maxpipes Three new options have been added when CONFIG_HAP_LINUX_SPLICE is set : - splice-request - splice-response - splice-auto They are used to enable splicing per frontend/backend. They are also supported in defaults sections. The "splice-auto" option is meant to automatically turn splice on for buffers marked as fast streamers. This should save quite a bunch of file descriptors. It was required to add a new "options2" field to the proxy structure because the original "options" is full. When global.maxpipes is not set, it is automatically adjusted to the max of the sums of all frontend's and backend's maxconns for those which have at least one splice option enabled.	2009-01-18 21:44:07 +01:00
Willy Tarreau	3ec79b9c42	[MINOR] global.maxpipes: add the ability to reserve file descriptors for pipes This will be needed to use linux's splice() syscall.	2009-01-18 20:39:42 +01:00
Willy Tarreau	a456f2a059	[MEDIUM] stream_sock: try to send pending data on chk_snd() When the producer calls stream_sock_chk_snd(), we now try to send all pending data asynchronously. If this succeeds, we don't have to enable polling on the FD, which saves about half of the calls to epoll_wait(). In stream_sock_read(), we now set the WAIT_ROOM flag as soon as possible, in preparation for the splice code. We reset it when we detect that some room has been released either in the buffer or in the splice pipe.	2009-01-18 19:43:47 +01:00
Willy Tarreau	d2def0fd25	[MINOR] stream_sock: fix a few wrong empty calculations	2009-01-18 17:37:33 +01:00
Willy Tarreau	9c0fe59612	[MEDIUM] stream_sock_read: call ->chk_snd whenever there are data pending The condition to call ->chk_snd() in stream_sock_read() was suboptimal because we did not call it when the socket was shut down nor when there was an error after data were added. Now we make sure to call it whenever there are data pending. Also, the "full" condition was handled before calling chk_snd(), which could cause deadlock issues if chk_snd() did consume some data.	2009-01-18 16:25:31 +01:00
Willy Tarreau	0c2fc1f39d	[MEDIUM] split stream_sock_write() into callback and core functions stream_sock_write() has been split in two parts: - the poll callback, intended to be called when an I/O event has been detected - the write() core function, which ought to be usable from various other places, possibly not meant to wake the task up. The code has also been slightly cleaned up in the process and is more readable now.	2009-01-18 15:48:52 +01:00
Willy Tarreau	ac128fef73	[CLEANUP] stream_sock: move the write-nothing condition out of the loop Some tricks to handle situations where we write nothing were in the middle of the main loop in stream_sock_write(). This cleanup provides better source and object code, and slightly shrinks the output code.	2009-01-09 13:05:19 +01:00
Willy Tarreau	efc612c17b	[CLEANUP] replace a few occurrences of (flags & X) && !(flags & Y) This construct collapses into ((flags & (X|Y)) == X) when X is a single-bit flag that does not overlap Y. This provides a noticeable code shrink, and the output code contains fewer conditional jumps.	2009-01-09 12:18:24 +01:00
Willy Tarreau	68eac13217	[OPTIM] stream_sock: factor the buffer full handling out of the loop Handling the buffer full condition is not trivial and this code was duplicated inside the loop. Move it out of the loop to a single place.	2009-01-09 11:38:52 +01:00
Willy Tarreau	03d60bbaf9	[OPTIM] buffer: replace rlim by max_len In the buffers, the read limit used to leave some room for header rewriting was set by a pointer to the end of the buffer. Not only did this require subtractions at every place in the code, but it will also soon become unusable when we want to support keepalive. Let's replace this with a length limit, comparable to the buffer's length. This has also slightly reduced the code size.	2009-01-09 11:14:39 +01:00
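The flag rewrite above is easy to check mechanically. The two helpers below are hypothetical illustrations, not code taken from HAProxy; they spell out the preconditions under which the collapsed form is equivalent (X must be a single bit, and X and Y must be disjoint).

```c
#include <assert.h>

/* Original form: two tests, typically two conditional jumps. */
int x_set_y_clear_naive(unsigned int flags, unsigned int x, unsigned int y)
{
    return (flags & x) && !(flags & y);
}

/* Collapsed form: one mask and one comparison. Only equivalent when x is
 * a single bit and does not overlap y. */
int x_set_y_clear_fast(unsigned int flags, unsigned int x, unsigned int y)
{
    assert(x != 0 && (x & (x - 1)) == 0);  /* x must be a single bit */
    assert((x & y) == 0);                  /* x and y must not overlap */
    return (flags & (x | y)) == x;
}
```

If X had several bits, `(flags & X)` would mean "any bit of X is set" while `== X` would mean "all bits of X are set", so the rewrite would silently change meaning.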
Willy Tarreau	af78d0fdb6	[OPTIM] stream_sock: do not ask for polling on EAGAIN if we have read It is not always wise to return 0 in stream_sock_read() upon EAGAIN, because if we have read enough data, we should consider that enough and try again later without polling in between. We still make a difference between small reads and large reads though. Small reads still lead to polling because we're sure that there's nothing left in the system's buffers if we read less than one MSS.	2009-01-09 10:15:03 +01:00
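The small-read/large-read distinction can be pictured as one decision taken when read() returns EAGAIN. This is a minimal sketch, not the actual stream_sock_read() logic; the helper name `demo_poll_after_eagain()` and the use of a fixed per-connection MSS value as the threshold are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: after read() returned EAGAIN, decide whether polling
 * must be requested on the FD. 'total_read' is what this call read before
 * EAGAIN, 'mss' the connection's maximum segment size (assumed known). */
int demo_poll_after_eagain(size_t total_read, size_t mss)
{
    assert(mss > 0);
    /* Less than one MSS read: nothing can be left in the system's buffers,
     * so wait for the poller to report new data. */
    if (total_read < mss)
        return 1;
    /* A large read: consider it enough and simply try again later,
     * without polling in between. */
    return 0;
}
```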
Willy Tarreau	0abebcc0fb	[MEDIUM] i/o: rework ->to_forward and ->send_max The way the buffers and stream interfaces handled ->to_forward was really not handy for multiple reasons. Now we've moved its control to the receive-side of the buffer, which is also responsible for keeping send_max up to date. This makes more sense as it now becomes possible to send some pre-formatted data followed by forwarded data. The following explanation has also been added to buffer.h to clarify the situation. Right now, tests show that the I/O is behaving extremely well. Some work will have to be done to adapt existing splice code though. /* Note about the buffer structure The buffer contains two length indicators, one to_forward counter and one send_max limit. First, it must be understood that the buffer is in fact split in two parts : - the visible data (->data, for ->l bytes) - the invisible data, typically in kernel buffers forwarded directly from the source stream sock to the destination stream sock (->splice_len bytes). Those are used only during forward. In order not to mix data streams, the producer may only feed the invisible data with data to forward, and only when the visible buffer is empty. The consumer may not always be able to feed the invisible buffer due to platform limitations (lack of kernel support). Conversely, the consumer must always take data from the invisible data first before ever considering visible data. There is no limit to the size of data to consume from the invisible buffer, as platform-specific implementations will rarely leave enough control on this. So any byte fed into the invisible buffer is expected to reach the destination file descriptor, by any means. However, it's the consumer's responsibility to ensure that the invisible data has been entirely consumed before consuming visible data. This must be reflected by ->splice_len. This is very important as this and only this can ensure strict ordering of data between buffers. 
The producer is responsible for decreasing ->to_forward and increasing ->send_max. The ->to_forward parameter indicates how many bytes may be fed into either data buffer without waking the parent up. The ->send_max parameter says how many bytes may be read from the visible buffer. Thus it may never exceed ->l. This parameter is updated by any buffer_write() as well as any data forwarded through the visible buffer. The consumer is responsible for decreasing ->send_max when it sends data from the visible buffer, and ->splice_len when it sends data from the invisible buffer. A real-world example is an HTTP response waiting in a buffer to be forwarded. We know the header length (300) and the amount of data to forward (content-length=9000). The buffer already contains 1000 bytes of data after the 300 bytes of headers. Thus the caller will set ->send_max to 300, indicating that it explicitly wants to send those data, and set ->to_forward to 9000 (content-length). These values must be normalised immediately after updating ->to_forward: there are already 1300 bytes in the buffer, 300 of which are already counted in ->send_max, and the remaining 1000 bytes are fewer than ->to_forward, so we must update ->send_max to 1300 to flush the whole buffer, and reduce ->to_forward to 8000. After that, the producer may try to feed the additional data through the invisible buffer using a platform-specific method such as splice(). */	2009-01-09 10:15:03 +01:00
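The normalisation step in the HTTP example above is plain arithmetic on three counters. The sketch below reproduces it with a hypothetical, stripped-down `demo_buf` struct and `demo_normalize_forward()` helper; it is not HAProxy's `struct buffer` or its actual forwarding function.

```c
#include <assert.h>

/* Hypothetical view of the three counters described in the comment. */
struct demo_buf {
    unsigned int l;           /* bytes of visible data in the buffer */
    unsigned int send_max;    /* visible bytes the consumer may send */
    unsigned int to_forward;  /* bytes still to forward without waking the task */
};

/* Schedule as much already-buffered data as ->to_forward allows, charging
 * it against ->to_forward. */
void demo_normalize_forward(struct demo_buf *b)
{
    unsigned int pending;

    assert(b->send_max <= b->l);   /* send_max may never exceed l */
    pending = b->l - b->send_max;  /* buffered bytes not yet scheduled */
    if (pending > b->to_forward)
        pending = b->to_forward;
    b->send_max += pending;
    b->to_forward -= pending;
}
```

With l = 1300, send_max = 300 and to_forward = 9000, the call leaves send_max at 1300 and to_forward at 8000, matching the figures in the comment.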
Willy Tarreau	4d9b1dee9f	[MEDIUM] stream_sock: factor out the return path in case of no-writes Previously, we wrote nothing only if the buffer was empty. Now with send_max, we can also write nothing because we are not allowed to send anything due to send_max. The code starts to look like spaghetti. It needs to be rearranged a lot before merging the splice patches.	2009-01-09 10:15:02 +01:00
Willy Tarreau	dcef33fa9b	[MINOR] add the splice_len member to the buffer struct in preparation for splice support In preparation for splice support, let's add the splice_len member to the buffer struct. An earlier implementation made it conditional, which made the whole logic very complex due to a large number of ifdefs. Now BF_EMPTY is only set once both buf->l and buf->splice_len are null. Splice_len is initialized to zero during buffer creation and is currently not changed, so the whole logic remains unaffected. When splice gets merged, splice_len will reflect the number of bytes in flight out of the buffer but not yet sent, typically in a pipe for the Linux case.	2009-01-09 10:15:02 +01:00
Willy Tarreau	6b66f3e4f6	[MAJOR] implement autonomous inter-socket forwarding If an analyser sets buf->to_forward to a given value, that many bytes will be forwarded between the two stream interfaces attached to a buffer without waking the task up. The same applies once all analysers have been released. This saves a large number of calls to process_session() and a number of task_dequeue/queue.	2009-01-09 10:15:02 +01:00
Willy Tarreau	3ffeba1f67	[MEDIUM] enable inter-stream_interface wakeup calls By letting the producer tell the consumer there is data to check, and the consumer tell the producer there is some space left again, we can cut in half the number of session wakeups. This is also an important starting point for future splicing support.	2008-12-28 11:09:02 +01:00
Willy Tarreau	b0ef735c71	[MINOR] add flags to indicate when a stream interface is waiting for space/data It will soon be required to know when a stream interface is waiting for buffer data or buffer room. Let's add two flags for that.	2008-12-28 11:08:03 +01:00
Willy Tarreau	86491c3164	[MEDIUM] indicate when we don't care about read timeout Sometimes we don't care about a read timeout, for instance, from the client when waiting for the server, but we still want the client to be able to read. Until now this was done by artificially forcing the read timeout to ETERNITY. But this will cause trouble when we want the low level stream sock to communicate without waking the session up. So we add a BF_READ_NOEXP flag to indicate that when the read timeout is to be set, it might have to be set to ETERNITY. Since BF_READ_ENA was not used, we replaced this flag.	2008-12-28 11:06:40 +01:00
Willy Tarreau	f890dc9003	[MEDIUM] add a send limit to a buffer For keep-alive, line-mode protocols and splicing, we will need to limit the sender to a certain number of bytes. The limit is automatically set to the buffer size when analysers are detached from the buffer.	2008-12-28 10:58:52 +01:00
Willy Tarreau	05cb29bcd0	[MINOR] transfer errors were not reported anymore in data phase	2008-12-28 10:58:25 +01:00
Willy Tarreau	4b1f85912c	[BUG] "option transparent" is for backend, not frontend ! "option transparent" was set and checked on frontends only while it is purely a backend thing as it replaces the "balance" mode. For this reason, it did only work in "listen" sections. This change will then not affect the rare users of this option.	2008-12-23 23:13:55 +01:00
Willy Tarreau	7cd9d94360	[BUG] check timeout must not be changed if timeout.check is not set This causes health checks to stop after some time since the new ticks-based scheduler because a check timeout is set to eternity. This fix must be merged into master but not in earlier versions as it only affects the new scheduler. (cherry picked from commit e349eb452b655dc1adc059f05ba8b36565753393)	2008-12-23 09:58:49 +01:00
Willy Tarreau	8a5c626e73	[MINOR] stats: indicate if a task is running in "show sess" It's sometimes useful to know that a task is currently running.	2008-12-08 00:16:21 +01:00
Willy Tarreau	922a806075	[BUG] do not dequeue the backend's pending connections on a dead server Kai Krueger found that the previous patch was incomplete, because there is an unconditional call to process_srv_queue() in session_free() which still causes a dead server to consume pending connections from the backend. This call was made unconditional so that we don't leave unserved connections in the server queue, for instance connections coming in with "option persist" which can bypass the server status check. However, the server must not touch the backend's queue if it is down. Another fear was that some connections might remain unserved when the server is using a dynamic maxconn if the number of connections to the backend is too low. Right now, srv_dynamic_maxconn() ensures this cannot happen, so the call can remain conditional. The fix consists in allowing a server to process its own queue whatever its state, but not to touch the backend's queue if it is down. Its queue should normally be empty when the server is down because it is redistributed when the server goes down. The only remaining cases are precisely the persistent connections with "option persist" set, coming in after the queue has been redispatched. Those ones must still be processed when a connection terminates. (cherry picked from commit cd485c4480)	2008-12-07 23:51:12 +01:00
Willy Tarreau	fe651a50d6	[MINOR] redirect: in prefix mode a "/" means not to change the URI If the prefix is set to "/", it means the user does not want to alter the original URI, so we don't want to insert a new slash before the original URI. (cherry-picked from commit 02a35c74942c1bce762e996698add1270e6a5030)	2008-12-07 23:48:39 +01:00
Willy Tarreau	0140f2553c	[MINOR] redirect: add support for "set-cookie" and "clear-cookie" It is now possible to set or clear a cookie during a redirection. This is useful for logout pages, or for protecting against some DoSes. Check the documentation for the options supported by the "redirect" keyword. (cherry-picked from commit 4af993822e880d8c932f4ad6920db4c9242b0981)	2008-12-07 23:46:38 +01:00
Willy Tarreau	79da4697ca	[MINOR] redirect: add support for the "drop-query" option If "drop-query" is present on a "redirect" line using the "prefix" mode, then the returned Location header will be the request URI without the query-string. This may be used on some login/logout pages, or when it must be decided to redirect the user to a non-secure server. (cherry-picked from commit f2d361ccd73aa16538ce767c766362dd8f0a88fd)	2008-12-07 23:42:01 +01:00
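Taken together, the "/" prefix rule and the "drop-query" option determine how the Location value is assembled in prefix mode. The helper below is a hypothetical sketch of that assembly, not HAProxy's redirect code; it assumes the query string starts at the first '?' of the request URI.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a prefix-mode Location value into 'out'.
 * A prefix of "/" leaves the URI unchanged (no extra slash), and
 * 'drop_query' removes everything from the first '?'. Returns the length
 * written, or -1 if 'out' is too small. */
int demo_build_location(char *out, size_t outlen, const char *prefix,
                        const char *uri, int drop_query)
{
    size_t uri_len = strlen(uri);
    const char *q;
    int n;

    assert(out != NULL && outlen > 0);
    if (drop_query && (q = strchr(uri, '?')) != NULL)
        uri_len = (size_t)(q - uri);
    if (strcmp(prefix, "/") == 0)
        prefix = "";  /* "/" means: do not alter the original URI */
    n = snprintf(out, outlen, "%s%.*s", prefix, (int)uri_len, uri);
    return (n >= 0 && (size_t)n < outlen) ? n : -1;
}
```

For instance, prefix "https://secure.example.com" with drop-query turns "/login?next=/a" into "https://secure.example.com/login", while prefix "/" returns the URI as-is.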
Willy Tarreau	106cb76c4b	[BUG] critical errors should be reported even in daemon mode Josh Goebel reported that haproxy silently dies when it fails to chroot. In fact, it does so when in daemon mode, because daemon mode has been disabling output for ages. Since the code has been reworked, there is no longer any reason for this, hence this patch. (cherry picked from commit 304d6fb00f) (cherry picked from commit 50b7f7f12c67322c793f50a6be009f0fd0eec1bb)	2008-12-07 23:37:28 +01:00
Jeffrey 'jf' Lim	65cb2f1c85	[MINOR] cfgparse: fix off-by-2 in error message size was just looking through the source, and noticed this... :) (cherry picked from commit 63b76be713) (cherry picked from commit a801db6c5ea750f93a3795dbb2e70c03e05bbef4)	2008-12-07 23:37:15 +01:00
Willy Tarreau	fd39ddaa3d	[BUG] cookie capture is declared in the frontend but checked on the backend Cookie capture would only work by pure luck on the request but never worked on responses since only the backend was checked. The fix consists in always checking the frontend for cookie captures. (cherry picked from commit a83c5ba9315a7c47cda2698280b7e49a9d3eb374)	2008-12-07 23:36:52 +01:00

685 Commits