haproxy

mirror of http://git.haproxy.org/git/haproxy.git/ synced 2025-01-19 04:00:46 +00:00

Author	SHA1	Message	Date
Willy Tarreau	55e2f5ad14	BUG/MINOR: logs/threads: properly split the log area upon startup If logs were emitted before creating the threads, then the dataptr pointer keeps a copy of the end of the log header. Then after the threads are created, the headers are reallocated for each thread. However the end pointer was not reset until the end of the first second, which may result in logs emitted by multiple threads during the first second to be mangled, or possibly in some cases to use a memory area that was reused for something else. The fix simply consists in reinitializing the end pointers immediately when the threads are created. This fix must be backported to 1.9 and 1.8.	2019-05-05 10:16:13 +02:00
Willy Tarreau	4fc49a9aab	BUG/MEDIUM: checks: make sure the warmup task takes the server lock The server warmup task is used when a server uses the "slowstart" parameter. This task affects the server's weight and maxconn, and may dequeue pending connections from the queue. This must be done under the server's lock, which was not the case. This must be backported to 1.9 and 1.8.	2019-05-05 06:54:22 +02:00
Willy Tarreau	223995e8ca	BUG/MINOR: stream: also increment the retry stats counter on L7 retries It happens that the retries stats use their own counter and are not derived from the stream interface, so we need to update it as well when performing an L7 retry. No backport is needed.	2019-05-04 10:40:00 +02:00
Olivier Houchard	e3249a98e2	MEDIUM: streams: Add a new keyword for retry-on, "junk-response" Add a way to retry requests if we got a junk response from the server, ie an incomplete response, or something that is not valid HTTP. To do so, one can use the new "junk-response" keyword for retry-on.	2019-05-04 10:20:24 +02:00
Olivier Houchard	865d8392bb	MEDIUM: streams: Add a way to replay failed 0rtt requests. Add a new keyword for retry-on, 0rtt-rejected. If set, we will try to replay requests for which we sent early data that got rejected by the server. If that option is set, we will attempt to use 0rtt if "allow-0rtt" is set on the server line even if the client didn't send early data.	2019-05-04 10:20:24 +02:00
Olivier Houchard	a254a37ad7	MEDIUM: streams: Add the ability to retry a request on L7 failure. When running in HTX mode, if we sent the request, but failed to get the answer, either because the server just closed its socket, we hit a server timeout, or we get a 404, 408, 425, 500, 501, 502, 503 or 504 error, attempt to retry the request, exactly as if we just failed to connect to the server. To do so, add a new backend keyword, "retry-on". It accepts a list of keywords, which can be "none" (never retry), "conn-failure" (we failed to connect, or to do the SSL handshake), "empty-response" (the server closed the connection without answering), "response-timeout" (we timed out while waiting for the server response), or "404", "408", "425", "500", "501", "502", "503" and "504". The default is "conn-failure".	2019-05-04 10:19:56 +02:00
Olivier Houchard	f4bda993dd	BUG/MEDIUM: streams: Don't add CF_WRITE_ERROR if early data were rejected. In sess_update_st_con_tcp(), if we have an error on the stream_interface because we tried to send early_data but failed, don't flag the request channel as CF_WRITE_ERROR, or we will never reach the analyser that sends back the 425 response. This should be backported to 1.9.	2019-05-03 22:23:41 +02:00
Olivier Houchard	010941f876	BUG/MEDIUM: ssl: Use the early_data API the right way. We can only read early data if we're a server, and write if we're a client, so don't attempt to mix both. This should be backported to 1.8 and 1.9.	2019-05-03 21:00:10 +02:00
Willy Tarreau	c40efc1919	MINOR: init/threads: make the threads array global Currently the thread array is a local variable inside a function block and there is no access to it from outside, which often complicates debugging. Let's make it global and export it. Also the allocation return is now checked.	2019-05-03 10:16:30 +02:00
Willy Tarreau	b4f7cc3839	MINOR: init/threads: remove the useless tids[] array It's still obscure how we managed to initialize an array of integers with values always equal to the index, just to retrieve the value from an opaque pointer to the index instead of directly using it! I suspect it's a leftover from the very early threading experiments. This commit gets rid of this and simply passes the thread ID as the argument to run_thread_poll_loop(), thus significantly simplifying the few call places and removing the need to allocate then free an array of identity.	2019-05-03 09:59:15 +02:00
Willy Tarreau	81492c989c	MINOR: threads: flatten the per-thread cpu-map When we initially experimented with threads and processes support, we needed to implement arrays of threads per process for cpu-map, but this is not needed anymore since we support either threads or processes. Let's simply make the thread-based cpu-map per thread and not per thread and per process since that's not used anymore. Doing so reduces the global struct from 33kB to 1.5kB.	2019-05-03 09:46:45 +02:00
Olivier Houchard	a48237fd07	BUG/MEDIUM: connections: Make sure we remove CO_FL_SESS_IDLE on disown. When for some reason the session is not the owner of the connection anymore, make sure we remove CO_FL_SESS_IDLE, even if we're about to call conn->mux->destroy(), as the destroy may not destroy the connection immediately if it's still in use. This should be backported to 1.9.	2019-05-02 12:08:39 +02:00
Olivier Houchard	55071d30ca	BUG/MEDIUM: channels: Don't forget to reset output in channel_erase(). In channel_erase(), don't forget to set output to 0, otherwise the channel won't seem empty, when it really is, and that could lead to stream never closing properly. This should be backported to 1.9.	2019-05-02 10:40:59 +02:00
Dragan Dosen	e99af978c8	BUG/MEDIUM: pattern: fix memory leak in regex pattern functions The allocated regex is not freed properly and can cause a memory leak, eg. when patterns are updated via CLI socket. This patch should be backported to all supported versions.	2019-05-02 10:05:11 +02:00
Dragan Dosen	026ef570e1	BUG/MINOR: checks: free memory allocated for tasklets The check->wait_list.task and agent->wait_list.task were not freed properly on deinit(). This patch should be backported to 1.9.	2019-05-02 10:05:09 +02:00
Dragan Dosen	61302da0e7	BUG/MINOR: log: properly free memory on logformat parse error and deinit() This patch may be backported to all supported versions.	2019-05-02 10:05:07 +02:00
Dragan Dosen	2a7c20f602	BUG/MINOR: haproxy: fix rule->file memory leak When using the "use_backend" configuration directive, the configuration file name stored as rule->file was not freed in some situations. This was introduced in commit `4ed1c95` ("MINOR: http/conf: store the use_backend configuration file and line for logs"). This patch should be backported to 1.9, 1.8 and 1.7.	2019-05-02 10:05:06 +02:00
Olivier Houchard	b51937ebaa	BUG/MEDIUM: ssl: Don't pretend we can retry a recv/send if we got a shutr/w. In ha_ssl_write() and ha_ssl_read(), don't pretend we can retry a read/write if we got a shutr/shutw, or we will never properly shutdown the connection.	2019-05-01 17:37:33 +02:00
Ilya Shipitsin	0c50b1ecbb	BUG/MEDIUM: servers: fix typo "src" instead of "srv" When copying the settings for all servers when using server templates, fix a typo, or we would never copy the length of the ALPN to be used for checks. This should be backported to 1.9.	2019-04-30 23:04:47 +02:00
Christopher Faulet	02f3cf19ed	CLEANUP: config: Don't alter listener->maxaccept when nbproc is set to 1 This patch only removes a useless calculation on listener->maxaccept when nbproc is set to 1. Indeed, the following formula has no effect in such case: listener->maxaccept = (listener->maxaccept + nbproc - 1) / nbproc; This patch may be backported as far as 1.5.	2019-04-30 15:28:29 +02:00
Christopher Faulet	6b02ab8734	MINOR: config: Test validity of tune.maxaccept during the config parsing Only -1 and positive integers from 0 to INT_MAX are accepted. An error is triggered during the config parsing for any other values. This patch may be backported to all supported versions.	2019-04-30 15:28:29 +02:00
Christopher Faulet	102854cbba	BUG/MEDIUM: listener: Fix how unlimited number of consecutive accepts is handled There is a bug when global.tune.maxaccept is set to -1 (no limit). It is pretty visible with one process (nbproc set to 1). The functions listener_accept() and accept_queue_process() don't expect to handle negative maxaccept values. So instead of accepting incoming connections without any limit, none are ever accepted and HAProxy loops infinitely in the scheduler. When there are 2 or more processes, the bug is a bit more subtle. The limit for a listener is set to 1. So only one connection is accepted at a time by a given listener. This happens because the listener's maxaccept value is an unsigned integer. In check_config_validity(), it is first set to UINT_MAX (-1 cast to an unsigned integer), and then some calculations on it lead to an integer overflow. To fix the bug, the listener's maxaccept value is now a signed integer. So, if a negative value is set for global.tune.maxaccept, we keep it untouched for the listener and no calculation is made on it. Then, in the listener code, this signed value is cast to an unsigned one. It simplifies all tests instead of dealing with negative values. So, it limits the number of connections accepted at a time to UINT_MAX at most. But, honestly, it is not an issue. This patch must be backported to 1.9 and 1.8.	2019-04-30 15:28:29 +02:00
Olivier Houchard	07425de717	BUG/MEDIUM: port_range: Make the ring buffer lock-free. Port range uses a ring buffer, and unfortunately, when making haproxy multithreaded, it's been overlooked, and the ring buffer is not thread-safe. When specifying a source range, 2 or more threads could pick the same port, and of course only one of them could use the port, the others would always fail the connection. To fix this, make it a lock-free ring buffer. This is easier than usual because we know the ring buffer can never be full. This should be backported to 1.8 and 1.9.	2019-04-30 15:10:17 +02:00
Olivier Houchard	9ce62b5498	MINOR: threads: Implement HA_ATOMIC_LOAD(). The same way we have HA_ATOMIC_STORE(), implement HA_ATOMIC_LOAD(). This should be backported to 1.8 and 1.9, as we need it for a bug fix in port ranges.	2019-04-30 15:10:08 +02:00
Willy Tarreau	bc13bec548	MINOR: activity: report context switch counts instead of rates It's not logical to report context switch rates per thread in show activity because everything else is a counter and it's not even possible to compare values. Let's only report counts. Further, this simplifies the scheduler's code.	2019-04-30 14:55:18 +02:00
Willy Tarreau	9634e86dc7	CLEANUP: task: move the task_per_thread definition to task.h It's the second time I look for it and can't find it because it's not in the right file.	2019-04-30 14:36:47 +02:00
Frédéric Lécaille	eacb022676	REGTEST: Make this reg test be Linux specific. This patch reverts 9ffb88 commit (REGTEST: Be less Linux specific with a syslog regex.) and makes this script be Linux specific.	2019-04-30 11:56:52 +02:00
Willy Tarreau	49ee3b2f9a	BUG/MAJOR: map/acl: real fix segfault during show map/acl on CLI A previous commit `8d85aa44d` ("BUG/MAJOR: map: fix segfault during 'show map/acl' on cli.") was provided to address a concurrency issue between "show acl" and "clear acl" on the CLI. Sadly the code placed there was copy-pasted without changing the element type (which was struct stream in the original code) and not tested since the crash is still present. The reproducer is simple : load a large ACL file (e.g. geolocation addresses), issue "show acl #0" in loops in one window and issue a "clear acl #0" in the other one, haproxy crashes. This fix was also tested with threads enabled and looks good since the locking seems to work correctly in these areas though. It will have to be backported as far as 1.6 since the commit above went that far as well...	2019-04-30 11:50:59 +02:00
Frédéric Lécaille	85a7ea0740	REGTEST: Add a new reg test for log load-balancing feature. This is a reg test for the log load-balancing feature implemented by these commits: MINOR: log: Add "sample" new keyword to "log" lines MINOR: log: Enable the log sampling and load-balancing feature The size of the logging buffer for vtest has been doubled to support this script.	2019-04-30 09:25:09 +02:00
Frédéric Lécaille	d690dfac1d	DOC: log: Document the sampling and load-balancing logging feature. This document should come with these commits: 'MINOR: log: Enable the log sampling and load-balancing feature' 'MINOR: log: Add "sample" new keyword to "log" lines.'	2019-04-30 09:25:09 +02:00
Frédéric Lécaille	d803e475e5	MINOR: log: Enable the log sampling and load-balancing feature. This patch implements the sampling and load-balancing of log servers configured with "sample" new keyword implemented by this commit: 'MINOR: log: Add "sample" new keyword to "log" lines'. As the list of ranges used to sample the log to balance is ordered, we only have to maintain ->curr_idx member of smp_info struct which is the index of the sample and check if it belongs or not to the current range to decide if we must send it to the log server or not.	2019-04-30 09:25:09 +02:00
Frédéric Lécaille	d95ea2897e	MINOR: log: Add "sample" new keyword to "log" lines. This patch implements the parsing of "sample" new optional keyword for "log" lines to be able to sample and balance the load of log messages between several log destinations declared by "log" lines. This keyword must be followed by a list of comma separated ranges of indexes numbered from 1 to define the samples to be used to balance the load of logs to send. This "sample" keyword must be used on "log" lines obviously before the remaining optional ones without keyword. The list of ranges must be followed by a colon character to separate it from the log sampling size. With such following configuration declarations: log stderr local0 log 127.0.0.1:10001 sample 2-3,8-11:11 local0 log 127.0.0.2:10002 sample 5:5 local0 in addition to being sent to stderr, about the second "log" line, every 11 logs the logs #2 up to #3 would be sent to 127.0.0.1:10001, then the four logs #8 up to #11 would be sent to the same log server and so on periodically. Logs would be sent to 127.0.0.2:10002 every 5 logs. It is also possible to define the size of the sample with a value different from the maximum of the high limits of the ranges, for instance as follows: log 127.0.0.1:10001 sample 2-3,8-11:15 local0 as before the two logs #2 and #3 would be sent to 127.0.0.1:10001, then #8 up to #11 logs, but in this case here, this would be done periodically every 15 messages. Also note that the ranges must not overlap each other. This is to ease the way the logs are periodically sent.	2019-04-30 09:25:09 +02:00
Yann Cézard	bf60f6b803	BUG/MEDIUM: contrib/modsecurity: If host header is NULL, don't try to strdup it I discovered this bug when running OWASP regression tests against HAProxy + modsecurity-spoa (it's a POC to evaluate how it is working). I found out that modsecurity spoa will crash when the request doesn't have any Host header. See the pull request #86 on github for details. This patch must be backported to 1.9 and 1.8.	2019-04-29 16:26:05 +02:00
Yann Cézard	494ddbff47	DOC: contrib/modsecurity: Typos and fix the reject example Thanks to https://www.mail-archive.com/haproxy@formilux.org/msg30056.html This patch may be backported to 1.9 and 1.8.	2019-04-29 16:25:49 +02:00
Christopher Faulet	85db3212b8	MINOR: spoe: Use the sample context to pass frag_ctx info during encoding This simplifies the API and hide the details in the sample. This way, only string and binary are aware of these info, because other types cannot be partially encoded. This patch may be backported to 1.9 and 1.8.	2019-04-29 16:02:05 +02:00
Kevin Zhu	f7f54280c8	BUG/MEDIUM: spoe: arg len encoded in previous frag frame but len changed A fragmented arg is fetched again at every encoding step, and each fetch may return a different result if SMP_F_MAY_CHANGE is set, for example with res.payload, but the length was already encoded in the first fragment of the frame. This causes the SPOA decoding to fail and wastes resources. This patch must be backported to 1.9 and 1.8.	2019-04-29 16:02:05 +02:00
Christopher Faulet	1907ccc2f7	BUG/MINOR: http: Call stream_inc_be_http_req_ctr() only one time per request The function stream_inc_be_http_req_ctr() is called at the beginning of the analysers AN_REQ_HTTP_PROCESS_FE/BE. It has an effect only on the backend. But we must be careful to call it only once. If the processing of HTTP rules is interrupted in the middle, when the analyser is resumed, we must not call it again. Otherwise, the tracked counters of the backend are incremented several times. This bug was reported in github. See issue #74. This fix should be backported as far as 1.6.	2019-04-29 16:01:47 +02:00
Willy Tarreau	97215ca284	BUG/MEDIUM: mux-h2: properly deal with too large headers frames In h2c_decode_headers(), now that we support CONTINUATION frames, we try to defragment all pending frames at once before processing them. However if the first is exactly full and the second cannot be parsed, we don't detect the problem and we wait for the next part forever due to an incorrect check on exit; we must abort the processing as soon as the current frame remains full after defragmentation as in this case there is no way to make forward progress. Thanks to Yves Lafon for providing traces exhibiting the problem. This must be backported to 1.9.	2019-04-29 10:20:21 +02:00
David CARLIER	4de0eba848	MEDIUM: da: HTX mode support. The DeviceAtlas module now can support both the legacy mode and the new HTX's with the known set of support headers for the latter.	2019-04-26 17:06:32 +02:00
David Carlier	0470d704a7	BUILD/MEDIUM: contrib: Dummy DeviceAtlas API. Creating a "mocked" version mainly for testing purposes.	2019-04-26 17:06:32 +02:00
Willy Tarreau	4ad574fbe2	MEDIUM: streams: measure processing time and abort when detecting bugs On some occasions we've had loops happening when processing actions (e.g. a yield not being well understood) resulting in analysers being called in loops until the analysis timeout without incrementing the stream's call count, thus this type of bug cannot be caught by the current protection system. What this patch proposes is to start to measure the time spent in analysers when profiling is enabled on the thread, in order to detect if a stream is really misbehaving. In this case we measured the consumed CPU time, not the wall clock time, so as not to be affected by possible noisy neighbours sharing the same CPU. When more than 100ms are spent in an analyser, we trigger the stream_dump_and_crash() function to report the anomaly. The choice of 100ms comes from the fact that regular calls only take around 1 microsecond and it seems reasonable to accept a degradation factor of 100000, which covers very slow machines such as home gateways running on sub-ghz processors, with extremely heavy configurations. Some complete tests show that even this common bogus map_regm() entry supposedly designed to extract a port from an IP:port entry does not trigger the timeout (25 ms evaluation time for a 4kB header, exercise left to the reader to spot the mistake) : ([0-9]{0,3}).([0-9]{0,3}).([0-9]{0,3}).([0-9]{0,3}):([0-9]{0,5}) \5 However this one purposely designed to kill haproxy definitely dies as it manages to completely freeze the whole process for more than one second on a 4 GHz CPU for only 120 bytes in : (.{0,20})(.{0,20})(.{0,20})(.{0,20})(.{0,20})b \1 This protection will definitely help during the code stabilization period and may possibly be left enabled later depending on reported issues or not. If you've noticed that your workload is affected by this patch, please report it as you have very likely found a bug. And in the mean time you can turn profiling off to disable it.	2019-04-26 14:30:59 +02:00
Willy Tarreau	3d07a16f14	MEDIUM: stream/debug: force a crash if a stream spins over itself forever If a stream is caught spinning over itself at more than 100000 loops per second and for more than one second, the process will be aborted and the offender reported on the console and logs. Typical figures usually are just a few tens to hundreds per second over a very short time so there is a huge margin here. Using even higher values could also work but there is the risk of not being able to catch offenders if multiple ones start to bug at the same time and share the load. This code should ideally be disabled for stable releases, though in theory nothing should ever trigger it.	2019-04-26 13:16:14 +02:00
Willy Tarreau	dcb0e1d37d	MEDIUM: appctx/debug: force a crash if an appctx spins over itself forever If an appctx is caught spinning over itself at more than 100000 loops per second and for more than one second, the process will be aborted and the offender reported on the console and logs. Typical figures usually are just a few tens to hundreds per second over a very short time so there is a huge margin here. Using even higher values could also work but there is the risk of not being able to catch offenders if multiple ones start to bug at the same time and share the load. This code should ideally be disabled for stable releases, though in theory nothing should ever trigger it.	2019-04-26 13:15:56 +02:00
Willy Tarreau	71c07ac65a	MINOR: stream/debug: make a stream dump and crash function During 1.9 development (and even a bit after) we've started to face a significant number of situations where streams were abusively spinning due to an uncaught error flag or complex conditions that couldn't be correctly identified. Sometimes streams wake appctx up and conversely as well. More importantly when this happens the only fix is to restart. This patch adds a new function to report a serious error, some relevant info and to crash the process using abort() so that a core dump is available. The purpose will be for this function to be called in various situations where the process is unfixable. It will help detect these issues much earlier during development and may even help fixing test platforms which are able to automatically restart when such a condition happens, though this is not the primary purpose. This patch only provides the function and doesn't use it yet.	2019-04-26 13:15:56 +02:00
Willy Tarreau	5e6a5b3a6e	MINOR: connection: make the debugging helper functions safer We have various functions like conn_get_ctrl_name() to retrieve some information reported in "show sess" for debugging, which assume that the connection is valid. This is really not convenient in code aimed at debugging and is error-prone. Let's add a validity test first.	2019-04-25 18:35:49 +02:00
Willy Tarreau	5e370daa52	BUG/MINOR: proto_http: properly reset the stream's call rate on keep-alive The stream's call rate measurement was added by commit `2e9c1d296` ("MINOR: stream: measure and report a stream's call rate in "show sess"") but it forgot to reset it in case of HTTP keep-alive (legacy mode), resulting in incorrect measurements. No backport is needed, unless the patch above is backported.	2019-04-25 18:33:37 +02:00
Willy Tarreau	d5ec4bfe85	CLEANUP: standard: use proper const to addr_to_str() and port_to_str() The input parameter was not marked const, making it painful for some calls.	2019-04-25 17:48:16 +02:00
Willy Tarreau	d2d3348acb	MINOR: activity: enable automatic profiling turn on/off Instead of having to manually turn task profiling on/off in the configuration, by default it will work in "auto" mode, which automatically turns on on any thread experiencing sustained loop latencies over one millisecond averaged over the last 1024 samples. This may happen with configs using lots of regex (think map_reg for example, which is the lazy way to convert Apache's rewrite rules but must not be abused), and such high latencies affect the whole process and the problem is most often intermittent (e.g. hitting a map which is only used for certain host names). Thus now by default, with profiling set to "auto", it remains off all the time until something bad happens. This also helps better focus on the issues when looking at the logs as well as in "show sess" output. It automatically turns off when the average loop latency over the last 1024 calls goes below 990 microseconds (which typically takes a while when in idle). This patch could be backported to stable versions after a bit more exposure, as it definitely improves observability and the ability to quickly spot the culprit. In this case, previous patch ("MINOR: activity: make the profiling status per thread and not global") must also be taken.	2019-04-25 17:26:46 +02:00
Willy Tarreau	d9add3acc8	MINOR: activity: make the profiling status per thread and not global In order to later support automatic profiling turn on/off, we need to have it per-thread. We're keeping the global option to know whether to turn it on or off, but the profiling status is now set per thread. We're updating the status in activity_count_runtime() which is called before entering poll(). The reason is that we'll extend this with run time measurement when deciding to automatically turn it on or off.	2019-04-25 17:26:19 +02:00
Willy Tarreau	d636675137	BUG/MINOR: activity: always initialize the profiling variable It happens it was only set if present in the configuration. It's harmless anyway but can still cause doubts when comparing logs and configurations so better correctly initialize it. This should be backported to 1.9.	2019-04-25 17:26:19 +02:00
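
The "sample" keyword described in commit d95ea2897e can be hard to picture from prose alone. The following is a minimal Python sketch of the selection rule as that commit message states it (1-based indexes, a list of ranges, then a colon and the sampling size). It is not HAProxy's C implementation, and the names parse_sample() and selected_logs() are hypothetical helpers made up for this illustration.

```python
def parse_sample(spec):
    """Parse a spec such as '2-3,8-11:11' into ([(2, 3), (8, 11)], 11).

    Hypothetical helper: it mirrors the syntax described in the commit
    message, not HAProxy's actual parser, and does no error checking.
    """
    ranges_part, size_part = spec.split(":")
    ranges = []
    for item in ranges_part.split(","):
        low, _, high = item.partition("-")
        low = int(low)
        # A single index such as '5' is treated as the range 5-5.
        high = int(high) if high else low
        ranges.append((low, high))
    return ranges, int(size_part)


def selected_logs(spec, total):
    """Return the 1-based numbers of the logs, out of `total`, that match."""
    ranges, size = parse_sample(spec)
    picked = []
    for n in range(1, total + 1):
        # Position of this log inside the current sampling period (1..size).
        idx = (n - 1) % size + 1
        if any(low <= idx <= high for low, high in ranges):
            picked.append(n)
    return picked
```

Under these assumptions, selected_logs("2-3,8-11:11", 22) covers two full periods of 11 logs, while selected_logs("2-3,8-11:15", 17) shows the same ranges repeating every 15 messages instead.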


9667 Commits
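
The "auto" profiling mode from commit d2d3348acb uses two different thresholds (on above one millisecond, off below 990 microseconds), which amounts to a hysteresis. A rough Python sketch of that behaviour follows. The thresholds and the 1024-sample window come from the commit message; everything else is an assumption for illustration: the class name AutoProfiler is hypothetical, a plain moving window stands in for whatever averaging HAProxy really uses, and the average is taken over the samples seen so far until the window fills.

```python
from collections import deque


class AutoProfiler:
    """Toy model of one thread's profiling switch (not HAProxy code)."""

    ON_THRESHOLD_US = 1000   # turn on above one millisecond on average
    OFF_THRESHOLD_US = 990   # turn off below 990 microseconds on average
    WINDOW = 1024            # number of most recent samples averaged

    def __init__(self):
        # Assumption: a bounded deque acts as the sliding window.
        self.samples = deque(maxlen=self.WINDOW)
        self.enabled = False

    def record(self, latency_us):
        """Record one loop latency in microseconds and update the switch."""
        self.samples.append(latency_us)
        avg = sum(self.samples) / len(self.samples)
        if not self.enabled and avg > self.ON_THRESHOLD_US:
            self.enabled = True
        elif self.enabled and avg < self.OFF_THRESHOLD_US:
            self.enabled = False
        return self.enabled
```

The gap between the two thresholds means that an average sitting between 990 and 1000 microseconds leaves the switch in whatever state it was already in, so it does not flap on and off.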