haproxy

mirror of http://git.haproxy.org/git/haproxy.git/ synced 2024-12-28 15:42:30 +00:00

Author	SHA1	Message	Date
Willy Tarreau	078e1c7102	CLEANUP: protocol: remove the ->enable_all method It's not used anymore, now the listeners are enabled from protocol_enable_all().	2020-10-09 11:27:30 +02:00
Willy Tarreau	5b95ae6b32	MINOR: protocol: directly call enable_listener() from protocol_enable_all() protocol_enable_all() calls proto->enable_all() for all protocols, which is always equal to enable_all_listeners() which in turn simply is a generic loop calling enable_listener() always returning ERR_NONE. Let's clean this madness by first calling enable_listener() directly from protocol_enable_all().	2020-10-09 11:27:30 +02:00
Willy Tarreau	7834a3f70f	MINOR: listeners: export enable_listener() we'll soon call it from outside.	2020-10-09 11:27:30 +02:00
Willy Tarreau	d008009958	CLEANUP: listeners: remove unused disable_listener and disable_all_listeners These ones have never been called, they were referenced by the protocol's disable_all for some protocols but there are no traces of their use, so in addition to not being sure the code works, it has never been tested. Let's remove a bit of complexity starting from there.	2020-10-09 11:27:30 +02:00
Willy Tarreau	fb4ead8e8a	CLEANUP: protocol: remove the ->disable_all method This one has never been used, is only referenced by proto_uxst and proto_sockpair, and it's not even certain it works at all. Let's get rid of it.	2020-10-09 11:27:30 +02:00
Willy Tarreau	e53608b2cd	MINOR: listeners: move fd_stop_recv() to the receiver's socket code fd_stop_recv() has no place in the generic listener code; it's per protocol, as some protocols don't need it. For instance, with abns@ it could even lead to fd_stop_recv(-1). And later with QUIC we don't want to touch the fd at all! After commit `f2cb169487` delegated fd manipulation to the respective threads, it was no longer possible to call it down there, but that's not the case anymore, so let's perform the action in the protocol-specific code.	2020-10-09 11:27:30 +02:00
Willy Tarreau	fb76bd5ca6	BUG/MEDIUM: listeners: correctly report pause() errors By using the same "ret" variable in the "if" block to test the return value of pause(), the second one shadows the first one, so forcing the result to zero in case of an error doesn't do anything. The problem is that some listeners used to fail to pause in multi-process mode and this was not reported, but their failure was automatically resolved by the last process to pause. By properly checking for errors we might now report a race once in a while, so we may have to roll this back later if some users hit it. The test on ==0 is wrong too, since technically speaking a total stop validates the need for a pause but stops the listener, so it's just the resume that won't work anymore. We could switch to stopped, but it's an involuntary switch and the user will not know. It's better to mark it as paused and let the resume continue to fail, so that only the resume will eventually report an error (e.g. abns@). This must not be backported, as there is a risk of side effects from fixing this bug, given that it hides other bugs itself.	2020-10-09 11:27:30 +02:00
Willy Tarreau	91c614dd0e	MEDIUM: proto_tcp: make the pause() more robust in multi-process In multi-process, the TCP pause is very brittle and we never noticed it because the error was lost in the upper layers. The problem is that shutdown() may fail if another process already did it, causing this process to fail to pause. In case of error, we now double-check the socket's state to verify whether it's still accepting connections; if not, we can conclude that another process already did the job in parallel. The difficulty here is that we're trying to eliminate false positives where some OSes silently report success on shutdown() without actually shutting the socket down, hence this dance of shutw/listen/shutr that only keeps the compatible ones. A new approach relying on connect(AF_UNSPEC) would probably provide better results.	2020-10-09 11:27:30 +02:00
Willy Tarreau	1accacbcc3	CLEANUP: proxy: remove the now unused pause_proxies() and resume_proxies() They're not used anymore, delete them before someone thinks about using them again!	2020-10-09 11:27:30 +02:00
Willy Tarreau	775e00158a	MAJOR: signals: use protocol_pause_all() and protocol_resume_all() When temporarily pausing the listeners with SIG_TTOU, we now pause all listeners via the protocols instead of the proxies. This has the benefit that listeners are paused regardless of whether or not they belong to a visible proxy. And for resuming via SIG_TTIN we do the same, which makes it possible to report binding conflicts and address them, since the operation can be repeated on a per-listener basis instead of a per-proxy basis. While in appearance all cases were properly handled, it's impossible to completely rule out the possibility that something broken used to work by luck due to the scan ordering, which is naturally different, hence the major tag.	2020-10-09 11:27:30 +02:00
Willy Tarreau	09819d1118	MINOR: protocol: introduce protocol_{pause,resume}_all() These two functions are used to pause and resume all listeners of all protocols. They use the standard listener functions for this so they're supposed to handle the situation gracefully regardless of the upper proxies' states, and they will report completion on proxies once the switch is performed. It might be nice to define a particular "failed" state for listeners that cannot resume and to count them on proxies in order to mention that they're definitely stuck. On the other hand, the current situation is retryable which is quite appreciable as well.	2020-10-09 11:27:30 +02:00
Willy Tarreau	58651b42fc	MEDIUM: listener/proxy: make the listeners notify about proxy pause/resume Till now, we used to call pause_proxy()/resume_proxy() to enable/disable processing on a proxy, which is used during soft reloads. But since we want to drive this process from the listeners themselves, we have to proceed the other way around: when we enable/disable a listener, it checks whether it changed anything for the proxy and notifies about updates at this level. The detection is made using li_ready=0 for pause(), and li_paused=0 for resume(). Note that we must not include any test for li_bound, because this state is seen by processes which share the listener with another one and which must not act on it since the other process will do it. As such, the socket behind the FD will automatically be paused and resumed without its local state changing, but this is the limit of a multi-process system with shared listeners.	2020-10-09 11:27:30 +02:00
Willy Tarreau	5d7f9ce831	MINOR: listeners: check the current listener earlier state in resume_listener() It's quite confusing to have the test on LI_READY very low in the function as it should be made much earlier. Just like with previous commit, let's do it when entering. The additional states, however (limited, full) continue to go through the whole function.	2020-10-09 11:27:30 +02:00
Willy Tarreau	9b3a932777	MINOR: listeners: check the current listener state in pause_listener() It's better not to try to perform pause() actions on wrong states, so let's check this and make sure that all callers are now safe. This means that we must not try to pause a listener which is already paused (e.g. it could possibly fail if the pause operation isn't idempotent at the socket level), nor should we try it on earlier states.	2020-10-09 11:27:30 +02:00
Willy Tarreau	337c835d16	MEDIUM: proxy: merge zombify_proxy() with stop_proxy() The two functions don't need to be distinguished anymore since they have all the necessary info to act as needed on their listeners. Let's just pass via stop_proxy() and make it check for each listener which one to close or not.	2020-10-09 11:27:30 +02:00
Willy Tarreau	43ba3cf2b5	MEDIUM: proxy: remove start_proxies() Its sole remaining purpose was to display "proxy foo started", which has little benefit and pollutes output for those with plenty of proxies. Let's remove it now. The VTCs were updated to reflect this, because many of them had explicit counts of dropped lines to match this message. This is tagged as MEDIUM because some users may be surprised by the loss of this quite old message.	2020-10-09 11:27:30 +02:00
Willy Tarreau	c3914d4fff	MEDIUM: proxy: replace proxy->state with proxy->disabled The remaining proxy states were only used to distinguish an enabled proxy from a disabled one. Due to the initialization order, both PR_STNEW and PR_STREADY were equivalent after startup, and they would only differ from PR_STSTOPPED when the proxy is disabled or shut down (which is effectively another way to disable it). Now we just have a "disabled" field which makes it possible to distinguish them. It's becoming obvious that start_proxies() is now only used to print a greeting message, which we'd rather get rid of. zombify_proxy() and stop_proxy() should probably be merged once their differences move to the right place.	2020-10-09 11:27:30 +02:00
Willy Tarreau	1ad64acf6c	CLEANUP: peers: don't use the PR_ST* states to mark enabled/disabled The enabled/disabled config options were stored into a "state" field that is an integer but contained only PR_STNEW or PR_STSTOPPED, which is a bit confusing, and causes a dependency with proxies. This was renamed to "disabled" and is used as a boolean. The field was also moved to the end of the struct to stop creating a hole and fill another one.	2020-10-09 11:27:30 +02:00
Willy Tarreau	b50bf046e8	MINOR: startup: don't rely on PR_STNEW to check for listeners Instead of looking at listeners in proxies in PR_STNEW state, we'd rather check for listeners in those not in PR_STSTOPPED as it's only this state which indicates the proxy was disabled. And let's check the listeners count instead of testing the list's head.	2020-10-09 11:27:30 +02:00
Willy Tarreau	f18d968830	MEDIUM: proxy: remove state PR_STPAUSED This state was used to mention that a proxy was in PAUSED state, as opposed to the READY state. This was causing some trouble because if a listener failed to resume (e.g. because its port was temporarily in use during the resume), it was not possible to retry the operation later. Now by checking the number of READY or PAUSED listeners instead, we can accurately know if something went bad and try to fix it again later. The case of the temporary port conflict during resume now works well: $ socat readline /tmp/sock1 prompt > disable frontend testme3 > disable frontend testme3 All sockets are already disabled. > enable frontend testme3 Failed to resume frontend, check logs for precise cause (port conflict?). > enable frontend testme3 > enable frontend testme3 All sockets are already enabled.	2020-10-09 11:27:30 +02:00
Willy Tarreau	a17c91b37f	MEDIUM: proxy: remove the PR_STERROR state This state is only set when a pause() fails but isn't even set when a resume() fails. And we cannot recover from this state. Instead, let's just count remaining ready listeners to decide to emit an error or not. It's more accurate and will better support new attempts if needed.	2020-10-09 11:27:30 +02:00
Willy Tarreau	6b3bf733dd	MEDIUM: proxy: remove the unused PR_STFULL state Since v1.4 or so, it's almost not possible anymore to set this state. The only exception is by using the CLI to change a frontend's maxconn setting below its current usage. This case makes no sense, and for other cases it doesn't make sense either because "full" is a vague concept when only certain listeners are full and not all. Let's just remove this unused state and make it clear that it's not reported. The "ready" or "open" states will continue to be reported without being misleading as they will be opposed to "stop".	2020-10-09 11:27:30 +02:00
Willy Tarreau	efc0eec4c1	MINOR: proxy: maintain per-state counters of listeners The proxy state tries to be synthetic but that doesn't work well with many listeners, especially for transition phases or after a failed pause/resume. In order to address this, we'll instead rely on counters of listeners in a given state for the 3 major states (ready, paused, listen) and a total counter. We'll now be able to determine a proxy's state by comparing these counters only.	2020-10-09 11:27:30 +02:00
Willy Tarreau	a37b244509	MINOR: listeners: introduce listener_set_state() This function is used as a wrapper to set a listener's state everywhere. We'll use it later to keep some counters consistent when switching state, so it's crucial that all state changes go through it. No functional change was made beyond calling the wrapper.	2020-10-09 11:27:30 +02:00
Willy Tarreau	bec7ab0ad9	CLEANUP: proxy: remove the first_to_listen hack in zombify_proxy() This thing was needed for an optimization used in soft_stop() which doesn't exist anymore, so let's remove it as it's cryptic and hinders the listeners cleanup.	2020-10-09 11:27:29 +02:00
Willy Tarreau	987dbf5bab	MINOR: listeners: do not uselessly try to close zombie listeners in soft_stop() The loop doesn't match anymore since the non-started listeners are in LI_INIT and even if it had ever worked the benefit of closing zombies at this point looks void at best.	2020-10-09 11:27:29 +02:00
Willy Tarreau	c6dac6c7f5	MEDIUM: listeners: remove the now unused ZOMBIE state The zombie state is not used anymore by the listeners, because in the last two cases where it was tested it couldn't match as it was covered by the test on the process mask. Instead now the FD is either in the LISTEN state or the INIT state. This also avoids forcing the listener to be single-dimensional because actually belonging to another process isn't totally exclusive with the other states, which explains some of the difficulties requiring to check the proc_mask and the fd sometimes. So let's get rid of it now not to be tempted to reuse it. The doc on the listeners state was updated.	2020-10-09 11:27:29 +02:00
Willy Tarreau	ae7bc4a237	MEDIUM: deinit: close all receivers/listeners before scanning proxies Because of the zombie state, proxies have a skewed vision of the state of listeners, which explains why there are hacks switching the state from ZOMBIE to INIT in the proxy cleaning loop. This is particularly complicated and not needed, as all the information is now available in the protocol list and the fdtab. What we do here instead is to first close all active listeners or receivers by protocol and clean their protocol parts. Then we scan the fdtab to get rid of the remaining ones, which were necessarily in INIT state after a previous invocation of delete_listener(). From this point on, we know the listeners are cleaned, so they can safely be freed by scanning the proxies.	2020-10-09 11:27:29 +02:00
Willy Tarreau	b6607bfaf0	MEDIUM: listeners: make unbind_listener() converge if needed The ZOMBIE state on listeners is a real mess. Listeners passing through this state have lost their consistency with the proxy AND with the fdtab. Plus, this state is not used for all foreign listeners, only for those belonging to a proxy that entirely runs on another process; otherwise it stays in INIT state, which makes its usefulness extremely questionable. But the real issue is that it's impossible to untangle the receivers from the proxy state as long as we have this, because of deinit()... So what we do here is to start by making unbind_listener() support being called more than once. This will make it possible to call it again to really close the FD and finish the operations if it's called with an FD that's in a fake state (such as INIT but with a valid fd).	2020-10-09 11:27:29 +02:00
Willy Tarreau	02b092f006	MEDIUM: init: stop disabled proxies after initializing fdtab During the startup process we don't have any fdtab nor fd_updt for quite a long time, and as such some operations on the listeners are not permitted, such as fd_want_/fd_stop_ or fd_delete(). The latter is of particular concern because it's used when stopping a disabled frontend, and it's performed very early during check_config_validity() while there is no fdtab yet. The trick till now relies on the listener's state which is a bit brittle. There is absolutely no valid reason for stopping a proxy's listeners this early, we can postpone it after init_pollers() which will at least have allocated fdtab.	2020-10-09 11:27:29 +02:00
Willy Tarreau	cb89e32f31	MEDIUM: listeners: don't bounce listeners management between queues During 2.1 development, commit `f2cb16948` ("BUG/MAJOR: listener: fix thread safety in resume_listener()") was introduced to bounce the enabling/disabling of a listener's FD to one of its threads because the remains of fd_update_cache() were fundamentally incompatible with the need to call fd_want_recv() or fd_stop_recv() for another thread. However since then we've totally dropped such code and it's totally safe to use these functions on an FD that is solely used by another thread (this is even used by the FD migration code). The only remaining limitation concerning the wake up delay was addressed by previous commit "MEDIUM: fd: always wake up one thread when enabling a foreing FD". The current situation forces the FD management to remain in the pause_listener() and resume_listener() functions just so that it can bounce between threads, without having the ability to delegate it to the suitable protocol layer. So let's first remove this now unneeded workaround.	2020-10-09 11:27:29 +02:00
Willy Tarreau	f015887444	MEDIUM: fd: always wake up one thread when enabling a foreing FD Since 2.2 it's safe to enable/disable another thread's FD, but the fd_wake calls will not be considered immediately because nothing wakes the other threads up. This will have an impact on listeners when deciding to resume them after they were paused, so at a minimum we want to wake up one of their threads, just like the scheduler does on task_kill(). This is what this patch does.	2020-10-09 11:27:29 +02:00
Willy Tarreau	2ea15a0804	REGTESTS: mark abns_socket as broken This test is inherently racy. It regularly pops up on the CI, and I've spent one hour chasing a bug that apparently doesn't exist, just because running it 10 times in a row reports from 4 to 8 failures when built at -O2 and generally even more at -O0. The logs are very confusing, often reporting that it failed with status 0, with nothing else wrong. I suspect it might sometimes be the shell command that fails if it executes before haproxy finishes starting up, which would also explain the relation with the optimization level. E.g.: > Testing with haproxy version: 2.2.0 > # top TEST reg-tests/seamless-reload/abns_socket.vtc FAILED (3.006) exit=2 > # top TEST reg-tests/seamless-reload/abns_socket.vtc FAILED (3.006) exit=2 > # top TEST reg-tests/seamless-reload/abns_socket.vtc FAILED (3.009) exit=2 > # top TEST reg-tests/seamless-reload/abns_socket.vtc FAILED (3.008) exit=2 > # top TEST reg-tests/seamless-reload/abns_socket.vtc FAILED (3.007) exit=2 > # top TEST reg-tests/seamless-reload/abns_socket.vtc FAILED (3.007) exit=2 > 6 tests failed, 0 tests skipped, 4 tests passed Some of the failures include this, suggesting that some barriers could help: ---- h1 haproxy h1 PID file check failed: Could not read PID file '/tmp/haregtests-2020-10-09_11-19-40.kgsDB4/vtc.30539.04dbea7f/h1/pid Since it has been causing false positives and consumed way more troubleshooting time than it saved, let's mark it as broken so that it doesn't waste more time. We can bring it back when someone manages to figure out what the problem is.	2020-10-09 11:26:42 +02:00
Christopher Faulet	b8d148a93f	BUG/MINOR: http-htx: Expect no body for 204/304 internal HTTP responses 204 and 304 HTTP responses must not contain a message body. These status codes are correctly handled when the responses are received from a server, but there is no specific processing for internal HTTP responses (errorfiles and http replies). Now, when errorfiles or http replies are parsed during configuration parsing, an error is triggered if a 204/304 message contains a body. An extra check is also performed to ensure the body length matches the announced content-length. This patch should fix issue #891. It must be backported as far as 2.0. For 2.1 and 2.0, only the http_str_to_htx() function must be fixed, since the http_parse_http_reply() function does not exist there.	2020-10-09 10:02:09 +02:00
Christopher Faulet	5563392554	BUG/MINOR: http: Fix content-length of the default 500 error 96 bytes are announced in the C-L header for a message body of 97 bytes. This bug was introduced by the patch `46a030cdd` ("CLEANUP: assorted typo fixes in the code and comments"). This patch must be backported to all versions containing the patch above (only 2.2 for now).	2020-10-09 10:02:09 +02:00
Sébastien Gross	ab8771285c	DOC: Fix typos in configuration.txt This patch fixes small typos and grammar in configuration.txt for the http-request return documentation.	2020-10-09 10:02:09 +02:00
Christopher Faulet	aade4edc1a	BUG/MEDIUM: mux-h2: Don't handle pending read0 too early on streams This patch is similar to the previous one on the FCGI mux; the same is true for H2, but the bug is far harder to trigger because of the protocol's dynamics. However, it may explain strange aborts in some edge cases. A read0 received on the connection must not be handled too early by H2 streams: if the demux buffer is not empty, the pending read0 must not be considered. The H2 streams must not be switched to the half-closed remote state in h2s_wake_one_stream(), and the CS_FL_EOS flag must not be set on the associated conn-stream in h2_rcv_buf(). In short, as long as data is still pending in the demux buffer, no abort must be reported to the streams. To fix the issue, a dedicated function responsible for detecting a pending read0 on an H2 connection has been added. A read0 is reported only if the demux buffer is empty. This function is used instead of conn_xprt_read0_pending() in some places. Note that the HREM stream state should not be used to report aborts. This is done in the h2s_wake_one_stream() function and is a legacy of the very first versions of the mux-h2. This patch should be backported as far as 2.0. In 1.8, the code is too different to apply it as is, but it is probably useless there because the mux-h2 can only be installed on the client side.	2020-10-09 10:02:09 +02:00
Christopher Faulet	6670e3e2bf	BUG/MEDIUM: mux-fcgi: Don't handle pending read0 too early on streams A read0 received on the connection must not be handled too early by FCGI streams: if the demux buffer is not empty, the pending read0 must not be considered. The FCGI streams must not be switched to the half-closed remote state in fcgi_strm_wake_one_stream(), and the CS_FL_EOS flag must not be set on the associated conn-stream in fcgi_rcv_buf(). In short, as long as data is still pending in the demux buffer, no abort must be reported to the streams. To fix the issue, a dedicated function responsible for detecting a pending read0 on an FCGI connection has been added. A read0 is reported only if the demux buffer is empty. This function is used instead of conn_xprt_read0_pending() in some places. This patch should fix issue #886. It must be backported as far as 2.1.	2020-10-09 10:02:00 +02:00
Pierre Cheynier	08eb718494	DOC: Add missing stats fields in the management doc Added latest fields: idle_conn_cur, safe_conn_cur, used_conn_cur, need_conn_est	2020-10-09 09:56:37 +02:00
Ilya Shipitsin	7aaadf5583	CI: travis-ci: help Coverity to detect BUG_ON() as a real stop Let's add DEBUG_STRICT=1 to coverity build definition. Hopefully, it will resolve 1 coverity issue.	2020-10-09 09:55:53 +02:00
Brad Smith	0fdfe4179e	BUILD: makefile: Update feature flags for NetBSD This updates the feature flags for NetBSD. NetBSD 8 adds support for accept4(). Enable getaddrinfo().	2020-10-09 09:53:56 +02:00
Willy Tarreau	4a539f343a	BUG/MINOR: makefile: fix a tiny typo in the target list Previous commit `382001b46` ("BUILD: Add a DragonFlyBSD target") introduced a tiny typo in the target list ("iopenbs" vs "openbsd"). This will have to be backported if that patch is backported.	2020-10-09 05:58:40 +02:00
Brad Smith	382001b46b	BUILD: Add a DragonFlyBSD target Add a target for DragonFlyBSD 4.3 and above.	2020-10-08 20:54:18 +02:00
Willy Tarreau	465dc7d63f	DOC: fix a confusing typo on a regsub example Sébastien reported a confusing example in the doc about regsub when used with quotes. Nested quotes are already not trivial to grasp, but when typos are there and result in something valid, it's even worse. The closing quote ought to have been inside the brackets. However haproxy will not make any difference because the single quotes delimit a word and the delimited word remains the same. Let's just not add yet another level of confusion.	2020-10-08 18:05:56 +02:00
Emeric Brun	da46c1ca60	DOC: re-work log forward bind statement documentation. This patch reworks the documentation about the bind statement of the log forward section.	2020-10-08 08:54:04 +02:00
Emeric Brun	b0c331f71f	BUG/MINOR: proxy/log: frontend/backend and log forward names must differ This patch disallows using the same name for a log forward section and a frontend/backend section.	2020-10-08 08:53:26 +02:00
Emeric Brun	cbb7bf7dd1	MEDIUM: log: syslog TCP support on log forward section. This patch re-introduces the "bind" statement in log forward sections to handle syslog TCP listeners as defined in RFC 6587. As a complement, it introduces the "maxconn", "backlog" and "timeout client" statements to configure those listeners.	2020-10-07 17:17:27 +02:00
Emeric Brun	6d75616951	MINOR: channel: new getword and getchar functions on channel. This patch adds two new functions to get a char or a word from a channel.	2020-10-07 17:17:27 +02:00
Emeric Brun	2897644ae5	MINOR: stats: inc req counter on listeners. This patch enables counting requests on listeners if the listener's counters are enabled.	2020-10-07 17:17:27 +02:00
Emeric Brun	c47ba59d1e	BUG/MEDIUM: log: old processes with log foward section don't die on soft stop. Old processes didn't die if a log forward section was declared and a soft stop was requested. This patch fixes this issue and should be backported to branches including the log forward feature.	2020-10-07 17:17:27 +02:00
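Commits a37b244509, efc0eec4c1, f18d968830 and 58651b42fc replace the synthetic proxy state with per-state listener counters that are only ever updated through listener_set_state(). The sketch below illustrates that idea under stated assumptions: the structs, the enum values and the li_all total are simplified, hypothetical stand-ins rather than HAProxy's real definitions (only the li_ready/li_paused/li_bound names come from the commit messages). The proxy's state is derived purely by comparing counters, so a partially failed resume shows up as a mixed state that can simply be retried.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified listener states (LI_INIT is not counted). */
enum li_state { LI_INIT, LI_LISTEN, LI_READY, LI_PAUSED };

/* Hypothetical proxy summary, derived only from the counters below. */
enum px_state { PX_STOPPED, PX_READY, PX_PAUSED, PX_MIXED };

struct proxy {
    int li_all;    /* total listeners attached (hypothetical name) */
    int li_bound;  /* listeners in LI_LISTEN */
    int li_ready;  /* listeners in LI_READY */
    int li_paused; /* listeners in LI_PAUSED */
};

struct listener {
    enum li_state state;
    struct proxy *px;
};

/* Map a state to the proxy counter tracking it (NULL if untracked). */
static int *state_counter(struct proxy *px, enum li_state st)
{
    switch (st) {
    case LI_LISTEN: return &px->li_bound;
    case LI_READY:  return &px->li_ready;
    case LI_PAUSED: return &px->li_paused;
    default:        return NULL;
    }
}

/* Single entry point for state changes so the counters never drift. */
static void listener_set_state(struct listener *l, enum li_state st)
{
    int *from = state_counter(l->px, l->state);
    int *to = state_counter(l->px, st);

    if (from) {
        (*from)--;
        assert(*from >= 0);
    }
    if (to)
        (*to)++;
    l->state = st;
}

/* Attach a fresh listener (in LI_INIT) to a proxy and count it. */
static void listener_attach(struct listener *l, struct proxy *px)
{
    l->state = LI_INIT;
    l->px = px;
    px->li_all++;
}

/* Derive the proxy state from counters only; PX_MIXED means some
 * listeners failed to pause or resume and the operation can be retried. */
static enum px_state proxy_state(const struct proxy *px)
{
    if (px->li_ready == 0 && px->li_paused == 0)
        return PX_STOPPED;
    if (px->li_paused == 0)
        return PX_READY;
    if (px->li_ready == 0)
        return PX_PAUSED;
    return PX_MIXED;
}
```

With two listeners that are both ready, pausing only one of them yields PX_MIXED, and pausing the second one yields PX_PAUSED. This mirrors the "li_ready=0 for pause()" detection described in 58651b42fc.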
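Commits aade4edc1a and 6670e3e2bf make the H2 and FCGI muxes report a read0 to their streams only once the demux buffer is empty. A minimal sketch of that rule follows. The struct mux_conn type and the mux_read0_pending() name are hypothetical simplifications, not the real h2c/fcgi_conn structures or the helper added by those patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical connection summary (not HAProxy's h2c or fcgi_conn). */
struct mux_conn {
    bool xprt_read0;  /* transport layer saw a shutdown from the peer */
    size_t dbuf_data; /* bytes still waiting in the demux buffer */
};

/* Report a pending read0 only when every buffered byte has been demuxed;
 * otherwise streams would be aborted while data meant for them remains. */
static bool mux_read0_pending(const struct mux_conn *c)
{
    assert(c != NULL);
    return c->xprt_read0 && c->dbuf_data == 0;
}
```

A stream-side caller would consult this helper instead of the raw transport flag before switching a stream to half-closed or flagging end of stream.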
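Commit b8d148a93f adds configuration-time checks on internal responses. A 204 or 304 response must not carry a body, and the body length must match the announced Content-Length, which is the kind of mismatch (96 announced vs. 97 sent) fixed by 5563392554. The function below is a hypothetical sketch of those two checks only. Its name, its error codes and its use of -1 for "no Content-Length header" are assumptions, not HAProxy's http_str_to_htx() logic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch. */
enum reply_err { REPLY_OK, REPLY_BODY_FORBIDDEN, REPLY_CL_MISMATCH };

/* Validate an internal response (errorfile / http reply) at parse time.
 * <cl> is the announced Content-Length, or -1 if the header is absent. */
static enum reply_err check_internal_reply(int status, size_t body_len, long cl)
{
    assert(status >= 100 && status <= 999);

    /* 204 and 304 responses must not contain a message body. */
    if ((status == 204 || status == 304) && body_len > 0)
        return REPLY_BODY_FORBIDDEN;

    /* An announced Content-Length must match the actual body size. */
    if (cl >= 0 && (size_t)cl != body_len)
        return REPLY_CL_MISMATCH;

    return REPLY_OK;
}
```

For example, a 500 reply with a 97-byte body announcing 96 bytes is rejected with REPLY_CL_MISMATCH.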
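Commit cbb7bf7dd1 adds syslog TCP listeners "as defined in RFC 6587". One of the framings defined there is octet counting, where each frame is `MSG-LEN SP SYSLOG-MSG` and MSG-LEN is a decimal length with no leading zero. The parser below is a standalone sketch of that framing only. It does not use HAProxy's channel getword/getchar helpers from 6d75616951, it does not handle the alternative LF-terminated framing, and its 65536-byte length cap is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Parse one octet-counted syslog frame: MSG-LEN SP SYSLOG-MSG.
 * On success, sets *msg and *msg_len and returns the bytes consumed.
 * Returns 0 if more data is needed, -1 if the frame is malformed. */
static long parse_octet_counted(const char *buf, size_t len,
                                const char **msg, size_t *msg_len)
{
    size_t i = 0, n = 0;

    assert(buf != NULL && msg != NULL && msg_len != NULL);
    if (len == 0)
        return 0;
    /* MSG-LEN = NONZERO-DIGIT *DIGIT */
    if (buf[0] < '1' || buf[0] > '9')
        return -1;
    while (i < len && buf[i] >= '0' && buf[i] <= '9') {
        n = n * 10 + (size_t)(buf[i] - '0');
        if (n > 65536) /* arbitrary cap chosen for this sketch */
            return -1;
        i++;
    }
    if (i == len)
        return 0;  /* length field not terminated yet */
    if (buf[i] != ' ')
        return -1; /* a single SP must follow MSG-LEN */
    i++;
    if (len - i < n)
        return 0;  /* wait for the whole message */
    *msg = buf + i;
    *msg_len = n;
    return (long)(i + n);
}
```

Feeding `"5 hello6 world!"` returns 7 and points at `hello`. Calling it again on the remaining 8 bytes yields `world!`. A truncated frame returns 0 so the caller can wait for more data.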


13337 Commits