This patch adds support for a new "drain" mode. So now we have 3 admin
modes for a server :
- READY
- DRAIN
- MAINT
The drain mode disables load balancing but leaves the server up. It can
coexist with maint, except that maint has precedence. It is also inherited
from tracked servers, so just like maint, it's represented with 2 bits.
New functions were designed to set/clear each flag and to propagate the
changes to tracking servers when relevant, and to log the changes. Existing
functions srv_set_adm_maint() and srv_set_adm_ready() were replaced to make
use of the new functions.
Currently the drain mode is not yet used, however the whole logic was tested
with all combinations of set/clear of both flags in various orders to catch
all corner cases.
srv_is_usable() is broader than srv_is_usable() as it not only considers
the weight but the server's state as well. Future changes will allow a
server to be in drain mode with a non-zero weight, so we should migrate
to use that function instead.
Now that servers have their own states, let's report this one instead of
following the tracked server chain and reporting the tracked server's.
However the tracked server is still used to report x/y when a server is
going up or down. When the agent reports a down state, this one is still
enforced.
Function check_set_server_drain() used to set a server into stopping state.
Now it first checks if all configured checks are UP, and if the possibly
tracked servers is not stopped, and only calls set_srv_stopping() after
that. That also simplified the conditions to call the function, and its
logic. The function was also renamed check_notify_stopping() to better
report this change.
Function check_set_server_up() used to set a server up. Now it first
checks if all configured checks are UP, and if all tracked servers are
UP, and only calls set_srv_running() after that. That also simplified
the conditions to call the function, and its logic. The function was
also renamed check_notify_success() to better report this change.
Function check_set_server_down() used to set a server down. Now it first
checks if the health check's result differs from the server's state, and
only calls srv_set_stopped() if the check reports a failure while the
server is not down. Thanks to this, the conditions that were present
around its call could be removed. The function was also renamed
check_notify_failure() to better report this change.
This function was taken from check_set_server_drain(). It does not
consider health checks at all and only sets a server to stopping
provided it's not in maintenance and is not currently stopped. The
resulting state will be STOPPING. The state change is propagated
to tracked servers.
For now the function is not used, but the goal is to split health
checks status from server status and to be able to change a server's
state regardless of health checks statuses.
This function was taken from check_set_server_up(). It does not consider
health checks at all and only sets a server up provided it's not in
maintenance. The resulting state may be either RUNNING or STARTING
depending on the presence of a slowstart or not. The state change is
propagated to tracked servers.
For now the function is not used, but the goal is to split health
checks status from server status and to be able to change a server's
state regardless of health checks statuses.
This function was extracted from check_set_server_down(). In only
manipulates the server state and does not consider the health checks
at all, nor does it modify their status. It takes a reason message to
report in logs, however it passes NULL when recursing through the
trackers chain.
For now the function is not used, but the goal is to split health
checks status from server status and to be able to change a server's
state regardless of health checks statuses.
check_report_srv_status() was removed in favor of check_reason_string()
combined with srv_report_status(). This way we have one function which
is dedicated to check decoding, and another one dedicated to server
status.
srv_adm_append_status() was renamed srv_append_status() since it's no
more dedicated to maintenance mode. It now supports a reason which if
not null is appended to the output string.
We don't want to manipulate check's statuses anymore in functions
which modify the server's state. So since any check is forced to
call set_server_check_status() exactly once to report the result
of the check, it's the best place to update the check's health.
We don't have to handle the maintenance transition here anymore so we
can simplify the functions and conditions. This also means that we don't
need the disable/enable functions but only a function to switch to each
new state.
It's worth mentionning that at this stage there are still confusions
between the server state and the checks states. For example, the health
check's state is adjusted from tracked servers changing state, while it
should not be.
Now that it is possible to know whether a server is in forced maintenance
or inherits its maintenance status from another one, it is possible to
allow server tracking at more than one level. We still provide a loop
detection however.
Note that for the stats it's a bit trickier since we have to report the
check state which corresponds to the state of the server at the end of
the chain.
This change now involves a new flag SRV_ADMF_IMAINT to note that the
maintenance status of a server is inherited from another server. Thus,
we know at each server level in the chain if it's running, in forced
maintenance or in a maintenance status because it tracks another server,
or even in both states.
Disabling a server propagates this flag down to other servers. Enabling
a server flushes the flag down. A server becomes up again once both of
its flags are cleared.
Two new functions "srv_adm_set_maint()" and "srv_adm_set_ready()" are used to
manipulate this maintenance status. They're used by the CLI and the stats
page.
Now the stats page always says "MAINT" instead of "MAINT(via)" and it's
only the chk/down field which reports "via x/y" when the status is
inherited from another server, but it doesn't say it when a server was
forced into maintenance. The CSV output indicates "MAINT (via x/y)"
instead of only "MAINT(via)". This is the most accurate representation.
One important thing is that now entering/leaving maintenance for a
tracking server correctly follows the state of the tracked server.
Checks.c has become a total mess. A number of proxy or server maintenance
and queue management functions were put there probably because they were
used there, but that makes the code untouchable. And that's without saying
that their names does not always relate to what they really do!
So let's do a first pass by moving these ones :
- set_backend_down() => backend.c
- redistribute_pending() => queue.c:pendconn_redistribute()
- check_for_pending() => queue.c:pendconn_grab_from_px()
- shutdown_sessions => server.c:srv_shutdown_sessions()
- shutdown_backup_sessions => server.c:srv_shutdown_backup_sessions()
All of them were moved at once.
Servers used to have 3 flags to store a state, now they have 4 states
instead. This avoids lots of confusion for the 4 remaining undefined
states.
The encoding from the previous to the new states can be represented
this way :
SRV_STF_RUNNING
| SRV_STF_GOINGDOWN
| | SRV_STF_WARMINGUP
| | |
0 x x SRV_ST_STOPPED
1 0 0 SRV_ST_RUNNING
1 0 1 SRV_ST_STARTING
1 1 x SRV_ST_STOPPING
Note that the case where all bits were set used to exist and was randomly
dealt with. For example, the task was not stopped, the throttle value was
still updated and reported in the stats and in the http_server_state header.
It was the same if the server was stopped by the agent or for maintenance.
It's worth noting that the internal function names are still quite confusing.
Now we introduce srv->admin and srv->prev_admin which are bitfields
containing one bit per source of administrative status (maintenance only
for now). For the sake of backwards compatibility we implement a single
source (ADMF_FMAINT) but the code already checks any source (ADMF_MAINT)
where the STF_MAINTAIN bit was previously checked. This will later allow
us to add ADMF_IMAINT for maintenance mode inherited from tracked servers.
Along doing these changes, it appeared that some places will need to be
revisited when implementing the inherited bit, this concerns all those
modifying the ADMF_FMAINT bit (enable/disable actions on the CLI or stats
page), and the checks to report "via" on the stats page. But currently
the code is harmless.
Till now, the server's state and flags were all saved as a single bit
field. It causes some difficulties because we'd like to have an enum
for the state and separate flags.
This commit starts by splitting them in two distinct fields. The first
one is srv->state (with its counter-part srv->prev_state) which are now
enums, but which still contain bits (SRV_STF_*).
The flags now lie in their own field (srv->flags).
The function srv_is_usable() was updated to use the enum as input, since
it already used to deal only with the state.
Note that currently, the maintenance mode is still in the state for
simplicity, but it must move as well.
Twice in a week I found people were surprized by a "conditional timeout" not
being respected, because they add "if <cond>" after a timeout, and since
they don't see any error nor read the doc, the expect it to work. Let's
make the timeout parser reject extra arguments to avoid these situations.
The DRAIN status is not inherited between tracked servers, so the stats
page must only use the reported server's status and not the tracked
server's status, otherwise it misleadingly indicates a DRAIN state when
a server tracks a draining server, while this is wrong.
As more or less suspected, commit b1982e2 ("BUG/MEDIUM: http/session:
disable client-side expiration only after body") was hazardous. It
introduced a regression causing client side timeout to expire during
connection retries if it's lower than the time needed to cover the
amount of retries, so clients get a 408 when the connection to the
server fails to establish fast enough.
The reason is that the CF_READ_NOEXP flag is set after the MSG_DONE state
is reached, which protects the timeout from being re-armed, then during
the retries, process_session() clears the flag without calling the analyser
(since there's no activity for it), so the timeouts are rearmed.
Ideally, these one-shot flags should be per-analyser, and the analyser
which sets them would be responsible for clearing them, or they would
automatically be cleared when switching to another analyser. Unfortunately
this is not really possible currently.
What can be done however is to only clear them in the following situations :
- we're going to call analysers
- analysers have all been unsubscribed
This method seems reliable enough and approaches the ideal case well enough.
No backport is needed, this bug was introduced in 1.5-dev25.
When run in daemon mode (i.e. with at least one forked process) and using
the epoll poller, sending USR1 (graceful shutdown) to the worker processes
can cause some workers to start running at 100% CPU. Precondition is having
an established HTTP keep-alive connection when the signal is received.
The cloned (during fork) listening sockets do not get closed in the parent
process, thus they do not get removed from the epoll set automatically
(see man 7 epoll). This can lead to the process receiving epoll events
that it doesn't feel responsible for, resulting in an endless loop around
epoll_wait() delivering these events.
The solution is to explicitly remove these file descriptors from the epoll
set. To not degrade performance, care was taken to only do this when
neccessary, i.e. when the file descriptor was cloned during fork.
Signed-off-by: Conrad Hoffmann <conrad@soundcloud.com>
[wt: a backport to 1.4 could be studied though chances to catch the bug are low]
This is a minor fix, but the SSL_CTX_set_options() and
SSL_CTX_set_mode() functions take a long, not an int parameter. As
SSL_OP_ALL is now (since OpenSSL 1.0.0) defined as 0x80000BFFL, I think
it is worth fixing.
Thomas Heil reported that previous commit 07fcaaa ("MINOR: fix a few
memory usage errors") make haproxy crash when req* rules are used. As
diagnosed by Cyril Bonté, this commit introduced a regression which
makes haproxy free the memory areas allocated for regex even when
they're going to be used, resulting in the crashes.
This patch does three things :
- undo the free() on the valid path
- add regfree() on the error path but only when regcomp() succeeds
- rename err_code to ret_code to avoid confusing the valid return
path with an error path.
We used to call srv_is_usable() with either the current state and weights
or the previous ones. This causes trouble for future changes, so let's first
split it in two variants :
- srv_is_usable(srv) considers the current status
- srv_was_usable(srv) considers the previous status
Detecting that a server's status has changed is a bit messy, as well
as it is to commit the status changes. We'll have to add new conditions
soon and we'd better avoid to multiply the number of touched locations
with the high risk of forgetting them.
This commit introduces :
- srv_lb_status_changed() to report if the status changed from the
previously committed one ;
- svr_lb_commit_status() to commit the current status
The function is now used by all load-balancing algorithms.
This flag is only a copy of (srv->uweight == 0), so better get rid of
it to reduce some of the confusion that remains in the code, and use
a simple function to return this state based on this weight instead.
Function set_server_check_status() is very weird. It is called at the
end of a check to update the server's state before the new state is even
calculated, and possibly to log status changes, only if the proxy has
"option log-health-checks" set.
In order to do so, it employs an exhaustive list of the combinations
which can lead to a state change, while in practice almost all of
them may simply be deduced from the change of check status. Better,
some changes of check status are currently not detected while they
can be very valuable (eg: changes between L4/L6/TOUT/HTTP 500 for
example).
The doc was updated to reflect this.
Also, a minor change was made to consider s->uweight and not s->eweight
as meaning "DRAIN" since eweight can be null without the DRAIN mode (eg:
throttle, NOLB, ...).
Having both "active or backup DOWN" and "not checked" on the left side of
the color caption inflates the whole header block for no reason. Simply
move them both on the same line and reduce the header height.
Abuse of copy-paste has made "tcp-check expect binary" to consider a
buffer starting with \0 as empty! Thanks to Lukas Benes for reporting
this problem and confirming the fix.
This is 1.5-only, no backport is needed.
John-Paul Bader reported a stupid regression in 1.5-dev25, we
forget to check that global.stats_fe is initialized before visiting
its sockets, resulting in a crash.
No backport is needed.
It appears than many people considers that the default match for a fetch
returning string is "exact match string" aka "str". This patch set this
match as default for strings.
Commit f465994198 removed the "via" link when a tracking server is in maintenance, but
still calculated an empty link that no one can use. We can safely remove it.
The "via" column includes a link to the tracked server but instead of closing
the link with a </a> tag, a new tag is opened.
This typo should also be backported to 1.4
Whatever ACL option beginning with a '-' is considered as a pattern
if it does not match a known option. This is a big problem because
typos are silently ignored, such as "-" or "-mi".
Better clearly check the complete option name and report a parsing
error if the option is unknown.
Released version 1.5-dev25 with the following main changes :
- MEDIUM: connection: Implement and extented PROXY Protocol V2
- MINOR: ssl: clean unused ACLs declarations
- MINOR: ssl: adds fetchs and ACLs for ssl back connection.
- MINOR: ssl: merge client's and frontend's certificate functions.
- MINOR: ssl: adds ssl_f_sha1 fetch to return frontend's certificate fingerprint
- MINOR: ssl: adds sample converter base64 for binary type.
- MINOR: ssl: convert to binary ssl_fc_unique_id and ssl_bc_unique_id.
- BUG/MAJOR: ssl: Fallback to private session cache if current lock mode is not supported.
- MAJOR: ssl: Change default locks on ssl session cache.
- BUG/MINOR: chunk: Fix function chunk_strcmp and chunk_strcasecmp match a substring.
- MINOR: ssl: add global statement tune.ssl.force-private-cache.
- MINOR: ssl: remove fallback to SSL session private cache if lock init fails.
- BUG/MEDIUM: patterns: last fix was still not enough
- MINOR: http: export the smp_fetch_cookie function
- MINOR: http: generic pointer to rule argument
- BUG/MEDIUM: pattern: a typo breaks automatic acl/map numbering
- BUG/MAJOR: patterns: -i and -n are ignored for inlined patterns
- BUG/MINOR: proxy: unsafe initialization of HTTP transaction when switching from TCP frontend
- BUG/MINOR: http: log 407 in case of proxy auth
- MINOR: http: rely on the message body parser to send 100-continue
- MEDIUM: http: move reqadd after execution of http_request redirect
- MEDIUM: http: jump to dedicated labels after http-request processing
- BUG/MINOR: http: block rules forgot to increment the denied_req counter
- BUG/MINOR: http: block rules forgot to increment the session's request counter
- MEDIUM: http: move Connection header processing earlier
- MEDIUM: http: remove even more of the spaghetti in the request path
- MINOR: http: silently support the "block" action for http-request
- CLEANUP: proxy: rename "block_cond" to "block_rules"
- MEDIUM: http: emulate "block" rules using "http-request" rules
- MINOR: http: remove the now unused loop over "block" rules
- MEDIUM: http: factorize the "auth" action of http-request and stats
- MEDIUM: http: make http-request rules processing return a verdict instead of a rule
- MINOR: config: add minimum support for emitting warnings only once
- MEDIUM: config: inform the user about the deprecatedness of "block" rules
- MEDIUM: config: inform the user that "reqsetbe" is deprecated
- MEDIUM: config: inform the user only once that "redispatch" is deprecated
- MEDIUM: config: warn that '{cli,con,srv}timeout' are deprecated
- BUG/MINOR: auth: fix wrong return type in pat_match_auth()
- BUILD: config: remove a warning with clang
- BUG/MAJOR: http: connection setup may stall on balance url_param
- BUG/MEDIUM: http/session: disable client-side expiration only after body
- BUG/MEDIUM: http: correctly report request body timeouts
- BUG/MEDIUM: http: disable server-side expiration until client has sent the body
- MEDIUM: listener: make the accept function more robust against pauses
- BUILD: syscalls: remove improper inline statement in front of syscalls
- BUILD: ssl: SSL_CTX_set_msg_callback() needs openssl >= 0.9.7
- BUG/MAJOR: session: recover the correct connection pointer in half-initialized sessions
- DOC: add some explanation on the shared cache build options in the readme.
- MEDIUM: proxy: only adjust the backend's bind-process when already set
- MEDIUM: config: limit nbproc to the machine's word size
- MEDIUM: config: check the bind-process settings according to nbproc
- MEDIUM: listener: parse the new "process" bind keyword
- MEDIUM: listener: inherit the process mask from the proxy
- MAJOR: listener: only start listeners bound to the same processes
- MINOR: config: only report a warning when stats sockets are bound to more than 1 process
- CLEANUP: config: set the maxaccept value for peers listeners earlier
- BUG/MINOR: backend: only match IPv4 addresses with RDP cookies
- BUG/MINOR: checks: correctly configure the address family and protocol
- MINOR: tools: split is_addr() and is_inet_addr()
- MINOR: protocols: use is_inet_addr() when only INET addresses are desired
- MEDIUM: unix: add preliminary support for connecting to servers over UNIX sockets
- MEDIUM: checks: only complain about the missing port when the check uses TCP
- MEDIUM: unix: implement support for Linux abstract namespace sockets
- DOC: map_beg was missing from the table of map_* converters
- DOC: ebtree: indicate that prefix insertion/lookup may be used with strings
- MEDIUM: pattern: use ebtree's longest match to index/lookup string beginning
- BUILD: remove the obsolete BSD and OSX makefiles
- MEDIUM: unix: avoid a double connect probe when no data are sent
- DOC: stop referencing the slow git repository in the README
- BUILD: only build the systemd wrapper on Linux 2.6 and above
- DOC: update roadmap with completed tasks
- MEDIUM: session: implement half-closed timeouts (client-fin and server-fin)
Long-lived sessions are often subject to half-closed sessions resulting in
a lot of sessions appearing in FIN_WAIT state in the system tables, and no
way for haproxy to get rid of them. This typically happens because clients
suddenly disconnect without sending any packet (eg: FIN or RST was lost in
the path), and while the server detects this using an applicative heart
beat, haproxy does not close the connection.
This patch adds two new timeouts : "timeout client-fin" and
"timeout server-fin". The former allows one to override the client-facing
timeout when a FIN has been received or sent. The latter does the same for
server-facing connections, which is less useful.
Attempting to build haproxy-systemd-wrapper on non-linux platforms
sometimes results in build errors. Better move it into an EXTRA
variable which is set to haproxy-systemd-wrapper only on Linux 2.6
and above. Proceeding this way also allows to disable building it
in quick builds (eg: when developing).
git.1wt.eu is painfully slow and some people experience issues with
it. Better hide it and only advertise git.haproxy.org which is mirrored
on a faster server.
Also replace haproxy.1wt.eu with www.haproxy.org in the download URL
which appears in the stats page.
Plain "tcp" health checks sent to a unix socket cause two connect()
calls to be made, one to connect, and a second one to verify that the
connection properly established. But with unix sockets, we get
immediate notification of success, so we can avoid this second
attempt. However we need to ensure that we'll visit the connection
handler even if there's no remaining handshake pending, so for this
we claim we have some data to send in order to enable polling for
writes if there's no more handshake.
These makefiles were aging, with no support for SSL nor zlib, and
they were hard to maintain. GNU make is packaged and provided with
all systems which were relying on these makefiles, so better simply
delete them and enable the new features for everyone.
Being able to map prefixes to values is already used for IPv4/IPv6
but was not yet used with strings. It can be very convenient to map
directories to server farms but large lists may be slow.
By using ebmb_insert_prefix() and ebmb_lookup_longest(), we can
insert strings with their own length as a prefix, and lookup
candidate strings and ensure that the longest matching one will
be returned, which is the longest string matching the entry.
And indicate what is required for this (that the pattern is properly
terminated by a zero).
(cherry picked from commit c87c93800ce4045b1053302d99a3cd78321a7ec4)