These are mostly comments in the code. A few error messages were fixed
and are of low enough importance not to deserve a backport. Some regtests
were also fixed.
When parsing references to stick-tables declared as backends, they are added to
a list of proxies (they are proxies!) which refer to this stick-tables.
Before this patch we added them to these list without checking they were already
present, making the silly hypothesis the actions/sample were checked/resolved in the same
order the proxies are parsed.
This patch implement a simple inline function to in_proxies_list() to test
the presence of a proxy in a list of proxies. We use this function when resolving
/checking samples/actions.
This bug was introduced by 015e4d7 commit.
Must be backported to 2.0.
When running in HTX mode, if we sent the request, but failed to get the
answer, either because the server just closed its socket, we hit a server
timeout, or we get a 404, 408, 425, 500, 501, 502, 503 or 504 error,
attempt to retry the request, exactly as if we just failed to connect to
the server.
To do so, add a new backend keyword, "retry-on".
It accepts a list of keywords, which can be "none" (never retry),
"conn-failure" (we failed to connect, or to do the SSL handshake),
"empty-response" (the server closed the connection without answering),
"response-timeout" (we timed out while waiting for the server response),
or "404", "408", "425", "500", "501", "502", "503" and "504".
The default is "conn-failure".
Older compilers don't like to see "inline" placed after the type in a
function declaration, it must be "static inline <type>" only. This
patch touches various areas. The warnings were seen with gcc-3.4.
Historically the default frontend's maxconn used to be quite low (2000),
which was sufficient two decades ago but often proved to be a problem
when users had purposely set the global maxconn value but forgot to set
the frontend's.
There is no point in keeping this arbitrary limit for frontends : when
the global maxconn is lower, it's already too high and when the global
maxconn is much higher, it becomes a limiting factor which causes trouble
in production.
This commit allows the value to be set to zero, which becomes the new
default value, to mean it's not directly limited, or in fact it's set
to the global maxconn. Since this operation used to be performed before
computing a possibly automatic global maxconn based on memory limits,
the calculation of the maxconn value and its propagation to the backends'
fullconn has now moved to a dedicated function, proxy_adjust_all_maxconn(),
which is called once the global maxconn is stabilized.
This comes with two benefits :
1) a configuration missing "maxconn" in the defaults section will not
limit itself to a magically hardcoded value but will scale up to the
global maxconn ;
2) when the global maxconn is not set and memory limits are used instead,
the frontends' maxconn automatically adapts, and the backends' fullconn
as well.
This was the largest function of the whole file, taking a rough second
to build alone. Let's move it to a distinct file along with a few
dependencies. Doing so saved about 2 seconds on the total build time.
This function now captures an error regardless of its side and protocol.
The caller must pass a number of elements and may pass a protocol-specific
structure and a callback to display it. Later this function may deal with
more advanced allocation techniques to avoid allocating as many buffers
as proxies.
In preparation for thread-specific runqueues, change the task API so that
the callback takes 3 arguments, the task itself, the context, and the state,
those were retrieved from the task before. This will allow these elements to
change atomically in the scheduler while the application uses the copied
value, and even to have NULL tasks later.
Rename the global variable "proxy" to "proxies_list".
There's been multiple proxies in haproxy for quite some time, and "proxy"
is a potential source of bugs, a number of functions have a "proxy" argument,
and some code used "proxy" when it really meant "px" or "curproxy". It worked
by pure luck, because it usually happened while parsing the config, and thus
"proxy" pointed to the currently parsed proxy, but we should probably not
rely on this.
[wt: some of these are definitely fixes that are worth backporting]
Now, each proxy contains a lock that must be used when necessary to protect
it. Moreover, all proxy's counters are now updated using atomic operations.
First, we use atomic operations to update jobs/totalconn/actconn variables,
listener's nbconn variable and listener's counters. Then we add a lock on
listeners to protect access to their information. And finally, listener queues
(global and per proxy) are also protected by a lock. Here, because access to
these queues are unusal, we use the same lock for all queues instead of a global
one for the global queue and a lock per proxy for others.
When running with multiple process, if some proxies are just assigned
to some processes, the other processes will just close the file descriptors
for the listening sockets. However, we may still have to provide those
sockets when reloading, so instead we just try hard to pretend those proxies
are dead, while keeping the sockets opened.
A new global option, no-reused-socket", has been added, to restore the old
behavior of closing the sockets not bound to this process.
This adds 3 new commands to the cli :
enable dynamic-cookie backend <backend> that enables dynamic cookies for a
specified backend
disable dynamic-cookie backend <backend> that disables dynamic cookies for a
specified backend
set dynamic-cookie-key backend <backend> that lets one change the dynamic
cookie secret key, for a specified backend.
Several CLI commands require a frontend, so let's have a function to
look this one up and prepare the appropriate error message and the
appctx's state in case of failure.
This function tries to spot a proxy by its name, ID and type, and
in case some elements don't match, it tries to determine which ones
could be ignored and reports which ones were ignored so that the
caller can decide whether or not it wants to pick this proxy. This
will be used for maintaining the status across reloads where the
config might have changed a bit.
It does the same as the other one except that it only focuses on the
numeric ID and the capabilities. It's used by proxy_find_by_name()
for numeric names.
First, findproxy() was renamed proxy_find_by_name() so that its explicit
that a name is required for the lookup. Second, we give this function
the ability to search for tables if needed. Third we now provide inline
wrappers to pass the appropriate PR_CAP_* flags and to explicitly look
up a frontend, backend or table.
With HTTP/2, we'll have to support multiplexed streams. A stream is in
fact the largest part of what we currently call a session, it has buffers,
logs, etc.
In order to catch any error, this commit removes any reference to the
struct session and tries to rename most "session" occurrences in function
names to "stream" and "sess" to "strm" when that's related to a session.
The files stream.{c,h} were added and session.{c,h} removed.
The session will be reintroduced later and a few parts of the stream
will progressively be moved overthere. It will more or less contain
only what we need in an embryonic session.
Sample fetch functions and converters will have to change a bit so
that they'll use an L5 (session) instead of what's currently called
"L4" which is in fact L6 for now.
Once all changes are completed, we should see approximately this :
L7 - http_txn
L6 - stream
L5 - session
L4 - connection | applet
There will be at most one http_txn per stream, and a same session will
possibly be referenced by multiple streams. A connection will point to
a session and to a stream. The session will hold all the information
we need to keep even when we don't yet have a stream.
Some more cleanup is needed because some code was already far from
being clean. The server queue management still refers to sessions at
many places while comments talk about connections. This will have to
be cleaned up once we have a server-side connection pool manager.
Stream flags "SN_*" still need to be renamed, it doesn't seem like
any of them will need to move to the session.
Large configurations can take time to parse when thousands of backends
are in use. Let's store all the proxies in trees.
findproxy_mode() has been modified to use the tree for lookups, which
has divided the parsing time by about 2.5. But many lookups are still
present at many places and need to be dealt with.
Tunnel timeouts are used when TCP connections are forwarded, or
when forwarding upgraded HTTP connections (WebSocket) as well as
CONNECT requests to proxies.
This timeout allows long-lived sessions to be supported without
having to set large timeouts to normal requests.
The stats socket now allows the admin to disable, enable or shutdown a frontend.
This can be used when a bug is discovered in a configuration and it's desirable
to fix it but the rules in place don't allow to change a running config. Thus it
becomes possible to kill the frontend to release the port and start a new one in
a separate process.
This can also be used to temporarily make haproxy return TCP resets to incoming
requests to pretend the service is not bound. For instance, this may be useful
to quickly flush a very deep SYN backlog.
The frontend check and lookup code was factored with the "set maxconn" usage.
This one enforces a per-process connection rate limit, regardless of what
may be set per frontend. It can be a way to limit the CPU usage of a process
being severely attacked.
The side effect is that the global process connection rate is now measured
for each incoming connection, so it will be possible to report it.
This was revealed with one of the very latest patches which caused
the listener_queue not to be initialized on the stats socket frontend.
And in fact a number of other ones were missing too. This is getting so
boring that now we'll always make use of the same function to initialize
any proxy. Doing so has even saved about 500 bytes on the binary due to
the avoided code redundancy.
No backport is needed.
This function is finally not needed anymore, as it has been replaced with
a per-proxy task that is scheduled when some limits are encountered on
incoming connections or when the process is stopping. The savings should
be noticeable on configs with a large number of proxies. The most important
point is that the rate limiting is now enforced in a clean and solid way.
Managing listeners state is difficult because they have their own state
and can at the same time have theirs dictated by their proxy. The pause
is not done properly, as the proxy code is fiddling with sockets. By
introducing new functions such as pause_listener()/resume_listener(), we
make it a bit more obvious how/when they're supposed to be used. The
listen_proxies() function was also renamed to resume_proxies() since
it's only used for pause/resume.
This patch is the first in a series aiming at getting rid of the maintain_proxies
mess. In the end, proxies should not call enable_listener()/disable_listener()
anymore.
It's very annoying that frontend and backend stats are merged because we
don't know what we're observing. For instance, if a "listen" instance
makes use of a distinct backend, it's impossible to know what the bytes_out
means.
Some points take care of not updating counters twice if the backend points
to the frontend, indicating a "listen" instance. The thing becomes more
complex when we try to add support for server side keep-alive, because we
have to maintain a pointer to the backend used for last request, and to
update its stats. But we can't perform such comparisons anymore because
the counters will not match anymore.
So in order to get rid of this situation, let's have both frontend AND
backend stats in the "struct proxy". We simply update the relevant ones
during activity. Some of them are only accounted for in the backend,
while others are just for frontend. Maybe we can improve a bit on that
later, but the essential part is that those counters now reflect what
they really mean.
Now we're able to reject connections very early, so we need to use a
different counter for the connections that are received and the ones
that are accepted and converted into sessions, so that the rate limits
can still apply to the accepted ones. The session rate must still be
used to compute the rate limit, so that we can reject undesired traffic
without affecting the rate.
Right now, an HTTP server cannot track a TCP server and vice-versa.
This patch enables proxy tracking without relying on the proxy's mode
(tcp/http/health). It only requires a matching proxy name to exist. The
original function was renamed to findproxy_mode().
int get_backend_server(const char *bk_name, const char *sv_name,
struct proxy **bk, struct server **sv);
This function scans the list of backends and servers to retrieve the first
backend and the first server with the given names, and sets them in both
parameters. It returns zero if either is not found, or non-zero and sets
the ones it did not found to NULL. If a NULL pointer is passed for the
backend, only the pointer to the server will be updated.
There are a few remaining max values that need to move to counters.
Also, the counters are more often used than some config information,
so get them closer to the other useful struct members for better cache
efficiency.
This patch allows to collect & provide separate statistics for each socket.
It can be very useful if you would like to distinguish between traffic
generate by local and remote users or between different types of remote
clients (peerings, domestic, foreign).
Currently no "Session rate" is supported, but adding it should be possible
if we found it useful.
The HTTP processing has been splitted into 7 steps, one of which
is not anymore HTTP-specific (content-switching). That way, it
becomes possible to use "use_backend" rules in TCP mode. A new
"use_server" directive should follow soon.
This is a first step towards support of multiple configuration files.
Now readcfgfile() only reads a file in memory and performs very minimal
parsing. The checks are performed afterwards.
Some users want to keep the max sessions/s seen on servers, frontends
and backends for capacity planning. It's easy to grab it while the
session count is updated, so let's keep it.
With this change, all frontends, backends, and servers maintain a session
counter and a timer to compute a session rate over the last second. This
value will be very useful because it varies instantly and can be used to
check thresholds. This value is also reported in the stats in a new "rate"
column.
There is a problem when an instance is marked "disabled". Its ports are
still bound but will not be unbound upon termination. This causes processes
to accumulate during soft restarts, and might even cause failures to restart
new ones due to the inability to bind to the same port.
The ideal solution would be to bind all ports at the end of the configuration
parsing. An acceptable workaround is to unbind all listeners of disabled
proxies. This is what the current patch does.
(cherry picked from commit a944218e9c)
(cherry picked from commit 8cfebbb82b87345bade831920177077e7d25840a)
This is the first attempt at moving all internal parts from
using struct timeval to integer ticks. Those provides simpler
and faster code due to simplified operations, and this change
also saved about 64 bytes per session.
A new header file has been added : include/common/ticks.h.
It is possible that some functions should finally not be inlined
because they're used quite a lot (eg: tick_first, tick_add_ifset
and tick_is_expired). More measurements are required in order to
decide whether this is interesting or not.
Some function and variable names are still subject to change for
a better overall logics.
This patch implements ability to set the current state of one server
by tracking another one. It:
- adds two variables: *tracknext, *tracked to struct server
- implements findserver(), similar to findproxy()
- adds "track" keyword accepting both "proxy/server" and "server" (assuming current proxy)
- verifies if both checks and tracking is not enabled at the same time
- changes set_server_down() to notify tracking server
- creates set_server_up(), set_server_disabled(), set_server_enabled() by
moving the code from process_chk() and adding notifications
- changes stats to show a name of tracked server instead of Chk/Dwn/Dwntime(html)
or by adding new variable (csv)
Changes from the previuos version:
- it is possibile to track independently of the declaration order
- one extra comma bug is fixed
- new condition to check if there is no disable-on-404 inconsistency