------------------------
HAProxy Management Guide
------------------------
version 1.7


This document describes how to start, stop, manage, and troubleshoot HAProxy,
as well as some known limitations and traps to avoid. It does not describe how
to configure it (for this please read configuration.txt).

Note to documentation contributors :
    This document is formatted with 80 columns per line, with an even number of
    spaces for indentation and without tabs. Please follow these rules strictly
    so that it remains easily printable everywhere. If you add sections, please
    update the summary below for easier searching.


Summary
-------

1.    Prerequisites
2.    Quick reminder about HAProxy's architecture
3.    Starting HAProxy
4.    Stopping and restarting HAProxy
5.    File-descriptor limitations
6.    Memory management
7.    CPU usage
8.    Logging
9.    Statistics and monitoring
9.1.      CSV format
9.2.      Typed output format
9.3.      Unix Socket commands
10.   Tricks for easier configuration management
11.   Well-known traps to avoid
12.   Debugging and performance issues
13.   Security considerations


1. Prerequisites
----------------

In this document it is assumed that the reader has sufficient administration
skills on a UNIX-like operating system, uses the shell on a daily basis and is
familiar with troubleshooting utilities such as strace and tcpdump.


2. Quick reminder about HAProxy's architecture
----------------------------------------------

HAProxy is a single-threaded, event-driven, non-blocking daemon. This means it
uses event multiplexing to schedule all of its activities instead of relying on
the system to schedule between multiple activities. Most of the time it runs as
a single process, so the output of "ps aux" on a system will report only one
"haproxy" process, unless a soft reload is in progress and an older process is
finishing its job in parallel to the new one. It is thus always easy to trace
its activity using the strace utility.

HAProxy is designed to isolate itself into a chroot jail during startup, where
it cannot perform any file-system access at all. This is also true for the
libraries it depends on (eg: libc, libssl, etc). The immediate effect is that
a running process will not be able to reload a configuration file to apply
changes, instead a new process will be started using the updated configuration
file. Some other less obvious effects are that some timezone files or resolver
files the libc might attempt to access at run time will not be found, though
this should generally not happen as they're not needed after startup. A nice
consequence of this principle is that the HAProxy process is totally stateless,
and no cleanup is needed after it's killed, so any killing method that works
will do the right thing.

HAProxy doesn't write log files, but it relies on the standard syslog protocol
to send logs to a remote server (which is often located on the same system).

HAProxy uses its internal clock to enforce timeouts, which is derived from the
system's time but where unexpected drift is corrected. This is done by limiting
the time spent waiting in poll() for an event, and measuring the time it really
took. In practice it never waits more than one second. This explains why, when
running strace over a completely idle process, periodic calls to poll() (or any
of its variants) surrounded by two gettimeofday() calls are noticed. They are
normal, completely harmless and so cheap that the load they imply is totally
undetectable at the system scale, so there's nothing abnormal there. Example :

  16:35:40.002320 gettimeofday({1442759740, 2605}, NULL) = 0
  16:35:40.002942 epoll_wait(0, {}, 200, 1000) = 0
  16:35:41.007542 gettimeofday({1442759741, 7641}, NULL) = 0
  16:35:41.007998 gettimeofday({1442759741, 8114}, NULL) = 0
  16:35:41.008391 epoll_wait(0, {}, 200, 1000) = 0
  16:35:42.011313 gettimeofday({1442759742, 11411}, NULL) = 0

HAProxy is a TCP proxy, not a router. It deals with established connections that
have been validated by the kernel, and not with packets of any form nor with
sockets in other states (eg: no SYN_RECV nor TIME_WAIT), though their existence
may prevent it from binding a port. It relies on the system to accept incoming
connections and to initiate outgoing connections. An immediate effect of this is
that there is no relation between packets observed on the two sides of a
forwarded connection, which can be of different size, numbers and even family.
Since a connection may only be accepted from a socket in LISTEN state, all the
sockets it is listening to are necessarily visible using the "netstat" utility
to show listening sockets. Example :

  # netstat -ltnp
  Active Internet connections (only servers)
  Proto Recv-Q Send-Q Local Address  Foreign Address  State   PID/Program name
  tcp        0      0 0.0.0.0:22     0.0.0.0:*        LISTEN  1629/sshd
  tcp        0      0 0.0.0.0:80     0.0.0.0:*        LISTEN  2847/haproxy
  tcp        0      0 0.0.0.0:443    0.0.0.0:*        LISTEN  2847/haproxy


3. Starting HAProxy
-------------------

HAProxy is started by invoking the "haproxy" program with a number of arguments
passed on the command line. The actual syntax is :

    $ haproxy [<options>]*

where [<options>]* is any number of options. An option always starts with '-'
followed by one or more letters, and possibly followed by one or multiple extra
arguments. Without any option, HAProxy displays the help page with a reminder
about supported options. Available options may vary slightly based on the
operating system. A fair number of these options overlap with an equivalent one
in the "global" section. In this case, the command line always has precedence
over the configuration file, so that the command line can be used to quickly
enforce some settings without touching the configuration files. The current
list of options is :

  -- <cfgfile>* : all the arguments following "--" are paths to configuration
    file/directory to be loaded and processed in the declaration order. It is
    mostly useful when relying on the shell to load many files that are
    numerically ordered. See also "-f". The difference between "--" and "-f" is
    that one "-f" must be placed before each file name, while a single "--" is
    needed before all file names. Both options can be used together, the
    command line ordering still applies. When more than one file is specified,
    each file must start on a section boundary, so the first keyword of each
    file must be one of "global", "defaults", "peers", "listen", "frontend",
    "backend", and so on. A file cannot contain just a server list for example.

  -f <cfgfile|cfgdir> : adds <cfgfile> to the list of configuration files to be
    loaded. If <cfgdir> is a directory, all the files (and only files) it
    contains are added in lexical order (using LC_COLLATE=C) to the list of
    configuration files to be loaded ; only files with ".cfg" extension are
    added, only non hidden files (not prefixed with ".") are added.
    Configuration files are loaded and processed in their declaration order.
    This option may be specified multiple times to load multiple files. See
    also "--". The difference between "--" and "-f" is that one "-f" must be
    placed before each file name, while a single "--" is needed before all file
    names. Both options can be used together, the command line ordering still
    applies. When more than one file is specified, each file must start on a
    section boundary, so the first keyword of each file must be one of
    "global", "defaults", "peers", "listen", "frontend", "backend", and so on.
    A file cannot contain just a server list for example.

  -C <dir> : changes to directory <dir> before loading configuration
    files. This is useful when using relative paths. Beware of wildcards
    after "--" : they are expanded by the shell before haproxy starts, hence
    relative to the shell's current directory and not to <dir>.

  -D : start as a daemon. The process detaches from the current terminal after
    forking, and errors are not reported anymore in the terminal. It is
    equivalent to the "daemon" keyword in the "global" section of the
    configuration. It is recommended to always force it in any init script so
    that a faulty configuration doesn't prevent the system from booting.

  -Ds : work in systemd mode. Only used by the systemd wrapper.

  -L <name> : change the local peer name to <name>, which defaults to the local
    hostname. This is used only with peers replication.

  -N <limit> : sets the default per-proxy maxconn to <limit> instead of the
    builtin default value (usually 2000). Only useful for debugging.

  -V : enable verbose mode (disables quiet mode). Reverts the effect of "-q" or
    "quiet".

  -c : only performs a check of the configuration files and exits before trying
    to bind. The exit status is zero if everything is OK, or non-zero if an
    error is encountered.

  -d : enable debug mode. This disables daemon mode, forces the process to stay
    in foreground and to show incoming and outgoing events. It is equivalent to
    the "global" section's "debug" keyword. It must never be used in an init
    script.

  -dG : disable use of getaddrinfo() to resolve host names into addresses. It
    can be used when suspecting that getaddrinfo() doesn't work as expected.
    This option was made available because many bogus implementations of
    getaddrinfo() exist on various systems and cause anomalies that are
    difficult to troubleshoot.

  -dM[<byte>] : forces memory poisoning, which means that each and every
    memory region allocated with malloc() or pool_alloc2() will be filled with
    <byte> before being passed to the caller. When <byte> is not specified, it
    defaults to 0x50 ('P'). While this slightly slows down operations, it is
    useful to reliably trigger issues resulting from missing initializations in
    the code that cause random crashes. Note that -dM0 has the effect of
    turning any malloc() into a calloc(). In any case if a bug appears or
    disappears when using this option it means there is a bug in haproxy, so
    please report it.

  -dS : disable use of the splice() system call. It is equivalent to the
    "global" section's "nosplice" keyword. This may be used when splice() is
    suspected to behave improperly or to cause performance issues, or when
    using strace to see the forwarded data (which do not appear when using
    splice()).

  -dV : disable SSL verify on the server side. It is equivalent to having
    "ssl-server-verify none" in the "global" section. This is useful when
    trying to reproduce production issues out of the production
    environment. Never use this in an init script as it degrades SSL security
    to the servers.

  -db : disable background mode and multi-process mode. The process remains in
    foreground. It is mainly used during development or during small tests, as
    Ctrl-C is enough to stop the process. Never use it in an init script.

  -de : disable the use of the "epoll" poller. It is equivalent to the "global"
    section's keyword "noepoll". It is mostly useful when suspecting a bug
    related to this poller. On systems supporting epoll, the fallback will
    generally be the "poll" poller.

  -dk : disable the use of the "kqueue" poller. It is equivalent to the
    "global" section's keyword "nokqueue". It is mostly useful when suspecting
    a bug related to this poller. On systems supporting kqueue, the fallback
    will generally be the "poll" poller.

  -dp : disable the use of the "poll" poller. It is equivalent to the "global"
    section's keyword "nopoll". It is mostly useful when suspecting a bug
    related to this poller. On systems supporting poll, the fallback will
    generally be the "select" poller, which cannot be disabled and is limited
    to 1024 file descriptors.

  -m <limit> : limit the total allocatable memory to <limit> megabytes across
    all processes. This may cause some connection refusals or some slowdowns
    depending on the amount of memory needed for normal operations. This is
    mostly used to force the processes to work in a constrained resource usage
    scenario. It is important to note that the memory is not shared between
    processes, so in a multi-process scenario, this value is first divided by
    global.nbproc before forking.

  -n <limit> : limits the per-process connection limit to <limit>. This is
    equivalent to the global section's keyword "maxconn". It has precedence
    over this keyword. This may be used to quickly force lower limits to avoid
    a service outage on systems where resource limits are too low.

  -p <file> : write all processes' pids into <file> during startup. This is
    equivalent to the "global" section's keyword "pidfile". The file is opened
    before entering the chroot jail, and after doing the chdir() implied by
    "-C". Each pid appears on its own line.

  -q : set "quiet" mode. This disables some messages during the configuration
    parsing and during startup. It can be used in combination with "-c" to
    just check if a configuration file is valid or not.

  -sf <pid>* : send the "finish" signal (SIGUSR1) to older processes after boot
    completion to ask them to finish what they are doing and to leave. <pid>
    is a list of pids to signal (one per argument). The list ends on any
    option starting with a "-". It is not a problem if the list of pids is
    empty, so that it can be built on the fly based on the result of a command
    like "pidof" or "pgrep".

  -st <pid>* : send the "terminate" signal (SIGTERM) to older processes after
    boot completion to terminate them immediately without finishing what they
    were doing. <pid> is a list of pids to signal (one per argument). The list
    ends on any option starting with a "-". It is not a problem if the list
    of pids is empty, so that it can be built on the fly based on the result of
    a command like "pidof" or "pgrep".

  -v : report the version and build date.

  -vv : display the version, build options, libraries versions and usable
    pollers. This output is systematically requested when filing a bug report.

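The file selection rules given for "-f <cfgdir>" in the list above can be
approximated from the shell in order to preview which files would be loaded,
and in which order. This is only a sketch : the function name list_cfg_files is
hypothetical, and haproxy's own behaviour remains the reference.

```shell
# list_cfg_files <dir> : hypothetical helper, not part of haproxy. It prints
# the entries that "-f <dir>" is expected to pick according to the rules
# above: files only (symbolic links are followed by the test below), ".cfg"
# extension only, no hidden files, sorted by the shell with LC_ALL=C requested.
list_cfg_files() (
    LC_ALL=C
    export LC_ALL
    # The "*" pattern never matches names starting with a dot, so hidden
    # files are skipped by the shell itself.
    for f in "$1"/*.cfg; do
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
)
```

For example "list_cfg_files /etc/haproxy/conf.d" (an assumed directory name)
prints one path per line, and prints nothing when no file qualifies.
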
A safe way to start HAProxy from an init file consists in forcing the daemon
mode, storing existing pids to a pid file and using this pid file to notify
older processes to finish before leaving :

    haproxy -f /etc/haproxy.cfg \
            -D -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)

When the configuration is split into a few specific files (eg: tcp vs http),
it is recommended to use the "-f" option :

    haproxy -f /etc/haproxy/global.cfg -f /etc/haproxy/stats.cfg \
            -f /etc/haproxy/default-tcp.cfg -f /etc/haproxy/tcp.cfg \
            -f /etc/haproxy/default-http.cfg -f /etc/haproxy/http.cfg \
            -D -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)

When an unknown number of files is expected, such as customer-specific files,
it is recommended to assign them a name starting with a fixed-size sequence
number and to use "--" to load them, possibly after loading some defaults :

    haproxy -f /etc/haproxy/global.cfg -f /etc/haproxy/stats.cfg \
            -f /etc/haproxy/default-tcp.cfg -f /etc/haproxy/tcp.cfg \
            -f /etc/haproxy/default-http.cfg -f /etc/haproxy/http.cfg \
            -D -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid) \
            -f /etc/haproxy/default-customers.cfg -- /etc/haproxy/customers/*

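The sequence number needs a fixed size because the shell expands "*" in
lexical order : "10-x.cfg" would sort before "2-x.cfg", while "010-x.cfg"
sorts after "002-x.cfg". A minimal sketch of a naming helper follows ; the
function name, the 3-digit width and the customer names are assumptions chosen
for illustration only.

```shell
# customer_cfg_name <seq> <name> : hypothetical helper printing a file name
# starting with a zero-padded, fixed-size (3 digits here) sequence number.
# <seq> must be passed without leading zeroes, otherwise printf may read it
# as an octal number.
customer_cfg_name() {
    printf '%03d-%s.cfg\n' "$1" "$2"
}

customer_cfg_name 2 acme       # prints 002-acme.cfg
customer_cfg_name 10 example   # prints 010-example.cfg
```
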
Sometimes a failure to start may happen for whatever reason. Then it is
important to verify if the version of HAProxy you are invoking is the expected
version and if it supports the features you are expecting (eg: SSL, PCRE,
compression, Lua, etc). This can be verified using "haproxy -vv". Some
important information such as certain build options, the target system and
the versions of the libraries being used are reported there. It is also what
you will systematically be asked for when posting a bug report :

    $ haproxy -vv
    HA-Proxy version 1.6-dev7-a088d3-4 2015/10/08
    Copyright 2000-2015 Willy Tarreau <willy@haproxy.org>

    Build options :
      TARGET  = linux2628
      CPU     = generic
      CC      = gcc
      CFLAGS  = -pg -O0 -g -fno-strict-aliasing -Wdeclaration-after-statement \
                -DBUFSIZE=8030 -DMAXREWRITE=1030 -DSO_MARK=36 -DTCP_REPAIR=19
      OPTIONS = USE_ZLIB=1 USE_DLMALLOC=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1

    Default settings :
      maxconn = 2000, bufsize = 8030, maxrewrite = 1030, maxpollevents = 200

    Encrypted password support via crypt(3): yes
    Built with zlib version : 1.2.6
    Compression algorithms supported : identity("identity"), deflate("deflate"), \
                                       raw-deflate("deflate"), gzip("gzip")
    Built with OpenSSL version : OpenSSL 1.0.1o 12 Jun 2015
    Running on OpenSSL version : OpenSSL 1.0.1o 12 Jun 2015
    OpenSSL library supports TLS extensions : yes
    OpenSSL library supports SNI : yes
    OpenSSL library supports prefer-server-ciphers : yes
    Built with PCRE version : 8.12 2011-01-15
    PCRE library supports JIT : no (USE_PCRE_JIT not set)
    Built with Lua version : Lua 5.3.1
    Built with transparent proxy support using: IP_TRANSPARENT IP_FREEBIND

    Available polling systems :
          epoll : pref=300,  test result OK
           poll : pref=200,  test result OK
         select : pref=150,  test result OK
    Total: 3 (3 usable), will use epoll.

The relevant information that many non-developer users can verify here are :
  - the version : 1.6-dev7-a088d3-4 above means the code is currently at commit
    ID "a088d3" which is the 4th one after official version "1.6-dev7".
    Version 1.6-dev7 would show as "1.6-dev7-8c1ad7". What matters here is in
    fact "1.6-dev7". This is the 7th development version of what will become
    version 1.6 in the future. A development version is not suitable for use
    in production (unless you know exactly what you are doing). A stable
    version will show as a 3-numbers version, such as "1.5.14-16f863",
    indicating the 14th level of fix on top of version 1.5. This is a
    production-ready version.

  - the release date : 2015/10/08. It is represented in the universal
    year/month/day format. Here this means October 8th, 2015. Given that stable
    releases are issued every few months (1-2 months at the beginning, sometimes
    6 months once the product becomes very stable), if you're seeing an old date
    here, it means you're probably affected by a number of bugs or security
    issues that have since been fixed and that it might be worth checking on the
    official site.

  - build options : they are relevant to people who build their packages
    themselves, they can explain why things are not behaving as expected. For
    example the development version above was built for Linux 2.6.28 or later,
    targeting a generic CPU (no CPU-specific optimizations), and lacks any
    code optimization (-O0) so it will perform poorly.

  - libraries versions : zlib version is reported as found in the library
    itself. In general zlib is considered a very stable product and upgrades
    are almost never needed. OpenSSL reports two versions, the version used at
    build time and the one being used, as found on the system. These ones may
    differ by the last letter but never by the numbers. The build date is also
    reported because most OpenSSL bugs are security issues and need to be taken
    seriously, so this library absolutely needs to be kept up to date. Seeing a
    4-month-old version here is highly suspicious, and indeed an update was
    missed. PCRE provides very fast regular expressions and is highly
    recommended. Certain of its extensions such as JIT are not present in all
    versions and still young so some people prefer not to build with them,
    which is why the build status is reported as well. Regarding the Lua
    scripting language, HAProxy expects version 5.3 which is very young since
    it was released a little time before HAProxy 1.6. It is important to check
    on the Lua web site if some fixes are proposed for this branch.

  - Available polling systems will affect the process's scalability when
    dealing with more than about one thousand concurrent connections. These
    ones are only available when the correct system was indicated in the TARGET
    variable during the build. The "epoll" mechanism is highly recommended on
    Linux, and the kqueue mechanism is highly recommended on BSD. Lacking them
    will result in poll() or even select() being used, causing a high CPU usage
    when dealing with a lot of connections.

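The version naming rule explained in the first item above can be turned into a
quick check, for example in a deployment script that must refuse development
builds. This is a rough sketch based only on the two formats shown in this
section ; the function name is hypothetical and the patterns are deliberately
loose.

```shell
# haproxy_version_kind <version> : hypothetical helper classifying a version
# string such as the third word of the first line of "haproxy -vv".
haproxy_version_kind() {
    case "$1" in
        # Development versions carry a "-dev<N>" marker, eg: 1.6-dev7-a088d3-4
        *-dev*)                echo development ;;
        # Stable versions start with three numbers, eg: 1.5.14-16f863
        [0-9]*.[0-9]*.[0-9]*)  echo stable ;;
        *)                     echo unknown ;;
    esac
}

haproxy_version_kind 1.6-dev7-a088d3-4   # prints development
haproxy_version_kind 1.5.14-16f863       # prints stable
```

With the output shown above, the version string can be extracted using
"haproxy -vv | awk 'NR == 1 { print $3 }'".
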

4. Stopping and restarting HAProxy
----------------------------------

HAProxy supports a graceful and a hard stop. The hard stop is simple : when the
SIGTERM signal is sent to the haproxy process, it immediately quits and all
established connections are closed. The graceful stop is triggered when the
SIGUSR1 signal is sent to the haproxy process. It consists in unbinding from
listening ports only, while continuing to process existing connections until
they close. Once the last connection is closed, the process leaves.

The hard stop method is used for the "stop" or "restart" actions of the service
management script. The graceful stop is used for the "reload" action which
tries to seamlessly reload a new configuration in a new process.

Both of these signals may be sent by the new haproxy process itself during a
reload or restart, so that they are sent at the latest possible moment and only
if absolutely required. This is what is performed by the "-st" (hard) and "-sf"
(graceful) options respectively.

To understand better how these signals are used, it is important to understand
the whole restart mechanism.

First, an existing haproxy process is running. The administrator uses a system
specific command such as "/etc/init.d/haproxy reload" to request that the new
configuration file be put into effect. What happens then is the following.
First, the service script (/etc/init.d/haproxy or equivalent) will verify that
the configuration file parses correctly using "haproxy -c". After that it will
try to start haproxy with this configuration file, using "-st" or "-sf".

Then HAProxy tries to bind to all listening ports. If some fatal errors happen
(eg: address not present on the system, permission denied), the process quits
with an error. If a socket binding fails because a port is already in use, then
the process will first send a SIGTTOU signal to all the pids specified in the
"-st" or "-sf" pid list. This is what is called the "pause" signal. It instructs
all existing haproxy processes to temporarily stop listening to their ports so
that the new process can try to bind again. During this time, the old process
continues to process existing connections. If the binding still fails (because
for example a port is shared with another daemon), then the new process sends a
SIGTTIN signal to the old processes to instruct them to resume operations just
as if nothing happened. The old processes will then restart listening to the
ports and continue to accept connections. Note that this mechanism is system
dependent and some operating systems may not support it in multi-process mode.

If the new process manages to bind correctly to all ports, then it sends either
the SIGTERM (hard stop in case of "-st") or the SIGUSR1 (graceful stop in case
of "-sf") to all processes to notify them that it is now in charge of operations
and that the old processes will have to leave, either immediately or once they
have finished their job.

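The part of this sequence which is under the service script's control can be
sketched as follows. It is not taken from any actual init script : the function
name and the HAPROXY variable are assumptions made for illustration, and only
options documented in section 3 are used.

```shell
# Hypothetical reload helper. HAPROXY may be overridden to point to another
# binary; real init scripts are distribution-specific.
HAPROXY=${HAPROXY:-haproxy}

# haproxy_reload <cfgfile> <pidfile>
haproxy_reload() {
    cfg=$1
    pidfile=$2

    # Step 1: make sure the new configuration parses before touching the
    # running service.
    if ! "$HAPROXY" -c -q -f "$cfg"; then
        echo "configuration check failed, keeping the old process" >&2
        return 1
    fi

    # Step 2: collect the pids of the old processes; an empty list is fine.
    oldpids=
    if [ -r "$pidfile" ]; then
        oldpids=$(cat "$pidfile")
    fi

    # Step 3: start the new process. With "-sf" it is the new process which
    # sends SIGUSR1 to the old ones, and only after it managed to bind.
    # $oldpids is intentionally left unquoted: one argument per pid.
    "$HAPROXY" -f "$cfg" -D -p "$pidfile" -sf $oldpids
}
```

A call such as "haproxy_reload /etc/haproxy.cfg /var/run/haproxy.pid" returns a
non-zero status without disturbing the running process when the check fails.
Replacing "-sf" with "-st" would turn the graceful reload into a hard restart.
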
It is important to note that during this timeframe, there are two small windows
of a few milliseconds each where it is possible that a few connection failures
will be noticed during high loads. Typically observed failure rates are around
1 failure during a reload operation every 10000 new connections per second,
which means that a heavily loaded site running at 30000 new connections per
second may see about 3 failed connections upon every reload. The two situations
where this happens are :

  - if the new process fails to bind due to the presence of the old process,
    it will first have to go through the SIGTTOU+SIGTTIN sequence, which
    typically lasts about one millisecond for a few tens of frontends, and
    during which some ports will not be bound to the old process and not yet
    bound to the new one. HAProxy works around this on systems that support the
    SO_REUSEPORT socket option, as it allows the new process to bind without
    first asking the old one to unbind. Most BSD systems have been supporting
    this almost forever. Linux supported this in version 2.0 and dropped it
    around 2.2, but some patches were floating around by then. It was
    reintroduced in kernel 3.9, so if you are observing a connection failure
    rate above the one mentioned above, please ensure that your kernel is 3.9
    or newer, or that relevant patches were backported to your kernel (less
    likely).

  - when the old processes close the listening ports, the kernel may not always
    redistribute any pending connection that was remaining in the socket's
    backlog. Under high loads, a SYN packet may happen just before the socket
    is closed, and will lead to an RST packet being sent to the client. In some
    critical environments where even one drop is not acceptable, these ones are
    sometimes dealt with using firewall rules to block SYN packets during the
    reload, forcing the client to retransmit. This is totally system-dependent,
    as some systems might be able to visit other listening queues and avoid
    this RST. A second case concerns the ACK from the client on a local socket
    that was in SYN_RECV state just before the close. This ACK will lead to an
    RST packet while the haproxy process is still not aware of it. This one is
    harder to get rid of, though the firewall filtering rules mentioned above
    will work well if applied one second or so before restarting the process.

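The kernel version requirement mentioned in the first item can be checked from
a script. The sketch below only compares the version numbers : it cannot
detect backported patches, and the function name is hypothetical.

```shell
# kernel_has_reuseport <release> : hypothetical helper returning success when
# a Linux kernel release string (as printed by "uname -r") is at least 3.9,
# the version where SO_REUSEPORT was reintroduced according to the text above.
kernel_has_reuseport() {
    major=${1%%.*}            # "3.10.0-957.el7" -> "3"
    rest=${1#*.}              # "3.10.0-957.el7" -> "10.0-957.el7"
    minor=${rest%%[!0-9]*}    # "10.0-957.el7"   -> "10"
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "${minor:-0}" -ge 9 ]; }
}
```

On a Linux system it would typically be used as
"kernel_has_reuseport "$(uname -r)" || echo "kernel older than 3.9"".
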
For the vast majority of users, such drops will never ever happen since they
don't have enough load to trigger the race conditions. And for most high traffic
users, the failure rate is still well within the noise margin provided that at
least SO_REUSEPORT is properly supported on their systems.

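The failure rate quoted earlier in this section (about 1 failed connection per
reload for every 10000 new connections per second) is easy to apply to a given
site. The helper below merely reproduces this arithmetic ; its name is
hypothetical and the result is an order of magnitude, not a guarantee.

```shell
# expected_reload_failures <new_conn_per_sec> : rough number of failed
# connections per reload, using the "1 per 10000 new connections per second"
# figure quoted in this section (integer division).
expected_reload_failures() {
    echo $(( $1 / 10000 ))
}

expected_reload_failures 30000   # prints 3
```
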

5. File-descriptor limitations
------------------------------

In order to ensure that all incoming connections will successfully be served,
HAProxy computes at load time the total number of file descriptors that will be
needed during the process's life. A regular Unix process is generally granted
1024 file descriptors by default, and a privileged process can raise this limit
itself. This is one reason for starting HAProxy as root and letting it adjust
the limit. The default limit of 1024 file descriptors roughly allows about 500
concurrent connections to be processed. The computation is based on the global
maxconn parameter which limits the total number of connections per process, the
number of listeners, the number of servers which have a health check enabled,
the agent checks, the peers, the loggers and possibly a few other technical
requirements. A simple rough estimate of this number consists in simply
doubling the maxconn value and adding a few tens to get the approximate number
of file descriptors needed.

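The rough estimate above can be written as a one-line computation. In the
sketch below the function name is hypothetical and the default margin of 50
stands for "a few tens" ; it is an assumption, not a value used by haproxy.

```shell
# estimate_fds <maxconn> [<margin>] : rough estimate of the number of file
# descriptors needed, following the rule of thumb above: twice the global
# maxconn plus a margin (50 by default, an arbitrary choice).
estimate_fds() {
    echo $(( 2 * $1 + ${2:-50} ))
}

estimate_fds 2000      # prints 4050
estimate_fds 500 24    # prints 1024
```

The second call illustrates why the default limit of 1024 file descriptors
leaves room for about 500 concurrent connections only.
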
Originally HAProxy did not know how to compute this value, and it was necessary
|
|
to pass the value using the "ulimit-n" setting in the global section. This
|
|
explains why even today a lot of configurations are seen with this setting
|
|
present. Unfortunately it was often miscalculated resulting in connection
|
|
failures when approaching maxconn instead of throttling incoming connection
|
|
while waiting for the needed resources. For this reason it is important to
|
|
remove any vestigial "ulimit-n" setting that can remain from very old versions.
|
|
|
|
Raising the number of file descriptors to accept even moderate loads is
|
|
mandatory but comes with some OS-specific adjustments. First, the select()
|
|
polling system is limited to 1024 file descriptors. In fact on Linux it used
|
|
to be capable of handling more but since certain OS ship with excessively
|
|
restrictive SELinux policies forbidding the use of select() with more than
|
|
1024 file descriptors, HAProxy now refuses to start in this case in order to
|
|
avoid any issue at run time. On all supported operating systems, poll() is
|
|
available and will not suffer from this limitation. It is automatically picked
|
|
so there is nothing to do to get a working configuration. But poll's becomes
|
|
very slow when the number of file descriptors increases. While HAProxy does its
|
|
best to limit this performance impact (eg: via the use of the internal file
|
|
descriptor cache and batched processing), a good rule of thumb is that using
|
|
poll() with more than a thousand concurrent connections will use a lot of CPU.
|
|
|
|
For Linux systems base on kernels 2.6 and above, the epoll() system call will
|
|
be used. It's a much more scalable mechanism relying on callbacks in the kernel
|
|
that guarantee a constant wake up time regardless of the number of registered
|
|
monitored file descriptors. It is automatically used where detected, provided
|
|
that HAProxy had been built for one of the Linux flavors. Its presence and
|
|
support can be verified using "haproxy -vv".
|
|
|
|
For BSD systems which support it, kqueue() is available as an alternative. It
|
|
is much faster than poll() and even slightly faster than epoll() thanks to its
|
|
batched handling of changes. At least FreeBSD and OpenBSD support it. Just like
|
|
with Linux's epoll(), its support and availability are reported in the output
|
|
of "haproxy -vv".
|
|
|
|
Having a good poller is one thing, but it is mandatory that the process can
|
|
reach the limits. When HAProxy starts, it immediately sets the new process's
|
|
file descriptor limits and verifies if it succeeds. In case of failure, it
|
|
reports it before forking so that the administrator can see the problem. As
|
|
long as the process is started by as root, there should be no reason for this
|
|
setting to fail. However, it can fail if the process is started by an
|
|
unprivileged user. If there is a compelling reason for *not* starting haproxy
|
|
as root (eg: started by end users, or by a per-application account), then the
|
|
file descriptor limit can be raised by the system administrator for this
|
|
specific user. The effectiveness of the setting can be verified by issuing
|
|
"ulimit -n" from the user's command line. It should reflect the new limit.
|
|
|
|
Warning: when an unprivileged user's limits are changed in this user's account,
|
|
it is fairly common that these values are only considered when the user logs in
|
|
and not at all in some scripts run at system boot time nor in crontabs. This is
|
|
totally dependent on the operating system, keep in mind to check "ulimit -n"
|
|
before starting haproxy when running this way. The general advice is never to
|
|
start haproxy as an unprivileged user for production purposes. Another good
|
|
reason is that it prevents haproxy from enabling some security protections.
|
|
|
|
Once it is certain that the system will allow the haproxy process to use the
|
|
requested number of file descriptors, two new system-specific limits may be
|
|
encountered. The first one is the system-wide file descriptor limit, which is
|
|
the total number of file descriptors opened on the system, covering all
|
|
processes. When this limit is reached, accept() or socket() will typically
|
|
return ENFILE. The second one is the per-process hard limit on the number of
|
|
file descriptors, which setrlimit() cannot exceed. Both are very
|
|
dependent on the operating system. On Linux, the system limit is set at boot
|
|
based on the amount of memory. It can be changed with the "fs.file-max" sysctl.
|
|
And the per-process hard limit is set to 1048576 by default, but it can be
|
|
changed using the "fs.nr_open" sysctl.
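
The limits described above can be compared with the number of file descriptors
a configuration is expected to need. The sketch below is only an illustration
and is not part of HAProxy : the helper names are hypothetical, and it assumes
a Linux system where the sysctls are readable under /proc/sys.

```python
import resource


def read_sysctl(name):
    """Return an integer sysctl such as "fs/file-max", or None if unreadable."""
    try:
        with open("/proc/sys/" + name) as f:
            return int(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None  # not Linux, or the value cannot be read


def check_fd_limits(needed, soft, hard, nr_open=None, file_max=None):
    """Return a list of problems; an empty list means all limits are fine."""
    problems = []
    if soft != resource.RLIM_INFINITY and needed > soft:
        problems.append("soft limit %d is below %d" % (soft, needed))
    if hard != resource.RLIM_INFINITY and needed > hard:
        problems.append("hard limit %d is below %d" % (hard, needed))
    if nr_open is not None and needed > nr_open:
        problems.append("fs.nr_open %d is below %d" % (nr_open, needed))
    if file_max is not None and needed > file_max:
        problems.append("fs.file-max %d is below %d" % (file_max, needed))
    return problems


def check_current_process(needed):
    """Apply check_fd_limits() to the limits of the calling process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return check_fd_limits(needed, soft, hard,
                           read_sysctl("fs/nr_open"),
                           read_sysctl("fs/file-max"))
```

Calling check_current_process() from the account that starts haproxy reports
the same soft limit as "ulimit -n" does for that account.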
|
|
|
|
File descriptor limitations may be observed on a running process when they are
|
|
set too low. The strace utility will report that accept() and socket() return
|
|
"-1 EMFILE" when the process's limits have been reached. In this case, simply
|
|
raising the "ulimit-n" value (or removing it) will solve the problem. If these
|
|
system calls return "-1 ENFILE" then it means that the kernel's limits have
|
|
been reached and that something must be done on a system-wide parameter. These
|
|
issues must absolutely be addressed, as they result in high CPU usage (when
|
|
accept() fails) and failed connections that are generally visible to the user.
|
|
One solution also consists in lowering the global maxconn value to enforce
|
|
serialization, and possibly to disable HTTP keep-alive to force connections
|
|
to be released and reused faster.
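
When a long strace capture has to be reviewed, the two error codes can be
counted automatically. This is a minimal sketch under stated assumptions : the
function name and the advice strings are hypothetical, and the regular
expression only targets the usual "call(args) = -1 ECODE (text)" layout that
strace prints.

```python
import re
from collections import Counter

# Matches a failed accept(), accept4() or socket() call in strace output.
FAILED_CALL = re.compile(r"(accept4?|socket)\(.*\)\s*=\s*-1\s+(E[A-Z]+)")

# What each error code means according to the paragraph above.
ADVICE = {
    "EMFILE": "per-process limit reached: raise or remove ulimit-n",
    "ENFILE": "system-wide limit reached: tune a system-wide parameter",
}


def count_fd_errors(lines):
    """Count EMFILE and ENFILE failures found in strace output lines."""
    counts = Counter()
    for line in lines:
        match = FAILED_CALL.search(line)
        if match and match.group(2) in ADVICE:
            counts[match.group(2)] += 1
    return counts
```

Each key of the returned counter can then be looked up in ADVICE to decide
whether the process limit or the system limit has to be raised.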
|
|
|
|
|
|
6. Memory management
|
|
--------------------
|
|
|
|
HAProxy uses a simple and fast pool-based memory management. Since it relies on
|
|
a small number of different object types, it's much more efficient to pick new
|
|
objects from a pool which already contains objects of the appropriate size than
|
|
to call malloc() for each different size. The pools are organized as a stack or
|
|
LIFO, so that newly allocated objects are taken from recently released objects
|
|
still hot in the CPU caches. Pools of similar sizes are merged together, in
|
|
order to limit memory fragmentation.
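
The LIFO principle can be illustrated with a toy pool. This is not HAProxy's
implementation (which is written in C); the class and attribute names are
hypothetical and only show why the most recently released object is the first
one to be handed out again.

```python
class Pool:
    """Toy LIFO pool of fixed-size buffers (illustration only)."""

    def __init__(self, size):
        self.size = size
        self.free = []       # stack of released objects
        self.allocated = 0   # objects currently owned by the pool
        self.used = 0        # objects currently handed out

    def alloc(self):
        self.used += 1
        if self.free:
            return self.free.pop()   # most recently released object first
        self.allocated += 1
        return bytearray(self.size)  # nothing to reuse: create a new object

    def release(self, obj):
        self.used -= 1
        self.free.append(obj)        # kept for reuse instead of being freed

    def flush(self):
        """Drop all unused objects and return how many were dropped."""
        dropped = len(self.free)
        self.allocated -= dropped
        self.free.clear()
        return dropped
```

With this layout, "allocated" minus "used" is always the number of objects
waiting on the stack.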
|
|
|
|
By default, since the focus is set on performance, each released object is put
|
|
back into the pool it came from, and allocated objects are never freed since
|
|
they are expected to be reused very soon.
|
|
|
|
On the CLI, it is possible to check how memory is being used in pools thanks to
|
|
the "show pools" command :
|
|
|
|
> show pools
|
|
Dumping pools usage. Use SIGQUIT to flush them.
|
|
- Pool pipe (32 bytes) : 5 allocated (160 bytes), 5 used, 3 users [SHARED]
|
|
- Pool hlua_com (48 bytes) : 0 allocated (0 bytes), 0 used, 1 users [SHARED]
|
|
- Pool vars (64 bytes) : 0 allocated (0 bytes), 0 used, 2 users [SHARED]
|
|
- Pool task (112 bytes) : 5 allocated (560 bytes), 5 used, 1 users [SHARED]
|
|
- Pool session (128 bytes) : 1 allocated (128 bytes), 1 used, 2 users [SHARED]
|
|
- Pool http_txn (272 bytes) : 0 allocated (0 bytes), 0 used, 1 users [SHARED]
|
|
- Pool connection (352 bytes) : 2 allocated (704 bytes), 2 used, 1 users [SHARED]
|
|
- Pool hdr_idx (416 bytes) : 0 allocated (0 bytes), 0 used, 1 users [SHARED]
|
|
- Pool stream (864 bytes) : 1 allocated (864 bytes), 1 used, 1 users [SHARED]
|
|
- Pool requri (1024 bytes) : 0 allocated (0 bytes), 0 used, 1 users [SHARED]
|
|
- Pool buffer (8064 bytes) : 3 allocated (24192 bytes), 2 used, 1 users [SHARED]
|
|
Total: 11 pools, 26608 bytes allocated, 18544 used.
|
|
|
|
The pool name is only indicative, it's the name of the first object type using
|
|
this pool. The size in parentheses is the object size for objects in this pool.
|
|
Object sizes are always rounded up to the closest multiple of 16 bytes. The
|
|
number of objects currently allocated and the equivalent number of bytes are
|
|
reported so that it is easy to know which pool is responsible for the highest
|
|
memory usage. The number of objects currently in use is reported as well in the
|
|
"used" field. The difference between "allocated" and "used" corresponds to the
|
|
objects that have been freed and are available for immediate use.
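
The fields explained above are easy to extract with a script. The following
sketch assumes the exact line layout of the dump shown earlier; the function
names are hypothetical. It also shows the rounding of object sizes to a
multiple of 16 bytes.

```python
import re

# One pool line, for example:
# - Pool buffer (8064 bytes) : 3 allocated (24192 bytes), 2 used, ...
POOL_LINE = re.compile(
    r"-\s*Pool\s+(\S+)\s+\((\d+)\s+bytes\)\s*:\s*(\d+)\s+allocated\s+"
    r"\((\d+)\s+bytes\),\s*(\d+)\s+used")


def round_up16(size):
    """Round an object size up to the closest multiple of 16 bytes."""
    return (size + 15) // 16 * 16


def parse_pools(text):
    """Return one dict per pool line found in a "show pools" dump."""
    pools = []
    for line in text.splitlines():
        match = POOL_LINE.search(line)
        if not match:
            continue  # header or "Total:" line
        size, allocated, alloc_bytes, used = map(int, match.group(2, 3, 4, 5))
        idle = allocated - used  # freed objects available for immediate use
        pools.append({
            "name": match.group(1),
            "size": size,
            "allocated": allocated,
            "allocated_bytes": alloc_bytes,
            "used": used,
            "idle": idle,
            "idle_bytes": idle * size,
        })
    return pools
```

Applied to the dump above, only the "buffer" pool holds an idle object, which
accounts for the 8064 bytes separating the two totals (26608 - 18544).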
|
|
|
|
It is possible to limit the amount of memory allocated per process using the
|
|
"-m" command line option, followed by a number of megabytes. It covers all of
|
|
the process's addressable space, so that includes memory used by some libraries
|
|
as well as the stack, but it is a reliable limit when building a resource
|
|
constrained system. It works the same way as "ulimit -v" on systems which have
|
|
it, or "ulimit -d" for the other ones.
|
|
|
|
If a memory allocation fails due to the memory limit being reached or because
|
|
the system doesn't have enough memory, then haproxy will first start to
|
|
free all available objects from all pools before attempting to allocate memory
|
|
again. This mechanism of releasing unused memory can be triggered by sending
|
|
the signal SIGQUIT to the haproxy process. When doing so, the pools state prior
|
|
to the flush will also be reported to stderr when the process runs in
|
|
foreground.
|
|
|
|
During a reload operation, the process switched to the graceful stop state also
|
|
automatically performs some flushes after releasing any connection so that all
|
|
possible memory is released to save it for the new process.
|
|
|
|
|
|
7. CPU usage
|
|
------------
|
|
|
|
HAProxy normally spends most of its time in the system and a smaller part in
|
|
userland. A finely tuned 3.5 GHz CPU can sustain about 80000 end-to-end
|
|
connection setups and closes per second at 100% CPU on a single core. When one
|
|
core is saturated, typical figures are :
|
|
- 95% system, 5% user for long TCP connections or large HTTP objects
|
|
- 85% system and 15% user for short TCP connections or small HTTP objects in
|
|
close mode
|
|
- 70% system and 30% user for small HTTP objects in keep-alive mode
|
|
|
|
The amount of rules processing and regular expressions will increase the user
|
|
land part. The presence of firewall rules, connection tracking, complex routing
|
|
tables in the system will instead increase the system part.
|
|
|
|
On most systems, the CPU time observed during network transfers can be cut in 4
|
|
parts :
|
|
- the interrupt part, which concerns all the processing performed upon I/O
|
|
receipt, before the target process is even known. Typically Rx packets are
|
|
accounted for in interrupt. On some systems such as Linux where interrupt
|
|
processing may be deferred to a dedicated thread, it can appear as softirq,
|
|
and the thread is called ksoftirqd/0 (for CPU 0). The CPU taking care of
|
|
this load is generally defined by the hardware settings, though in the case
|
|
of softirq it is often possible to remap the processing to another CPU.
|
|
This interrupt part will often be perceived as parasitic since it's not
|
|
associated with any process, but it actually is some processing being done
|
|
to prepare the work for the process.
|
|
|
|
- the system part, which concerns all the processing done using kernel code
|
|
called from userland. System calls are accounted as system for example. All
|
|
synchronously delivered Tx packets will be accounted for as system time. If
|
|
some packets have to be deferred due to queues filling up, they may then be
|
|
processed in interrupt context later (eg: upon receipt of an ACK opening a
|
|
TCP window).
|
|
|
|
- the user part, which exclusively runs application code in userland. HAProxy
|
|
runs exclusively in this part, though it makes heavy use of system calls.
|
|
Rules processing, regular expressions, compression, encryption all add to
|
|
the user portion of CPU consumption.
|
|
|
|
- the idle part, which is what the CPU does when there is nothing to do. For
|
|
example HAProxy waits for an incoming connection, or waits for some data to
|
|
leave, meaning the system is waiting for an ACK from the client to push
|
|
these data.
|
|
|
|
In practice regarding HAProxy's activity, it is in general reasonably accurate
|
|
(but totally inexact) to consider that interrupt/softirq are caused by Rx
|
|
processing in kernel drivers, that user-land is caused by layer 7 processing
|
|
in HAProxy, and that system time is caused by network processing on the Tx
|
|
path.
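
On Linux, the four parts can be measured by sampling the first line of
/proc/stat twice and comparing the counters. The sketch below assumes the
usual field order of that line (user, nice, system, idle, iowait, irq,
softirq); the function names and the grouping into four parts are choices made
for this illustration.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")


def parse_cpu_line(line):
    """Turn a "cpu ..." line from /proc/stat into a dict of tick counters."""
    parts = line.split()
    return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))


def cpu_shares(before, after):
    """Return the percentage of time spent in each part between two samples."""
    delta = {key: after[key] - before[key] for key in FIELDS}
    total = sum(delta.values()) or 1  # avoid dividing by zero
    return {
        "interrupt": 100.0 * (delta["irq"] + delta["softirq"]) / total,
        "system": 100.0 * delta["system"] / total,
        "user": 100.0 * (delta["user"] + delta["nice"]) / total,
        "idle": 100.0 * (delta["idle"] + delta["iowait"]) / total,
    }


def read_cpu_sample(path="/proc/stat"):
    """Read the aggregated "cpu" line of the running system (Linux only)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())
```

Two calls to read_cpu_sample() taken a few seconds apart and passed to
cpu_shares() give figures comparable to the interrupt, system, user and idle
columns reported by common monitoring tools.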
|
|
|
|
Since HAProxy runs around an event loop, it waits for new events using poll()
|
|
(or any alternative) and processes all these events as fast as possible before
|
|
going back to poll() waiting for new events. It measures the time spent waiting
|
|
in poll() compared to the time spent processing events. The ratio of
|
|
polling time vs total time is called the "idle" time, it's the amount of time
|
|
spent waiting for something to happen. This ratio is reported in the stats page
|
|
on the "idle" line, or "Idle_pct" on the CLI. When it's close to 100%, it means
|
|
the load is extremely low. When it's close to 0%, it means that there is
|
|
constantly some activity. While it cannot be very accurate on an overloaded
|
|
system due to other processes possibly preempting the CPU from the haproxy
|
|
process, it still provides a good estimate about how HAProxy considers it is
|
|
working : if the load is low and the idle ratio is low as well, it may indicate
|
|
that HAProxy has a lot of work to do, possibly due to very expensive rules that
|
|
have to be processed. Conversely, if HAProxy indicates the idle is close to
|
|
100% while things are slow, it means that it cannot do anything to speed things
|
|
up because it is already waiting for incoming data to process. In the example
|
|
below, haproxy is completely idle :
|
|
|
|
$ echo "show info" | socat - /var/run/haproxy.sock | grep ^Idle
|
|
Idle_pct: 100
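
The same check can be scripted. The sketch below is an illustration only : the
function names are hypothetical, the socket path has to match the one of the
running instance, and idle_ratio() merely restates the ratio described above,
it is not HAProxy's internal computation.

```python
import socket


def parse_info(text):
    """Turn "Name: value" lines from "show info" into a dict of strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def idle_ratio(poll_time, work_time):
    """Percentage of the total time that was spent waiting in the poller."""
    total = poll_time + work_time
    return 100 if total == 0 else round(100.0 * poll_time / total)


def query_cli(path, command):
    """Send one command to the stats socket and return the whole reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(command.encode() + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break  # the reply is complete once the socket is closed
            chunks.append(data)
    return b"".join(chunks).decode()
```

For example, parse_info(query_cli("/var/run/haproxy.sock", "show info")) gives
a dictionary whose "Idle_pct" entry holds the value shown above.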
|
|
|
|
When the idle ratio starts to become very low, it is important to tune the
|
|
system and place processes and interrupts correctly to save the most possible
|
|
CPU resources for all tasks. If a firewall is present, it may be worth trying
|
|
to disable it or to tune it to ensure it is not responsible for a large part
|
|
of the performance limitation. It's worth noting that unloading a stateful
|
|
firewall generally reduces both the amount of interrupt/softirq and of system
|
|
usage since such firewalls act both on the Rx and the Tx paths. On Linux,
|
|
unloading the nf_conntrack and ip_conntrack modules will show whether there is
|
|
anything to gain. If so, then the module runs with default settings and you'll
|
|
have to figure how to tune it for better performance. In general this consists
|
|
in considerably increasing the hash table size. On FreeBSD, "pfctl -d" will
|
|
disable the "pf" firewall and its stateful engine at the same time.
|
|
|
|
If it is observed that a lot of time is spent in interrupt/softirq, it is
|
|
important to ensure that they don't run on the same CPU. Most systems tend to
|
|
pin the tasks on the CPU where they receive the network traffic because for
|
|
certain workloads it improves things. But with heavily network-bound workloads
|
|
it is the opposite as the haproxy process will have to fight against its kernel
|
|
counterpart. Pinning haproxy to one CPU core and the interrupts to another one,
|
|
all sharing the same L3 cache tends to noticeably increase network performance
|
|
because in practice the amount of work for haproxy and the network stack are
|
|
quite close, so they can almost fill an entire CPU each. On Linux this is done
|
|
using taskset (for haproxy) or using cpu-map (from the haproxy config), and the
|
|
interrupts are assigned under /proc/irq. Many network interfaces support
|
|
multiple queues and multiple interrupts. In general it helps to spread them
|
|
across a small number of CPU cores provided they all share the same L3 cache.
|
|
Please always stop irq_balance which always does the worst possible thing on
|
|
such workloads.
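
The interrupt affinity files under /proc/irq expect a hexadecimal CPU bitmask.
The helpers below are a sketch with hypothetical names; they assume a Linux
system with at most 32 CPUs (larger masks use a grouped notation which is not
handled here), and writing the affinity requires root privileges.

```python
import os


def cpu_mask(cpus):
    """Return the hexadecimal bitmask selecting the given CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def pin_process(pid, cpus):
    """Restrict a process to the given CPUs, similar to what taskset does."""
    os.sched_setaffinity(pid, set(cpus))  # Linux only


def pin_irq(irq, cpus):
    """Write the CPU mask of one interrupt under /proc/irq (requires root)."""
    with open("/proc/irq/%d/smp_affinity" % irq, "w") as f:
        f.write(cpu_mask(cpus))
```

For instance cpu_mask([1]) gives "2", which selects CPU 1 only, so haproxy
could be kept on CPU 0 while the network interrupts are moved to CPU 1.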
|
|
|
|
For CPU-bound workloads consisting in a lot of SSL traffic or a lot of
|
|
compression, it may be worth using multiple processes dedicated to certain
|
|
tasks, though there is no universal rule here and experimentation will have to
|
|
be performed.
|
|
|
|
In order to increase the CPU capacity, it is possible to make HAProxy run as
|
|
several processes, using the "nbproc" directive in the global section. There
|
|
are some limitations though :
|
|
- health checks are run per process, so the target servers will get as many
|
|
checks as there are running processes ;
|
|
- maxconn values and queues are per-process so the correct value must be set
|
|
to avoid overloading the servers ;
|
|
- outgoing connections should avoid using port ranges to avoid conflicts ;
|
|
- stick-tables are per process and are not shared between processes ;
|
|
- each peers section may only run on a single process at a time ;
|
|
- the CLI operations will only act on a single process at a time.
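
The first two limitations translate into simple arithmetic. The helpers below
are hypothetical and assume that the load is evenly spread across processes,
which is not guaranteed in practice.

```python
def per_process_maxconn(server_capacity, nbproc):
    """Per-process server maxconn so that the sum stays within capacity."""
    return server_capacity // nbproc


def checks_per_interval(servers, nbproc):
    """Number of health checks sent during one check interval."""
    return servers * nbproc
```

For a server able to handle 100 concurrent connections and 4 processes, each
process should be limited to 25 connections, and 10 servers will receive a
total of 40 checks per interval instead of 10.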
|
|
|
|
With this in mind, it appears that the easiest setup often consists in having
|
|
one first layer running on multiple processes and in charge of the heavy
|
|
processing, passing the traffic to a second layer running in a single process.
|
|
This mechanism is suited to SSL and compression which are the two CPU-heavy
|
|
features. Instances can easily be chained over UNIX sockets (which are cheaper
|
|
than TCP sockets and which do not waste ports), and the proxy protocol which is
|
|
useful to pass client information to the next stage. When doing so, it is
|
|
generally a good idea to bind all the single-process tasks to process number 1
|
|
and extra tasks to next processes, as this will make it easier to generate
|
|
similar configurations for different machines.
|
|
|
|
On Linux versions 3.9 and above, running HAProxy in multi-process mode is much
|
|
more efficient when each process uses a distinct listening socket on the same
|
|
IP:port ; this will make the kernel evenly distribute the load across all
|
|
processes instead of waking them all up. Please check the "process" option of
|
|
the "bind" keyword lines in the configuration manual for more information.
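
The kernel feature involved can be observed outside of HAProxy. The sketch
below assumes that the mechanism is the SO_REUSEPORT socket option, which
Linux provides since version 3.9; this text does not name the option, and the
function name is hypothetical. It opens several listening sockets on the same
address and port, the way distinct processes would.

```python
import socket


def reuseport_listeners(count, host="127.0.0.1"):
    """Open several listening sockets sharing a single IP:port."""
    socks = []
    port = 0  # let the kernel pick a free port for the first socket
    for _ in range(count):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # The option must be set on every socket before bind().
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        sock.bind((host, port))
        sock.listen(16)
        port = sock.getsockname()[1]  # reuse the same port for the next one
        socks.append(sock)
    return socks
```

Without the option, the second bind() on the same port would fail with
"Address already in use".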
|
|
|
|
|
|
8. Logging
|
|
----------
|
|
|
|
For logging, HAProxy always relies on a syslog server since it does not perform
|
|
any file-system access. The standard way of using it is to send logs over UDP
|
|
to the log server (by default on port 514). Very commonly this is configured to
|
|
127.0.0.1 where the local syslog daemon is running, but it's also used over the
|
|
network to log to a central server. The central server provides additional
|
|
benefits especially in active-active scenarios where it is desirable to keep
|
|
the logs merged in arrival order. HAProxy may also make use of a UNIX socket to
|
|
send its logs to the local syslog daemon, but it is not recommended at all,
|
|
because if the syslog server is restarted while haproxy runs, the socket will
|
|
be replaced and new logs will be lost. Since HAProxy will be isolated inside a
|
|
chroot jail, it will not have the ability to reconnect to the new socket. It
|
|
has also been observed in the field that the log buffers in use on UNIX sockets
are very small and lead to lost messages even at very light loads. This can be
fine for testing, however.
|
|
|
|
It is recommended to add the following directive to the "global" section to
|
|
make HAProxy log to the local daemon using facility "local0" :
|
|
|
|
log 127.0.0.1:514 local0
|
|
|
|
and then to add the following one to each "defaults" section or to each frontend
|
|
and backend section :
|
|
|
|
log global
|
|
|
|
This way, all logs will be centralized through the global definition of where
|
|
the log server is.
|
|
|
|
Some syslog daemons do not listen to UDP traffic by default, so depending on
|
|
the daemon being used, the syntax to enable this will vary :
|
|
|
|
- on sysklogd, you need to pass argument "-r" on the daemon's command line
|
|
so that it listens to a UDP socket for "remote" logs ; note that there is
|
|
no way to limit it to address 127.0.0.1 so it will also receive logs from
|
|
remote systems ;
|
|
|
|
- on rsyslogd, the following lines must be added to the configuration file :
|
|
|
|
$ModLoad imudp
|
|
$UDPServerAddress *
|
|
$UDPServerRun 514
|
|
|
|
- on syslog-ng, a new source can be created the following way, it then needs
|
|
to be added as a valid source in one of the "log" directives :
|
|
|
|
source s_udp {
|
|
udp(ip(127.0.0.1) port(514));
|
|
};
|
|
|
|
Please consult your syslog daemon's manual for more information. If no logs are
|
|
seen in the system's log files, please consider the following tests :
|
|
|
|
- restart haproxy. Each frontend and backend logs one line indicating it's
|
|
starting. If these logs are received, it means logs are working.
|
|
|
|
- run "strace -tt -s100 -etrace=sendmsg -p <haproxy's pid>" and perform some
|
|
activity that you expect to be logged. You should see the log messages
|
|
being sent using sendmsg() there. If they don't appear, restart using
|
|
strace on top of haproxy. If you still see no logs, it definitely means
|
|
that something is wrong in your configuration.
|
|
|
|
- run tcpdump to watch for port 514, for example on the loopback interface if
|
|
the traffic is being sent locally : "tcpdump -As0 -ni lo port 514". If the
|
|
packets are seen there, it proves that they are sent, and the syslog daemon
then needs to be troubleshot.
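
To test the syslog daemon independently of haproxy, a hand-made message can be
sent to the same UDP port. The sketch below builds a minimal BSD-style syslog
packet where the priority is facility * 8 + severity (local0 is facility 16).
The function names and the "logtest" tag are hypothetical, and the message is
deliberately simpler than what haproxy emits (no timestamp).

```python
import socket

FACILITIES = {"local0": 16, "local1": 17}
SEVERITIES = {"emerg": 0, "alert": 1, "crit": 2, "err": 3,
              "warning": 4, "notice": 5, "info": 6, "debug": 7}


def syslog_pri(facility, severity):
    """Compute the numeric priority placed between "<" and ">"."""
    return FACILITIES[facility] * 8 + SEVERITIES[severity]


def build_message(facility, severity, tag, text):
    """Build the datagram payload of a minimal syslog message."""
    return ("<%d>%s: %s" % (syslog_pri(facility, severity), tag, text)).encode()


def send_test_log(text, host="127.0.0.1", port=514):
    """Send one local0.info message over UDP to the syslog daemon."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(build_message("local0", "info", "logtest", text),
                    (host, port))
```

If a message sent with send_test_log() shows up in tcpdump but not in the log
files, the problem lies in the syslog daemon's configuration.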
|
|
|
|
While traffic logs are sent from the frontends (where the incoming connections
|
|
are accepted), backends also need to be able to send logs in order to report a
|
|
server state change consecutive to a health check. Please consult HAProxy's
|
|
configuration manual for more information regarding all possible log settings.
|
|
|
|
It is convenient to choose a facility that is not used by other daemons. HAProxy
examples often suggest "local0" for traffic logs and "local1" for admin logs
because they're never seen in the field. A single facility would be enough as
well. Having separate logs is convenient for log analysis, but it's also
important to remember that logs may sometimes convey confidential information,
and as such they must not be mixed with other logs that may accidentally be
handed out to unauthorized people.
|
|
|
|
For in-field troubleshooting without impacting the server's capacity too much,
|
|
it is recommended to make use of the "halog" utility provided with HAProxy.
|
|
This is sort of a grep-like utility designed to process HAProxy log files at
|
|
a very fast data rate. Typical figures range between 1 and 2 GB of logs per
|
|
second. It is capable of extracting only certain logs (eg: search for some
|
|
classes of HTTP status codes, connection termination status, search by response
|
|
time ranges, look for errors only), count lines, limit the output to a number
|
|
of lines, and perform some more advanced statistics such as sorting servers
|
|
by response time or error counts, sorting URLs by time or count, sorting client
|
|
addresses by access count, and so on. It is pretty convenient to quickly spot
|
|
anomalies such as a bot looping on the site, and block them.
|
|
|
|
|
|
9. Statistics and monitoring
|
|
----------------------------
|
|
|
|
It is possible to query HAProxy about its status. The most commonly used
|
|
mechanism is the HTTP statistics page. This page also exposes an alternative
|
|
CSV output format for monitoring tools. The same format is provided on the
|
|
Unix socket.
|
|
|
|
|
|
9.1. CSV format
|
|
---------------
|
|
|
|
The statistics may be consulted either from the unix socket or from the HTTP
|
|
page. Both means provide a CSV format whose fields follow. The first line
|
|
begins with a sharp ('#') and has one word per comma-delimited field which
|
|
represents the title of the column. All other lines starting at the second one
|
|
use a classical CSV format using a comma as the delimiter, and the double quote
|
|
('"') as an optional text delimiter, but only if the enclosed text is ambiguous
|
|
(if it contains a quote or a comma). The double-quote character ('"') in the
|
|
text is doubled ('""'), which is the format that most tools recognize. Please
|
|
do not insert any column before these ones in order not to break tools which
|
|
use hard-coded column positions.
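
The layout described above can be read with any standard CSV parser. The
sketch below uses column names instead of hard-coded positions; the function
name and the sample data are hypothetical (the sample is shortened to five
columns and uses an artificial server name to exercise the quoting).

```python
import csv
import io

# Shortened, made-up sample: real dumps contain many more columns.
SAMPLE = (
    '# pxname,svname,qcur,qmax,scur,\n'
    'www,FRONTEND,,,3,\n'
    'www,"srv,1",0,0,1,\n'
)


def parse_stats(text):
    """Return one dict per data line, keyed by the column titles."""
    rows = list(csv.reader(io.StringIO(text)))
    names = rows[0]
    names[0] = names[0].lstrip("# ")  # the title line begins with a sharp
    result = []
    for row in rows[1:]:
        if not row:
            continue  # skip empty lines
        # Columns without a title (such as a trailing empty one) are dropped.
        result.append({name: value
                       for name, value in zip(names, row) if name})
    return result
```

Fields which do not apply to a given line type are simply empty strings, as
for "qcur" on the frontend line of the sample.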
|
|
|
|
In brackets after each field name are the types which may have a value for
|
|
that field. The types are L (Listeners), F (Frontends), B (Backends), and
|
|
S (Servers).
|
|
|
|
0. pxname [LFBS]: proxy name
|
|
1. svname [LFBS]: service name (FRONTEND for frontend, BACKEND for backend,
|
|
any name for server/listener)
|
|
2. qcur [..BS]: current queued requests. For the backend this reports the
|
|
number queued without a server assigned.
|
|
3. qmax [..BS]: max value of qcur
|
|
4. scur [LFBS]: current sessions
|
|
5. smax [LFBS]: max sessions
|
|
6. slim [LFBS]: configured session limit
|
|
7. stot [LFBS]: cumulative number of sessions
|
|
8. bin [LFBS]: bytes in
|
|
9. bout [LFBS]: bytes out
|
|
10. dreq [LFB.]: requests denied because of security concerns.
|
|
- For tcp this is because of a matched tcp-request content rule.
|
|
- For http this is because of a matched http-request or tarpit rule.
|
|
11. dresp [LFBS]: responses denied because of security concerns.
|
|
- For http this is because of a matched http-request rule, or
|
|
"option checkcache".
|
|
12. ereq [LF..]: request errors. Some of the possible causes are:
|
|
- early termination from the client, before the request has been sent.
|
|
- read error from the client
|
|
- client timeout
|
|
- client closed connection
|
|
- various bad requests from the client.
|
|
- request was tarpitted.
|
|
13. econ [..BS]: number of requests that encountered an error trying to
|
|
connect to a backend server. The backend stat is the sum of the stat
|
|
for all servers of that backend, plus any connection errors not
|
|
associated with a particular server (such as the backend having no
|
|
active servers).
|
|
14. eresp [..BS]: response errors. srv_abrt will be counted here also.
|
|
Some other errors are:
|
|
- write error on the client socket (won't be counted for the server stat)
|
|
- failure applying filters to the response.
|
|
15. wretr [..BS]: number of times a connection to a server was retried.
|
|
16. wredis [..BS]: number of times a request was redispatched to another
|
|
server. The server value counts the number of times that server was
|
|
switched away from.
|
|
17. status [LFBS]: status (UP/DOWN/NOLB/MAINT/MAINT(via)...)
|
|
18. weight [..BS]: total weight (backend), server weight (server)
|
|
19. act [..BS]: number of active servers (backend), server is active (server)
|
|
20. bck [..BS]: number of backup servers (backend), server is backup (server)
|
|
21. chkfail [...S]: number of failed checks. (Only counts checks failed when
|
|
the server is up.)
|
|
22. chkdown [..BS]: number of UP->DOWN transitions. The backend counter counts
|
|
transitions to the whole backend being down, rather than the sum of the
|
|
counters for each server.
|
|
23. lastchg [..BS]: number of seconds since the last UP<->DOWN transition
|
|
24. downtime [..BS]: total downtime (in seconds). The value for the backend
|
|
is the downtime for the whole backend, not the sum of the server downtime.
|
|
25. qlimit [...S]: configured maxqueue for the server, or nothing if the
|
|
value is 0 (default, meaning no limit)
|
|
26. pid [LFBS]: process id (0 for first instance, 1 for second, ...)
|
|
27. iid [LFBS]: unique proxy id
|
|
28. sid [L..S]: server id (unique inside a proxy)
|
|
29. throttle [...S]: current throttle percentage for the server, when
|
|
slowstart is active, or no value if not in slowstart.
|
|
30. lbtot [..BS]: total number of times a server was selected, either for new
|
|
sessions, or when re-dispatching. The server counter is the number
|
|
of times that server was selected.
|
|
31. tracked [...S]: id of proxy/server if tracking is enabled.
|
|
32. type [LFBS]: (0=frontend, 1=backend, 2=server, 3=socket/listener)
|
|
33. rate [.FBS]: number of sessions per second over last elapsed second
|
|
34. rate_lim [.F..]: configured limit on new sessions per second
|
|
35. rate_max [.FBS]: max number of new sessions per second
|
|
36. check_status [...S]: status of last health check, one of:
|
|
UNK -> unknown
|
|
INI -> initializing
|
|
SOCKERR -> socket error
|
|
L4OK -> check passed on layer 4, no upper layers testing enabled
|
|
L4TOUT -> layer 1-4 timeout
|
|
L4CON -> layer 1-4 connection problem, for example
|
|
"Connection refused" (tcp rst) or "No route to host" (icmp)
|
|
L6OK -> check passed on layer 6
|
|
L6TOUT -> layer 6 (SSL) timeout
|
|
L6RSP -> layer 6 invalid response - protocol error
|
|
L7OK -> check passed on layer 7
|
|
L7OKC -> check conditionally passed on layer 7, for example 404 with
|
|
disable-on-404
|
|
L7TOUT -> layer 7 (HTTP/SMTP) timeout
|
|
L7RSP -> layer 7 invalid response - protocol error
|
|
L7STS -> layer 7 response error, for example HTTP 5xx
|
|
37. check_code [...S]: layer5-7 code, if available
|
|
38. check_duration [...S]: time in ms taken to finish the last health check
|
|
39. hrsp_1xx [.FBS]: http responses with 1xx code
|
|
40. hrsp_2xx [.FBS]: http responses with 2xx code
|
|
41. hrsp_3xx [.FBS]: http responses with 3xx code
|
|
42. hrsp_4xx [.FBS]: http responses with 4xx code
|
|
43. hrsp_5xx [.FBS]: http responses with 5xx code
|
|
44. hrsp_other [.FBS]: http responses with other codes (protocol error)
|
|
45. hanafail [...S]: failed health checks details
|
|
46. req_rate [.F..]: HTTP requests per second over last elapsed second
|
|
47. req_rate_max [.F..]: max number of HTTP requests per second observed
|
|
48. req_tot [.F..]: total number of HTTP requests received
|
|
49. cli_abrt [..BS]: number of data transfers aborted by the client
|
|
50. srv_abrt [..BS]: number of data transfers aborted by the server
|
|
(inc. in eresp)
|
|
51. comp_in [.FB.]: number of HTTP response bytes fed to the compressor
|
|
52. comp_out [.FB.]: number of HTTP response bytes emitted by the compressor
|
|
53. comp_byp [.FB.]: number of bytes that bypassed the HTTP compressor
|
|
(CPU/BW limit)
|
|
54. comp_rsp [.FB.]: number of HTTP responses that were compressed
|
|
55. lastsess [..BS]: number of seconds since last session assigned to
|
|
server/backend
|
|
56. last_chk [...S]: last health check contents or textual error
|
|
57. last_agt [...S]: last agent check contents or textual error
|
|
58. qtime [..BS]: the average queue time in ms over the 1024 last requests
|
|
59. ctime [..BS]: the average connect time in ms over the 1024 last requests
|
|
60. rtime [..BS]: the average response time in ms over the 1024 last requests
|
|
(0 for TCP)
|
|
61. ttime [..BS]: the average total session time in ms over the 1024 last
|
|
requests
|
|
62. agent_status [...S]: status of last agent check, one of:
|
|
UNK -> unknown
|
|
INI -> initializing
|
|
SOCKERR -> socket error
|
|
L4OK -> check passed on layer 4, no upper layers testing enabled
|
|
L4TOUT -> layer 1-4 timeout
|
|
L4CON -> layer 1-4 connection problem, for example
|
|
"Connection refused" (tcp rst) or "No route to host" (icmp)
|
|
L7OK -> agent reported "up"
|
|
L7STS -> agent reported "fail", "stop", or "down"
|
|
63. agent_code [...S]: numeric code reported by agent if any (unused for now)
|
|
64. agent_duration [...S]: time in ms taken to finish last check
|
|
65. check_desc [...S]: short human-readable description of check_status
|
|
66. agent_desc [...S]: short human-readable description of agent_status
|
|
67. check_rise [...S]: server's "rise" parameter used by checks
|
|
68. check_fall [...S]: server's "fall" parameter used by checks
|
|
69. check_health [...S]: server's health check value between 0 and rise+fall-1
|
|
70. agent_rise [...S]: agent's "rise" parameter, normally 1
|
|
71. agent_fall [...S]: agent's "fall" parameter, normally 1
|
|
72. agent_health [...S]: agent's health parameter, between 0 and rise+fall-1
|
|
73. addr [L..S]: address:port or "unix". IPv6 has brackets around the address.
|
|
74. cookie [..BS]: server's cookie value or backend's cookie name
75. mode [LFBS]: proxy mode (tcp, http, health, unknown)
76. algo [..B.]: load balancing algorithm
77. conn_rate [.F..]: number of connections over the last elapsed second
78. conn_rate_max [.F..]: highest known conn_rate
79. conn_tot [.F..]: cumulative number of connections
80. intercepted [.FB.]: cum. number of intercepted requests (monitor, stats)
|
|
|
|
|
|
9.2. Typed output format
|
|
------------------------
|
|
|
|
Both "show info" and "show stat" support a mode where each output value comes
|
|
with its type and sufficient information to know how the value is supposed to
|
|
be aggregated between processes and how it evolves.
|
|
|
|
In all cases, the output consists in having a single value per line with all
|
|
the information split into fields delimited by colons (':').
|
|
|
|
The first column designates the object or metric being dumped. Its format is
|
|
specific to the command producing this output and will not be described in this
|
|
section. Usually it will consist in a series of identifiers and field names.
|
|
|
|
The second column contains 3 characters respectively indicating the origin, the
|
|
nature and the scope of the value being reported. The first character (the
|
|
origin) indicates where the value was extracted from. Possible characters are :
|
|
|
|
M The value is a metric. It is valid at one instant and may change depending
on its nature.
|
|
|
|
S The value is a status. It represents a discrete value which by definition
|
|
cannot be aggregated. It may be the status of a server ("UP" or "DOWN"),
|
|
the PID of the process, etc.
|
|
|
|
K The value is a sorting key. It represents an identifier which may be used
|
|
to group some values together because it is unique among its class. All
|
|
internal identifiers are keys. Some names can be listed as keys if they
|
|
are unique (eg: a frontend name is unique). In general keys come from the
|
|
configuration, even though some of them may automatically be assigned. For
|
|
most purposes keys may be considered as equivalent to configuration.
|
|
|
|
C The value comes from the configuration. Certain configuration values make
|
|
sense on the output, for example a concurrent connection limit or a cookie
|
|
name. By definition these values are the same in all processes started
|
|
from the same configuration file.
|
|
|
|
P The value comes from the product itself. There are very few such values,
|
|
most common use is to report the product name, version and release date.
|
|
These elements are also the same between all processes.
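
The origin character can be decoded with a simple lookup. The sketch below
only relies on what is described above : colon-delimited fields, with the
second one carrying the tag characters. The function names and the sample
lines used to exercise it are hypothetical, and whatever follows the second
field is kept untouched.

```python
ORIGINS = {
    "M": "metric",
    "S": "status",
    "K": "sorting key",
    "C": "configuration",
    "P": "product",
}


def split_typed_line(line):
    """Split a typed output line into (field, tags, remainder)."""
    field, tags, rest = line.split(":", 2)
    return field, tags, rest


def origin_of(line):
    """Return the meaning of the first tag character of a typed line."""
    _, tags, _ = split_typed_line(line)
    return ORIGINS.get(tags[:1], "unknown")
```

For a made-up line such as "demo.field:MGP:u32:0", origin_of() returns
"metric" since the tags start with "M".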
|
|
|
|
The second character (the nature) indicates the nature of the information
|
|
carried by the field in order to let an aggregator decide on what operation to
|
|
use to aggregate multiple values. Possible characters are :
|
|
|
|
A The value represents an age since a last event. This is a bit different
|
|
from the duration in that an age is automatically computed based on the
|
|
current date. A typical example is how long ago the last session happened on
a server. Ages are generally aggregated by taking the minimum
|
|
value and do not need to be stored.
|
|
|
|
a The value represents an already averaged value. The average response times
|
|
and server weights are of this nature. Averages can typically be averaged
|
|
between processes.
|
|
|
|
C The value represents a cumulative counter. Such measures perpetually
|
|
increase until they wrap around. Some monitoring protocols need to tell
|
|
the difference between a counter and a gauge to report a different type.
|
|
In general counters may simply be summed since they represent events or
|
|
volumes. Examples of metrics of this nature are connection counts or byte
|
|
counts.
|
|
|
|
D The value represents a duration for a status. There are a few usages of
|
|
this, most of them include the time taken by the last health check and
|
|
the time a server has spent down. Durations are generally not summed,
|
|
most of the time the maximum will be retained to compute an SLA.
|
|
|
|
G The value represents a gauge. It's a measure at one instant. The memory
|
|
usage or the current number of active connections are of this nature.
|
|
Metrics of this type are typically summed during aggregation.
|
|
|
|
L The value represents a limit (generally a configured one). By nature,
|
|
limits are harder to aggregate since they are specific to the point where
|
|
they were retrieved. In certain situations they may be summed or be kept
|
|
separate.
|
|
|
|
M The value represents a maximum. In general it will apply to a gauge and
|
|
keep the highest known value. An example of such a metric could be the
|
|
maximum amount of concurrent connections that was encountered in the
|
|
product's life time. To correctly aggregate maxima, you are supposed to
|
|
output a range going from the maximum of all maxima to the sum of all
|
|
of them. There is indeed no way to know if they were encountered
|
|
simultaneously or not.
|
|
|
|
m The value represents a minimum. In general it will apply to a gauge and
|
|
keep the lowest known value. An example of such a metric could be the
|
|
minimum amount of free memory pools that was encountered in the product's
|
|
life time. To correctly aggregate minima, you are supposed to output a
|
|
range going from the minimum of all minima to the sum of all of them.
|
|
There is indeed no way to know if they were encountered simultaneously
|
|
or not.
|
|
|
|
N The value represents a name, so it is a string. It is used to report
|
|
proxy names, server names and cookie names. Names have configuration or
|
|
keys as their origin and are supposed to be the same among all processes.
|
|
|
|
O The value represents a free text output. Outputs from various commands,
|
|
returns from health checks, node descriptions are of such nature.
|
|
|
|
R The value represents an event rate. It's a measure at one instant. It is
|
|
quite similar to a gauge except that the recipient knows that this measure
|
|
moves slowly and may decide not to keep all values. An example of such a
|
|
metric is the measured amount of connections per second. Metrics of this
|
|
type are typically summed during aggregation.
|
|
|
|
T The value represents a date or time. A field emitting the current date
|
|
would be of this type. The method to aggregate such information is left
|
|
as an implementation choice. For now no field uses this type.
|
|
|
|
The third character (the scope) indicates what extent the value reflects. Some
|
|
elements may be per process while others may be per configuration or per system.
|
|
The distinction is important to know whether or not a single value should be
|
|
kept during aggregation or if values have to be aggregated. The following
|
|
characters are currently supported :
|
|
|
|
C The value is valid for a whole cluster of nodes, which is the set of nodes
|
|
communicating over the peers protocol. An example could be the amount of
|
|
entries present in a stick table that is replicated with other peers. At
|
|
the moment no metric uses this scope.
|
|
|
|
P The value is valid only for the process reporting it. Most metrics use
|
|
this scope.
|
|
|
|
S The value is valid for the whole service, which is the set of processes
|
|
started together from the same configuration file. All metrics originating
|
|
from the configuration use this scope. Some other metrics may use it as
|
|
well for some shared resources (eg: shared SSL cache statistics).
|
|
|
|
s The value is valid for the whole system, such as the system's hostname,
|
|
current date or resource usage. At the moment this scope is not used by
|
|
any metric.
|
|
|
|
Consumers of this information will generally have enough of these 3 characters
|
|
to determine how to accurately report aggregated information across multiple
|
|
processes.
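
As an illustration, the aggregation hints given above for each nature could be
applied by a consumer along the lines of the following Python sketch. The
function name and the choice of returning a (low, high) pair for maxima and
minima are assumptions made for this example only. Limits, names, outputs and
dates are returned unmerged since their handling is left to the implementation.

```python
def aggregate(nature, values):
    """Merge the values reported by several processes for one field.

    'nature' is the second character of the typed-output descriptor.
    """
    if nature in ("C", "G", "R"):      # counters, gauges, rates: summed
        return sum(values)
    if nature == "A":                  # ages: keep the most recent event
        return min(values)
    if nature == "a":                  # averages: averaged between processes
        return sum(values) / len(values)
    if nature == "D":                  # durations: keep the longest one
        return max(values)
    if nature == "M":                  # maxima: range from max to sum
        return (max(values), sum(values))
    if nature == "m":                  # minima: range from min to sum
        return (min(values), sum(values))
    # limits, names, outputs and dates: left to the caller
    return list(values)
```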
|
|
|
|
After this column, the third column indicates the type of the field, among "s32"
|
|
(signed 32-bit integer), "s64" (signed 64-bit integer), "u32" (unsigned 32-bit
|
|
integer), "u64" (unsigned 64-bit integer), "str" (string). It is important to
|
|
know the type before parsing the value in order to properly read it. For example
|
|
a string containing only digits is still a string and not an integer (eg: an
|
|
error code extracted by a check).
|
|
|
|
Then the fourth column is the value itself, encoded according to its type.
|
|
Strings are dumped as-is immediately after the colon without any leading space.
|
|
If a string contains a colon, it will appear normally. This means that the
|
|
output should not be exclusively split around colons or some check outputs
|
|
or server addresses might be truncated.
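
A minimal Python sketch of a parser honoring these rules is shown here. It
splits on the first three colons only, so that colons inside a string value
are preserved, and it converts the value according to the announced type. The
function name is an assumption of this example, and the first column is kept
as an opaque string.

```python
INTEGER_TYPES = {"s32", "s64", "u32", "u64"}

def parse_typed_line(line):
    """Split one typed-output line into (field, descriptor, type, value)."""
    # Only the first three colons are separators; any further colon
    # belongs to the value itself.
    field, descriptor, ftype, value = line.rstrip("\n").split(":", 3)
    if ftype in INTEGER_TYPES:
        value = int(value)
    return field, descriptor, ftype, value
```

With this approach a string made only of digits remains a string, and a
server address such as "127.0.0.1:80" is returned in one piece.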
|
|
|
|
|
|
9.3. Unix Socket commands
|
|
-------------------------
|
|
|
|
The stats socket is not enabled by default. In order to enable it, it is
|
|
necessary to add one line in the global section of the haproxy configuration.
|
|
A second line is recommended to set a larger timeout, always appreciated when
|
|
issuing commands by hand :
|
|
|
|
global
|
|
stats socket /var/run/haproxy.sock mode 600 level admin
|
|
stats timeout 2m
|
|
|
|
It is also possible to add multiple instances of the stats socket by repeating
|
|
the line, and make them listen to a TCP port instead of a UNIX socket. This is
|
|
never done by default because this is dangerous, but can be handy in some
|
|
situations :
|
|
|
|
global
|
|
stats socket /var/run/haproxy.sock mode 600 level admin
|
|
stats socket ipv4@192.168.0.1:9999 level admin
|
|
stats timeout 2m
|
|
|
|
To access the socket, an external utility such as "socat" is required. Socat is
|
|
a Swiss-army knife to connect anything to anything. We use it to connect
|
|
terminals to the socket, or a couple of stdin/stdout pipes to it for scripts.
|
|
The two main syntaxes we'll use are the following :
|
|
|
|
# socat /var/run/haproxy.sock stdio
|
|
# socat /var/run/haproxy.sock readline
|
|
|
|
The first one is used with scripts. It is possible to send the output of a
|
|
script to haproxy, and pass haproxy's output to another script. That's useful
|
|
for retrieving counters or attack traces for example.
|
|
|
|
The second one is only useful for issuing commands by hand. It has the benefit
|
|
that the terminal is handled by the readline library which supports line
|
|
editing and history, which is very convenient when issuing repeated commands
|
|
(eg: watch a counter).
|
|
|
|
The socket supports two operation modes :
|
|
- interactive
|
|
- non-interactive
|
|
|
|
The non-interactive mode is the default when socat connects to the socket. In
|
|
this mode, a single line may be sent. It is processed as a whole, responses are
|
|
sent back, and the connection closes after the end of the response. This is the
|
|
mode that scripts and monitoring tools use. It is possible to send multiple
|
|
commands in this mode, they need to be delimited by a semi-colon (';'). For
|
|
example :
|
|
|
|
# echo "show info;show stat;show table" | socat /var/run/haproxy.sock stdio
|
|
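
Scripts which cannot rely on socat may talk to the socket directly. The
following Python sketch mimics the non-interactive mode described above: it
sends all commands on a single line delimited by semi-colons, then reads until
the connection closes. The function names are assumptions of this example, and
the socket path is the one used in the configuration shown earlier.

```python
import socket

def build_request(commands):
    """Join several commands into the single line expected by the socket."""
    return (";".join(commands) + "\n").encode("ascii")

def exchange(sock, commands):
    """Send the commands on a connected socket and read until it closes."""
    sock.sendall(build_request(commands))
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:            # connection closed: end of the response
            break
        chunks.append(data)
    return b"".join(chunks).decode("utf-8", "replace")

def run_commands(path, commands):
    """Connect to the UNIX stats socket at 'path' and run the commands."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        return exchange(sock, commands)
```

A typical call would be run_commands("/var/run/haproxy.sock", ["show info",
"show stat"]), which requires read and write access to the socket.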
|
|
The interactive mode displays a prompt ('>') and waits for commands to be
|
|
entered on the line, then processes them, and displays the prompt again to wait
|
|
for a new command. This mode is entered via the "prompt" command which must be
|
|
sent on the first line in non-interactive mode. The mode is a flip switch, if
|
|
"prompt" is sent in interactive mode, it is disabled and the connection closes
|
|
after processing the last command of the same line.
|
|
|
|
For this reason, when debugging by hand, it's quite common to start with the
|
|
"prompt" command :
|
|
|
|
# socat /var/run/haproxy.sock readline
|
|
prompt
|
|
> show info
|
|
...
|
|
>
|
|
|
|
Since multiple commands may be issued at once, haproxy uses the empty line as a
|
|
delimiter to mark an end of output for each command, and takes care of ensuring
|
|
that no command can emit an empty line on output. A script can thus easily
|
|
parse the output even when multiple commands were pipelined on a single line.
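
As a sketch of such a parser, the Python function below cuts a pipelined
response on empty lines and returns one block per command. The function name
is an assumption of this example.

```python
def split_responses(output):
    """Return one string per command from a pipelined response."""
    blocks = output.split("\n\n")
    return [block.strip("\n") for block in blocks if block.strip("\n")]
```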
|
|
|
|
It is important to understand that when multiple haproxy processes are started
|
|
on the same sockets, any process may pick up the request and will output its
|
|
own stats.
|
|
|
|
The list of commands currently supported on the stats socket is provided below.
|
|
If an unknown command is sent, haproxy displays the usage message which lists
|
|
all supported commands. Some commands support a more complex syntax, generally
|
|
it will explain what part of the command is invalid when this happens.
|
|
|
|
add acl <acl> <pattern>
|
|
Add an entry into the acl <acl>. <acl> is the #<id> or the <file> returned by
|
|
"show acl". This command does not verify if the entry already exists. This
|
|
command cannot be used if the reference <acl> is a file also used with a map.
|
|
In this case, you must use the command "add map" in place of "add acl".
|
|
|
|
add map <map> <key> <value>
|
|
Add an entry into the map <map> to associate the value <value> to the key
|
|
<key>. This command does not verify if the entry already exists. It is
|
|
mainly used to fill a map after a clear operation. Note that if the reference
|
|
<map> is a file and is shared with a map, this map will contain also a new
|
|
pattern entry.
|
|
|
|
clear counters
|
|
Clear the max values of the statistics counters in each proxy (frontend &
|
|
backend) and in each server. The accumulated counters are not affected. This
|
|
can be used to get clean counters after an incident, without having to
|
|
restart nor to clear traffic counters. This command is restricted and can
|
|
only be issued on sockets configured for levels "operator" or "admin".
|
|
|
|
clear counters all
|
|
Clear all statistics counters in each proxy (frontend & backend) and in each
|
|
server. This has the same effect as restarting. This command is restricted
|
|
and can only be issued on sockets configured for level "admin".
|
|
|
|
clear acl <acl>
|
|
Remove all entries from the acl <acl>. <acl> is the #<id> or the <file>
|
|
returned by "show acl". Note that if the reference <acl> is a file and is
|
|
shared with a map, this map will also be cleared.
|
|
|
|
clear map <map>
|
|
Remove all entries from the map <map>. <map> is the #<id> or the <file>
|
|
returned by "show map". Note that if the reference <map> is a file and is
|
|
shared with an acl, this acl will also be cleared.
|
|
|
|
clear table <table> [ data.<type> <operator> <value> ] | [ key <key> ]
|
|
Remove entries from the stick-table <table>.
|
|
|
|
This is typically used to unblock some users complaining they have been
|
|
abusively denied access to a service, but this can also be used to clear some
|
|
stickiness entries matching a server that is going to be replaced (see "show
|
|
table" below for details). Note that sometimes, removal of an entry will be
|
|
refused because it is currently tracked by a session. Retrying a few seconds
|
|
later after the session ends is usually enough.
|
|
|
|
In the case where no optional arguments are given, all entries will be removed.
|
|
|
|
When the "data." form is used entries matching a filter applied using the
|
|
stored data (see "stick-table" in section 4.2) are removed. A stored data
|
|
type must be specified in <type>, and this data type must be stored in the
|
|
table otherwise an error is reported. The data is compared according to
|
|
<operator> with the 64-bit integer <value>. Operators are the same as with
|
|
the ACLs :
|
|
|
|
- eq : match entries whose data is equal to this value
|
|
- ne : match entries whose data is not equal to this value
|
|
- le : match entries whose data is less than or equal to this value
|
|
- ge : match entries whose data is greater than or equal to this value
|
|
- lt : match entries whose data is less than this value
|
|
- gt : match entries whose data is greater than this value
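
To make the semantics of these operators explicit, the comparison applied to
each entry can be modelled by the short Python sketch below, where "data" is
the value stored in the entry and "value" the one given on the command line.
This only illustrates the list above and is not haproxy's own code.

```python
import operator

# Keyword to comparison mapping, following the list above.
OPERATORS = {
    "eq": operator.eq, "ne": operator.ne,
    "le": operator.le, "ge": operator.ge,
    "lt": operator.lt, "gt": operator.gt,
}

def entry_matches(data, op, value):
    """Tell whether an entry whose stored data is 'data' would match."""
    return OPERATORS[op](data, value)
```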
|
|
|
|
When the key form is used the entry <key> is removed. The key must be of the
|
|
same type as the table, which currently is limited to IPv4, IPv6, integer and
|
|
string.
|
|
|
|
Example :
|
|
$ echo "show table http_proxy" | socat stdio /tmp/sock1
|
|
>>> # table: http_proxy, type: ip, size:204800, used:2
|
|
>>> 0x80e6a4c: key=127.0.0.1 use=0 exp=3594729 gpc0=0 conn_rate(30000)=1 \
|
|
bytes_out_rate(60000)=187
|
|
>>> 0x80e6a80: key=127.0.0.2 use=0 exp=3594740 gpc0=1 conn_rate(30000)=10 \
|
|
bytes_out_rate(60000)=191
|
|
|
|
$ echo "clear table http_proxy key 127.0.0.1" | socat stdio /tmp/sock1
|
|
|
|
$ echo "show table http_proxy" | socat stdio /tmp/sock1
|
|
>>> # table: http_proxy, type: ip, size:204800, used:1
|
|
>>> 0x80e6a80: key=127.0.0.2 use=0 exp=3594740 gpc0=1 conn_rate(30000)=10 \
|
|
bytes_out_rate(60000)=191
|
|
$ echo "clear table http_proxy data.gpc0 eq 1" | socat stdio /tmp/sock1
|
|
$ echo "show table http_proxy" | socat stdio /tmp/sock1
|
|
>>> # table: http_proxy, type: ip, size:204800, used:1
|
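
Entry lines such as the ones shown in the example above can be turned into a
dictionary with a few lines of Python. This sketch assumes that the line was
already joined (the trailing backslashes and the ">>>" prefix only exist in
this document) and that the key contains no whitespace. The function name is
an assumption of this example.

```python
def parse_table_entry(line):
    """Parse one entry line of "show table" into (pointer, fields)."""
    pointer, _, rest = line.partition(": ")
    fields = dict(item.split("=", 1) for item in rest.split())
    return pointer, fields
```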
|
|
|
del acl <acl> [<key>|#<ref>]
|
|
Delete all the acl entries from the acl <acl> corresponding to the key <key>.
|
|
<acl> is the #<id> or the <file> returned by "show acl". If the <ref> is used,
|
|
this command deletes only the listed reference. The reference can be found by
|
|
listing the content of the acl. Note that if the reference <acl> is a file and
|
|
is shared with a map, the entry will also be deleted in the map.
|
|
|
|
del map <map> [<key>|#<ref>]
|
|
Delete all the map entries from the map <map> corresponding to the key <key>.
|
|
<map> is the #<id> or the <file> returned by "show map". If the <ref> is used,
|
|
this command deletes only the listed reference. The reference can be found by
|
|
listing the content of the map. Note that if the reference <map> is a file and
|
|
is shared with an acl, the entry will also be deleted in the acl.
|
|
|
|
disable agent <backend>/<server>
|
|
Mark the auxiliary agent check as temporarily stopped.
|
|
|
|
In the case where an agent check is being run as an auxiliary check, due
|
|
to the agent-check parameter of a server directive, new checks are only
|
|
initialized when the agent is in the enabled state. Thus, disable agent will
|
|
prevent any new agent checks from being initiated until the agent is
|
|
re-enabled using enable agent.
|
|
|
|
When an agent is disabled the processing of an auxiliary agent check that
|
|
was initiated while the agent was set as enabled is as follows: All
|
|
results that would alter the weight, specifically "drain" or a weight
|
|
returned by the agent, are ignored. The processing of agent check is
|
|
otherwise unchanged.
|
|
|
|
The motivation for this feature is to allow the weight changing effects
|
|
of the agent checks to be paused to allow the weight of a server to be
|
|
configured using set weight without being overridden by the agent.
|
|
|
|
This command is restricted and can only be issued on sockets configured for
|
|
level "admin".
|
|
|
|
disable frontend <frontend>
|
|
Mark the frontend as temporarily stopped. This corresponds to the mode which
|
|
is used during a soft restart : the frontend releases the port but can be
|
|
enabled again if needed. This should be used with care as some non-Linux OSes
|
|
are unable to enable it back. This is intended to be used in environments
|
|
where stopping a proxy is not even imaginable but a misconfigured proxy must
|
|
be fixed. That way it's possible to release the port and bind it into another
|
|
process to restore operations. The frontend will appear with status "STOP"
|
|
on the stats page.
|
|
|
|
The frontend may be specified either by its name or by its numeric ID,
|
|
prefixed with a sharp ('#').
|
|
|
|
This command is restricted and can only be issued on sockets configured for
|
|
level "admin".
|
|
|
|
disable health <backend>/<server>
|
|
Mark the primary health check as temporarily stopped. This will disable
|
|
sending of health checks, and the last health check result will be ignored.
|
|
The server will be in unchecked state and considered UP unless an auxiliary
|
|
agent check forces it down.
|
|
|
|
This command is restricted and can only be issued on sockets configured for
|
|
level "admin".
|
|
|
|
disable server <backend>/<server>
|
|
Mark the server DOWN for maintenance. In this mode, no more checks will be
|
|
performed on the server until it leaves maintenance.
|
|
If the server is tracked by other servers, those servers will be set to DOWN
|
|
during the maintenance.
|
|
|
|
In the statistics page, a server DOWN for maintenance will appear with a
|
|
"MAINT" status, its tracking servers with the "MAINT(via)" one.
|
|
|
|
Both the backend and the server may be specified either by their name or by
|
|
their numeric ID, prefixed with a sharp ('#').
|
|
|
|
This command is restricted and can only be issued on sockets configured for
|
|
level "admin".
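
When such commands are generated by a script, the two ways of designating a
backend or a server can be handled as in this Python sketch, where integers
are emitted as numeric IDs prefixed with a sharp and strings are taken as
names. Both helper names are assumptions of this example.

```python
def ref(name_or_id):
    """Format a proxy or server reference: names as-is, IDs as '#<id>'."""
    if isinstance(name_or_id, int):
        return "#%d" % name_or_id
    return name_or_id

def disable_server_cmd(backend, server):
    """Build the "disable server" command for the given backend/server."""
    return "disable server %s/%s" % (ref(backend), ref(server))
```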
|
|
|
|
enable agent <backend>/<server>
|
|
Resume auxiliary agent check that was temporarily stopped.
|
|
|
|
See "disable agent" for details of the effect of temporarily starting
|
|
and stopping an auxiliary agent.
|
|
|
|
This command is restricted and can only be issued on sockets configured for
|
|
level "admin".
|
|
|
|
enable frontend <frontend>
|
|
Resume a frontend which was temporarily stopped. It is possible that some of
|
|
the listening ports won't be able to bind anymore (eg: if another process
|
|
took them since the 'disable frontend' operation). If this happens, an error
|
|
is displayed. Some operating systems might not be able to resume a frontend
|
|
which was disabled.
|
|
|
|
The frontend may be specified either by its name or by its numeric ID,
|
|
prefixed with a sharp ('#').
|
|
|
|
This command is restricted and can only be issued on sockets configured for
|
|
level "admin".
|
|
|
|
enable health <backend>/<server>
|
|
Resume a primary health check that was temporarily stopped. This will enable
|
|
sending of health checks again. Please see "disable health" for details.
|
|
|
|
This command is restricted and can only be issued on sockets configured for
|
|
level "admin".
|
|
|
|
enable server <backend>/<server>
|
|
If the server was previously marked as DOWN for maintenance, this marks the
|
|
server UP and checks are re-enabled.
|
|
|
|
Both the backend and the server may be specified either by their name or by
|
|
their numeric ID, prefixed with a sharp ('#').
|
|
|
|
This command is restricted and can only be issued on sockets configured for
|
|
level "admin".
|
|
|
|
get map <map> <value>
|
|
get acl <acl> <value>
|
|
Look up the value <value> in the map <map> or in the ACL <acl>. <map> or <acl>
|
|
are the #<id> or the <file> returned by "show map" or "show acl". This command
|
|
returns all the matching patterns associated with this map. This is useful for
|
|
debugging maps and ACLs. The output format is composed of one line per
|
|
matching type. Each line is composed of a space-delimited series of words.
|
|
|
|
The first two words are:
|
|
|
|
<match method>: The match method applied. It can be "found", "bool",
|
|
"int", "ip", "bin", "len", "str", "beg", "sub", "dir",
|
|
"dom", "end" or "reg".
|
|
|
|
<match result>: The result. Can be "match" or "no-match".
|
|
|
|
The following words are returned only if the pattern matches an entry.
|
|
|
|
<index type>: "tree" or "list". The internal lookup algorithm.
|
|
|
|
<case>: "case-insensitive" or "case-sensitive". The
|
|
interpretation of the case.
|
|
|
|
<entry matched>: match="<entry>". Return the matched pattern. It is
|
|
useful with regular expressions.
|
|
|
|
The last two words are used to show the returned value and its type. With the
|
|
"acl" case, the pattern doesn't exist.
|
|
|
|
return=nothing: No return because there is no "map".
|
|
return="<value>": The value returned in the string format.
|
|
return=cannot-display: The value cannot be converted to a string.
|
|
|
|
type="<type>": The type of the returned sample.
|
|
|
|
get weight <backend>/<server>
|
|
Report the current weight and the initial weight of server <server> in
|
|
backend <backend> or an error if either doesn't exist. The initial weight is
|
|
the one that appears in the configuration file. Both are normally equal
|
|
unless the current weight has been changed. Both the backend and the server
|
|
may be specified either by their name or by their numeric ID, prefixed with a
|
|
sharp ('#').
|
|
|
|
help
|
|
Print the list of known keywords and their basic usage. The same help screen
|
|
is also displayed for unknown commands.
|
|
|
|
prompt
|
|
Toggle the prompt at the beginning of the line and enter or leave interactive
|
|
mode. In interactive mode, the connection is not closed after a command
|
|
completes. Instead, the prompt will appear again, indicating to the user that
|
|
the interpreter is waiting for a new command. The prompt consists of a right
|
|
angle bracket followed by a space "> ". This mode is particularly convenient
|
|
when one wants to periodically check information such as stats or errors.
|
|
It is also a good idea to enter interactive mode before issuing a "help"
|
|
command.
|
|
|
|
quit
|
|
Close the connection when in interactive mode.
|
|
|
|
set map <map> [<key>|#<ref>] <value>
|
|
Modify the value corresponding to each key <key> in a map <map>. <map> is the
|
|
#<id> or <file> returned by "show map". If the <ref> is used in place of
|
|
<key>, only the entry pointed by <ref> is changed. The new value is <value>.
|
|
|
|
set maxconn frontend <frontend> <value>
|
|
Dynamically change the specified frontend's maxconn setting. Any positive
|
|
value is allowed including zero, but setting values larger than the global
|
|
maxconn does not make much sense. If the limit is increased and connections
|
|
were pending, they will immediately be accepted. If it is lowered to a value
|
|
below the current number of connections, acceptance of new connections will be
|
|
delayed until the threshold is reached. The frontend might be specified by
|
|
either its name or its numeric ID prefixed with a sharp ('#').
|
|
|
|
set maxconn server <backend/server> <value>
|
|
Dynamically change the specified server's maxconn setting. Any positive
|
|
value is allowed including zero, but setting values larger than the global
|
|
maxconn does not make much sense.
|
|
|
|
set maxconn global <maxconn>
|
|
Dynamically change the global maxconn setting within the range defined by the
|
|
initial global maxconn setting. If it is increased and connections were
|
|
pending, they will immediately be accepted. If it is lowered to a value below
|
|
the current number of connections, acceptance of new connections will be
|
|
delayed until the threshold is reached. A value of zero restores the initial
|
|
setting.
|
|
|
|
set rate-limit connections global <value>
|
|
Change the process-wide connection rate limit, which is set by the global
|
|
'maxconnrate' setting. A value of zero disables the limitation. This limit
|
|
applies to all frontends and the change has an immediate effect. The value
|
|
is passed in number of connections per second.
|
|
|
|
set rate-limit http-compression global <value>
|
|
Change the maximum input compression rate, which is set by the global
|
|
'maxcomprate' setting. A value of zero disables the limitation. The value is
|
|
passed in number of kilobytes per second. The value is available in the "show
|
|
info" on the line "CompressBpsRateLim" in bytes.
|
|
|
|
set rate-limit sessions global <value>
|
|
Change the process-wide session rate limit, which is set by the global
|
|
'maxsessrate' setting. A value of zero disables the limitation. This limit
|
|
applies to all frontends and the change has an immediate effect. The value
|
|
is passed in number of sessions per second.
|
|
|
|
set rate-limit ssl-sessions global <value>
|
|
Change the process-wide SSL session rate limit, which is set by the global
|
|
'maxsslrate' setting. A value of zero disables the limitation. This limit
|
|
applies to all frontends and the change has an immediate effect. The value
|
|
is passed in number of sessions per second sent to the SSL stack. It applies
|
|
before the handshake in order to protect the stack against handshake abuses.
|
|
|
|
set server <backend>/<server> addr <ip4 or ip6 address>
|
|
Replace the current IP address of a server by the one provided.
|
|
|
|
set server <backend>/<server> agent [ up | down ]
|
|
Force a server's agent to a new state. This can be useful to immediately
|
|
switch a server's state regardless of some slow agent checks for example.
|
|
Note that the change is propagated to tracking servers if any.
|
|
|
|
set server <backend>/<server> health [ up | stopping | down ]
|
|
Force a server's health to a new state. This can be useful to immediately
|
|
switch a server's state regardless of some slow health checks for example.
|
|
Note that the change is propagated to tracking servers if any.
|
|
|
|
set server <backend>/<server> state [ ready | drain | maint ]
|
|
Force a server's administrative state to a new state. This can be useful to
|
|
disable load balancing and/or any traffic to a server. Setting the state to
|
|
"ready" puts the server in normal mode, and the command is the equivalent of
|
|
the "enable server" command. Setting the state to "maint" disables any traffic
|
|
to the server as well as any health checks. This is the equivalent of the
|
|
"disable server" command. Setting the mode to "drain" only removes the server
|
|
from load balancing but still allows it to be checked and to accept new
|
|
persistent connections. Changes are propagated to tracking servers if any.
|
|
|
|
set server <backend>/<server> weight <weight>[%]
|
|
Change a server's weight to the value passed in argument. This is the exact
|
|
equivalent of the "set weight" command below.
|
|
|
|
set ssl ocsp-response <response>
|
|
This command is used to update an OCSP Response for a certificate (see "crt"
|
|
on "bind" lines). Same controls are performed as during the initial loading of
|
|
the response. The <response> must be passed as a base64 encoded string of the
|
|
DER encoded response from the OCSP server.
|
|
|
|
Example:
|
|
openssl ocsp -issuer issuer.pem -cert server.pem \
|
|
-host ocsp.issuer.com:80 -respout resp.der
|
|
echo "set ssl ocsp-response $(base64 -w 10000 resp.der)" | \
|
|
socat stdio /var/run/haproxy.stat
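
The same command can be built from a script. The Python sketch below encodes
a DER response on a single line, which is what "base64 -w 10000" achieves in
the shell example. The function name is an assumption of this example, and
sending the resulting line to the socket is left to the caller.

```python
import base64

def ocsp_response_cmd(der_bytes):
    """Build the CLI command carrying a DER-encoded OCSP response."""
    encoded = base64.b64encode(der_bytes).decode("ascii")
    return "set ssl ocsp-response " + encoded
```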
|
|
|
|
set ssl tls-key <id> <tlskey>
|
|
Set the next TLS key for the <id> listener to <tlskey>. This key becomes the
|
|
ultimate key, while the penultimate one is used for encryption (others just
|
|
decrypt). The oldest TLS key present is overwritten. <id> is either a numeric
|
|
#<id> or <file> returned by "show tls-keys". <tlskey> is a base64 encoded 48
|
|
byte TLS ticket key (ex. openssl rand -base64 48).
|
|
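
For scripted key rotation, a key equivalent to the output of "openssl rand
-base64 48" can be produced as in the Python sketch below: 48 random bytes
which encode to 64 base64 characters. Both function names are assumptions of
this example.

```python
import base64
import secrets

def new_tls_ticket_key():
    """Return a random 48-byte ticket key, base64-encoded (64 characters)."""
    return base64.b64encode(secrets.token_bytes(48)).decode("ascii")

def set_tls_key_cmd(listener_id, key):
    """Build the "set ssl tls-key" command for the given listener."""
    return "set ssl tls-key %s %s" % (listener_id, key)
```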
|
|
set table <table> key <key> [data.<data_type> <value>]*
|
|
Create or update a stick-table entry in the table. If the key is not present,
|
|
an entry is inserted. See stick-table in section 4.2 to find all possible
|
|
values for <data_type>. The most likely use consists in dynamically entering
|
|
entries for source IP addresses, with a flag in gpc0 to dynamically block an
|
|
IP address or affect its quality of service. It is possible to pass multiple
|
|
data_types in a single call.
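
A script feeding entries into a table may assemble the command as in this
Python sketch, which accepts several data types at once. The function name is
an assumption of this example, and the data types passed must be stored in the
table.

```python
def set_table_cmd(table, key, data):
    """Build a "set table" command; 'data' maps data types to values."""
    parts = ["set table", table, "key", str(key)]
    for data_type, value in data.items():
        parts.append("data.%s %s" % (data_type, value))
    return " ".join(parts)
```

For instance set_table_cmd("http_proxy", "127.0.0.1", {"gpc0": 1}) flags the
source address 127.0.0.1 in table "http_proxy".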
|
|
|
|
set timeout cli <delay>
|
|
Change the CLI interface timeout for the current connection. This can be useful
|
|
during long debugging sessions where the user needs to constantly inspect
|
|
some indicators without being disconnected. The delay is passed in seconds.
|
|
|
|
set weight <backend>/<server> <weight>[%]
|
|
Change a server's weight to the value passed in argument. If the value ends
|
|
with the '%' sign, then the new weight will be relative to the initially
|
|
configured weight. Absolute weights are permitted between 0 and 256.
|
|
Relative weights must be positive, and the resulting absolute weight is
|
|
capped at 256. Servers which are part of a farm running a static
|
|
load-balancing algorithm have stricter limitations because the weight
|
|
cannot change once set. Thus for these servers, the only accepted values
|
|
are 0 and 100% (or 0 and the initial weight). Changes take effect
|
|
immediately, though certain LB algorithms require a certain amount of
|
|
requests to consider changes. A typical usage of this command is to
|
|
disable a server during an update by setting its weight to zero, then to
|
|
enable it again after the update by setting it back to 100%. This command
|
|
is restricted and can only be issued on sockets configured for level
|
|
"admin". Both the backend and the server may be specified either by their
|
|
name or by their numeric ID, prefixed with a sharp ('#').
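
The relation between relative and absolute weights can be summarized by the
following Python sketch. It follows the rules stated above (absolute values
between 0 and 256, relative values capped at 256 once applied to the initial
weight). The use of integer rounding is an assumption of this example, as is
the function name.

```python
def resolve_weight(spec, initial):
    """Return the absolute weight resulting from a "set weight" argument."""
    if spec.endswith("%"):
        percent = int(spec[:-1])
        if percent < 0:
            raise ValueError("relative weights must be positive")
        # Integer rounding is an assumption of this sketch.
        return min(initial * percent // 100, 256)
    weight = int(spec)
    if not 0 <= weight <= 256:
        raise ValueError("absolute weights must be between 0 and 256")
    return weight
```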
|
|
|
|
show env [<name>]
|
|
Dump one or all environment variables known by the process. Without any
|
|
argument, all variables are dumped. With an argument, only the specified
|
|
variable is dumped if it exists. Otherwise "Variable not found" is emitted.
|
|
Variables are dumped in the same format as they are stored or returned by the
|
|
"env" utility, that is, "<name>=<value>". This can be handy when debugging
|
|
certain configuration files making heavy use of environment variables to
|
|
ensure that they contain the expected values. This command is restricted and
|
|
can only be issued on sockets configured for levels "operator" or "admin".
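
Since each line follows the "<name>=<value>" format, the dump is easy to load
into a dictionary, as in this Python sketch. Only the first equal sign is
considered as a separator, and lines without any (such as an error message)
are skipped. The function name is an assumption of this example.

```python
def parse_env(output):
    """Turn the output of "show env" into a dictionary."""
    env = {}
    for line in output.splitlines():
        name, sep, value = line.partition("=")
        if sep:                 # skip empty lines and error messages
            env[name] = value
    return env
```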
|
|
|
|
show errors [<iid>]
|
|
Dump last known request and response errors collected by frontends and
|
|
backends. If <iid> is specified, limit the dump to errors concerning
|
|
either frontend or backend whose ID is <iid>. This command is restricted
|
|
and can only be issued on sockets configured for levels "operator" or
|
|
"admin".
|
|
|
|
The errors which may be collected are the last request and response errors
|
|
caused by protocol violations, often due to invalid characters in header
|
|
names. The report precisely indicates what exact character violated the
|
|
protocol. Other important information such as the exact date the error was
|
|
detected, frontend and backend names, the server name (when known), the
|
|
internal session ID and the source address which has initiated the session
|
|
are reported too.
|
|
|
|
All characters are returned, and non-printable characters are encoded. The
|
|
most common ones (\t = 9, \n = 10, \r = 13 and \e = 27) are encoded as one
|
|
letter following a backslash. The backslash itself is encoded as '\\' to
|
|
avoid confusion. Other non-printable characters are encoded '\xNN' where
|
|
NN is the two-digit hexadecimal representation of the character's ASCII
|
|
code.
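
The encoding rules above can be expressed by the following Python sketch,
which may help when comparing a capture with a dump. It is written from the
description only: the case of the hexadecimal digits and the treatment of
bytes above 126 as non-printable are assumptions of this example.

```python
SHORT_ESCAPES = {9: "\\t", 10: "\\n", 13: "\\r", 27: "\\e", 92: "\\\\"}

def escape_buffer(data):
    """Encode bytes the way "show errors" is described to print them."""
    out = []
    for byte in data:
        if byte in SHORT_ESCAPES:
            out.append(SHORT_ESCAPES[byte])
        elif 32 <= byte < 127:          # printable ASCII, kept as-is
            out.append(chr(byte))
        else:                           # hex digit case is an assumption
            out.append("\\x%02x" % byte)
    return "".join(out)
```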
|
|
|
|
Lines are prefixed with the position of their first character, starting at 0
|
|
for the beginning of the buffer. At most one input line is printed per line,
|
|
and large lines will be broken into multiple consecutive output lines so that
|
|
the output never goes beyond 79 characters wide. It is easy to detect if a
|
|
line was broken, because it will not end with '\n' and the next line's offset
|
|
will be followed by a '+' sign, indicating it is a continuation of previous
|
|
line.
|
|
|
|
Example :
|
|
$ echo "show errors" | socat stdio /tmp/sock1
|
|
>>> [04/Mar/2009:15:46:56.081] backend http-in (#2) : invalid response
|
|
src 127.0.0.1, session #54, frontend fe-eth0 (#1), server s2 (#1)
|
|
response length 213 bytes, error at position 23:
|
|
|
|
00000 HTTP/1.0 200 OK\r\n
|
|
00017 header/bizarre:blah\r\n
|
|
00038 Location: blah\r\n
|
|
00054 Long-line: this is a very long line which should b
|
|
00104+ e broken into multiple lines on the output buffer,
|
|
00154+ otherwise it would be too large to print in a ter
|
|
00204+ minal\r\n
|
|
00211 \r\n
|
|
|
|
In the example above, we see that the backend "http-in" which has internal
|
|
ID 2 has blocked an invalid response from its server s2 which has internal
|
|
ID 1. The request was on session 54 initiated by source 127.0.0.1 and
|
|
received by frontend fe-eth0 whose ID is 1. The total response length was
|
|
213 bytes when the error was detected, and the error was at byte 23. This
|
|
is the slash ('/') in header name "header/bizarre", which is not a valid
|
|
HTTP character for a header name.
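
The layout described above can be decoded by a consumer. The following Python
sketch is only an illustration, not part of HAProxy. It assumes that each
buffer line starts with the decimal offset, then one marker column (a space or
'+'), then a single space before the text. The names join_error_dump,
decode_escapes and char_at are hypothetical helpers.

```python
import re

# Assumed layout: <offset><' ' or '+'><space><text>
_LINE = re.compile(r"^\s*(\d+)([ +]) (.*)$")
# Escapes documented for "show errors": \t \n \r \e \\ and \xNN
_ESC = re.compile(r"\\(x[0-9A-Fa-f]{2}|[tnre\\])")
_SIMPLE = {"t": "\t", "n": "\n", "r": "\r", "e": "\x1b", "\\": "\\"}


def decode_escapes(text):
    """Turn the printable escapes back into the characters they stand for."""
    def repl(match):
        token = match.group(1)
        if token[0] == "x":
            return chr(int(token[1:], 16))
        return _SIMPLE[token]
    return _ESC.sub(repl, text)


def join_error_dump(lines):
    """Return (offset, decoded_text) pairs, one per original input line."""
    joined = []
    for raw in lines:
        match = _LINE.match(raw)
        if not match:
            continue  # not a buffer line (header, blank line, ...)
        offset, marker, text = int(match.group(1)), match.group(2), match.group(3)
        if marker == "+" and joined:
            # Continuation: append to the line that was broken
            first_offset, so_far = joined[-1]
            joined[-1] = (first_offset, so_far + text)
        else:
            joined.append((offset, text))
    # Decode after joining so that no escape is split across two chunks
    return [(offset, decode_escapes(text)) for offset, text in joined]


def char_at(parsed, position):
    """Return the character found at an absolute buffer position, or None."""
    for offset, text in parsed:
        if offset <= position < offset + len(text):
            return text[position - offset]
    return None


# Shortened sample modelled on the example above (the long line is made up)
SAMPLE = [
    r"00000  HTTP/1.0 200 OK\r\n",
    r"00017  header/bizarre:blah\r\n",
    r"00038  Location: blah\r\n",
    r"00054  Long-line: this is a very lo",
    r"00082+ ng line\r\n",
]
```

With this sample, char_at(join_error_dump(SAMPLE), 23) returns the slash which
is reported as the faulty character in the example above.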
|
|
|
|
show backend
|
|
Dump the list of backends available in the running process.
|
|
|
|
show info [typed]
|
|
Dump info about haproxy status on current process. If "typed" is passed as an
|
|
optional argument, field numbers, names and types are emitted as well so that
|
|
external monitoring products can easily retrieve, possibly aggregate, then
|
|
report information found in fields they don't know. Each field is dumped on
|
|
its own line. By default, the format contains only two columns delimited by a
|
|
colon (':'). The left one is the field name and the right one is the value.
|
|
It is very important to note that in typed output format, the dump for a
|
|
single object is contiguous so that there is no need for a consumer to store
|
|
everything at once.
|
|
|
|
When using the typed output format, each line is made of 4 columns delimited
|
|
by colons (':'). The first column is a dot-delimited series of 3 elements. The
|
|
first element is the numeric position of the field in the list (starting at
|
|
zero). This position shall not change over time, but holes are to be expected,
|
|
depending on build options or if some fields are deleted in the future. The
|
|
second element is the field name as it appears in the default "show info"
|
|
output. The third element is the relative process number starting at 1.
|
|
|
|
The rest of the line starting after the first colon follows the "typed output
|
|
format" described in the section above. In short, the second column (after the
|
|
first ':') indicates the origin, nature and scope of the variable. The third
|
|
column indicates the type of the field, among "s32", "s64", "u32", "u64" and
|
|
"str". Then the fourth column is the value itself, which the consumer knows
|
|
how to parse thanks to column 3 and how to process thanks to column 2.
|
|
|
|
Thus the overall line format in typed mode is :
|
|
|
|
<field_pos>.<field_name>.<process_num>:<tags>:<type>:<value>
|
|
|
|
Example :
|
|
|
|
> show info
|
|
Name: HAProxy
|
|
Version: 1.7-dev1-de52ea-146
|
|
Release_date: 2016/03/11
|
|
Nbproc: 1
|
|
Process_num: 1
|
|
Pid: 28105
|
|
Uptime: 0d 0h00m04s
|
|
Uptime_sec: 4
|
|
Memmax_MB: 0
|
|
PoolAlloc_MB: 0
|
|
PoolUsed_MB: 0
|
|
PoolFailed: 0
|
|
(...)
|
|
|
|
> show info typed
|
|
0.Name.1:POS:str:HAProxy
|
|
1.Version.1:POS:str:1.7-dev1-de52ea-146
|
|
2.Release_date.1:POS:str:2016/03/11
|
|
3.Nbproc.1:CGS:u32:1
|
|
4.Process_num.1:KGP:u32:1
|
|
5.Pid.1:SGP:u32:28105
|
|
6.Uptime.1:MDP:str:0d 0h00m08s
|
|
7.Uptime_sec.1:MDP:u32:8
|
|
8.Memmax_MB.1:CLP:u32:0
|
|
9.PoolAlloc_MB.1:MGP:u32:0
|
|
10.PoolUsed_MB.1:MGP:u32:0
|
|
11.PoolFailed.1:MCP:u32:0
|
|
(...)
|
|
|
|
In the typed format, the process number which ends the first column of each line
|
|
makes it very easy to visually aggregate outputs from multiple processes.
|
|
Example :
|
|
|
|
$ ( echo show info typed | socat /var/run/haproxy.sock1 - ; \
|
|
echo show info typed | socat /var/run/haproxy.sock2 - ) | \
|
|
sort -t . -k 1,1n -k 2,2 -k 3,3n
|
|
0.Name.1:POS:str:HAProxy
|
|
0.Name.2:POS:str:HAProxy
|
|
1.Version.1:POS:str:1.7-dev1-868ab3-148
|
|
1.Version.2:POS:str:1.7-dev1-868ab3-148
|
|
2.Release_date.1:POS:str:2016/03/11
|
|
2.Release_date.2:POS:str:2016/03/11
|
|
3.Nbproc.1:CGS:u32:2
|
|
3.Nbproc.2:CGS:u32:2
|
|
4.Process_num.1:KGP:u32:1
|
|
4.Process_num.2:KGP:u32:2
|
|
5.Pid.1:SGP:u32:30120
|
|
5.Pid.2:SGP:u32:30121
|
|
6.Uptime.1:MDP:str:0d 0h01m28s
|
|
6.Uptime.2:MDP:str:0d 0h01m28s
|
|
(...)
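
A consumer of "show info typed" can split each line according to the format
given above. The sketch below is a minimal Python illustration under the
assumption that field names contain no colon. InfoField,
parse_info_typed_line and merge_processes are hypothetical names, not an
HAProxy API.

```python
from collections import namedtuple

InfoField = namedtuple("InfoField", "pos name process tags ftype value")
INT_TYPES = ("s32", "s64", "u32", "u64")


def parse_info_typed_line(line):
    """Parse '<field_pos>.<field_name>.<process_num>:<tags>:<type>:<value>'."""
    # maxsplit=3 keeps any ':' found inside the value untouched
    key, tags, ftype, value = line.rstrip("\r\n").split(":", 3)
    pos, rest = key.split(".", 1)
    name, process = rest.rsplit(".", 1)
    if ftype in INT_TYPES:
        value = int(value)
    return InfoField(int(pos), name, int(process), tags, ftype, value)


def merge_processes(lines):
    """Group values by field name, then by process number."""
    merged = {}
    for line in lines:
        if not line.strip():
            continue  # the empty line marks the end of the output
        field = parse_info_typed_line(line)
        merged.setdefault(field.name, {})[field.process] = field.value
    return merged
```

Feeding the concatenated output of several processes to merge_processes()
gives one entry per field with one value per process, which is the
programmatic counterpart of the "sort" pipeline shown above.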
|
|
|
|
show map [<map>]
|
|
Dump info about map converters. Without argument, the list of all available
|
|
maps is returned. If a <map> is specified, its contents are dumped. <map> is
|
|
the #<id> or <file>. The first column is a unique identifier. It can be used
|
|
as reference for the operation "del map" and "set map". The second column is
|
|
the pattern and the third column is the sample if available. The data returned
|
|
are not directly a list of available maps, but are the list of all patterns
|
|
composing any map. Many of these patterns can be shared with ACL.
|
|
|
|
show acl [<acl>]
|
|
Dump info about acl converters. Without argument, the list of all available
|
|
acls is returned. If an <acl> is specified, its contents are dumped. <acl> is
|
|
the #<id> or <file>. The dump format is the same as for maps, even for the
|
|
sample value. The data returned are not a list of available ACL, but are the
|
|
list of all patterns composing any ACL. Many of these patterns can be shared
|
|
with maps.
|
|
|
|
show pools
|
|
Dump the status of internal memory pools. This is useful to track memory
|
|
usage when suspecting a memory leak for example. It does exactly the same
|
|
as the SIGQUIT when running in foreground except that it does not flush
|
|
the pools.
|
|
|
|
show servers state [<backend>]
|
|
Dump the state of the servers found in the running configuration. A backend
|
|
name or identifier may be provided to limit the output to this backend only.
|
|
|
|
The dump has the following format:
|
|
- first line contains the format version (1 in this specification);
|
|
- second line contains the column headers, prefixed by a sharp ('#');
|
|
- third line and next ones contain data;
|
|
- each line starting with a sharp ('#') is considered as a comment.
|
|
|
|
Since multiple versions of the output may co-exist, below is the list of
|
|
fields and their order per file format version :
|
|
1:
|
|
be_id: Backend unique id.
|
|
be_name: Backend label.
|
|
srv_id: Server unique id (in the backend).
|
|
srv_name: Server label.
|
|
srv_addr: Server IP address.
|
|
srv_op_state: Server operational state (UP/DOWN/...).
|
|
In source code: SRV_ST_*.
|
|
srv_admin_state: Server administrative state (MAINT/DRAIN/...).
|
|
In source code: SRV_ADMF_*.
|
|
srv_uweight: User visible server's weight.
|
|
srv_iweight: Server's initial weight.
|
|
srv_time_since_last_change: Time since last operational change.
|
|
srv_check_status: Last health check status.
|
|
srv_check_result: Last check result (FAILED/PASSED/...).
|
|
In source code: CHK_RES_*.
|
|
srv_check_health: Checks rise / fall current counter.
|
|
srv_check_state: State of the check (ENABLED/PAUSED/...).
|
|
In source code: CHK_ST_*.
|
|
srv_agent_state: State of the agent check (ENABLED/PAUSED/...).
|
|
In source code: CHK_ST_*.
|
|
bk_f_forced_id: Flag to know if the backend ID is forced by
|
|
configuration.
|
|
srv_f_forced_id: Flag to know if the server's ID is forced by
|
|
configuration.
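
The version 1 layout can be read back with a few lines of code. The Python
sketch below is an illustration only. It assumes that columns are separated by
whitespace and that the header line lists the same names as above; neither
point is stated in this section. The sample values are invented and
parse_servers_state is a hypothetical helper.

```python
# Column names of format version 1, in the order listed above
V1_COLUMNS = (
    "be_id be_name srv_id srv_name srv_addr srv_op_state "
    "srv_admin_state srv_uweight srv_iweight srv_time_since_last_change "
    "srv_check_status srv_check_result srv_check_health srv_check_state "
    "srv_agent_state bk_f_forced_id srv_f_forced_id"
).split()


def parse_servers_state(text):
    """Return (version, [one dict per server]) from a state dump."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    version = int(lines[0])  # first line: format version
    if version != 1:
        raise ValueError("unsupported format version %d" % version)
    columns = None
    servers = []
    for line in lines[1:]:
        if line.startswith("#"):
            if columns is None:
                # second line: column headers prefixed by '#'
                columns = line.lstrip("#").split()
            continue  # any other '#' line is a comment
        if columns is None:
            raise ValueError("data found before the column headers")
        values = line.split()
        if len(values) != len(columns):
            raise ValueError("unexpected number of fields: %r" % line)
        servers.append(dict(zip(columns, values)))
    return version, servers


# Invented sample: one server in one backend
SAMPLE = (
    "1\n"
    "# " + " ".join(V1_COLUMNS) + "\n"
    "3 private-backend 1 srv1 192.0.2.10 2 0 1 1 120 6 3 4 6 0 0 0\n"
)
```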
|
|
|
|
show sess
|
|
Dump all known sessions. Avoid doing this on slow connections as this can
|
|
be huge. This command is restricted and can only be issued on sockets
|
|
configured for levels "operator" or "admin".
|
|
|
|
show sess <id>
|
|
Display a lot of internal information about the specified session identifier.
|
|
This identifier is the first field at the beginning of the lines in the dumps
|
|
of "show sess" (it corresponds to the session pointer). This information is
|
|
useless to most users but may be used by haproxy developers to troubleshoot a
|
|
complex bug. The output format is intentionally not documented so that it can
|
|
freely evolve depending on demands. You may find a description of all fields
|
|
returned in src/dumpstats.c.
|
|
|
|
The special id "all" dumps the states of all sessions, which must be avoided
|
|
as much as possible as it is highly CPU intensive and can take a lot of time.
|
|
|
|
show stat [<iid> <type> <sid>] [typed]
|
|
Dump statistics using the CSV format, or using the extended typed output
|
|
format described in the section above if "typed" is passed after the other
|
|
arguments. By passing <iid>, <type> and <sid>, it is possible to dump only
|
|
selected items :
|
|
- <iid> is a proxy ID, -1 to dump everything
|
|
- <type> selects the type of dumpable objects : 1 for frontends, 2 for
|
|
backends, 4 for servers, -1 for everything. These values can be ORed,
|
|
for example:
|
|
1 + 2 = 3 -> frontend + backend.
|
|
1 + 2 + 4 = 7 -> frontend + backend + server.
|
|
- <sid> is a server ID, -1 to dump everything from the selected proxy.
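
The <type> argument is a bit mask, so the values listed above can be combined
with a bitwise OR. The small Python sketch below only builds the command
string; the constant and function names are hypothetical.

```python
# Object types accepted by "show stat", as listed above
FRONTEND, BACKEND, SERVER = 1, 2, 4


def show_stat_command(iid=-1, types=-1, sid=-1, typed=False):
    """Build a "show stat" command; -1 means "everything" for each argument."""
    command = "show stat %d %d %d" % (iid, types, sid)
    if typed:
        command += " typed"
    return command
```

For example, show_stat_command(3, FRONTEND | BACKEND) selects the frontend and
backend parts of proxy 3 and leaves its servers out.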
|
|
|
|
Example :
|
|
$ echo "show info;show stat" | socat stdio unix-connect:/tmp/sock1
|
|
>>> Name: HAProxy
|
|
Version: 1.4-dev2-49
|
|
Release_date: 2009/09/23
|
|
Nbproc: 1
|
|
Process_num: 1
|
|
(...)
|
|
|
|
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq, (...)
|
|
stats,FRONTEND,,,0,0,1000,0,0,0,0,0,0,,,,,OPEN,,,,,,,,,1,1,0, (...)
|
|
stats,BACKEND,0,0,0,0,1000,0,0,0,0,0,,0,0,0,0,UP,0,0,0,,0,250,(...)
|
|
(...)
|
|
www1,BACKEND,0,0,0,0,1000,0,0,0,0,0,,0,0,0,0,UP,1,1,0,,0,250, (...)
|
|
|
|
$
|
|
|
|
In this example, two commands have been issued at once. That way it's easy to
|
|
find which process the stats apply to in multi-process mode. This is not
|
|
needed in the typed output format as the process number is reported on each
|
|
line. Notice the empty line after the information output which marks the end
|
|
of the first block. A similar empty line appears at the end of the second
|
|
block (stats) so that the reader knows the output has not been truncated.
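
The CSV block can be loaded with any CSV reader once the leading "# " of the
header line is removed. The following Python sketch is an illustration; it
assumes that everything found before the header line (for instance the output
of "show info") can be ignored. parse_stat_csv is a hypothetical helper.

```python
import csv
import io


def parse_stat_csv(text):
    """Return one dict per CSV data line of a "show stat" dump."""
    header = None
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # empty line marking the end of a block
        if fields[0].startswith("#"):
            fields[0] = fields[0].lstrip("# ")  # "# pxname" -> "pxname"
            header = fields
            continue
        if header is None:
            continue  # output of a previous command, not CSV data
        # Lines end with a comma, which yields one empty trailing name
        rows.append({name: value for name, value in zip(header, fields) if name})
    return rows


# Shortened sample modelled on the example above
SAMPLE = (
    "Name: HAProxy\n"
    "\n"
    "# pxname,svname,qcur,qmax,scur,smax,slim,stot,\n"
    "stats,FRONTEND,,,0,0,1000,0,\n"
    "www1,BACKEND,0,0,0,0,1000,0,\n"
    "\n"
)
```

Empty fields are kept as empty strings so that the caller can tell "not
applicable" apart from zero.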
|
|
|
|
When "typed" is specified, the output format is more suitable to monitoring
|
|
tools because it provides numeric positions and indicates the type of each
|
|
output field. Each value stands on its own line with process number, element
|
|
number, nature, origin and scope. This same format is available via the HTTP
|
|
stats by passing ";typed" after the URI. It is very important to note that in
|
|
typed output format, the dump for a single object is contiguous so that there
|
|
is no need for a consumer to store everything at once.
|
|
|
|
When using the typed output format, each line is made of 4 columns delimited
|
|
by colons (':'). The first column is a dot-delimited series of 6 elements. The
|
|
first element is a letter indicating the type of the object being described.
|
|
At the moment the following object types are known : 'F' for a frontend, 'B'
|
|
for a backend, 'L' for a listener, and 'S' for a server.
|
|
The second element is a positive integer representing the unique identifier of
|
|
the proxy the object belongs to. It is equivalent to the "iid" column of the
|
|
CSV output and matches the value in front of the optional "id" directive found
|
|
in the frontend or backend section. The third element is a positive integer
|
|
containing the unique object identifier inside the proxy, and corresponds to
|
|
the "sid" column of the CSV output. ID 0 is reported when dumping a frontend
|
|
or a backend. For a listener or a server, this corresponds to their respective
|
|
ID inside the proxy. The fourth element is the numeric position of the field
|
|
in the list (starting at zero). This position shall not change over time, but
|
|
holes are to be expected, depending on build options or if some fields are
|
|
deleted in the future. The fifth element is the field name as it appears in
|
|
the CSV output. The sixth element is a positive integer and is the relative
|
|
process number starting at 1.
|
|
|
|
The rest of the line starting after the first colon follows the "typed output
|
|
format" described in the section above. In short, the second column (after the
|
|
first ':') indicates the origin, nature and scope of the variable. The third
|
|
column indicates the type of the field, among "s32", "s64", "u32", "u64" and
|
|
"str". Then the fourth column is the value itself, which the consumer knows
|
|
how to parse thanks to column 3 and how to process thanks to column 2.
|
|
|
|
Thus the overall line format in typed mode is :
|
|
|
|
<obj>.<px_id>.<id>.<fpos>.<fname>.<process_num>:<tags>:<type>:<value>
|
|
|
|
Here's an example of typed output format :
|
|
|
|
$ echo "show stat typed" | socat stdio unix-connect:/tmp/sock1
|
|
F.2.0.0.pxname.1:MGP:str:private-frontend
|
|
F.2.0.1.svname.1:MGP:str:FRONTEND
|
|
F.2.0.8.bin.1:MGP:u64:0
|
|
F.2.0.9.bout.1:MGP:u64:0
|
|
F.2.0.40.hrsp_2xx.1:MGP:u64:0
|
|
L.2.1.0.pxname.1:MGP:str:private-frontend
|
|
L.2.1.1.svname.1:MGP:str:sock-1
|
|
L.2.1.17.status.1:MGP:str:OPEN
|
|
L.2.1.73.addr.1:MGP:str:0.0.0.0:8001
|
|
S.3.13.60.rtime.1:MCP:u32:0
|
|
S.3.13.61.ttime.1:MCP:u32:0
|
|
S.3.13.62.agent_status.1:MGP:str:L4TOUT
|
|
S.3.13.64.agent_duration.1:MGP:u64:2001
|
|
S.3.13.65.check_desc.1:MCP:str:Layer4 timeout
|
|
S.3.13.66.agent_desc.1:MCP:str:Layer4 timeout
|
|
S.3.13.67.check_rise.1:MCP:u32:2
|
|
S.3.13.68.check_fall.1:MCP:u32:3
|
|
S.3.13.69.check_health.1:SGP:u32:0
|
|
S.3.13.70.agent_rise.1:MaP:u32:1
|
|
S.3.13.71.agent_fall.1:SGP:u32:1
|
|
S.3.13.72.agent_health.1:SGP:u32:1
|
|
S.3.13.73.addr.1:MCP:str:1.255.255.255:8888
|
|
S.3.13.75.mode.1:MAP:str:http
|
|
B.3.0.0.pxname.1:MGP:str:private-backend
|
|
B.3.0.1.svname.1:MGP:str:BACKEND
|
|
B.3.0.2.qcur.1:MGP:u32:0
|
|
B.3.0.3.qmax.1:MGP:u32:0
|
|
B.3.0.4.scur.1:MGP:u32:0
|
|
B.3.0.5.smax.1:MGP:u32:0
|
|
B.3.0.6.slim.1:MGP:u32:1000
|
|
B.3.0.55.lastsess.1:MMP:s32:-1
|
|
(...)
|
|
|
|
In the typed format, the process number which ends the first column of each line
|
|
makes it very easy to visually aggregate outputs from multiple processes, as
|
|
shown in the example below where each line appears for each process :
|
|
|
|
$ ( echo show stat typed | socat /var/run/haproxy.sock1 - ; \
|
|
echo show stat typed | socat /var/run/haproxy.sock2 - ) | \
|
|
sort -t . -k 1,1 -k 2,2n -k 3,3n -k 4,4n -k 5,5 -k 6,6n
|
|
B.3.0.0.pxname.1:MGP:str:private-backend
|
|
B.3.0.0.pxname.2:MGP:str:private-backend
|
|
B.3.0.1.svname.1:MGP:str:BACKEND
|
|
B.3.0.1.svname.2:MGP:str:BACKEND
|
|
B.3.0.2.qcur.1:MGP:u32:0
|
|
B.3.0.2.qcur.2:MGP:u32:0
|
|
B.3.0.3.qmax.1:MGP:u32:0
|
|
B.3.0.3.qmax.2:MGP:u32:0
|
|
B.3.0.4.scur.1:MGP:u32:0
|
|
B.3.0.4.scur.2:MGP:u32:0
|
|
B.3.0.5.smax.1:MGP:u32:0
|
|
B.3.0.5.smax.2:MGP:u32:0
|
|
B.3.0.6.slim.1:MGP:u32:1000
|
|
B.3.0.6.slim.2:MGP:u32:1000
|
|
(...)
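
Since the dump for a single object is contiguous and every line carries its
full identification, a consumer can rebuild per-object records easily. The
Python sketch below is an illustration of the line format given above;
parse_stat_typed_line and group_by_object are hypothetical names.

```python
from collections import defaultdict

INT_TYPES = ("s32", "s64", "u32", "u64")


def parse_stat_typed_line(line):
    """Parse '<obj>.<px_id>.<id>.<fpos>.<fname>.<process_num>:<tags>:<type>:<value>'.

    Returns ((obj, px_id, id, process_num), fname, value).
    """
    # maxsplit=3 keeps values such as "0.0.0.0:8001" in one piece
    key, tags, ftype, value = line.rstrip("\r\n").split(":", 3)
    obj, px_id, obj_id, fpos, rest = key.split(".", 4)
    fname, process = rest.rsplit(".", 1)
    if ftype in INT_TYPES:
        value = int(value)
    return (obj, int(px_id), int(obj_id), int(process)), fname, value


def group_by_object(lines):
    """Return {(obj, px_id, id, process_num): {field_name: value}}."""
    objects = defaultdict(dict)
    for line in lines:
        if not line.strip():
            continue  # the empty line marks the end of the output
        ident, fname, value = parse_stat_typed_line(line)
        objects[ident][fname] = value
    return dict(objects)
```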
|
|
|
|
show stat resolvers [<resolvers section id>]
|
|
Dump statistics for the given resolvers section, or all resolvers sections
|
|
if no section is supplied.
|
|
|
|
For each name server, the following counters are reported:
|
|
sent: number of DNS requests sent to this server
|
|
valid: number of DNS valid responses received from this server
|
|
update: number of DNS responses used to update the server's IP address
|
|
cname: number of CNAME responses
|
|
cname_error: CNAME errors encountered with this server
|
|
any_err: number of empty responses (i.e. server does not support ANY type)
|
|
nx: non-existent domain responses received from this server
|
|
timeout: number of times this server did not answer in time
|
|
refused: number of requests refused by this server
|
|
other: any other DNS errors
|
|
invalid: invalid DNS response (from a protocol point of view)
|
|
too_big: too big response
|
|
outdated: number of responses which arrived too late (after another name server)
|
|
|
|
show table
|
|
Dump general information on all known stick-tables. Their name is returned
|
|
(the name of the proxy which holds them), their type (currently zero, always
|
|
IP), their size in maximum possible number of entries, and the number of
|
|
entries currently in use.
|
|
|
|
Example :
|
|
$ echo "show table" | socat stdio /tmp/sock1
|
|
>>> # table: front_pub, type: ip, size:204800, used:171454
|
|
>>> # table: back_rdp, type: ip, size:204800, used:0
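
The socat invocations used in these examples can be replaced by a few lines
of code when the output has to be processed by a program. The Python sketch
below is an illustration under the assumption that the socket is used in its
default non-interactive mode, where the connection is closed once the command
has been answered. The function names and the socket path are hypothetical.

```python
import socket


def exchange(sock, command):
    """Send one command on a connected socket and read until it is closed."""
    sock.sendall(command.encode("ascii") + b"\n")
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            break  # the peer closed the connection: end of the response
        chunks.append(data)
    return b"".join(chunks).decode("ascii", "replace")


def haproxy_command(sock_path, command, timeout=5.0):
    """Connect to a UNIX stats socket, run one command, return its output."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(sock_path)
        return exchange(sock, command)
```

With the socket path of the example above, haproxy_command("/tmp/sock1",
"show table") would return the same text as the socat pipeline.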
|
|
|
|
show table <name> [ data.<type> <operator> <value> ] | [ key <key> ]
|
|
Dump contents of stick-table <name>. In this mode, a first line of generic
|
|
information about the table is reported as with "show table", then all
|
|
entries are dumped. Since this can be quite heavy, it is possible to specify
|
|
a filter in order to specify what entries to display.
|
|
|
|
When the "data." form is used the filter applies to the stored data (see
|
|
"stick-table" in section 4.2). A stored data type must be specified
|
|
in <type>, and this data type must be stored in the table otherwise an
|
|
error is reported. The data is compared according to <operator> with the
|
|
64-bit integer <value>. Operators are the same as with the ACLs :
|
|
|
|
- eq : match entries whose data is equal to this value
|
|
- ne : match entries whose data is not equal to this value
|
|
- le : match entries whose data is less than or equal to this value
|
|
- ge : match entries whose data is greater than or equal to this value
|
|
- lt : match entries whose data is less than this value
|
|
- gt : match entries whose data is greater than this value
|
|
|
|
|
|
When the key form is used the entry <key> is shown. The key must be of the
|
|
same type as the table, which currently is limited to IPv4, IPv6, integer,
|
|
and string.
|
|
|
|
Example :
|
|
$ echo "show table http_proxy" | socat stdio /tmp/sock1
|
|
>>> # table: http_proxy, type: ip, size:204800, used:2
|
|
>>> 0x80e6a4c: key=127.0.0.1 use=0 exp=3594729 gpc0=0 conn_rate(30000)=1 \
|
|
bytes_out_rate(60000)=187
|
|
>>> 0x80e6a80: key=127.0.0.2 use=0 exp=3594740 gpc0=1 conn_rate(30000)=10 \
|
|
bytes_out_rate(60000)=191
|
|
|
|
$ echo "show table http_proxy data.gpc0 gt 0" | socat stdio /tmp/sock1
|
|
>>> # table: http_proxy, type: ip, size:204800, used:2
|
|
>>> 0x80e6a80: key=127.0.0.2 use=0 exp=3594740 gpc0=1 conn_rate(30000)=10 \
|
|
bytes_out_rate(60000)=191
|
|
|
|
$ echo "show table http_proxy data.conn_rate gt 5" | \
|
|
socat stdio /tmp/sock1
|
|
>>> # table: http_proxy, type: ip, size:204800, used:2
|
|
>>> 0x80e6a80: key=127.0.0.2 use=0 exp=3594740 gpc0=1 conn_rate(30000)=10 \
|
|
bytes_out_rate(60000)=191
|
|
|
|
$ echo "show table http_proxy key 127.0.0.2" | \
|
|
socat stdio /tmp/sock1
|
|
>>> # table: http_proxy, type: ip, size:204800, used:2
|
|
>>> 0x80e6a80: key=127.0.0.2 use=0 exp=3594740 gpc0=1 conn_rate(30000)=10 \
|
|
bytes_out_rate(60000)=191
|
|
|
|
When the data criterion applies to a dynamic value dependent on time such as
|
|
a bytes rate, the value is dynamically computed during the evaluation of the
|
|
entry in order to decide whether it has to be dumped or not. This means that
|
|
such a filter could match for some time then not match anymore because as
|
|
time goes, the average event rate drops.
|
|
|
|
It is possible to use this to extract lists of IP addresses abusing the
|
|
service, in order to monitor them or even blacklist them in a firewall.
|
|
Example :
|
|
$ echo "show table http_proxy data.gpc0 gt 0" \
|
|
| socat stdio /tmp/sock1 \
|
|
| fgrep 'key=' | cut -d' ' -f2 | cut -d= -f2 > abusers-ip.txt
|
|
( or | awk '/key/{ print a[split($2,a,"=")]; }' )
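
The same extraction can be done by a program which needs more than the key.
The Python sketch below is an illustration only. It assumes that each entry is
printed on a single line (the backslashes in the examples above only wrap the
text for this document) and that keys contain no space. parse_table_entry and
abusing_keys are hypothetical helpers.

```python
def parse_table_entry(line):
    """Parse '<pointer>: key=... use=... exp=... <data>=...' into a dict."""
    _pointer, _sep, rest = line.partition(": ")
    return dict(item.split("=", 1) for item in rest.split())


def abusing_keys(dump):
    """Return the keys of all entries found in a "show table <name>" dump."""
    keys = []
    for line in dump.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # table header or final empty line
        entry = parse_table_entry(line)
        if "key" in entry:
            keys.append(entry["key"])
    return keys


# Sample modelled on the examples above
SAMPLE = (
    "# table: http_proxy, type: ip, size:204800, used:2\n"
    "0x80e6a4c: key=127.0.0.1 use=0 exp=3594729 gpc0=0 conn_rate(30000)=1 "
    "bytes_out_rate(60000)=187\n"
    "0x80e6a80: key=127.0.0.2 use=0 exp=3594740 gpc0=1 conn_rate(30000)=10 "
    "bytes_out_rate(60000)=191\n"
    "\n"
)
```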
|
|
|
|
show tls-keys [id|*]
|
|
Dump all loaded TLS ticket keys references. The TLS ticket key reference ID
|
|
and the file from which the keys have been loaded is shown. Both of those
|
|
can be used to update the TLS keys using "set ssl tls-key". If an ID is
|
|
specified as parameter, it will dump the tickets, using * it will dump every
|
|
key from every reference.
|
|
|
|
shutdown frontend <frontend>
|
|
Completely delete the specified frontend. All the ports it was bound to will
|
|
be released. It will not be possible to enable the frontend anymore after
|
|
this operation. This is intended to be used in environments where stopping a
|
|
proxy is not even imaginable but a misconfigured proxy must be fixed. That
|
|
way it's possible to release the port and bind it into another process to
|
|
restore operations. The frontend will not appear at all on the stats page
|
|
once it is terminated.
|
|
|
|
The frontend may be specified either by its name or by its numeric ID,
|
|
prefixed with a sharp ('#').
|
|
|
|
This command is restricted and can only be issued on sockets configured for
|
|
level "admin".
|
|
|
|
shutdown session <id>
|
|
Immediately terminate the session matching the specified session identifier.
|
|
This identifier is the first field at the beginning of the lines in the dumps
|
|
of "show sess" (it corresponds to the session pointer). This can be used to
|
|
terminate a long-running session without waiting for a timeout or when an
|
|
endless transfer is ongoing. Such terminated sessions are reported with a 'K'
|
|
flag in the logs.
|
|
|
|
shutdown sessions server <backend>/<server>
|
|
Immediately terminate all the sessions attached to the specified server. This
|
|
can be used to terminate long-running sessions after a server is put into
|
|
maintenance mode, for instance. Such terminated sessions are reported with a
|
|
'K' flag in the logs.
|
|
|
|
|
|
10. Tricks for easier configuration management
|
|
----------------------------------------------
|
|
|
|
It is very common that two HAProxy nodes constituting a cluster share exactly
|
|
the same configuration modulo a few addresses. Instead of having to maintain a
|
|
duplicate configuration for each node, which will inevitably diverge, it is
|
|
possible to include environment variables in the configuration. Thus multiple
|
|
configurations may share the exact same file with only a few different system
|
|
wide environment variables. This started in version 1.5 where only addresses
|
|
were allowed to include environment variables, and 1.6 goes further by
|
|
supporting environment variables everywhere. The syntax is the same as in the
|
|
UNIX shell, a variable starts with a dollar sign ('$'), followed by an opening
|
|
curly brace ('{'), then the variable name followed by the closing brace ('}').
|
|
Except for addresses, environment variables are only interpreted in arguments
|
|
surrounded with double quotes (this was necessary not to break existing setups
|
|
using regular expressions involving the dollar symbol).
|
|
|
|
Environment variables also make it convenient to write configurations which are
|
|
expected to work on various sites where only the address changes. It can also
|
|
help remove passwords from some configs. In the example below, the
|
|
"site1.env" file is sourced by the init script upon startup :
|
|
|
|
$ cat site1.env
|
|
LISTEN=192.168.1.1
|
|
CACHE_PFX=192.168.11
|
|
SERVER_PFX=192.168.22
|
|
LOGGER=192.168.33.1
|
|
STATSLP='admin:pa$$w0rd'
|
|
ABUSERS=/etc/haproxy/abuse.lst
|
|
TIMEOUT=10s
|
|
|
|
$ cat haproxy.cfg
|
|
global
|
|
log "${LOGGER}:514" local0
|
|
|
|
defaults
|
|
mode http
|
|
timeout client "${TIMEOUT}"
|
|
timeout server "${TIMEOUT}"
|
|
timeout connect 5s
|
|
|
|
frontend public
|
|
bind "${LISTEN}:80"
|
|
http-request reject if { src -f "${ABUSERS}" }
|
|
stats uri /stats
|
|
stats auth "${STATSLP}"
|
|
use_backend cache if { path_end .jpg .css .ico }
|
|
default_backend server
|
|
|
|
backend cache
|
|
server cache1 "${CACHE_PFX}.1:18080" check
|
|
server cache2 "${CACHE_PFX}.2:18080" check
|
|
|
|
backend server
|
|
server cache1 "${SERVER_PFX}.1:8080" check
|
|
server cache2 "${SERVER_PFX}.2:8080" check
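
When a configuration relies on the environment like the one above, it can be
useful to verify before starting that every referenced variable is set. The
Python sketch below is a hypothetical pre-flight helper, not an HAProxy
feature. It only looks for the "${NAME}" syntax described above and does not
try to reproduce HAProxy's quoting rules.

```python
import os
import re

# "${NAME}" as described above: dollar sign, opening brace, name, closing brace
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_variables(config_text, env=None):
    """Return the sorted names referenced in the config but absent from env."""
    env = os.environ if env is None else env
    missing = set()
    for line in config_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # whole-line comment
        for name in _VAR.findall(line):
            if name not in env:
                missing.add(name)
    return sorted(missing)
```

An init script could call missing_variables() on the configuration file right
after sourcing "site1.env" and refuse to start when the list is not empty.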
|
|
|
|
|
|
11. Well-known traps to avoid
|
|
-----------------------------
|
|
|
|
Once in a while, someone reports that after a system reboot, the haproxy
|
|
service wasn't started, and that once they start it by hand it works. Most
|
|
often, these people are running a clustered IP address mechanism such as
|
|
keepalived, to assign the service IP address to the master node only, and while
|
|
it used to work when they used to bind haproxy to address 0.0.0.0, it stopped
|
|
working after they bound it to the virtual IP address. What happens here is
|
|
that when the service starts, the virtual IP address is not yet owned by the
|
|
local node, so when HAProxy wants to bind to it, the system rejects this
|
|
because it is not a local IP address. The fix is not to delay the
|
|
haproxy service startup (since it wouldn't stand a restart), but instead to
|
|
properly configure the system to allow binding to non-local addresses. This is
|
|
easily done on Linux by setting the net.ipv4.ip_nonlocal_bind sysctl to 1. This
|
|
is also needed in order to transparently intercept the IP traffic that passes
|
|
through HAProxy for a specific target address.
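
The current value of this sysctl can be checked before binding to a virtual
address. The Python sketch below is an illustration for Linux only. It reads
the value through /proc, and the function names are hypothetical.

```python
def sysctl_is_enabled(text):
    """Interpret the content of a boolean sysctl file."""
    return text.strip() == "1"


def nonlocal_bind_enabled(path="/proc/sys/net/ipv4/ip_nonlocal_bind"):
    """Return True or False, or None when the setting cannot be read."""
    try:
        with open(path) as handle:
            return sysctl_is_enabled(handle.read())
    except OSError:
        return None  # not Linux, or /proc is not available
```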
|
|
|
|
Multi-process configurations involving source port ranges may apparently seem
|
|
to work but they will cause some random failures under high loads because more
|
|
than one process may try to use the same source port to connect to the same
|
|
server, which is not possible. The system will report an error and a retry will
|
|
happen, picking another port. A high value in the "retries" parameter may hide
|
|
the effect to a certain extent but this also comes with increased CPU usage and
|
|
processing time. Logs will also report a certain number of retries. For this
|
|
reason, port ranges should be avoided in multi-process configurations.
|
|
|
|
Since HAProxy uses SO_REUSEPORT and supports having multiple independent
|
|
processes bound to the same IP:port, during troubleshooting it can happen that
|
|
an old process was not stopped before a new one was started. This provides
|
|
absurd test results which tend to indicate that any change to the configuration
|
|
is ignored. The reason is that even though the new process is restarted with a
|
|
new configuration, the old one also gets some incoming connections and
|
|
processes them, returning unexpected results. When in doubt, just stop the new
|
|
process and try again. If it still works, it very likely means that an old
|
|
process remains alive and has to be stopped. Linux's "netstat -lntp" is of good
|
|
help here.
|
|
|
|
When adding entries to an ACL from the command line (eg: when blacklisting a
|
|
source address), it is important to keep in mind that these entries are not
|
|
synchronized to the file and that if someone reloads the configuration, these
|
|
updates will be lost. While this is often the desired effect (for blacklisting)
|
|
it may not necessarily match expectations when the change was made as a fix for
|
|
a problem. See the "add acl" action of the CLI interface.
|
|
|
|
|
|
12. Debugging and performance issues
|
|
------------------------------------
|
|
|
|
When HAProxy is started with the "-d" option, it will stay in the foreground
|
|
and will print one line per event, such as an incoming connection, the end of a
|
|
connection, and for each request or response header line seen. This debug
|
|
output is emitted before the contents are processed, so it does not reflect the
|
|
local modifications. The main use is to show the request and response without
|
|
having to run a network sniffer. The output is less readable when multiple
|
|
connections are handled in parallel, though the "debug2ansi" and "debug2html"
|
|
scripts found in the examples/ directory definitely help here by coloring the
|
|
output.
|
|
|
|
If a request or response is rejected because HAProxy finds it is malformed, the
|
|
best thing to do is to connect to the CLI and issue "show errors", which will
|
|
report the last captured faulty request and response for each frontend and
|
|
backend, with all the necessary information to indicate precisely the first
|
|
character of the input stream that was rejected. This is sometimes needed to
|
|
prove to customers or to developers that a bug is present in their code. In
|
|
this case it is often possible to relax the checks (but still keep the
|
|
captures) using "option accept-invalid-http-request" or its equivalent for
|
|
responses coming from the server "option accept-invalid-http-response". Please
|
|
see the configuration manual for more details.
|
|
|
|
Example :
|
|
|
|
> show errors
|
|
Total events captured on [13/Oct/2015:13:43:47.169] : 1
|
|
|
|
[13/Oct/2015:13:43:40.918] frontend HAProxyLocalStats (#2): invalid request
|
|
backend <NONE> (#-1), server <NONE> (#-1), event #0
|
|
src 127.0.0.1:51981, session #0, session flags 0x00000080
|
|
HTTP msg state 26, msg flags 0x00000000, tx flags 0x00000000
|
|
HTTP chunk len 0 bytes, HTTP body len 0 bytes
|
|
buffer flags 0x00808002, out 0 bytes, total 31 bytes
|
|
pending 31 bytes, wrapping at 8040, error at position 13:
|
|
|
|
00000 GET /invalid request HTTP/1.1\r\n
|
|
|
|
|
|
The output of "show info" on the CLI provides a lot of useful information
|
|
regarding the maximum connection rate ever reached, maximum SSL key rate ever
|
|
reached, and in general all information which can help to explain temporary
|
|
issues regarding CPU or memory usage. Example :
|
|
|
|
> show info
|
|
Name: HAProxy
|
|
Version: 1.6-dev7-e32d18-17
|
|
Release_date: 2015/10/12
|
|
Nbproc: 1
|
|
Process_num: 1
|
|
Pid: 7949
|
|
Uptime: 0d 0h02m39s
|
|
Uptime_sec: 159
|
|
Memmax_MB: 0
|
|
Ulimit-n: 120032
|
|
Maxsock: 120032
|
|
Maxconn: 60000
|
|
Hard_maxconn: 60000
|
|
CurrConns: 0
|
|
CumConns: 3
|
|
CumReq: 3
|
|
MaxSslConns: 0
|
|
CurrSslConns: 0
|
|
CumSslConns: 0
|
|
Maxpipes: 0
|
|
PipesUsed: 0
|
|
PipesFree: 0
|
|
ConnRate: 0
|
|
ConnRateLimit: 0
|
|
MaxConnRate: 1
|
|
SessRate: 0
|
|
SessRateLimit: 0
|
|
MaxSessRate: 1
|
|
SslRate: 0
|
|
SslRateLimit: 0
|
|
MaxSslRate: 0
|
|
SslFrontendKeyRate: 0
|
|
SslFrontendMaxKeyRate: 0
|
|
SslFrontendSessionReuse_pct: 0
|
|
SslBackendKeyRate: 0
|
|
SslBackendMaxKeyRate: 0
|
|
SslCacheLookups: 0
|
|
SslCacheMisses: 0
|
|
CompressBpsIn: 0
|
|
CompressBpsOut: 0
|
|
CompressBpsRateLim: 0
|
|
ZlibMemUsage: 0
|
|
MaxZlibMemUsage: 0
|
|
Tasks: 5
|
|
Run_queue: 1
|
|
Idle_pct: 100
|
|
node: wtap
|
|
description:
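
The default "name: value" layout shown above is easy to turn into a lookup
table for quick checks. The Python sketch below is an illustration only;
parse_show_info and conn_usage are hypothetical helpers, and the ratio they
compute is merely one example of what can be derived from these fields.

```python
def parse_show_info(text):
    """Return {field_name: value} from the default "show info" output."""
    info = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # empty line or anything that is not "name: value"
        info[name.strip()] = value.strip()
    return info


def conn_usage(info):
    """Return CurrConns / Maxconn as a float, or None if it is undefined."""
    maxconn = int(info.get("Maxconn", 0))
    if maxconn <= 0:
        return None
    return int(info.get("CurrConns", 0)) / maxconn


# Excerpt of the output shown above
SAMPLE = (
    "Name: HAProxy\n"
    "Uptime: 0d 0h02m39s\n"
    "Maxconn: 60000\n"
    "CurrConns: 0\n"
    "description:\n"
    "\n"
)
```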
|
|
|
|
When an issue seems to randomly appear on a new version of HAProxy (eg: every
|
|
second request is aborted, occasional crash, etc), it is worth trying to enable
|
|
memory poisoning so that each call to malloc() is immediately followed by the
|
|
filling of the memory area with a configurable byte. By default this byte is
|
|
0x50 (ASCII for 'P'), but any other byte can be used, including zero (which
|
|
will have the same effect as a calloc() and which may make issues disappear).
|
|
Memory poisoning is enabled on the command line using the "-dM" option. It
|
|
slightly hurts performance and is not recommended for use in production. If
|
|
an issue happens all the time with it or never happens when poisoning uses
|
|
byte zero, it clearly means you've found a bug and you definitely need to
|
|
report it. Otherwise if there's no clear change, the problem is not related.
|
|
|
|
When debugging some latency issues, it is important to use both strace and
|
|
tcpdump on the local machine, and another tcpdump on the remote system. The
|
|
reason for this is that there are delays everywhere in the processing chain and
|
|
it is important to know which one is causing latency to know where to act. In
|
|
practice, the local tcpdump will indicate when the input data come in. Strace
|
|
will indicate when haproxy receives these data (using recv/recvfrom). Warning,
|
|
openssl uses read()/write() syscalls instead of recv()/send(). Strace will also
|
|
show when haproxy sends the data, and tcpdump will show when the system sends
|
|
these data to the interface. Then the external tcpdump will show when the data
|
|
sent are really received (since the local one only shows when the packets are
|
|
queued). The benefit of sniffing on the local system is that strace and tcpdump
|
|
will use the same reference clock. Strace should be used with "-tts200" to get
|
|
complete timestamps and report large enough chunks of data to read them.
|
|
Tcpdump should be used with "-nvvttSs0" to report full packets, real sequence
|
|
numbers and complete timestamps.
|
|
|
|
In practice, incoming data are almost always immediately received by haproxy
(unless the machine has a saturated CPU or these data are invalid and not
delivered). If these data are received but not sent, it generally is because
the output buffer is saturated (i.e. the recipient doesn't consume the data fast
enough). This can be confirmed by seeing that the polling doesn't notify of
the ability to write on the output file descriptor for some time (it's often
easier to spot in the strace output when the data finally leave and then roll
back to see when the write event was notified). It generally matches an ACK
received from the recipient, and detected by tcpdump. Once the data are sent,
they may spend some time in the system doing nothing. Here again, the TCP
congestion window may be limited and not allow these data to leave, waiting for
an ACK to open the window. If the traffic is idle and the data take 40 ms or
200 ms to leave, it's a different issue (which is not an issue): it's the fact
that the Nagle algorithm prevents small packets from leaving immediately, in the
hope that they will be merged with subsequent data. HAProxy automatically
disables Nagle in pure TCP mode and in tunnels. However it definitely remains
enabled when forwarding an HTTP body (and this contributes to the performance
improvement there by reducing the number of packets). Some non-compliant HTTP
applications may be sensitive to the latency when delivering incomplete HTTP
response messages. In this case you will have to enable "option http-no-delay"
to disable Nagle in order to work around their design, keeping in mind that any
other proxy in the chain may similarly be impacted. If tcpdump reports that data
leave immediately but the other end doesn't see them quickly, it can mean there
is a congested WAN link, a congested LAN with flow control enabled and
preventing the data from leaving, or more commonly that HAProxy is in fact
running in a virtual machine and that for whatever reason the hypervisor has
decided that the data didn't need to be sent immediately. In virtualized
environments, latency issues are almost always caused by the virtualization
layer, so in order to save time, it's worth first comparing tcpdump in the VM
and on the external components. Any difference has to be credited to the
hypervisor and its accompanying drivers.

When some TCP SACK segments are seen in tcpdump traces (using -vv), it always
means that the side sending them has got the proof of a lost packet. While not
seeing them doesn't mean there are no losses, seeing them definitely means the
network is lossy. Losses are normal on a network, but at a rate where SACKs are
not noticeable to the naked eye. If they appear a lot in the traces, it is
worth investigating exactly what happens and where the packets are lost. HTTP
doesn't cope well with TCP losses, which introduce huge latencies.

The "netstat -i" command will report statistics per interface. An interface
where the Rx-Ovr counter grows indicates that the system doesn't have enough
resources to receive all incoming packets and that they're lost before being
processed by the network driver. Rx-Drp indicates that some received packets
were lost in the network stack because the application doesn't process them
fast enough. This can happen during some attacks as well. Tx-Drp means that
the output queues were full and packets had to be dropped. When using TCP it
should be very rare, but will possibly indicate a saturated outgoing link.

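As a rough sketch, the counters mentioned above can be extracted from the
"netstat -i" output with a small awk filter. The "check_iface_drops" name is
hypothetical, and the script assumes the column headers printed by the Linux
net-tools version of netstat (Iface, RX-DRP, RX-OVR, TX-DRP); other systems
may use different names.

```shell
#!/bin/sh
# Sketch: list interfaces whose RX-OVR, RX-DRP or TX-DRP counter is non-zero.
# Reads "netstat -i" output on stdin; assumes the net-tools column headers
# (Iface, RX-DRP, RX-OVR, TX-DRP), which may differ on other systems.
check_iface_drops() {
    awk '
        $1 == "Iface" {                      # header: map column name to index
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        !("RX-OVR" in col) { next }          # skip anything before the header
        {
            ovr  = $(col["RX-OVR"]) + 0
            rdrp = $(col["RX-DRP"]) + 0
            tdrp = $(col["TX-DRP"]) + 0
            if (ovr > 0 || rdrp > 0 || tdrp > 0)
                printf "%s RX-OVR=%d RX-DRP=%d TX-DRP=%d\n", $1, ovr, rdrp, tdrp
        }
    '
}
```

It would be used as "netstat -i | check_iface_drops". Since what matters is
whether a counter grows, the command should be run twice a few seconds apart
and the two outputs compared.
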

13. Security considerations
---------------------------

HAProxy is designed to run with very limited privileges. The standard way to
use it is to isolate it into a chroot jail and to drop its privileges to a
non-root user without any permissions inside this jail so that if any future
vulnerability were to be discovered, its compromise would not affect the rest
of the system.

In order to perform a chroot, it first needs to be started as a root user. It is
pointless to build hand-made chroots to start the process there, as these are
painful to build, are never properly maintained and always contain way more
bugs than the main file-system. And in case of compromise, the intruder can use
the purposely built file-system. Unfortunately many administrators confuse
"start as root" and "run as root", resulting in the uid change being done prior
to starting haproxy, and reducing the effective security restrictions.

HAProxy will need to be started as root in order to :
  - adjust the file descriptor limits
  - bind to privileged port numbers
  - bind to a specific network interface
  - transparently listen to a foreign address
  - isolate itself inside the chroot jail
  - drop to another non-privileged UID

HAProxy may need to be run as root in order to :
  - bind to an interface for outgoing connections
  - bind to privileged source ports for outgoing connections
  - transparently bind to a foreign address for outgoing connections

Most users will never need the "run as root" case. But the "start as root"
covers most usages.

A safe configuration will have :

  - a chroot statement pointing to an empty location without any access
    permissions. This can be prepared this way on the UNIX command line :

      # mkdir /var/empty && chmod 0 /var/empty || echo "Failed"

    and referenced like this in the HAProxy configuration's global section :

      chroot /var/empty

  - both uid/user and gid/group statements in the global section :

      user haproxy
      group haproxy

  - a stats socket whose mode, uid and gid are set to match the user and/or
    group allowed to access the CLI so that nobody else may access it :

      stats socket /var/run/haproxy.stat uid hatop gid hatop mode 600
|