The header of a new management guide chapter, "13.1. Linux capabilities
support", is not rendered in HTML format in a proper way, because of missing
dots at the end of this chapter's number.
If 'namespace' keyword is used in the backend server settings or/and in the
bind string, it means that haproxy process will call setns() to change its
default namespace to the configured one and then, it will create a
socket in this new namespace. setns() syscall requires CAP_SYS_ADMIN
capability in the process Effective set (see man 2 setns). Otherwise, the
process must be run as root.
To avoid to run haproxy as root, let's add cap_sys_admin capability in the
same way as we already added the support for some other network capabilities.
As CAP_SYS_ADMIN belongs to CAP_SYS_* capabilities type, let's add a separate
flag LSTCHK_SYSADM for it. This flag is set, if the 'namespace' keyword was
found during configuration parsing. The flag may be unset only in
prepare_caps_for_setuid() or in prepare_caps_from_permitted_set(), which
inspect process EUID/RUID and Effective and Permitted capabilities sets.
If system doesn't support Linux capabilities or 'cap_sys_admin' was not set
in 'setcap', but 'namespace' keyword is presented in the configuration, we
keep the previous strict behaviour. Process, that has changed uid to the
non-priviledged user, will terminate with alert. This alert invites the user
to recheck its configuration.
In the case, when haproxy will start and run under a non-root user and
'cap_sys_admin' is not set, but 'namespace' keyword is presented, this patch
does not change previous behaviour as well. We'll still let the user to try
its configuration, but we inform via warning, that unexpected things, like
socket creation errors, may occur.
As previously discussed, "srv-unused" is sufficiently ambiguous to cause
some trouble over the long term. Better use "srv-removable" to indicate
that the server is removable, and if the conditions to delete a server
change over time, the wait condition will be adjusted without renaming
it.
Since the introduction of the automatic seamless reload using the
internal socketpair, there is no way of disabling the seamless reload.
Previously we just needed to remove -x from the startup command line,
and remove any "expose-fd" keyword on stats socket lines.
This was introduced in 2be557f7c ("MEDIUM: mworker: seamless reload use
the internal sockpairs").
The patch copy /dev/null again and pass it to the next exec so we never
try to get socket from the -x.
Must be backported as far as 2.6.
Define a new CLI command "dump stats-file" with its handler
cli_parse_dump_stat_file(). It will loop twice on proxies_list to dump
first frontend and then backend side. It reuses the common function
stats_dump_stat_to_buffer(), using STAT_F_BOUND to restrict on the
correct side.
A new module stats-file.c is added to regroup function specifics to
stats-file. It defines two main functions :
* stats_dump_file_header() to generate the list of column list prefixed
by the line context, either "#fe" or "#be"
* stats_dump_fields_file() to generate each stat lines. Object without
GUID are skipped. Each stat entry is separated by a comma.
For the moment, stats-file does not support statistics modules. As such,
stats_dump_*_line() functions are updated to prevent looping over stats
module on stats-file output.
Limiting total allocatable process memory (VSZ) via setting RLIMIT_AS limit is
no longer effective, in order to restrict memory consumption at run time.
We can see from process memory map below, that there are many holes within
the process VA space, which bumps its VSZ to 1.5G. These holes are here by
many reasons and could be explaned at first by the full randomization of
system VA space. Now it is usually enabled in Linux kernels by default. There
are always gaps around the process stack area to trap overflows. Holes before
and after shared libraries could be explained by the fact, that on many
architectures libraries have a 'preferred' address to be loaded at; putting
them elsewhere requires relocation work, and probably some unshared pages.
Repetitive holes of 65380K are most probably correspond to the header that
malloc has to allocate before asked a claimed memory block. This header is
used by malloc to link allocated chunks together and for its internal book
keeping.
$ sudo pmap -x -p `pidof haproxy`
127136: ./haproxy -f /home/haproxy/haproxy/haproxy_h2.cfg
Address Kbytes RSS Dirty Mode Mapping
0000555555554000 388 64 0 r---- /home/haproxy/haproxy/haproxy
00005555555b5000 2608 1216 0 r-x-- /home/haproxy/haproxy/haproxy
0000555555841000 916 64 0 r---- /home/haproxy/haproxy/haproxy
0000555555926000 60 60 60 r---- /home/haproxy/haproxy/haproxy
0000555555935000 116 116 116 rw--- /home/haproxy/haproxy/haproxy
0000555555952000 7872 5236 5236 rw--- [ anon ]
00007fff98000000 156 36 36 rw--- [ anon ]
00007fff98027000 65380 0 0 ----- [ anon ]
00007fffa0000000 156 36 36 rw--- [ anon ]
00007fffa0027000 65380 0 0 ----- [ anon ]
00007fffa4000000 156 36 36 rw--- [ anon ]
00007fffa4027000 65380 0 0 ----- [ anon ]
00007fffa8000000 156 36 36 rw--- [ anon ]
00007fffa8027000 65380 0 0 ----- [ anon ]
00007fffac000000 156 36 36 rw--- [ anon ]
00007fffac027000 65380 0 0 ----- [ anon ]
00007fffb0000000 156 36 36 rw--- [ anon ]
00007fffb0027000 65380 0 0 ----- [ anon ]
...
00007ffff7fce000 4 4 0 r-x-- [ anon ]
00007ffff7fcf000 4 4 0 r---- /usr/lib/x86_64-linux-gnu/ld-2.31.so
00007ffff7fd0000 140 140 0 r-x-- /usr/lib/x86_64-linux-gnu/ld-2.31.so
...
00007ffff7ffe000 4 4 4 rw--- [ anon ]
00007ffffffde000 132 20 20 rw--- [ stack ]
ffffffffff600000 4 0 0 --x-- [ anon ]
---------------- ------- ------- -------
total kB 1499288 75504 72760
This exceeded VSZ makes impossible to start an haproxy process with 200M
memory limit, set at its initialization stage as RLIMIT_AS. We usually
have in this case such cryptic output at stderr:
$ haproxy -m 200 -f haproxy_quic.cfg
(null)(null)(null)(null)(null)(null)
At the same time the process RSS (a memory really used) is only 75,5M.
So to make process memory accounting more realistic let's base the memory
limit, set by -m option, on RSS measurement and let's use RLIMIT_DATA instead
of RLIMIT_AS.
RLIMIT_AS was used before, because earlier versions of haproxy always allocate
memory buffers for new connections, but data were not written there
immediately. So these buffers were not instantly counted in RSS, but were
always counted in VSZ. Now we allocate new buffers only in the case, when we
will write there some data immediately, so using RLIMIT_DATA becomes more
appropriate.
Since the Linux capabilities support add-on (see the commit bd84387beb
("MEDIUM: capabilities: enable support for Linux capabilities")), we can also
check haproxy process effective and permitted capabilities sets, when it
starts and runs as non-root.
Like this, if needed network capabilities are presented only in the process
permitted set, we can get this information with capget and put them in the
process effective set via capset. To do this properly, let's introduce
prepare_caps_from_permitted_set().
First, it checks if binary effective set has CAP_NET_ADMIN or CAP_NET_RAW. If
there is a match, LSTCHK_NETADM is removed from global.last_checks list to
avoid warning, because in the initialization sequence some last configuration
checks are based on LSTCHK_NETADM flag and haproxy process euid may stay
unpriviledged.
If there are no CAP_NET_ADMIN and CAP_NET_RAW in the effective set, permitted
set will be checked and only capabilities given in 'setcap' keyword will be
promoted in the process effective set. LSTCHK_NETADM will be also removed in
this case by the same reason. In order to be transparent, we promote from
permitted set only capabilities given by user in 'setcap' keyword. So, if
caplist doesn't include CAP_NET_ADMIN or CAP_NET_RAW, LSTCHK_NETADM would not
be unset and warning about missing priviledges will be emitted at
initialization.
Need to call it before protocol_bind_all() to allow binding to priviledged
ports under non-root and 'setcap cap_net_bind_service' must be set in the
global section in this case.
This commit allows "cookie" keyword for dynamic servers. After code
review, nothing was found which could prevent a dynamic server to use
it. An extra warning is added under cli_parse_add_server() if cookie
value is ignored due to a non HTTP backend.
This patch is not considered a bugfix. However, it may backported if
needed as its impact seems minimal.
Since their first implementation, dynamic servers are created into
maintenance state. This has been done purposely to avoid immediate
activation of a newly inserted server.
However, this principle is incompatible if "enabled" keyword is used on
"add server". The newly created instance will be unreacheable as proxy
load-balancing algorithm is not informed of its presence via
srv_lb_propagate(). The new server could be unblocked by toggling its
state with "disable server" / "enable server" commands, which will
trigger srv_lb_propagate() invocation.
To avoid this unexpected state, simply forbid "enabled" keyword for
dynamic servers. In the long-term, it could be possible to re authorize
it but at least this requires to call srv_lb_propagate() on dynamic
server creation.
This should fix github issue #2497.
This patch should not be backported as-is, to avoid breaking dynamic
servers API on stable versions. "enabled" should instead be ignored for
them. This will be implemented in a dedicated patch on top of 2.9.
-dI allow to enable "insure-fork-wanted" directly from the command line,
which is useful when you want to run ASAN with addr2line with a lot of
configuration files without editing them.
Extend "show quic" to be able to dump MUX related information. This is
done via the new function qcc_show_quic(). This replaces the old streams
dumping list which was incomplete.
These info are displayed on full output or by specifying "mux" field.
Add the possibility to customize show quic full output with only a
specific set of printed fields. This is specified as a comma-separated
list. Here are the currently supported values :
* tp: transport parameters
* sock: connection addresses and socket FD
* pktns: packet number space with ack ranges and in flight bytes
* cc: congestion controler and loss information
Note that streams output is not filtered by this mechanism. It's because
it will be replaced soon by an output generated from the MUX which will
use its owned field name.
Add the possibilty to restrict show quic output to only a single
connection. This is done by specifying a quic_conn address pointer.
Default format selection has evolved with it. Indeed, it seems more
fitting to use full format by default when filtering on a connection.
However, it's still possible to revert to the original oneline format
with it by specifying it explicitely.
The "wait" command now supports a condition, "srv-unused", which waits
for the designated server to become totally unused, indicating that it
is removable. Upon each wakeup it calls srv_check_for_deletion() to
verify if conditions are met, if not if it's recoverable, or if it's
not recoverable, and proceeds according to this, never waiting for a
final decision longer than the configured delay.
The purpose is to make it possible to remove servers from the CLI after
waiting for their sessions to be terminated:
$ socat -t5 /path/to/socket - <<< "
disable server px/srv1
shutdown sessions server px/srv1
wait 2s srv-unused px/srv1
del server px/srv1"
Or even wait for connections to terminate themselves:
$ socat -t70 /path/to/socket - <<< "
disable server px/srv1
wait 1m srv-unused px/srv1
del server px/srv1"
This allows to insert delays between commands, i.e. to collect a same
set of metrics at a fixed interval. E.g:
$ socat -t20 /path/to/socket <<< "show activity; wait 10s; show activity"
The goal will be to extend the feature to optionally support waiting on
certain conditions. For this reason the struct definitions and enums were
placed into cli-t.h.
Because maps and list of ACLs are no longer necessarily referenced by
filenames, CLI commands to manipulate them were updated accordingly. Instead
of "filename" we talk about "name" now.
The same is performed in the LUA documentation.
No backport needed, these were introduced by latest commits 3dd55fa13
("MINOR: mworker/cli: implement hard-reload over the master CLI") and
cef29d370 ("MINOR: trace: define simple -dt argument").
Add an optional argument for "-dt". This argument is interpreted as a
list of several trace statement separated by comma. For each statement,
a specific trace name can be specifed, or none to act on all sources.
Using double-colon separator, it is possible to add specifications on
the wanted level and verbosity.
Add '-dt' haproxy process argument. This will automatically activate all
trace sources on stderr with the error level. This could be useful to
troubleshoot issues such as protocol violations.
The mworker mode never had a proper 'hard-stop' (-st) for the reload,
this is a mode which was commonly used with the daemon mode, but it was
never implemented in mworker mode.
This patch fixes the problem by implementing a "hard-reload" command
over the master CLI. It does the same as the "reload" command, but
instead of waiting for the connections to stop in the previous process,
it immediately quits the previous process after binding.
This one reports streams considered as "suspicious", i.e. those with
no expiration dates or dates in the past, or those without a front
endpoint. More criteria could be added in the future.
It's often needed to be able to refine "show sess" when debugging, and
very often a first glance at old streams is performed, but that's a
difficult task in large dumps, and it takes lots of resources to dump
everything.
This commit adds "older <age>" to "show sess" in order to specify the
minimum age of streams that will be dumped. This should simplify the
identification of blocked ones.
The documentation about -q seems wrong, it does not output messages
after the startup, it disables all messages. It was always quiet with
the stdio_quiet() function.
Must be backported in all stable versions.
This provides more consistency between the master and the worker. When
"prompt timed" is passed on the master, the timed mode is toggled. When
enabled, for a master it will show the master process' uptime, and for
a worker it will show this worker's uptime. Example:
master> prompt timed
[0:00:00:50] master> show proc
#<PID> <type> <reloads> <uptime> <version>
11940 master 1 [failed: 0] 0d00h02m10s 2.8-dev11-474c14-21
# workers
11955 worker 0 0d00h00m59s 2.8-dev11-474c14-21
# old workers
11942 worker 1 0d00h02m10s 2.8-dev11-474c14-21
# programs
[0:00:00:58] master> @!11955
[0:00:01:03] 11955> @!11942
[0:00:02:17] 11942> @
[0:00:01:10] master>
Entering "prompt timed" toggles reporting of the process' uptime in
the prompt, which will report days, hours, minutes and seconds since
it was started. As discussed with Tim in issue #2145, this can be
convenient to roughly estimate the time between two outputs, as well
as detecting that a process failed to be reloaded for example.
Add a new output format "oneline" for "show quic" command. This prints
one connection per line with minimal information. The objective is to
have an equivalent of the netstat/ss tools with just enough information
to quickly find connection which are misbehaving.
A legend is printed on the first line to describe the field columns
starting with a dash character.
This should be backported up to 2.7.
Add an extra optional argument for "show quic" to specify desired output
format. Its objective is to control the verbosity per connections. For
the moment, only "full" is supported, which is the already implemented
output with maximum information.
This should be backported up to 2.7.
Depending on what we're debugging, some FDs can represent pollution in
the "show fd" output. Here we add a set of filters allowing to pick (or
exclude) any combination of listener, frontend conn, backend conn, pipes,
etc. "show fd l" will only list listening connections for example.
Starting haproxy with -dL helps enumerate the list of libraries in use.
But sometimes in order to go further we'd like to see their address
ranges. This is already supported on the CLI's "show libs" but not on
the command line where it can sometimes help troubleshoot startup issues.
Let's dump them when in verbose mode. This way it doesn't change the
existing behavior for those trying to enumerate libs to produce an archive.
Using -dMfail alone does nothing unless tune.fail-alloc is set, which
renders it pretty useless as-is, and is not intuitive. Let's change
this so that the filure rate is preset to 1% when the option is set on
the command line. This allows to inject failures without having to edit
the configuration.
The ocsp-related CLI commands tend to work with OCSP_CERTIDs as well as
certificate paths so the path should also be added to the output of the
"show ssl ocsp-response" command when no certid or path is provided.
In order to increase usability, the "show ssl ocsp-response" also takes
a frontend certificate path as parameter. In such a case, it behaves the
same way as "show ssl cert foo.pem.ocsp".
A new format option can be passed to the "show ssl ocsp-response" CLI
command to dump the contents of an OCSP response in base64. This is
needed because thanks to the new OCSP auto update mechanism, we could
end up using an OCSP response internally that was never provided by the
user.
This command can be used to dump information about the entries contained
in the ocsp update tree. It will display one line per concerned OCSP
response and will contain the expected next update time as well as the
time of the last successful update, and the number of successful and
failed attempts.
To prevent data loss for QUIC connections, haproxy global variable jobs
is incremented each time a quic-conn socket is allocated. This allows
the QUIC connection to terminate all its transfer operation during proxy
soft-stop. Without this patch, the process will be terminated without
waiting for QUIC connections.
Note that this is done in qc_alloc_fd(). This means only QUIC connection
with their owned socket will properly support soft-stop. In the other
case, the connection will be interrupted abruptly as before. Similarly,
jobs decrement is conducted in qc_release_fd().
This should be backported up to 2.7.
The -dF option can now be used to disable data fast-forward. It does the
same than the global option "tune.fast-forward off". Some reg-tests may rely
on this optim. To detect the feature and skip such script, the following
vtest command must be used:
feature cmd "$HAPROXY_PROGRAM -cc '!(globa.tune & GTUNE_NO_FAST_FWD)'"
Reduce default "show quic" output by masking connection on
closing/draing state due to a CONNECTION_CLOSE emission/reception. These
connections can still be displayed using the special argument "all".
This should be backported up to 2.7.
Implement a basic "show quic" CLI handler. This command will be useful
to display various information on all the active QUIC frontend
connections.
This work is heavily inspired by "show sess". Most notably, a global
list of quic_conn has been introduced to be able to loop over them. This
list is stored per thread in ha_thread_ctx.
Also add three CLI handlers for "show quic" in order to allocate and
free the command context. The dump handler runs on thread isolation.
Each quic_conn is referenced using a back-ref to handle deletion during
handler yielding.
For the moment, only a list of raw quic_conn pointers is displayed. The
handler will be completed over time with more information as needed.
This should be backported up to 2.7.