When not sharing pools (i.e. when building with -DDEBUG_DONT_SHARE_POOLS)
we have about 47 pools right now, while MAX_BASE_POOLS is only 32, meaning
that only the first 32 ones will benefit from a per-thread cache entry.
This totally kills performance when pools are not shared (roughly -20%).
Let's double the limit to gain some margin, and make it possible to set
it as a build option.
It might be useful to backport this to stable versions as they're likely
to be affected as well.
If the owning task is already dying (context was destroyed by fcgi_takeover)
there's no point taking the lock then removing it later since all the code
in between is conditionned by a non-null context. Let's simplify this.
If the owning task is already dying (context was destroyed by h2_takeover)
there's no point taking the lock then removing it later since all the code
in between is conditionned by a non-null context. Let's simplify this.
If the owning task is already dying (context was destroyed by h1_takeover)
there's no point taking the lock then removing it later since all the code
in between is conditionned by a non-null context. Let's simplify this.
In commit 3ef7a190b ("MEDIUM: tasks: apply a fair CPU distribution
between tasklet classes") we compute a total weight to be used to
split the CPU time between queues. There is a mention that the
total cannot be null, wihch is based on the fact that we only get
there if thread_has_task() returns non-zero. But there is a very
small race which can break this assumption: if two threads conflict
on MT_LIST_ADDQ() on an empty shared list and both roll back before
trying again, there is the possibility that a first call to
MT_LIST_ISEMPTY() sees the first thread install itself, then the
second call will see the list empty when both roll back. Thus we
could proceed with the queue while it's temporarily empty and
compute max lengths using a divide by zero. This case is very
hard to trigger, it seldom happens on 16 threads at 400k req/s.
Let's simply test for max_total and leave the loop when we've not
found any work.
No backport is needed, that's 2.2-only.
The parsing of http deny rules with no argument or only the deny_status argument
is buggy if followed by an ACLs expression (starting with "if" or "unless"
keyword). Instead of using the proxy errorfiles, a dummy error is used. To fix
the bug, the parsing function must also check for "if" or "unless" keyword in
such cases.
This patch should fix the issue #720. No backport is needed.
The algorithm improvement in bdb86bd ("MEDIUM: server: improve estimate
of the need for idle connections") is still not enough because there's
a hard limit between below and above the FD count, so it continues to
end up with many killed connections.
Here we're proceeding differently. Given that there are two configured
limits, a low and a high one, what we do is that we drop connections
when the high limit is reached (what's already done by the killing task
anyway), when we're between the low and the high threshold, we only keep
the connection if our idle entries are empty (with a preference for safe
ones), and below the low threshold, we keep any connection so as to give
them a chance of being reused or taken over by another thread.
Proceeding like this results in much less dropped connections, we
typically see a 99.3% reuse rate (76k conns for 10M requests over 200
servers and 4 threads, with 335k takeovers or 3%), and much less CPU
usage variations because there are no more bursts to try to kill extra
connections.
It should be possible to further improve this by counting the number
of threads exploiting a server and trying to optimize the amount of
per-thread idle connections so that it is approximately balanced among
the threads.
The idle server connection estimates brought in commit bdb86bd ("MEDIUM:
server: improve estimate of the need for idle connections") were committed
without the minimum of 1 idle conn needed for the current thread. The net
effect is that there are bursts of dropped connections when the load varies
because there's no provision for the last connection.
No backport needed, this is 2.2-dev.
Commit d645574 ("MINOR: soft-stop: let the first stopper only signal
other threads") introduced a minor mistake which is that when a stopping
thread signals all other threads, it also signals itself. When
single-threaded, the process constantly wakes up while waiting for
last connections to exit. Let's reintroduce the lost mask to avoid
this.
No backport is needed, this is 2.2-dev only.
This reverts previous commit 347bbf79d20e1cff57075a8a378355dfac2475e2i.
The original code was correct. This patch resulted from a mistaken analysis
and breaks the scheduler:
########################## Starting vtest ##########################
Testing with haproxy version: 2.2-dev11-90b7d9-23
# top TEST reg-tests/lua/close_wait_lf.vtc TIMED OUT (kill -9)
# top TEST reg-tests/lua/close_wait_lf.vtc FAILED (10.008) signal=9
1 tests failed, 0 tests skipped, 88 tests passed
Program terminated with signal SIGABRT, Aborted.
[Current thread is 1 (Thread 0x7fb0dac2c700 (LWP 11292))]
(gdb) bt
#0 0x00007fb0e7c143f8 in raise () from /lib64/libc.so.6
#1 0x00007fb0e7c15ffa in abort () from /lib64/libc.so.6
#2 0x000000000053f5d6 in ha_panic () at src/debug.c:269
#3 0x00000000005a6248 in wdt_handler (sig=14, si=<optimized out>, arg=<optimized out>) at src/wdt.c:119
#4 <signal handler called>
#5 0x00000000004fbccd in tasklet_wakeup (tl=0x1b5abc0) at include/haproxy/task.h:351
#6 listener_accept (fd=<optimized out>) at src/listener.c:999
#7 0x00000000004262df in fd_update_events (evts=<optimized out>, fd=6) at include/haproxy/fd.h:418
#8 _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:251
#9 0x0000000000548d0f in run_poll_loop () at src/haproxy.c:2949
#10 0x000000000054908b in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3067
#11 0x00007fb0e902b684 in start_thread () from /lib64/libpthread.so.0
#12 0x00007fb0e7ce5eed in clone () from /lib64/libc.so.6
(gdb) up
#5 0x00000000004fbccd in tasklet_wakeup (tl=0x1b5abc0) at include/haproxy/task.h:351
351 if (MT_LIST_ADDQ(&task_per_thread[tl->tid].shared_tasklet_list, (struct mt_list *)&tl->list) == 1) {
If the commit above is ever backported, this one must be as well!
In MT_LIST_ADDQ() and MT_LIST_ADD() we can't just check if the element is
already in a list, because there's a small race condition, it could be added
between the time we checked, and the time we actually set its next and prev.
So we have to lock it first.
This should be backported to 2.1.
The max_used_conns value is used as an estimate of the needed number of
connections on a server to know how many to keep open. But this one is
not reported, making it hard to troubleshoot reuse issues. Let's export
it in the sessions/current column.
Starting with commit 079cb9a ("MEDIUM: connections: Revamp the way idle
connections are killed") we started to improve the way to compute the
need for idle connections. But the condition to keep a connection idle
or drop it when releasing it was not updated. This often results in
storms of close when certain thresholds are met, and long series of
takeover() when there aren't enough connections left for a thread on
a server.
This patch tries to improve the situation this way:
- it keeps an estimate of the number of connections needed for a server.
This estimate is a copy of the max over previous purge period, or is a
max of what is seen over current period; it differs from max_used_conns
in that this one is a counter that's reset on each purge period ;
- when releasing, if the number of current idle+used connections is
lower than this last estimate, then we'll keep the connection;
- when releasing, if the current thread's idle conns head is empty,
and we don't exceed the estimate by the number of threads, then
we'll keep the connection.
- when cleaning up connections, we consider the max of the last two
periods to avoid killing too many idle conns when facing bursty
traffic.
Thanks to this we can better converge towards a situation where, provided
there are enough FDs, each active server keeps at least one idle connection
per thread all the time, with a total number close to what was needed over
the previous measurement period (as defined by pool-purge-delay).
On tests with large numbers of concurrent connections (30k) and many
servers (200), this has quite smoothed the CPU usage pattern, increased
the reuse rate and roughly halved the takeover rate.
There's a minor glitch with the way idle connections start to be evicted.
The lookup always goes from thread 0 to thread N-1. This causes depletion
of connections on the first threads and abundance on the last ones. This
is visible with the takeover() stats below:
$ socat - /tmp/sock1 <<< "show activity"|grep ^fd ; \
sleep 10 ; \
socat -/tmp/sock1 <<< "show activity"|grep ^fd
fd_takeover: 300144 [ 91887 84029 66254 57974 ]
fd_takeover: 359631 [ 111369 99699 79145 69418 ]
There are respectively 19k, 15k, 13k and 11k takeovers for only 4 threads,
indicating that the first thread needs a foreign FD twice more often than
the 4th one.
This patch changes this si that all threads are scanned in round robin
starting with the current one. The takeovers now happen in a much more
distributed way (about 4 times 9k) :
fd_takeover: 1420081 [ 359562 359453 346586 354480 ]
fd_takeover: 1457044 [ 368779 368429 355990 363846 ]
There is no need to backport this, as this happened along a few patches
that were merged during 2.2 development.
The FD takeover operation might have certain impacts explaining
unexpected activities, so it's important to report such a counter
there. We thus count the number of times a thread has stolen an
FD from another thread.
The servers have internal states describing the status of idle connections,
unfortunately these were not exported in the stats. This patch adds the 3
following gauges:
- idle_conn_cur : Current number of unsafe idle connections
- safe_conn_cur : Current number of safe idle connections
- used_conn_cur : Current number of connections in use
DEBUG_FD was added by commit 38e8a1c in 2.2-dev, and "show fd" was
slightly modified to still allow to print orphaned/closed FDs if their
count is non-null. But bypassing the existing test made it possible
to dereference fdt.owner which can be null. Let's adjust the condition
to avoid this.
No backport is needed.
The LRU cache head was an array of list, which causes false sharing
between 4 to 8 threads in the same cache line. Let's move it to the
thread_info structure instead. There's no need to do the same for the
pool_cache[] array since it's already quite large (32 pointers each).
By doing this the request rate increased by 1% on a 16-thread machine.
pool-t.h was mistakenly including the full-blown includes for threads,
lists and api instead of the types, and as such, CONFIG_HAP_LOCAL_POOLS
and CONFIG_HAP_LOCKLESS_POOLS were not visible everywhere.
The thread_info struct is convenient to store various per-thread info
without having to resort to a painful thread_local storage which is
slow and painful to initialize.
The problem is, by having this one in thread.h it's very difficult to
add more entries there because everyone already includes thread.h so
conversely thread.h cannot reference certain types.
There's no point in having this there, instead let's create a new pair
of files, tinfo{,-t}.h, which declare the structure. This way it will
become possible to extend them with other includes and have certain
files store their own types there.
In tcpcheck_eval_connect(), if we're targetting a server, increase its
curr_used_conns when creating a new connection, as the counter will be
decreased later when the connection is destroyed and conn_free() is called.
In connect_server(), we want to increase curr_used_conns only if the
connection is new, or if it comes from an idle_pool, otherwise it means
the connection is already used by at least one another stream, and it is
already accounted for.
We used to have 3 thread-based arrays for toremove_lock, idle_cleanup,
and toremove_connections. The problem is that these items are small,
and that this creates false sharing between threads since it's possible
to pack up to 8-16 of these values into a single cache line. This can
cause real damage where there is contention on the lock.
This patch creates a new array of struct "idle_conns" that is aligned
on a cache line and which contains all three members above. This way
each thread has access to its variables without hindering the other
ones. Just doing this increased the HTTP/1 request rate by 5% on a
16-thread machine.
The definition was moved to connection.{c,h} since it appeared a more
natural evolution of the ongoing changes given that there was already
one of them declared in connection.h previously.
It looked strange to see pool_evict_from_cache() always very present
on "perf top", but there was actually a reason to this: while b_free()
uses pool_free() which properly disposes the buffer into the local cache
and b_alloc_fast() allocates using pool_get_first() which considers the
local cache, b_alloc_margin() does not consider the local cache as it
only uses __pool_get_first() which only allocates from the shared pools.
The impact is that basically everywhere a buffer is allocated (muxes,
streams, applets), it's always picked from the shared pool (hence
involves locking) and is released to the local one and makes it grow
until it's required to trigger a flush using pool_evict_from_cache().
Buffers usage are thus not thread-local at all, and cause eviction of
a lot of possibly useful objects from the local caches.
Just fixing this results in a 10% request rate increase in an HTTP/1 test
on a 16-thread machine.
This bug was caused by recent commit ed891fd ("MEDIUM: memory: make local
pools independent on lockless pools") merged into 2.2-dev9, so not backport
is needed.
"show sess" and particularly "show sess all" can be very slow when dumping
lots of information, and while dumping, new sessions might appear, making
the output really endless. When threads are used, this causes a double
problem:
- all threads are paused during the dump, so an overly long dump degrades
the quality of service ;
- since all threads are paused, more events get postponed, possibly
resulting in more streams to be dumped on next invocation of the dump
function.
This patch addresses this long-lasting issue by doing something simple:
the CLI's stream is moved at the end of the steams list, serving as an
identifiable marker to end the dump, because all entries past it were
added after the command was entered. As a result, the CLI's stream always
appears as the last one.
It may make sense to backport this to stable branches where dumping live
streams is difficult as well.
Commit cd4159f ("MEDIUM: mux_h2: Implement the takeover() method.")
added a return in the middle of the function, and as usual with such
stray return statements, some unrolling was lost. Here it's only the
TRACE_LEAVE() call, so it's mostly harmless. That's 2.2 only, no
backport is needed.
Released version 2.2-dev11 with the following main changes :
- REGTEST: Add a simple script to tests errorfile directives in proxy sections
- BUG/MEDIUM: fcgi-app: Resolve the sink if a fcgi-app logs in a ring buffer
- BUG/MINOR: spoe: correction of setting bits for analyzer
- BUG/MINOR: cfgparse: Support configurations without newline at EOF
- MINOR: cfgparse: Warn on truncated lines / files
- BUG/MINOR: http_ana: clarify connection pointer check on L7 retry
- MINOR: debug: add a new DEBUG_FD build option
- BUG/MINOR: tasks: make sure never to exceed max_processed
- MINOR: task: add a new pointer to current tasklet queue
- BUG/MEDIUM: task: be careful not to run too many tasks at TL_URGENT
- BUG/MINOR: cfgparse: Fix argument reference in PARSE_ERR_TOOMANY message
- BUG/MINOR: cfgparse: Fix calculation of position for PARSE_ERR_TOOMANY message
- BUG/MEDIUM: ssl: fix ssl_bind_conf double free
- MINOR: ssl: free bind_conf_node in crtlist_free()
- MINOR: ssl: free the crtlist and the ckch during the deinit()
- BUG/MINOR: ssl: fix build with ckch_deinit() and crtlist_deinit()
- BUG/MINOR: ssl/cli: certs added from the CLI can't be deleted
- MINOR: ssl: move the ckch/crtlist deinit to ssl_sock.c
- MEDIUM: tasks: apply a fair CPU distribution between tasklet classes
- MINOR: tasks: make current_queue an index instead of a pointer
- MINOR: tasks: add a mask of the queues with active tasklets
- MINOR: tasks: pass the queue index to run_task_from_list()
- MINOR: tasks: make run_tasks_from_lists() scan the queues itself
- MEDIUM: tasks: add a tune.sched.low-latency option
- BUG/MEDIUM: ssl/cli: 'commit ssl cert' crashes when no private key
- BUG/MINOR: cfgparse: don't increment linenum on incomplete lines
- MINOR: tools: make parse_line() always terminate the args list
- BUG/MINOR: cfgparse: report extraneous args *after* the string is allocated
- MINOR: cfgparse: sanitize the output a little bit
- MINOR: cli/ssl: handle trailing slashes in crt-list commands
- MINOR: ssl: add the ssl_s_* sample fetches for server side certificate
- BUG/MEDIUM: http-ana: Don't loop trying to generate a malformed 500 response
- BUG/MINOR: stream-int: Don't wait to send truncated HTTP messages
- BUG/MINOR: http-ana: Set CF_EOI on response channel for generated responses
- BUG/MINOR: http-ana: Don't wait to send 1xx responses generated by HAProxy
- MINOR: spoe: Don't systematically create new applets if processing rate is low
- DOC: fix some typos in the ssl_s_{s|i}_dn documentation
- BUILD: fix ssl_sample.c when building against BoringSSL
- CI: travis-ci: switch BoringSSL builds to ninja
- CI: extend spellchecker whitelist
- DOC: assorted typo fixes in the documentation
- CLEANUP: assorted typo fixes in the code and comments
- MINOR: http: Add support for http 413 status
- REGTEST: ssl: tests the ssl_f_* sample fetches
- REGTEST: ssl: add some ssl_c_* sample fetches test
- DOC: ssl: update the documentation of "commit ssl cert"
- BUG/MINOR: cfgparse: correctly deal with empty lines
- BUG/MEDIUM: fetch: Fix hdr_ip misparsing IPv4 addresses due to missing NUL
The IPv4 code did not take into account that the header value might not
contain the trailing NUL byte, possibly reading stray data after the header
value, failing the parse and testing the IPv6 branch. That one adds the
missing NUL, but fails to parse IPv4 addresses.
Fix this issue by always adding the trailing NUL.
The bug was reported on GitHub as issue #715.
It's not entirely clear when this bug started appearing, possibly earlier
versions of smp_fetch_hdr guaranteed the NUL termination. However the
addition of the NUL in the IPv6 case was added together with IPv6 support,
hinting that at that point in time the NUL was not guaranteed.
The commit that added IPv6 support was 69fa99292e
which first appeared in HAProxy 1.5. This patch should be backported to
1.5+, taking into account the various buffer / chunk changes and the movement
across different files.
Issue 23653 in oss-fuzz reports a heap overflow bug which is in fact a
bug introduced by commit 9e1758efb ("BUG/MEDIUM: cfgparse: use
parse_line() to expand/unquote/unescape config lines") to address
oss-fuzz issue 22689, which was only partially fixed by commit 70f58997f
("BUG/MINOR: cfgparse: Support configurations without newline at EOF").
Actually on an empty line, end == line so we cannot dereference end-1
to check for a trailing LF without first being sure that end is greater
than line.
No backport is needed, this is 2.2 only.
Test the following ssl sample fetches:
ssl_c_der, ssl_c_sha1,hex, ssl_c_notafter, ssl_c_notbefore,
ssl_c_sig_alg, ssl_c_i_dn, ssl_c_s_dn, ssl_c_serial,hex, ssl_c_key_alg,
ssl_c_version
This reg-test could be used as far as haproxy 1.6.
Test the following ssl sample fetches:
ssl_f_der, ssl_f_sha1,hex, ssl_f_notafter, ssl_f_notbefore,
ssl_f_sig_alg, ssl_f_i_dn, ssl_f_s_dn, ssl_f_serial,hex, ssl_f_key_alg,
ssl_f_version
This reg-test could be used as far as haproxy 1.5.
BoringSSL does not have X509_get_X509_PUBKEY
let our emulation level define that for BoringSSL as well
Build log:
src/ssl_sample.o: In function `smp_fetch_ssl_x_key_alg':
/home/travis/build/haproxy/haproxy/src/ssl_sample.c:592: undefined reference to `X509_get_X509_PUBKEY'
clang-7: error: linker command failed with exit code 1 (use -v to see invocation)
Makefile:860: recipe for target 'haproxy' failed
make: *** [haproxy] Error 1
travis-ci: https://travis-ci.com/github/haproxy/haproxy/jobs/351670996
When an event must be processed, we decide to create a new SPOE applet if there
is no idle applet at all or if the processing rate is lower than the number of
waiting events. But when the processing rate is very low (< 1 event/second), a
new applet is created independently of the number of idle applets.
Now, when there is at least one idle applet when there is only one event to
process, no new applet is created.
This patch is related to the issue #690.
When an informational response (1xx) is returned by HAProxy, we must be sure to
send it ASAP. To do so, CF_SEND_DONTWAIT flag must be set on the response
channel to instruct the stream-interface to not set the CO_SFL_MSG_MORE flag on
the transport layer. Otherwise the response delivery may be delayed, because of
the commit 8945bb6c0 ("BUG/MEDIUM: stream-int: fix loss of CO_SFL_MSG_MORE flag
in forwarding").
This patch may be backported as far as 1.9, for HTX part only. But this part has
changed in the 2.2, so it may be a bit tricky. Note it does not fix any known
bug on previous versions because the CO_SFL_MSG_MORE flag is ignored by the h1
mux.
To be consistent with other processings on the channels, when HAProxy generates
a final response, the CF_EOI flag must be set on the response channel. This flag
is used to know that a full message was pushed into the channel (HTX messages
with an EOM block). It is used in conjunction with other channel's flags in
stream-interface functions. Especially when si_cs_send() is called, to know if
we must set or not the CO_SFL_MSG_MORE flag. Without CF_EOI, the CO_SFL_MSG_MORE
flag is always set and the message forwarding is delayed.
This patch may be backported as far as 1.9, for HTX part only. But this part has
changed in the 2.2, so it may be a bit tricky. Note it does not fix any known
bug on previous versions because the CO_SFL_MSG_MORE flag is ignored by the h1
mux.
In HTX, since the commit 8945bb6c0 ("BUG/MEDIUM: stream-int: fix loss of
CO_SFL_MSG_MORE flag in forwarding"), the CO_SFL_MSG_MORE flag is set on the
transport layer if the end of the HTTP message is not reached, to delay the data
forwarding. To do so, the CF_EOI flag is tested and must not be set on the
output channel.
But the CO_SFL_MSG_MORE flag is also added if the message was truncated. Only
CF_SHUTR is set if this case. So the forwarding may be delayed to wait more data
that will never come. So, in HTX, the CO_SFL_MSG_MORE flag must not be set if
the message is finished (full or truncated).
No backport is needed.
When HAProxy generates a 500 response, if the formatting failed, for instance
because the message is larger than a buffer, it retries to format it in loop. To
fix the bug, we must stop trying to send a response if it is a non-rewritable
response (TX_CONST_REPLY flag is set on the HTTP transaction).
Because this part is not trivial, some comments have been added.
No backport is needed.
This commit adds some sample fetches that were lacking on the server
side:
ssl_s_key_alg, ssl_s_notafter, ssl_s_notbefore, ssl_s_sig_alg,
ssl_s_i_dn, ssl_s_s_dn, ssl_s_serial, ssl_s_sha1, ssl_s_der,
ssl_s_version
Trailing slashes were not handled in crt-list commands on CLI which can
be useful when you use the commands with a directory.
Strip the slashes before looking for the crtlist in the tree.