The streams bookkeeping made in H2 is used for protocol compliance only
but it doesn't consider the number of conn_streams still attached to the
mux. It causes an issue when http-request set-nice rules are applied on
H2 requests processed on a saturated machine. Indeed, in this case, the
requests are accepted and assigned a default nice value of zero. When
they are processed, their nice value changes to a higher one (say 1024).
The response is sent through the H2 mux, which detects the end of stream
and decrements the protocol-level stream count (h2c->nb_streams). The
client may then send a new request. But the conn_stream is still attached
and will require a new call to process_stream() to finish, which is made
through the scheduler. Given that the machine is saturated, it is assumed
that many tasks are present in the scheduler. Thus the closing tasks holding
a higher nice value will pass after the new stream creations. If the client
is fast enough with a low latency link, it may add a lot of new stream
creations before the stream terminations have a chance to disappear due
to their high nice value, resulting in a huge amount of memory being used.
The solution consists in letting a mux always monitor its conn_streams and
refrain from creating new ones when it is full. Here the H2 mux checks the
nb_cs counter and sets a new blocked flag (H2_CF_DEM_TOOMANY) if the limit
was reached, so that the frame parser requests a pause in the new stream
creation, leaving some time for the pending conn_streams to vanish.
Several experiments were made using varying thresholds to see if
overbooking would provide any benefit here but it turned out not to be
the case, so the conn_stream limit remains set to the exact streams
limit. Interestingly various performance measurements showed that the
code tends to be slightly faster now than without the limit, probably
due to the smoother memory usage.
This commit requires previous patch ("MINOR: h2: keep a count of the number
of conn_streams attached to the mux"). It needs to be backported to 1.8.
The h2 mux only knows about the number of H2 streams which are not in a
CLOSED state. This is used for protocol compliance. But it doesn't hold
the number of really attached streams. It is a problem because depending
on scheduling, it is possible that more streams are attached to the mux
than the ones seen at the protocol level, due to some streams taking some
time to be detached. Let's add this count based on the conn_streams.
Note: this patch is part of a series of fixes which will have to be
backported to 1.8.
Commit 200b0fa ("MEDIUM: Add support for updating TLS ticket keys via
socket") introduced support for updating TLS ticket keys from the CLI,
but missed a small corner case : if multiple bind lines reference the
same tls_keys file, the same reference is used (as expected), but during
the clean shutdown, it will lead to a double free when destroying the
bind_conf contexts since none of the lines knows if others still use
it. The impact is very low however, mostly a core and/or a message in
the system's log upon old process termination.
Let's introduce some basic refcounting to prevent this from happening,
so that only the last bind_conf frees it.
Thanks to Janusz Dziemidowicz and Thierry Fournier for both reporting
the same issue with an easy reproducer.
This fix needs to be backported from 1.6 to 1.8.
With certain curl versions URLs which contain brackets may be interpreted
by the "URL globbing parser". This patch ensures that such brackets
are escaped.
Thank you to Ilya Shipitsin for having reported this issue.
By default, HAProxy's DNS resolution at runtime ensure that there is no
IP address duplication in a backend (for servers being resolved by the
same hostname).
There are a few cases where people want, on purpose, to disable this
feature.
This patch introduces a couple of new server side options for this purpose:
"resolve-opts allow-dup-ip" or "resolve-opts prevent-dup-ip".
dns_get_ip_from_response() is used to compare the caller current IP to
the IP available in the records returned by the DNS server.
A scoring system is in place to get the best IP address available.
That said, in the current implementation, there are a couple of issues:
1. a comment does not match what the code does
2. the code does not match what the commet says (score value is not
incremented with '2')
This patch fixes both issues.
Backport status: 1.8
The comment was about "prefered network ip version" while it's actually
"prefered ip version" in the code.
Fixed
Backport status: 1.7 and 1.8
Be careful, this patch may not apply on 1.7, since the score was '4'
for this item at that time.
Ilya Shipitsin reported that with some curl versions this reg test
may fail due to a wrong URI syntax with ::1 ipv6 local address in
this varnishtest script. This patch fixes this syntax issue and
replaces the iteration of "procees" commands by a "shell" command
to start curl processes (must be faster).
Thanks to Ilya Shipitsin for having reported this VTC file bug.
The master process will exit with the status of the last worker. When
the worker is killed with SIGTERM, it is expected to get 143 as an
exit status. Therefore, we consider this exit status as normal from a
systemd point of view. If it happens when not stopping, the systemd
unit is configured to always restart, so it has no adverse effect.
This has mostly a cosmetic effect. Without the patch, stopping HAProxy
leads to the following status:
â— haproxy.service - HAProxy Load Balancer
Loaded: loaded (/lib/systemd/system/haproxy.service; disabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2018-06-22 20:35:42 CEST; 8min ago
Docs: man:haproxy(1)
file:/usr/share/doc/haproxy/configuration.txt.gz
Process: 32715 ExecStart=/usr/sbin/haproxy -Ws -f $CONFIG -p $PIDFILE $EXTRAOPTS (code=exited, status=143)
Process: 32714 ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q $EXTRAOPTS (code=exited, status=0/SUCCESS)
Main PID: 32715 (code=exited, status=143)
After the patch:
â— haproxy.service - HAProxy Load Balancer
Loaded: loaded (/lib/systemd/system/haproxy.service; disabled; vendor preset: enabled)
Active: inactive (dead)
Docs: man:haproxy(1)
file:/usr/share/doc/haproxy/configuration.txt.gz
Change the way the process groups are set. Indeed setsid() was called
for every processes which caused the worker to have a different process
group than the master.
This patch behave in a better way:
- In daemon mode only, each child do a setsid()
- In master worker + daemon mode, the setsid() is done in the master before
forking the children
- In any foreground mode, we don't do a setsid()
Could be backported in 1.8 but the master-worker mode is mostly used
with systemd which rely on cgroups so that won't affect much people.
The Lua parser doesn't takes in account end-of-headers containing
only '\n'. It expects always '\r\n'. If a '\n' is processes the Lua
parser considers it miss 1 byte, and wait indefinitely for new data.
When the client reaches their timeout, it closes the connection.
This close is not detected and the connection keep in CLOSE-WAIT
state.
I guess that this patch fix only a visible part of the problem.
If the Lua HTTP parser wait for data, the timeout server or the
connectio closed by the client may stop the applet.
How reproduce the problem:
HAProxy conf:
global
lua-load bug38.lua
frontend frt
timeout client 2s
timeout server 2s
mode http
bind *:8080
http-request use-service lua.donothing
Lua conf
core.register_service("donothing", "http", function(applet) end)
Client request:
echo -ne 'GET / HTTP/1.1\n\n' | nc 127.0.0.1 8080
Look for CLOSE-WAIT in the connection with "netstat" or "ss". I
use this script:
while sleep 1; do ss | grep CLOSE-WAIT; done
This patch must be backported in 1.6, 1.7 and 1.8
Workaround: enable the "hard-stop-after" directive, and perform
periodic reload.
stktable_release() has been involved in two recent crashes by being
used without enough care. Just like any free() function this one is
often called on an exit path with a possibly unsafe argument. Given
that there is another case (smp_fetch_sc_trackers()) which theorically
could call it with an unchecked NULL, though it cannot happen since
the function doesn't support being called with src_* hence cannot make
use of tmpstkctr, let's rather move the check into the function itself
to make it safer for the long term.
This patch could be backported to 1.8 as a strengthening measure.
The build without threads was once again broken.
This issue was introduced in commit ba86c6c ("MINOR: threads: Be sure to
remove threads from all_threads_mask on exit").
This is exactly the same problem as last time it happened, because of
all_threads_mask not being defined with USE_THREAD=
This must be backported in 1.8
When a lookup is done on a key not present in the stick-table the "st"
pointer is NULL and it is used to return the converter result, but it
is used untested with stktable_release().
This regression was introduced in 1.8.10 here:
BUG/MEDIUM: stick-tables: Decrement ref_cnt in table_* converters
commit d7bd88009d88dd413e01bc0baa90d6662a3d7718
Author: Daniel Corbett <dcorbett@haproxy.com>
Date: Sun May 27 09:47:12 2018 -0400
Minimal conf for reproducong the problem:
frontend test
mode http
stick-table type ip size 1m expire 1h store gpc0
bind *:8080
http-request redirect location /a if { src,in_table(test) }
The segfault is triggered using:
curl -i http://127.0.0.1:8080/
This patch must be backported in 1.8
With this patch we can provide LEVEL environment variable when
running reg-tests Makefile targe (reg testing) to set the execution
level of the reg-tests make target to run.
LEVEL default value is 1.
LEVEL=1 is to run all h*.vtc files which are the most important
reg testing files (to test haproxy core, HTTP compliance etc).
LEVEL=2 is to run all s*.vtc files which are a bit slow tests,
for instance tests requiring external programs (curl, socat etc).
LEVEL=3 is to run all l*.vtc files which are test files with again
more slow or with little interest.
With this patch, we set HAPROXY_PROGRAM environment variable
default value to the haproxy executable of the current working directory.
So, if the current directory is the haproxy sources directory,
the reg-tests Makefile target may be run with this shorter command:
$ VARNISTEST_PROGRAM=<...> make reg-tests
in place of
$ VARNISTEST_PROGRAM=<...> HAPROXY_PROGRAM=<...> make reg-tests
When HAProxy is started with several threads, Each running thread holds a bit in
the bitfiled all_threads_mask. This bitfield is used here and there to check
which threads are registered to take part in a specific processing. So when a
thread exits, it seems normal to remove it from all_threads_mask.
No direct impact could be identified with this right now but it would
be better to backport it to 1.8 as a preventive measure to avoid complex
situations like the one in previous bug.
When HAProxy is shutting down, it exits the polling loop when there is no jobs
anymore (jobs == 0). When there is no thread, it works pretty well, but when
HAProxy is started with several threads, a thread can decide to exit because
jobs variable reached 0 while another one is processing a task (e.g. a
health-check). At this stage, the running thread could decide to request a
synchronization. But because at least one of them has already gone, the others
will wait infinitly in the sync point and the process will never die.
To fix the bug, when the first thread (and only this one) detects there is no
active jobs anymore, it requests a synchronization. And in the sync point, all
threads will check if jobs variable reached 0 to exit the polling loop.
This patch must be backported in 1.8.
Only the pollers should remove bits in the update_mask. Removing it will
mean if the fd is currently in the global update list, it will never be
removed, and while it's mostly harmless in 1.9, in 1.8, only update_mask
is checked to know if the fd is already in the list or not, so we can end
up trying to add a fd that is already in the list, and corrupt it, which
means some fd may not be added to the poller.
This should be backported to 1.8.
Add reg-tests/README file about how to compile and use varnishtest, and
how to produce patches to add regression testing files to HAProxy sources.
Also update CONTRIBUTING file to encourage the contributors to write
regression testing files.
Add a makefile target 'reg-tests' to run all regression testing file
found in 'reg-tests' directory.
Add reg-tests/lua/h00000.vtc first regression testing file for a LUA
fixed by f874a83 commit.
Bug from 96b7834e: pkinfo is stored on SSL_CTX ex_data and should
not be also stored on SSL ex_data without reservation.
Simply extract pkinfo from SSL_CTX in ssl_sock_get_pkey_algo.
No backport needed.
We never saw unexplicated crash with SSL, so I suppose that we are
luck, or the slot 0 is always reserved. Anyway the usage of the macro
SSL_get_app_data() and SSL_set_app_data() seem wrong. This patch change
the deprecated functions SSL_get_app_data() and SSL_set_app_data()
by the new functions SSL_get_ex_data() and SSL_set_ex_data(), and
it reserves the slot in the SSL memory space.
For information, this is the two declaration which seems wrong or
incomplete in the OpenSSL ssl.h file. We can see the usage of the
slot 0 whoch is hardcoded, but never reserved.
#define SSL_set_app_data(s,arg) (SSL_set_ex_data(s,0,(char *)arg))
#define SSL_get_app_data(s) (SSL_get_ex_data(s,0))
This patch must be backported at least in 1.8, maybe in other versions.
The cipher list capture struct is stored in the SSL memory space,
but the slot is reserved in the SSL_CTX memory space. This causes
ramdom crashes.
This patch should be backported to 1.8
Patrick reported that this simple configuration made haproxy segfaults:
global
lua-load /tmp/haproxy.lua
frontend f1
mode http
bind :8000
default_backend b1
http-request lua.foo
backend b1
mode http
server s1 127.0.0.1:8080
with this '/tmp/haproxy.lua' script:
core.register_action("foo", { "http-req" }, function(txn)
txn.sc:ipmask(txn.f:src(), 24, 112)
end)
This is due to missing initialization of the array of arguments
passed to hlua_lua2arg_check() which makes it enter code with
corrupted arguments.
Thanks a lot to Patrick Hemmer for having reported this issue.
Must be backported to 1.8, 1.7 and 1.6.
We can't just set t to NULL if it's a tasklet, or we'd have a hard time
accessing to t->process, so just make sure we pass NULL as the first parameter
of t->process if it's a tasklet.
This should be a non-issue at this point, as tasklets aren't used yet.
Up until now, a tasklet couldn't be free'd while it was in the list, it is
no longer the case, so make sure we remove it from the list before freeing it.
To do so, we have to make sure we correctly initialize it, so use LIST_INIT,
instead of setting the pointers to NULL.
The bug happens with an existing entry, when you try to overwrite the
value with wrong data, for example, a string when the type is INT.
The code path was not secure and tried to set *err and *merr while
err = merr = NULL when performing an http action.
Must be backported in 1.6, 1.7, 1.8.
The behavior of sigprocmask in an multithreaded environment is
undefined.
The new macro ha_sigmask() calls either pthreads_sigmask() or
sigprocmask() if haproxy was built with thread support or not.
This should be backported to 1.8.
We don't have any reason of blocking those signals.
If SIGBUS, SIGFPE, SIGILL, or SIGSEGV are generated while they are blocked, the
result is undefined, unless the signal was generated by kill(2), sigqueue(3), or
raise(3).
This should be backported to 1.8.
Signals were handled in all threads which caused some signals to be lost
from time to time. To avoid complicated lock system (threads+signals),
we prefer handling the signals in one thread avoiding concurrent access.
The side effect of this bug was that some process were not leaving from
time to time during a reload.
This patch must be backported in 1.8.
When an unrecoverable error raises, the user receive poor information
for the trouble shooting. For example:
[ALERT] 157/143755 (21212) : Lua function 'hello-world': runtime error: memory allocation error: block too big.
Unfortunately, the memory allocation error can be throwed by many
function, and we have no informatio to reach the original cause.
This patch add the list of function called from the entry point to
the function in error, like this:
[ALERT] 157/143755 (21212) : Lua function 'hello-world': runtime error: memory allocation error: block too big from [C] method 'req_get_headers', bug35.lua:2 global 'ee', bug35.lua:6 global 'ff', bug35.lua:10 C function line 9.
When checking if a socket we got from the parent is suitable for a listener,
we just checked that the path matched sockname.tmp, however this is
unsuitable for abns sockets, where we don't have to create a temporary
file and rename it later.
To detect that, check that the first character of the sun_path is 0 for
both, and if so, that &sun_path[1] is the same too.
This should be backported to 1.8.
To make sure we don't inadvertently insert task in the global runqueue,
while only the local runqueue is used without threads, make its definition
and usage conditional on USE_THREAD.
When building without threads enabled, instead of just using the global
runqueue, just use the local runqueue associated with the only thread, as
that's what is now expected for a single thread in prcoess_runnable_tasks().
This should fix haproxy when built without threads.
When an applet is created, let's assign it the same nice value as the task
of the stream which owns it. It ensures that fairness is properly propagated
to applets, and that the CLI can regain a low latency behaviour again. Huge
differences have been seen under extreme loads, with the CLI being called
every 200 microseconds instead of 11 milliseconds.
Since applets are now part of the main scheduler, it's useful to report
their nice value and the number of calls to the applet handler, to see
where the CPU is spent.
When the connection is closed by HAProxy, the status code provided in the
DISCONNECT frame is lost. By retransmitting it in the agent's reply, we are sure
to have it in the SPOE logs.
This patch may be backported in 1.8.