The function was changed to be static in commit
6c39198b57, but even that commit
no longer uses it. The purpose of the change vs. outright removal
is unclear.
see issue #301
As reported in issue #485 the test for !len at the end of the
loop in get_var_int() is useless since it was already done inside
the loop. Actually the code is more readable if we remove the first
one so let's do this instead. The resulting code is exactly the same
since the compiler already optimized the test away.
In ssl_sock_load_dh_params(), if haproxy failed to apply the dhparam
with SSL_CTX_set_tmp_dh(), it will apply the DH with
SSL_CTX_set_dh_auto().
The problem is that we don't clean the OpenSSL errors when leaving this
function so it could fail to load the certificate, even if it's only a
warning.
Fixes bug #483.
Must be backported in 2.1.
Given that some OSes have bash in /usr/local/bin and in order not to
give too easy an excuse to Olivier for not backporting fixes, let's
make a few scripts rely on /usr/bin/env bash instead of /bin/bash :-)
We have the ability per bind option to ignore certain errors (CA, crt, ...),
and for this we use a 64-bit field. In issue #479 coverity reports a risk of
too large a left shift. For now as of OpenSSL 1.1.1 the highest error value
that may be reported by X509_STORE_CTX_get_error() seems to be around 50 so
there should be no risk yet, but it's enough of a warning to add a check so
that we don't accidently hide random errors in the future.
This may be backported to relevant stable branches.
The script is simply called from the repository holding the patch to
backport, with the last branch number and the commit(s) ID(s) to send
there and it then follows the chain of "down" repos to go down one step
until it meets the indicated last one. It basically automates what we do
by hand. Example:
./scripts/backport 1.9 1c7c0d6b97
Note that it does *not* push, which still has to be done by hand after
building and testing.
This new setting in the global section alters the way HAProxy will look
for unspecified files (.ocsp, .sctl, .issuer, bundles) during the
loading of the SSL certificates.
By default, HAProxy discovers automatically a lot of files not specified
in the configuration, and you may want to disable this behavior if you
want to optimize the startup time.
This patch sets flags in global_ssl.extra_files and then check them
before trying to load an extra file.
In __pool_get_first(), don't forget to unlock the pool lock if the pool is
empty, otherwise no writer will be able to take the lock, and as it is done
when reloading, it leads to an infinite loop on reload.
This should be backported with commit 04f5fe87d3
When using lockless pools, add a new rwlock, flush_pool. read-lock it when
getting memory from the pool, so that concurrenct access are still
authorized, but write-lock it when we're about to free memory, in
pool_flush() and pool_gc().
The problem is, when removing an item from the pool, we unreference it
to get the next one, however, that pointer may have been free'd in the
meanwhile, and that could provoke a crash if the pointer has been unmapped.
It should be OK to use a rwlock, as normal operations will still be able
to access the pool concurrently, and calls to pool_flush() and pool_gc()
should be pretty rare.
This should be backported to 2.1, 2.0 and 1.9.
In pool_create(), only initialize the pool spinlock if we just created the
pool, in the event we're reusing it, there's no need to initialize it again.
In pool_flush(), we can't just set the free_list to NULL, or we may suffer
the ABA problem. Instead, use a double-width CAS and update the sequence
number.
This should be backported to 2.1, 2.0 and 1.9.
This may, or may not, be related to github issue #476.
We can't clear flags on tasklets because we don't know if they're still
present upon return (they all return NULL, maybe that could change in
the future). As a side effect, once TASK_RUNNING is set, it's never
cleared anymore, which is misleading and resulted in some incorrect
flagging of bulk tasks in the recent scheduler changes. And the only
reason for setting TASK_RUNNING on tasklets was to detect self-wakers,
which is not done using a dedicated flag. So instead of setting this
flags for no opportunity to clear it, let's simply not set it.
Now that we can more accurately watch which connection is really
being woken up from itself, it was desirable to re-adjust the CPU BW
thresholds based on measurements. New tests with 60000 concurrent
connections were run at 100 Gbps with unbounded queues and showed
the following distribution:
scenario TC0 TC1 TC2 observation
-------------------+---+---+----+---------------------------
TCP conn rate : 32, 51, 17
HTTP conn rate : 34, 41, 25
TCP byte rate : 2, 3, 95 (2 MB objets)
splicing byte rate: 11, 6, 83 (2 MB objets)
H2 10k object : 44, 23, 33 client-limited
mixed traffic : 18, 10, 72 2*1m+1*0: 11kcps, 36 Gbps
The H2 experienced a huge change since it uses a persistent connection
that was accidently flagged in the previous test. The splicing test
exhibits a higher need for short tasklets, so does the mixed traffic
test. Given that latency mainly matters for conn rate and H2 here,
the ratios were readjusted as 33% for TC0, 50% for TC1 and 17% for
TC2, keeping in mind that whatever is not consumed by one class is
automatically shared in equal propertions by the next one(s). This
setting immediately provided a nice improvement as with the default
settings (maxpollevents=200, runqueue-depth=200), the same ratios as
above are still reported, while the time to request "show activity"
on the CLI dropped to 30-50ms. The average loop time is around 5.7ms
on the mixed traffic.
In addition, one extra stress test at 90.5 Gbps with 5100 conn/s shows
70-100ms CLI request time, with an average loop time of 17 ms.
sched->current is used to know the current task/tasklet, and is currently
only used by the panic dump code. However it turns out it was not set for
tasklets, which prevents us from using it for more usages, despite the
panic handling code already handling this case very well. Let's make sure
it's now set.
Commit a17664d829 ("MEDIUM: tasks: automatically requeue into the bulk
queue an already running tasklet") tried to inflict a penalty to
self-requeuing tasks/tasklets which correspond to those involved in
large, high-latency data transfers, for the benefit of all other
processing which requires a low latency. However, it turns out that
while it ought to do this on a case-by-case basis, basing itself on
the RUNNING flag isn't accurate because this flag doesn't leave for
tasklets, so we'd rather need a distinct flag to tag such tasklets.
This commit introduces TASK_SELF_WAKING to mark tasklets acting like
this. For now it's still set when TASK_RUNNING is present but this
will have to change. The flag is kept across wakeups.
Commit 140237471e made sure we hold the
toremove_lock for the corresponding thread before removing a connection
from its idle_orphan_conns list, however it failed to unlock it if we
found a connection, leading to a deadlock, so add the missing deadlock.
This should be backported to 2.1 and 2.0.
Measures with unbounded execution ratios under 40000 concurrent
connections at 100 Gbps showed the following CPU bandwidth
distribution between task classes depending on traffic scenarios:
scenario TC0 TC1 TC2 observation
-------------------+---+---+----+---------------------------
TCP conn rate : 29, 48, 23 221 kcps
HTTP conn rate : 29, 47, 24 200 kcps
TCP byte rate : 3, 5, 92 53 Gbps
splicing byte rate: 5, 10, 85 70 Gbps
H2 10k object : 10, 21, 74 client-limited
mixed traffic : 4, 7, 89 2*1m+1*0: 11kcps, 36 Gbps
Thus it seems that we always need a bit of bulk tasks even for short
connections, which seems to imply a suboptimal processing somewhere,
and that there are roughly twice as many tasks (TC1=normal) as regular
tasklets (TC0=urgent). This ratio stands even when data forwarding
increases. So at first glance it looks reasonable to enforce the
following ratio by default:
- 16% for TL_URGENT
- 33% for TL_NORMAL
- 50% for TL_BULK
With this, the TCP conn rate climbs to ~225 kcps, and the mixed traffic
pattern shows a more balanced 17kcps + 35 Gbps with 35ms CLI request
time time instead of 11kcps + 36 Gbps and 400 ms response time. The
byte rate tests (1M objects) are not affected at all. This setting
looks "good enough" to allow immediate merging, and could be refined
later.
It's worth noting that it resists very well to massive increase of
run queue depth and maxpollevents: with the run queue depth changed
from 200 to 10000 and maxpollevents to 10000 as well, the CLI's
request time is back to the previous ~400ms, but the mixed traffic
test reaches 52 Gbps + 7500 CPS, which was never met with the previous
scheduling model, while the CLI used to show ~1 minute response time.
The reason is that in the bulk class it becomes possible to perform
multiple rounds of recv+send and eliminate objects at once, increasing
the L3 cache hit ratio, and keeping the connection count low, without
degrading too much the latency.
Another test with mixed traffic involving 2/3 splicing on huge objects
and 1/3 on empty objects without touching any setting reports 51 Gbps +
5300 cps and 35ms CLI request time.
When a tasklet re-runs itself such as in this chain:
si_cs_io_cb -> si_cs_process -> si_notify -> si_chk_rcv
then we know it can easily clobber the run queue and harm latency. Now
what the scheduler does when it detects this is that such a tasklet is
automatically placed into the bulk list so that it's processed with the
remaining CPU bandwidth only. Thanks to this the CLI becomes instantly
responsive again even under heavy stress at 50 Gbps over 40kcon and
100% CPU on 16 threads.
We used to mix high latency tasks and low latency tasklets in the same
list, and to even refill bulk tasklets there, causing some unfairness
in certain situations (e.g. poll-less transfers between many connections
saturating the machine with similarly-sized in and out network interfaces).
This patch changes the mechanism to split the load into 3 lists depending
on the task/tasklet's desired classes :
- URGENT: this is mainly for tasklets used as deferred callbacks
- NORMAL: this is for regular tasks
- BULK: this is for bulk tasks/tasklets
Arbitrary ratios of max_processed are picked from each of these lists in
turn, with the ability to complete in one list from what was not picked
in the previous one. After some quick tests, the following setup gave
apparently good results both for raw TCP with splicing and for H2-to-H1
request rate:
- 0 to 75% for urgent
- 12 to 50% for normal
- 12 to what remains for bulk
Bulk is not used yet.
New function run_tasks_from_list() will run over a tasklet list and will
run all the tasks and tasklets it finds there within a limit of <max>
that is passed in arggument. This is a preliminary work for scheduler QoS
improvements.
Previous patch 160287b676 ("MEDIUM: pipe/thread: maintain a per-thread
local cache of recently used pipes") didn't replace all pipe counter
updates with atomic ops since some were already under a lock, which is
obviously not a valid reason since these ones can be updated in parallel
to other atomic ops. The result was that the pipes_used could seldom be
seen as negative in the stats (harmless) but also this could result in
slightly more pipes being allocated than permitted, thus stealing a few
file descriptors that were not usable for connections anymore. Let's use
pure atomic ops everywhere these counters are updated.
No backport is needed.
In order to completely remove the pipe locking cost and try to reuse
hot pipes, each thread now maintains a local cache of recently used pipes
that is no larger than its share (maxpipes/nbthreads). All extra pipes
are instead refilled into the global pool. Allocations are made from the
local pool first, and fall back to the global one before allocating one.
This completely removes the observed pipe locking cost at high bit rates,
which was still around 5-6%.
In a quick test involving splicing, we can see that get_pipe() and
put_pipe() together consume up to 12% of the CPU. That's not surprizing
considering how much work is performed under the lock, including the
pipe struct allocation, the pipe creation and its initialization. Same
for releasing, we don't need a lock there to call close() nor to free
to the pool.
Changing this alone was enough to cut the overhead in half. A better
approach should consist in having a per-thread pipe cache, which will
also help keep pages hot in the CPU caches.
src/ssl_sock.c: In function ‘cli_io_handler_show_cert’:
src/ssl_sock.c:10214:6: warning: unused variable ‘n’ [-Wunused-variable]
int n;
^
Fix this problem in the io handler of the "show ssl cert" function.
Given that raw_sock's functions solely act on connections and that all its
callers properly use subscribe() when they want to receive/send more, there
is no more reason for calling fd_{cant,cond,done}_{send,recv} anymore as
this call is immediately overridden by the subscribe call. It's also worth
noting that the purpose of fd_cond_recv() whose purpose was to speculatively
enable reading in the FD cache if the FD was active but not yet polled was
made to save on expensive epoll_ctl() calls and was implicitly covered more
cleanly by recent commit 5d7dcc2a8e ("OPTIM: epoll: always poll for recv if
neither active nor ready").
No change on the number of calls to epoll_ctl() was noticed consecutive to
this change.
The comments for match_word() in pattern.c mention that delimiters
at the start or end of the input string will be ignored, but this
is not mentionned in the documentation.
Backport to all supported versions.
this log could be sometimes a bit confusing (depending on the number in
fact) when you read it (e.g is it the number of active connection?) -
only trained eyes knows haproxy output a different log when closing
active connections while stopping.
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
hostname were limited to 62 char, which is not RFC1035 compliant;
- the parsing loop should stop when above max label char
- fix len label test where d[i] was wrongly used
- simplify the whole function to avoid using two extra char* variable
this should fix github issue #387
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
Reviewed-by: Tim Duesterhus <tim@bastelstu.be>
Acked-by: Baptiste <bedis9@gmail.com>
triggered by coverity; src_port is set earlier.
this should fix github issue #467
Fixes: 7fec021537 ("MEDIUM: proxy_protocol: Convert IPs to v6 when
protocols are mixed")
This should be backported to 1.8.
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
Reviewed-by: Tim Duesterhus <tim@bastelstu.be>
Many HTTP actions rely on <.arg.http> in the act_rule structure. Not all actions
use the log-format expression, but it must be initialized anyway. Otherwise,
HAProxy may crash during the deinit when the release function is called.
No backport needed. This patch should fix issue #468.
In issue #465, we see that Coverity detected dead code in checks.c
which is in fact a missing parenthesis to build the connect() flags
consecutive to the API change in commit fdcb007ad8 ("MEDIUM: proto:
Change the prototype of the connect() method.").
The impact should be imperceptible as in the best case it may have
resulted in a missed optimization trying to save a syscall or to merge
outgoing packets.
It may be backported as far as 2.0 though it's not critical.
We're getting almost 100% failure rate recently due to the "slow" tests
never completing in time on Travis. Peers do not synchronize their data
within the expected delay, health checks time out due to the tested
agent not responding, etc. This adds a lot of noise and completely voids
the value of the build test.
Let's disable the slow tests to try to get back to a fully working state.
In ssl_sock_init(), if we fail to allocate the BIO, don't forget to free
the SSL *, or we'd end up with a memory leak.
This should be backported to 2.1 and 2.0.
As the server early data buffer is allocated in the middle of the loop
used to allocate the SSL session without being freed before retrying,
this leads to a memory leak.
To fix this we move the section of code responsible of this early data buffer
alloction after the one reponsible of allocating the SSL session.
Must be backported to 2.1 and 2.0.
When commit 477902bd2e made the conn_stream
allocation unconditional, it unfortunately moved the code doing the allocation
inside #if USE_OPENSSL, which means anybody compiling haproxy without
openssl wouldn't allocate any conn_stream, and would get a segfault later.
Fix that by moving the code that does the allocation outside #if USE_OPENSSL.
In process_stream(), when a client or a server abort is handled, the
corresponding listener's counter is incremented. But, we must be sure to have a
listener attached to the session. This bug was introduced by the commit
cff0f739e5.
Thanks to Fred to reporting me the bug.
No need to backport this patch, except if commit cff0f739e5 is backported.
A stupid cut-paste bug was introduced in the commit cff0f739e5. Backend
counters must of course be incremented on the stream's backend. Not the
frontend.
No need to backport this patch, except if commit cff0f739e5 is backported.
A first patch was made during 2.0-dev to silence a bogus warning emitted
by gcc : dd1c8f1f72 ("MINOR: cfgparse: Add a cast to make gcc happier."),
but it happens it was not sufficient as the warning re-appeared on 32-bit
machines under gcc-8 and gcc-9 :
src/cfgparse.c: In function 'check_config_validity':
src/cfgparse.c:3642:33: warning: argument 1 range [2147483648, 4294967295] exceeds maximum object size 2147483647 [-Walloc-size-larger-than=]
newsrv->idle_orphan_conns = calloc((unsigned int)global.nbthread, sizeof(*newsrv->idle_orphan_conns));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This warning doesn't trigger in other locations, and it immediately
vanishes if the previous or subsequent loops do not depend on
global.nbthread anymore, or if the field ordering of the struct server
changes! As discussed in the thread at:
https://www.mail-archive.com/haproxy@formilux.org/msg36107.html
playing with -Walloc-size-larger-than has no effect. And a minimal
reproducer could be isolated, indicating it's pointless to circle around
this one. Let's just cast nbthread to ushort so that gcc cannot make
this wrong detection. It's unlikely we'll use more than 65535 threads in
the near future anyway.
This may be backported to older releases if they are also affected, at
least to ease the job of distro maintainers.
Thanks to Ilya for testing.
lua-prepend-path allows the administrator to specify a custom Lua library
path to load custom Lua modules that are useful within the context of HAProxy
without polluting the global Lua library folder.
While the H2 parser properly checks for the absence of anything but
"trailers" in the TE header field, we forget to check this when sending
the request to an H2 server. The problem is that an H2->H2 conversion
may keep "gzip" and fail on the next stage.
This patch makes sure that we only send "TE: trailers" if the TE header
contains the "trailers" token, otherwise it's dropped.
This fixes issue #464 and should be backported till 1.9.
Since commit 1b8e68e89a ("MEDIUM: stick-table: Stop handling stick-tables
as proxies."), a rule referencing the current proxy with no table leads
to the following error :
[ALERT] 023/071924 (16479) : Proxy 'px': unable to find stick-table '(null)'.
[ALERT] 023/071914 (16479) : Fatal errors found in configuration.
for a config like this one:
backend px
stick on src
This patch fixes it and should be backported as far as 2.0.
A few places in health checks and stream-int on the send path were still
checking for this flag. Now we do not and instead we rely on snd_buf()
to report the error if any.
It's worth noting that all 3 real muxes still use CO_FL_SOCK_WR_SH and
CO_FL_ERROR interchangeably at various places to decide to abort and/or
free their data. This should be clarified and fixed so that only
CO_FL_ERROR is used, and this will render the error paths simpler and
more accurate.