Since there was a risk of leaving fd_takeover() without properly
stopping the fd, let's take this opportunity for factoring the code
around a commont exit point that's common to both double-cas and locked
modes. This means using the "ret" variable inside the double-CAS code,
and inverting the loop to first test the old values. Doing do also
produces cleaner code because the compiler cannot factorize common
exit paths using asm statements that are present in some atomic ops.
The loop in fd_takeover() around the double-CAS is conditionned on
a previous value of old_masks[0] that always matches tid_bit on the
first iteration because it does not result from the atomic op but
from a pre-loaded value. Let's set the result of the atomic op there
instead so that the conflict between threads can be detected earlier
and before performing the double-word CAS.
When haproxy is compiled without double-word CAS, we use a migration lock
in fd_takeover(). This lock was covering the atomic OR on the running_mask
before checking its value, while it is not needed since this atomic op
already returns the result. Let's just refine the code to avoid grabbing
the lock in the event another thread has already stolen the FD, this may
reduce contention in high reuse rate scenarios.
This one was confusingly called, I thought it was the cumulated number
of streams but it's the number of calls to process_stream(). Let's make
this clearer.
empty_rq and long_rq are per-loop so it makes sense to group them
together with the loop count. In addition since ctxsw and tasksw
apply in the context of these counters, let's move them as well.
More precisely the difference between wake_tasks and long_rq should
roughly correspond to the number of inter-task messages. Visually
it's much easier to spot ratios of wakeup causes now.
In fd_takeover(), when a double-width compare-and-swap is implemented,
make sure, if we managed to get the fd, to call fd_stop_recv() on it, so
that the thread that used to own it will know it has to stop polling it.
In fd_takeover(), if we failed to grab the fd, when a double-width
compare-and-swap is not implemented, do not call fd_stop_recv() on the
fd, it is not ours and may be used by another thread.
We have poll_drop, poll_dead and poll_skip which are confusingly named
like their poll_io and poll_exp counterparts except that they are not
per poll() call but per-fd. This patch renames them to poll_drop_fd(),
poll_dead_fd() and poll_skip_fd() for this reason.
The "show activity" output mentions a number of indicators to explain
wake up reasons but doesn't have the number of times poll() sees some
I/O. And given that multiple events can happen simultaneously, it's
not always possible to deduce this metric by subtracting.
This patch adds a new "poll_io" counter that allows one to see how
often poll() returns with at least one active FD. This should help
detect stuck events and measure various ratios of poll sub-metrics.
fd_set_running() and fd_takeover() may both use a double-word CAS on the
(running_mask, thread_mask) couple and as such they expect the fields to
be exactly arranged like this. It's critical not to reorder them, so add
a comment to avoid such a potential mistake later.
Since 2.1-dev2, with commit 305d5ab46 ("MAJOR: fd: Get rid of the fd cache.")
we don't have the fd_lock anymore and as such its acitvity counter is always
zero. Let's remove it from the struct and from "show activity" output, as
there are already plenty of indicators to look at.
The cache line comment in the struct activity was updated to reflect
reality as it looks like another one already got removed in the past.
This effectively reverts the two following commits:
6f95f6e11 ("OPTIM: connection: disable receiving on disabled events when the run queue is too high")
065a02561 ("MEDIUM: connection: don't stop receiving events in the FD handler")
The problem as reported in issue #662 is that when the events signals
the readiness of input data that has to be forwarded over a congested
stream, the mux will read data and wake the stream up to forward them,
but the buffer full condition makes this impossible immediately, then
nobody in the chain will be able to disable the event after it was
first reported. And given we don't know at the connection level whether
an event was already reported or not, we can't decide anymore to
forcefully stop it if for any reason its processing gets delayed.
The problem is magnified in issue #662 by the fact that a shutdown is
reported with pending data occupying the buffer. The shutdown will
strike in loops and cause the upper layer stream to be notified until
it's handled, but with a buffer full it's not possible to call cs_recv()
hence to purge the event.
All this can only be handled optimally by implementing a lower layer,
direct mux-to-mux forwarding that will not require any scheduling. This
was no wake up will be needed and the event will be instantly handled
or paused for a long time.
For now let's simply revert these optimizations. Running a 1 MB transfer
test over H2 using 8 connections having each 32 streams with a limited
link of 320 Mbps shows the following profile before this fix:
calls syscall (100% CPU)
------ -------
259878 epoll_wait
519759 clock_gettime
17277 sendto
17129 recvfrom
672 epoll_ctl
And the following one after the fix:
calls syscall (2-3% CPU)
------ -------
17201 sendto
17357 recvfrom
2304 epoll_wait
4609 clock_gettime
1200 epoll_ctl
Thus the behavior is much better.
No backport is needed as these patches were only in 2.2-dev.
Many thanks to William Dauchy for reporting a lot of details around this
difficult issue.
Since we've seen clang emit bad code when the address sanitizer is enabled
at -O2, better clearly report it in the version output. It is detected both
for clang and gcc (both tested with and without).
This macro is provided by clang but gcc lacks it. Not having it makes
it painful to test features on both compilers. Better define it to zero
when not available so that __has_feature(foo) never errors.
For an unknown reason in commit bb1b63c079 I placed the compiler version
output in haproxy.c instead of version.c. Better have it in version.c which
is more suitable to this sort of things.
The spoe parser fails to check that the decoded key length is large
enough to match a given key but it uses the returned length in memcmp().
So returning "ver" could match "version" for example. In addition this
makes clang 10's ASAN complain because the second argument to memcmp()
is the static key which is shorter than the decoded buffer size, which
in practice has no impact.
I'm still not 100% sure the parser is entirely correct because even
with this fix it cannot parse a key whose name matches the beginning
of another one, but in practice this does not happen. Ideally a
preliminary length check before the comparison would be safer.
This needs to be backported as far as 1.7.
Since we dropped support for legacy mode, it's not the stream which
deals with the connection but the mux, and there's no point in closing
the client connection after most internal status codes. For example if
the client gets a 401 or a 503 because a server doesn't respond, it
makes no sense forcing the connection to close after reporting this
status, because it's already done by the mux if the client asks for it
or is not compatible with keep-alive. This current state was inherited
from the early days but is still limiting the amount of client-side
connection reuse in a number of circumstances (typically server-side
errors). This change was planned for 2.1 but forgotten.
The status codes for which the connection is not closed anymore are those
that do not depend on the client side connection itself, which are all
except 400 and 408. This could be backported to 2.1 but not further, in
order to make sure legacy and HTX behave strictly similarly.
One issue with the config parser is that while it tries to report as many
errors as possible at once, it's actually unbounded. Thus, when calling
haproxy on a wrong file, it can take ages to process, such as here on
half a gigabyte of map file instead of config file:
$ time ./haproxy -c -f large.map 2>&1 |wc -l
16777220
real 0m31.324s
user 0m22.595s
sys 0m28.909s
This patch modifies readcfgfile() to stop reading the config file after a
reasonable amount of fatal errors. This threshold is set to 50, which seems
more than enough to spot a recurrent issue with a bit of context in a terminal
to address several issues at once, without filling logs nor taking time to
parse the file. The difference is clear now:
$ time ./haproxy -c -f large.map 2>&1 |wc -l
55
real 0m0.005s
user 0m0.004s
sys 0m0.003s
This may be backported to older versions without causing too many
difficulties. However the patch will not apply as-is, it will require
to increment the "fatal" count for each place where ERR_FATAL is set
in the parsing loop.
Issue 22689 in oss-fuzz shows that specially crafted config files can take
a long time to process. This happens when variable expansion, backslash
escaping or unquoting causes calls to memmove() and possibly to realloc()
resulting in O(N^2) complexity with N following the line size.
By using parse_line() we now have a safe parser that remains in O(N)
regardless of the type of operation. Error reporting changed a little bit
since the errors are not reported anymore from the deepest parsing level.
As such we now report the beginning of the error. One benefit is that for
many invalid character sequences, the original line is shown and the first
bad char or sequence is designated with a caret ('^'), which tends to be
visually easier to spot, for example:
[ALERT] 167/170507 (14633) : parsing [mini5.cfg:19]: unmatched brace in environment variable name below:
"${VAR"}
^
or:
[ALERT] 167/170645 (14640) : parsing [mini5.cfg:18]: unmatched quote below:
timeout client 10s'
^
In case the target buffer is too short for the new line, the output buffer
is grown in 1kB chunks and kept till the end, so that it should not happen
too often.
Before this patch a test like below involving a 4 MB long line would take
138s to process, 98% of which were spent in __memmove_avx_unaligned_erms(),
and now it takes only 65 milliseconds:
$ perl -e 'print "\"\$A\""x1000000,"\n"' | ./haproxy -c -f /dev/stdin 2>/dev/null
This may be backported to stable versions after a long period of
observation to be sure nothing broke. It relies on patch "MINOR: tools:
add a new configurable line parse, parse_line()".
This function takes on input a string to tokenize, an output storage
(which may be the same) and a number of options indicating how to handle
certain characters (single & double quote support, backslash support,
end of line on '#', environment variables etc). On output it will provide
a list of pointers to individual words after having possibly unescaped
some character sequences, handled quotes and resolved environment
variables, and it will also indicate a status made of:
- a list of failures (overlap between src/dst, wrong quote etc)
- the pointer to the first sequence in error
- the required output length (a-la snprintf()).
This allows a caller to freely unescape/unquote a string by using a
pre-allocated temporary buffer and expand it as necessary. It takes
extreme care at avoiding expensive operations and intentionally does
not use memmove() when removing escapes, hence the reason for the
different input and output buffers. The goal is to use it as the basis
for the config parser.
As reported in issue #689, there is a subtle bug in the ebtree code used
to compared memory blocks. It stems from the platform-dependent memcmp()
implementation. Original implementations used to perform a byte-per-byte
comparison and to stop at the first non-matching byte, as in this old
example:
https://www.retro11.de/ouxr/211bsd/usr/src/lib/libc/compat-sys5/memcmp.c.html
The ebtree code has been relying on this to detect the first non-matching
byte when comparing keys. This is made so that a zero-terminated string
can fail to match against a longer string.
Over time, especially with large busses and SIMD instruction sets,
multi-byte comparisons have appeared, making the processor fetch bytes
past the first different byte, which could possibly be a trailing zero.
This means that it's possible to read past the allocated area for a
string if it was allocated by strdup().
This is not correct and definitely confuses address sanitizers. In real
life the problem doesn't have visible consequences. Indeed, multi-byte
comparisons are implemented so that aligned words are loaded (e.g. 512
bits at once to process a cache line at a time). So there is no way such
a multi-byte access will cross a page boundary and end up reading from
an unallocated zone. This is why it was never noticed before.
This patch addresses this by implementing a one-byte-at-a-time memcmp()
variant for ebtree, called eb_memcmp(). It's optimized for both small and
long strings and guarantees to stop after the first non-matching byte. It
only needs 5 instructions in the loop and was measured to be 3.2 times
faster than the glibc's AVX2-optimized memcmp() on short strings (1 to
257 bytes), since that latter one comes with a significant setup cost.
The break-even seems to be at 512 bytes where both version perform
equally, which is way longer than what's used in general here.
This fix should be backported to stable versions and reintegrated into
the ebtree code.
We cannot simply `release_sample_expr(rule->arg.vars.expr)` for a
`struct act_rule`, because `rule->arg` is a union that might not
contain valid `vars`. This leads to a crash on a configuration using
`http-request redirect` and possibly others:
frontend http
mode http
bind 127.0.0.1:80
http-request redirect scheme https
Instead a `struct act_rule` has a `release_ptr` that must be used
to properly free any additional storage allocated.
This patch fixes a regression in commit ff78fcdd7f.
It must be backported to whereever that patch is backported.
It has be verified that the configuration above no longer crashes.
It has also been verified that the configuration in ff78fcdd7f
does not leak.
Commit 0a3b43d9c ("MINOR: haproxy: Make use of deinit_and_exit() for
clean exits") introduced this build warning:
src/haproxy.c: In function 'main':
src/haproxy.c:3775:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
This is because the new deinit_and_exit() is not marked as "noreturn"
so depending on the optimizations, the noreturn attribute of exit() will
either leak through it and silence the warning or not and confuse the
compiler. Let's just add the attribute to fix this.
No backport is needed, this is purely 2.2.
It's unclear why the buffer length wasn't considered when tcp-response
rules were added in 1.5-dev3 with commit 97679e790 ("[MEDIUM] Implement
tcp inspect response rules"). But it's impossible to write working
tcp-response content rules as they're always waiting for the expiration
and do not consider the fact that the buffer is full. It's likely that
tcp-response content rules were only used with HTTP traffic.
This may be backported to stable versions, though it's not very
important considering that that nobody reported this in 10 years.
The req_body and res_body sample fetch functions forgot to set the
SMP_F_MAY_CHANGE flag, making them unusable in tcp content rules. Now we
set the flag as long as the channel is not full and nothing indicates
the end was reached.
This is marked as a bug because it's unusual for a sample fetch function
to return a final verdict while data my change, but this results from a
limitation that was affecting the legacy mode where it was not possible
to know whether the end was reached without de-chunking the message. In
HTX there is no more reason to limit this. This fix could be backported
to 2.1, and to 2.0 if really needed, though it will only be doable for
HTX, and legacy cannot be fixed.
As reported in issue #686, ARM64 build fails since the include files
reorganization. This is caused by the lack of string.h while a memcpy()
is present in __ha_cas_dw().
In issue #685 gcc 10 seems to find a null pointer deref in free_zlib().
There is no such case because all possible pools are tested and there's
no other one in zlib, except if there's a bug or memory corruption
somewhere else. The code used to be written like this to make sure that
any such bug couldn't remain unnoticed.
Now gcc 10 sees this theorical code path and complains, so let's just
change the code to place an explicit crash in case of no match (which
must never happen).
This might be backported if other versions are affected.
Given the following example configuration:
frontend foo
bind *:8080
mode http
http-request set-var(txn.foo) str(bar)
Running a configuration check within valgrind reports:
==23665== Memcheck, a memory error detector
==23665== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==23665== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==23665== Command: ./haproxy -c -f ./crasher.cfg
==23665==
[WARNING] 165/002941 (23665) : config : missing timeouts for frontend 'foo'.
| While not properly invalid, you will certainly encounter various problems
| with such a configuration. To fix this, please ensure that all following
| timeouts are set to a non-zero value: 'client', 'connect', 'server'.
Warnings were found.
Configuration file is valid
==23665==
==23665== HEAP SUMMARY:
==23665== in use at exit: 314,008 bytes in 87 blocks
==23665== total heap usage: 160 allocs, 73 frees, 1,448,074 bytes allocated
==23665==
==23665== 132 (48 direct, 84 indirect) bytes in 1 blocks are definitely lost in loss record 15 of 28
==23665== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==23665== by 0x4A2612: sample_parse_expr (sample.c:876)
==23665== by 0x54DF84: parse_store (vars.c:766)
==23665== by 0x528BDF: parse_http_req_cond (http_rules.c:95)
==23665== by 0x469F36: cfg_parse_listen (cfgparse-listen.c:1339)
==23665== by 0x459E33: readcfgfile (cfgparse.c:2167)
==23665== by 0x5074FD: init (haproxy.c:2021)
==23665== by 0x418262: main (haproxy.c:3126)
==23665==
==23665== LEAK SUMMARY:
==23665== definitely lost: 48 bytes in 1 blocks
==23665== indirectly lost: 84 bytes in 2 blocks
==23665== possibly lost: 0 bytes in 0 blocks
==23665== still reachable: 313,876 bytes in 84 blocks
==23665== suppressed: 0 bytes in 0 blocks
==23665== Reachable blocks (those to which a pointer was found) are not shown.
==23665== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==23665==
==23665== For counts of detected and suppressed errors, rerun with: -v
==23665== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
After this patch is applied the leak is gone as expected.
This is a very minor leak that can only be observed if deinit() is called,
shortly before the OS will free all memory of the process anyway. No
backport needed.
Particularly cleanly deinit() after a configuration check to clean up the
output of valgrind which reports "possible losses" without a deinit() and
does not with a deinit(), converting actual losses into proper hard losses
which makes the whole stuff easier to analyze.
As an example, given an example configuration of the following:
frontend foo
bind *:8080
mode http
Running `haproxy -c -f cfg` within valgrind will report 4 possible losses:
$ valgrind --leak-check=full ./haproxy -c -f ./example.cfg
==21219== Memcheck, a memory error detector
==21219== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==21219== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==21219== Command: ./haproxy -c -f ./example.cfg
==21219==
[WARNING] 165/001100 (21219) : config : missing timeouts for frontend 'foo'.
| While not properly invalid, you will certainly encounter various problems
| with such a configuration. To fix this, please ensure that all following
| timeouts are set to a non-zero value: 'client', 'connect', 'server'.
Warnings were found.
Configuration file is valid
==21219==
==21219== HEAP SUMMARY:
==21219== in use at exit: 1,436,631 bytes in 130 blocks
==21219== total heap usage: 153 allocs, 23 frees, 1,447,758 bytes allocated
==21219==
==21219== 7 bytes in 1 blocks are possibly lost in loss record 5 of 54
==21219== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21219== by 0x5726489: strdup (strdup.c:42)
==21219== by 0x468FD9: bind_conf_alloc (listener.h:158)
==21219== by 0x468FD9: cfg_parse_listen (cfgparse-listen.c:557)
==21219== by 0x459DF3: readcfgfile (cfgparse.c:2167)
==21219== by 0x5056CD: init (haproxy.c:2021)
==21219== by 0x418232: main (haproxy.c:3121)
==21219==
==21219== 14 bytes in 1 blocks are possibly lost in loss record 9 of 54
==21219== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21219== by 0x5726489: strdup (strdup.c:42)
==21219== by 0x468F9B: bind_conf_alloc (listener.h:154)
==21219== by 0x468F9B: cfg_parse_listen (cfgparse-listen.c:557)
==21219== by 0x459DF3: readcfgfile (cfgparse.c:2167)
==21219== by 0x5056CD: init (haproxy.c:2021)
==21219== by 0x418232: main (haproxy.c:3121)
==21219==
==21219== 128 bytes in 1 blocks are possibly lost in loss record 35 of 54
==21219== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21219== by 0x468F90: bind_conf_alloc (listener.h:152)
==21219== by 0x468F90: cfg_parse_listen (cfgparse-listen.c:557)
==21219== by 0x459DF3: readcfgfile (cfgparse.c:2167)
==21219== by 0x5056CD: init (haproxy.c:2021)
==21219== by 0x418232: main (haproxy.c:3121)
==21219==
==21219== 608 bytes in 1 blocks are possibly lost in loss record 46 of 54
==21219== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21219== by 0x4B953A: create_listeners (listener.c:576)
==21219== by 0x4578F6: str2listener (cfgparse.c:192)
==21219== by 0x469039: cfg_parse_listen (cfgparse-listen.c:568)
==21219== by 0x459DF3: readcfgfile (cfgparse.c:2167)
==21219== by 0x5056CD: init (haproxy.c:2021)
==21219== by 0x418232: main (haproxy.c:3121)
==21219==
==21219== LEAK SUMMARY:
==21219== definitely lost: 0 bytes in 0 blocks
==21219== indirectly lost: 0 bytes in 0 blocks
==21219== possibly lost: 757 bytes in 4 blocks
==21219== still reachable: 1,435,874 bytes in 126 blocks
==21219== suppressed: 0 bytes in 0 blocks
==21219== Reachable blocks (those to which a pointer was found) are not shown.
==21219== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==21219==
==21219== For counts of detected and suppressed errors, rerun with: -v
==21219== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)
Re-running the same command with the patch applied will not report any
losses any more:
$ valgrind --leak-check=full ./haproxy -c -f ./example.cfg
==22124== Memcheck, a memory error detector
==22124== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==22124== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==22124== Command: ./haproxy -c -f ./example.cfg
==22124==
[WARNING] 165/001503 (22124) : config : missing timeouts for frontend 'foo'.
| While not properly invalid, you will certainly encounter various problems
| with such a configuration. To fix this, please ensure that all following
| timeouts are set to a non-zero value: 'client', 'connect', 'server'.
Warnings were found.
Configuration file is valid
==22124==
==22124== HEAP SUMMARY:
==22124== in use at exit: 313,864 bytes in 82 blocks
==22124== total heap usage: 153 allocs, 71 frees, 1,447,758 bytes allocated
==22124==
==22124== LEAK SUMMARY:
==22124== definitely lost: 0 bytes in 0 blocks
==22124== indirectly lost: 0 bytes in 0 blocks
==22124== possibly lost: 0 bytes in 0 blocks
==22124== still reachable: 313,864 bytes in 82 blocks
==22124== suppressed: 0 bytes in 0 blocks
==22124== Reachable blocks (those to which a pointer was found) are not shown.
==22124== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==22124==
==22124== For counts of detected and suppressed errors, rerun with: -v
==22124== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
It might be worth investigating what exactly HAProxy does to lose pointers
to the start of those 4 memory areas and then to be able to still free them
during deinit(). If HAProxy is able to free them, they ideally should be
"still reachable" and not "possibly lost".
The allocation did not account for either the trailing null byte or the
space, leading to a buffer overwrite.
This bug was detected by an assertion failure in the allocator. But can
also be easily detected using valgrind:
==25827== Invalid write of size 1
==25827== at 0x6529759: __vsprintf_chk (vsprintf_chk.c:84)
==25827== by 0x65296AC: __sprintf_chk (sprintf_chk.c:31)
==25827== by 0x4D6AB7: sprintf (stdio2.h:33)
==25827== by 0x4D6AB7: proxy_parse_smtpchk_opt (check.c:1799)
==25827== by 0x4A7DDD: cfg_parse_listen (cfgparse-listen.c:2269)
==25827== by 0x494AD3: readcfgfile (cfgparse.c:2167)
==25827== by 0x542995: init (haproxy.c:2021)
==25827== by 0x421DD2: main (haproxy.c:3121)
==25827== Address 0x78712a8 is 0 bytes after a block of size 24 alloc'd
==25827== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==25827== by 0x4D6A8C: proxy_parse_smtpchk_opt (check.c:1797)
==25827== by 0x4A7DDD: cfg_parse_listen (cfgparse-listen.c:2269)
==25827== by 0x494AD3: readcfgfile (cfgparse.c:2167)
==25827== by 0x542995: init (haproxy.c:2021)
==25827== by 0x421DD2: main (haproxy.c:3121)
This patch fixes issue #681.
This bug was introduced in commit fbcc77c6ba,
which first appeared in 2.2-dev7. No backport needed.
When building with gcc-8 -fsanitize=address, we get this warning once on
an strncpy() call in proto_uxst.c:
src/proto_uxst.c:262:3: warning: 'strncpy' output may be truncated copying 107 bytes from a string of length 4095 [-Wstringop-truncation]
strncpy(addr.sun_path, tempname, sizeof(addr.sun_path) - 1);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It happens despite the test on snprintf() at the top (since gcc's string
handling is totally empiric), and requires the strlen() test to be placed
"very close" to the strncpy() call (with "very close" yet to be determined).
There's no other way to shut this one except disabling it. Given there's
only one instance of this warning and the cost of dealing with it in the
code is not huge, let's decorate the code to make gcc happily believe it
is smart since it seems to have a mind of itself.
clang just failed on fd.c with this error:
src/fd.c:491:9: error: logical not is only applied to the left hand side of this comparison [-Werror,-Wlogical-not-parentheses]
while (HA_SPIN_TRYLOCK(OTHER_LOCK, &log_lock) != 0) {
^ ~~
That's because this expands to this:
while (!pl_try_s(&log_lock) != 0) {
Let's just add parenthesis in the TRYLOCK macros to avoid this.
This may need to be backported if commit df187875d ("BUG/MEDIUM: log:
don't hold the log lock during writev() on a file descriptor") is
backported as well as it seems to be the first one to trigger it.
In bug #676, it was reported that ssl-min-ver SSLv3 does not work in
Amazon environments with OpenSSL 1.0.2.
The reason for this is a patch of Amazon OpenSSL which sets
SSL_OP_NO_SSLv3 in SSL_CTX_new(). Which is kind of a problem with our
implementation of ssl-{min,max}-ver in old openSSL versions, because it
does not try to clear existing version flags.
This patch fixes the bug by cleaning versions flags known by HAProxy in
the SSL_CTX before applying the right ones.
Should be backported as far as 1.8.
Getting rid of this warning is cleaner solved using a 'fall through' comment,
because it clarifies intent to a human reader.
This patch adjust a few places that cause -Wimplicit-fallthrough to trigger:
- Fix typos in the comment.
- Remove redundant 'no break' that trips up gcc from comment.
- Move the comment out of the block when the 'case' is completely surrounded
by braces.
- Add comments where I could determine that the fall through was intentional.
Changes tested on
gcc (Debian 9.3.0-13) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
using
make -j4 all TARGET=linux-glibc USE_OPENSSL=1 USE_LUA=1 USE_ZLIB=1 USE_PCRE2=1 USE_PCRE2_JIT=1 USE_GETADDRINFO=1
Commit b5997f740 ("MAJOR: threads/map: Make acls/maps thread safe")
introduced a subtle bug in the pattern matching code. In order to cope
with the possibility that another thread might be modifying the pattern's
sample_data while it's being used, we return a thread-local static
sample_data which is a copy of the one found in the matched pattern. The
copy is performed depending on the sample_data's type. But the switch
statement misses some breaks and doesn't set the new sample_data pointer
at the right place, resulting in the original sample_data being restored
at the end before returning.
The net effect overall is that the correct sample_data is returned (hence
functionally speaking the matching works fine) but it's not thread-safe
so any del_map() or set_map() action could modify the pattern on one
thread while it's being used on another one. It doesn't seem likely to
cause a crash but could result in corrupted data appearing where the
value is consumed (e.g. when appended in a header or when logged) or an
ACL occasionally not matching after a map lookup.
This fix should be backported as far as 1.8.
Thanks to Tim for reporting it and to Emeric for the analysis.
Tested with
make -j4 all TARGET=linux-glibc USE_OPENSSL=1 USE_LUA=1 USE_ZLIB=1 USE_PCRE2=1 USE_PCRE2_JIT=1 USE_GETADDRINFO=1
against
gcc (Debian 9.3.0-13) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
In issue #648 a second problem was reported, indicating that some users
mistakenly send the log to an FD mapped on a file. This situation doesn't
even enable O_NONBLOCK and results in huge access times in the order of
milliseconds with the lock held and other threads waiting till the
watchdog fires to unblock the situation.
The problem with files is that O_NONBLOCK is ignored, and we still need
to lock otherwise we can end up with interleaved log messages.
What this patch does is different. Instead of locking all writers, it
uses a trylock so that there's always at most one logger and that other
candidates can simply give up and report a failure, just as would happen
if writev() returned -1 due to a pipe full condition. This solution is
elegant because it gives back the control to haproxy to decide to give
up when it takes too much time, while previously it was the kernel that
used to block the syscall.
However at high log rates (500000 req/s) there was up to 50% dropped logs
due to the contention on the lock. In order to address this, we try to
grab the lock up to 200 times and call ha_thread_relax() on failure. This
results in almost no failure (no more than previously with O_NONBLOCK). A
typical test with 6 competing threads writing to stdout chained to a pipe
to a single process shows around 1000 drops for 10 million logs at 480000
lines per second.
Please note that this doesn't mean that writing to a blocking FD is a good
idea, and it might only be temporarily done on testing environments for
debugging. A file or a terminal will continue to block the writing thread
while others spin a little bit and lose their logs, but the writing thread
will still experience performance-killing latencies.
This patch should be backported to 2.1 and 2.0. The code is in log.c in
2.0, but the principle is the same.
Apparently Cygwin requires sys/types.h before netinet/tcp.h but doesn't
include it by itself, as shown here:
https://github.com/haproxy/haproxy/actions/runs/131943890
This patch makes sure it's always present, which is in server.c and
the SPOA example.
Released version 2.2-dev9 with the following main changes :
- BUG/MINOR: http-htx: Don't forget to release the http reply in release function
- BUG/MINOR: http-htx: Fix a leak on error path during http reply parsing
- MINOR: checks: Remove dead code from process_chk_conn()
- REGTESTS: checks: Fix tls_health_checks when IPv6 addresses are used
- REGTESTS: Add missing OPENSSL to REQUIRE_OPTIONS for lua/txn_get_priv
- MINOR: lua: Use vars_unset_by_name_ifexist()
- CLEANUP: vars: Remove void vars_unset_by_name(const char*, size_t, struct sample*)
- MINOR: vars: Make vars_(un|)set_by_name(_ifexist|) return a success value
- MINOR: lua: Make `set_var()` and `unset_var()` return success
- MEDIUM: lua: Add `ifexist` parameter to `set_var`
- MEDIUM: ring: new section ring to declare custom ring buffers.
- REGTESTS: Add missing OPENSSL to REQUIRE_OPTIONS for compression/lua_validation
- REGTESTS: Require the version 2.2 to execute lua/set_var
- BUG/MEDIUM: checks: Refresh the conn-stream and the connection after a connect
- MINOR: checks: Remove useless tests on the connection and conn-stream
- BUG/MEDIUM: contrib/spoa: do not register python3.8 if --embed fail
- BUG/MEDIUM: connection: Ignore PP2 unique ID for stream-less connections
- BUG/MINOR: connection: Always get the stream when available to send PP2 line
- BUG/MEDIUM: backend: set the connection owner to the session when using alpn.
- MINOR: pools: compute an estimate of each pool's average needed objects
- MEDIUM: pools: directly free objects when pools are too much crowded
- REGTEST: Add connection/proxy_protocol_send_unique_id_alpn
- MINOR: http-ana: Make the function http_reply_to_htx() public
- MINOR: http-ana: Use proxy's error replies to emit 401/407 responses
- MINOR: http-rules: Use an action function to eval http-request auth rules
- CLEANUP: http: Remove unused HTTP message templates
- BUG/MEDIUM: checks: Don't blindly subscribe for receive if waiting for connect
- MINOR: checks: I/O callback function only rely on the data layer wake callback
- BUG/MINOR: lua: Add missing string length for lua sticktable lookup
- BUG/MEDIUM: logs: fix trailing zeros on log message.
- CI: cirrus-ci: skip reg-tests/connection/proxy_protocol_send_unique_id_alpn.vtc on CentOS 6
- BUG/MINOR: nameservers: fix error handling in parsing of resolv.conf
- BUG/MEDIUM: checks: Don't add a tcpcheck ruleset twice in the shared tree
- MEDIUM: ssl: use TLSv1.2 as the minimum default on bind lines
- CLEANUP: pools: use the regular lock for the flush operation on lockless pools
- SCRIPTS: publish-release: pass -n to gzip to remove timestamp
- MINOR: ring: re-work ring attach generic API.
- BUG/MINOR: error on unknown statement in ring section.
- MEDIUM: ring: add server statement to forward messages from a ring
- MEDIUM: ring: add new srv statement to support octet counting forward
- MINOR: ssl: set ssl-min-ver in ambiguous configurations
- CLEANUP: ssl: remove comment from dump_crtlist_sslconf()
- BUILD: sink: address build warning on 32-bit architectures
- BUG/MINOR: peers: fix internal/network key type mapping.
- CLEANUP: regex: remove outdated support for regex actions
- Revert "MINOR: ssl: rework add cert chain to CTX to be libssl independent"
- MINOR: mux-h1/proxy: Add a proxy option to disable clear h2 upgrade
- BUG/MEDIUM: lua: Reset analyse expiration timeout before executing a lua action
- DOC: add a line about comments in crt-list
- BUG/MEDIUM: hlua: Lock pattern references to perform set/add/del operations
- BUG/MINOR: checks: Fix test on http-check rulesets during config validity check
- BUG/MEDIUM: contrib/prometheus-exporter: Properly set flags to dump metrics
- BUG/MEDIUM: mworker: fix the copy of options in copy_argv()
- BUG/MINOR: init: -x can have a parameter starting with a dash
- BUG/MINOR: init: -S can have a parameter starting with a dash
- BUG/MEDIUM: mworker: fix the reload with an -- option
- BUG/MINOR: ssl: fix a trash buffer leak in some error cases
- BUG/MINOR: mworker: fix a memleak when execvp() failed
- MINOR: sample: Add secure_memcmp converter
- REORG: ebtree: move the C files from ebtree/ to src/
- REORG: ebtree: move the include files from ebtree to include/import/
- REORG: ebtree: clean up remains of the ebtree/ directory
- REORG: include: create new file haproxy/api-t.h
- REORG: include: create new file haproxy/api.h
- REORG: include: update all files to use haproxy/api.h or api-t.h if needed
- CLEANUP: include: remove common/config.h
- CLEANUP: include: remove unused template.h
- REORG: include: move MIN/MAX from tools.h to compat.h
- REORG: include: move SWAP/MID_RANGE/MAX_RANGE from tools.h to standard.h
- CLEANUP: include: remove unused common/tools.h
- REORG: include: move the base files from common/ to haproxy/
- REORG: include: move version.h to haproxy/
- REORG: include: move base64.h, errors.h and hash.h from common to to haproxy/
- REORG: include: move openssl-compat.h from common/ to haproxy/
- REORG: include: move ist.h from common/ to import/
- REORG: include: move the BUG_ON() code to haproxy/bug.h
- REORG: include: move debug.h from common/ to haproxy/
- CLEANUP: debug: drop unused function p_malloc()
- REORG: include: split buf.h into haproxy/buf-t.h and haproxy/buf.h
- REORG: include: move istbuf.h to haproxy/
- REORG: include: split mini-clist into haproxy/list and list-t.h
- REORG: threads: extract atomic ops from hathreads.h
- CLEANUP: threads: remove a few needless includes of hathreads.h
- REORG: include: split hathreads into haproxy/thread.h and haproxy/thread-t.h
- CLEANUP: thread: rename __decl_hathreads() to __decl_thread()
- REORG: include: move time.h from common/ to haproxy/
- REORG: include: move integer manipulation functions from standard.h to intops.h
- CLEANUP: include: remove excessive includes of common/standard.h
- REORG: include: move freq_ctr to haproxy/
- CLEANUP: pool: include freq_ctr.h and remove locally duplicated functions
- REORG: memory: move the pool type definitions to haproxy/pool-t.h
- REORG: memory: move the OS-level allocator to haproxy/pool-os.h
- MINOR: memory: don't let __pool_get_first() pick from the cache
- MEDIUM: memory: don't let pool_put_to_cache() free the objects itself
- MINOR: memory: move pool-specific path of the locked pool_free() to __pool_free()
- MEDIUM: memory: make local pools independent on lockless pools
- REORG: include: move common/memory.h to haproxy/pool.h
- REORG: include: move common/chunk.h to haproxy/chunk.h
- REORG: include: move activity to haproxy/
- REORG: include: move common/buffer.h to haproxy/dynbuf{,-t}.h
- REORG: include: move common/net_helper.h to haproxy/net_helper.h
- REORG: include: move common/namespace.h to haproxy/namespace{,-t}.h
- REORG: include: split common/regex.h into haproxy/regex{,-t}.h
- REORG: include: split common/xref.h into haproxy/xref{,-t}.h
- REORG: include: move common/ticks.h to haproxy/ticks.h
- REORG: include: split common/http.h into haproxy/http{,-t}.h
- REORG: include: split common/http-hdr.h into haproxy/http-hdr{,-t}.h
- REORG: include: move common/h1.h to haproxy/h1.h
- REORG: include: split common/htx.h into haproxy/htx{,-t}.h
- REORG: include: move hpack*.h to haproxy/ and split hpack-tbl
- REORG: include: move common/h2.h to haproxy/h2.h
- REORG: include: move common/fcgi.h to haproxy/
- REORG: include: move protocol.h to haproxy/protocol{,-t}.h
- REORG: tools: split common/standard.h into haproxy/tools{,-t}.h
- REORG: include: move dict.h to hparoxy/dict{,-t}.h
- REORG: include: move shctx to haproxy/shctx{,-t}.h
- REORG: include: move port_range.h to haproxy/port_range{,-t}.h
- REORG: include: move fd.h to haproxy/fd{,-t}.h
- REORG: include: move ring to haproxy/ring{,-t}.h
- REORG: include: move sink.h to haproxy/sink{,-t}.h
- REORG: include: move pipe.h to haproxy/pipe{,-t}.h
- CLEANUP: include: remove empty raw_sock.h
- REORG: include: move proto_udp.h to haproxy/proto_udp{,-t}.h
- REORG: include: move proto/proto_sockpair.h to haproxy/proto_sockpair.h
- REORG: include: move compression.h to haproxy/compression{,-t}.h
- REORG: include: move h1_htx.h to haproxy/h1_htx.h
- REORG: include: move http_htx.h to haproxy/http_htx{,-t}.h
- REORG: include: move hlua.h to haproxy/hlua{,-t}.h
- REORG: include: move hlua_fcn.h to haproxy/hlua_fcn.h
- REORG: include: move action.h to haproxy/action{,-t}.h
- REORG: include: move arg.h to haproxy/arg{,-t}.h
- REORG: include: move auth.h to haproxy/auth{,-t}.h
- REORG: include: move dns.h to haproxy/dns{,-t}.h
- REORG: include: move flt_http_comp.h to haproxy/
- REORG: include: move counters.h to haproxy/counters-t.h
- REORG: include: split mailers.h into haproxy/mailers{,-t}.h
- REORG: include: move capture.h to haproxy/capture{,-t}.h
- REORG: include: move frontend.h to haproxy/frontend.h
- REORG: include: move obj_type.h to haproxy/obj_type{,-t}.h
- REORG: include: move http_rules.h to haproxy/http_rules.h
- CLEANUP: include: remove unused mux_pt.h
- REORG: include: move mworker.h to haproxy/mworker{,-t}.h
- REORG: include: move ssl_utils.h to haproxy/ssl_utils.h
- REORG: include: move ssl_ckch.h to haproxy/ssl_ckch{,-t}.h
- REORG: move ssl_crtlist.h to haproxy/ssl_crtlist{,-t}.h
- REORG: include: move lb_chash.h to haproxy/lb_chash{,-t}.h
- REORG: include: move lb_fas.h to haproxy/lb_fas{,-t}.h
- REORG: include: move lb_fwlc.h to haproxy/lb_fwlc{,-t}.h
- REORG: include: move lb_fwrr.h to haproxy/lb_fwrr{,-t}.h
- REORG: include: move listener.h to haproxy/listener{,-t}.h
- REORG: include: move pattern.h to haproxy/pattern{,-t}.h
- REORG: include: move map to haproxy/map{,-t}.h
- REORG: include: move payload.h to haproxy/payload.h
- REORG: include: move sample.h to haproxy/sample{,-t}.h
- REORG: include: move protocol_buffers.h to haproxy/protobuf{,-t}.h
- REORG: include: move vars.h to haproxy/vars{,-t}.h
- REORG: include: split global.h into haproxy/global{,-t}.h
- REORG: include: move task.h to haproxy/task{,-t}.h
- REORG: include: move proto_tcp.h to haproxy/proto_tcp.h
- REORG: include: move signal.h to haproxy/signal{,-t}.h
- REORG: include: move tcp_rules.h to haproxy/tcp_rules.h
- REORG: include: move connection.h to haproxy/connection{,-t}.h
- REORG: include: move checks.h to haproxy/check{,-t}.h
- REORG: include: move http_fetch.h to haproxy/http_fetch.h
- REORG: include: move peers.h to haproxy/peers{,-t}.h
- REORG: include: move stick_table.h to haproxy/stick_table{,-t}.h
- REORG: include: move session.h to haproxy/session{,-t}.h
- REORG: include: move trace.h to haproxy/trace{,-t}.h
- REORG: include: move acl.h to haproxy/acl.h{,-t}.h
- REORG: include: split common/uri_auth.h into haproxy/uri_auth{,-t}.h
- REORG: move applet.h to haproxy/applet{,-t}.h
- REORG: include: move stats.h to haproxy/stats{,-t}.h
- REORG: include: move cli.h to haproxy/cli{,-t}.h
- REORG: include: move lb_map.h to haproxy/lb_map{,-t}.h
- REORG: include: move ssl_sock.h to haproxy/ssl_sock{,-t}.h
- REORG: include: move stream_interface.h to haproxy/stream_interface{,-t}.h
- REORG: include: move channel.h to haproxy/channel{,-t}.h
- REORG: include: move http_ana.h to haproxy/http_ana{,-t}.h
- REORG: include: move filters.h to haproxy/filters{,-t}.h
- REORG: include: move fcgi-app.h to haproxy/fcgi-app{,-t}.h
- REORG: include: move log.h to haproxy/log{,-t}.h
- REORG: include: move proxy.h to haproxy/proxy{,-t}.h
- REORG: include: move spoe.h to haproxy/spoe{,-t}.h
- REORG: include: move backend.h to haproxy/backend{,-t}.h
- REORG: include: move queue.h to haproxy/queue{,-t}.h
- REORG: include: move server.h to haproxy/server{,-t}.h
- REORG: include: move stream.h to haproxy/stream{,-t}.h
- REORG: include: move cfgparse.h to haproxy/cfgparse.h
- CLEANUP: hpack: export debug functions and move inlines to .h
- REORG: check: move the e-mail alerting code to mailers.c
- REORG: check: move tcpchecks away from check.c
- REORG: check: move email_alert* from proxy-t.h to mailers-t.h
- REORG: check: extract the external checks from check.{c,h}
- CLEANUP: include: don't include stddef.h directly
- CLEANUP: include: don't include proxy-t.h in global-t.h
- CLEANUP: include: move sample_data out of sample-t.h
- REORG: include: move the error reporting functions to from log.h to errors.h
- BUILD: reorder objects in the Makefile for faster builds
- CLEANUP: compiler: add a THREAD_ALIGNED macro and use it where appropriate
- CLEANUP: include: make atomic.h part of the base API
- REORG: include: move MAX_THREADS to defaults.h
- REORG: include: move THREAD_LOCAL and __decl_thread() to compiler.h
- CLEANUP: include: tree-wide alphabetical sort of include files
- REORG: include: make list-t.h part of the base API
- REORG: dgram: rename proto_udp to dgram
The set of files proto_udp.{c,h} were misleadingly named, as they do not
provide anything related to the UDP protocol but to datagram handling
instead, since currently all UDP processing is hard-coded where it's used
(dns, logs). They are to UDP what connection.{c,h} are to proto_tcp. This
was causing confusion about how to insert UDP socket management code,
so let's rename them right now to dgram.{c,h} which more accurately
matches what's inside since every function and type is already prefixed
with "dgram_".
There are list definitions everywhere in the code, let's drop the need
for including list-t.h to declare them. The rest of the list manipulation
is huge however and not needed everywhere so using the list walking macros
still requires to include list.h.
This patch fixes all the leftovers from the include cleanup campaign. There
were not that many (~400 entries in ~150 files) but it was definitely worth
doing it as it revealed a few duplicates.
Since these are used as type attributes or conditional clauses, they
are used about everywhere and should not require a dependency on
thread.h. Moving them to compiler.h along with other similar statements
like ALIGN() etc looks more logical; this way they become part of the
base API. This allowed to remove thread-t.h from ~12 files, one was
found to only require thread-t and not thread and dict.c was found to
require thread.h.
That's already where MAX_PROCS is set, and we already handle the case of
the default value so there is no reason for placing it in thread.h given
that most call places don't need the rest of the threads definitions. The
include was removed from global-t.h and activity.c.