Testing an undefined macro emits warnings due to -Wundef, and we have
exactly one such case in xxhash:
include/import/xxhash.h:3390:42: warning: "__cplusplus" is not defined [-Wundef]
#if ((defined(sun) || defined(__sun)) && __cplusplus) /* Solaris includes __STDC_VERSION__ with C++. Tested with GCC 5.5 */
Let's just prepend "defined(__cplusplus) &&" before __cplusplus to
resolve the problem. Upstream is still affected apparently.
Gcc before 7 does really not like direct operations on cast pointers
such as "((struct a*)b)->c += d;". It turns our that we have exactly
that construct in 3.0 since commit 5baa9ea168 ("MEDIUM: cache: Save
body size of cached objects and track it on delivery").
It's generally sufficient to use an intermediary variable such as :
"({ (struct a*) _ = b; _; })->c +=d;" but that's ugly. Fortunately
DISGUISE() implicitly does something very similar and works fine, so
let's use that.
No backport is needed.
Once data are received and placed in a channel buffer, if it is possible,
outgoing data are immediately forwarded. But we must take care to not do so
if there is also pending input data and a filter registered on the
channel. It is especially important for HTX streams because the HTX may be
altered, especially the extra field. And it is indeed an issue with the HTTP
compression filter and the H1 multiplexer. The wrong chunk size may be
announced leading to an internal error.
This patch should fix the issue #2530. It must be backported to all stable
versions.
The last fixes on the peers to improve the locking mechanism introduced new
peer flags and the value of some old flags was changed. This was done in the
commit 9b78e33837 ("MINOR: peers: Add 2 peer flags about the peer learn
status"). But, to ease the debugging of the peers team, old values are
restored.
This patch must be backported with the commit above.
Thanks to all previous changes, it is now possible to stop locking all peers
at once in the resync process function. Peer are locked one after the
other. Wen a peer is locked, another one may be locked when all peer sharing
the same shard must be updated. Otherwise, at anytime, at most one peer is
locked. This should significantly improve the situation.
This patch depends on the following patchs:
* BUG/MAJOR: peers: Update peers section state from a thread-safe manner
* BUG/MINOR: peers: Report a resync was explicitly requested from a thread-safe manner
* MINOR: peers: Add functions to commit peer changes from the resync task
* MINOR: peers: sligthly adapt part processing the stopping signal
* MINOR: peers: Add flags to report the peer state to the resync task
* MINOR: peers: Add 2 peer flags about the peer learn status
* MINOR: peers: Split resync process function to separate running/stopping states
It may be good to backport it to 2.9. All the seris should fix the issue #2470.
It is the main part of this series. In the peer applet, only the peer flags
are updated. It is now the responsibility of the resync process function to
check changes on each peer to update the peers section state accordingly.
Concretly, changes on the connection state (accepted, connected, released or
renewed) are first reported at the peer level and then handled in
__process_peer_state() function.
In the same manner, when the learn status of a peer changes, the peers
section state is no longer updated immediately. The resync task is woken up
to deal with this changes.
Thanks to these changes, the peers should be now really thread-safe.
This patch relies on the following ones:
* BUG/MINOR: peers: Report a resync was explicitly requested from a thread-safe manner
* MINOR: peers: Add functions to commit peer changes from the resync task
* MINOR: peers: sligthly adapt part processing the stopping signal
* MINOR: peers: Add flags to report the peer state to the resync task
* MINOR: peers: Add 2 peer flags about the peer learn status
* MINOR: peers: Split resync process function to separate running/stopping states
No bug was reported about the thread-safety of peers. Only a performance
issue was encountered with a huge number of peers (> 50). So there is no
reason to backport all these patches further than 2.9.
Flags on the peers section state must be updated from a thread-safe manner.
It is not true today. With this patch we take care PEERS_F_RESYNC_REQUESTED
flag is only set by the resync task. To do so, a peer flag is used. This
flag is only set once and never removed. It is juste used for debugging
purpose. So it is enough to set it on a peer and be sure to report it on the
peers section when the sync task is executed.
This patch relies on previous ones:
* MINOR: peers: Add functions to commit peer changes from the resync task
* MINOR: peers: sligthly adapt part processing the stopping signal
* MINOR: peers: Add flags to report the peer state to the resync task
* MINOR: peers: Add 2 peer flags about the peer learn status
* MINOR: peers: Split resync process function to separate running/stopping states
For now, nothing is done in these functions. It is only a patch to prepare
the huge part of the refactoring about the locking mechanism of the peers.
These functions will be responsible to check peers state and their learn
status to update the peers section flags accordingly.
The signal and the PEERS_F_DONOTSTOP flag are now handled in the loop on peers
to force sessions shutdown. We will need to loop on all peers to update their
state. It is easier this way.
As the previous patch, this patch is also part of the refactoring of peer
locking mechanisme. Here we add flags to represent a transitional state for
a peer. It will be the resync task responsibility to update the peers state
accordingly.
A peer may be in 4 transitional states:
* accepted : a connection was accepted from a peer
* connected: a connection to a peer was established
* release : a peer session was released
* renewed : a peer session was released because it was replaced by a new
one. Concretly, this will be equivalent to released+accepted
If none of these flags is set, it means the transition, if any, was
processed by the resync task, or no transition happened.
PEER_F_LEARN_PROCESS and PEER_F_LEARN_FINISHED flags are added to help to
fix locking issue about peers. Indeed, a peer is able to update the peers
"section" state under its own lock. Because the resync task locks all peers
at once, there is no conflict at this level. But there is nothing to prevent
2 peers to update the peers state in same time. So it seems there is no real
issue here, but there is a theorical thread-safety issue here. And it means
the locking mechanism of the peers must be reviewed.
In this context, the 2 flags above will help to move all update of the peers
state in the scope of resync task. Each peer will be able to update its own
state and the resync task will be responsible to update the peers state
accordingly.
The function responsible to deal with resynchro between all peers is now split
in two subfunctions. The first one is used when HAProxy is running while the
other one is used in soft-stop case.
This patch is required to be able to refactor locking mechanism of the peers.
There were several places in grpc and its dependency protobuf where unaligned
accesses were done. Read accesses to 32 (resp. 64) bits values should be performed
by read_u32() (resp. read_u64()).
Replace these unligned read accesses by correct calls to these functions.
Same fixes for doubles and floats.
Such unaligned read accesses could lead to crashes with bus errors on CPU
archictectures which do not fix them at run time.
This patch depends on this previous commit:
861199fa71 MINOR: net_helper: Add support for floats/doubles.
Must be backported as far as 2.6.
The global 'key-base' keyword allows to read the 'key' parameter of a
crt-store load line using a path prefix.
This is the equivalent of the 'crt-base' keyword but for 'key'.
It only applies on crt-store.
Add crt-base support for "crt-store". It will be used by 'crt', 'ocsp',
'issuer', 'sctl' load line parameter.
In order to keep compatibility with previous configurations and scripts
for the CLI, a crt-store load line will save its ckch_store using the
absolute crt path with the crt-base as the ckch tree key. This way, a
`show ssl cert` on the CLI will always have the completed path.
In 3.0-dev, with commit 7c9ce715c9 ("MINOR: ring: make callers use
ring_data() and ring_size(), not ring->buf"), we made startup_logs_dup()
use ring_size() to get the old ring size and pass it to ring_new() to
create a new ring. But due to the ambiguity of the allocate vs usable
size, this resulted in slightly shrinking the buffer compared to the
previous one, occasionally causing crashes if the first one was already
full of warnings, as seen in GH issue #2529. We need to use the allocated
size instead, thanks to the function brought by previous commit.
No backport is needed, this only affects 3.0-dev. Thanks to @felipewd
for the detailed report that allowed to spot the problem.
There's currently an abiguity around ring_size(), it's said to return
the allocated size but returns the usable size. We can't change it as
it's used everywhere in the code like this. Let's fix the comment and
add ring_allocated_size() instead for anything related to allocation.
In 2.6, a build issue for LRU in standalone test mode was addressed by
commit bf9c07fd9 ("BUILD/DEBUG: lru: update the standalone code to
support the revision"), but using revision 1 while looking up rev 0
results in 100% misses. Let's fix this and commit with revision 0 as
well.
No backport is needed, this only happens when hacking on the code.
Frontend and listen sections allow unlimited number of bind statements, it is
often, when there is a bind statement per supported protocol, like below:
listen test
mode http
bind quic4@0.0.0.0:443 name quic ssl crt ...
bind 0.0.0.0:443 name https ssl alpn http/1.1,h2 crt ...
bind 0.0.0.0:8080 ...
...
It seems useful to show corresponded protocol name in alerts and warnings,
when problem occures with port binding, connection resuming or sharding. This
helps to figure out immediately, which bind statement has a wrong setting or
which protocol module is the root cause of the issue.
When the integrity check fails, it's useful to get a dump of the area
around the first faulty byte. That's what this patch does. For example
it now shows this before reporting info about the tag itself:
Contents around first corrupted address relative to pool item:.
Contents around address 0xe4febc0792c0+40=0xe4febc0792e8:
0xe4febc0792c8 [80 75 56 d8 fe e4 00 00] [.uV.....]
0xe4febc0792d0 [a0 f7 23 a4 fe e4 00 00] [..#.....]
0xe4febc0792d8 [90 75 56 d8 fe e4 00 00] [.uV.....]
0xe4febc0792e0 [d9 93 fb ff fd ff ff ff] [........]
0xe4febc0792e8 [d9 93 fb ff ff ff ff ff] [........]
0xe4febc0792f0 [d9 93 fb ff ff ff ff ff] [........]
0xe4febc0792f8 [d9 93 fb ff ff ff ff ff] [........]
0xe4febc079300 [d9 93 fb ff ff ff ff ff] [........]
This may be backported to 2.9 and maybe even 2.8 as it does help spot
the cause of the memory corruption.
This function is particularly useful to dump unknown areas watching
for opportunistic symbols, so let's move it to tools.c so that we can
reuse it a little bit more.
When a corruption was detected in an object, it's often said that the
tag doesn't match the pool, but it should also check if it matches the
location of an earlier pool_free() call, which happens when -dMcaller
is used. That's what we're doing now.
In 2.9 with commit 7968fe3889 ("MEDIUM: stick-table: change the ref_cnt
atomically") we significantly relaxed the stick-tables locking when
dealing with peers by adjusting the ref_cnt atomically and moving it
out of the lock.
However it opened a tiny window that became problematic in 3.0-dev7
when the table's contention was lowered by commit 1a088da7c2 ("MAJOR:
stktable: split the keys across multiple shards to reduce contention").
What happens is that some peers may access the entry for reading at
the moment it's about to expire, and while the read accesses to push
the data remain unnoticed (possibly that from time to time we push
crap), but the releasing of the refcount causes a new write that may
damage anything else. The scenario is the following:
process_table_expire() peer_send_teachmsgs()
RDLOCK(&updt_lock);
tick_is_expired() != 0
ebmb_delete(ts->key);
if (ts->upd.node.leaf_p) {
HA_ATOMIC_INC(&ts->ref_cnt);
RDUNLOCK(&updt_lock);
WRLOCK(&updt_lock);
eb32_delete(&ts->upd);
}
__stksess_free(t, ts);
peer_send_updatemsg(ts);
RDLOCK(&updt_lock);
HA_ATOMIC_DEC(&ts->ref_cnt);
Here it's clear that the bottom part of peer_send_teachmsgs() believes
to be protected but may act on freed data.
This is more visible when enabling -dMtag,no-merge,integrity because
the ATOMIC_DEC(&ref_cnt) decrements one byte in the area, that makes
the eviction check fail while the tag has the address of the left
__stksess_free(), proving a completed pool_free() before the decrement,
and the anomaly there is pretty visible in the crash dump. Changing
INC()/DEC() with ADD(2)/DEC(2) shows that the byte is now off by two,
confirming that the operation happened there.
The solution is not very hard, it consists in checking for the ref_cnt
on the left after grabbing the lock, and doing both before deleting the
element, so that we have the guarantee that either the peer will not
take it or that it has already started taking it.
This was proven to be sufficient, as instead of crashing after 3s of
injection with 4 peers, 16 threads and 130k RPS, it survived for 15mn.
In order to stress the setup, a config involving 4+ peers, tracking
HTTP request with randoms and applying a bwlim-out filter with a
random key, with a client made of 160 h2 conns downloading 10 streams
of 4MB objects in parallel managed to trigger it within a few seconds:
frontend ft
http-request track-sc0 rand(100000) table tbl
filter bwlim-out lim-out limit 2047m key rand(100000000),ipmask(32) min-size 1 table tbl
http-request set-bandwidth-limit lim-out
use_backend bk
backend bk
server s1 198.18.0.30:8000
server s2 198.18.0.34:8000
backend tbl
stick-table type ip size 1000k expire 1s store http_req_cnt,bytes_in_rate(1s),bytes_out_rate(1s) peers peers
This seems to be very dependent on the timing and setup though.
This will need to be backported to 2.9. This part of the code was
reindented with shards but the block should remain mostly unchanged.
The logic to apply is the same.
Sending "trace peers event" on the CLI crashes because the event list
in the peers is not finished. This was introduced in 2.4 by commit
d865935f32 ("MINOR: peers: Add traces to peer_treat_updatemsg().")
so this must be backported to 2.4.
When adding the shards support to tables with commit 1a088da7c ("MAJOR:
stktable: split the keys across multiple shards to reduce contention"),
the condition to stop eliminating entries based on the batch size being
reached is based on a pre-decrement of the max_search counter, but now
it goes back into the outer loop which doesn't check it, so next time
it does it when entering the next shard, it will become even more
negative and will properly stop, but at first glance it looks like an
int overflow (which it is not). Let's make sure the outer loop stops
on this condition so that we don't continue searching when the limit
is reached.
While changing the stick-table indexing that led to commit 1a088da7c
("MAJOR: stktable: split the keys across multiple shards to reduce
contention"), I met a problem with the task's expiration date being
incorrectly updated, I fixed it and apparently I committed the wrong
version :-/
The effect is that the task's date is only correctly reset if the
table is empty, otherwise the task wakes up again and is queued at
the previous date, eating 100% CPU. The tick_isfirst() must not be
used when storing the last result.
No backport is needed as this was only merged in 3.0-dev7.
crt-list will be enhanced with ckch_conf keywords, however these keywords
does not fill the 'ssl_conf' structure. So we don't need to allocate the
ssl_conf for every options between [ ] but only when we found a relevant
one.
'crt-store' is a new section useful to define the struct ckch_store.
The "load" keyword in the "crt-store" section allows to define which
files you want to load for a specific certificate definition.
Ex:
crt-store
load crt "site1.crt" key "site1.key"
load crt "site2.crt" key "site2.key"
frontend in
bind *:443 ssl crt "site1.crt" crt "site2.crt"
This is part of the certificate loading which was discussed in #785.
When cache and stats applets were changed to use their own buffers, a change
was also performed to no longer access the stream from the I/O
handller. Among other things, the HTTP start-line of the request is now
retrieved to get the method. But, when these changes were brought, the inbuf
buffer allocation failures were not handled.
It is of course not so common. But if this happens, a crash may be
experienced. To fix the issue, we now check for inbuf allocation failures
before accessing it.
No backported needed.
We observed that a dynamic server which health check is down for longer
than slowstart delay at startup doesn't trigger the warmup phase, it
receives full traffic immediately. This has been confirmed by checking
haproxy UI, weight is immediately the full one (e.g. 75/75), without any
throttle applied. Further tests showed that it was similar if it was in
maintenance, and even when entering a down or maintenance state after
being up.
Another issue is that if the server is down for less time than
slowstart, when it comes back up, it briefly has a much higher weight
than expected for a slowstart.
An easy way to reproduce is to do the following:
- Add a server with e.g. a 20s slowstart and a weight of 10 in config
file
- Put it in maintenance using CLI (set server be1/srv1 state maint)
- Wait more than 20s, enable it again (set server be1/srv1 state ready)
- Observe UI, weight will show 10/10 immediately.
If server was down for less than 20s, you'd briefly see a weight and
throttle value that is inconsistent, e.g. 50% throttle value and a
weight of 5 if server comes back up after 10s before going back to
6% after a second or two.
Code analysis shows that the logic in server_recalc_eweight stops the
warmup task by setting server's next state to SRV_ST_RUNNING if it
didn't change state for longer than the slowstart duration, regardless
of its current state. As a consequence, a server being down or disabled
for longer than the slowstart duration will never enter the warmup phase
when it will be up again.
Regarding the weight when server comes back up, issue is that even if
the server is down, we still compute its next weight as if it was up,
hence when it comes back up, it can briefly have a much higher weight
than expected during slowstart, until the warmup task is called again
after last_change is updated.
This patch aims to fix both issues.
The doc about the build process has grown to a point where it was painful
to read when searching a specific element. This commit cuts it into a few
sub-categories for ease of searching, and it also adds a summary of the
most commonly used makefile variables, their usage and default settings.
"make opts" is nice because it shows all options being used, but it
does so in a copy-pastable way that aggregates everything on a single
line, rendering very poorly and making it hard to spot the relevant
variables.
Let's break lines and append a trailing backslash to them instead. This
gives something much more readable which remains copy-pastable. Options
can be inspected and more easily replicated. It was also verified that
copy-pasting the whole block after "make" does continue to work like
before and produces the same output.
This one is often used as a gateway to inject regular CFLAGS, even though
not designed for this. It's now ignored, but any attempt at setting it
reports a warning suggesting to use CFLAGS or ARCH_FLAGS instead.
The VERBOSE_CFLAGS variable is used to report the CFLAGS used in the
output of "haproxy -vv". It currently contains all -Wxxx and -Wno-xxx
because there was no other possibility till now, but these warnings
are highly specific to the compiler version in use, are automatically
detected and do not bring value being presented here. Worse, by
encouraging users to copy-paste them, they can end up with warnings
that are not relevant to their build environment.
In addition, their presence there doesn't give indication of whether
or not they triggered, so they cannot vouch for the output code quality.
Better just not report them there. However, as part of VERBOSE_CFLAGS
they were also used to detect whether or not to rebuild, via build_opts,
so they still have to be explicitly mentioned there if we want to make
sure that changing warning options triggers a rebuild to see their
effect.
Now we can have more useful outputs like this one which indicate precisely
what to use to safely rebuild the executable:
Build options :
TARGET = linux-glibc
CC = gcc
CFLAGS = -O2 -g -fwrapv
OPTIONS = USE_OPENSSL=1 USE_LUA=1 USE_ZLIB= USE_SLZ=1 USE_DEVICEATLAS=1 USE_51DEGREES=1 USE_WURFL=1 USE_QUIC=1 USE_PROMEX=1 USE_PCRE=1
DEBUG = -DDEBUG_DONT_SHARE_POOLS -DDEBUG_MEMORY_POOLS -DDEBUG_EXPR -DDEBUG_STRICT=2 -DDEBUG_DEV -DDEBUG_MEM_STATS -DDEBUG_POOL_TRACING
It's currently not possible to only set some -Wno... without breaking
the -W... and conversely. Let's split both sets apart so that it's now
possible to set -W... alone in WARN_CFLAGS to enable only some warnings,
and pass the -Wno... in SPEC_CFLAGS without losing the enabled ones.
The compiler-specific CFLAGS that are automatically detected (SPEC_CFLAGS)
are currently the ones affected by ERR and FAILFAST. Not only this makes
these options useless as soon as SPEC_CFLAGS is forced, but it also means
that any change to them causes a rebuild, so disabling FAILFAST or
enabling ERR to get more info on a faulty object causes a full rebuild
of all others. Let's just move them to ERROR_CFLAGS that only contains
these two. It's not documented outside of the makefile because it's not
supposed to be touched.
-Wfatal-errors is set by default and is not supported on older compilers.
Since it's part of all the automatically detected flags, it's painful to
remove when needed. Also it's a matter of taste, some developers might
prefer to get a long list of all errors at once, others prefer that the
build stops immediately after the root cause.
The default is now back to no -Wfatal-errors, and when FAILFAST is set to
any non-empty non-zero value, -Wfatal-errors is added:
$ make TARGET=linux-glibc USE_OPENSSL=0 USE_QUIC=1 FAILFAST=0 2>&1 | wc
132 536 6111
$ make TARGET=linux-glibc USE_OPENSSL=0 USE_QUIC=1 FAILFAST=1 2>&1 | wc
8 39 362
It's among the options that change a lot on the developer's side and it's
tempting to change from ERR=1 to ERR=0 on the make command line by reusing
the history, except it doesn't work. Let's explictily permit ERR=0 to
disable -Werror like ERR= does.