Following the patch b4daee ("MINOR: sock: add a check against cross
worker<->master socket activities"), this patch adds a dedicated applet
for the master CLI. It ensures that the CLI connection can't be
used with the master rights in the case of bugs.
In issue #933, @jaroslawr provided a report indicating that when using
many threads and many servers, it's very difficult to terminate the last
idle connections on each server. The issue has two causes in fact. The
first one is that during the calculation of the estimate of needed
connections, we round the computation up while in previous round it was
already rounded up, so we end up adding 1 to 1 which once divided by 2
remains 1. The second issue is that servers are not woken up anymore for
purging their connections if they don't have activity. The only reason
that was there to wake them up again was in case insufficient connections
were purged. And even then the purge task itself was not woken up. But
that is not enough for getting rid of the long tail of old connections
nor updating est_need_conns.
This patch makes sure to properly wake up as long as at least one idle
connection remains, and not to round up the needed connections anymore.
Prior to this patch, a test involving many connections which suddenly
stopped would keep many idle connections, now they're effectively halved
every pool-purge-delay.
This needs to be backported to 2.2.
Given that the previous issues caused spurious worker socket wakeups in
the master for inherited FDs that couldn't be closed, let's add a strict
test in the I/O callback to make sure that an accept() event is always
caught by the appropriate type of process (master for master listeners,
worker for worker listeners).
Since the h2 multiplexer no longer relies on the legacy HTTP representation, and
uses exclusively the HTX, the H1 parser state (h1m) is no longer used by the h2
streams. Thus it can be removed.
This patch may be backported as far as 2.1.
William noticed that the real issue in the abns test was that it was
failing to rebind some listeners, issues which were addressed in the
previous commits (in some cases, the master would even accept traffic
on the worker's socket which was not properly disabled, causing some
of the strange error messages).
The other issue documented in commit 2ea15a080 was the lack of
reliability when the test was run in parallel. This is caused by the
abns socket which uses a hard-coded address, making all tests in
parallel to step onto each others' toes. Since vtest cannot provide
abns sockets, we're instead concatenating the number of the listening
port that vtest allocated for another frontend to the abns path, which
guarantees to make them unique in the system.
The test works fine in all cases now, even with 100 in parallel.
We used to refrain from calling fd_want_recv() if fd_updt was not allocated
but it's not the right solution as this does not allow the FD to be set.
Instead, let's use the new fd_want_recv_safe() which will update the FD and
create an update entry only if possible. In addition, the equivalent test
before calling fd_stop_recv() was removed as totally useless since there's
not fd_updt creation in this case.
This does the same as fd_want_recv() except that it does check for
fd_updt[] to be allocated, as this may be called during early listener
initialization. Previously we used to check fd_updt[] before calling
fd_want_recv() but this is not correct since it does not update the
FD flags. This method will be safer.
In commit 374e9af35 ("MEDIUM: listener: let do_unbind_listener() decide
whether to close or not") it didn't appear necessary to have the master
process keep open the workers' inherited FDs. But this is actually
necessary to handle the reload on "bind fd@foo" situations, otherwise
the FD may be reassigned and the new socket cannot be set up, sometimes
causing "socket operation on non-socket" or other types of errors.
William found that this was the cause for the consistent failures of the
abns regtest, which already used to fail very often before this and was
as such marked as broken.
Interestingly I didn't have this issue with my test configs because
the FD number I used was higher and within the range of other listening
sockets. But this means that one of these wouldn't work as expected.
No backport is needed, this was introduced as part of the listeners
rework in 2.3.
It is not acceptable to suspend an inherited socket because we'd kill
its listening state, making it possibly unrecoverable for future
processes. The situation which can trigger this is when there is an
abns socket in a config and an inherited FD on another listener. Upon
soft reload, the abns fails to bind, a SIGTTOU is sent to the old
process which suspends everything, including the inherited FD, then
the new process can bind and tell the old one to quit. Except that the
new FD was not set back to the listen state, which is detected by
listener_accept() which can pause it. It's only upon second reload
that the FD works again.
The solution is to refrain from suspending such FDs since we don't own
them. And the next process will get them right anyway from its config.
For now only TCP and UDP face this issue so it's better to address this
on a protocol basis
No backport is needed, this is related to the new listeners in 2.3.
The test on listener->state == LI_LISTEN is not sufficient to decide
if we need to enable a listener. Indeed, there is a very special case
which is the inherited FD shared, which has to reflect the real socket
state even after the previous test, and as such needs to remain in
LI_LISTEN state. In this case we don't want a worker to start the
master's listener nor conversely. Let's add a specific test for this.
An interesting case was reported with threads and moderately sized
stick-tables. Sometimes the watchdog would trigger during the purge.
It turns out that the stick tables were sized in the 10s of K entries
which is the order of magnitude of the possible number of connections,
and that threads were used over distinct NUMA nodes. While at first
glance nothing looks problematic there, actually there is a risk that
a thread trying to purge the table faces 100% of entries still in use
by a connection with (ts->ref_cnt > 0), and ends up scanning the whole
table, while other threads on the other NUMA node are causing the
cache lines to bounce back and forth and considerably slow down its
progress to the point of possibly spending hundreds of milliseconds
there, multiplied by the number of queued threads all failing on the
same point.
Interestingly, smaller tables would not trigger it because the scan
would be faster, and larger ones would not trigger it because plenty
of entries would be idle!
The most efficient solution is to increase the table size to be large
enough for this never to happen, but this is not reliable. We could
have a parallel list of idle entries but that would significantly
increase the storage and processing cost only to improve a few rare
corner cases.
This patch takes a more pragmatic approach, it considers that it will
not visit more than twice the number of nodes to be deleted, which
means that it accepts to fail up to 50% of the time. Given that very
small batches are programmed each time (1/256 of the table size), this
means the operation will finish quickly (128 times faster than now),
and will reduce the inter-thread contention. If this needs to be
reconsidered, it will probably mean that the batch size needs to be
fixed differently.
This needs to be backported to stable releases which extensively use
threads, typically 2.0.
Kudos to Nenad Merdanovic for figuring the root cause triggering this!
This partially reverts the patch 400829cd2 ("BUG/MEDIUM: filters: Don't try to
init filters for disabled proxies"). Disabled proxies must not be skipped in
flt_deinit() and flt_deinit_all_per_thread() when HAProxy is stopped because,
obvioulsy, at this step, all proxies appear as disabled (or stopped, it is the
same state). It is safe to do so because, during startup, filters declared on
disabled proxies are removed. Thus they don't exist anymore during shutdown.
This patch must be backported in all versions where the patch above is.
The mem stats are pretty convenient to spot leaks, except that they count
free(NULL) as 1, and the code does actually have quite a number of free(foo)
guards where foo is NULL if the object was already freed. Let's just not
count these ones so that the stats remain consistent. Now it's possible
to compare the strdup()/malloc() and free() and verify they are consistent.
When a TCP connection is upgraded to HTTP, the passthrough multiplexer owning
the client connection is detroyed and replaced by an HTTP multiplexer. When it
happens, the connection context is changed (it is in fact the mux itself). Thus,
when the mux-pt is destroyed, the connection is not released. But, only the
connection must be kept. Everything else concerning the mux must be
released. Especially, the tasklet used for I/O subscriptions. In this part,
there was a bug and the tasklet was never released.
This patch should fix the issue #935. It must be backported as far as 2.0.
When servers based on server templates are initialized, the configuration file
and line are now copied. This helps to emit understandable warning and alert
messages.
This patch may be backported if needed, as far as 1.8.
On startup, if a server has no address but the dns resolutions are configured,
"none" method is added to the default init-addr methods, in addition to "last"
and "libc". Thus on startup, this server is set to RMAINT mode if no address is
found. It is only performed if no other init-addr method is configured.
Setting the RMAINT mode on startup is important to inhibit the health checks.
For instance, following servers will now be set to RMAINT mode on startup :
server srv nofound.tld:80 check resolvers mydns
server srv _http._tcp.service.local check resolvers mydns
server-template srv 1-3 _http._tcp.service.local check resolvers mydns
while followings ones will trigger an error :
server srv nofound.tld:80 check
server srv nofound.tld:80 check resolvers mydns init-addr libc
server srv _http._tcp.service.local check
server srv _http._tcp.service.local check resolvers mydns init-addr libc
server-template srv 1-3 _http._tcp.service.local check resolvers mydns init-addr libc
This patch must be backported as far as 1.8.
When a health-check fails, if no connection attempt was performed, a socket
error must be reported. But this was only done if the connection was not
allocated. It must also be done if there is no control layer. Otherwise, a
L7TOUT will be reported instead.
It is possible to not having a control layer for a connection if the connection
address family is invalid or not defined.
This patch must be backported to 2.2.
per-proxy and per-server post-check callback functions must be skipped for
disabled proxies because most of the configuration validity check is skipped for
these proxies.
This patch must be backported as far as 2.1.
Configuration is parsed for such proxies but not validated. Concretely, it means
check_config_validity() function does almost nothing for such proxies. Thus, we
must be careful to not initialize filters for disabled proxies because the check
callback function is not called. In fact, to be sure to avoid any trouble,
filters for disabled proxies are released.
This patch fixes a segfault at startup if the SPOE is configured for a disabled
proxy. It must be backported as far as 1.7 (maybe with some adaptations).
let us use SSL_CTRL_GET_RAW_CIPHERLIST for feature detection instead
of versions
[wla: SSL_CTRL_GET_RAW_CIPHERLIST was introduced by OpenSSL commit
94a209 along with SSL_CIPHER_find. It was removed in boringSSL.]
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
Released version 2.3-dev9 with the following main changes :
- CLEANUP: http_ana: remove unused assignation of `att_beg`
- BUG/MEDIUM: ssl: OCSP must work with BoringSSL
- BUG/MINOR: log: fix memory leak on logsrv parse error
- BUG/MINOR: log: fix risk of null deref on error path
- BUILD: ssl: more elegant OpenSSL early data support check
- CI: github actions: update h2spec to 2.6.0
- BUG/MINOR: cache: Check the return value of http_replace_res_status
- MINOR: cache: Store the "Last-Modified" date in the cache_entry
- MINOR: cache: Process the If-Modified-Since header in conditional requests
- MINOR: cache: Create res.cache_hit and res.cache_name sample fetches
- MINOR: mux-h2: register a stats module
- MINOR: mux-h2: add counters instance to h2c
- MINOR: mux-h2: add stats for received frame types
- MINOR: mux-h2: report detected error on stats
- MINOR: mux-h2: count open connections/streams on stats
- BUG/MINOR: server: fix srv downtime calcul on starting
- BUG/MINOR: server: fix down_time report for stats
- BUG/MINOR: lua: initialize sample before using it
- MINOR: cache: Add Expires header value parsing
- MINOR: ist: Add a case insensitive istmatch function
- BUG/MINOR: cache: Manage multiple values in cache-control header value
- BUG/MINOR: cache: Inverted variables in http_calc_maxage function
- MINOR: pattern: make pat_ref_append() return the newly added element
- MINOR: pattern: make pat_ref_add() rely on pat_ref_append()
- MINOR: pattern: export pat_ref_push()
- CLEANUP: pattern: use calloc() rather than malloc for structures
- CLEANUP: pattern: fix spelling/grammatical/copy-paste in comments
The code is horrible to work with because most functions are documented
with misleading comments resulting from many spelling and grammatical
mistakes, and plenty of remains of copy-paste mentioning arguments that
do not exist and return values that are never set. Too many hours wasted
writing non-working code because of assumptions resulting from this,
let's fix this once for all now!
It's particularly difficult to make sure that the various pattern
structures are properly initialized given that they can be allocated
at multiple places and systematically via malloc() instead of calloc(),
thus not even leaving the possibility of default values. Let's adjust
a few of them.
It's more convenient to return the element than to return just 0 or 1,
as the next thing we'll want to do is to act on this element! In addition
it was using variable arguments instead of consts, causing some reuse
constraints which were also addressed. This doesn't change its use as
a boolean, hence why call places were not modified.
The maxage and smaxage variables were inadvertently assigned the
Cache-Control s-maxage and max-age values respectively when it should
have been the other way around.
This can be backported on all branches after 1.8 (included).
If an HTTP request or response had a "Cache-Control" header that had
multiple comma-separated subparts in its value (like "max-age=1,
no-store" for instance), we did not process the values correctly and
only parsed the first one. That made us store some HTTP responses in the
cache when they were explicitely uncacheable.
This patch replaces the way the values are parsed by an http_find_header
loop that manages every sub part of the value independently.
This patch should be backported to 2.2 and 2.1. The bug also exists on
previous versions but since the sources changed, a new commit will have
to be created.
[wla: This patch requires bb4582c ("MINOR: ist: Add a case insensitive
istmatch function"). Backporting for < 2.1 is not a requirement since it
works well enough for most cases, it was a known limitation of the
implementation of non-htx version too]
When no Cache-Control max-age or s-maxage information is present in a
cached response, we need to parse the Expires header value (RFC 7234#5.3).
An invalid Expires date value or a date earlier than the reception date
will make the cache_entry stale upon creation.
For now, the Cache-Control and Expires headers are parsed after the
insertion of the response in the cache so even if the parsing of the
Expires results in an already stale entry, the entry will exist in the
cache.
Memset the sample before using it through hlua_lua2smp. This function is
ORing the smp.flags, so this field need to be cleared before its use.
This was reported by a coverity warning.
Fixes the github issue #929.
This bug can be backported up to 1.8.
Adjust condition used to report down_time for statistics. There was a
tiny probabilty to have a negative downtime if last_change was superior
to now. If this is the case, return only down_time.
This bug can backported up to 1.8.
When a server is up after a failure, its downtime was reset to 0 on the
statistics. This is due to a wrong condition that causes srv.down_time
to never be set. Fix this by updating down_time each time the server is in
STARTING state.
Fixes the github issue #920.
This bug can be backported up to 1.8.
Implement counters for h2 protocol error on connection or stream level.
Also count the total number of rst_stream and goaway frames sent by the
mux in response to a detected error.
Add pointer to counters as a member for h2c structure. This pointer is
initialized on h2_init function. This is useful to quickly access and
manipulate the counters inside every h2 functions.
Res.cache_hit sample fetch returns a boolean which is true when the HTTP
response was built out of a cache. The cache's name is returned by the
res.cache_name sample_fetch.
This resolves GitHub issue #900.
If a client sends a conditional request containing an If-Modified-Since
header (and no If-None-Match header), we try to compare the date with
the one stored in the cache entry (coming either from a Last-Modified
head, or a Date header, or corresponding to the first response's
reception time). If the request's date is earlier than the stored one,
we send a "304 Not Modified" response back. Otherwise, the stored is sent
(through a 200 OK response).
This resolves GitHub issue #821.
In order to manage "If-Modified-Since" requests, we need to keep a
reference time for our cache entries (to which the conditional request's
date will be compared).
This reference is either extracted from the "Last-Modified" header, or
the "Date" header, or the reception time of the response (in decreasing
order of priority).
The date values are converted into seconds since epoch in order to ease
comparisons and to limit storage space.
BorinSSL pretends to be 1.1.1 version of OpenSSL. It messes some
version based feature presense checks. For example, OpenSSL specific
early data support.
Let us change that feature detction to SSL_READ_EARLY_DATA_SUCCESS
macro check instead of version comparision.