Define per-thread counters allocated with the greatest size of any
stat module's counters. This variable is named trash_counters.
When using a proxy without allocated counters, return the trash counters
from EXTRA_COUNTERS_GET instead of a dangling pointer, to prevent a
segfault.
This is useful for all the proxies used internally which do not belong
to the global proxy list. As these objects do not appear in the stats
report, using the dummy counters for them does not matter.
For this fix to be functional, the extra counters are explicitly
initialized to NULL on proxy/server/listener init functions.
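For the record, the fallback shape looks like this (a minimal sketch; the
exact definitions in extra_counters.h may differ):

    /* Sketch only: per-thread scratch area sized for the largest stats
     * module, returned when a proxy never allocated its own counters.
     */
    extern THREAD_LOCAL void *trash_counters;

    #define EXTRA_COUNTERS_GET(counters, mod)                          \
            (likely(counters) ?                                        \
                    (void *)((counters)->data + (mod)->counters_off) : \
                    trash_counters)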
Most notably, the crash has already been detected with the following
vtc:
- reg-tests/lua/txn_get_priv.vtc
- reg-tests/peers/tls_basic_sync.vtc
- reg-tests/peers/tls_basic_sync_wo_stkt_backend.vtc
There are probably other parts that may be impacted (SPOE for example).
This bug was introduced in the current release and does not need to be
backported. The faulty commits are
"MINOR: ssl: count client hello for stats" and
"MINOR: ssl: add counters for ssl sessions".
Do not cache responses that have neither an explicit expiration time
(s-maxage or max-age Cache-Control directives or Expires header) nor a
validator (ETag or Last-Modified headers), as suggested in
RFC 7234#3.
The TX_FLAG_IGNORE flag is used instead of the TX_FLAG_CACHEABLE so as
not to change the behavior of the checkcache option.
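For illustration, the first response below is no longer cached, while the
second one still is:

    HTTP/1.1 200 OK
    Content-Type: text/html

    HTTP/1.1 200 OK
    Content-Type: text/html
    Cache-Control: max-age=60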
No details are provided when an error occurs during the parsing of an
errorfile, making it a bit hard to diagnose where the problem is. Now, when
it happens, an understandable error message is reported.
This patch is not a bug fix in itself, but it will be required to change a
fatal error into a warning in the latest stable releases. Thus it must be
backported
as far as 2.0.
It does not require heavy deletion from the expr anymore, so we can now
turn this into a singly-linked list since most of the time we want to delete
all instances of a given pattern from the head. By doing so we save 32 bytes
of memory per pattern. The pat_unlink_from_head() function was adjusted
accordingly.
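The resulting head-first unlink can be sketched like this (illustrative
names and types, not the exact haproxy ones):

    #include <stdlib.h>

    struct pat_node {
            struct pat_node *next;
            const void *ref;        /* owning reference element */
    };

    /* Sketch only: unlink every node belonging to <ref>; walking with a
     * pointer to the previous 'next' field removes the need for any
     * back pointer in the nodes themselves.
     */
    static void unlink_from_head(struct pat_node **head, const void *ref)
    {
            struct pat_node **prev = head;

            while (*prev) {
                    if ((*prev)->ref == ref) {
                            struct pat_node *tmp = *prev;
                            *prev = tmp->next;
                            free(tmp);
                    }
                    else
                            prev = &(*prev)->next;
            }
    }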
This function will be usable to purge at most a specified number of old
entries from a reference. Entries are declared old if their generation
number is in the past compared to the one passed in argument. This will
ease removal of early entries when new ones have been appended.
We also call malloc_trim() when available, at the end of the series,
because this is one place where there is a lot of memory to save. Reloads
of 1M IP addresses used in an ACL made the process grow up to 1.7 GB RSS
after 10 reloads and roughly stabilize there without this call, versus
only 260 MB when the call is present. Sadly there is no direct equivalent
for jemalloc, which stabilizes around 800MB-1GB.
pat_ref_load() basically combines pat_ref_append() and pat_ref_commit().
It's very similar to pat_ref_add() except that it also allows setting the
generation ID and the line number. pat_ref_add() was modified to directly
rely on it to avoid code duplication. Note that a previous declaration
of pat_ref_load() was removed as it was just a leftover of an earlier
incarnation of something possibly similar, so no existing functionality
was changed here.
This function will be used after a successful pat_ref_append() to propagate
the pattern to all the places where it is used (including parsing and
indexing). On failure,
it will entirely roll back all insertions and free the pattern itself. It
also preserves the generation number so that it is convenient for use in
association with pat_ref_append(). pat_ref_add() was modified to rely on
it instead of open-coding the insertion and roll-back.
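Typical usage then boils down to this sketch (exact signatures may differ
slightly):

    /* Sketch only: append one entry under the current generation, then
     * propagate it to all users; on failure pat_ref_commit() rolls
     * everything back and frees the element itself.
     */
    struct pat_ref_elt *elt;

    elt = pat_ref_append(ref, key, value, line);
    if (!elt || !pat_ref_commit(ref, elt)) {
            memprintf(err, "out of memory while adding pattern");
            return 0;
    }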
Right now it's not possible to perform a safe reload because we don't
know what patterns were recently added or were already present. This
patch adds a generation counter to the reference patterns so that it
is possible to know what generation of the reference they were loaded
with. A reference now has two generations, the current one, used for
all additions, and the next one, allocated to those wishing to update
the contents. The generation wraps at 2^32 so comparisons must be made
relative to the current position.
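This is the usual serial-number arithmetic; a wrap-safe "older" test can be
written as follows:

    /* <a> is older than <b> if the signed difference is negative; this
     * stays correct across the 2^32 wrap as long as the two values are
     * less than 2^31 generations apart.
     */
    static inline int gen_older(unsigned int a, unsigned int b)
    {
            return (int)(a - b) < 0;
    }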
The idea will be that upon full reload, the caller will first get a new
generation ID, will insert all new patterns using it, will then switch
the current ID to the new one, and will delete all entries older than
the current ID. This has the benefit of supporting chunked updates that
remain consistent and that won't block the whole process for ages like
pat_ref_reload() currently does.
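The intended sequence may be sketched as follows (the reader and the
generation/purge helpers are illustrative names, not committed APIs):

    /* Sketch only: chunked, consistent reload of a pattern reference */
    unsigned int gen = pat_ref_newgen(ref);       /* reserve next generation */

    while (read_next_entry(f, key, value, &line)) /* hypothetical reader */
            pat_ref_load(ref, gen, key, value, line, &err);

    ref->curr_gen = gen;                          /* switch generations */
    pat_ref_purge_older(ref, gen, 100);           /* purge old, 100 at a time */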
Till now the only way to remove a known reference was via
pat_ref_delete_by_id() which scans the whole list to find a matching pointer.
Let's add pat_ref_delete_by_ptr() which takes a valid pointer. It can be
called by the function above after the pointer is found, and can also be
used to roll back a failed insertion much more efficiently.
These ones are not used anymore, so let's remove them to remove a bit
of the complexity. The ACL keyword's delete() function could be removed
as well, though most keyword declarations are positional and we have a
high risk of introducing a mistake here, so let's not touch the ACL part.
pat_del_tree_gen() was already chained onto pat_del_list_gen() to deal
with remaining cases, so let's complete the merge and have a generic
pattern deletion function acting on the reference and taking care of
reliably removing all elements.
This is the next step in speeding up entry removal. Now we don't scan
the whole lists or trees for elements pointing to the target reference,
instead we start from the reference and delete all linked patterns.
This simplifies some delete functions: we no longer need to delete
multiple times from an expression since all nodes appear after
the reference element. We can now have one generic list and one generic
tree deletion function.
This required the replacement of pattern_delete() with an open-coded
version since we now need to lock all expressions first before proceeding.
This means there is a high risk of lock inversion here but given that the
expressions are always scanned in the same order from the same head, this
must not happen.
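Schematically, the open-coded deletion is a fragment like this (lock label
and fields are illustrative):

    /* Sketch only: lock all expressions bound to the reference in list
     * order, unlink everything attached to <elt>, then unlock in the
     * same order. All callers scan from the same head, so the order is
     * stable and no inversion can occur.
     */
    list_for_each_entry(expr, &ref->pat, list)
            HA_RWLOCK_WRLOCK(PATEXP_LOCK, &expr->lock);

    /* ... unlink all list and tree nodes hanging off <elt> ... */

    list_for_each_entry(expr, &ref->pat, list)
            HA_RWLOCK_WRUNLOCK(PATEXP_LOCK, &expr->lock);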
Now deleting first entries is instantaneous, and it's still slow to
delete the last ones when looking up their ID since it still requires
to look them up with a full scan, but it's already way faster than
previously. Typically, removing the last 10 IPs from a 20M-entry ACL,
with a full scan for each, took less than 2 seconds.
It would be technically possible to make use of indexed entries to
speed up most lookups for removal by value (e.g. IP addresses) but
that's for later.
There is a data model issue in the current pattern design that makes
pattern deletion extremely expensive: there's no direct way from a
reference to access all indexed occurrences. As such, the only way
to remove all indexed entries corresponding to a reference update
is to scan all expressions's lists and trees to find a link to the
reference. While this was possibly OK when map removal was not
common and most maps were small, this is not conceivable anymore
with GeoIP maps containing 10M+ entries and del-map operations that
are triggered from http-request rulesets.
This patch introduces two list heads from the pattern reference, one
for the objects linked by lists and one for those linked by tree node.
Ideally a single list would be enough but the linked elements are too
unrelated to be distinguished at the moment, so we'll need two lists.
In the long term a singly-linked list will suffice, but
for now it's not possible due to the way elements are removed from
expressions. As such this patch adds 32 bytes of memory usage per
reference plus 16 per indexed entry, but both will be cut in half
later.
The links are not yet used for deletion, this patch only ensures the
list is always consistent.
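Schematically, the reference element now carries the two heads (field names
below are illustrative):

    #include <haproxy/list-t.h>

    /* Sketch only: back-links from a reference element to all of its
     * indexed occurrences.
     */
    struct pat_ref_elt {
            struct list list;       /* link in the pat_ref's entry list */
            struct list list_head;  /* head of list-indexed occurrences */
            struct list tree_head;  /* head of tree-indexed occurrences */
            /* ... pattern, sample, generation, line number ... */
    };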
Now we have a single prune() function to act on an expression, and one
delete function for the lists and one for the trees. The presence of a
pointer in the lists is enough to warrant a free, and we rely on the
PAT_SF_REGFREE flag to decide whether to free using free() or regfree().
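The decision at release time then reduces to this sketch (field names are
illustrative):

    /* Sketch only: a pattern flagged at index time knows how to release
     * its own storage.
     */
    static void pat_free_storage(struct pattern *pat)
    {
            if (pat->sflags & PAT_SF_REGFREE)
                    regex_free(pat->ptr.reg); /* compiled regex */
            else
                    free(pat->ptr.ptr);       /* plain memory block */
    }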
Currently we have no way to know how to delete/prune a pattern in a
generic way. A pattern doesn't contain its own type so we don't know
what function to call. Tree nodes are roughly OK but not lists where
regex are possible. Let's add one new bit for sflags at index time to
indicate that regex_free() will be needed upon deletion. It's not used
for now.
It's not possible to update a single expression alone without updating the
pattern reference; I don't know why we put the revision in the expression
back then, given that it in fact reflects an update of a full pattern.
Let's move the revision into the reference's head instead.
This is in order to access malloc_trim() which is convenient after
clearing huge maps to reclaim memory. When this is detected, we also
define HA_HAVE_MALLOC_TRIM.
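Call sites can then be guarded as in this sketch:

    #ifdef HA_HAVE_MALLOC_TRIM
    #include <malloc.h>
    #endif

    /* Sketch: hand free heap pages back to the system after a huge purge */
    static inline void trim_memory(void)
    {
    #ifdef HA_HAVE_MALLOC_TRIM
            malloc_trim(0); /* 0 = keep no extra padding at the heap top */
    #endif
    }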
This patch implements a couple of converters to validate and extract data from a
MQTT (Message Queuing Telemetry Transport) message. The validation consists of a
few checks as well as "packet size" validation. The extraction can get any field
from the variable header and the payload.
This is limited to CONNECT and CONNACK packet types only. All other messages
are considered invalid. It is not a problem for now because only the first packet
on each side can be parsed (CONNECT for the client and CONNACK for the server).
MQTT 3.1.1 and 5.0 are supported.
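For illustration, configuration usage could look like the lines below (the
converter names are the ones this patch is expected to expose; treat the
exact spelling as an assumption):

    tcp-request inspect-delay 10s
    # reject anything that does not validate as MQTT
    tcp-request content reject unless { req.payload(0,0),mqtt_is_valid }
    # extract a field from the CONNECT variable header
    tcp-request content set-var(sess.proto) req.payload(0,0),mqtt_field_value(connect,protocol_name)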
Reviewed and Fixed by Christopher Faulet <cfaulet@haproxy.com>
This patch implements a couple of converters to validate and extract tag value
from a FIX (Financial Information eXchange) message. The validation consists
of a few checks such as mandatory fields and checksum computation. The extraction
can get any tag value based on a tag string or tag id.
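For illustration, configuration usage could look like the lines below (the
converter names are the ones this patch is expected to expose; treat the
exact spelling as an assumption):

    tcp-request inspect-delay 10s
    # reject anything that does not validate as a FIX message
    tcp-request content reject unless { req.payload(0,0),fix_is_valid }
    # extract SenderCompID by its tag id (49)
    tcp-request content set-var(sess.sender) req.payload(0,0),fix_tag_value(49)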
This patch requires the istend() function. Thus it depends on "MINOR: ist: Add
istend() function to return a pointer to the end of the string".
Reviewed and Fixed by Christopher Faulet <cfaulet@haproxy.com>
Released version 2.4-dev0 with the following main changes :
- MINOR: version: it's development again.
- DOC: mention in INSTALL that it's development again
Let us use SSL_CTX_set1_curves_list, defined by OpenSSL, and also defined
in openssl-compat when SSL_CTRL_SET_CURVES_LIST is present (BoringSSL),
for feature detection instead of version checks.
OpenSSL 1.0.2 and onwards define SSL_CTX_set1_curves_list which is both a
function and a macro. OpenSSL 1.0.2 to 1.1.0 define SSL_CTRL_SET_CURVES_LIST
as a macro, which disappeared from 1.1.1. BoringSSL only has that one and
not the former macro but it does have the function. Let's keep the test on
the macro matching the function name by defining the macro to itself when
needed.
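That is, something along these lines in openssl-compat.h (sketch):

    /* Sketch only: let feature tests rely on the function-named macro
     * even on libs (e.g. BoringSSL) which only expose the control
     * constant; the macro simply expands to the function name.
     */
    #if defined(SSL_CTRL_SET_CURVES_LIST) && !defined(SSL_CTX_set1_curves_list)
    #define SSL_CTX_set1_curves_list SSL_CTX_set1_curves_list
    #endif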
This does the same as fd_want_recv() except that it first checks that
fd_updt[] is allocated, as this may be called during early listener
initialization. Previously we used to check fd_updt[] before calling
fd_want_recv() but this is not correct since it does not update the
FD flags. This method will be safer.
The mem stats are pretty convenient to spot leaks, except that they count
free(NULL) as one call, and the code does actually have quite a number of
free(foo) guards where foo is NULL if the object was already freed. Let's
just not count these ones so that the stats remain consistent. Now it's
possible to compare the strdup()/malloc() counts with the free() count and
verify they are consistent.
It's more convenient to return the element than to return just 0 or 1,
as the next thing we'll want to do is to act on this element! In addition
it was using variable arguments instead of consts, causing some reuse
constraints which were also addressed. This doesn't change its use as
a boolean, hence why call places were not modified.
The "weight" column on the stats page is somewhat confusing when using
slowstart becaue it reports the effective weight, without being really
explicit about it. In some situations the user-configured weight is more
relevant (especially with long slowstarts where it's important to know
if the configured weight is correct).
This adds a new uweight stat which reports a server's user-configured
weight, and in a backend it receives the sum of all servers' uweights.
In addition it adds the mention of "effective" in a few descriptions
for the "weight" column (help and doc).
As a result, the list of servers in a backend is now always scanned
when dumping the stats. But this is not a problem given that these
servers are already scanned anyway and for way heavier processing.
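The extra scan amounts to no more than this sketch:

    /* Sketch only: a backend's uweight is the sum of its servers'
     * user-configured weights.
     */
    unsigned int uweight = 0;
    const struct server *srv;

    for (srv = px->srv; srv; srv = srv->next)
            uweight += srv->uweight;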
"no-maint" is a bit similar to "up" except that it will only hide
servers that are in maintenance (or disabled in the configuration), and
not those that are enabled but failed a check. One benefit here is to
significantly reduce the output of the "show stat" command when using
large server-templates containing entries that are not yet provisioned.
Note that the prometheus exporter also has such an option which does
exactly the same thing.
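For example (the socket path is illustrative):

    $ echo "show stat no-maint" | socat stdio /var/run/haproxy.sock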
This reverts commit b7ba1d9011. Actually
this test had already been removed in the past by commit fac0f645d
("BUG/MEDIUM: queue: make pendconn_cond_unlink() really thread-safe"),
but the condition to reproduce the bug mentioned there was not clear.
Now after analysis and a certain dose of code cleanup, things start to
appear more obvious. What happens is that if we check the presence of
the node in the tree without taking the lock, we can observe a NULL at
the instant the node is being unlinked by another thread in
pendconn_process_next_strm() as part of __pendconn_unlink_prx() or
__pendconn_unlink_srv(). Till now there is no issue except that the
pendconn is not removed from the queue during this operation and that
the task is scheduled to be woken up by pendconn_process_next_strm()
with the stream being added to the list of the server's active
connections by __stream_add_srv_conn(). The first thread finishes
faster and gets back to stream_free() faster than the second one
sets the srv_conn on the stream, so stream_free() skips the s->srv_conn
test and doesn't try to dequeue the freshly queued entry. At the
very least a barrier would be needed there but we can't afford to
free the stream while it's being queued. So there's no other solution
than making sure that either __pendconn_unlink_prx() or
pendconn_cond_unlink() get the entry but never both, which is why the
lock is required around the test. A possible solution would be to set
p->target before unlinking the entry and using it to complete the test.
This would leave no dead period where the pendconn is not seen as
attached.
It is possible, yet extremely difficult, to reproduce this bug, which
was first noticed in bug #880. Running 100 servers with maxconn 1 and
maxqueue 1 on leastconn and a connect timeout of 30ms under 16 threads
with DEBUG_UAF, with a traffic making the backend's queue oscillate
around zero (typically using 250 connections with a local httpterm
server) may rarely manage to trigger a use-after-free.
No backport is needed.
On connection error processing, we can see massive storms of calls to
pendconn_cond_unlink() to release a possible place in the queue. For
example, in issue #908, on average half of the threads are caught in
this function via back_try_conn_req() consecutive to a synchronous
error. However we wait until grabbing the lock to know if the pendconn
is effectively in a queue, which is expensive for many cases. We know
the transition may only happen from in-queue to out-of-queue so it's safe
to first run a preliminary check to see if it's worth going further. This
will allow us to avoid the cost of locking for most requests. This should
not change anything for those completing correctly as they're already
run through pendconn_free() which doesn't call pendconn_cond_unlink()
unless deemed necessary.
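The resulting shape is a classical lockless pre-check followed by a locked
re-check (sketch; the linkage test is illustrative):

    /* Sketch only: the in-queue -> out-of-queue transition is one-way,
     * so seeing the node unlinked without the lock is final, while a
     * stale "still linked" reading is resolved by the locked re-check
     * performed inside pendconn_unlink().
     */
    void pendconn_cond_unlink(struct pendconn *p)
    {
            if (p && p->node.node.leaf_p)
                    pendconn_unlink(p);
    }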
This is an anticipation of finer grained locking for the queues. For now
all lock places take a write lock so that there is no difference at all
with previous code.
In addition to the previous simplification, most locks never use the
seek or read operations (e.g. spinlocks), so let's split the dump into
distinct operations (write/seek/read) and only report those which
were used. Now the output size is roughly divided by 5 compared
to previous ones.
The lock stats are very verbose and more than half of them are used in
a typical test, making it hard to spot the sought values. Let's simply
report "not used" for those which have not been called at all.
When an internal response is returned to a client, the message payload must be
skipped if it is a reply to a HEAD request. The payload is removed from the HTX
message just before the message forwarding.
This bug has been around for a long time; it was already there in the
pre-HTX versions. In legacy HTTP mode, internal errors are not parsed, so
this bug cannot be easily fixed there. Thus, this patch should only be
backported to HTX versions, as far as 2.0. However, the code has
significantly changed in 2.2, so for 2.1 and 2.0 the patch must be
entirely reworked.
Partial support of conditional HTTP requests. This commit adds the
support of the 'If-None-Match' header (see RFC 7232#3.2).
When a client specifies a list of ETags through one or more
'If-None-Match' headers, they are all compared to the one that might have
been stored in the corresponding http cache entry until one of them
matches.
If a match happens, a specific "304 Not Modified" response is
sent instead of the cached data. This response has all the stored
headers but no other data (see RFC 7232#4.1). Otherwise, the whole cached data
is sent.
Although unlikely in a GET/HEAD request, the "If-None-Match: *" syntax is
valid and also receives a "304 Not Modified" response (RFC 7234#4.3.2).
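For example, with an entry cached under ETag "r2d2xxxx" (tags borrowed from
RFC 7232's examples), the exchange looks like:

    GET /image.png HTTP/1.1
    If-None-Match: W/"xyzzy", "r2d2xxxx"

    HTTP/1.1 304 Not Modified
    ETag: "r2d2xxxx"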
This resolves a part of GitHub issue #821.
Add a function that compares two etags that might be of different types.
If any of them is weak, the 'W/' prefix is discarded and a strict string
comparison is performed.
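A hedged sketch of such a comparison using the ist API (the actual helper
name may differ):

    #include <string.h>
    #include <import/ist.h>

    /* Sketch only: drop the 'W/' weakness prefix from either side, then
     * compare the remaining opaque tags byte for byte (RFC 7232#2.3.2).
     */
    static int etag_match(struct ist a, struct ist b)
    {
            if (istlen(a) >= 2 && memcmp(istptr(a), "W/", 2) == 0)
                    a = istadv(a, 2);
            if (istlen(b) >= 2 && memcmp(istptr(b), "W/", 2) == 0)
                    b = istadv(b, 2);
            return isteq(a, b);
    }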
Co-authored-by: Tim Duesterhus <tim@bastelstu.be>
In issue #785, users are reporting that it's not convenient to load a
".crt.key" when the configuration contains a ".crt".
This option allows removing the extension of the certificate before
trying to load any extra SSL file (.key, .ocsp, .sctl, .issuer etc.).
The patch slightly changes the way ssl_sock_load_files_into_ckch()
looks for the files.
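A hedged configuration example (assuming the option is exposed as the
global "ssl-load-extra-del-ext" keyword):

    global
        ssl-load-extra-del-ext

    frontend fe
        # "site.crt" now looks for "site.key", "site.ocsp", ...
        # instead of "site.crt.key", "site.crt.ocsp", ...
        bind :443 ssl crt site.crt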
When a connection is released, depending on its state, it may be detached from
the session and it may be removed from the server lists. The first case may
happen for private or unsharable active connections. The second one should only
be performed for idle or available connections. We never try to remove a
connection from the server list if it is attached to a session. But it is also
important to never try to remove a private connection from the server lists, even
if it is not attached to a session. Otherwise, the curr_used_conn server counter
is decremented once too often.
This bug was introduced by the commit 04a24c5ea ("MINOR: connection: don't check
priv flag on free"). It is related to the issue #881. It only affects the 2.3,
no backport is needed.
This counter is only updated and never used, and in addition it's done
without any atomicity so it's very unlikely to be correct on multi-CPU
systems! Let's just remove it since it's not used.
A few structures were slightly rearranged in order to plug some holes
left around the locks. Sizes ranging from 8 to 32 bytes could be saved
depending on the structures. No performance difference was noticed (none
was expected there), though memory usage might be slightly reduced in
some rare cases.
We don't need to waste the size of a long for the locks: with the plocks,
even an unsigned short would offer enough room for up to 126 threads! Let's
use an unsigned int which will be easier to place in certain structures
and will more conveniently plug some holes, and atomic ops on 32-bit
values are at least as fast as on 64-bit ones. This will not change
anything for 32-bit
platforms.
It's a bit overkill to register an initcall to call a function to set
a lock to zero when not debugging, let's just declare the lock as
pre-initialized to zero.
BoringSSL is a fork of OpenSSL 1.1.0, however in
49e9f67d8b7cbeb3953b5548ad1009d15947a523 it has changed version to 1.1.1.
Should fix issue #895.
This must be backported to 2.2, 2.1, 2.0 and 1.8.