This reverts commit 9e46496d45. It was
wrong and is not reliable, depending on the compiler's version and
optimization, as the struct is assigned inside a statement, thus on
its own stack. It's not needed anymore now so let's remove this.
We previously relied on chunk_cat(dst, b_fromist(src)) for this but it
is not reliable as the allocated buffer is inside the expression and
may be on a temporary stack. While it's possible to allocate stack space
for a struct and return a pointer to it, it's not possible to initialize
it form a temporary variable to prevent arguments from being evaluated
multiple times. Since this is only used to append an ist after a chunk,
let's instead have a chunk_istcat() function to perform exactly this
from a native ist.
The only call place (URI computation in the cache) was updated.
Debug commands will usually mark the fate of the process. We'd rather
have them counted and visible in a core or in stats output than trying
to guess how a flag combination could happen. The counter is only
incremented when the command is about to be issued however, so that
failed attempts are ignored.
Some commands like the debug ones are not enabled by default but can be
useful on some production environments. In order to avoid the temptation
of using them incorrectly, let's introduce an "expert" mode for a CLI
connection, which allows some commands to appear and be used. It is
enabled by command "expert-mode on" which is not listed by default.
8c1cddef ("MINOR: ssl: new functions duplicate and free a ckch_store")
use some OpenSSL refcount functions that were introduced in OpenSSL
1.0.2 and OpenSSL 1.1.0.
Fix the problem by introducing them in openssl-compat.h.
Fix#336.
To avoid affecting too much the traffic during a certificate update,
create the SNIs in a IO handler which yield every 10 ckch instances.
This way haproxy continues to respond even if we tries to update a
certificate which have 50 000 instances.
As reported in issue #335, a lot of contention happens on the PATLRU lock
when performing expensive regex lookups. This is absurd since the purpose
of the LRU cache was to have a fast cache for expressions, thus the cache
must not be shared between threads and must remain lockless.
This commit makes the LRU cache thread-local and gets rid of the PATLRU
lock. A test with 7 threads on 4 cores climbed from 67kH/s to 369kH/s,
or a scalability factor of 5.5.
Given the huge performance difference and the regression caused to
users migrating from processes to threads, this should be backported at
least to 2.0.
Thanks to Brian Diekelman for his detailed report about this regression.
Tasklets may be woken up to run on the calling thread or by a specific thread
(the owner). But since we use a non-thread safe mechanism when the calling
thread is also the for the owner, there may sometimes be collisions when two
threads decide to wake the same tasklet up at the same time and one of them
is the owner.
This is more of a matter of usage than code, in that a tasklet usually is
designed to be woken up and executed on the calling thread only (most cases)
or on a specific thread. Thus it is a property of the tasklet itself as this
solely depends how the code is constructed around it.
This patch performs a small change to address this. By default tasklet_new()
creates a "local" tasklet, which will run on the calling thread, like in 2.0.
This is done by setting tl->tid to a negative value. If the caller wants the
tasklet to run exclusively on a specific thread, it just has to set tl->tid,
which is already what shared tasklet callers do anyway.
No backport is needed.
The use of ~(1 << tid) to compute the sleeping_mask in tasklet_wakeup()
will result in breakage above 32 threads, because (1<<31) = 0xFFFFFFFF8000000,
and upper values will lead to theorically undefined results, but practically
will wrap over 0x1 to 0x80000000 again and indicate wrong sleeping masks. It
seems that the main visible effect maybe extra latency on some threads or
short CPU loops on others.
No backport is needed.
In MT_LIST_BEHEAD(), explicitely set the next element of the prev to NULL,
instead of setting it to the prev of the next. If we only had one element,
then we'd set the next and the prev to the element itself, and thus it would
make the element appear to be outside any list.
A lot of our chunk-based functions are able to work on a buffer pointer
but not on an ist. Instead of duplicating all of them to also take an
ist as a source, let's have a macro to make a temporary dummy buffer
from an ist. This will only result in structure field manipulations
that the compiler will quickly figure to eliminate them with inline
functions, and in other cases it will just use 4 words in the stack
before calling a function, instead of performing intermediary
conversions.
The flag HTX_FL_PROXY_RESP is now set on responses generated by HAProxy,
excluding responses returned by applets and services. It is an informative flag
set by the applicative layer.
It currently is not possible to figure the exact haproxy version from a
core file for the sole reason that the version is stored into a const
string and as such ends up in the .text section that is not part of a
core file. By turning them into variables we move them to the data
section and they appear in core files. In order to help finding them,
we just prepend an extra variable in front of them and we're able to
immediately spot the version strings from a core file:
$ strings core | fgrep -A2 'HAProxy version'
HAProxy version follows
2.1-dev2-e0f48a-88
2019/10/15
(These are haproxy_version and haproxy_date respectively). This may be
backported to 2.0 since this part is not support to impact anything but
the developer's time spent debugging.
When raw data are copied or appended in a chunk, the result must not exceed the
chunk size but it can reach it. Unlike functions to copy or append a string,
there is no terminating null byte.
This patch must be backported as far as 1.8. Note in 1.8, the functions
chunk_cpy() and chunk_cat() don't exist.
Don't try to load the files containing the issuer and the OCSP response
each time we generate a SSL_CTX.
The .ocsp and the .issuer are now loaded in the struct
cert_key_and_chain only once and then loaded from this structure when
creating a SSL_CTX.
Don't try to load the file containing the sctl each time we generate a
SSL_CTX.
The .sctl is now loaded in the struct cert_key_and_chain only once and
then loaded from this structure when creating a SSL_CTX.
Note that this now make possible the use of sctl with multi-cert
bundles.
$ echo -e "set ssl cert certificate.pem <<\n$(cat certificate2.pem)\n" | \
socat stdio /var/run/haproxy.stat
Certificate updated!
The operation is locked at the ckch level with a HA_SPINLOCK_T which
prevents the ckch architecture (ckch_store, ckch_inst..) to be modified
at the same time. So you can't do a certificate update at the same time
from multiple CLI connections.
SNI trees are also locked with a HA_RWLOCK_T so reading operations are
locked only during a certificate update.
Bundles are supported but you need to update each file (.rsa|ecdsa|.dsa)
independently. If a file is used in the configuration as a bundle AND
as a unique certificate, both will be updated.
Bundles, directories and crt-list are supported, however filters in
crt-list are currently unsupported.
The code tries to allocate every SNIs and certificate instances first,
so it can rollback the operation if that was unsuccessful.
If you have too much instances of the certificate (at least 20000 in my
tests on my laptop), the function can take too much time and be killed
by the watchdog. This will be fixed later. Also with too much
certificates it's possible that socat exits before the end of the
generation without displaying a message, consider changing the socat
timeout in this case (-t2 for example).
The size of the certificate is currently limited by the maximum size of
a payload, that must fit in a buffer.
In order to allow the creation of sni_ctx in runtime, we need to split
the function to allow rollback.
We need to be able to allocate all sni_ctxs required before inserting
them in case we need to rollback if we didn't succeed the allocation.
The function was splitted in 2 parts.
The first one ckch_inst_add_cert_sni() allocates a struct sni_ctx, fill
it with the right data and insert it in the ckch_inst's list of sni_ctx.
The second will take every sni_ctx in the ckch_inst and insert them in
the bind_conf's sni tree.
struct ckch_inst represents an instance of a certificate (ckch_node)
used in a bind_conf. Every sni_ctx created for 1 ckch_node in a
bind_conf are linked in this structure.
This patch allocate the ckch_inst for each bind_conf and inserts the
sni_ctx in its linked list.
As using an mt_list for the tasklet list is costly, instead use a regular list,
but add an mt_list for tasklet woken up by other threads, to be run on the
current thread. At the beginning of process_runnable_tasks(), we just take
the new list, and merge it into the task_list.
This should give us performances comparable to before we started using a
mt_list, but allow us to use tasklet_wakeup() from other threads.
This macro atomically cuts the head of a list and returns the list
of elements as a detached list, meaning that they're all linked
together without any head. If the list was empty, NULL is returned.
Several times some users have expressed the non-intuitive aspect of some
of our stat/info metrics and suggested to add some help. This patch
replaces the char* arrays with an array of name_desc so that we now have
some reserved room to store a description with each stat or info field.
These descriptions are currently empty and not reported yet.
Now "show info" and "show stat" can parse "desc" as an output format
modifier that will be passed down the chain to add some descriptions
to the fields depending on the format in use. For now it is not
exploited.
This flag is used to decide to show the check box in front of a proxy
on the HTML stat page. It is always equal to STAT_ADMIN except when the
proxy has no backend capability (i.e. a pure frontend) or has no server,
in which case it's only used to avoid leaving an empty column at the
beginning of the table. Not only this is pretty useless, but it also
causes the columns not to align well when mixing multiple proxies with
or without servers.
Let's simply always use STAT_ADMIN and get rid of this flag.
We used to rely on some config flags defined in uri_auth.h set during
parsing, and another set of STAT_* flags defined in stats.h set at run
time, with a somewhat gray area between the two sets. This is confusing
in the stats code as both are called "flags" in various functions and
it's quite hard to know which one describes what.
This patch cleans this up by replacing all ST_* by a newly assigned
value from the STAT_* set so that we can now use unified flags to
describe both the configuration and the current state. There is no
functional change at all.
This flag was added in 1.4-rc1 by commit 329f74d463 ("[BUG] uri_auth: do
not attemp to convert uri_auth -> http-request more than once") to
address the case where two proxies inherit the stats settings from
the defaults instance, and the first one compiles the expression while
the second one uses it. In this case since they use the exact same
uri_auth pointer, only the first one should compile and the second one
must not fail the check. This was addressed by adding an ST_CONVDONE
flag indicating that the expression conversion was completed and didn't
need to be done again. But this is a hack and it becomes cumbersome in
the middle of the other flags which are all relevant to the stats
applet. Let's instead fix it by checking if we're dealing with an
alias of the defaults instance and refrain from compiling this twice.
This allows us to remove the ST_CONVDONE flag.
A typical config requiring this check is :
defaults
mode http
stats auth foo:bar
listen l1
bind :8080
listen l2
bind :8181
Without this (or previous) check it would cmoplain when checking l2's
validity since the rule was already built.
The function http_get_authority() may be used to parse a URI and looks for the
authority, between the scheme and the path. An option may be used to skip the
user info (part before the '@'). Most of time, the user info will be ignored.
The first flag, HTX_SL_F_HAS_AUTHORITY, is set when the uri contains an
authority. For the H1, it happens when a CONNECT request is received or when an
absolute uri is used. For the H2, it happens when the pseudo header ":authority"
is provided.
The second one, HTX_SL_F_NORMALIZED_URI, is set when the received uri is
represented as an absolute uri because of the protocol requirements. For now, it
is only used for h2 requests, when the pseudo headers :authority and :scheme are
found. Internally, the uri is represented as an absolute uri. This flag allows
us to make the difference between an absolute uri in h1 and h2.
This function now dumps info about the HTX message into a buffer, passed as
argument. In addition, it is possible to only dump meta information, without the
message content.
This function now uses the address of the pointer to the htx message where the
copy must be performed. This way, when a zero-copy is performed, there is no
need to refresh the caller's htx message. It is a bit easier to do that way,
especially to add traces in the mux-h1.
William reported that since commit 6b3089856f ("MEDIUM: fd: do not use
the FD_POLL_* flags in the pollers anymore") the master's CLI often
fails to access sub-processes. There are two causes to this. One is
that we did report FD_POLL_ERR on an FD as soon as FD_EV_SHUT_W was
seen, which is automatically inherited from POLLHUP. And since we do
not store the current shutdown state of an FD we can't know if the
poller reports a sudden close resulting from an error or just a
byproduct of a previous shutdown(WR) followed by a read0. The current
patch addresses this by only considering this when the FD was active,
since a shutdown FD is not active. The second issue is that *somewhere*
down the chain, channel data are ignored if an error is reported on a
channel. This results in content truncation, but this cause was not
figured yet.
No backport is needed.
It is now possible to format stats counters as floats. But the stats applet does
not use it.
This patch is required by the Prometheus exporter to send the time averages in
seconds. If the promex change is backported, this patch must be backported
first.
A different engine-id is now generated for each thread. So, it is possible to
enable the async mode with several threads.
This patch may be backported to older versions.
We often need ISO time + microseconds in traces and ring buffers, thus
function does this by calling gettimeofday() and keeping a cached value
of the part representing the tv_sec value, and only rewrites the microsecond
part. The cache is per-thread so it's lockless and safe to use as-is.
Some tests already show that it's easy to see 3-4 events in a single
microsecond, thus it's likely that the nanosecond version will have to
be implemented as well. But certain comments on the net suggest that
some parsers are having trouble beyond microsecond, thus for now let's
stick to the microsecond only.
Now that we can wake tasklet for other threads, make sure that if the thread
is sleeping, we wake it up, or the tasklet won't be executed until it's
done sleeping.
That also means that, before going to sleep, and after we put our bit
in sleeping_thread_mask, we have to check that nobody added a tasklet for
us, just checking for global_tasks_mask isn't enough anymore.
The aim is to rassemble all scheduler information related to the current
thread. It simply points to task_per_thread[tid] without having to perform
the operation at each time. We save around 1.2 kB of code on performance
sensitive paths and increase the request rate by almost 1%.
There are a number of tests there which are enforced on tasklets while
they will never apply (various handlers, destroyed task or not, arguments,
results, ...). Instead let's have a single TASK_IS_TASKLET() test and call
the tasklet processing function directly, skipping all the rest.
It now appears visible that the only unneeded code is the update to
curr_task that is never used for tasklets, except for opportunistic
reporting in the debug handler, which can only catch si_cs_io_cb,
which in practice doesn't appear in any report so the extra cost
incurred there is pointless.
This change alone removes 700 bytes of code, mostly in
process_runnable_tasks() and increases the performance by about
1%.