In MT_LIST_DEL_SAFE(), when the code was changed to use a temporary variable
instead of using the provided pointer directly, we shouldn't have changed
the code that set the pointer to NULL, as we really want the pointer
provided to be nullified, otherwise other parts of the code won't know
we just deleted an element, and bad things will happen.
This should be backported to 2.1.
The splice() syscall has been supported in glibc since version 2.5 issued
in 2006 and is present on supported systems so there's no need for having
our own arch-specific syscall definitions anymore.
This was made to support epoll on patched 2.4 kernels, and on early 2.6
using alternative libcs thanks to the arch-specific syscall definitions.
All the features we support have been around since 2.6.2 and present in
glibc since 2.3.2, neither of which are found in field anymore. Let's
simply drop this and use epoll normally.
The accept4() syscall has been present for a while now, there is no more
reason for maintaining our own arch-specific syscall implementation for
systems lacking it in libc but having it in the kernel.
This was introduced 10 years ago to squeeze a few CPU cycles per syscall
on 32-bit x86 machines and was already quite old by then, requiring to
explicitly enable support for this in the kernel. We don't even know if
it still builds, let alone if it works at all on recent kernels! Let's
completely drop this now.
We currently have two UUID generation functions, one for the sample
fetch and the other one in the SPOE filter. Both were a bit complicated
since they were made to support random() implementations returning an
arbitrary number of bits, and were throwing away 33 bits every 64. Now
we don't need this anymore, so let's have a generic function consuming
64 bits at once and use it as appropriate.
This is the replacement of failed attempt to add thread safety and
per-process sequences of random numbers initally tried with commit
1c306aa84d ("BUG/MEDIUM: random: implement per-thread and per-process
random sequences").
This new version takes a completely different approach and doesn't try
to work around the horrible OS-specific and non-portable random API
anymore. Instead it implements "xoroshiro128**", a reputedly high
quality random number generator, which is one of the many variants of
xorshift, which passes all quality tests and which is described here:
http://prng.di.unimi.it/
While not cryptographically secure, it is fast and features a 2^128-1
period. It supports fast jumps allowing to cut the period into smaller
non-overlapping sequences, which we use here to support up to 2^32
processes each having their own, non-overlapping sequence of 2^96
numbers (~7*10^28). This is enough to provide 1 billion randoms per
second and per process for 2200 billion years.
The implementation was made thread-safe either by using a double 64-bit
CAS on platforms supporting it (x86_64, aarch64) or by using a local
lock for the time needed to perform the shift operations. This ensures
that all threads pick numbers from the same pool so that it is not
needed to assign per-thread ranges. For processes we use the fast jump
method to advance the sequence by 2^96 for each process.
Before this patch, the following config:
global
nbproc 8
frontend f
bind :4445
mode http
log stdout format raw daemon
log-format "%[uuid] %pid"
redirect location /
Would produce this output:
a4d0ad64-2645-4b74-b894-48acce0669af 12987
a4d0ad64-2645-4b74-b894-48acce0669af 12992
a4d0ad64-2645-4b74-b894-48acce0669af 12986
a4d0ad64-2645-4b74-b894-48acce0669af 12988
a4d0ad64-2645-4b74-b894-48acce0669af 12991
a4d0ad64-2645-4b74-b894-48acce0669af 12989
a4d0ad64-2645-4b74-b894-48acce0669af 12990
82d5f6cd-f6c1-4f85-a89c-36ae85d26fb9 12987
82d5f6cd-f6c1-4f85-a89c-36ae85d26fb9 12992
82d5f6cd-f6c1-4f85-a89c-36ae85d26fb9 12986
(...)
And now produces:
f94b29b3-da74-4e03-a0c5-a532c635bad9 13011
47470c02-4862-4c33-80e7-a952899570e5 13014
86332123-539a-47bf-853f-8c8ea8b2a2b5 13013
8f9efa99-3143-47b2-83cf-d618c8dea711 13012
3cc0f5c7-d790-496b-8d39-bec77647af5b 13015
3ec64915-8f95-4374-9e66-e777dc8791e0 13009
0f9bf894-dcde-408c-b094-6e0bb3255452 13011
49c7bfde-3ffb-40e9-9a8d-8084d650ed8f 13014
e23f6f2e-35c5-4433-a294-b790ab902653 13012
There are multiple benefits to using this method. First, it doesn't
depend anymore on a non-portable API. Second it's thread safe. Third it
is fast and more proven than any hack we could attempt to try to work
around the deficiencies of the various implementations around.
This commit depends on previous patches "MINOR: tools: add 64-bit rotate
operators" and "BUG/MEDIUM: random: initialize the random pool a bit
better", all of which will need to be backported at least as far as
version 2.0. It doesn't require to backport the build fixes for circular
include files dependecy anymore.
This reverts commit 1c306aa84d.
It breaks the build on all non-glibc platforms. I got confused by the
man page (which possibly is the most confusing man page I've ever read
about a standard libc function) and mistakenly understood that random_r
was portable, especially since it appears in latest freebsd source as
well but not in released versions, and with a slightly different API :-/
We need to find a different solution with a fallback. Among the
possibilities, we may reintroduce this one with a fallback relying on
locking around the standard functions, keeping fingers crossed for no
other library function to call them in parallel, or we may also provide
our own PRNG, which is not necessarily more difficult than working
around the totally broken up design of the portable API.
As mentioned in previous patch, the random number generator was never
made thread-safe, which used not to be a problem for health checks
spreading, until the uuid sample fetch function appeared. Currently
it is possible for two threads or processes to produce exactly the
same UUID. In fact it's extremely likely that this will happen for
processes, as can be seen with this config:
global
nbproc 8
frontend f
bind :4445
mode http
log stdout daemon format raw
log-format "%[uuid] %pid"
redirect location /
It typically produces this log:
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30645
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30641
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30644
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30639
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30646
07764439-c24d-4e6f-a5a6-0138be59e7a8 30645
07764439-c24d-4e6f-a5a6-0138be59e7a8 30639
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30643
07764439-c24d-4e6f-a5a6-0138be59e7a8 30646
b6773fdd-678f-4d04-96f2-4fb11ad15d6b 30646
551ce567-0bfb-4bbd-9b58-cdc7e9365325 30642
07764439-c24d-4e6f-a5a6-0138be59e7a8 30642
What this patch does is to use a distinct per-thread and per-process
seed to make sure the same sequences will not appear, and will then
extend these seeds by "burning" a number of randoms that depends on
the global random seed, the thread ID and the process ID. This adds
roughly 20 extra bits of randomness, resulting in 52 bits total per
thread and per process.
It only takes a few milliseconds to burn these randoms and given
that threads start with a different seed, we know they will not
catch each other. So these random extra bits are essentially added
to ensure randomness between boots and cluster instances.
This replaces all uses of random() with ha_random() which uses the
thread-local state.
This must be backported as far as 2.0 or any version having the
UUID sample-fetch function since it's the main victim here.
It's important to note that this patch, in addition to depending on
the previous one "BUG/MEDIUM: init: initialize the random pool a bit
better", also depends on the preceeding build fixes to address a
circular dependency issue in the include files that prevented it
from building. Part or all of these patches may need to be backported
or adapted as well.
The htx_find_offset() function may be used to look for a block at a specific
offset in an HTX message, starting from the message head. A compound result is
returned, an htx_ret structure, with the found block and the position of the
offset in the block. If the offset is ouside of the HTX message, the returned
block is NULL.
The b_insert_blk() function may now be used to insert a string, given a pointer
and the string length, at an absolute offset in a buffer, moving data between
this offset and the buffer's tail just after the end of the inserted string. The
buffer's length is automatically updated. This function supports wrapping. All
the string is copied or nothing. So it returns 0 if there are not enough space
to perform the copy. Otherwise, the number of bytes copied is returned.
`istalloc` allocates memory and returns an `ist` with the size `0` that points
to this allocation.
`istfree` frees the pointed memory and clears the pointer.
A test on FreeBSD with clang 4 to 8 produces this on a call to a
spinning loop on the CLI:
call trace(5):
| 0x53e2bc [eb 16 48 63 c3 48 c1 e0]: wdt_handler+0x10c
| 0x800e02cfe [e8 5d 83 00 00 8b 18 8b]: libthr:pthread_sigmask+0x53e
with our own function it correctly produces this:
call trace(20):
| 0x53e2dc [eb 16 48 63 c3 48 c1 e0]: wdt_handler+0x10c
| 0x800e02cfe [e8 5d 83 00 00 8b 18 8b]: libthr:pthread_sigmask+0x53e
| 0x800e022bf [48 83 c4 38 5b 41 5c 41]: libthr:pthread_getspecific+0xdef
| 0x7ffffffff003 [48 8d 7c 24 10 6a 00 48]: main+0x7fffffb416f3
| 0x801373809 [85 c0 0f 84 6f ff ff ff]: libc:__sys_gettimeofday+0x199
| 0x801373709 [89 c3 85 c0 75 a6 48 8b]: libc:__sys_gettimeofday+0x99
| 0x801371c62 [83 f8 4e 75 0f 48 89 df]: libc:gettimeofday+0x12
| 0x51fa0a [48 89 df 4c 89 f6 e8 6b]: ha_thread_dump_all_to_trash+0x49a
| 0x4b723b [85 c0 75 09 49 8b 04 24]: mworker_cli_sockpair_new+0xd9b
| 0x4b6c68 [85 c0 75 08 4c 89 ef e8]: mworker_cli_sockpair_new+0x7c8
| 0x532f81 [4c 89 e7 48 83 ef 80 41]: task_run_applet+0xe1
So let's add clang+x86_64 to the list of platforms that will use our
simplified version. As a bonus it will not require to link with
-lexecinfo on FreeBSD and will work out of the box when passing
USE_BACKTRACE=1.
It happens that on aarch64 backtrace() only returns one entry (tested
with gcc 4.7.4, 5.5.0 and 7.4.1). Probably that it refrains from unwinding
the stack due to the risk of hitting a bad pointer. Here we can use
may_access() to know when it's safe, so we can actually unwind the stack
without taking risks. It happens that the faulting function (the one
just after the signal handler) is not listed here, very likely because
the signal handler uses a special stack and did not create a new frame.
So this patch creates a new my_backtrace() function in standard.h that
either calls backtrace() or does its own unrolling. The choice depends
on HA_HAVE_WORKING_BACKTRACE which is set in compat.h based on the build
target.
We use various hacks at a few places to try to identify known function
pointers in debugging outputs (show threads & show fd). Let's centralize
this into a new function dedicated to this. It already knows about the
functions matched by "show threads" and "show fd", and when built with
USE_DL, it can rely on dladdr1() to resolve other functions. There are
some limitations, as static functions are not resolved, linking with
-rdynamic is mandatory, and even then some functions will not necessarily
appear. It's possible to do a better job by rebuilding the whole symbol
table from the ELF headers in memory but it's less portable and the gains
are still limited, so this solution remains a reasonable tradeoff.
This function dumps <n> bytes from <addr> in hex form into buffer <buf>
enclosed in brackets after the address itself, formatted on 14 chars
including the "0x" prefix. This is meant to be used as a prefix for code
areas. For example: "0x7f10b6557690 [48 c7 c0 0f 00 00 00 0f]: "
It relies on may_access() to know if the bytes are dumpable, otherwise "--"
is emitted. An optional prefix is supported.
This lock was only needed to protect the buffer_wq list, but now we have
the mt_list for this. This patch simply turns the buffer_wq list to an
mt_list and gets rid of the lock.
It's worth noting that the whole buffer_wait thing still looks totally
wrong especially in a threaded context: the wakeup_cb() callback is
called synchronously from any thread and may end up calling some
connection code that was not expected to run on a given thread. The
whole thing should probably be reworked to use tasklets instead and be
a bit more centralized.
This commit adds ALWAYS_ALIGN(), MAYBE_ALIGN() and ATOMIC_ALIGN() to
be placed as delimitors inside structures to force alignment to a
given size. These depend on the architecture's capabilities so that
it is possible to always align, align only on archs not supporting
unaligned accesses at all, or only on those not supporting them for
atomic accesses (e.g. before a lock).
The isalnum(), isalpha(), isdigit() etc functions from ctype.h are
supposed to take an int in argument which must either reflect an
unsigned char or EOF. In practice on some platforms they're implemented
as macros referencing an array, and when passed a char, they either cause
a warning "array subscript has type 'char'" when lucky, or cause random
segfaults when unlucky. It's quite unconvenient by the way since none of
them may return true for negative values. The recent introduction of
cygwin to the list of regularly tested build platforms revealed a lot
of breakage there due to the same issues again.
So this patch addresses the problem all over the code at once. It adds
unsigned char casts to every valid use case, and also drops the unneeded
double cast to int that was sometimes added on top of it.
It may be backported by dropping irrelevant changes if that helps better
support uncommon platforms. It's unlikely to fix bugs on platforms which
would already not emit any warning though.
This used to be a minor optimization on ix86 where registers are scarce
and the calling convention not very efficient, but this platform is not
relevant enough anymore to warrant all this dirt in the code for the sake
of saving 1 or 2% of performance. Modern platforms don't use this at all
since their calling convention already defaults to using several registers
so better get rid of this once for all.
This patch turns the double negation of 'not unlikely' into 'likely'
and then turns the negation of 'not smaller' into 'greater or equal'
in an attempt to improve readability of the condition.
[wt: this was not a bug but purposely written like this to improve code
generation on older compilers but not needed anymore as described here:
https://www.mail-archive.com/haproxy@formilux.org/msg36392.html ]
We used to special-case the likely()/unlikely() macros for a series of
early gcc 4.x compilers which used to produce very bad code when using
__builtin_expect(x,1), which basically used to build an integer (0 or 1)
from a condition then compare it to integer 1. This was already fixed in
5.x, but even now, looking at the code produced by various flavors of 4.x
this bad behavior couldn't be witnessed anymore. So let's consider it as
fixed by now, which will allow to get rid of some ugly tricks at some
specific places. A test on 4.7.4 shows that the code shrinks by about 3kB
now, thanks to some tests being inlined closer to the call place and the
unlikely case being moved to real functions. See the link below for more
background on this.
Link: https://www.mail-archive.com/haproxy@formilux.org/msg36392.html
These ones are irrelevant to the config but rather to the platform, and
as such are better placed in compiler.h.
Here we take the opportunity for declaring a few extra capabilities:
- HA_UNALIGNED : CPU supports unaligned accesses
- HA_UNALIGNED_LE : CPU supports unaligned accesses in little endian
- HA_UNALIGNED_FAST : CPU supports fast unaligned accesses
- HA_UNALIGNED_ATOMIC : CPU supports unaligned accesses in atomics
This will help remove a number of #ifdefs with arch-specific statements.
Add a function that finds a character in an ist and returns an
updated ist with the length of the portion of the original string
that doesn't contain the char.
Might be backported to 2.1
The function is_idchar() was added by commit 36f586b ("MINOR: tools:
add is_idchar() to tell if a char may belong to an identifier") to
ease matching of sample fetch/converter names. But it lacked support
for the '+' character used in "base32+src" and "url32+src". A quick
way to figure the list of supported sample fetch+converter names is
to issue the following command:
git grep '"[^"]*",.*SMP_T_.*SMP_USE_'|cut -f2 -d'"'|sort -u
No more entry is reported once searching for characters not covered
by is_idchar().
No backport is needed.
This does like chunk_strcpy() except that the maximum string length may
be limited by the caller. A trailing zero is always appended. This is
particularly handy to extract portions of strings to put into the trash
for use with libc functions requiring a nul-terminated string.
While looking for other occurrences of do { continue; } while (0) I
found these few leftovers in mini-clist where an outer loop was made
around "do { } while (0)" then another loop was placed inside just to
handle the continue. Let's clean this up by just removing the outer
one. Most of the patch is only the inner part of the loop that is
reindented. It was verified that the resulting code is the same.
the htx_append_msg() function can now be used to append an HTX message to
another one. All the message is copied or nothing. If an error occurs during the
copy, all changes are rolled back.
This patch is mandatory to fix a bug in http_reply_and_close() function. Be
careful to backport it first.
In __pool_get_first(), don't forget to unlock the pool lock if the pool is
empty, otherwise no writer will be able to take the lock, and as it is done
when reloading, it leads to an infinite loop on reload.
This should be backported with commit 04f5fe87d3
When using lockless pools, add a new rwlock, flush_pool. read-lock it when
getting memory from the pool, so that concurrenct access are still
authorized, but write-lock it when we're about to free memory, in
pool_flush() and pool_gc().
The problem is, when removing an item from the pool, we unreference it
to get the next one, however, that pointer may have been free'd in the
meanwhile, and that could provoke a crash if the pointer has been unmapped.
It should be OK to use a rwlock, as normal operations will still be able
to access the pool concurrently, and calls to pool_flush() and pool_gc()
should be pretty rare.
This should be backported to 2.1, 2.0 and 1.9.
"set ssl cert <filename> <payload>" CLI command should have the same
result as reload HAproxy with the updated pem file (<filename>).
Is not the case, DHparams/cert-chain is kept from the previous
context if no DHparams/cert-chain is set in the context (<payload>).
This patch should be backport to 2.1
For complex stick tables with many entries/columns, it can be beneficial
to filter using multiple criteria. The maximum number of filter entries
can be controlled by defining STKTABLE_FILTER_LEN during build time.
This patch can be backported to older releases.
while working on issue #429, I encountered build failures with various
non-released openssl versions, let us improve ssl defines, switch to
features, not versions, for EVP_CTRL_AEAD_SET_IVLEN and
EVP_CTRL_AEAD_SET_TAG.
No backport is needed as there is no valid reason to build a stable haproxy
version against a development version of openssl.
Wietse Venema reported in the thread below that we have a signedness
issue with our hashes implementations: due to the use of const char*
for the input key that's often text, the crc32, sdbm, djb2, and wt6
algorithms return a platform-dependent value for binary input keys
containing bytes with bit 7 set. This means that an ARM or PPC
platform will hash binary inputs differently from an x86 typically.
Worse, some algorithms are well defined in the industry (like CRC32)
and do not provide the expected result on x86, possibly causing
interoperability issues (e.g. a user-agent would fail to compare the
CRC32 of a message body against the one computed by haproxy).
Fortunately, and contrary to the first impression, the CRC32c variant
used in the PROXY protocol processing is not affected. Thus the impact
remains very limited (the vast majority of input keys are text-based,
such as user-agent headers for exmaple).
This patch addresses the issue by fixing all hash functions' prototypes
(even those not affected, for API consistency). A reg test will follow
in another patch.
The vast majority of users do not use these hashes. And among those
using them, very few will pass them on binary inputs. However, for the
rare ones doing it, this fix MAY have an impact during the upgrade. For
example if the package is upgraded on one LB then on another one, and
the CRC32 of a binary input is used as a stick table key (why?) then
these CRCs will not match between both nodes. Similarly, if
"hash-type ... crc32" is used, LB inconsistency may appear during the
transition. For this reason it is preferable to apply the patch on all
nodes using such hashes at the same time. Systems upgraded via their
distros will likely observe the least impact since they're expected to
be upgraded within a short time frame.
And it is important for distros NOT to skip this fix, in order to avoid
distributing an incompatible implementation of a hash. This is the
reason why this patch is tagged as MAJOR, eventhough it's extremely
unlikely that anyone will ever notice a change at all.
This patch must be backported to all supported branches since the
hashes were introduced in 1.5-dev20 (commit 98634f0c). Some parts
may be dropped since implemented later.
Link to Wietse's report:
https://marc.info/?l=postfix-users&m=157879464518535&w=2
SSL_CTX_set_ecdh_auto() is not defined when OpenSSL 1.1.1 is compiled
with the no-deprecated option. Remove existing, incomplete guards and
add a compatibility macro in openssl-compat.h, just as OpenSSL does:
bf4006a6f9/include/openssl/ssl.h (L1486)
This should be backported as far as 2.0 and probably even 1.9.
LIBRESSL_VERSION_NUMBER evaluates to 0 under OpenSSL, making the condition
always true. Check for the define before checking it.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
[wt: to be backported as far as 1.9]
Typically server line like:
'server-template srv 1-1000 *:443 ssl ca-file ca-certificates.crt'
load ca-certificates.crt 1000 times and stay duplicated in memory.
Same case for bind line: ca-file is loaded for each certificate.
Same 'ca-file' can be load one time only and stay deduplicated in
memory.
As a corollary, this will prevent file access for ca-file when
updating a certificate via CLI.