The code used to rely on BITS_PER_LONG to decide on the most efficient
way to perform a 64-bit shift, but this macro is not defined (at best
it's __BITS_PER_LONG) and it's likely that it's been like this since
the early implementation of ebtrees designed on i386. Let's remove the
test on this macro and rely on sizeof(long) instead, it also has the
benefit of letting the compiler validate the two branches.
This can be backported to all versions. Thanks to Ezequiel Garcia for
reporting this one in issue #1369.
The first item inserted into an ebtree will be inserted directly below
the root, which is a simple struct eb_root which only holds two branch
pointers (left and right).
If we try to find a duplicated entry to this first leaf through a
ebmb_next_dup, our leaf_p pointer will point to the eb_root instead of a
complete eb_node so we cannot look for the bit part of our leaf_p since
it would try to cast our eb_root into an eb_node and perform an out of
bounds access when reading "eb_root_to_node(eb_untag(t,EB_LEFT)))->bit".
This bug was found by address sanitizer running on a CRL hot update VTC
test.
Note that the bug has been there since the import of the eb_next_dup()
and eb_prev_dup() function in 1.5-dev19 by commit 2b5702030 ("MINOR:
ebtree: add new eb_next_dup/eb_prev_dup() functions to visit duplicates").
It can be backported to all stable branches.
stdint.h is not as portable as inttypes.h. It doesn't exist at least
on AIX 5.1 and Solaris 7, while inttypes.h is present there and does
include stdint.h on platforms supporting it.
This is equivalent to libslz upstream commit e36710a ("slz: use
inttypes.h instead of stdint.h")
On ARM with native CRC support, no need to inflate the executable with
a 4kB CRC table, let's just drop it.
This is slz upstream commit d8715db20b2968d1f3012a734021c0978758f911.
As we now embed the library we don't need to support the older 1.0 API
any more, so we can remove the explicit calls to slz_make_crc_table()
and slz_prepare_dist_table().
SLZ is rarely packaged by distros and there have been complaints about
the CPU and memory usage of ZLIB, leading to some suggestions to better
address the issue by simply integrating SLZ into the tree (just 3 files).
See discussions below:
https://www.mail-archive.com/haproxy@formilux.org/msg38037.htmlhttps://www.mail-archive.com/haproxy@formilux.org/msg40079.htmlhttps://www.mail-archive.com/haproxy@formilux.org/msg40365.html
This patch does just this, after minor adjustments to these files:
- tables.h was renamed to slz-tables.h
- tables.h had the precomputed tables removed since not used here
- slz.c uses includes <import/slz*> instead of "slz*.h"
The slz commit imported here was b06c172 ("slz: avoid a build warning
with -Wimplicit-fallthrough"). No other change was performed either to
SLZ nor to haproxy at this point so that this operation may be replicated
if needed for a future version.
This library is required for the subsequent patch which adds
the JSON query possibility.
It is necessary to change the include statement in "src/mjson.c"
because the imported includes in haproxy are in "include/import"
orig: #include "mjson.h"
new: #include <import/mjson.h>
This argument is not being used inside the function (and the functions
themselves are unused as well) and not documented. Its purpose is not clear.
Just remove it.
There was a special case made to allow ARMv6 to use unaligned accesses
via a cast in xxHash when __ARM_FEATURE_UNALIGNED is defined. But while
ARMv6 (and v7) does support unaligned accesses, it's only for 32-bit
pointers, not 64-bit ones, leading to bus errors when the compiler emits
an ldrd instruction and the input (e.g. a pattern) is not aligned, as in
issue #1035.
Note that v7 was properly using the packed approach here and was safe,
however haproxy versions 2.3 and older use the old r39 xxhash code which
has the same issue for armv7. A slightly different fix is required there,
by using a different definition of packed for 32 and 64 bits.
The problem is really visible when running v7 code on a v8 kernel because
such kernels do not implement alignment trap emulation, and the process
dies when this happens. This is why in the issue above it was only detected
under lxc. The emulation could have been disabled on v7 as well by writing
zero to /proc/cpu/alignment though.
This commit is a backport of xxhash commit a470f2ef ("update default memory
access for armv6").
Thanks to @srkunze for the report and tests, @stgraber for his help on
setting up an easy reproducer outside of lxc, and @Cyan4973 for the
discussion around the best way to fix this. Details and alternate patches
available on https://github.com/Cyan4973/xxHash/issues/490.
This is from the output of codespell. It's done at once over a bunch
of files and only affects comments, so there is nothing user-visible.
No backport needed.
This way we make all xxhash functions inline, with implementations being
directly included within xxhash.h.
Makefile is updated as well, since we don't need to compile and link
xxhash.o anymore.
Inlining should improve performance on small data inputs.
A new XXH3 variant of hash functions shows a noticeable improvement in
performance (especially on small data), and also brings 128-bit support,
better inlining and streaming capabilities.
Performance comparison is available here:
https://github.com/Cyan4973/xxHash/wiki/Performance-comparison
As Ilya reported in issue #998, gcc 11 complains about misleading code
indentation which is in fact caused by dead assignments to zero after
a loop which stops on zero. Let's clean both of these.
As suggested by @AGSaidi in issue #958, on ARMv8 its convenient to use
an "isb" instruction in pl_cpu_relax() to improve fairness. Without it
I've met a few watchdog conditions on valid locks with 16 threads,
indicating that some threads couldn't manage to get it in 2 seconds. I
never happened again with it. In addition, the performance increased
by slightly more than 5% thanks to the reduced contention.
This should be backported as far as 2.2, possibly even 2.0.
As reported in issue #689, there is a subtle bug in the ebtree code used
to compared memory blocks. It stems from the platform-dependent memcmp()
implementation. Original implementations used to perform a byte-per-byte
comparison and to stop at the first non-matching byte, as in this old
example:
https://www.retro11.de/ouxr/211bsd/usr/src/lib/libc/compat-sys5/memcmp.c.html
The ebtree code has been relying on this to detect the first non-matching
byte when comparing keys. This is made so that a zero-terminated string
can fail to match against a longer string.
Over time, especially with large busses and SIMD instruction sets,
multi-byte comparisons have appeared, making the processor fetch bytes
past the first different byte, which could possibly be a trailing zero.
This means that it's possible to read past the allocated area for a
string if it was allocated by strdup().
This is not correct and definitely confuses address sanitizers. In real
life the problem doesn't have visible consequences. Indeed, multi-byte
comparisons are implemented so that aligned words are loaded (e.g. 512
bits at once to process a cache line at a time). So there is no way such
a multi-byte access will cross a page boundary and end up reading from
an unallocated zone. This is why it was never noticed before.
This patch addresses this by implementing a one-byte-at-a-time memcmp()
variant for ebtree, called eb_memcmp(). It's optimized for both small and
long strings and guarantees to stop after the first non-matching byte. It
only needs 5 instructions in the loop and was measured to be 3.2 times
faster than the glibc's AVX2-optimized memcmp() on short strings (1 to
257 bytes), since that latter one comes with a significant setup cost.
The break-even seems to be at 512 bytes where both version perform
equally, which is way longer than what's used in general here.
This fix should be backported to stable versions and reintegrated into
the ebtree code.
Fortunately that file wasn't made dependent upon haproxy since it was
integrated, better isolate it before it's too late. Its dependency on
api.h was the result of the change from config.h, which in turn wasn't
correct. It was changed back to stddef.h for size_t and sys/types.h for
ssize_t. The recently added reference to MAX() was changed as it was
placed only to avoid a zero length in the non-free-standing version and
was causing a build warning in the hpack encoder.
All files that were including one of the following include files have
been updated to only include haproxy/api.h or haproxy/api-t.h once instead:
- common/config.h
- common/compat.h
- common/compiler.h
- common/defaults.h
- common/initcall.h
- common/tools.h
The choice is simple: if the file only requires type definitions, it includes
api-t.h, otherwise it includes the full api.h.
In addition, in these files, explicit includes for inttypes.h and limits.h
were dropped since these are now covered by api.h and api-t.h.
No other change was performed, given that this patch is large and
affects 201 files. At least one (tools.h) was already freestanding and
didn't get the new one added.
This is where other imported components are located. All files which
used to directly include ebtree were touched to update their include
path so that "import/" is now prefixed before the ebtree-related files.
The ebtree.h file was slightly adjusted to read compiler.h from the
common/ subdirectory (this is the only change).
A build issue was encountered when eb32sctree.h is loaded before
eb32tree.h because only the former checks for the latter before
defining type u32. This was addressed by adding the reverse ifdef
in eb32tree.h.
No further cleanup was done yet in order to keep changes minimal.
[ plock commit 4c53fd3a0b2b1892817cebd0db012a52f4087850 ]
Pieter Baauw reported a build issue affecting haproxy after plock was
included. It happens that expressions of the form :
if ((const) ? (expr1) : (expr2))
do_something()
always produce code for both expr1 and expr2 on Clang when building
without optimization. The resulting asm code is even funny, basically
doing :
mov reg, 1
cmp reg, 1
...
This causes our sizeof() tests to fail to build because we purposely
dereference a fake function that reports the location and nature of the
inconsistency, but this fake function appears in the object code despite
all conditions being there to avoid it.
However the compiler is still smart enough to optimize away code doing
if (const)
do_something()
So we simply repeat the condition before do_something(), and the dummy
function is not referenced anymore unless really required.
[ plock commit 61e255286ae32e83e1a3174dd7c49eda99880a8b]
There are a few inlines such as pl_barrier() and pl_cpu_relax() which
are used a lot. Unfortunately, while building test code at -O0, inlining
is disabled and these ones are called a lot and show up a lot in any
profile, are traced into when single-stepping with a debugger, etc, thus
they are polluting the landscape. Since they're single-asm statements,
there is no reason for not turning them into macros.
The result becomes fairly visible here at -O0 :
$ size latency.inline latency.macro
text data bss dec hex filename
11431 692 656 12779 31eb treelock.inline
10967 692 656 12315 301b treelock.macro
And it was verified that regularly optimized code remains strictly identical.
[ plock commit 44081ea493dd78dab48076980e881748e9b33db5 ]
Older compilers (eg: gcc 3.4) don't provide __sync_synchronize() so let's
do it by hand on this platform.
[ plock commit b155d5c762fb9a9793911881f80e61faa6b0e889 ]
Local variables "l", "i" and "ret" were renamed "__pl_l", "__pl_i" and
"__pl_r" respectively, to limit the risk of conflicts with existing
variables in application code.
[ plock commit bfac5887ebabb8ef753b0351f162265767eb219b ]
Local variable "t" was renamed "__pl_t" to limit the risk of conflicts
with existing variables in application code.
This is based on the git SHA1 implementation and optimized to do word
accesses rather than byte accesses, and to avoid unnecessary copies into
the context array.
We replaced global.deviceatlas with global_deviceatlas since there's no need
to store all this into the global section. This removes the last #ifdefs,
and now the code is 100% self-contained in da.c. The file da.h was now
removed because it was only used to load dac.h, which is more easily
loaded directly from da.c. It provides another good example of how to
integrate code in the future without touching the core parts.
We replaced global._51degrees with global_51degrees since there's no need
to store all this into the global section. This removes the last #ifdefs,
and now the code is 100% self-contained in 51d.c. The file 51d.h was now
removed because it was only used to load 51Degrees.h, which is more easily
loaded from 51d.c. It provides a good example of how to integrate code in
the future without touching the core parts.
deinit_51degrees() is not called anymore from haproxy.c, removing
2 #ifdefs and one include. The function was made static. The include
file still includes 51Degrees.h which is needed by global.h and 51d.c
so it was not touched beyond this last function removal.
Instead of having a #ifdef in the main init code we now use the registered
init functions. Doing so also enables error checking as errors were previously
reported as alerts but ignored. Also they were incorrect as the 'status'
variable was hidden by a second one and was always reporting DA_SYS (which
is apparently an error) in every case including the case where no file was
loaded. The init_deviceatlas() function was unexported since it's not used
outside of this place anymore.
This removes some #ifdefs from the main haproxy code path. Function
init_51degrees() now returns ERR_* instead of exit(1) on error, and
this function was made static and is not exported anymore.
The only reason wurfl/wurfl.h was needed outside of wurfl.c was to expose
wurfl_handle which is a pointer to a structure, referenced by global.h.
By just storing a void* there instead, we can confine all wurfl code to
wurfl.c, which is really nice.
WURFL is a high-performance and low-memory footprint mobile device
detection software component that can quickly and accurately detect
over 500 capabilities of visiting devices. It can differentiate between
portable mobile devices, desktop devices, SmartTVs and any other types
of devices on which a web browser can be installed.
In order to add WURFL device detection support, you would need to
download Scientiamobile InFuze C API and install it on your system.
Refer to www.scientiamobile.com to obtain a valid InFuze license.
Any useful information on how to configure HAProxy working with WURFL
may be found in:
doc/WURFL-device-detection.txt
doc/configuration.txt
examples/wurfl-example.cfg
Please find more information about WURFL device detection API detection
at https://docs.scientiamobile.com/documentation/infuze/infuze-c-api-user-guide
Introduction of a new function in the LRU cache source file.
Purpose of this function is to be used to delete a number of entries in
the cache. 'number' is defined by the caller and the key removed are
taken at the tail of the tree
This was the first transparent proxy technology supported by haproxy
circa 2005 but it was obsoleted in 2007 by Tproxy 4.0 which removed a
lot of the earlier versions' shortcomings and was finally merged into
the kernel. Since nobody has been using cttproxy for many years now
and nobody has even just tried to compile the files, it's time to
remove it. The doc was updated as well.
Moved 51Degrees code from src/haproxy.c, src/sample.c and src/cfgparse.c
into a separate files src/51d.c and include/import/51d.h.
Added two new functions init_51degrees() and deinit_51degrees(), updated
Makefile and other code reorganizations related to 51Degrees.