There was a special case made to allow ARMv6 to use unaligned accesses
via a cast in xxHash when __ARM_FEATURE_UNALIGNED is defined. But while
ARMv6 (and v7) does support unaligned accesses, it's only for 32-bit
pointers, not 64-bit ones, leading to bus errors when the compiler emits
an ldrd instruction and the input (e.g. a pattern) is not aligned, as in
issue #1035.
Note that v7 was properly using the packed approach here and was safe,
however haproxy versions 2.3 and older use the old r39 xxhash code which
has the same issue for armv7. A slightly different fix is required there,
by using a different definition of packed for 32 and 64 bits.
The problem is really visible when running v7 code on a v8 kernel because
such kernels do not implement alignment trap emulation, and the process
dies when this happens. This is why in the issue above it was only detected
under lxc. The emulation could have been disabled on v7 as well by writing
zero to /proc/cpu/alignment though.
This commit is a backport of xxhash commit a470f2ef ("update default memory
access for armv6").
Thanks to @srkunze for the report and tests, @stgraber for his help on
setting up an easy reproducer outside of lxc, and @Cyan4973 for the
discussion around the best way to fix this. Details and alternate patches
available on https://github.com/Cyan4973/xxHash/issues/490.
This is from the output of codespell. It's done at once over a bunch
of files and only affects comments, so there is nothing user-visible.
No backport needed.
This way we make all xxhash functions inline, with implementations being
directly included within xxhash.h.
Makefile is updated as well, since we don't need to compile and link
xxhash.o anymore.
Inlining should improve performance on small data inputs.
A new XXH3 variant of hash functions shows a noticeable improvement in
performance (especially on small data), and also brings 128-bit support,
better inlining and streaming capabilities.
Performance comparison is available here:
https://github.com/Cyan4973/xxHash/wiki/Performance-comparison
As Ilya reported in issue #998, gcc 11 complains about misleading code
indentation which is in fact caused by dead assignments to zero after
a loop which stops on zero. Let's clean both of these.
As suggested by @AGSaidi in issue #958, on ARMv8 its convenient to use
an "isb" instruction in pl_cpu_relax() to improve fairness. Without it
I've met a few watchdog conditions on valid locks with 16 threads,
indicating that some threads couldn't manage to get it in 2 seconds. I
never happened again with it. In addition, the performance increased
by slightly more than 5% thanks to the reduced contention.
This should be backported as far as 2.2, possibly even 2.0.
As reported in issue #689, there is a subtle bug in the ebtree code used
to compared memory blocks. It stems from the platform-dependent memcmp()
implementation. Original implementations used to perform a byte-per-byte
comparison and to stop at the first non-matching byte, as in this old
example:
https://www.retro11.de/ouxr/211bsd/usr/src/lib/libc/compat-sys5/memcmp.c.html
The ebtree code has been relying on this to detect the first non-matching
byte when comparing keys. This is made so that a zero-terminated string
can fail to match against a longer string.
Over time, especially with large busses and SIMD instruction sets,
multi-byte comparisons have appeared, making the processor fetch bytes
past the first different byte, which could possibly be a trailing zero.
This means that it's possible to read past the allocated area for a
string if it was allocated by strdup().
This is not correct and definitely confuses address sanitizers. In real
life the problem doesn't have visible consequences. Indeed, multi-byte
comparisons are implemented so that aligned words are loaded (e.g. 512
bits at once to process a cache line at a time). So there is no way such
a multi-byte access will cross a page boundary and end up reading from
an unallocated zone. This is why it was never noticed before.
This patch addresses this by implementing a one-byte-at-a-time memcmp()
variant for ebtree, called eb_memcmp(). It's optimized for both small and
long strings and guarantees to stop after the first non-matching byte. It
only needs 5 instructions in the loop and was measured to be 3.2 times
faster than the glibc's AVX2-optimized memcmp() on short strings (1 to
257 bytes), since that latter one comes with a significant setup cost.
The break-even seems to be at 512 bytes where both version perform
equally, which is way longer than what's used in general here.
This fix should be backported to stable versions and reintegrated into
the ebtree code.
Fortunately that file wasn't made dependent upon haproxy since it was
integrated, better isolate it before it's too late. Its dependency on
api.h was the result of the change from config.h, which in turn wasn't
correct. It was changed back to stddef.h for size_t and sys/types.h for
ssize_t. The recently added reference to MAX() was changed as it was
placed only to avoid a zero length in the non-free-standing version and
was causing a build warning in the hpack encoder.
All files that were including one of the following include files have
been updated to only include haproxy/api.h or haproxy/api-t.h once instead:
- common/config.h
- common/compat.h
- common/compiler.h
- common/defaults.h
- common/initcall.h
- common/tools.h
The choice is simple: if the file only requires type definitions, it includes
api-t.h, otherwise it includes the full api.h.
In addition, in these files, explicit includes for inttypes.h and limits.h
were dropped since these are now covered by api.h and api-t.h.
No other change was performed, given that this patch is large and
affects 201 files. At least one (tools.h) was already freestanding and
didn't get the new one added.
This is where other imported components are located. All files which
used to directly include ebtree were touched to update their include
path so that "import/" is now prefixed before the ebtree-related files.
The ebtree.h file was slightly adjusted to read compiler.h from the
common/ subdirectory (this is the only change).
A build issue was encountered when eb32sctree.h is loaded before
eb32tree.h because only the former checks for the latter before
defining type u32. This was addressed by adding the reverse ifdef
in eb32tree.h.
No further cleanup was done yet in order to keep changes minimal.
[ plock commit 4c53fd3a0b2b1892817cebd0db012a52f4087850 ]
Pieter Baauw reported a build issue affecting haproxy after plock was
included. It happens that expressions of the form :
if ((const) ? (expr1) : (expr2))
do_something()
always produce code for both expr1 and expr2 on Clang when building
without optimization. The resulting asm code is even funny, basically
doing :
mov reg, 1
cmp reg, 1
...
This causes our sizeof() tests to fail to build because we purposely
dereference a fake function that reports the location and nature of the
inconsistency, but this fake function appears in the object code despite
all conditions being there to avoid it.
However the compiler is still smart enough to optimize away code doing
if (const)
do_something()
So we simply repeat the condition before do_something(), and the dummy
function is not referenced anymore unless really required.
[ plock commit 61e255286ae32e83e1a3174dd7c49eda99880a8b]
There are a few inlines such as pl_barrier() and pl_cpu_relax() which
are used a lot. Unfortunately, while building test code at -O0, inlining
is disabled and these ones are called a lot and show up a lot in any
profile, are traced into when single-stepping with a debugger, etc, thus
they are polluting the landscape. Since they're single-asm statements,
there is no reason for not turning them into macros.
The result becomes fairly visible here at -O0 :
$ size latency.inline latency.macro
text data bss dec hex filename
11431 692 656 12779 31eb treelock.inline
10967 692 656 12315 301b treelock.macro
And it was verified that regularly optimized code remains strictly identical.
[ plock commit 44081ea493dd78dab48076980e881748e9b33db5 ]
Older compilers (eg: gcc 3.4) don't provide __sync_synchronize() so let's
do it by hand on this platform.
[ plock commit b155d5c762fb9a9793911881f80e61faa6b0e889 ]
Local variables "l", "i" and "ret" were renamed "__pl_l", "__pl_i" and
"__pl_r" respectively, to limit the risk of conflicts with existing
variables in application code.
[ plock commit bfac5887ebabb8ef753b0351f162265767eb219b ]
Local variable "t" was renamed "__pl_t" to limit the risk of conflicts
with existing variables in application code.
This is based on the git SHA1 implementation and optimized to do word
accesses rather than byte accesses, and to avoid unnecessary copies into
the context array.
We replaced global.deviceatlas with global_deviceatlas since there's no need
to store all this into the global section. This removes the last #ifdefs,
and now the code is 100% self-contained in da.c. The file da.h was now
removed because it was only used to load dac.h, which is more easily
loaded directly from da.c. It provides another good example of how to
integrate code in the future without touching the core parts.
We replaced global._51degrees with global_51degrees since there's no need
to store all this into the global section. This removes the last #ifdefs,
and now the code is 100% self-contained in 51d.c. The file 51d.h was now
removed because it was only used to load 51Degrees.h, which is more easily
loaded from 51d.c. It provides a good example of how to integrate code in
the future without touching the core parts.
deinit_51degrees() is not called anymore from haproxy.c, removing
2 #ifdefs and one include. The function was made static. The include
file still includes 51Degrees.h which is needed by global.h and 51d.c
so it was not touched beyond this last function removal.
Instead of having a #ifdef in the main init code we now use the registered
init functions. Doing so also enables error checking as errors were previously
reported as alerts but ignored. Also they were incorrect as the 'status'
variable was hidden by a second one and was always reporting DA_SYS (which
is apparently an error) in every case including the case where no file was
loaded. The init_deviceatlas() function was unexported since it's not used
outside of this place anymore.
This removes some #ifdefs from the main haproxy code path. Function
init_51degrees() now returns ERR_* instead of exit(1) on error, and
this function was made static and is not exported anymore.
The only reason wurfl/wurfl.h was needed outside of wurfl.c was to expose
wurfl_handle which is a pointer to a structure, referenced by global.h.
By just storing a void* there instead, we can confine all wurfl code to
wurfl.c, which is really nice.
WURFL is a high-performance and low-memory footprint mobile device
detection software component that can quickly and accurately detect
over 500 capabilities of visiting devices. It can differentiate between
portable mobile devices, desktop devices, SmartTVs and any other types
of devices on which a web browser can be installed.
In order to add WURFL device detection support, you would need to
download Scientiamobile InFuze C API and install it on your system.
Refer to www.scientiamobile.com to obtain a valid InFuze license.
Any useful information on how to configure HAProxy working with WURFL
may be found in:
doc/WURFL-device-detection.txt
doc/configuration.txt
examples/wurfl-example.cfg
Please find more information about WURFL device detection API detection
at https://docs.scientiamobile.com/documentation/infuze/infuze-c-api-user-guide
Introduction of a new function in the LRU cache source file.
Purpose of this function is to be used to delete a number of entries in
the cache. 'number' is defined by the caller and the key removed are
taken at the tail of the tree
This was the first transparent proxy technology supported by haproxy
circa 2005 but it was obsoleted in 2007 by Tproxy 4.0 which removed a
lot of the earlier versions' shortcomings and was finally merged into
the kernel. Since nobody has been using cttproxy for many years now
and nobody has even just tried to compile the files, it's time to
remove it. The doc was updated as well.
Moved 51Degrees code from src/haproxy.c, src/sample.c and src/cfgparse.c
into a separate files src/51d.c and include/import/51d.h.
Added two new functions init_51degrees() and deinit_51degrees(), updated
Makefile and other code reorganizations related to 51Degrees.
It lookup a key in a LRU cache for use with specified domain and revision. It
differs from lru64_get as it does not create missing keys. The function returns
NULL if an error or a cache miss occurs.
Now, When a item is committed in an LRU tree, you can define a function to free
data owned by this item. This function will be called when the item is removed
from the LRU tree or when the tree is destroyed..
This diff is for the DeviceAtlas convertor.
This patch adds the following converters :
deviceatlas-json-file
deviceatlas-log-level
deviceatlas-property-separator
First, the configuration keywords handling (only the log
level configuration part does not end the haproxy process
if it is wrongly set, it fallbacks to the default level).
Furthermore, init, deinit phases and the API lookup phase,
the da_haproxy function which is fed by the input provided
and set all necessary properties chosen via the configuration
to the output, separated by the separator.
The xxhash library provides a very fast and excellent hash algorithm
suitable for many purposes. It excels at hashing large blocks but is
also extremely fast on small ones. It's distributed under a 2-clause
BSD license (GPL-compatible) so it can be included here. Updates are
distributed here :
https://github.com/Cyan4973/xxHash
This will be usable to implement some maps/acl caches for heavy datasets
loaded from files (mostly regex-based but in general anything that cannot
be indexed in a tree).
The ultree code has been removed in favor of a simpler and
cleaner ebtree implementation. The eternity queue does not
need to exist anymore, and the pool_tree64 has been removed.
The ebtree node is stored in the task itself. The qlist list
header is still used by the run-queue, but will be able to
disappear once the run-queue uses ebtree too.