BUG/MINOR: xxhash: make sure armv6 uses memcpy()

There was a special case made to allow ARMv6 to use unaligned accesses via a cast in xxHash when __ARM_FEATURE_UNALIGNED is defined. But while ARMv6 (and v7) does support unaligned accesses, it's only for 32-bit pointers, not 64-bit ones, leading to bus errors when the compiler emits an ldrd instruction and the input (e.g. a pattern) is not aligned, as in issue #1035. Note that v7 was properly using the packed approach here and was safe, however haproxy versions 2.3 and older use the old r39 xxhash code which has the same issue for armv7. A slightly different fix is required there, by using a different definition of packed for 32 and 64 bits. The problem is really visible when running v7 code on a v8 kernel because such kernels do not implement alignment trap emulation, and the process dies when this happens. This is why in the issue above it was only detected under lxc. The emulation could have been disabled on v7 as well by writing zero to /proc/cpu/alignment though. This commit is a backport of xxhash commit a470f2ef ("update default memory access for armv6"). Thanks to @srkunze for the report and tests, @stgraber for his help on setting up an easy reproducer outside of lxc, and @Cyan4973 for the discussion around the best way to fix this. Details and alternate patches available on https://github.com/Cyan4973/xxHash/issues/490.
2025-01-22 05:22:58 +00:00 · 2021-02-04 17:02:39 +01:00 · 2021-02-04 17:02:39 +01:00 · 4acb99f867
commit 4acb99f867
parent a8979a9b59
1 changed files with 7 additions and 7 deletions
--- a/include/import/xxhash.h
+++ b/include/import/xxhash.h
@ -791,15 +791,15 @@ XXH_PUBLIC_API XXH128_hash_t XXH128(const void* data, size_t len, XXH64_hash_t s
 * Method 1:
 *     `__attribute__((packed))` statement. It depends on compiler extensions
 *     and is therefore not portable.
- *     This method is safe if your compiler supports it, and *generally* as
- *     fast or faster than `memcpy`.
+ *     This method is safe _if_ your compiler supports it,
+ *     and *generally* as fast or faster than `memcpy`.
 * Method 2:
 *     Direct access via cast. This method doesn't depend on the compiler but
 *     violates the C standard.
 *     It can generate buggy code on targets which do not support unaligned
 *     memory accesses.
 *     But in some circumstances, it's the only known way to get the most
- *     performance (example: GCC + ARMv6)
+ *     performance.
 * Method 3:
 *     Byteshift. This can generate the best code on old compilers which don't
 *     inline small `memcpy()` calls, and it might also be faster on big-endian
@ -808,10 +808,10 @@ XXH_PUBLIC_API XXH128_hash_t XXH128(const void* data, size_t len, XXH64_hash_t s
 * Prefer these methods in priority order (0 > 1 > 2 > 3)
 */
 #ifndef XXH_FORCE_MEMORY_ACCESS   /* can be defined externally, on command line for example */
-#  if !defined(__clang__) && defined(__GNUC__) && defined(__ARM_FEATURE_UNALIGNED) && defined(__ARM_ARCH) && (__ARM_ARCH == 6)
-#    define XXH_FORCE_MEMORY_ACCESS 2
-#  elif !defined(__clang__) && ((defined(__INTEL_COMPILER) && !defined(_WIN32)) || \
-  (defined(__GNUC__) && (defined(__ARM_ARCH) && __ARM_ARCH >= 7)))
+   /* prefer __packed__ structures (method 1) for gcc on armv7 and armv8 */
+#  if !defined(__clang__) && ( \
+    (defined(__INTEL_COMPILER) && !defined(_WIN32)) || \
+    (defined(__GNUC__) && (defined(__ARM_ARCH) && __ARM_ARCH >= 7)) )
 #    define XXH_FORCE_MEMORY_ACCESS 1
 #  endif
 #endif