(Linux kernel commit eebcadfa21e66a893846ec168fb7c26a46a510cd).
Fix crc32c-pcl-intel-asm_64.S to access 32-bit arguments as 32-bit
values instead of 64-bit, since the upper bits of the corresponding
64-bit registers are not guaranteed to be zero. Also update the type of
the length argument to be unsigned int rather than int, as the assembly
code treats it as unsigned.
Note: there haven't been any reports of this bug actually causing
incorrect behavior. Neither gcc nor clang guarantee zero-extension to
64 bits, but zero-extension is likely to happen in practice because most
instructions that operate on 32-bit registers zero-extend to 64 bits.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David Sterba <dsterba@suse.com>
The accelerated crc32c needs to check for two CPU features, the crc32c
instructions is in SSE 4.2 and 'pclmulqdq' is a separate. There's still
old hardware used that does not have the PCLMUL instructions. Detect it
and make it the condition.
The pclmul is not supported on old compilers so also add a
configure-time detection and leave the SSE 4.2 only implementation as
the accelerated one if possible.
Issue: #676
Signed-off-by: David Sterba <dsterba@suse.com>
Copy faster implementation of crc32c from linux kernel as of 6.5-rc7
(x86_64, arch/x86/crypto/crc32c-pcl-intel-asm_64.S). This needs
assembler build support, so detect target architecture so
cross-compilation still works.
Add a special CPU flag so the old and new implementations can be
benchmarked and verified separately.
Sample benchmark:
CPU flags: 0x1ff
CPU features: SSE2 SSSE3 SSE41 SSE42 SHA AVX AVX2 CRC32C_PCL
Block size: 4096
Iterations: 1000000
Implementation: builtin
Units: CPU cycles
NULL-NOP: cycles: 77177218, cycles/i 77
NULL-MEMCPY: cycles: 226313072, cycles/i 226, 62133.395 MiB/s
CRC32C-ref: cycles: 24418596066, cycles/i 24418, 575.859 MiB/s
CRC32C-NI: cycles: 1188335920, cycles/i 1188, 11833.073 MiB/s
CRC32C-PCL: cycles: 463193456, cycles/i 463, 30358.037 MiB/s
XXHASH: cycles: 851606646, cycles/i 851, 16511.916 MiB/s
SHA256-ref: cycles: 74476234956, cycles/i 74476, 188.808 MiB/s
SHA256-NI: cycles: 34198637428, cycles/i 34198, 411.177 MiB/s
BLAKE2-ref: cycles: 14761411664, cycles/i 14761, 952.597 MiB/s
BLAKE2-SSE2: cycles: 18101896796, cycles/i 18101, 776.807 MiB/s
BLAKE2-SSE41: cycles: 12599091062, cycles/i 12599, 1116.087 MiB/s
BLAKE2-AVX2: cycles: 9668247506, cycles/i 9668, 1454.418 MiB/s
The new implementation is about 2.5x faster.
Note: there new version does not work on musl because of linkage
problems (relocations in .rodata), so it's still using the old
implementation.
Signed-off-by: David Sterba <dsterba@suse.com>
The crc32c selection has been already using the pointer-based approach
so drop the cpuid detection and use our common code for that.
Signed-off-by: David Sterba <dsterba@suse.com>
There are some stale headers that we don't need and the int types are
using the kernel types and pull kerncompat.h. As this is a basic header
that should minimize dependencies use the standard int types.
Signed-off-by: David Sterba <dsterba@suse.com>
Add the GPL v2 header to files where it was missing and is not from an
external source, update to the most recent version with the address.
Signed-off-by: David Sterba <dsterba@suse.com>
With the introduction of xxhash64 to btrfs-progs we created a crypto/
directory for all the hashes used in btrfs (although no
cryptographically secure hash is there yet).
Move the crc32c implementation from kernel-lib/ to crypto/ as well so we
have all hashes consolidated.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>