btrfs-progs: crypto: add perf support to speed test

Use perf events to read the cycle count; this should work on all
architectures. The perf counter is enabled by the option --perf and
requires the sysctl kernel.perf_event_paranoid to be 0 or 1.
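
As an illustration of the approach described above, here is a minimal sketch of counting CPU cycles through perf_event_open(2). It is not the code added by this patch: the helper names (perf_cycles_open, perf_cycles_measure, cycles_per_iteration, nop_workload) are hypothetical, and it assumes Linux with glibc and a usable hardware PMU.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a hardware cycle counter for the calling
 * thread. Returns a file descriptor, or -1 on failure (for example when
 * kernel.perf_event_paranoid is too restrictive or no PMU is exposed).
 */
static int perf_cycles_open(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_HARDWARE;
	attr.size = sizeof(attr);
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.disabled = 1;	/* start stopped, enabled explicitly later */

	/* pid = 0, cpu = -1: this thread on any CPU; no group, no flags */
	return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/* Stand-in workload that does nothing, similar in spirit to NULL-NOP */
static void nop_workload(void *arg)
{
	(void)arg;
}

/*
 * Hypothetical helper: run fn(arg) 'iterations' times with the counter
 * enabled and store the total cycle count. Returns 0 on success, -1 on
 * failure.
 */
static int perf_cycles_measure(int fd, void (*fn)(void *), void *arg,
			       uint64_t iterations, uint64_t *cycles)
{
	assert(fn != NULL && cycles != NULL);

	if (ioctl(fd, PERF_EVENT_IOC_RESET, 0) < 0 ||
	    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) < 0)
		return -1;
	for (uint64_t i = 0; i < iterations; i++)
		fn(arg);
	if (ioctl(fd, PERF_EVENT_IOC_DISABLE, 0) < 0)
		return -1;
	/* Without a read_format set, read() returns one 64-bit count */
	if (read(fd, cycles, sizeof(*cycles)) != (ssize_t)sizeof(*cycles))
		return -1;
	return 0;
}

/* Integer average per iteration, as in the "perf_c/i" column */
static uint64_t cycles_per_iteration(uint64_t total, uint64_t iterations)
{
	return iterations ? total / iterations : 0;
}
```

A caller would open the descriptor once, call perf_cycles_measure() for each algorithm and print the total together with cycles_per_iteration(). The sketch enables the counter once around the whole loop, so the extra read and syscall cost is paid once per run; where exactly the speed test samples the counter is not shown here.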

On x86_64 the results are roughly the same as with raw cycles, but
somewhat worse because of the additional overhead (read, context
switch):

Block size: 4096
Iterations: 100000
Implementation: builtin
Units: CPU cycles
NULL-NOP: cycles: 42719688, cycles/i 427
NULL-MEMCPY: cycles: 72941208, cycles/i 729, 18670.314 MiB/s
CRC32C: cycles: 183709926, cycles/i 1837, 7413.009 MiB/s
XXHASH: cycles: 136727614, cycles/i 1367, 9960.264 MiB/s
SHA256: cycles: 10711594532, cycles/i 107115, 127.137 MiB/s
BLAKE2: cycles: 2256957529, cycles/i 22569, 603.398 MiB/s

Block size: 4096
Iterations: 100000
Implementation: builtin
Units: perf event: CPU cycles
NULL-NOP: perf_c: 29649530, perf_c/i 296
NULL-MEMCPY: perf_c: 59954062, perf_c/i 599, 15137.464 MiB/s
CRC32C: perf_c: 179009071, perf_c/i 1790, 6929.460 MiB/s
XXHASH: perf_c: 136413509, perf_c/i 1364, 9982.950 MiB/s
SHA256: perf_c: 10997356664, perf_c/i 109973, 127.046 MiB/s
BLAKE2: perf_c: 2379077576, perf_c/i 23790, 588.780 MiB/s

Signed-off-by: David Sterba <dsterba@suse.com>