/*
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public
 * License v2 as published by the Free Software Foundation.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * General Public License for more details.
 *
 * You should have received a copy of the GNU General Public
 * License along with this program; if not, write to the
 * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
 * Boston, MA 02111-1307, USA.
 */

#include "../kerncompat.h"
btrfs-progs: crypto: add time-based measurement to hash-speedtest
People are interested in measuring the hash performance on non-x86_64
architectures. Add option to do time-based measurements (in nanoseconds)
in case there's no support for clock-based measurements.
$ ./hash-speedtest --cycles
Block size: 4096
Iterations: 100000
Implementation: builtin
Units: cycles
NULL-NOP: cycles: 43035633, cycles/i 430
NULL-MEMCPY: cycles: 72478624, cycles/i 724
CRC32C: cycles: 181712982, cycles/i 1817
XXHASH: cycles: 136251305, cycles/i 1362
SHA256: cycles: 10758567410, cycles/i 107585
BLAKE2b: cycles: 2249704806, cycles/i 22497
$ ./hash-speedtest --time
Block size: 4096
Iterations: 100000
Implementation: builtin
Units: nsecs
NULL-NOP: nsecs: 12459033, nsecs/i 124
NULL-MEMCPY: nsecs: 20687845, nsecs/i 206
CRC32C: nsecs: 52648264, nsecs/i 526
XXHASH: nsecs: 39591766, nsecs/i 395
SHA256: nsecs: 3079668837, nsecs/i 30796
BLAKE2b: nsecs: 644766582, nsecs/i 6447
Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-26 18:22:55 +00:00
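The time-based mode described in the commit message above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tool's actual measurement loop: `now_ns`, `work_fn`, `bump_counter` and `time_per_iteration` are hypothetical names introduced here, and the only external dependency is POSIX `clock_gettime` with `CLOCK_MONOTONIC`.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: current monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	/* Widen before multiplying so a 32-bit time_t cannot overflow. */
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Hypothetical callback type standing in for one hash invocation. */
typedef void (*work_fn)(void *arg);

/* Hypothetical workload: increments the counter passed in arg. */
static void bump_counter(void *arg)
{
	(*(unsigned int *)arg)++;
}

/*
 * Run fn(arg) the given number of times and return the average number
 * of nanoseconds per iteration, using integer division in the same way
 * as the "nsecs/i" column in the output above.
 */
static uint64_t time_per_iteration(work_fn fn, void *arg,
				   unsigned int iterations)
{
	uint64_t start, end;
	unsigned int i;

	if (iterations == 0)
		return 0;

	start = now_ns();
	for (i = 0; i < iterations; i++)
		fn(arg);
	end = now_ns();

	return (end - start) / iterations;
}
```

A caller would pass its own workload, for example `time_per_iteration(bump_counter, &counter, 100000)`. A monotonic clock is used here so that wall-clock adjustments during the run do not distort the result.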
#include <time.h>
#include <getopt.h>
btrfs-progs: crypto: add perf support to speed test
Use perf events to read the cycle count, this should work on all
architectures. Enabled by option --perf and the sysctl
kernel.perf_event_paranoid must be 0 or 1.
The results are roughly the same as for raw cycles on x86_64 but worse
because of the additional overhead (read, context switch):
Block size: 4096
Iterations: 100000
Implementation: builtin
Units: CPU cycles
NULL-NOP: cycles: 42719688, cycles/i 427
NULL-MEMCPY: cycles: 72941208, cycles/i 729, 18670.314 MiB/s
CRC32C: cycles: 183709926, cycles/i 1837, 7413.009 MiB/s
XXHASH: cycles: 136727614, cycles/i 1367, 9960.264 MiB/s
SHA256: cycles: 10711594532, cycles/i 107115, 127.137 MiB/s
BLAKE2: cycles: 2256957529, cycles/i 22569, 603.398 MiB/s
Block size: 4096
Iterations: 100000
Implementation: builtin
Units: perf event: CPU cycles
NULL-NOP: perf_c: 29649530, perf_c/i 296
NULL-MEMCPY: perf_c: 59954062, perf_c/i 599, 15137.464 MiB/s
CRC32C: perf_c: 179009071, perf_c/i 1790, 6929.460 MiB/s
XXHASH: perf_c: 136413509, perf_c/i 1364, 9982.950 MiB/s
SHA256: perf_c: 10997356664, perf_c/i 109973, 127.046 MiB/s
BLAKE2: perf_c: 2379077576, perf_c/i 23790, 588.780 MiB/s
Signed-off-by: David Sterba <dsterba@suse.com>
2021-06-01 19:41:53 +00:00
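The perf-based mode described in the commit message above can be sketched as follows. This is a minimal illustration, not the tool's exact code: `open_cycle_counter` and `read_cycle_counter` are hypothetical helper names, and the sketch assumes a Linux system with the kernel headers installed. When the `kernel.perf_event_paranoid` setting or the hardware does not permit the counter, the open call is expected to fail with `errno` set, so callers should check the returned descriptor.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <linux/perf_event.h>

/*
 * Hypothetical helper: open a hardware CPU cycle counter for the
 * calling thread. Returns a file descriptor, or -1 with errno set.
 */
static int open_cycle_counter(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;

	/* pid 0: this thread, cpu -1: any CPU, group_fd -1, flags 0 */
	return (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}

/*
 * Hypothetical helper: read the current counter value.
 * Returns 0 on success, -1 if the read failed or was short.
 */
static int read_cycle_counter(int fd, long long *cycles)
{
	ssize_t ret = read(fd, cycles, sizeof(*cycles));

	return ret == (ssize_t)sizeof(*cycles) ? 0 : -1;
}
```

Reading the counter before and after the measured loop and subtracting the two values gives the cycle count. Each read is a system call, which is consistent with the extra overhead the commit message mentions.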
#include <unistd.h>
#if HAVE_LINUX_PERF_EVENT_H == 1 && HAVE_LINUX_HW_BREAKPOINT_H == 1
#include <linux/perf_event.h>
#include <linux/hw_breakpoint.h>
#include <sys/syscall.h>
#define HAVE_PERF
#endif
#include "crypto/hash.h"
#include "crypto/crc32c.h"
#include "crypto/sha.h"
#include "crypto/blake2.h"
#include "common/messages.h"
#include "common/cpu-utils.h"

#ifdef __x86_64__
static const int cycles_supported = 1;
#else
static const int cycles_supported = 0;
#endif

enum {
	UNITS_CYCLES,
	UNITS_TIME,
	UNITS_PERF,
};

const int blocksize = 4096;
int iterations = 100000;

#ifdef __x86_64__
static __always_inline unsigned long long rdtsc(void)
{
	unsigned low, high;

	asm volatile("rdtsc" : "=a" (low), "=d" (high));

	return (low | ((u64)(high) << 32));
}

static inline u64 read_tsc(void)
{
	asm volatile("mfence");
	return rdtsc();
}

#define cpu_cycles()	read_tsc()

#else

#define cpu_cycles()	(0)

#endif

#ifdef HAVE_PERF

static int perf_fd = -1;
static int perf_init(void)
{
	static struct perf_event_attr attr = {
		.type = PERF_TYPE_HARDWARE,
		.config = PERF_COUNT_HW_CPU_CYCLES
	};

	perf_fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	return perf_fd;
}

static void perf_finish(void)
{
	close(perf_fd);
}

static long long perf_cycles(void)
{
	long long cycles;
	int ret;

	ret = read(perf_fd, &cycles, sizeof(cycles));
	if (ret != sizeof(cycles))
		return 0;
	return cycles;
}

#else
static int perf_init(void)
{
	errno = EOPNOTSUPP;
	return -1;
}
static void perf_finish(void) {}
static long long perf_cycles(void)
{
	return 0;
}
#endif

static inline u64 get_time(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	/* Widen before multiplying so a 32-bit time_t cannot overflow */
	return (u64)ts.tv_sec * 1000 * 1000 * 1000 + ts.tv_nsec;
}

static inline u64 get_cycles(int units)
{
	switch (units) {
	case UNITS_CYCLES: return cpu_cycles();
	case UNITS_TIME: return get_time();
	case UNITS_PERF: return perf_cycles();
	}
	return 0;
}

/* Read the input and copy last bytes as the hash */
static int hash_null_memcpy(const u8 *buf, size_t length, u8 *out)
{
	const u8 *end = buf + length;

	while (buf + CRYPTO_HASH_SIZE_MAX < end) {
		memcpy(out, buf, CRYPTO_HASH_SIZE_MAX);
		buf += CRYPTO_HASH_SIZE_MAX;
	}

	return 0;
}

/* Test overhead of the calls */
static int hash_null_nop(const u8 *buf, size_t length, u8 *out)
{
	memset(out, 0xFF, CRYPTO_HASH_SIZE_MAX);

	return 0;
}

static const char *units_to_desc(int units)
{
	switch (units) {
	case UNITS_CYCLES: return "CPU cycles";
	case UNITS_TIME: return "time: ns";
	case UNITS_PERF: return "perf event: CPU cycles";
	}
	return "unknown";
}

static const char *units_to_str(int units)
{
	switch (units) {
	case UNITS_CYCLES: return "cycles";
	case UNITS_TIME: return "nsecs";
	case UNITS_PERF: return "perf_c";
	}
	return "unknown";
}

int main(int argc, char **argv) {
	u8 buf[blocksize];
	u8 hash[32];
	int idx;
	int iter;
	struct contestant {
		char name[16];
		int (*digest)(const u8 *buf, size_t length, u8 *out);
		int digest_size;
		u64 cycles;
btrfs-progs: crypto: print throughput in hash-speedtest
Calculate the estimated throughput as a number that's comparable across
machines.
$ ./hash-speedtest --cycles
Block size: 4096
Iterations: 100000
Implementation: builtin
Units: cycles
NULL-NOP: cycles: 42928902, cycles/i 429
NULL-MEMCPY: cycles: 73014868, cycles/i 730, 18651.186 MiB/s
CRC32C: cycles: 182293290, cycles/i 1822, 7470.579 MiB/s
XXHASH: cycles: 138085981, cycles/i 1380, 9862.272 MiB/s
SHA256: cycles: 10576270837, cycles/i 105762, 128.764 MiB/s
BLAKE2b: cycles: 2263761293, cycles/i 22637, 601.585 MiB/s
$ ./hash-speedtest --time
Block size: 4096
Iterations: 100000
Implementation: builtin
Units: nsecs
NULL-NOP: nsecs: 12164607, nsecs/i 121
NULL-MEMCPY: nsecs: 20423641, nsecs/i 204, 19095.518 MiB/s
CRC32C: nsecs: 51972794, nsecs/i 519, 7503.926 MiB/s
XXHASH: nsecs: 38935164, nsecs/i 389, 10016.651 MiB/s
SHA256: nsecs: 3030944497, nsecs/i 30309, 128.673 MiB/s
BLAKE2b: nsecs: 648489262, nsecs/i 6484, 601.398 MiB/s
Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-26 21:09:52 +00:00
		u64 time;
		unsigned long cpu_flag;
	} contestants[] = {
		{ .name = "NULL-NOP", .digest = hash_null_nop, .digest_size = 32 },
		{ .name = "NULL-MEMCPY", .digest = hash_null_memcpy, .digest_size = 32 },
		{ .name = "CRC32C", .digest = hash_crc32c, .digest_size = 4 },
		{ .name = "XXHASH", .digest = hash_xxhash, .digest_size = 8 },
		{ .name = "SHA256", .digest = hash_sha256, .digest_size = 32 },
		{ .name = "BLAKE2-ref", .digest = hash_blake2b, .digest_size = 32 },
		{ .name = "BLAKE2-SSE2", .digest = hash_blake2b, .digest_size = 32,
		  .cpu_flag = CPU_FLAG_SSE2 },
		{ .name = "BLAKE2-SSE41", .digest = hash_blake2b, .digest_size = 32,
		  .cpu_flag = CPU_FLAG_SSE41 },
		{ .name = "BLAKE2-AVX2", .digest = hash_blake2b, .digest_size = 32,
		  .cpu_flag = CPU_FLAG_AVX2 },
	};
	int units = UNITS_CYCLES;

	cpu_detect_flags();
	cpu_print_flags();

	optind = 0;
	while (1) {
		static const struct option long_options[] = {
			{ "cycles", no_argument, NULL, 'c' },
			{ "time", no_argument, NULL, 't' },
			{ "perf", no_argument, NULL, 'p' },
			{ NULL, 0, NULL, 0}
		};
		int c;

		c = getopt_long(argc, argv, "ctp", long_options, NULL);
2021-05-26 18:22:55 +00:00
|
|
|
if (c < 0)
|
|
|
|
break;
|
|
|
|
switch (c) {
|
|
|
|
case 'c':
|
|
|
|
if (!cycles_supported) {
|
2022-09-16 17:29:25 +00:00
|
|
|
error("cannot measure cycles on this arch, use --time");
|
2021-05-26 18:22:55 +00:00
|
|
|
return 1;
|
|
|
|
}
|
2021-06-01 19:41:53 +00:00
|
|
|
units = UNITS_CYCLES;
|
2021-05-26 18:22:55 +00:00
|
|
|
break;
|
|
|
|
case 't':
|
2021-06-01 19:41:53 +00:00
|
|
|
units = UNITS_TIME;
|
|
|
|
break;
|
|
|
|
case 'p':
|
|
|
|
if (perf_init() == -1) {
|
2022-09-16 17:29:25 +00:00
|
|
|
error(
|
|
|
|
"cannot initialize perf, please check sysctl kernel.perf_event_paranoid: %m");
|
2021-06-01 19:41:53 +00:00
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
units = UNITS_PERF;
|
2021-05-26 18:22:55 +00:00
|
|
|
break;
|
|
|
|
default:
|
2022-09-16 17:29:25 +00:00
|
|
|
error("unknown option");
|
2021-05-26 18:22:55 +00:00
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
}
|
2019-06-10 12:49:50 +00:00
|
|
|
|
2021-05-26 18:22:55 +00:00
|
|
|
if (argc - optind >= 1) {
|
|
|
|
iterations = atoi(argv[optind]);
|
2019-06-10 12:49:50 +00:00
|
|
|
if (iterations <= 0)
|
|
|
|
iterations = 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
crc32c_optimization_init();
|
|
|
|
memset(buf, 0, 4096);
|
|
|
|
|
2020-12-25 19:15:11 +00:00
|
|
|
printf("Block size: %d\n", blocksize);
|
|
|
|
printf("Iterations: %d\n", iterations);
|
|
|
|
printf("Implementation: %s\n", CRYPTOPROVIDER);
|
2021-06-01 19:41:53 +00:00
|
|
|
printf("Units: %s\n", units_to_desc(units));
|
2019-06-10 12:49:50 +00:00
|
|
|
printf("\n");
|
|
|
|
|
|
|
|
for (idx = 0; idx < ARRAY_SIZE(contestants); idx++) {
|
|
|
|
struct contestant *c = &contestants[idx];
|
|
|
|
u64 start, end;
|
btrfs-progs: crypto: print throughput in hash-speedtest
Calculate the estimated throughput as a number that's comparable across
machines.
$ ./hash-speedtest --cycles
Block size: 4096
Iterations: 100000
Implementation: builtin
Units: cycles
NULL-NOP: cycles: 42928902, cycles/i 429
NULL-MEMCPY: cycles: 73014868, cycles/i 730, 18651.186 MiB/s
CRC32C: cycles: 182293290, cycles/i 1822, 7470.579 MiB/s
XXHASH: cycles: 138085981, cycles/i 1380, 9862.272 MiB/s
SHA256: cycles: 10576270837, cycles/i 105762, 128.764 MiB/s
BLAKE2b: cycles: 2263761293, cycles/i 22637, 601.585 MiB/s
$ ./hash-speedtest --time
Block size: 4096
Iterations: 100000
Implementation: builtin
Units: nsecs
NULL-NOP: nsecs: 12164607, nsecs/i 121
NULL-MEMCPY: nsecs: 20423641, nsecs/i 204, 19095.518 MiB/s
CRC32C: nsecs: 51972794, nsecs/i 519, 7503.926 MiB/s
XXHASH: nsecs: 38935164, nsecs/i 389, 10016.651 MiB/s
SHA256: nsecs: 3030944497, nsecs/i 30309, 128.673 MiB/s
BLAKE2b: nsecs: 648489262, nsecs/i 6484, 601.398 MiB/s
Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-26 21:09:52 +00:00
|
|
|
u64 tstart, tend;
|
2021-06-01 19:41:53 +00:00
|
|
|
u64 total = 0;
|
2019-06-10 12:49:50 +00:00
|
|
|
|
2023-02-09 01:54:35 +00:00
|
|
|
if (c->cpu_flag != 0 && !cpu_has_feature(c->cpu_flag)) {
|
|
|
|
printf("%12s: no CPU support\n", c->name);
|
|
|
|
continue;
|
|
|
|
}
|
2021-05-27 08:58:38 +00:00
|
|
|
printf("%12s: ", c->name);
|
2019-06-10 12:49:50 +00:00
|
|
|
fflush(stdout);
|
|
|
|
|
2023-02-09 01:54:35 +00:00
|
|
|
cpu_set_level(c->cpu_flag);
|
2021-05-26 21:09:52 +00:00
|
|
|
tstart = get_time();
|
2021-06-01 19:41:53 +00:00
|
|
|
start = get_cycles(units);
|
2019-06-10 12:49:50 +00:00
|
|
|
for (iter = 0; iter < iterations; iter++) {
|
|
|
|
memset(buf, iter & 0xFF, blocksize);
|
|
|
|
memset(hash, 0, 32);
|
|
|
|
c->digest(buf, blocksize, hash);
|
|
|
|
}
|
2021-06-01 19:41:53 +00:00
|
|
|
end = get_cycles(units);
|
2021-05-26 21:09:52 +00:00
|
|
|
tend = get_time();
|
2019-06-10 12:49:50 +00:00
|
|
|
c->cycles = end - start;
|
2021-05-26 21:09:52 +00:00
|
|
|
c->time = tend - tstart;
|
2023-02-09 01:54:35 +00:00
|
|
|
cpu_reset_level();
|
2021-05-26 21:09:52 +00:00
|
|
|
|
2021-06-01 19:41:53 +00:00
|
|
|
if (units == UNITS_CYCLES || units == UNITS_PERF)
|
2021-05-26 21:09:52 +00:00
|
|
|
total = c->cycles;
|
2021-06-01 19:41:53 +00:00
|
|
|
if (units == UNITS_TIME)
|
2021-05-26 21:09:52 +00:00
|
|
|
total = c->time;
|
|
|
|
|
2021-05-27 08:58:38 +00:00
|
|
|
printf("%s: %12llu, %s/i %8llu",
|
2021-05-26 21:09:52 +00:00
|
|
|
units_to_str(units), total,
|
|
|
|
units_to_str(units), total / iterations);
|
|
|
|
if (idx > 0) {
|
|
|
|
float t;
|
|
|
|
float mb;
|
|
|
|
|
|
|
|
t = (float)c->time / 1000 / 1000 / 1000;
|
|
|
|
mb = (u64)blocksize * iterations / 1024 / 1024;
|
2021-05-27 08:58:38 +00:00
|
|
|
printf(", %12.3f MiB/s", mb / t);
|
2021-05-26 21:09:52 +00:00
|
|
|
}
|
|
|
|
putchar('\n');
|
2019-06-10 12:49:50 +00:00
|
|
|
}
|
2021-06-01 19:41:53 +00:00
|
|
|
perf_finish();
|
2019-06-10 12:49:50 +00:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
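The `perf_init()`, `get_cycles()` and `perf_finish()` helpers called above are defined outside this part of the file. As a rough sketch of what reading a cycle count through perf events can look like on Linux, the following uses the `perf_event_open` system call directly. `cycles_counter_open` and `cycles_counter_read` are hypothetical names, and the attribute choices (counting user space only, counting from the moment the counter is opened) are assumptions of this sketch rather than a description of what the tool does.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open a hardware CPU-cycle counter for the calling thread.
 * Returns a file descriptor, or -1 with errno set on failure.
 */
static int cycles_counter_open(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_HARDWARE;
	attr.size = sizeof(attr);
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	/* Assumption of this sketch: count user-space cycles only. */
	attr.exclude_kernel = 1;
	attr.exclude_hv = 1;

	/* pid = 0, cpu = -1: this thread on any CPU; no group leader, no flags. */
	return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/*
 * Read the current counter value into *value.
 * Returns 0 on success, -1 on a failed or short read.
 */
static int cycles_counter_read(int fd, uint64_t *value)
{
	ssize_t ret = read(fd, value, sizeof(*value));

	return ret == (ssize_t)sizeof(*value) ? 0 : -1;
}
```

Opening the counter can fail depending on the `kernel.perf_event_paranoid` setting, or when the machine exposes no hardware counters (common in virtual machines and containers), so a caller should check for -1 and fall back to a time-based measurement. A cycle count for a region is the difference between two `cycles_counter_read()` values, and the descriptor is released with `close()`.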