mirror of https://git.ffmpeg.org/ffmpeg.git
82a68a8771
A gather's performance ranges from just as fast as individual loads (Skylake), through a few percent slower (Alderlake) and 8% slower (Zen 3), to completely disastrous (older/other CPUs). Sadly, gathers never became fast on x86, even with the benefit of time and implementation experience. Dropping the gather also saves a register, as there's no need to fill an additional mask register.

Zen 3 (16384-point transform):
Before: 1561050 decicycles in av_tx (fft), 131072 runs, 0 skips
After: 1449621 decicycles in av_tx (fft), 131072 runs, 0 skips

Alderlake: 2% slower on big transforms (65536), 1% slower at 131072, and a few percent slower for smaller sizes.
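As a rough illustration of what the commit message describes, the sketch below shows in portable C what a gather computes when written as individual indexed loads, plus the arithmetic behind the Zen 3 figure. Both function names (`load_permuted`, `slowdown_percent`) and the `lut` parameter are hypothetical and exist only for this example; the actual change is in x86 assembly and is not reproduced here.

```c
#include <stddef.h>

/* Hypothetical helper, not FFmpeg code: the result a gather instruction
 * would produce, written as one ordinary load per element. */
static void load_permuted(float *dst, const float *src,
                          const int *lut, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[lut[i]]; /* individual indexed load */
}

/* Hypothetical helper: how much slower "before" is than "after",
 * in percent, given two cycle counts. */
static double slowdown_percent(double before, double after)
{
    return (before - after) / after * 100.0;
}
```

Applied to the Zen 3 numbers quoted above, `slowdown_percent(1561050, 1449621)` comes out at roughly 7.7, which is consistent with the "8% slower" figure given for gathers on that CPU.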
Makefile
asm.h
bswap.h
cpu.c
cpu.h
cpuid.asm
emms.asm
emms.h
fixed_dsp.asm
fixed_dsp_init.c
float_dsp.asm
float_dsp_init.c
imgutils.asm
imgutils_init.c
intmath.h
intreadwrite.h
lls.asm
lls_init.c
pixelutils.asm
pixelutils.h
pixelutils_init.c
timer.h
tx_float.asm
tx_float_init.c
w64xmmtest.h
x86inc.asm
x86util.asm