ffmpeg/libavutil/x86
Lynne 82a68a8771
x86/tx_float: remove vgatherdpd usage
Its performance ranges from just as fast as individual loads (Skylake), to a
few percent slower (Alderlake), to 8% slower (Zen 3), to completely disastrous
(older/other CPUs).

Sadly, gathers never turned out to be fast on x86, even with the benefit of
time and implementation experience.

This also saves a register, as there's no need to fill out an additional
register mask.
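
As a rough illustration of the trade-off described above, here is a minimal scalar C sketch of what "individual loads" means: each output element is fetched through an index table with an ordinary indexed load, which is the same result a gather instruction such as vgatherdpd produces in one go (with a mask register on top). This is not the code from tx_float.asm; the type `cplx_f32`, the function `load_permuted`, and the parameter names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one single-precision complex sample.
 * It is 8 bytes wide, i.e. the size of one 64-bit gather lane. */
typedef struct {
    float re;
    float im;
} cplx_f32;

/* Copies n samples from src into dst in the order given by lut,
 * using one plain indexed load per element instead of a gather.
 * Every lut[i] must be a valid index into src. */
static void load_permuted(cplx_f32 *dst, const cplx_f32 *src,
                          const int32_t *lut, size_t n)
{
    assert(dst && src && lut);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[lut[i]]; /* individual 64-bit load and store */
}
```

Called with a four-entry table such as {2, 0, 3, 1}, the function places src[2] in dst[0], src[0] in dst[1], and so on. In hand-written SIMD the same idea would presumably be a short sequence of 64-bit loads into vector lanes, with no mask register to set up.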

Zen 3 (16384-point transform):
Before: 1561050 decicycles in           av_tx (fft),  131072 runs,      0 skips
After:  1449621 decicycles in           av_tx (fft),  131072 runs,      0 skips

Alderlake:
2% slower on big transforms (65536), to 1% (131072), to a few percent for smaller
sizes.
2022-05-20 10:12:34 +02:00
..
Makefile x86/tx_float: do not build tx_float_init.c if x86 assembly is disabled 2022-01-27 02:17:46 +01:00
asm.h
bswap.h
cpu.c avutil/cpu: add AVX512 Icelake flag 2022-03-10 16:45:48 -03:00
cpu.h avutil/cpu: add AVX512 Icelake flag 2022-03-10 16:45:48 -03:00
cpuid.asm libavutil: include assembly with full path from source root 2022-02-08 10:42:26 +01:00
emms.asm libavutil: include assembly with full path from source root 2022-02-08 10:42:26 +01:00
emms.h avutil/x86/emms: Don't unnecessarily include lavu/cpu.h 2022-02-21 12:37:51 +01:00
fixed_dsp.asm libavutil: include assembly with full path from source root 2022-02-08 10:42:26 +01:00
fixed_dsp_init.c
float_dsp.asm libavutil: include assembly with full path from source root 2022-02-08 10:42:26 +01:00
float_dsp_init.c
imgutils.asm
imgutils_init.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
intmath.h x86/intmath: add VEX encoded versions of av_clipf() and av_clipd() 2021-11-19 11:21:03 -03:00
intreadwrite.h
lls.asm libavutil: include assembly with full path from source root 2022-02-08 10:42:26 +01:00
lls_init.c
pixelutils.asm libavutil: include assembly with full path from source root 2022-02-08 10:42:26 +01:00
pixelutils.h
pixelutils_init.c
timer.h
tx_float.asm x86/tx_float: remove vgatherdpd usage 2022-05-20 10:12:34 +02:00
tx_float_init.c x86/tx_float: remove vgatherdpd usage 2022-05-20 10:12:34 +02:00
w64xmmtest.h
x86inc.asm avutil/cpu: add AVX512 Icelake flag 2022-03-10 16:45:48 -03:00
x86util.asm