ffmpeg/libavutil/x86
Ganesh Ajjanagadde 5989add4ab lavu/x86/lls: add fma3 optimizations for update_lls
This slightly improves accuracy and improves speed on processors that
support FMA3.

Sample benchmark (fate flac-16-lpc-cholesky, Haswell):
old:
5993610 decicycles in ff_lpc_calc_coefs,      64 runs,      0 skips
5951528 decicycles in ff_lpc_calc_coefs,     128 runs,      0 skips

new:
5252410 decicycles in ff_lpc_calc_coefs,      64 runs,      0 skips
5232869 decicycles in ff_lpc_calc_coefs,     128 runs,      0 skips

Tested with FATE and with --disable-fma3; also examined the output of
lavu/lls-test.

Reviewed-by: James Almer <jamrial@gmail.com>
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2016-01-15 16:46:13 -05:00
asm.h
bswap.h avutil/x86/bswap: Remove warning about bswap intrinsics with msvc. 2015-11-23 23:03:32 +11:00
cpu.c
cpu.h
cpuid.asm
emms.asm
emms.h
float_dsp_init.c
float_dsp.asm x86/float_dsp: zero extend offset from ff_scalarproduct_float_sse 2016-01-08 16:14:32 -03:00
intmath.h x86/intmath: add missing early clobber to output operands 2016-01-15 13:32:58 -03:00
intreadwrite.h
lls_init.c lavu/x86/lls: add fma3 optimizations for update_lls 2016-01-15 16:46:13 -05:00
lls.asm lavu/x86/lls: add fma3 optimizations for update_lls 2016-01-15 16:46:13 -05:00
Makefile
pixelutils_init.c
pixelutils.asm
pixelutils.h
timer.h
w64xmmtest.h
x86inc.asm
x86util.asm