Commit Graph

8 Commits

Author SHA1 Message Date
Muhammad Faiz de1308429a swresample/x86/resample: extend resample_double to support avx and fma3
benchmark:
sse2 10.670s
avx   8.763s
fma3  8.380s

Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
2017-03-19 12:24:41 +07:00
Muhammad Faiz 6031e5d1af swresample/x86: add support for exact_rational
phase_shift and phase_mask is removed
generally exact_rational=on is faster than exact_rational=off

Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
2016-06-21 05:18:21 +07:00
James Almer 5750d6c5e9 x86: move XOP emulation code back to x86inc
Only two functions that use xop multiply-accumulate instructions where the
first operand is the same as the fourth actually took advantage of the macros.

This further reduces differences with x264's x86inc.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-08-03 17:11:13 -03:00
James Almer c45b7f0d80 x86/swr: add ff_resample_{common, linear}_int16_xop
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-02 01:11:20 +02:00
James Almer 1a69224f44 x86/swr: add ff_resample_{common, linear}_float_fma
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-02 01:09:53 +02:00
James Almer dd2c9034b1 x86/swr: convert resample_{common, linear}_double_sse2 to yasm
Signed-off-by: James Almer <jamrial@gmail.com>

312531 -> 311528 dezicycles

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-01 17:57:36 +02:00
Ronald S. Bultje 847bb638c0 swr: convert resample_common/linear_int16_mmx2/sse2 to yasm.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-30 20:11:50 +02:00
Ronald S. Bultje faa1471ffc swr: rewrite resample_common/linear_float_sse/avx in yasm.
Linear interpolation goes from 63 (llvm) or 58 (gcc) to 48 (yasm)
cycles/sample on 64bit, or from 66 (llvm/gcc) to 52 (yasm) cycles/
sample on 32bit. Bon-linear goes from 43 (llvm) or 38 (gcc) to
32 (yasm) cycles/sample on 64bit, or from 46 (llvm) or 44 (gcc) to
38 (yasm) cycles/sample on 32bit (all testing on OSX 10.9.2, llvm
5.1 and gcc 4.8/9).

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-28 17:06:47 +02:00