Commit Graph

76 Commits

Author SHA1 Message Date
James Almer 37b35feb64 x86/swr: add SSE2/AVX pack_8ch functions
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-30 23:05:27 -03:00
James Almer edff061fb0 x86/swr: add ff_float_to_int32_a_avx2
13797 decicycles in ff_float_to_int32_a_sse2, 32768 runs, 0 skips
8603 decicycles in ff_float_to_int32_a_avx2, 32766 runs, 2 skips

Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-11-07 15:01:35 -03:00
James Almer b385c4c6a3 x86/swr: replace sse4 instructions in pack_6ch with sse ones
There's no benefit from using blendps here except on CPUs with AVX, where
it's faster than shufps according to Intel's documentation.
As such, rename the sse4 functions to sse/sse2 and use shufps instead.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-11-06 20:54:00 -03:00
James Almer 9937362c54 x86/swr: use lavu helper macros to check CPU extensions
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-04 02:12:16 +02:00
James Almer 8279a15284 x86/swr: split audioconvert and rematrix DSP into separate files
Also rename resample_x86_dsp.c to resample_init.c

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-04 02:00:11 +02:00
James Almer 857cd1f33b swr: initialize only the necessary resample dsp functions
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-04 01:37:41 +02:00
James Almer b5f0eac068 swr: rename swresample_dsp init functions to swri_resample_dsp
The swresample_ prefix is not for internal functions

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-02 13:18:30 +02:00
James Almer c45b7f0d80 x86/swr: add ff_resample_{common, linear}_int16_xop
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-02 01:11:20 +02:00
James Almer 1a69224f44 x86/swr: add ff_resample_{common, linear}_float_fma
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-02 01:09:53 +02:00
James Almer dd2c9034b1 x86/swr: convert resample_{common, linear}_double_sse2 to yasm
Signed-off-by: James Almer <jamrial@gmail.com>

312531 -> 311528 dezicycles

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-01 17:57:36 +02:00
Ronald S. Bultje 847bb638c0 swr: convert resample_common/linear_int16_mmx2/sse2 to yasm.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-30 20:11:50 +02:00
Ronald S. Bultje faa1471ffc swr: rewrite resample_common/linear_float_sse/avx in yasm.
Linear interpolation goes from 63 (llvm) or 58 (gcc) to 48 (yasm)
cycles/sample on 64bit, or from 66 (llvm/gcc) to 52 (yasm) cycles/
sample on 32bit. Bon-linear goes from 43 (llvm) or 38 (gcc) to
32 (yasm) cycles/sample on 64bit, or from 46 (llvm) or 44 (gcc) to
38 (yasm) cycles/sample on 32bit (all testing on OSX 10.9.2, llvm
5.1 and gcc 4.8/9).

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-28 17:06:47 +02:00
Ronald S. Bultje 083cd3d1f7 swr: compile mmx2 s16p functions only on x86-32.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-15 13:34:53 +02:00
James Almer 7f4dfbd080 swr: add prototypes for resample dsp functions
Should fix compilation failures with MSVC and any other compiler
without inline asm support.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-15 01:33:17 +02:00
Ronald S. Bultje ada8f9c046 swr: remove obsolete function prototypes.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-15 00:07:25 +02:00
Ronald S. Bultje 7128a35f8c swr: split out DSP functions.
DSP bits of swri_resample go into their own mini-DSP functions; DSP
init goes from a per-call branch in multiple_resample to a proper
DSP init routine; x86 bits go into x86/; swri_resample() moves out of
resample_template.c into resample.c because it's independent of DSP
code or sample type; multiple_resample() is simplified.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-14 20:21:39 +02:00
James Almer a9bf713d35 swresample: add swri_resample_float_avx
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-16 05:27:03 +02:00
Matt Oliver 1898c2f49d inline asm: fix arrays as named constraints.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-07 15:02:45 +02:00
James Almer 4cdea92976 swresample/resample: add missing xmm clobbers
Might fix fate-swr on ICL

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-07 01:32:40 +02:00
James Almer cdac3ab59f swresample: add swri_resample_double_sse2
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-25 16:46:07 +02:00
James Almer 63dbba655e swresample/resample: sse float linear interpolation
About two times faster

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-24 02:34:02 +01:00
James Almer fa25c4c400 swresample/resample: mmx2/sse2 int16 linear interpolation
About three times faster

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-24 02:33:16 +01:00
James Almer 32291ba6ea swresample: add swri_resample_float_sse
At least two times faster than the C version.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-20 06:01:06 +01:00
Matt Oliver 8236747511 Automatically change MANGLE() into named inline asm operands when direct symbol reference in inline asm are not supported.
This is part of the patch-set for intel C inline asm on windows support

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-18 23:39:30 +01:00
James Almer 7c8bf09edd swresample: change COMMON_CORE_INT16 asm from SSSE3 to SSE2
pshuf+paddd is slightly faster than phaddd.
The real gain is in pre-ssse3 processors like AMD K8 and K10, which get
a big boost in performance compared to the mmxext version

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-18 15:00:50 +01:00
Martin Storsjö 3dd04cbcf7 swresample: Add arm&x86 clobber tests
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-18 18:38:57 +01:00
Reimar Döffinger cbeaf67888 Avoid using empty macro arguments.
These are not supported by all compilers (gcc 2.95 but also older SPARC
compilers, see gcc bug #33304 for example), and there is no real need for them.
One use of this feature remains in libavdevice/v4l2.c which can't be
replaced quite as easily.

Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
2013-12-31 12:19:59 +01:00
Ronald S. Bultje ad75d2b590 x86: Fix compilation with nasm on PPC & OS/2
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 12:36:19 +02:00
Michael Niedermayer ca2818b881 swresample/x86/audio_convert: add emms to CONV
Might fix Ticket1874

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-06-18 02:26:36 +02:00
Michael Niedermayer 4cfc92081d swr: add native_simd_one
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-06-04 23:50:45 +02:00
Michael Niedermayer 3174616f59 Merge commit '6860b4081d046558c44b1b42f22022ea341a2a73'
* commit '6860b4081d046558c44b1b42f22022ea341a2a73':
  x86: include x86inc.asm in x86util.asm
  cng: Reindent some incorrectly indented lines
  cngdec: Allow flushing the decoder
  cngdec: Make the dbov variable have the right unit
  cngdec: Fix the memset size to cover the full array
  cngdec: Update the LPC coefficients after averaging the reflection coefficients
  configure: fix print_config() with broke awks

Conflicts:
	libavcodec/x86/ac3dsp.asm
	libavcodec/x86/dct32.asm
	libavcodec/x86/deinterlace.asm
	libavcodec/x86/dsputil.asm
	libavcodec/x86/dsputilenc.asm
	libavcodec/x86/fft.asm
	libavcodec/x86/fmtconvert.asm
	libavcodec/x86/h264_chromamc.asm
	libavcodec/x86/h264_deblock.asm
	libavcodec/x86/h264_deblock_10bit.asm
	libavcodec/x86/h264_idct.asm
	libavcodec/x86/h264_idct_10bit.asm
	libavcodec/x86/h264_intrapred.asm
	libavcodec/x86/h264_intrapred_10bit.asm
	libavcodec/x86/h264_weight.asm
	libavcodec/x86/vc1dsp.asm
	libavcodec/x86/vp3dsp.asm
	libavcodec/x86/vp56dsp.asm
	libavcodec/x86/vp8dsp.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-10-31 13:43:33 +01:00
Michael Niedermayer 31a797eb28 swr: add av_cold to init/free functions
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-09-09 02:26:20 +02:00
Carl Eugen Hoyos a26789cf9f Fix compilation with yasm-0.6.2. 2012-09-01 10:59:16 +02:00
Carl Eugen Hoyos 52be5428c0 Add some missing _EXTERNAL suffixes to yasm source files. 2012-08-31 15:39:03 +02:00
Michael Niedermayer 9f088a1ed4 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  mpegvideo: reduce excessive inlining of mpeg_motion()
  mpegvideo: convert mpegvideo_common.h to a .c file
  build: factor out mpegvideo.o dependencies to CONFIG_MPEGVIDEO
  Move MASK_ABS macro to libavcodec/mathops.h
  x86: move MANGLE() and related macros to libavutil/x86/asm.h
  x86: rename libavutil/x86_cpu.h to libavutil/x86/asm.h
  aacdec: Don't fall back to the old output configuration when no old configuration is present.
  rtmp: Add message tracking
  rtsp: Support mpegts in raw udp packets
  rtsp: Support receiving plain data over UDP without any RTP encapsulation
  rtpdec: Remove an unused include
  rtpenc: Remove an av_abort() that depends on user-supplied data
  vsrc_movie: discourage its use with avconv.
  avconv: allow no input files.
  avconv: prevent invalid reads in transcode_init()
  avconv: rename OutputStream.is_past_recording_time to finished.

Conflicts:
	configure
	doc/filters.texi
	ffmpeg.c
	ffmpeg.h
	libavcodec/Makefile
	libavcodec/aacdec.c
	libavcodec/mpegvideo.c
	libavformat/version.h

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-08-09 19:31:56 +02:00
Michael Niedermayer 68712ce820 swr/x86: 16bit integer mix functions need SSE2 not SSE
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-07-07 20:52:34 +02:00
Michael Niedermayer c88e60af76 swr/x86: 10l, missed some SSE2 instructions in code marked as SSE.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-07-05 15:28:10 +02:00
Clément Bœsch ca612a27ae swr: fix make checkheaders. 2012-06-30 11:21:53 +02:00
Clément Bœsch 022cbb6791 swr: small align cosmetic. 2012-06-30 11:18:45 +02:00
Clément Bœsch 3491c2a909 swr: use __asm__ instead of __asm.
For consistency only.
2012-06-30 11:18:05 +02:00
Michael Niedermayer 4ccf6e3971 swr: MMX2 & SSSE3 int16 resample core
about 4 times faster

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-06-28 00:36:27 +02:00
Michael Niedermayer 5f8f6243ef swr: fix 10l use of uninitialized data
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-06-13 13:43:42 +02:00
Michael Niedermayer 728f86edfc swr: mix_2_1_int16_mmx/sse
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-06-12 17:49:12 +02:00
Michael Niedermayer d504266cef swr: mix_1_1_int16_sse
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-06-12 16:43:19 +02:00
Michael Niedermayer cbeeaf2593 swr: mix_1_1 int16 MMX
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-06-12 16:35:13 +02:00
Michael Niedermayer 52afa43691 swr: mix_2_1_float SSE/AVX
Based-on code by Justin Ruggles
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-06-12 16:35:13 +02:00
Michael Niedermayer beb0cd6acf swr: SIMD rematrixing and SSE/AVX mix_1_1 float
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-06-12 16:35:07 +02:00
Michael Niedermayer a927641e7a libswresample-simd: Add ff_pack_6ch_float_to_int32_a_avx and ff_pack_6ch_float_to_int32_a_sse4
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 20:56:18 +02:00
Michael Niedermayer ca986a06ad libswresample-simd: add ff_pack_6ch_int32_to_float_a_avx and ff_pack_6ch_int32_to_float_a_sse4
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 20:53:30 +02:00
Michael Niedermayer c4047ad9e0 libswresample: make NOP_N macro less picky on its parameters
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 20:45:32 +02:00