Commit Graph

10 Commits

Author SHA1 Message Date
James Darnley 0a5814c9ba yadif: x86 assembly for 9 to 14-bit samples
These smaller samples do not need to be unpacked to double words
allowing the code to process more pixels every iteration (still 2 in MMX
but 6 in SSE2).  It also avoids emulating the missing double word
instructions on older instruction sets.

Like with the previous code for 16-bit samples this has been tested on
an Athlon64 and a Core2Quad.

Athlon64:
1809275 decicycles in C,    32718 runs, 50 skips
 911675 decicycles in mmx,  32727 runs, 41 skips, 2.0x faster
 495284 decicycles in sse2, 32747 runs, 21 skips, 3.7x faster

Core2Quad:
 921363 decicycles in C,     32756 runs, 12 skips
 486537 decicycles in mmx,   32764 runs,  4 skips, 1.9x faster
 293296 decicycles in sse2,  32759 runs,  9 skips, 3.1x faster
 284910 decicycles in ssse3, 32759 runs,  9 skips, 3.2x faster

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-03-16 22:32:54 +01:00
James Darnley 17e7b49501 yadif: x86 assembly for 16-bit samples
This is a fairly dumb copy of the assembly for 8-bit samples but it
works and produces identical output to the C version.  The options have
been tested on an Athlon64 and a Core2Quad.

Athlon64:
1810385 decicycles in C,    32726 runs, 42 skips
1080744 decicycles in mmx,  32744 runs, 24 skips, 1.7x faster
 818315 decicycles in sse2, 32735 runs, 33 skips, 2.2x faster

Core2Quad:
 924025 decicycles in C,     32750 runs, 18 skips
 623995 decicycles in mmx,   32767 runs,  1 skips, 1.5x faster
 406223 decicycles in sse2,  32764 runs,  4 skips, 2.3x faster
 387842 decicycles in ssse3, 32767 runs,  1 skips, 2.4x faster
 307726 decicycles in sse4,  32763 runs,  5 skips, 3.0x faster

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-03-16 22:32:34 +01:00
Diego Biurrun e66240f22e avfilter: x86: consistent filenames for filter optimizations 2013-02-04 15:00:47 +01:00
Diego Biurrun 76d90125cd vf_hqdn3d: x86: Add proper arch optimization initialization 2013-02-01 13:11:45 +01:00
Daniel Kang 899157b308 yadif: Port inline assembly to yasm
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-01-09 18:41:02 +01:00
Justin Ruggles f96f1e06a4 x86: af_volume: add SSE2-optimized s16 volume scaling 2012-12-05 11:23:37 -05:00
Diego Biurrun f6c38c5f4e avfilter: call x86 init functions under if (ARCH_X86), not if (HAVE_MMX) 2012-10-12 19:58:51 +02:00
Loren Merritt 7a1944b907 vf_hqdn3d: x86 asm
13% faster on penryn, 16% on sandybridge, 15% on bulldozer
Not simd; a compiler should have generated this, but gcc didn't.
2012-08-26 10:49:14 +00:00
Nolan L d5f187fd33 Add gradfun filter, ported from MPlayer.
Patch by Nolan L nol888 <=> gmail >=< com.

See thread:
Subject: [FFmpeg-devel] [PATCH] Port gradfun to libavfilter (GCI)
Date: Mon, 29 Nov 2010 07:18:14 -0500

Originally committed as revision 25942 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-12 17:59:10 +00:00
Aurelien Jacobs fa6f4ebc08 use a Makefile in x86 subdir
Originally committed as revision 25234 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-27 21:50:26 +00:00