Commit Graph

242 Commits

Author SHA1 Message Date
Michael Niedermayer ac22385931 H.264 idct functions that include the chroma, inter luma and intra16 luma loops
thus avoiding the calling overhead.
New functions are not yet used.

Originally committed as revision 16206 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-12-18 02:36:48 +00:00
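Moving the per-block loop inside the DSP function, so the caller makes one call per macroblock instead of one per 4x4 block, can be sketched in plain C. This is a minimal illustration under stated assumptions, not FFmpeg's code. `idct4x4_dc_add` is a hypothetical stand-in that only adds the rounded DC term instead of performing a full 4x4 inverse transform. The names `idct_add16_sketch`, `block_offset` and `nnz` and the raster block layout are also assumptions.

```c
#include <stdint.h>

/* Clamp to the 0..255 pixel range. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical stand-in for a real 4x4 IDCT: adds only the DC term,
 * rounded as (dc + 32) >> 6, to a 4x4 block of pixels. */
static void idct4x4_dc_add(uint8_t *dst, const int16_t *block, int stride)
{
    int dc = (block[0] + 32) >> 6;
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y * stride + x] = clip_uint8(dst[y * stride + x] + dc);
}

/* Loop over 16 4x4 blocks inside one function. Blocks whose
 * nonzero-coefficient count nnz[i] is 0 are skipped entirely. */
static void idct_add16_sketch(uint8_t *dst, const int block_offset[16],
                              int16_t block[16][16], int stride,
                              const uint8_t nnz[16])
{
    for (int i = 0; i < 16; i++)
        if (nnz[i])
            idct4x4_dc_add(dst + block_offset[i], block[i], stride);
}
```

The skip test on `nnz` happens inside the loop, so blocks without coefficients cost only a compare rather than a function call.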
Aurelien Jacobs 5e6604490a avoid POSIX reserved _t suffix
Originally committed as revision 16117 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-12-14 00:48:16 +00:00
Loren Merritt 5fecfb7d58 clear_block mmx
Originally committed as revision 16045 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-12-10 21:35:17 +00:00
Diego Biurrun 9686df2be5 Delete unnecessary 'extern' keywords.
Originally committed as revision 15990 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-12-03 15:23:30 +00:00
Dominik Mierzejewski 82d1605fe7 Remove duplicated MM_* macros for CPU capabilities from dsputil.h.
Add missing one for FF_MM_ALTIVEC to avcodec.h.
Rename all the occurrences of MM_* to the corresponding FF_MM_*.

Originally committed as revision 15770 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-11-03 18:08:00 +00:00
Diego Pettenò 782fc0c36f Rename template included sources from .h to _template.c.
There are multiple source files that are #include'd rather than
compiled, as they are used as template for generation of similar code,
like asm-optimised code. Some of these files are right now named with
a .h extension, although they are not headers in any reasonable sense.

Rename them so that instead of being named with .h extension they are
named with _template.c as final part.

Originally committed as revision 15730 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-10-27 14:35:58 +00:00
Reimar Döffinger 31c4f07017 Use x86_reg type instead of long in float_to_int16 MMX/SSE functions.
Fixes compilation on MinGW64.

Originally committed as revision 15655 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-10-20 16:05:29 +00:00
David Conrad 0dba1995bc Cosmetics: reindent
Originally committed as revision 15644 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-10-19 04:44:24 +00:00
David Conrad ca4a4ac1b3 Combine non-bitexact sections
Originally committed as revision 15643 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-10-19 04:43:35 +00:00
David Conrad daa1ea049a VP3 loop filter is mmx2 not mmx
Originally committed as revision 15642 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-10-19 04:40:24 +00:00
David Conrad 357f45d9bc MMX VP3 Loop Filter
Originally committed as revision 15630 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-10-17 03:18:08 +00:00
Diego Pettenò be449fca79 Convert asm keyword into __asm__.
Neither the asm() nor the __asm__() keyword is part of the C99
standard, but while GCC accepts the former in C89 syntax, it is not
accepted in C99 unless GNU extensions are turned on (with -fasm). The
latter form is accepted in any syntax as an extension (without
requiring further command-line options).

Sun Studio C99 compiler also does not accept asm() while accepting
__asm__(), albeit reporting warnings that it's not valid C99 syntax.

Originally committed as revision 15627 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-10-16 13:34:09 +00:00
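A minimal sketch of the `__asm__` spelling in GNU extended inline assembly. The function `add_i32` is hypothetical and exists only to show the syntax. The x86 branch uses `__asm__`, which GCC accepts even under `-std=c99`, while plain `asm` would need GNU extensions (for example `-std=gnu99`). Other targets fall back to plain C so the example still compiles there.

```c
#include <stdint.h>

/* Hypothetical example: add two 32-bit integers with inline asm on x86.
 * Spelled __asm__ rather than asm so it compiles in strict C99 mode. */
static int32_t add_i32(int32_t a, int32_t b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "+r": a is read and written in a register; "cc": flags clobbered */
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    return a + b; /* non-x86 or non-GNU compilers: plain C fallback */
#endif
}
```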
David Conrad 8cfd78ce8f Ensure MMX/SSE2 VP3 IDCT selection isn't disabled when only Theora is enabled
Originally committed as revision 15350 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-09-17 19:49:31 +00:00
David Conrad ccd3ec82b8 MMX/SSE2 VP3 IDCT are bitexact now that the dequantization matrices are permutated correctly
Originally committed as revision 15345 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-09-17 19:30:03 +00:00
David Conrad b4c3d83584 Use ff_vp3_idct_data in vp3dsp_mmx.c rather than duplicating it
Originally committed as revision 15118 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-08-31 07:05:55 +00:00
David Conrad 21383da8c4 Let ff_pw_8 be used as an SSE constant
Originally committed as revision 15052 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-08-30 19:40:21 +00:00
Loren Merritt ebceaa1cd5 gcc chokes on the 7 registers needed for float_to_int16_interleave6 (even inside HAVE_7REGS), so write it in yasm
Originally committed as revision 14749 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-08-14 04:40:46 +00:00
Loren Merritt ee46753739 gcc chokes on xmm constraints, so pessimize int32_to_float_fmul_scalar_sse a little
Originally committed as revision 14748 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-08-14 04:39:59 +00:00
Loren Merritt 675872382f special case 6 channel version of float_to_int16_interleave
5% faster ac3

Originally committed as revision 14744 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-08-13 23:36:37 +00:00
Loren Merritt 911e21a306 simd int->float
20% faster ac3 if downmixing, 15% if not

Originally committed as revision 14743 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-08-13 23:35:40 +00:00
Loren Merritt ac2e556456 simd downmix
13% faster ac3 if downmixing

Originally committed as revision 14742 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-08-13 23:33:48 +00:00
Loren Merritt 862b98d42c cosmetics in dsp init
Originally committed as revision 14704 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-08-12 00:51:45 +00:00
Uoti Urpala f769b746aa Mark add_png_paeth_prediction_* functions which are only used within this file
as static. patch by Uoti Urpala, uoti.urpala pp1.inet fi

Originally committed as revision 14509 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-08-02 17:32:55 +00:00
Loren Merritt 5eb0f2a425 float_to_int16_interleave: change src to an array of pointers instead of assuming it's contiguous.
this has no immediate effect, but will allow it to be used in more codecs.

Originally committed as revision 14252 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-07-16 00:50:12 +00:00
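The interface change can be illustrated with a plain-C sketch in which `src` is an array of per-channel pointers, so the channel planes need not be contiguous in memory. This is not FFmpeg's implementation. The name `float_to_int16_interleave_ref` is hypothetical, and the sketch assumes the float samples are already scaled to the int16 range. It rounds half away from zero and saturates; NaN input is not handled.

```c
#include <stdint.h>

/* Round one float sample (half away from zero) and saturate to int16. */
static int16_t float_to_int16_sat(float f)
{
    if (f >= 32767.0f)  return INT16_MAX;
    if (f <= -32768.0f) return INT16_MIN;
    return (int16_t)(int)(f >= 0.0f ? f + 0.5f : f - 0.5f);
}

/* Interleave `channels` planar float buffers into one int16 buffer.
 * src[c] points to channel c's samples; the planes may live anywhere.
 * Output layout: dst[i * channels + c]. */
static void float_to_int16_interleave_ref(int16_t *dst, const float **src,
                                          long len, int channels)
{
    for (long i = 0; i < len; i++)
        for (int c = 0; c < channels; c++)
            dst[i * channels + c] = float_to_int16_sat(src[c][i]);
}
```

With a pointer per channel, a decoder can hand over planes that sit in separate buffers without first copying them into one contiguous block.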
Loren Merritt 4342a7f30b 10l, float_to_int16_interleave_sse/3dnow wrote the wrong samples
Originally committed as revision 14236 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-07-15 04:11:30 +00:00
Loren Merritt b9fa32082c exploit mdct symmetry
2% faster vorbis on conroe, k8. 7% on celeron.

Originally committed as revision 14207 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-07-13 15:03:58 +00:00
Loren Merritt f27e1d645e simplify vorbis windowing
Originally committed as revision 14205 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-07-13 14:56:01 +00:00
Kostya Shishkov d7e1fc4254 SSE2 optimizations for Monkey's Audio decoder vector functions
Originally committed as revision 14161 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-07-11 04:48:38 +00:00
Michael Niedermayer e98750c373 float_to_int16_sse2()
20% faster than sse

Originally committed as revision 14138 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-07-09 07:21:12 +00:00
Michael Niedermayer 35ee72b1d7 float_to_int16_sse(): one C-asm loop fewer, and unroll once
25% faster

Originally committed as revision 14104 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-07-07 21:25:18 +00:00
Michael Niedermayer 560fa9bf51 Fix x86-64
Originally committed as revision 14103 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-07-07 21:04:29 +00:00
Michael Niedermayer 63b737d4f9 float_to_int16_3dnow(): don't use C-asm loops, and unroll once
30% faster

Originally committed as revision 14102 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-07-07 20:46:03 +00:00
Reimar Döffinger 00eebe3d6a Fix add_bytes_mmx and add_bytes_l2_mmx for w < 16
Originally committed as revision 13877 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-06-22 07:05:40 +00:00
Diego Biurrun 245976da2a Use full path for #includes from another directory.
Originally committed as revision 13098 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-05-09 11:56:36 +00:00
Ramiro Polla 40d0e665d0 Do not misuse long as the size of a register in x86.
typedef x86_reg as the appropriate size and use it instead.

Originally committed as revision 13081 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-05-08 21:11:24 +00:00
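A sketch of why a dedicated register-sized type helps. `long` is 32 bits on Win64 (LLP64), even though x86-64 general-purpose registers are 64 bits, so it cannot stand in for a register everywhere. This sketch picks the width from compiler-defined macros; that choice is an assumption, since FFmpeg's build system has its own architecture checks. `sum_bytes` is a hypothetical example of the negative-index loop shape often used with such counters in hand-written asm.

```c
#include <stdint.h>

/* Register-sized integer for inline-asm operands (sketch). */
#if defined(__x86_64__) || defined(_M_X64)
typedef int64_t x86_reg;
#else
typedef int32_t x86_reg;
#endif

/* Hypothetical: sum `len` bytes with an index running from -len up to 0,
 * addressing relative to the end pointer. */
static int32_t sum_bytes(const uint8_t *src, x86_reg len)
{
    int32_t sum = 0;
    const uint8_t *end = src + len;
    for (x86_reg i = -len; i < 0; i++)
        sum += end[i];
    return sum;
}
```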
Alexander Strange f73a6393e7 Add a new xvid-style IDCT using SSE2.
Originally committed as revision 12843 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-04-16 01:36:14 +00:00
Alexander Strange 54a0b6e590 Add a header file to declare Xvid IDCT functions.
patch by Alexander Strange, astrange ithinksw com

Originally committed as revision 12794 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-04-12 16:54:36 +00:00
Loren Merritt ce53144bac h264 chroma mc ssse3
width8: 180->92, width4: 78->63 cycles (core2)

Originally committed as revision 12661 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-04-01 04:51:28 +00:00
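For context, the operation these cycle counts refer to is H.264 chroma interpolation: a bilinear blend of four neighbouring samples at an eighth-sample offset (mx, my), with weights that sum to 64. The plain-C reference below is a sketch. The name `chroma_mc_ref` is hypothetical, and it assumes `dst` and `src` share one stride and that the source has one extra column and row available past the block.

```c
#include <stdint.h>

/* Bilinear chroma MC for a w x h block at eighth-sample offset (mx, my),
 * with 0 <= mx, my < 8. Weights A + B + C + D == 64, rounded by +32 >> 6. */
static void chroma_mc_ref(uint8_t *dst, const uint8_t *src, int stride,
                          int w, int h, int mx, int my)
{
    const int A = (8 - mx) * (8 - my);
    const int B = mx * (8 - my);
    const int C = (8 - mx) * my;
    const int D = mx * my;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            dst[x] = (uint8_t)((A * src[x] + B * src[x + 1] +
                                C * src[x + stride] + D * src[x + stride + 1] +
                                32) >> 6);
        dst += stride;
        src += stride;
    }
}
```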
Zuxy Meng 9e8e6d318c Add missed call to ff_cavsdsp_init_3dnow() in dsputil_init_mmx()
Originally committed as revision 12540 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-03-21 12:36:49 +00:00
Michael Niedermayer 943032b155 Hardcode register to prevent apparent miscompilation.
Fixes regression tests with gcc 2.95.

Originally committed as revision 12512 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-03-20 14:24:29 +00:00
Michael Niedermayer dea00a4623 remove unused temp
Originally committed as revision 12511 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-03-20 14:09:31 +00:00
Aurelien Jacobs 5a6a9e78ab move draw_edges() into dsputil
Originally committed as revision 12309 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-03-04 00:07:41 +00:00
Aurelien Jacobs 97d1d009e2 split encoding part of dsputil_mmx into its own file
Originally committed as revision 12223 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-02-25 23:14:22 +00:00
Reimar Döffinger 78d3d94f14 __asm __volatile -> asm volatile, improves code consistency and works
(as far as that is possible) with the Sun C compiler.

Originally committed as revision 12188 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-02-24 14:46:22 +00:00
Loren Merritt 4a9ca0a279 simd and unroll png_filter_row
cycles per 1000 pixels on core2:
left: 9211->5170
top: 9283->2138
avg: 12215->7611
paeth: 64024->17360
overall rgb png decoding speed: +45%
overall greyscale png decoding speed: +6%

Originally committed as revision 12164 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-02-21 07:10:46 +00:00
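For reference, the scalar form of the `paeth` row filter that this commit vectorizes follows the Paeth predictor from the PNG specification. The function names below are hypothetical. `bpp` is bytes per pixel, and for the first `bpp` bytes of a row the left and upper-left neighbours are taken as zero, as the PNG spec defines.

```c
#include <stdint.h>
#include <stdlib.h>

/* Paeth predictor: a = left, b = above, c = upper-left. */
static int paeth_predictor(int a, int b, int c)
{
    int p  = a + b - c;
    int pa = abs(p - a);
    int pb = abs(p - b);
    int pc = abs(p - c);
    if (pa <= pb && pa <= pc)
        return a;
    if (pb <= pc)
        return b;
    return c;
}

/* Undo the Paeth filter in place on one row of `size` bytes. `top` is the
 * already-reconstructed previous row. Sums wrap modulo 256. */
static void unfilter_paeth_row(uint8_t *row, const uint8_t *top,
                               int size, int bpp)
{
    for (int i = 0; i < size; i++) {
        int a = i >= bpp ? row[i - bpp] : 0;
        int c = i >= bpp ? top[i - bpp] : 0;
        row[i] = (uint8_t)(row[i] + paeth_predictor(a, top[i], c));
    }
}
```

Each byte depends on the reconstructed byte `bpp` positions to its left. That serial dependency is why paeth is the most expensive of the four filters in the cycle counts above.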
Loren Merritt 1d67b037f7 sse2 h264 motion compensation. not new code, just separate out the cases that didn't need ssse3.
Originally committed as revision 11877 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-02-06 12:32:31 +00:00
Loren Merritt 20d565be6d put loop counter in a register if possible. makes some of the qpel functions 3% faster.
Originally committed as revision 11876 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-02-06 04:44:21 +00:00
Loren Merritt a2b7bc8e71 constant was excessively aligned
Originally committed as revision 11874 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-02-06 03:51:53 +00:00
Loren Merritt ddf969705f ssse3 h264 motion compensation.
25% faster than mmx on core2, 35% if you discount fullpel, 4% overall decoding.

Originally committed as revision 11871 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-02-05 11:22:55 +00:00
Loren Merritt fa9b873e08 clean up an ugliness introduced in r11826. this syntax will require fewer changes when adding future sse2 code.
Originally committed as revision 11868 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-02-05 01:16:48 +00:00