351 Commits

Author SHA1 Message Date
Ramiro Polla ba40452095 idct_sse2_xvid: only mark xmm>=8 as clobbered on x86_64
Originally committed as revision 25614 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 16:28:28 +00:00
Ramiro Polla 05c018078c motion_est_mmx: prefer xmm registers below xmm6 when they are available
Originally committed as revision 25612 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 15:07:21 +00:00
Ramiro Polla 5d543a3d13 dsputil_mmx: add xmm registers to clobber list
Originally committed as revision 25611 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 13:57:58 +00:00
Ramiro Polla e2d13c5882 cosmetics: split long line
Originally committed as revision 25610 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 13:46:17 +00:00
Ramiro Polla 0d729e0de2 fdct_mmx: add xmm registers to clobber list
Originally committed as revision 25609 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 13:45:04 +00:00
Ramiro Polla 616735eb97 idct_sse2_xvid: add xmm registers to clobber list
Originally committed as revision 25608 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 13:17:43 +00:00
Ramiro Polla 9943f3b91c mpegvideo_mmx: add xmm registers to clobber list
Originally committed as revision 25607 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 13:15:16 +00:00
Ramiro Polla 559738eff3 dsputil_mmx: prefer xmm registers below xmm6 when they are available
Originally committed as revision 25606 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 13:13:53 +00:00
Ramiro Polla 51d592dbcb h264dsp: add xmm registers to clobber list
Originally committed as revision 25604 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-30 17:14:22 +00:00
Ramiro Polla ac19f4a3e8 indent
Originally committed as revision 25598 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-28 18:31:30 +00:00
Ramiro Polla cae05859e1 h264dsp: merge some more asm blocks
Originally committed as revision 25597 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-28 18:22:21 +00:00
Ramiro Polla c6a908be58 dct32: mark xmm registers in clobber list in ff_dct32_float_sse()
Originally committed as revision 25569 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-25 20:29:29 +00:00
Ramiro Polla b32c9ca9a3 h264dsp: merge some asm blocks
Some code was initializing some xmm registers in one asm block and using them
in the following block, assuming they wouldn't be changed in between blocks.

Originally committed as revision 25568 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-25 18:02:02 +00:00
Reimar Döffinger 6c2142809c Add d modifier to asm argument to fix nasm compilation.
Originally committed as revision 25397 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-07 19:18:18 +00:00
Ramiro Polla 326bf69acc fft: mark xmm registers as clobbered in ff_imdct_calc_sse
Originally committed as revision 25363 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-06 01:27:02 +00:00
Ronald S. Bultje dd68d4db43 MMX, MMX2, SSE2 and SSSE3 optimizations for pred16x16/8x8_plane H264 intra
prediction (plus some with different rounding for svq3/rv40). Speedup (for
SSSE3) is about 6-fold, 3.6% faster overall with cathedral sample.

Originally committed as revision 25361 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-05 22:06:18 +00:00
İsmail Dönmez 9276bdddca snowdsp: Explicitly state the operand sizes
Fixes compilation with clang's builtin assembler

Patch by İsmail Dönmez, ismail at namtrac dot org

Originally committed as revision 25331 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-04 13:08:13 +00:00
Ronald S. Bultje a52ffc3f54 Move static inline function to a macro, so that constant propagation in
inline asm works for gcc-3.x also (hopefully). Should fix gcc-3.x FATE
breakage after r25254.

Originally committed as revision 25262 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 17:42:26 +00:00
Eli Friedman 329d689f75 Use sse2 variant of put_pixels16() for no_rnd also. Provides a minor speed
increase to e.g. vc1, snow and mpeg decoding.

Patch by Eli Friedman <eli dot friedman gmail com>.

Originally committed as revision 25259 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 15:34:43 +00:00
Ronald S. Bultje cd17285e6c Merge b_idx and edge variables, and optimize the ASM to directly load variables
from memory locations/offsets depending on b_idx plus constants, rather than
having gcc do this. This saves several lea calls and together saves about
10 cycles in h264_loop_filter_strength_mmx2().

Originally committed as revision 25256 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 14:04:39 +00:00
Ronald S. Bultje 0cc8a5d088 Remove mv_mask variable. Replace the related pand -1/0 instructions by either
a pxor, or remove the instruction altogether. Altogether, this saves 1
instruction.

Originally committed as revision 25255 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 14:03:30 +00:00
Ronald S. Bultje c0673f2cf4 Remove d_idx as a variable, and instead load it as a constant in the asm.
This has no measurable speed effect because the surrounding code doesn't
take advantage of this yet.

Originally committed as revision 25254 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 14:02:32 +00:00
Ronald S. Bultje 2c3135f6d3 Unroll inner bidir loop in h264_loop_filter_strength_mmx2(), which gets rid
of the d_idx variable and therefore allows for future optimizations. No speed
difference by this commit itself.

Originally committed as revision 25253 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 13:35:24 +00:00
Ronald S. Bultje 4b81511cab Unloop the outer loop in h264_loop_filter_strength_mmx2(), which allows
inlining various constants within the loop code. 20 cycles faster on
cathedral sample.

Originally committed as revision 25252 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 13:34:20 +00:00
Reimar Döffinger 02b424d9c8 Add d suffix to movd target register to make it work with nasm.
Originally committed as revision 25206 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-26 09:15:18 +00:00
Reimar Döffinger dc77e985b7 Split and then simplify address generation macro.
Allows nasm to work for this code.

Originally committed as revision 25205 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-26 09:08:11 +00:00
Ronald S. Bultje 7e117771cd Remove unused variable.
Originally committed as revision 25173 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-24 15:31:46 +00:00
Ronald S. Bultje ae11291865 Unroll loop in h264_idct_add16intra_sse2(). Basically identical to r25171, this
inlines scan8[] and removes loop setup. 15% faster, 0.4% overall.

See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.

Originally committed as revision 25172 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-24 14:07:23 +00:00
Ronald S. Bultje 4bca677494 Unroll loop in h264_idct_add8_sse2(). This means we can inline scan8[] in the
code directly also and remove loop setup. 20% faster in function, 0.8% overall.

See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.

Originally committed as revision 25171 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-24 14:05:45 +00:00
Måns Rullgård c0bc8b9afb x86: disable SSE functions using stack when stack is not aligned
This fixes crashes with ICC 10.1.

Originally committed as revision 25153 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-21 17:57:21 +00:00
Måns Rullgård f41237c9db x86: remove hack disabling sse2 h264 loop filter with 32-bit icc
Originally committed as revision 25146 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-18 20:44:32 +00:00
Ronald S. Bultje ada65af9d1 Don't access upper 32 bits of a 32-bit int on 64-bit systems.
Originally committed as revision 25140 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-17 12:24:22 +00:00
Ronald S. Bultje 6c3d021891 Properly add HAVE_YASM around yasmified symbols. Should fix compile error
on configurations using --disable-yasm.

Originally committed as revision 25138 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-17 03:01:57 +00:00
Ronald S. Bultje e2e341048e Move hadamard_diff{,16}_{mmx,mmx2,sse2,ssse3}() from inline asm to yasm,
which will hopefully solve the Win64/FATE failures caused by these functions.

Originally committed as revision 25137 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-17 01:56:06 +00:00
Ronald S. Bultje d0acc2d2e9 Move sse16_sse2() from inline asm to yasm. It is one of the functions causing
Win64/FATE issues.

Originally committed as revision 25136 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-17 01:44:17 +00:00
Ronald S. Bultje 1d16a1cf99 Rename h264_idct_sse2.asm to h264_idct.asm; move inline IDCT asm from
h264dsp_mmx.c to h264_idct.asm (as yasm code). Because the loops are now
coded in asm instead of C, this is (depending on the function) up to 50%
faster for cases where gcc didn't do a great job at looping.

Since h264_idct_add8() is now faster than the manual loop setup in h264.c,
in-asm idct calling can now be enabled for chroma as well (see r16207). For
MMX, this is 5% faster. For SSE2 (which isn't done for chroma if h264.c does
the looping), this makes it up to 50% faster. Speed gain overall is ~0.5-1.0%.

Originally committed as revision 25119 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-14 13:36:26 +00:00
Jason Garrett-Glaser 8acb554aff LGPL SSE2 H.264 iDCT
This leaves no more GPL-only H.264 decoding asm code.

Approved by Loren.

Originally committed as revision 25092 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-10 02:25:12 +00:00
Stefano Sabatini c6c98d0897 Move mm_support() from libavcodec to libavutil, make it a public
function and rename it to av_get_cpu_flags().

Originally committed as revision 25076 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-08 15:07:14 +00:00
Reimar Döffinger b1c32fb5e5 Use "d" suffix for general-purpose registers used with movd.
This increases compatibility with nasm and is also more consistent,
e.g. with h264_intrapred.asm and h264_chromamc.asm that already
do it that way.

Originally committed as revision 25042 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-05 10:10:16 +00:00
Stefano Sabatini 7160bb716b Rename FF_MM_ symbols related to CPU features flags as AV_CPU_FLAG_
symbols, and move them from libavcodec/avcodec.h to libavutil/cpu.h.

Originally committed as revision 25040 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-04 09:59:08 +00:00
Ronald S. Bultje 2c166c3af1 Port latest x264 deblock asm (before they moved to using NV12 as internal
format), LGPL'ed with permission from Jason and Loren. This includes mmx2
code, so remove inline asm from h264dsp_mmx.c accordingly.

Originally committed as revision 25031 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-03 16:52:46 +00:00
Eli Friedman a10a9f5cd0 Fix typo in r25019.
Patch by Eli Friedman <eli.friedman at gmail dot com>.

Originally committed as revision 25022 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-01 23:19:36 +00:00
Ronald S. Bultje 615da9b1d9 Unscrew breakage after my last commit because of symbol prefixes.
Originally committed as revision 25020 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-01 21:10:19 +00:00
Ronald S. Bultje a33a2562c1 Rename h264_weight_sse2.asm to h264_weight.asm; add 16x8/8x16/8x4 non-square
biweight code to sse2/ssse3; add sse2 weight code; and use that same code to
create mmx2 functions also, so that the inline asm in h264dsp_mmx.c can be
removed. OK'ed by Jason on IRC.

Originally committed as revision 25019 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-01 20:56:16 +00:00
Ronald S. Bultje 14bc1f2485 Split h264dsp_mmx.c (which was #included in dsputil_mmx.c) in h264_qpel_mmx.c,
still #included in dsputil_mmx.c and is part of DSPContext, and h264dsp_mmx.c,
which represents H264DSPContext and is now compiled on its own.

Originally committed as revision 25018 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-01 20:48:59 +00:00
Ronald S. Bultje 5929b3a651 Fix vertical align.
Originally committed as revision 25009 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-31 12:32:24 +00:00
Ronald S. Bultje 79ce0f002e Fix compilation failure if yasm is disabled (missing vp3 symbols).
Originally committed as revision 24992 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-30 20:30:40 +00:00
Ronald S. Bultje de1c253bab Split intra prediction initialization (i.e. assigning of function pointers)
into its own file, it doesn't belong in h264dsp_mmx.c (much less so in
dsputil_mmx.c).

Originally committed as revision 24990 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-30 16:34:13 +00:00
Ronald S. Bultje d0eb5a1174 Move H264 chroma MC from inline asm to yasm. This fixes VP3/5/6 and VC-1
fate failures on Win64.

Originally committed as revision 24989 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-30 16:31:04 +00:00
Ronald S. Bultje e9f5f020c6 Move VP3 IDCT functions from inline ASM to YASM. This fixes part of the VP3/5/6
issues on Win64.

Originally committed as revision 24988 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-30 16:25:46 +00:00
Ronald S. Bultje 7e7c4b6008 Put ff_ prefix on non-static {put_signed,put,add}_pixels_clamped_mmx()
functions.

Originally committed as revision 24987 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-30 16:22:27 +00:00
Loren Merritt 19d929f9a3 cosmetics in imdct_sse
Originally committed as revision 24958 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-28 21:03:13 +00:00
Ronald S. Bultje 4eca52ed19 Fix typos when converting inline asm to yasm, fixes MMX-only fate-ea-vp61.
Originally committed as revision 24948 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-26 14:33:39 +00:00
Ronald S. Bultje 6697bc33e2 Revert r24931, it broke Win32 and some BSD compiles (yay fate).
Originally committed as revision 24934 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-25 20:36:35 +00:00
Ronald S. Bultje 72f642400b Mark xmm6 and xmm7 as clobbered in ff_vp3_idct_sse2(), which is contributing
to the VP6 fate failures on Win64.

Originally committed as revision 24931 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-25 19:57:05 +00:00
Måns Rullgård 69dad87c48 VP6: fix vp6_filter_diag4_mmx/sse on 64-bit
The stride can be negative and must be sign extended before being
used in pointer arithmetic.

Originally committed as revision 24926 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-25 15:41:11 +00:00
Ronald S. Bultje 89fa3504ed Move vp6_filter_diag4() x86 SIMD code from inline ASM to YASM. This should
help in fixing the Win64 fate failures.

Originally committed as revision 24922 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-25 13:44:16 +00:00
Ronald S. Bultje 3a0885146c Move vp6_filter_diag4() from DSPContext to VP56DSPContext.
Originally committed as revision 24921 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-25 13:42:28 +00:00
Måns Rullgård c0ec9918b0 Remove global mm_flags variable
Originally committed as revision 24909 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-24 17:47:05 +00:00
Ronald S. Bultje 3611c45ab7 Mark xmm registers as clobbered in simple loopfilter. Should fix the last
two VP8-related fate failures on Win64.

Originally committed as revision 24908 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-24 16:52:27 +00:00
Alex Converse cb4f12466b imdct/x86: Use "s->mdct_size" instead of "1 << s->mdct_bits".
It generates smaller cleaner code.

Originally committed as revision 24887 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-23 15:51:09 +00:00
Ronald S. Bultje 684d608bde Fix segfaults in VP8 SIMD code on Win64 (and FATE/win64 failures).
Originally committed as revision 24871 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-23 02:41:22 +00:00
Alex Converse 78b5c97d3e Convert ff_imdct_half_sse() to yasm.
This is to avoid split asm sections that attempt to preserve some
registers between sections.

Originally committed as revision 24869 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-22 14:39:58 +00:00
Jason Garrett-Glaser 05c04cdf54 VP5/6/8: ~7% faster arithmetic decoding
Grab from the bitstream in 16-bit chunks instead of 8-bit chunks.
TODO: grab in 32-bit chunks on 64-bit systems.

Originally committed as revision 24783 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-12 01:11:32 +00:00
Jason Garrett-Glaser 4a384de5b8 Split h264dsp and h264pred in configure.
Many H.264 derivatives, like RV40 and VP8, use the H.264 prediction functions
but not the weight/loopfilter functions.
This should reduce the size of builds with one of these derivatives but without
H.264 decoding itself.

Originally committed as revision 24741 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-07 23:10:25 +00:00
Jason Garrett-Glaser 98fe09df7b Add file missing in r24702
Originally committed as revision 24703 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-05 00:49:48 +00:00
Eli Friedman c12d6955e2 H.264: SSE2/SSSE3 weighted prediction asm
Patch by Eli Friedman <eli.friedman at gmail dot com>

Originally committed as revision 24702 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-05 00:13:38 +00:00
Måns Rullgård f079a64aea Move cavs dsp functions to their own struct
Originally committed as revision 24685 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-03 20:59:00 +00:00
Jason Garrett-Glaser 8b9b5e085f VP5/6/8: add one inline missed in r24677
Originally committed as revision 24682 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-03 11:21:22 +00:00
Jason Garrett-Glaser 827d43bb9d VP8: move zeroing of luma DC block into the WHT
Lets us do the zeroing in asm instead of C.
Also makes it consistent with the way the regular iDCT code does it.

Originally committed as revision 24668 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-02 20:18:09 +00:00
Ronald S. Bultje 6341838f3c Use word-writing instead of dword-writing (with two cached but otherwise
unchanged bytes) in the horizontal simple loopfilter. This makes the filter
quite a bit faster in itself (~30 cycles less on Core1), probably mostly
because we don't need a complex 4x4 transpose, but only a simple byte
interleave. Also allows using pextrw on SSE4, which speeds up even more
(e.g. 25% faster on Core i7).

Originally committed as revision 24638 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-31 23:13:15 +00:00
Vitor Sessak fa738b3ad1 Remove x86/mmx.h. It is not used anymore and has been deprecated for years.
Originally committed as revision 24618 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-31 16:20:45 +00:00
Vitor Sessak de4bc44abb Convert deinterlacing MMX code to YASM
Originally committed as revision 24615 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-31 14:50:51 +00:00
Vitor Sessak 740dfe7012 Fix compilation in x86_64. I broke it with r24580.
Originally committed as revision 24582 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-29 22:45:21 +00:00
Vitor Sessak 2c3dda6838 Translate libmpeg2 MMX IDCT to plain asm
Originally committed as revision 24580 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-29 22:19:54 +00:00
Ronald S. Bultje ab4d031889 Use pmaddubsw for the mbedge_filter (>=ssse3), 6-10 cycles faster.
Originally committed as revision 24514 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-26 21:18:19 +00:00
Jason Garrett-Glaser e25dee602f VP8: Much faster SSE2 MC
5-10% faster or more on Phenom, Athlon 64, and some others.
Helps some on pre-SSSE3 Intel chips as well, but not as much.

Originally committed as revision 24513 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-26 19:34:00 +00:00
Ronald S. Bultje 48adb7e7a4 Enable no-loop memory/register saving for ssse3/sse4 also.
Originally committed as revision 24511 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-26 14:07:57 +00:00
Ronald S. Bultje 2a180c69ea Save a register (or regsize of stackspace for x86-32) for the no-loop
mbedge loopfilter functions, by re-using space that holds a variable
that we no longer need.

Originally committed as revision 24510 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-26 14:00:15 +00:00
Ronald S. Bultje bcd4aa6498 Use nested ifs instead of &&, which appears to not work with %ifidn (i.e. this
construct was always enabled, even for <ssse3 versions).

Originally committed as revision 24509 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-26 13:56:51 +00:00
Ronald S. Bultje 2208053bd3 Split pextrw macro-spaghetti into several opt-specific macros, this will make
future new optimizations (imagine a sse5) much easier. Also fix a bug where
we used the direction (%2) rather than optimization (%1) to enable this, which
means it wasn't ever actually used...

Originally committed as revision 24507 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-26 13:50:59 +00:00
Ronald S. Bultje 6de5b7c6b8 Fix obvious bug in assignment. Somehow, the test vectors don't test this...
Originally committed as revision 24489 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-25 02:42:40 +00:00
Ronald S. Bultje e3f7bf774c Fix SPLATB_REG mess. Used to be a if/elseif/elseif/elseif spaghetti, so this
splits it into small optimization-specific macros which are selected for each
DSP function. The advantage of this approach is that the sse4 functions now
use the ssse3 codepath also without needing an explicit sse4 codepath.

Originally committed as revision 24487 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-24 19:33:05 +00:00
Eli Friedman 3611e7a309 Inline asm for VP56 arith coder
This is a lot more reliable to get cmov rather than trying to trick gcc into
generating it, useful since it's 2% faster overall.

Patch by Eli Friedman <eli.friedman at gmail>

Originally committed as revision 24471 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 21:46:30 +00:00
Jason Garrett-Glaser 3ae079a3c8 VP8: optimize DC-only chroma case in the same way as luma.
Add MMX idct_dc_add4uv function for this case.
~40% faster chroma idct.

Originally committed as revision 24455 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 06:02:52 +00:00
Jason Garrett-Glaser 51c9156438 VP8 asm: cosmetics (spacing)
Originally committed as revision 24453 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 03:02:56 +00:00
Jason Garrett-Glaser 8a467b2d44 VP8: 30% faster idct_mb
Take shortcuts based on statistically common situations.
Add 4-at-a-time idct_dc function (mmx and sse2) since rows of 4 DC-only DCT
blocks are common.
TODO: tie this more directly into the MB mode, since the DC-level transform is
only used for non-splitmv blocks?

Originally committed as revision 24452 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 02:58:27 +00:00
Jason Garrett-Glaser c25c776708 VP8: clear DCT blocks in iDCT instead of using clear_blocks.
~0.3% faster overall.

Originally committed as revision 24448 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 00:07:16 +00:00
Ronald S. Bultje dc5eec8085 Use pextrw for SSE4 mbedge filter result writing, speedup 5-10cycles on
CPUs supporting it.

Originally committed as revision 24437 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-22 19:59:34 +00:00
Ronald S. Bultje 003243c3c2 Fix and enable horizontal >=SSE2 mbedge loopfilter.
Originally committed as revision 24409 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-22 01:35:26 +00:00
Loren Merritt c7b1d9768c relicense h264 deblock sse2 to lgpl
Originally committed as revision 24408 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-22 00:39:49 +00:00
Loren Merritt 532e769701 sync yasm macros from x264
Originally committed as revision 24406 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-21 22:45:16 +00:00
Jason Garrett-Glaser 8731dbd890 Eliminate one instruction in VP8 dc_add_sse4
Originally committed as revision 24405 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-21 22:41:37 +00:00
Jason Garrett-Glaser 7dd224a42d Various VP8 x86 deblocking speedups
SSSE3 versions, improve SSE2 versions a bit.
SSE2/SSSE3 mbedge h functions are currently broken, so explicitly disable them.

Originally committed as revision 24403 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-21 22:11:03 +00:00
Jason Garrett-Glaser b8b231b5dc Make mmx VP8 WHT faster
Avoid pextrw, since it's slow on many older CPUs.
Now it doesn't require mmxext either.

Originally committed as revision 24397 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-21 20:51:01 +00:00
David Conrad af521abc28 Add header declarations for mmx/sse constants missing them
Originally committed as revision 24381 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-21 10:02:07 +00:00
David Conrad c7eec58170 Move ff_pw_* from vc1dsp_mmx.c to dsputil_mmx.c
Should fix compilation with icc and should help prevent any future duplicates

Originally committed as revision 24380 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-21 10:02:03 +00:00
Ronald S. Bultje e9e456d850 VP8 MBedge loopfilter MMX/MMX2/SSE2 functions for both luma (width=16)
and chroma (width=8).

Originally committed as revision 24378 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-20 22:58:56 +00:00
Ronald S. Bultje 268821e76e Chroma (width=8) inner loopfilter MMX/MMX2/SSE2 for VP8 decoder.
Originally committed as revision 24377 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-20 22:04:18 +00:00
Ronald S. Bultje c60ed66dbe Revert r24339 (it causes fate failures on x86-64) - I'll figure out what's
wrong with it tomorrow or so, then re-submit.

Originally committed as revision 24341 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-19 23:57:09 +00:00
Ronald S. Bultje 6526976f0c Remove FF_MM_SSE2/3 flags for CPUs where this is generally not faster than
regular MMX code. Examples of this are the Core1 CPU. Instead, set a new flag,
FF_MM_SSE2/3SLOW, which can be checked for particular SSE2/3 functions that
have been checked specifically on such CPUs and are actually faster than
their MMX counterparts.

In addition, use this flag to enable particular VP8 and LPC SSE2 functions
that are faster than their MMX counterparts.

Based on a patch by Loren Merritt <lorenm AT u washington edu>.

Originally committed as revision 24340 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-19 22:38:23 +00:00
Ronald S. Bultje 1878f685c0 Implement chroma (width=8) inner loopfilter MMX/MMX2/SSE2 functions.
Originally committed as revision 24339 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-19 21:53:28 +00:00
Ronald S. Bultje fb9bdf048c Be more efficient with registers or stack memory. Saves 8/16 bytes stack
for x86-32, or 2 MM registers on x86-64.

Originally committed as revision 24338 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-19 21:45:36 +00:00
Ronald S. Bultje 3facfc99da Change function prototypes for width=8 inner and mbedge loopfilter functions
so that it does both U and V planes at the same time. This will have speed
advantages when using SSE2 (or higher) optimizations, since we can do both
the U and V rows together in a single xmm register.

This also renames filter16 to filter16y and filter8 to filter8uv so that it's
more obvious what each function is used for.

Originally committed as revision 24337 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-19 21:18:04 +00:00
Loren Merritt 1ee076b1b1 more credits to D. J. Bernstein for fft
Originally committed as revision 24308 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-18 20:06:42 +00:00
Ronald S. Bultje 819b2dd2b1 Attempt to fix x86-64 testsuite on fate.
Originally committed as revision 24275 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-16 21:35:30 +00:00
Ronald S. Bultje 6f323f1251 Remove duplicate define.
Originally committed as revision 24272 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-16 19:54:47 +00:00
Ronald S. Bultje 889b2c26ee Revert 24270, it contained some stuff that shouldn't have been in there.
Originally committed as revision 24271 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-16 19:54:25 +00:00
Ronald S. Bultje 2356a7834b Remove duplicate define.
Originally committed as revision 24270 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-16 19:42:32 +00:00
Ronald S. Bultje ede1b9665a Give x86 r%d registers names, this will simplify implementation of the chroma
inner loopfilter, and it also allows us to save one register on x86-64/sse2.

Originally committed as revision 24269 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-16 19:38:10 +00:00
Ronald S. Bultje 526e831a46 Change return statement, the REP_RET is a mistake since the else case (x86-64,
sse2) doesn't actually loop, so REP_RET isn't necessary.

Originally committed as revision 24268 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-16 18:29:14 +00:00
Ronald S. Bultje a711eb4829 VP8 H/V inner loopfilter MMX/MMXEXT/SSE2 optimizations.
Originally committed as revision 24250 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-15 23:02:34 +00:00
David Conrad faa26db28b MMX/SSE VC1 loop filter
Originally committed as revision 24208 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-11 22:53:01 +00:00
David Conrad 7af8fbd348 Make ff_pw_4 128 bits
Originally committed as revision 24207 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-11 22:52:55 +00:00
Vitor Sessak 881fd7a62f Move SSE optimized 32-point DCT to its own file. Should fix breakage with YASM
disabled.

Originally committed as revision 24078 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-06 17:48:23 +00:00
Vitor Sessak 4dcc4f8eaa SSE optimized 32-point DCT
Originally committed as revision 24077 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-06 16:58:54 +00:00
Ronald S. Bultje f2a30bd840 Simple H/V loopfilter for VP8 in MMX, MMX2 and SSE2 (yay for yasm macros).
Originally committed as revision 24029 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-03 19:26:30 +00:00
Jason Garrett-Glaser b06855f18a SSSE3 versions of vp8 width4 bilinear MC functions
Originally committed as revision 24013 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-03 00:48:12 +00:00
Jason Garrett-Glaser dcc602d802 SSSE3 versions of width4 VP8 6-tap MC functions
Also make some small changes to saturation order of 4-tap SSSE3 MC to fix a
non-bitexactness bug.

Patch mostly by Eli Friedman <eli.friedman AT gmail DOT com>.

Originally committed as revision 23965 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-02 05:27:41 +00:00
Jason Garrett-Glaser 8434fc26eb Fix 100L in vp8dsp asm init
Originally committed as revision 23946 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-01 22:09:22 +00:00
Jason Garrett-Glaser 17dc7c7a60 Fix h264/vp8 intra pred on Athlon XP
Whose idea was it to have a CPU that didn't SIGILL on an invalid instruction?

Originally committed as revision 23927 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-01 10:29:47 +00:00
Måns Rullgård 49bd8e4b84 Fix grammar errors in documentation
Originally committed as revision 23904 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-30 15:38:06 +00:00
Jason Garrett-Glaser 82a8d0f114 Use add instead of lshift in mmxext vp8 idct
Originally committed as revision 23891 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-29 17:23:17 +00:00
Ronald S. Bultje 565344e7e4 Remove unused macros (duplicates from the now-LGPL x86util.asm).
Originally committed as revision 23890 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-29 17:04:29 +00:00
Ronald S. Bultje 2dd2f71692 MMX idct_add for VP8.
Originally committed as revision 23886 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-29 14:43:11 +00:00
Jason Garrett-Glaser 29e719377f Add missing mm_support call to ff_h264_pred_init_x86.
I'm not sure if this is supposed to be here, but it can't hurt.

Originally committed as revision 23885 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-29 12:28:06 +00:00
Jason Garrett-Glaser 004cda8e79 Add mmxext version of VP8 DC Hadamard transform
Originally committed as revision 23878 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-29 01:41:59 +00:00
Jason Garrett-Glaser 37355fe823 Make x86util.asm LGPL so we can use it in LGPL asm
Strip out most x264-specific stuff (not used anywhere in ffmpeg).

Originally committed as revision 23877 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-29 00:40:12 +00:00
Jason Garrett-Glaser bc14f04b2f MMXEXT version of vp8 4x4 vertical pred
Originally committed as revision 23876 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-29 00:23:52 +00:00
Jason Garrett-Glaser fb9927ad7d Add mmx/mmxext/ssse3 4x4 TM intra pred functions for vp8
Originally committed as revision 23875 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-28 23:53:07 +00:00
Jason Garrett-Glaser 8b746bb473 Add missing comment header for predict_4x4_dc_mmxext
Originally committed as revision 23874 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-28 23:37:24 +00:00
Jason Garrett-Glaser 270a85d259 Fix some intra pred MMX functions that used MMXEXT instructions
Also add predict_4x4_dc MMXEXT function for vp8/h264.

Originally committed as revision 23873 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-28 23:35:17 +00:00
Jason Garrett-Glaser a912da761d Fix VP8 bilinear mc on x86_64
Originally committed as revision 23872 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-28 22:13:14 +00:00
Baptiste Coudurier 50f70541d3 Change MMXEXT to MMX2, MMXEXT is deprecated
Originally committed as revision 23865 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-28 21:12:00 +00:00
Jason Garrett-Glaser 0fecad09fe Add x86 asm functions for VP8 put_pixels
Originally committed as revision 23858 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-28 19:14:40 +00:00
Jason Garrett-Glaser a173aa8940 Add MMX, SSE2, SSSE3 asm for VP8 bilinear MC
Originally committed as revision 23857 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-28 18:56:24 +00:00
Måns Rullgård 1f65b67c46 Fix x86 build with h264dsp disabled
Originally committed as revision 23844 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-28 10:02:15 +00:00
Eli Friedman b3858964d6 Add const to some pointer parameters.
Patch by Eli Friedman, eli D friedman A gmail

Originally committed as revision 23826 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-27 15:11:38 +00:00
David Conrad 30bdefd1de Fix build without yasm
Originally committed as revision 23816 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-27 02:52:43 +00:00
Jason Garrett-Glaser 0178d14fe5 First shot at VP8 optimizations:
- MMXEXT, SSE2 and SSSE3 MC functions
- MMX and SSE4 IDCT dc_add functions

Patch by Jason Garrett-Glaser <darkshikari gmail com> and myself.

Originally committed as revision 23815 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-27 02:01:45 +00:00
Måns Rullgård 0912db0206 Make vp8 select h264dsp and use this to pull in mmx intrapred
Originally committed as revision 23790 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-25 19:10:08 +00:00
Carl Eugen Hoyos 0c59074868 Fix compilation without --enable-gpl.
Originally committed as revision 23789 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-25 19:06:29 +00:00
Carl Eugen Hoyos 96da2a6967 Cosmetics: Fix indentation.
Originally committed as revision 23785 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-25 18:34:03 +00:00
Jason Garrett-Glaser 4af8cdfc3f 16x16 and 8x8c x86 SIMD intra pred functions for VP8 and H.264
Originally committed as revision 23783 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-25 18:25:49 +00:00
Vitor Sessak 89c7d8058c Fix compilation on x64.
Originally committed as revision 23753 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-24 08:53:32 +00:00
Vitor Sessak 57dbd12b6d Fix asm constraints in apply_window()
Originally committed as revision 23752 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-24 08:46:47 +00:00
Vitor Sessak bc2b368215 SSE-optimized MP3 floating point windowing functions
Originally committed as revision 23750 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-24 07:44:50 +00:00
Jason Garrett-Glaser 2966cc1849 Update x264asm header files to latest versions.
Modify the asm accordingly.
GLOBAL is now no longer necessary for PIC-compliant loads.

Originally committed as revision 23739 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-23 19:20:46 +00:00
David Conrad 413abbe164 Add bitexact versions of put_no_rnd_pixels8 _x2 and _y2 for vp3/theora
Originally committed as revision 23463 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-04 04:46:26 +00:00
David Conrad 179655b6c6 vp3: The DC-only IDCT is surprisingly not supposed to be bitexact to the
full IDCT. Fix this.

Originally committed as revision 23358 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-05-28 07:01:34 +00:00