Commit Graph

58 Commits

Author SHA1 Message Date
Gil Pedersen 257de5fb25 h264dsp_mmx: Add #ifdefs around some mmxext functions on x86_64.
This fixes linking errors due to undefined symbols on x86_64 OS X.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2011-05-16 15:35:53 +02:00
Jason Garrett-Glaser 5705b02079 10-bit H.264 x86 chroma v loopfilter asm
Also delete some unused deblock asm macros.
2011-05-11 11:09:10 -07:00
Jason Garrett-Glaser 9f3d6ca4f1 Port x86 10-bit H.264 deblock asm from x264 2011-05-10 20:02:15 -07:00
Jason Garrett-Glaser 8ad77b65b5 Update x86 H.264 deblock asm
Includes AVX versions from x264.
2011-05-10 20:01:58 -07:00
Ronald S. Bultje 86b29553f8 h264dsp_mmx: place bracket outside #if/#endif block.
Should fix compile on systems missing yasm/nasm.
2011-05-10 08:39:38 -04:00
Oskar Arvidsson 19a0729b4c Adds 8-, 9- and 10-bit versions of some of the functions used by the h264 decoder.
This patch lets e.g. dsputil_init chose dsp functions with respect to
the bit depth to decode. The naming scheme of bit depth dependent
functions is <base name>_<bit depth>[_<prefix>] (i.e. the old
clear_blocks_c is now named clear_blocks_8_c).

Note: Some of the functions for high bit depth is not dependent on the
bit depth, but only on the pixel size. This leaves some room for
optimizing binary size.

Preparatory patch for high bit depth h264 decoding support.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-05-10 07:24:36 -04:00
Mans Rullgard 2912e87a6c Replace FFmpeg with Libav in licence headers
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-03-19 13:33:20 +00:00
Jason Garrett-Glaser 19fb234e4a H.264: split luma dc idct out and implement MMX/SSE2 versions
About 2.5x the speed.

NOTE: the way that the asm code handles large qmuls is a bit suboptimal.
If x264-style dequant was used (separate shift and qmul values), it might
be possible to get some extra speed.

Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk
2011-01-14 21:34:25 +00:00
Ronald S. Bultje a52ffc3f54 Move static inline function to a macro, so that constant propagation in
inline asm works for gcc-3.x also (hopefully). Should fix gcc-3.x FATE
breakage after r25254.

Originally committed as revision 25262 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 17:42:26 +00:00
Ronald S. Bultje cd17285e6c Merge b_idx and edge variables, and optimize the ASM to directly load variables
from memory locations/offsets depending on b_idx plus constants, rather than
having gcc do this. This saves several lea calls and together saves about
10 cycles in h264_loop_filter_strength_mmx2().

Originally committed as revision 25256 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 14:04:39 +00:00
Ronald S. Bultje 0cc8a5d088 Remove mv_mask variable. Replace the related pand -1/0 instructions by either
a pxor, or remove the instruction alltogether. Altogether, this saves 1
instruction.

Originally committed as revision 25255 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 14:03:30 +00:00
Ronald S. Bultje c0673f2cf4 Remove d_idx as a variable, and instead load it as a constant in the asm.
This has no measurable speed effect because the surrounding code doesn't
take advantage of this yet.

Originally committed as revision 25254 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 14:02:32 +00:00
Ronald S. Bultje 2c3135f6d3 Unroll inner bidir loop in h264_loop_filter_strength_mmx2(), which gets rid
of the d_idx variable and therefore allows for future optimizations. No speed
difference by this commit itself.

Originally committed as revision 25253 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 13:35:24 +00:00
Ronald S. Bultje 4b81511cab Unloop the outer loop in h264_loop_filter_strength_mmx2(), which allows
inlining various constants within the loop code. 20 cycles faster on
cathedral sample.

Originally committed as revision 25252 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 13:34:20 +00:00
Ronald S. Bultje 7e117771cd Remove unused variable.
Originally committed as revision 25173 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-24 15:31:46 +00:00
Måns Rullgård c0bc8b9afb x86: disable SSE functions using stack when stack is not aligned
This fixes crashes with ICC 10.1.

Originally committed as revision 25153 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-21 17:57:21 +00:00
Måns Rullgård f41237c9db x86: remove hack disabling sse2 h264 loop filter with 32-bit icc
Originally committed as revision 25146 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-18 20:44:32 +00:00
Ronald S. Bultje 1d16a1cf99 Rename h264_idct_sse2.asm to h264_idct.asm; move inline IDCT asm from
h264dsp_mmx.c to h264_idct.asm (as yasm code). Because the loops are now
coded in asm instead of C, this is (depending on the function) up to 50%
faster for cases where gcc didn't do a great job at looping.

Since h264_idct_add8() is now faster than the manual loop setup in h264.c,
in-asm idct calling can now be enabled for chroma as well (see r16207). For
MMX, this is 5% faster. For SSE2 (which isn't done for chroma if h264.c does
the looping), this makes it up to 50% faster. Speed gain overall is ~0.5-1.0%.

Originally committed as revision 25119 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-14 13:36:26 +00:00
Jason Garrett-Glaser 8acb554aff LGPL SSE2 H.264 iDCT
This leaves no more GPL-only H.264 decoding asm code.

Approved by Loren.

Originally committed as revision 25092 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-10 02:25:12 +00:00
Stefano Sabatini c6c98d0897 Move mm_support() from libavcodec to libavutil, make it a public
function and rename it to av_get_cpu_flags().

Originally committed as revision 25076 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-08 15:07:14 +00:00
Stefano Sabatini 7160bb716b Rename FF_MM_ symbols related to CPU features flags as AV_CPU_FLAG_
symbols, and move them from libavcodec/avcodec.h to libavutil/cpu.h.

Originally committed as revision 25040 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-04 09:59:08 +00:00
Ronald S. Bultje 2c166c3af1 Port latest x264 deblock asm (before they moved to using NV12 as internal
format), LGPL'ed with permission from Jason and Loren. This includes mmx2
code, so remove inline asm from h264dsp_mmx.c accordingly.

Originally committed as revision 25031 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-03 16:52:46 +00:00
Ronald S. Bultje a33a2562c1 Rename h264_weight_sse2.asm to h264_weight.asm; add 16x8/8x16/8x4 non-square
biweight code to sse2/ssse3; add sse2 weight code; and use that same code to
create mmx2 functions also, so that the inline asm in h264dsp_mmx.c can be
removed. OK'ed by Jason on IRC.

Originally committed as revision 25019 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-01 20:56:16 +00:00
Ronald S. Bultje 14bc1f2485 Split h264dsp_mmx.c (which was #included in dsputil_mmx.c) in h264_qpel_mmx.c,
still #included in dsputil_mmx.c and is part of DSPContext, and h264dsp_mmx.c,
which represents H264DSPContext and is now compiled on its own.

Originally committed as revision 25018 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-01 20:48:59 +00:00
Ronald S. Bultje de1c253bab Split intra prediction initialization (i.e. assigning of function pointers)
into its own file, it doesn't belong in h264dsp_mmx.c (much less so in
dsputil_mmx.c).

Originally committed as revision 24990 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-30 16:34:13 +00:00
Ronald S. Bultje d0eb5a1174 Move H264 chroma MC from inline asm to yasm. This fixes VP3/5/6 and VC-1
fate failures on Win64.

Originally committed as revision 24989 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-30 16:31:04 +00:00
Ronald S. Bultje 7e7c4b6008 Put ff_ prefix on non-static {put_signed,put,add}_pixels_clamped_mmx()
functions.

Originally committed as revision 24987 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-30 16:22:27 +00:00
Måns Rullgård c0ec9918b0 Remove global mm_flags variable
Originally committed as revision 24909 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-24 17:47:05 +00:00
Jason Garrett-Glaser 4a384de5b8 Split h264dsp and h264pred in configure.
Many H.264 derivatives, like RV40 and VP8, use the H.264 prediction functions
but not the weight/loopfilter functions.
This should reduce the size of builds with one of these derivatives but without
H.264 decoding itself.

Originally committed as revision 24741 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-07 23:10:25 +00:00
Eli Friedman c12d6955e2 H.264: SSE2/SSSE3 weighted prediction asm
Patch by Eli Friedman <eli.friedman at gmail dot com>

Originally committed as revision 24702 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-05 00:13:38 +00:00
Jason Garrett-Glaser 17dc7c7a60 Fix h264/vp8 intra pred on Athlon XP
Whose idea was it to have a CPU that didn't SIGILL on an invalid instruction?

Originally committed as revision 23927 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-01 10:29:47 +00:00
Jason Garrett-Glaser 29e719377f Add missing mm_support call toff_h264_pred_init_x86.
I'm not sure if this is supposed to be here, but it can't hurt.

Originally committed as revision 23885 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-29 12:28:06 +00:00
Jason Garrett-Glaser bc14f04b2f MMXEXT version of vp8 4x4 vertical pred
Originally committed as revision 23876 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-29 00:23:52 +00:00
Jason Garrett-Glaser fb9927ad7d Add mmx/mmxext/ssse3 4x4 TM intra pred functions for vp8
Originally committed as revision 23875 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-28 23:53:07 +00:00
Jason Garrett-Glaser 270a85d259 Fix some intra pred MMX functions that used MMXEXT instructions
Also add predict_4x4_dc MMXEXT function for vp8/h264.

Originally committed as revision 23873 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-28 23:35:17 +00:00
Baptiste Coudurier 50f70541d3 Change MMXEXT to MMX2, MMXEXT is deprecated
Originally committed as revision 23865 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-28 21:12:00 +00:00
Måns Rullgård 1f65b67c46 Fix x86 build with h264dsp disabled
Originally committed as revision 23844 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-28 10:02:15 +00:00
Carl Eugen Hoyos 96da2a6967 Cosmetics: Fix indentation.
Originally committed as revision 23785 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-25 18:34:03 +00:00
Jason Garrett-Glaser 4af8cdfc3f 16x16 and 8x8c x86 SIMD intra pred functions for VP8 and H.264
Originally committed as revision 23783 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-25 18:25:49 +00:00
Reimar Döffinger 1c71b5c89a Replace more "m" constraints with MANGLE to fix compilation issues
with x86_32 gcc 4.4.4 and -fPIC.

Originally committed as revision 23082 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-05-10 21:16:08 +00:00
Reimar Döffinger 27eecec359 Convert two "m" constraints to MANGLE to fix compilation with some compilers.
Originally committed as revision 22760 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-04-01 16:52:14 +00:00
Måns Rullgård 84dc2d8afa Remove DECLARE_ALIGNED_{8,16} macros
These macros are redundant.  All uses are replaced with the generic
DECLARE_ALIGNED macro instead.

Originally committed as revision 22233 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-03-06 14:24:59 +00:00
Loren Merritt 900479bb74 optimize h264_loop_filter_strength_mmx2
244->160 cycles on core2

Originally committed as revision 21462 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-26 17:17:48 +00:00
Måns Rullgård c67278098d Move array specifiers outside DECLARE_ALIGNED() invocations
Originally committed as revision 21377 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-22 03:25:11 +00:00
David Conrad 1f630b9717 Use two separate memory arguments since 8+() is invalid gas syntax
Originally committed as revision 21360 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-21 09:46:57 +00:00
Michael Niedermayer b4c2ada528 Attempt to fix asm compilation failure.
Only tested on gcc 4 & x86_64.

Originally committed as revision 21355 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-20 19:23:19 +00:00
David Conrad c4f2b6dce3 Use constant offsets for memory operands since gcc is unable to
This fixes gcc failing to fit 6 memory locations into 7 registers on x86-32

Originally committed as revision 21337 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-20 00:34:10 +00:00
Michael Niedermayer 9ac4548ff7 Fix h264_loop_filter_strength_mmx2() so it works with b frames.
Originally committed as revision 21327 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-19 16:40:36 +00:00
Michael Niedermayer ebddd2e253 Remove -2 -> -1 remapping, its not needed anymore as we must remap all
references per LUT anyway.

Originally committed as revision 21323 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-19 14:28:19 +00:00
Ramiro Polla 74a841af8b Replace more uses of __attribute__((aligned)) by DECLARE_ALIGNED.
Originally committed as revision 19089 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-06-04 23:25:09 +00:00