Commit Graph

99 Commits

Author SHA1 Message Date
Ronald S. Bultje ed63f527f2 Fix build if yasm is not available. 2011-06-18 08:34:14 -04:00
Daniel Kang f188a1e0ca H.264: Add x86 assembly for 10-bit MC Chroma H.264 functions.
Mainly ported from 8-bit H.264 MC Chroma.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-06-18 07:52:19 -04:00
Jason Garrett-Glaser c90b94424c 4:4:4 H.264 decoding support
Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
2011-06-13 21:16:30 -07:00
Jason Garrett-Glaser 504811baea Roll back 4:4:4 H.264 for now
Needs some ARM/PPC asm modifications.
2011-06-13 13:38:46 -07:00
Jason Garrett-Glaser c9c493872c 4:4:4 H.264 decoding support
Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
2011-06-13 12:21:39 -07:00
Jason Garrett-Glaser 9f3d6ca4f1 Port x86 10-bit H.264 deblock asm from x264 2011-05-10 20:02:15 -07:00
Oskar Arvidsson 19a0729b4c Adds 8-, 9- and 10-bit versions of some of the functions used by the h264 decoder.
This patch lets e.g. dsputil_init choose dsp functions with respect to
the bit depth to decode. The naming scheme of bit depth dependent
functions is <base name>_<bit depth>[_<prefix>] (i.e. the old
clear_blocks_c is now named clear_blocks_8_c).

Note: Some of the functions for high bit depth are not dependent on the
bit depth, but only on the pixel size. This leaves some room for
optimizing binary size.

Preparatory patch for high bit depth h264 decoding support.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-05-10 07:24:36 -04:00
Alexander Strange 1500be13f2 dsputil: allow skipping the drawing of top/bottom edges. 2011-03-26 17:45:38 -04:00
Justin Ruggles e6e9823488 Add apply_window_int16() to DSPContext with x86-optimized versions and use it
in the ac3_fixed encoder.
2011-03-22 21:08:30 -04:00
Mans Rullgard 2912e87a6c Replace FFmpeg with Libav in licence headers
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-03-19 13:33:20 +00:00
Ronald S. Bultje bf6fa73245 dsputil_mmx.c: remove ff_vector128.
Remove ff_vector128, it is identical to ff_pb_80.
2011-02-19 10:51:15 -05:00
Ronald S. Bultje 12802ec060 dsputil: move VC1-specific stuff into VC1DSPContext. 2011-02-17 17:35:35 -05:00
Justin Ruggles c73d99e672 Separate format conversion DSP functions from DSPContext.
This will be beneficial for use with the audio conversion API without
requiring it to depend on all of dsputil.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-02-02 02:44:53 +00:00
Ronald S. Bultje 81f2a3f4ff Implement a SIMD version of emulated_edge_mc() for x86.
From ~550 cycles (C version) to 170 (SSE/x86-64), 206 (MMX/x86-32)
and 196 (SSE2/x86-32) cycles.
2011-01-31 20:55:56 -05:00
Justin Ruggles d19b744a36 cosmetics: indentation
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-31 20:30:15 +00:00
Justin Ruggles 80ba1ddb58 Remove unneeded add bias from 3 functions.
DSPContext.vector_fmul_window()
DCADSPContext.lfe_fir()
SynthFilterContext.synth_filter_float()

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-31 20:28:42 +00:00
Justin Ruggles 6eabb0d3ad Change DSPContext.vector_fmul() from dst=dst*src to dest=src0*src1.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-22 17:53:27 +00:00
Mans Rullgard ef4a65149d Replace ASMALIGN() with .p2align
This macro has unconditionally used .p2align for a long time and
serves no useful purpose.
2011-01-18 20:48:24 +00:00
Mans Rullgard ac3c9d0169 x86: remove VLA in ac3_downmix_sse 2011-01-18 20:48:24 +00:00
Ronald S. Bultje ec3233a855 Fix ff_pw_3 alignment.
Originally committed as revision 26344 to svn://svn.ffmpeg.org/ffmpeg/trunk
2011-01-14 23:26:34 +00:00
Jason Garrett-Glaser 19fb234e4a H.264: split luma dc idct out and implement MMX/SSE2 versions
About 2.5x the speed.

NOTE: the way that the asm code handles large qmuls is a bit suboptimal.
If x264-style dequant was used (separate shift and qmul values), it might
be possible to get some extra speed.

Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk
2011-01-14 21:34:25 +00:00
Ronald S. Bultje 8d147f1f60 For rounding in chroma MC SSSE3, use 16-byte pw_3/4 instead of reading 8 bytes
and then using movlhps to dup it into the higher half of the register.

Originally committed as revision 26086 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-24 17:23:22 +00:00
Baptiste Coudurier 90f1f3bf00 In yadif filter, declare asm constants directly to avoid dependency on libavcodec
Originally committed as revision 25895 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-06 00:14:15 +00:00
Baptiste Coudurier 9e95999e2a 10l, add ff_pw_1 to dsputil_mmx for yadif sse2
Originally committed as revision 25881 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-04 13:06:06 +00:00
İsmail Dönmez 80e33d2451 dsputil: Use explicit movzbl instead of movzx
This fixes compilation with the latest clang trunk version.

Patch by İsmail Dönmez, ismail at namtrac dot org

Originally committed as revision 25628 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-11-01 19:35:51 +00:00
Ramiro Polla 153ca56b38 xmm_clobbers: list xmm registers first in clobber list
suncc does not like the leading commas inside the macro, but it has no problem
with trailing commas.

Originally committed as revision 25615 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 18:14:48 +00:00
Ramiro Polla 5d543a3d13 dsputil_mmx: add xmm registers to clobber list
Originally committed as revision 25611 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 13:57:58 +00:00
Ramiro Polla 559738eff3 dsputil_mmx: prefer xmm registers below xmm6 when they are available
Originally committed as revision 25606 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-31 13:13:53 +00:00
Ronald S. Bultje dd68d4db43 MMX, MMX2, SSE2 and SSSE3 optimizations for pred16x16/8x8_plane H264 intra
prediction (plus some with different rounding for svq3/rv40). Speedup (for
SSSE3) about 6-fold, 3.6% faster overall with cathedral sample.

Originally committed as revision 25361 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-05 22:06:18 +00:00
Eli Friedman 329d689f75 Use sse2 variant of put_pixels16() for no_rnd also. Provides a minor speed
increase to e.g. vc1, snow and mpeg decoding.

Patch by Eli Friedman <eli dot friedman gmail com>.

Originally committed as revision 25259 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 15:34:43 +00:00
Stefano Sabatini c6c98d0897 Move mm_support() from libavcodec to libavutil, make it a public
function and rename it to av_get_cpu_flags().

Originally committed as revision 25076 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-08 15:07:14 +00:00
Stefano Sabatini 7160bb716b Rename FF_MM_ symbols related to CPU features flags as AV_CPU_FLAG_
symbols, and move them from libavcodec/avcodec.h to libavutil/cpu.h.

Originally committed as revision 25040 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-04 09:59:08 +00:00
Ronald S. Bultje 2c166c3af1 Port latest x264 deblock asm (before they moved to using NV12 as internal
format), LGPL'ed with permission from Jason and Loren. This includes mmx2
code, so remove inline asm from h264dsp_mmx.c accordingly.

Originally committed as revision 25031 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-03 16:52:46 +00:00
Ronald S. Bultje 14bc1f2485 Split h264dsp_mmx.c (which was #included in dsputil_mmx.c) into h264_qpel_mmx.c,
which is still #included in dsputil_mmx.c and is part of DSPContext, and h264dsp_mmx.c,
which represents H264DSPContext and is now compiled on its own.

Originally committed as revision 25018 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-01 20:48:59 +00:00
Ronald S. Bultje 79ce0f002e Fix compilation failure if yasm is disabled (missing vp3 symbols).
Originally committed as revision 24992 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-30 20:30:40 +00:00
Ronald S. Bultje d0eb5a1174 Move H264 chroma MC from inline asm to yasm. This fixes VP3/5/6 and VC-1
fate failures on Win64.

Originally committed as revision 24989 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-30 16:31:04 +00:00
Ronald S. Bultje e9f5f020c6 Move VP3 IDCT functions from inline ASM to YASM. This fixes part of the VP3/5/6
issues on Win64.

Originally committed as revision 24988 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-30 16:25:46 +00:00
Ronald S. Bultje 7e7c4b6008 Put ff_ prefix on non-static {put_signed,put,add}_pixels_clamped_mmx()
functions.

Originally committed as revision 24987 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-30 16:22:27 +00:00
Ronald S. Bultje 3a0885146c Move vp6_filter_diag4() from DSPContext to VP56DSPContext.
Originally committed as revision 24921 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-25 13:42:28 +00:00
Måns Rullgård c0ec9918b0 Remove global mm_flags variable
Originally committed as revision 24909 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-24 17:47:05 +00:00
Eli Friedman c12d6955e2 H.264: SSE2/SSSE3 weighted prediction asm
Patch by Eli Friedman <eli.friedman at gmail dot com>

Originally committed as revision 24702 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-05 00:13:38 +00:00
Måns Rullgård f079a64aea Move cavs dsp functions to their own struct
Originally committed as revision 24685 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-03 20:59:00 +00:00
Loren Merritt c7b1d9768c relicense h264 deblock sse2 to lgpl
Originally committed as revision 24408 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-22 00:39:49 +00:00
David Conrad c7eec58170 Move ff_pw_* from vc1dsp_mmx.c to dsputil_mmx.c
Should fix compilation with icc and should help prevent any future duplicates

Originally committed as revision 24380 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-21 10:02:03 +00:00
Ronald S. Bultje e9e456d850 VP8 MBedge loopfilter MMX/MMX2/SSE2 functions for both luma (width=16)
and chroma (width=8).

Originally committed as revision 24378 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-20 22:58:56 +00:00
Ronald S. Bultje a711eb4829 VP8 H/V inner loopfilter MMX/MMXEXT/SSE2 optimizations.
Originally committed as revision 24250 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-15 23:02:34 +00:00
David Conrad 7af8fbd348 Make ff_pw_4 128 bits
Originally committed as revision 24207 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-11 22:52:55 +00:00
Ronald S. Bultje f2a30bd840 Simple H/V loopfilter for VP8 in MMX, MMX2 and SSE2 (yay for yasm macros).
Originally committed as revision 24029 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-03 19:26:30 +00:00
Eli Friedman b3858964d6 Add const to some pointer parameters.
Patch by Eli Friedman, eli D friedman A gmail

Originally committed as revision 23826 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-27 15:11:38 +00:00
Jason Garrett-Glaser 4af8cdfc3f 16x16 and 8x8c x86 SIMD intra pred functions for VP8 and H.264
Originally committed as revision 23783 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-25 18:25:49 +00:00