Vitor Sessak
6204feb160
dct32: Add AVX implementation of 32-point DCT
2011-05-21 17:42:26 +02:00
Vitor Sessak
4e653b98c8
dct32: Change pass 6 permutation to allow for AVX implementation
2011-05-21 17:42:26 +02:00
Vitor Sessak
3758eb0eb9
dct32: port SSE 32-point DCT to YASM
2011-05-21 17:42:26 +02:00
Diego Biurrun
153382e1b6
multiple inclusion guard cleanup
...
Add missing multiple inclusion guards; clean up #endif comments;
add missing library prefixes; keep guard names consistent.
2011-05-21 13:48:10 +02:00
Dave Yeo
d69f9a4234
Add support for a.out object format to assembler macros.
...
This format is still used by e.g. OS/2.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2011-05-20 17:52:21 +02:00
Mans Rullgard
0b5e44ed29
mpegaudiodsp: fix x86 and ppc makefiles
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-05-19 16:32:24 +01:00
Mans Rullgard
c4f5c2d6f4
Move some mpegaudio functions to new mpegaudiodsp subsystem
...
This separation allows these functions to be used in a cleaner
fashion from other codecs (e.g. qdm2) and simplifies creating
optimised versions of them.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-05-19 12:25:34 +01:00
Justin Ruggles
e98a95e779
10l: wrap float_interleave functions in HAVE_YASM.
...
fixes compilation with --disable-yasm
2011-05-18 20:18:08 -04:00
Justin Ruggles
32f8fb8ecf
Add float_interleave() to FmtConvertContext with x86-optimized versions.
...
Partially based on patches by clsid2 in ffdshow-tryout.
ff_float_interleave6() x86 improvements by Loren Merrit.
2011-05-18 17:27:05 -04:00
Daniel Kang
d0005d347d
Modify x86util.asm to ease transitioning to 10-bit H.264 assembly.
...
Arguments for variable size instructions are added to many macros, along
with other various changes. The x86util.asm code was ported from x264.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2011-05-17 20:44:48 +02:00
Gil Pedersen
257de5fb25
h264dsp_mmx: Add #ifdefs around some mmxext functions on x86_64.
...
This fixes linking errors due to undefined symbols on x86_64 OS X.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2011-05-16 15:35:53 +02:00
Diego Biurrun
888fa31eca
Fix FSF address copy paste error in some license headers.
2011-05-14 21:32:31 +02:00
Jason Garrett-Glaser
5705b02079
10-bit H.264 x86 chroma v loopfilter asm
...
Also delete some unused deblock asm macros.
2011-05-11 11:09:10 -07:00
Jason Garrett-Glaser
9f3d6ca4f1
Port x86 10-bit H.264 deblock asm from x264
2011-05-10 20:02:15 -07:00
Jason Garrett-Glaser
8ad77b65b5
Update x86 H.264 deblock asm
...
Includes AVX versions from x264.
2011-05-10 20:01:58 -07:00
Ronald S. Bultje
86b29553f8
h264dsp_mmx: place bracket outside #if/#endif block.
...
Should fix compile on systems missing yasm/nasm.
2011-05-10 08:39:38 -04:00
Oskar Arvidsson
19a0729b4c
Adds 8-, 9- and 10-bit versions of some of the functions used by the h264 decoder.
...
This patch lets e.g. dsputil_init chose dsp functions with respect to
the bit depth to decode. The naming scheme of bit depth dependent
functions is <base name>_<bit depth>[_<prefix>] (i.e. the old
clear_blocks_c is now named clear_blocks_8_c).
Note: Some of the functions for high bit depth is not dependent on the
bit depth, but only on the pixel size. This leaves some room for
optimizing binary size.
Preparatory patch for high bit depth h264 decoding support.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-05-10 07:24:36 -04:00
Diego Biurrun
a734fa575f
Remove disabled non-optimized code variants.
2011-04-29 20:01:13 +02:00
Vitor Sessak
9d35fa520e
Add AVX FFT implementation.
...
Signed-off-by: Reinhard Tartler <siretart@tauware.de>
2011-04-26 18:25:24 +02:00
Vitor Sessak
33cbfa6fa3
Update x86inc.asm from x264 to allow AVX emulation using SSE and MMX.
...
Signed-off-by: Reinhard Tartler <siretart@tauware.de>
2011-04-26 18:18:22 +02:00
Alexander Strange
1500be13f2
dsputil: allow to skip drawing of top/bottom edges.
2011-03-26 17:45:38 -04:00
Justin Ruggles
e6e9823488
Add apply_window_int16() to DSPContext with x86-optimized versions and use it
...
in the ac3_fixed encoder.
2011-03-22 21:08:30 -04:00
Mans Rullgard
0aded9484d
Move dct and rdft definitions to separate files
...
This leaves fft.h with only the core FFT and MDCT definitions
thus making it more managable.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-03-20 17:15:33 +00:00
Mans Rullgard
2912e87a6c
Replace FFmpeg with Libav in licence headers
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-03-19 13:33:20 +00:00
Justin Ruggles
0f999cfddb
ac3enc: add float_to_fixed24() with x86-optimized versions to AC3DSPContext
...
and use in scale_coefficients() for the floating-point AC-3 encoder.
2011-03-17 16:46:48 -04:00
Justin Ruggles
79414257e2
mathops: fix MULL() when the compiler does not inline the function.
...
If the function is not inlined, an immmediate cannot be used for the
shift parameter, so the %cl register must be used instead in that case.
This fixes compilation for x86-32 using gcc with --disable-optimizations.
2011-03-15 20:49:37 -04:00
Justin Ruggles
aaff3b312e
mathops: change "g" constraint to "rm" in x86-32 version of MUL64().
...
The 1-arg imul instruction cannot take an immediate argument, only a register
or memory argument.
2011-03-15 13:43:47 -04:00
Justin Ruggles
b181b8fb96
mathops: convert MULL/MULH/MUL64 to inline functions rather than macros.
...
This fixes unexpected name collisions that were occurring with variables
declared within the macros.
It also fixes the fate-acodec-ac3_fixed regression test on x86-32.
2011-03-15 13:43:47 -04:00
Justin Ruggles
f1efbca5e9
ac3enc: add SIMD-optimized shifting functions for use with the fixed-point AC3 encoder.
2011-03-14 08:45:31 -04:00
Mans Rullgard
a5444fee06
Add CONFIG_AC3DSP symbol to simplify makefiles
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-03-12 11:35:26 +00:00
Ronald S. Bultje
bf6fa73245
dsputil_mmx.c: remove ff_vector128.
...
Remove ff_vector128, it is identical to ff_pb_80.
2011-02-19 10:51:15 -05:00
Ronald S. Bultje
12802ec060
dsputil: move VC1-specific stuff into VC1DSPContext.
2011-02-17 17:35:35 -05:00
Justin Ruggles
1f004fc512
ac3dsp: Change punpckhqdq to movhlps in ac3_max_msb_abs_int16().
...
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-02-16 14:08:34 -05:00
Justin Ruggles
fbb6b49dab
ac3enc: Add x86-optimized function to speed up log2_tab().
...
AC3DSPContext.ac3_max_msb_abs_int16() finds the maximum MSB of the absolute
value of each element in an array of int16_t.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-02-13 16:49:39 -05:00
Loren Merritt
e6b1ed693a
FFT: factor a shuffle out of the inner loop and merge it into fft_permute.
...
6% faster SSE FFT on Conroe, 2.5% on Penryn.
Signed-off-by: Janne Grunau <janne-ffmpeg@jannau.net>
2011-02-13 15:36:39 +01:00
Justin Ruggles
dda3f0ef48
Add x86-optimized versions of exponent_min().
...
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-02-10 15:32:47 -05:00
Ronald S. Bultje
17cf7c68ed
Fix ff_emu_edge_core_sse() on Win64.
...
Fix emu_edge_v_extend_15 to be <128 bytes on Win64, by being more strict
on the size of registers and which registers are being used for operations
where multiple are available. This fixes segfaults in emulated_edge()
function calls on Win64.
2011-02-08 18:25:12 -05:00
Justin Ruggles
c73d99e672
Separate format conversion DSP functions from DSPContext.
...
This will be beneficial for use with the audio conversion API without
requiring it to depend on all of dsputil.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-02-02 02:44:53 +00:00
Alex Converse
770c410fbb
Fix ff_imdct_calc_sse() on gcc-4.6
...
Gcc 4.6 only preserves the first value when using an array with an "m"
constraint.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-02-02 02:40:05 +00:00
Ronald S. Bultje
81f2a3f4ff
Implement a SIMD version of emulated_edge_mc() for x86.
...
From ~550 cycles (C version) to 170 (SSE/x86-64), 206 (MMX/x86-32)
and 196 (SSE2/x86-32) cycles.
2011-01-31 20:55:56 -05:00
Justin Ruggles
d19b744a36
cosmetics: indentation
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-31 20:30:15 +00:00
Justin Ruggles
80ba1ddb58
Remove unneeded add bias from 3 functions.
...
DSPContext.vector_fmul_window()
DCADSPContext.lfe_fir()
SynthFilterContext.synth_filter_float()
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-31 20:28:42 +00:00
Mans Rullgard
80944df720
x86: fix overflow in h264 8x8 planar prediction
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-24 23:24:28 +00:00
Justin Ruggles
6eabb0d3ad
Change DSPContext.vector_fmul() from dst=dst*src to dest=src0*src1.
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-22 17:53:27 +00:00
Justin Ruggles
1c189fc533
cosmetics related to LPC changes.
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-21 19:59:08 +00:00
Justin Ruggles
77a78e9bdc
Separate window function from autocorrelation.
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-21 19:59:08 +00:00
Justin Ruggles
56f8952b25
Move lpc_compute_autocorr() from DSPContext to a new struct LPCContext.
...
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-21 19:58:59 +00:00
Ronald S. Bultje
b9c7f66e6d
Fix horizontal/horizontal_up 8x8l intra prediction x86/simd functions.
...
The original functions did not work correctly for edge pixels, e.g.
when CODEC_FLAG_EMU_EDGE is set, leading to corrupt output in e.g. VLC.
Based on a patch by Daniel Kang <daniel d kang gmail com>.
Signed-off-by: Ronald S. Bultje <rsbultje gmail com>
2011-01-19 20:34:42 -05:00
Mans Rullgard
ef4a65149d
Replace ASMALIGN() with .p2align
...
This macro has unconditionally used .p2align for a long time and
serves no useful purpose.
2011-01-18 20:48:24 +00:00
Mans Rullgard
ac3c9d0169
x86: remove VLA in ac3_downmix_sse
2011-01-18 20:48:24 +00:00