ffmpeg

Commit Graph

Author	SHA1	Message	Date
Ronald S. Bultje	5361e10a5e	proresdsp: port x86 assembly to cpuflags.	2012-07-27 11:43:06 -07:00
Ronald S. Bultje	bde73f28af	mpegaudio: bury inline asm under HAVE_INLINE_ASM.	2012-07-26 13:43:16 -07:00
Ronald S. Bultje	30b45d9c38	x86inc: automatically insert vzeroupper for YMM functions.	2012-07-26 13:43:16 -07:00
Ronald S. Bultje	a1878a88a1	vp3: don't use calls to inline asm in yasm code. Mixing yasm and inline asm is a bad idea, since if either yasm or inline asm is not supported by your toolchain, all of the asm stops working. Thus, better to use either one or the other alone. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2012-07-25 14:24:30 -04:00
Ronald S. Bultje	79195ce565	x86/dsputil: put inline asm under HAVE_INLINE_ASM. This allows compiling with compilers that don't support gcc-style inline assembly. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2012-07-25 14:24:27 -04:00
Yang Wang	845e92fd6a	dsputil_mmx: fix incorrect assembly code In ff_put_pixels_clamped_mmx(), there are two assembly code blocks. In the first block (in the unrolled loop), the instructions "movq 8%3, %%mm1 \n\t", and so forth, have problems. From above instruction, it is clear what the programmer wants: a load from p + 8. But this assembly code doesn’t guarantee that. It only works if the compiler puts p in a register to produce an instruction like this: "movq 8(%edi), %mm1". During compiler optimization, it is possible that the compiler will be able to constant propagate into p. Suppose p = &x[10000]. Then operand 3 can become 10000(%edi), where %edi holds &x. And the instruction becomes "movq 810000(%edx)". That is, it will stride by 810000 instead of 8. This will cause a segmentation fault. This error was fixed in the second block of the assembly code, but not in the unrolled loop. How to reproduce: This error is exposed when we build using Intel C++ Compiler, with IPO+PGO optimization enabled. Crashed when decoding an MJPEG video. Signed-off-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2012-07-25 14:22:18 -04:00
Jason Garrett-Glaser	85a3c19ed1	dsputil: x86: add SHUFFLE_MASK_W macro Simplifies pshufb masks that operate on words.	2012-07-22 16:56:58 -04:00
Diego Biurrun	9f97af2688	x86: dsputil: drop some unused CPU flag debug code	2012-07-19 10:17:56 +02:00
Mans Rullgard	28f9ab7029	vp3: move idct and loop filter pointers to new vp3dsp context This moves all VP3-specific function pointers from dsputil to a new vp3dsp context. There is no reason to ever use the VP3 IDCT where an MPEG2 IDCT is expected or vice versa. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-07-18 10:32:19 +01:00
Mans Rullgard	ab9f987661	build: add CONFIG_VP3DSP, reduce repetition in OBJS lists Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-07-18 10:32:18 +01:00
Martin Storsjö	f27386cdc7	x86: h264_intrapred: Don't add the 'd' suffix to the SPLATB_REG macro The SPLATB_REG macro already adds the 'd' suffix internally. This fixes building on Win64, which has been broken since `878e66902`. This worked for unix, where r2 happened to be rdx in this case, which with the first suffix rdxd was mapped to eax, and eaxd is defined back to eax. On win64 however, r2 happened to be R8 in this case, and R8d maps to R8D just fine, but there's no mapping for R8Dd to anything. Signed-off-by: Martin Storsjö <martin@martin.st>	2012-07-06 21:07:23 +03:00
Diego Biurrun	878e669029	x86: h264_intrapred: use newly introduced SPLAT* and PSHUFLW macros	2012-07-05 17:37:11 +02:00
Loren Merritt	4d4752366f	x86inc: add SPLATB_LOAD, SPLATB_REG, PSHUFLW macros Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-07-05 17:37:11 +02:00
Diego Biurrun	d20f133ef9	x86: h264_intrapred: port to cpuflag macros	2012-07-05 17:37:10 +02:00
Martin Storsjö	07eeeb1d4f	vp8: Add ifdef guards around the sse2 loopfilter in the sse2slow branch too This was missed in the previous commit in `70a1c800`. Signed-off-by: Martin Storsjö <martin@martin.st>	2012-07-05 09:39:01 +03:00
Martin Storsjö	70a1c8000f	vp8: loopfilter >=sse2 functions need aligned stack on x86-32. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-07-04 08:25:50 -07:00
Ronald S. Bultje	723b266d72	dsputilenc: group yasm and inline asm function pointer assignment.	2012-07-04 07:46:27 -07:00
Ronald S. Bultje	ceabc13f12	dsputilenc_mmx: split assignment of ff_sse16_sse2 to SSE2 section.	2012-06-30 09:24:52 -07:00
Ronald S. Bultje	66a02159ea	x86: fmtconvert: add special asm for float_to_int16_interleave_misc_* This gets rid of a variable-length array and a for loop in C code. Signed-off-by: Martin Storsjö <martin@martin.st>	2012-06-30 19:10:36 +03:00
Mans Rullgard	f2fd167835	x86: vc1: fix and enable optimised loop filter The problem is that the ssse3 psign instruction does the wrong thing here. Commit `ea60dfe` incorrectly removed a macro emulating this instruction for pre-ssse3 code. However, the emulation is incorrect, and the code relies on the behaviour of the macro. Specifically, the psign sets destination elements to zero where the corresponding source element is zero, whereas the emulation only negates destination elements where the source is negative. Furthermore, the PSIGNW_MMX macro in x86util.asm is totally bogus, which is why the original VC-1 code had an additional right shift when using it. Since the psign instruction cannot be used here, skip all the macro hell and use the working instruction sequence directly. None of this was noticed due to a stray return statement in ff_vc1dsp_init_mmx() which meant that only the mmx version of the loop filter was ever used (before being removed in `ea60dfe`). Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-30 00:12:05 +01:00
Christophe Gisquet	a5bfa66df5	x86: fft: replace call to memcpy by a loop The function call was a mess to handle, and memcpy cannot make the assumptions we do in the new code. Tested on an IMC sample: 430c -> 370c. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-27 12:49:33 +01:00
Mans Rullgard	0595334892	x86: fft: elf64: fix PIC build In a 64-bit PIC build, external functions must be called through the PLT. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-25 22:58:18 +01:00
Mans Rullgard	8725da49a2	x86: fft: win64: fix stack alignment for memcpy() call	2012-06-25 15:10:39 +01:00
Mans Rullgard	8299260470	x86: fft: convert sse inline asm to yasm	2012-06-25 13:31:00 +01:00
Ronald S. Bultje	8123e0901f	x86: place some inline asm under #if HAVE_INLINE_ASM Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-25 13:23:12 +01:00
Mans Rullgard	0b6f973635	h264: use asm cabac reader under a generic condition This removes a dependency on implementation details from generic code and allows easy addition of the equivalent optimisation for other architectures than x86. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-23 22:14:21 +01:00
Diego Biurrun	fe07c9c6b5	x86: Only use optimizations with cmov if the CPU supports the instruction	2012-06-23 16:21:50 +02:00
Mans Rullgard	29686d6ea3	x86: remove unused inline asm macros from dsputil_mmx.h Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-23 14:14:06 +01:00
Mans Rullgard	685f5438bb	x86: move some inline asm macros to the only places they are used Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-23 14:14:06 +01:00
Diego Biurrun	a5a93fa8f5	cosmetics: do not use full path for local headers	2012-06-22 10:49:40 +02:00
Ronald S. Bultje	d9669eab0b	dwt: remove variable-length arrays Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-17 23:20:10 +01:00
Justin Ruggles	d5a7229ba4	Add a float DSP framework to libavutil Move vector_fmul() from DSPContext to AVFloatDSPContext.	2012-06-08 13:14:38 -04:00
Vitor Sessak	bac0729d9e	x86: use new schema for ASM macros Signed-off-by: Janne Grunau <janne-libav@jannau.net>	2012-05-29 14:49:45 +02:00
Justin Ruggles	713548cbad	x86: lavc: use %if HAVE_AVX guards around AVX functions in yasm code. This is needed for older versions of yasm/nasm that do not support AVX. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-05-22 20:46:02 +02:00
Kieran Kunhya	5ff01259a8	Convert vector_fmul range of functions to YASM and add AVX versions Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>	2012-05-21 17:13:05 -04:00
Michael Kostylev	6797d1948b	x86: rv40: Mark rv40_weight functions as MMX2; they use MMX2 instructions.	2012-05-15 23:54:08 +02:00
Justin Ruggles	95a98ab3f0	ac3dsp: simplify x86 versions of ac3_max_msb_abs_int16 Simplifies the code by using cpuflags and a new macro. Also fixes the invalid use of the MMX2 pshufw operation in the MMX-only function.	2012-05-15 15:23:59 -04:00
Vitor Sessak	fcc456b829	x86: use more standard construct for setting ASM functions in FFT code Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-05-14 15:38:42 +02:00
Michael Kostylev	ea60dfe284	x86: vc1: drop MMX loop filter implementation, which uses MMX2 instructions.	2012-05-12 14:02:45 +02:00
Christophe Gisquet	110d0cdc9d	rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC Code mostly inspired by vp8's MC, however: - its MMX2 horizontal filter is worse because it can't take advantage of the coefficient redundancy - that same coefficient redundancy allows better code for non-SSSE3 versions Benchmark (rounded to tens of unit): V8x8 H8x8 2D8x8 V16x16 H16x16 2D16x16 C 445 358 985 1785 1559 3280 MMX* 219 271 478 714 929 1443 SSE2 131 158 294 425 515 892 SSSE3 120 122 248 387 390 763 End result is overall around a 15% speedup for SSSE3 version (on 6 sequences); all loop filter functions now take around 55% of decoding time, while luma MC dsp functions are around 6%, chroma ones are 1.3% and biweight around 2.3%. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-05-10 18:42:43 +02:00
Ronald S. Bultje	bec207f9f9	snowdsp: explicitly state instruction size. Fixes a compile error with clang at -O0.	2012-05-02 09:57:12 -07:00
Christophe GISQUET	e75d1d4f73	dsputil x86: revert a test back to its previous value Commit `356ee8d` caused the initial inversion. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-28 11:00:51 -07:00
Christophe Gisquet	fe5ed69dc7	rv34dsp x86: implement MMX2 inverse transform 141 cycles down to 51. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-28 10:58:47 -07:00
Roland Scheidegger	9b9df1cdff	h264: new assembly version of get_cabac for x86_64 with PIC This adds a hand-optimized assembly version for get_cabac much like the existing one, but it works if the table offsets are RIP-relative. Compared to the non-RIP-relative version this adds 2 lea instructions and it needs one extra register. get_cabac() gets about 40% faster, for an overall speedup of about 5%. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-28 09:43:25 -07:00
Roland Scheidegger	14e9ffc1e4	h264: use one table instead of several for cabac functions The reason is this is easier for PIC code (in particular on darwin...). Keep the old names as pointers (static in cabac_functions.h so gcc knows these are just immediate offsets) so the c code can nicely stay the same (alternatively could use offsets directly in the functions needing the tables). This should produce the same code as before with non-pic and better code (confirmed) with pic. The assembly uses the new table but still won't work for PIC case. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-28 08:26:12 -07:00
Roland Scheidegger	444f47b55c	h264: (trivial) remove unneeded macro argument in x86/cabac.h Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-28 08:24:56 -07:00
Mans Rullgard	2bcbd98459	Remove lowres video decoding This feature is complex, of questionable utility, and slows down normal decoding. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-04-21 18:56:19 +01:00
Mans Rullgard	95510be8c3	avcodec: remove AVCodecContext.dsp_mask This removes all references to AVCodecContext.dsp_mask and marks it for eviction at the next version bump. It has been superseded by av_set_cpu_flag_mask() which, unlike this field, works everywhere. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-04-21 18:30:01 +01:00
Ronald S. Bultje	87a246341b	h264: use proper PROLOGUE statement for a function using 8 registers. Fixes crashes when using biweight on win64.	2012-04-16 08:07:21 -07:00
Ronald S. Bultje	b089ca871a	dsputil: fix optimized emu_edge function on Win64. Recent register allocation changes (x86inc.asm update) changed the register order and thus opcodes for the inner loops. One of them became >128bytes, which confuses other parts of this function where it jumps to fixed-offset positions to extend the edge by fixed amounts. A simple register change fixes this.	2012-04-13 11:28:30 -07:00
Justin Ruggles	de7f22ab0c	ac3dsp: call femms/emms at the end of float_to_fixed24() for 3DNow and SSE Fixes ac3-encode and eac3-encode FATE test failures with SSE2 disabled. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-12 21:33:04 -07:00
Ronald S. Bultje	76538d7a78	h264: fix 10bit biweight functions after recent x86inc.asm fixes. This should have been updated in the x86inc.asm update, but was accidentally forgotten.	2012-04-12 21:13:57 -07:00
Diego Biurrun	7bb3a302fe	build: Consistently handle conditional compilation for all optimization OBJS.	2012-04-12 09:00:49 +02:00
Henrik Gramner	729f90e268	x86inc improvements for 64-bit Add support for all x86-64 registers Prefer caller-saved register over callee-saved on WIN64 Support up to 15 function arguments Also (by Ronald S. Bultje) Fix up our asm to work with new x86inc.asm. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>	2012-04-11 15:47:00 -04:00
Christophe GISQUET	2130bd8f5b	rv40dsp x86: use only one register, for both increment and loop counter Around 10 cycles faster for luma. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-10 10:07:09 -07:00
Christophe GISQUET	272b252c01	rv40dsp: implement prescaled versions for biweight. Quite often, the original weights are multiple of 512. By prescaling them by 1/512 when they are computed (once per frame), no intermediate shifting is needed, and no prescaling on each call either. The x86 code already used that trick. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-10 10:06:48 -07:00
Christophe GISQUET	6b81da2fd0	dsputil x86: use SSE float instruction instead of SSE2 integer equivalent All the more required since the users are pure SSE functions. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-04 11:24:27 -07:00
Christophe GISQUET	cd88105f6f	dsputil x86: remove deprecated parameter from scalarproduct_int16 prototype Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-04 11:24:08 -07:00
Christophe GISQUET	f9888520cc	vp8dsp x86: perform rounding shift with a single instruction Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-04 11:23:36 -07:00
Ronald S. Bultje	a940198130	cabac: add overread protection to BRANCHLESS_GET_CABAC(). Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind	2012-03-28 08:01:29 -07:00
Ronald S. Bultje	448dc42571	cabac: increment jump locations by one in callers of BRANCHLESS_GET_CABAC().	2012-03-28 08:01:29 -07:00
Ronald S. Bultje	16f6e83f74	cabac: remove unused argument from BRANCHLESS_GET_CABAC_UPDATE().	2012-03-28 08:01:29 -07:00
Ronald S. Bultje	951014e5bb	cabac: use struct+offset instead of memory operand in BRANCHLESS_GET_CABAC().	2012-03-28 08:01:29 -07:00
Ronald S. Bultje	a0bdcb019e	h264: add overread protection to get_cabac_bypass_sign_x86().	2012-03-28 08:01:29 -07:00
Ronald S. Bultje	95bfa4ead7	h264: reindent get_cabac_bypass_sign_x86().	2012-03-28 08:01:29 -07:00
Ronald S. Bultje	db025929f2	h264: use struct offsets in get_cabac_bypass_sign_x86().	2012-03-28 08:01:29 -07:00
Diego Biurrun	ad0e31f134	build: prettyprinting cosmetics	2012-03-26 13:00:10 +02:00
Diego Biurrun	62ce9defb8	x86: dsputil: prettyprint gcc inline asm	2012-03-25 11:50:48 +02:00
Diego Biurrun	3b54912113	x86: K&R prettyprinting cosmetics for dsputil_mmx.c	2012-03-25 11:50:48 +02:00
Diego Biurrun	915a2a0a65	x86: conditionally compile H.264 QPEL optimizations	2012-03-25 11:50:45 +02:00
Diego Biurrun	3816642eab	dsputil_mmx: Surround QPEL macros by "do { } while (0);" blocks. This makes them safe to use in non-fully braced if-blocks and similar.	2012-03-25 11:48:37 +02:00
Ronald S. Bultje	71ea26811c	aacsbr: handle m_max values smaller than 4. Prevents a signflip in the counter, and a subsequent crash because of overreads/overwrites. Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind CC: libav-stable@libav.org	2012-03-23 12:56:08 -07:00
Ronald S. Bultje	a928ed3751	vp8: convert mbedge loopfilter x86 assembly to use named arguments.	2012-03-10 11:36:33 -08:00
Ronald S. Bultje	bee330e300	vp8: convert inner loopfilter x86 assembly to use named arguments.	2012-03-10 11:36:33 -08:00
Reimar Döffinger	6eda85e15b	sbrdsp.asm: convert all instructions to float/SSE ones. Since the values are floats, using the float operations makes sense, improves performance on some CPUs and makes the code SSE compatible instead of needing SSE2. Based on suggestion by Jason. Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-03-07 13:50:13 -08:00
Christophe GISQUET	7e1ce6a6ac	dsputil: remove shift parameter from scalarproduct_int16 There is only one caller, which does not need the shifting. Other use cases are situations where different roundings would be needed. The x86 and neon versions are modified accordingly. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-03-07 10:29:52 -08:00
Diego Biurrun	1e9d55e45e	x86: Remove duplicated AVG_3DNOW_OP / AVG_MMX2_OP macros from h264_qpel_mmx.c.	2012-03-07 09:36:04 +01:00
Reimar Döffinger	b5161908e0	SBR DSP: fix SSE code to not use SSE2 instructions. movq from SSE register _to_ memory is an SSE2 instruction. Use the SSE movlps function instead that does the same thing. Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-03-06 13:40:35 -08:00
Mans Rullgard	356ee8d7de	x86: clean up ff_dsputil_init_mmx() This splits ff_dsputil_init_mmx() into multiple functions, one for each MMX/SSE level, somewhat simplifying the nested conditions. Signed-off-by: Mans Rullgard <mans@mansr.com> Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-03-05 14:40:03 +01:00
Ronald S. Bultje	b4188f0d46	vp8: convert simple loopfilter x86 assembly to use named arguments.	2012-03-03 20:40:00 -08:00
Ronald S. Bultje	8476ca3b4e	vp8: convert idct x86 assembly to use named arguments.	2012-03-03 20:40:00 -08:00
Ronald S. Bultje	21ffc78fd7	vp8: convert mc x86 assembly to use named arguments.	2012-03-03 20:40:00 -08:00
Ronald S. Bultje	28170f1a39	vp8: convert loopfilter x86 assembly to use cpuflags().	2012-03-03 20:40:00 -08:00
Ronald S. Bultje	e25be47154	vp8: convert idct/mc x86 assembly to use cpuflags().	2012-03-03 20:39:59 -08:00
Ronald S. Bultje	291c9b6285	h264: change underread for 10bit QPEL to overread. This prevents us from reading before the start of the buffer, and thus prevents crashes resulting from this behaviour. Fixes bug 237.	2012-03-02 10:33:05 -08:00
Ronald S. Bultje	45549339bc	vp8: disable mmx functions with sse/sse2 counterparts on x86-64. x86-64 is guaranteed to have at least SSE2, therefore the MMX/MMX2 functions will never be used in practice.	2012-03-02 10:32:05 -08:00
Ronald S. Bultje	bd66f073fe	vp8: change int stride to ptrdiff_t stride. On 64bit platforms with 32bit int, this means we won't have to sign- extend the integer anymore.	2012-03-02 10:31:50 -08:00
Ronald S. Bultje	b0c4f04338	h264: fix mmxext chroma deblock to use correct TC values.	2012-02-27 09:38:44 -08:00
Christophe GISQUET	2784d18791	SBR DSP x86: implement SSE sbr_hf_g_filt Unrolling the main loop to process, instead of 4 elements: - 8: minor gain of 2 cycles (not worth the extra object size) - 2: loss of 8 cycles. Assigning STEP to a register is a loss. Output address (Y) is almost always unaligned. Timings: - C (32/64 bits): 117/109 cycles - SSE: 57 cycles Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-02-23 15:50:09 -08:00
Christophe GISQUET	34454c761f	SBR DSP x86: implement SSE sbr_sum_square_sse The 32bits targets have been compiled with -mfpmath=sse for proper reference. sbr_sum_square C /32bits: 82c (unrolled)/102c C /64bits: 69c (unrolled)/82c SSE/32bits: 42c SSE/64bits: 31c Use of SSE4.1 dpps to perform the final sum is slower. Not unrolling to perform 8 operations in a loop yields 10 more cycles. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-02-23 15:50:06 -08:00
Ronald S. Bultje	3ab9a2a557	rv34: change most "int stride" into "ptrdiff_t stride". This prevents having to sign-extend on 64-bit systems with 32-bit ints, such as x86-64. Also fixes crashes on systems where we don't do it and arguments are not in registers, such as Win64 for all weight functions.	2012-02-20 14:58:25 -08:00
Ronald S. Bultje	8fb26950ed	h264: don't use redzone in loopfilter on win64. Red zone usage is not allowed in the Win64 ABI.	2012-02-19 15:31:03 -08:00
Christophe GISQUET	f3e084909b	mpegaudio: replace memcpy by SIMD code By replacing memcpy with an unrolled loop using the alignment knowledge it has, some speedup can be obtained. Before (gcc 4.6.1): ~400 cycles After: ~370 cycles Overall, around 2% speed increase when decoding a 2400s mp3 to f32le. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-02-15 20:11:54 -08:00
Martin Storsjö	efd29844eb	mpegvideo: Add ff_ prefix to nonstatic functions Signed-off-by: Martin Storsjö <martin@martin.st>	2012-02-15 22:07:23 +02:00
Martin Storsjö	873c89e2a6	dsputil: Add ff_ prefix to inv_zigzag_direct16 Signed-off-by: Martin Storsjö <martin@martin.st>	2012-02-15 22:06:42 +02:00
Martin Storsjö	9cf0841ef3	dsputil: Add ff_ prefix to the dsputil_init functions Signed-off-by: Martin Storsjö <martin@martin.st>	2012-02-15 22:06:34 +02:00
Justin Ruggles	d483bb58c3	ac3dsp: do not use pshufb in ac3_extract_exponents_ssse3() We need to do unsigned saturation in order to cover the corner case when the absolute coefficient value is 16777215 (the maximum value). Fixes Bug #216	2012-02-09 21:04:44 -05:00
Diego Biurrun	0bba26466f	cosmetics: Delete empty lines at end of file.	2012-02-09 12:26:45 +01:00
Ronald S. Bultje	ce1e250ee9	h264: manually save/restore XMM registers for functions using INIT_MMX. On Win64, these registers are callee-save, so not saving/restoring them correctly is a violation of ABI and can lead to crashes or corrupt data.	2012-02-08 10:31:14 -08:00
Ronald S. Bultje	4ff6dea390	pngdsp: swap argument inversion.	2012-02-07 14:32:26 -08:00

607 Commits