ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2024-12-27 18:02:11 +00:00

Author	SHA1	Message	Date
Loren Merritt	7a1944b907	vf_hqdn3d: x86 asm 13% faster on penryn, 16% on sandybridge, 15% on bulldozer Not simd; a compiler should have generated this, but gcc didn't.	2012-08-26 10:49:14 +00:00
Justin Ruggles	6092dafb5a	lavr: x86: optimized 6-channel s16 to fltp conversion	2012-08-23 20:10:57 -04:00
Mans Rullgard	5b170c0bea	x86: remove FASTDIV inline asm GCC 4.3 and later do the right thing with the plain C code. Earlier versions in 32-bit mode generate one extra instruction, needlessly zeroing what would be the high half of the shifted value. At least two gcc configurations miscompile the inline asm in some situations. In 64-bit mode, all gcc versions generate imul r64, r64 followed by shr. On Intel i7 and later, this imul is faster than a 32-bit mul. On older Intel and all AMD, it is slightly slower. On Atom it is much slower. Considering where the FASTDIV macro is used, any overall negative performance impact of this change should be negligible. If anyone cares, they should file a bug against gcc and get the instruction selection fixed. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-22 14:29:10 +01:00
Martin Storsjö	33e112847d	Add more missing includes after removing the implicit common.h Signed-off-by: Martin Storsjö <martin@martin.st>	2012-08-16 10:49:54 +03:00
Martin Storsjö	70766c2182	Add some more missing includes after removing the implicit common.h Signed-off-by: Martin Storsjö <martin@martin.st>	2012-08-15 23:48:48 +03:00
Mans Rullgard	070a402b60	x86: move MANGLE() and related macros to libavutil/x86/asm.h These x86-specific macros do not belong in generic code. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-09 00:58:20 +01:00
Mans Rullgard	c318626ce2	x86: rename libavutil/x86_cpu.h to libavutil/x86/asm.h This puts x86-specific things in the x86/ subdirectory where they belong. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-09 00:58:20 +01:00
Mans Rullgard	edd8226795	x86: fix build with nasm 2.08 It appears that something goes wrong in old nasm versions when the %+ operator is used in the last argument of a macro invocation and this argument is tested with %ifdef within the macro. This patch rearranges the macro arguments such that the %+ operator is never used in the last argument.	2012-08-07 15:24:34 +01:00
Mans Rullgard	180d43bc67	x86: use nop cpu directives only if supported nasm does not support 'CPU foonop' directives. This adds a configure test for the directive and uses it only if supported. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-07 15:22:20 +01:00
Mans Rullgard	7238265052	x86: fix rNmp macros with nasm For some reason, nasm requires this. No harm done to yasm. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-07 15:21:58 +01:00
Mans Rullgard	a3df4781f4	x86: add colons after labels nasm prints a warning if the colon is missing. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-07 15:20:56 +01:00
Diego Biurrun	239fdf1b4a	x86: build: replace mmx2 by mmxext Refactoring mmx2/mmxext YASM code with cpuflags will force renames. So switching to a consistent naming scheme beforehand is sensible. The name "mmxext" is more official and widespread and also the name of the CPU flag, as reported e.g. by the Linux kernel.	2012-08-03 22:51:05 +02:00
Diego Biurrun	ca844b7be9	x86: Use consistent 3dnowext function and macro name suffixes Currently there is a wild mix of 3dn2/3dnow2/3dnowext. Switching to "3dnowext", which is a more common name of the CPU flag, as reported e.g. by the Linux kernel, unifies this.	2012-08-03 14:00:47 +02:00
Loren Merritt	f8d8fe255d	x86inc: clip num_args to 7 on x86-32. This allows us to unconditionally set the cglobal num_args parameter to a bigger value, thus making writing yasm code even easier than before. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-07-28 08:29:45 -07:00
Ronald S. Bultje	96c9cc1094	x86inc: sync to latest version from x264.	2012-07-28 08:29:44 -07:00
Justin Ruggles	79687079a9	x86: add support for fmaddps fma4 instruction with abstraction to avx/sse	2012-07-27 11:25:48 -04:00
Ronald S. Bultje	30b45d9c38	x86inc: automatically insert vzeroupper for YMM functions.	2012-07-26 13:43:16 -07:00
Jason Garrett-Glaser	85a3c19ed1	dsputil: x86: add SHUFFLE_MASK_W macro Simplifies pshufb masks that operate on words.	2012-07-22 16:56:58 -04:00
Ronald S. Bultje	358d854df8	x86/cpu: implement get/set_eflags using intrinsics Signed-off-by: Diego Biurrun <diego@biurrun.de> Signed-off-by: Martin Storsjö <martin@martin.st>	2012-07-10 14:33:32 +03:00
Ronald S. Bultje	c0ee695bd7	x86/cpu: implement support for cpuid through intrinsics Signed-off-by: Martin Storsjö <martin@martin.st>	2012-07-10 14:33:24 +03:00
Ronald S. Bultje	3f150ffba3	x86/cpu: implement support for xgetbv through intrinsics Signed-off-by: Martin Storsjö <martin@martin.st>	2012-07-10 14:33:17 +03:00
Ronald S. Bultje	07b287020c	x86/timer: implement an intrinsic-based version for rdtsc (AV_READ_TIME).	2012-07-07 13:35:07 -07:00
Loren Merritt	4d4752366f	x86inc: add SPLATB_LOAD, SPLATB_REG, PSHUFLW macros Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-07-05 17:37:11 +02:00
Loren Merritt	2cd1f5cadc	x86inc: modify ALIGN to not generate long nops on i586 Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-07-05 17:37:11 +02:00
Mans Rullgard	889c1ec4cc	x86: cpu: clean up check for cpuid instruction support This adds macros for accessing the EFLAGS register and uses these instead of coding the entire check in inline asm. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-07-01 12:25:33 +01:00
Mans Rullgard	963cdf39b4	x86: cpu: whitespace (mostly) cosmetics This adds whitespace around operators, aligns line continuation backslashes, and breaks long lines. Also fixes an ifdef halfway through a statement. The one line of duplication this saved is not worth the ugliness. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-25 16:24:31 +01:00
Ronald S. Bultje	8123e0901f	x86: place some inline asm under #if HAVE_INLINE_ASM Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-06-25 13:23:12 +01:00
Diego Biurrun	65345a5a30	x86: Add CPU flag for the i686 cmov instruction	2012-06-23 16:21:50 +02:00
Justin Ruggles	82b2df9790	float_dsp: add x86-optimized functions for vector_fmac_scalar()	2012-06-18 18:01:14 -04:00
Justin Ruggles	d5a7229ba4	Add a float DSP framework to libavutil Move vector_fmul() from DSPContext to AVFloatDSPContext.	2012-06-08 13:14:38 -04:00
Vitor Sessak	4a301706fd	x86: Avoid movs on BUTTERFLYPS when in AVX mode Signed-off-by: Janne Grunau <janne-libav@jannau.net>	2012-05-29 15:29:46 +02:00
Justin Ruggles	5cc6d5244d	lavr: replace the SSE version of ff_conv_fltp_to_flt_6ch() with SSE4 and AVX The current SSE version is slower than the MMX version on Athlon64 and Sandy Bridge, but the SSE4 and AVX versions are faster on Sandy Bridge.	2012-05-09 16:17:59 -04:00
Justin Ruggles	c8af852b97	Add libavresample This is a new library for audio sample format, channel layout, and sample rate conversion.	2012-04-24 21:28:27 -04:00
Loren Merritt	705f3d4759	x86inc: support AVX abstraction for 2-operand instructions Add cvtdq2ps and cvtps2dq to the AVX instruction list. Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>	2012-04-18 21:14:32 -04:00
Diego Biurrun	baaab6069a	build: Move all arch OBJS declarations into arch subdirectory Makefiles.	2012-04-12 21:30:13 +02:00
Henrik Gramner	729f90e268	x86inc improvements for 64-bit Add support for all x86-64 registers Prefer caller-saved register over callee-saved on WIN64 Support up to 15 function arguments Also (by Ronald S. Bultje) Fix up our asm to work with new x86inc.asm. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>	2012-04-11 15:47:00 -04:00
Ronald S. Bultje	98b9da2ac7	x86inc: add *mp named argument support to DEFINE_ARGS.	2012-03-14 20:09:53 -07:00
Loren Merritt	0f53d0cf4b	x86inc: don't "bake" stack_offset in named arguments. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-03-03 20:39:59 -08:00
Haruhiko Yamagata	166f399377	x86inc: support yasm -f win64 flag also. This sets __OUTPUT_FORMAT__ to win64 instead of win32, even though both (through -m amd64) produce 64-bit binary code. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-02-08 10:31:14 -08:00
Henrik Gramner	9cf7385309	x86inc: allow manual use of WIN64_SPILL_XMM. Functions using INIT_MMX may still access XMM registers through direct means (xmm0-15). Therefore, they still need to be marked for clobber so they can be properly saved/restored. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-02-08 10:31:14 -08:00
Ronald S. Bultje	7e4d9d5d45	win64: add a XMM clobber test configure option. This will be useful to test more aggressively for failures to mark XMM registers as clobbered in Win64 builds, and prevent regressions thereof. Based on a patch by Ramiro Polla <ramiro.polla@gmail.com>	2012-02-02 12:00:48 -08:00
Ronald S. Bultje	412b248edb	x86inc.asm: fix typo. Assemblers don't understand ! in %if statements.	2012-01-27 16:33:03 +08:00
Ronald S. Bultje	3b15a6d742	config.asm: change %ifdef directives to %if directives. This allows combining multiple conditionals in a single statement.	2012-01-27 10:19:57 +08:00
Vitor Sessak	39df0c434c	mpegaudiodec: optimized iMDCT transform Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-01-08 17:40:55 -08:00
Mans Rullgard	5b0d35eaed	x86: bswap: remove test for bswap instruction Firstly, this test never worked as intended, always reporting success. Secondly, bswap is available from 486 onward and can thus be assumed present. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-12-12 12:14:15 +00:00
Mans Rullgard	f64c2e710f	bswap: make generic implementation more compiler-friendly With these changes, gcc 4.5 and later recognise it as a bswap and use the proper instructions on ARM and x86. On x86, the 16-bit bswap is recognised from gcc 4.1. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-12-12 12:14:14 +00:00
Sean McGovern	be0675ce67	x86 cpuid: set vendor union members separately Solaris Studio (suncc) has difficulty with filling in members of a union. Instead, let's retrieve and store the cpuid() results separately. This is still a compiler bug, however this fix does not cause a regression on other platforms. Signed-off-by: Janne Grunau <janne-libav@jannau.net>	2011-12-08 00:57:11 +01:00
Vitor Sessak	6b6ee58249	x86inc: Flag shufps as a floating-point instruction for the AVX emulation code. Without this, code like "shufps m0, m1, m2, 0xaa" would not work on CPUs not supporting SSE2. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-11-27 13:10:33 -08:00
Justin Ruggles	f2bd8a0786	x86inc: use sse versions of common macros instead of sse2 when applicable	2011-11-06 19:14:13 -05:00
Loren Merritt	2f7f2e4b41	Update x86inc.asm to latest x264 version, and add AVX symmetry. We keep INIT_AVX (for backwards compatibility). 3arg AVX ops with a memory arg can only have it in src2, whereas SSE emulation of 3arg prefers to have it in src1 (i.e. the mov). So, if the op is symmetric and the wrong one is memory, swap them.	2011-11-05 20:48:14 -07:00
Justin Ruggles	4e8e262476	fmtconvert: port int32_to_float_fmul_scalar() x86 inline asm to yasm	2011-10-21 10:13:05 -04:00
Jason Garrett-Glaser	96a59cf37b	x86: XOP/FMA4 CPU detection support	2011-09-26 15:30:31 -07:00
Sean McGovern	5938e02185	cpu detection: avoid a signed overflow 1<<31 overflows because 1 is signed, so force it to unsigned. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-09-03 08:31:50 -07:00
Ronald S. Bultje	38e06c2969	Move clipd macros to x86util.asm. This allows sharing them between multiple .asm files.	2011-08-17 20:56:06 -07:00
Ronald S. Bultje	b2c087871d	Move x86util.asm from libavcodec/ to libavutil/. This allows using it in swscale also.	2011-08-12 11:43:03 -07:00
Ronald S. Bultje	3a39195b1d	Move x86inc.asm to libavutil/. This allows using it in libswscale/ also.	2011-08-12 11:43:02 -07:00
Jason Garrett-Glaser	15919ee48f	bswap: use native types for av_bswap16(). This prevents a call to bytestream_get_be16() using a movzwl both before and after the ror instruction, which is obviously inefficient. Arm uses the same trick also. Sintel decoding goes from (avg+SD) 9.856 +/- 0.003 to 9.797 +/- 0.003 sec. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-04-22 20:05:48 -04:00
Justin Ruggles	45ed822550	cosmetics: indentation	2011-03-22 09:11:07 -04:00
Justin Ruggles	eba586b0d9	Add a CPU flag for the Atom processor. The Atom has SSSE3 support, which is useful in many cases, but sometimes the SSSE3 version is slower than the SSE2 equivalent on the Atom, but is generally faster on other processors supporting SSSE3. This flag allows for selectively disabling certain SSSE3 functions on the Atom.	2011-03-22 09:11:07 -04:00
Mans Rullgard	2912e87a6c	Replace FFmpeg with Libav in licence headers Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-03-19 13:33:20 +00:00
Mans Rullgard	ef66953875	x86: use raw opcode for xgetbv instruction This allows the CPU detection to work with assemblers not supporting the xgetbv mnemonic. These include clang and some BSD versions. All AVX code will be written for yasm, where the main assembler is not involved. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-02-20 17:31:23 +00:00
Mans Rullgard	87f1355f9b	x86: check for AVX support This adds configure and runtime checks for AVX support on x86 CPUs. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-02-20 13:20:42 +00:00
Justin Ruggles	74b1f96859	Add check for Athlon64 and similar AMD processors with slow SSE2. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-02-11 16:58:18 -05:00
Janne Grunau	2c3589bfda	consolidate .gitignore patterns into a single file Signed-off-by: Janne Grunau <janne-ffmpeg@jannau.net>	2011-01-18 21:32:05 +01:00
Janne Grunau	348b8218f7	convert svn:ignore properties to .gitignore files Signed-off-by: Janne Grunau <janne-ffmpeg@jannau.net>	2011-01-17 15:50:14 +01:00
Måns Rullgård	65d45cea34	Add missing #include <string.h> in x86/cpu.c Originally committed as revision 25088 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-09 19:40:59 +00:00
Måns Rullgård	9275438a19	Clean up av_get_cpu_flag() Instead of defining functions in per-arch header files included by the main cpu.c, define them normally and call them from the generic one. Originally committed as revision 25084 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-09 18:51:45 +00:00
Stefano Sabatini	c6c98d0897	Move mm_support() from libavcodec to libavutil, make it a public function and rename it to av_get_cpu_flags(). Originally committed as revision 25076 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-08 15:07:14 +00:00
Måns Rullgård	8fc0162ac4	Add av_ prefix to bswap macros Originally committed as revision 24170 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-10 22:12:30 +00:00
Diego Biurrun	ba87f0801d	Remove explicit filename from Doxygen @file commands. Passing an explicit filename to this command is only necessary if the documentation in the @file block refers to a file different from the one the block resides in. Originally committed as revision 22921 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-04-20 14:45:34 +00:00
Måns Rullgård	2ed6f39944	Replace many includes of libavutil/common.h with what is actually needed This reduces the number of false dependencies on header files and speeds up compilation. Originally committed as revision 22407 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-03-09 17:39:19 +00:00
Måns Rullgård	9c9a0840d0	Add lots of missing includes Originally committed as revision 22337 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-03-08 18:43:52 +00:00
Måns Rullgård	75fb5c24ed	Move FASTDIV macro to intmath.h Originally committed as revision 21335 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-01-19 23:25:36 +00:00
Alexander Strange	f6d0390657	Add macros for 64- and 128-bit write-combining optimization to intreadwrite.h. Add x86 implementation using MMX/SSE. Originally committed as revision 21281 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-01-18 10:24:33 +00:00
Måns Rullgård	439ccc4e0e	Split libavutil/timer.h per architecture Originally committed as revision 18304 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-04-01 22:56:22 +00:00
Diego Biurrun	bad5537e2c	Use full internal pathname in doxygen @file directives. Otherwise doxygen complains about ambiguous filenames when files exist under the same name in different subdirectories. Originally committed as revision 16912 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-02-01 02:00:19 +00:00
Aurelien Jacobs	b250f9c66d	Change semantic of CONFIG_, HAVE_ and ARCH_*. They are now always defined to either 0 or 1. Originally committed as revision 16590 to svn://svn.ffmpeg.org/ffmpeg/trunk	2009-01-13 23:44:16 +00:00
Måns Rullgård	3a90480ac4	split bswap.h into per-arch files Originally committed as revision 15663 to svn://svn.ffmpeg.org/ffmpeg/trunk	2008-10-21 22:29:57 +00:00
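
Two of the commits above describe small C idioms that are easier to follow with an example: forcing a bit-31 constant to unsigned so that `1 << 31` does not overflow a signed int (5938e02185), and writing a generic byte swap in a plain shift-and-mask form that a compiler may recognise as a bswap (f64c2e710f). The following is a minimal sketch of both idioms, not the code from those commits. The names `EXAMPLE_FLAG_BIT31`, `example_bswap32` and `example_bswap16` are hypothetical, and whether a given compiler actually emits a bswap or rotate instruction for this pattern depends on its version and flags.

```c
#include <stdint.h>

/* Hypothetical flag constant, for illustration only.
 * Writing 1 << 31 would shift into the sign bit of a 32-bit signed int,
 * which is undefined behaviour; 1U keeps the shift in unsigned arithmetic. */
#define EXAMPLE_FLAG_BIT31 (1U << 31)

/* Generic 32-bit byte swap in shift-and-mask form (hypothetical helper).
 * Each term moves one byte to its mirrored position. */
static inline uint32_t example_bswap32(uint32_t x)
{
    return ((x << 24) & 0xff000000u) |
           ((x <<  8) & 0x00ff0000u) |
           ((x >>  8) & 0x0000ff00u) |
           ((x >> 24) & 0x000000ffu);
}

/* Generic 16-bit byte swap (hypothetical helper).
 * The operand is promoted to int, so the result is cast back to 16 bits. */
static inline uint16_t example_bswap16(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}
```

For instance, `example_bswap32(0x11223344u)` evaluates to `0x44332211`, and `example_bswap16(0x1234)` evaluates to `0x3412`.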


128 Commits