Commit Graph

133 Commits

Author SHA1 Message Date
Diego Biurrun e0c6cce447 x86: Replace checks for CPU extensions and flags by convenience macros
This separates code relying on inline from that relying on external
assembly and fixes instances where the coalesced check was incorrect.
2012-09-08 18:18:34 +02:00
Justin Ruggles 7327525997 x86: float_dsp: fix ff_vector_fmac_scalar_avx() on Win64
The SWAP macro does not work for explicit xmm/ymm usage, so instead just move
the scalar value from xmm2 to xmm0.
2012-09-07 14:49:10 -04:00
Diego Biurrun f82c4fb27f x86: Add convenience macros to check for CPU extensions and flags 2012-09-04 01:44:59 +02:00
Diego Biurrun 17337f54c0 x86: Split inline and external assembly #ifdefs 2012-08-31 01:53:25 +02:00
Diego Biurrun a886b279a0 x86: cosmetics: Comment some #endifs for better readability 2012-08-30 18:50:33 +02:00
Loren Merritt 7a1944b907 vf_hqdn3d: x86 asm
13% faster on penryn, 16% on sandybridge, 15% on bulldozer
Not simd; a compiler should have generated this, but gcc didn't.
2012-08-26 10:49:14 +00:00
Justin Ruggles 6092dafb5a lavr: x86: optimized 6-channel s16 to fltp conversion 2012-08-23 20:10:57 -04:00
Mans Rullgard 5b170c0bea x86: remove FASTDIV inline asm
GCC 4.3 and later do the right thing with the plain C code.  Earlier
versions in 32-bit mode generate one extra instruction, needlessly
zeroing what would be the high half of the shifted value.  At least
two gcc configurations miscompile the inline asm in some situations.

In 64-bit mode, all gcc versions generate imul r64, r64 followed by
shr.  On Intel i7 and later, this imul is faster 32-bit mul.  On
older Intel and all AMD, it is slightly slower.  On Atom it is much
slower.

Considering where the FASTDIV macro is used, any overall negative
performance impact of this change should be negligible.  If anyone
cares, they should file a bug against gcc and get the instruction
selection fixed.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-22 14:29:10 +01:00
Martin Storsjö 33e112847d Add more missing includes after removing the implicit common.h
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-08-16 10:49:54 +03:00
Martin Storsjö 70766c2182 Add some more missing includes after removing the implicit common.h
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-08-15 23:48:48 +03:00
Mans Rullgard 070a402b60 x86: move MANGLE() and related macros to libavutil/x86/asm.h
These x86-specific macros do not belong in generic code.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-09 00:58:20 +01:00
Mans Rullgard c318626ce2 x86: rename libavutil/x86_cpu.h to libavutil/x86/asm.h
This puts x86-specific things in the x86/ subdirectory where they
belong.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-09 00:58:20 +01:00
Mans Rullgard edd8226795 x86: fix build with nasm 2.08
It appears that something goes wrong in old nasm versions when the
%+ operator is used in the last argument of a macro invocation and
this argument is tested with %ifdef within the macro.  This patch
rearranges the macro arguments such that the %+ operator is never
used in the last argument.
2012-08-07 15:24:34 +01:00
Mans Rullgard 180d43bc67 x86: use nop cpu directives only if supported
nasm does not support 'CPU foonop' directives.  This adds a configure
test for the directive and uses it only if supported.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-07 15:22:20 +01:00
Mans Rullgard 7238265052 x86: fix rNmp macros with nasm
For some reason, nasm requires this.  No harm done to yasm.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-07 15:21:58 +01:00
Mans Rullgard a3df4781f4 x86: add colons after labels
nasm prints a warning if the colon is missing.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-07 15:20:56 +01:00
Diego Biurrun 239fdf1b4a x86: build: replace mmx2 by mmxext
Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.
2012-08-03 22:51:05 +02:00
Diego Biurrun ca844b7be9 x86: Use consistent 3dnowext function and macro name suffixes
Currently there is a wild mix of 3dn2/3dnow2/3dnowext.  Switching to
"3dnowext", which is a more common name of the CPU flag, as reported
e.g. by the Linux kernel, unifies this.
2012-08-03 14:00:47 +02:00
Loren Merritt f8d8fe255d x86inc: clip num_args to 7 on x86-32.
This allows us to unconditionally set the cglobal num_args
parameter to a bigger value, thus making writing yasm code
even easier than before.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-07-28 08:29:45 -07:00
Ronald S. Bultje 96c9cc1094 x86inc: sync to latest version from x264. 2012-07-28 08:29:44 -07:00
Justin Ruggles 79687079a9 x86: add support for fmaddps fma4 instruction with abstraction to avx/sse 2012-07-27 11:25:48 -04:00
Ronald S. Bultje 30b45d9c38 x86inc: automatically insert vzeroupper for YMM functions. 2012-07-26 13:43:16 -07:00
Jason Garrett-Glaser 85a3c19ed1 dsputil: x86: add SHUFFLE_MASK_W macro
Simplifies pshufb masks that operate on words.
2012-07-22 16:56:58 -04:00
Ronald S. Bultje 358d854df8 x86/cpu: implement get/set_eflags using intrinsics
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-07-10 14:33:32 +03:00
Ronald S. Bultje c0ee695bd7 x86/cpu: implement support for cpuid through intrinsics
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-07-10 14:33:24 +03:00
Ronald S. Bultje 3f150ffba3 x86/cpu: implement support for xgetbv through intrinsics
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-07-10 14:33:17 +03:00
Ronald S. Bultje 07b287020c x86/timer: implement an intrinsic-based version for rdtsc (AV_READ_TIME). 2012-07-07 13:35:07 -07:00
Loren Merritt 4d4752366f x86inc: add SPLATB_LOAD, SPLATB_REG, PSHUFLW macros
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-07-05 17:37:11 +02:00
Loren Merritt 2cd1f5cadc x86inc: modify ALIGN to not generate long nops on i586
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-07-05 17:37:11 +02:00
Mans Rullgard 889c1ec4cc x86: cpu: clean up check for cpuid instruction support
This adds macros for accessing the EFLAGS register and uses
these instead of coding the entire check in inline asm.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-07-01 12:25:33 +01:00
Mans Rullgard 963cdf39b4 x86: cpu: whitespace (mostly) cosmetics
This adds whitespace around operators, aligns line continuation
backslashes, and breaks long lines.  Also fixes an ifdef halfway
through a statement.  The one line of duplication this saved is
not worth the ugliness.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-06-25 16:24:31 +01:00
Ronald S. Bultje 8123e0901f x86: place some inline asm under #if HAVE_INLINE_ASM
Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-06-25 13:23:12 +01:00
Diego Biurrun 65345a5a30 x86: Add CPU flag for the i686 cmov instruction 2012-06-23 16:21:50 +02:00
Justin Ruggles 82b2df9790 float_dsp: add x86-optimized functions for vector_fmac_scalar() 2012-06-18 18:01:14 -04:00
Justin Ruggles d5a7229ba4 Add a float DSP framework to libavutil
Move vector_fmul() from DSPContext to AVFloatDSPContext.
2012-06-08 13:14:38 -04:00
Vitor Sessak 4a301706fd x86: Avoid movs on BUTTERFLYPS when in AVX mode
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2012-05-29 15:29:46 +02:00
Justin Ruggles 5cc6d5244d lavr: replace the SSE version of ff_conv_fltp_to_flt_6ch() with SSE4 and AVX
The current SSE version is slower than the MMX version on Athlon64 and Sandy
Bridge, but the SSE4 and AVX versions are faster on Sandy Bridge.
2012-05-09 16:17:59 -04:00
Justin Ruggles c8af852b97 Add libavresample
This is a new library for audio sample format, channel layout, and sample rate
conversion.
2012-04-24 21:28:27 -04:00
Loren Merritt 705f3d4759 x86inc: support AVX abstraction for 2-operand instructions
Add cvtdq2ps and cvtps2dq to the AVX instruction list.

Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
2012-04-18 21:14:32 -04:00
Diego Biurrun baaab6069a build: Move all arch OBJS declarations into arch subdirectory Makefiles. 2012-04-12 21:30:13 +02:00
Henrik Gramner 729f90e268 x86inc improvements for 64-bit
Add support for all x86-64 registers
Prefer caller-saved register over callee-saved on WIN64
Support up to 15 function arguments

Also (by Ronald S. Bultje)
Fix up our asm to work with new x86inc.asm.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
2012-04-11 15:47:00 -04:00
Ronald S. Bultje 98b9da2ac7 x86inc: add *mp named argument support to DEFINE_ARGS. 2012-03-14 20:09:53 -07:00
Loren Merritt 0f53d0cf4b x86inc: don't "bake" stack_offset in named arguments.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-03-03 20:39:59 -08:00
Haruhiko Yamagata 166f399377 x86inc: support yasm -f win64 flag also.
This sets __OUTPUT_FORMAT__ to win64 instead of win32, even though both
(through -m amd64) produce 64-bit binary code.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-02-08 10:31:14 -08:00
Henrik Gramner 9cf7385309 x86inc: allow manual use of WIN64_SPILL_XMM.
Functions using INIT_MMX may still access XMM registers through direct
means (xmm0-15). Therefore, they still need to be marked for clobber
so they can be properly saved/restored.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-02-08 10:31:14 -08:00
Ronald S. Bultje 7e4d9d5d45 win64: add a XMM clobber test configure option.
This will be useful to test more aggressively for failures to mark XMM
registers as clobbered in Win64 builds, and prevent regressions thereof.

Based on a patch by Ramiro Polla <ramiro.polla@gmail.com>
2012-02-02 12:00:48 -08:00
Ronald S. Bultje 412b248edb x86inc.asm: fix typo.
Assemblers don't understand ! in %if statements.
2012-01-27 16:33:03 +08:00
Ronald S. Bultje 3b15a6d742 config.asm: change %ifdef directives to %if directives.
This allows combining multiple conditionals in a single statement.
2012-01-27 10:19:57 +08:00
Vitor Sessak 39df0c434c mpegaudiodec: optimized iMDCT transform
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-01-08 17:40:55 -08:00
Mans Rullgard 5b0d35eaed x86: bswap: remove test for bswap instruction
Firstly, this test never worked as intended, always reporting
success.  Secondly, bswap is available from 486 onward and can
thus be assumed present.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-12-12 12:14:15 +00:00
Mans Rullgard f64c2e710f bswap: make generic implementation more compiler-friendly
With these changes, gcc 4.5 and later recognise it as a bswap
and use the proper instructions on ARM and x86.  On x86, the
16-bit bswap is recognised from gcc 4.1.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-12-12 12:14:14 +00:00
Sean McGovern be0675ce67 x86 cpuid: set vendor union members separately
Solaris Studio (suncc) has difficulty with filling in
members of a union. Instead, let's retrieve and store the
cpuid() results separately. This is still a compiler bug,
however this fix does not cause a regression on other platforms.

Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2011-12-08 00:57:11 +01:00
Vitor Sessak 6b6ee58249 x86inc: Flag shufps as an floating-point instruction for the AVX emulation code.
Without this, code like "shufps m0, m1, m2, 0xaa" would not work in CPUs
not supporting SSE2.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-11-27 13:10:33 -08:00
Justin Ruggles f2bd8a0786 x86inc: use sse versions of common macros instead of sse2 when applicable 2011-11-06 19:14:13 -05:00
Loren Merritt 2f7f2e4b41 Update x86inc.asm to latest x264 version, and add AVX symmetry.
We keep INIT_AVX (for backwards compatibility). 3arg AVX ops with
a memory arg can only have it in src2, whereas SSE emulation of
3arg prefers to have it in src1 (i.e. the mov). So, if the op is
symmetric and the wrong one is memory, swap them.
2011-11-05 20:48:14 -07:00
Justin Ruggles 4e8e262476 fmtconvert: port int32_to_float_fmul_scalar() x86 inline asm to yasm 2011-10-21 10:13:05 -04:00
Jason Garrett-Glaser 96a59cf37b x86: XOP/FMA4 CPU detection support 2011-09-26 15:30:31 -07:00
Sean McGovern 5938e02185 cpu detection: avoid a signed overflow
1<<31 overflows because 1 is signed, so force it to unsigned.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-09-03 08:31:50 -07:00
Ronald S. Bultje 38e06c2969 Move clipd macros to x86util.asm.
This allows sharing them between multiple .asm files.
2011-08-17 20:56:06 -07:00
Ronald S. Bultje b2c087871d Move x86util.asm from libavcodec/ to libavutil/.
This allows using it in swscale also.
2011-08-12 11:43:03 -07:00
Ronald S. Bultje 3a39195b1d Move x86inc.asm to libavutil/.
This allows using it in libswscale/ also.
2011-08-12 11:43:02 -07:00
Jason Garrett-Glaser 15919ee48f bswap: use native types for av_bwap16().
This prevents a call to bytestream_get_be16() using a movzwl both before
and after the ror instruction, which is obviously inefficient. Arm uses
the same trick also.

Sintel decoding goes from (avg+SD) 9.856 +/- 0.003 to 9.797 +/- 0.003 sec.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-04-22 20:05:48 -04:00
Justin Ruggles 45ed822550 cosmetics: indentation 2011-03-22 09:11:07 -04:00
Justin Ruggles eba586b0d9 Add a CPU flag for the Atom processor.
The Atom has SSSE3 support, which is useful in many cases, but sometimes the
SSSE3 version is slower than the SSE2 equivalent on the Atom, but is generally
faster on other processors supporting SSSE3. This flag allows for selectively
disabling certain SSSE3 functions on the Atom.
2011-03-22 09:11:07 -04:00
Mans Rullgard 2912e87a6c Replace FFmpeg with Libav in licence headers
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-03-19 13:33:20 +00:00
Mans Rullgard ef66953875 x86: use raw opcode for xgetbv instruction
This allows the CPU detection to work with assemblers not supporting
the xgetbv mnemonic.  These include clang and some BSD versions.

All AVX code will be written for yasm, where the main assembler
is not involved.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-02-20 17:31:23 +00:00
Mans Rullgard 87f1355f9b x86: check for AVX support
This adds configure and runtime checks for AVX support on x86 CPUs.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-02-20 13:20:42 +00:00
Justin Ruggles 74b1f96859 Add check for Athlon64 and similar AMD processors with slow SSE2.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-02-11 16:58:18 -05:00
Janne Grunau 2c3589bfda consolidate .gitignore patters into a single file
Signed-off-by: Janne Grunau <janne-ffmpeg@jannau.net>
2011-01-18 21:32:05 +01:00
Janne Grunau 348b8218f7 convert svn:ignore properties to .gitignore files
Signed-off-by: Janne Grunau <janne-ffmpeg@jannau.net>
2011-01-17 15:50:14 +01:00
Måns Rullgård 65d45cea34 Add missing #include <string.h> in x86/cpu.c
Originally committed as revision 25088 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-09 19:40:59 +00:00
Måns Rullgård 9275438a19 Clean up av_get_cpu_flag()
Instead of defining functions in per-arch header files included
by the main cpu.c, define them normally and call them from the
generic one.

Originally committed as revision 25084 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-09 18:51:45 +00:00
Stefano Sabatini c6c98d0897 Move mm_support() from libavcodec to libavutil, make it a public
function and rename it to av_get_cpu_flags().

Originally committed as revision 25076 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-08 15:07:14 +00:00
Måns Rullgård 8fc0162ac4 Add av_ prefix to bswap macros
Originally committed as revision 24170 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-10 22:12:30 +00:00
Diego Biurrun ba87f0801d Remove explicit filename from Doxygen @file commands.
Passing an explicit filename to this command is only necessary if the
documentation in the @file block refers to a file different from the
one the block resides in.

Originally committed as revision 22921 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-04-20 14:45:34 +00:00
Måns Rullgård 2ed6f39944 Replace many includes of libavutil/common.h with what is actually needed
This reduces the number of false dependencies on header files and
speeds up compilation.

Originally committed as revision 22407 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-03-09 17:39:19 +00:00
Måns Rullgård 9c9a0840d0 Add lots of missing includes
Originally committed as revision 22337 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-03-08 18:43:52 +00:00
Måns Rullgård 75fb5c24ed Move FASTDIV macro to intmath.h
Originally committed as revision 21335 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-19 23:25:36 +00:00
Alexander Strange f6d0390657 Add macros for 64- and 128-bit write-combining optimization to intreadwrite.h.
Add x86 implementation using MMX/SSE.

Originally committed as revision 21281 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-18 10:24:33 +00:00
Måns Rullgård 439ccc4e0e Split libavutil/timer.h per architecture
Originally committed as revision 18304 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-04-01 22:56:22 +00:00
Diego Biurrun bad5537e2c Use full internal pathname in doxygen @file directives.
Otherwise doxygen complains about ambiguous filenames when files exist
under the same name in different subdirectories.

Originally committed as revision 16912 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-02-01 02:00:19 +00:00
Aurelien Jacobs b250f9c66d Change semantic of CONFIG_*, HAVE_* and ARCH_*.
They are now always defined to either 0 or 1.

Originally committed as revision 16590 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-01-13 23:44:16 +00:00
Måns Rullgård 3a90480ac4 split bswap.h into per-arch files
Originally committed as revision 15663 to svn://svn.ffmpeg.org/ffmpeg/trunk
2008-10-21 22:29:57 +00:00