Commit Graph

104 Commits

Author SHA1 Message Date
Diego Biurrun 490df522c7 x86: cpu: Drop unused HAVE_RWEFLAGS condition
The test for rweflags was dropped in a previous commit.
2012-11-28 00:28:09 +01:00
Justin Ruggles 947f933687 x86: float_dsp: add SSE version of vector_fmul_scalar() 2012-11-26 11:30:19 -05:00
Diego Biurrun 87af05c575 x86: SPLATD: port to cpuflags 2012-11-18 18:34:05 +01:00
Diego Biurrun 26301caaa1 x86: mmx2 ---> mmxext in asm constructs 2012-11-14 00:58:51 +01:00
Diego Biurrun 2b479bcab0 build: Drop AVX assembly ifdefs
An assembler able to cope with AVX instructions is now required.
2012-11-11 20:43:28 +01:00
Diego Biurrun f0d124f005 x86inc: Set program_name outside of x86inc.asm
This reduces the local difference to the x264 upstream version.
2012-11-11 11:06:19 +01:00
Diego Biurrun 4b60fac419 x86: PALIGNR: port to cpuflags 2012-11-09 21:31:31 +01:00
Diego Biurrun dbb37e7711 x86: PABSW: port to cpuflags 2012-11-05 14:51:10 +01:00
Diego Biurrun 0a7a94f2e5 x86: Refactor PSWAPD fallback implementations and port to cpuflags 2012-11-02 17:05:29 +01:00
Diego Biurrun 26f01bd106 x86: PMINUB: port to cpuflags 2012-11-02 15:38:15 +01:00
Diego Biurrun 61bc2bc7d4 x86util: Add cpuflags_mmxext alias for cpuflags_mmx2
"mmxext" is a more sensible name and more common in outside projects.
2012-11-02 15:22:34 +01:00
Diego Biurrun 012f73e271 x86inc: Only define program_name if the macro is unset
This allows overriding the value from outside of the file.
2012-11-02 14:38:00 +01:00
Dave Yeo 9c167914a1 x86: Fix assembly with NASM
Unlike YASM, NASM only looks for include files in the current
directory, not in the directory that included files reside in.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-10-31 10:20:35 +01:00
Diego Biurrun 588fafe7f3 x86: MMX2 ---> MMXEXT in macro names 2012-10-31 01:04:55 +01:00
Diego Biurrun 6860b4081d x86: include x86inc.asm in x86util.asm
This is necessary to allow refactoring some x86util macros with cpuflags.
2012-10-31 00:37:42 +01:00
Ronald S. Bultje 08b028c18d Remove INIT_AVX from x86inc.asm. 2012-10-29 14:51:14 -07:00
Diego Biurrun a7329e5fc2 x86: get_cpu_flags: add necessary ifdefs around function body
ff_get_cpu_flags_x86() requires cpuid(), which is conditionally defined
elsewhere in the file.  Surrounding the function body with ifdefs allows
building even when cpuid is not defined.  An empty cpuflags mask is
returned in this case.
2012-10-04 19:29:14 +02:00
Diego Biurrun f6fbce761e x86: Drop CPU detection intrinsics
Now that there is CPU detection in YASM, there will always be one of
inline or external assembly enabled, which obviates the need to fall
back on CPU detection through compiler intrinsics.
2012-10-04 19:29:14 +02:00
Diego Biurrun 1f6d86991f x86: Add YASM implementations of cpuid and xgetbv from x264
This allows detecting CPU features with builds that have neither
gcc inline assembly nor the right compiler intrinsics enabled.
2012-10-04 19:29:14 +02:00
Diego Biurrun 54b243141e x86: cpu: Break out test for cpuid capabilities into separate function 2012-10-04 18:09:21 +02:00
Diego Biurrun cc5e9e5ff0 x86: ff_get_cpu_flags_x86(): Avoid a pointless variable indirection 2012-10-04 17:58:42 +02:00
Diego Biurrun e0c6cce447 x86: Replace checks for CPU extensions and flags by convenience macros
This separates code relying on inline from that relying on external
assembly and fixes instances where the coalesced check was incorrect.
2012-09-08 18:18:34 +02:00
Justin Ruggles 7327525997 x86: float_dsp: fix ff_vector_fmac_scalar_avx() on Win64
The SWAP macro does not work for explicit xmm/ymm usage, so instead just move
the scalar value from xmm2 to xmm0.
2012-09-07 14:49:10 -04:00
Diego Biurrun f82c4fb27f x86: Add convenience macros to check for CPU extensions and flags 2012-09-04 01:44:59 +02:00
Diego Biurrun 17337f54c0 x86: Split inline and external assembly #ifdefs 2012-08-31 01:53:25 +02:00
Diego Biurrun a886b279a0 x86: cosmetics: Comment some #endifs for better readability 2012-08-30 18:50:33 +02:00
Loren Merritt 7a1944b907 vf_hqdn3d: x86 asm
13% faster on penryn, 16% on sandybridge, 15% on bulldozer
Not simd; a compiler should have generated this, but gcc didn't.
2012-08-26 10:49:14 +00:00
Justin Ruggles 6092dafb5a lavr: x86: optimized 6-channel s16 to fltp conversion 2012-08-23 20:10:57 -04:00
Mans Rullgard 5b170c0bea x86: remove FASTDIV inline asm
GCC 4.3 and later do the right thing with the plain C code.  Earlier
versions in 32-bit mode generate one extra instruction, needlessly
zeroing what would be the high half of the shifted value.  At least
two gcc configurations miscompile the inline asm in some situations.

In 64-bit mode, all gcc versions generate imul r64, r64 followed by
shr.  On Intel i7 and later, this imul is faster 32-bit mul.  On
older Intel and all AMD, it is slightly slower.  On Atom it is much
slower.

Considering where the FASTDIV macro is used, any overall negative
performance impact of this change should be negligible.  If anyone
cares, they should file a bug against gcc and get the instruction
selection fixed.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-22 14:29:10 +01:00
Martin Storsjö 33e112847d Add more missing includes after removing the implicit common.h
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-08-16 10:49:54 +03:00
Martin Storsjö 70766c2182 Add some more missing includes after removing the implicit common.h
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-08-15 23:48:48 +03:00
Mans Rullgard 070a402b60 x86: move MANGLE() and related macros to libavutil/x86/asm.h
These x86-specific macros do not belong in generic code.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-09 00:58:20 +01:00
Mans Rullgard c318626ce2 x86: rename libavutil/x86_cpu.h to libavutil/x86/asm.h
This puts x86-specific things in the x86/ subdirectory where they
belong.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-09 00:58:20 +01:00
Mans Rullgard edd8226795 x86: fix build with nasm 2.08
It appears that something goes wrong in old nasm versions when the
%+ operator is used in the last argument of a macro invocation and
this argument is tested with %ifdef within the macro.  This patch
rearranges the macro arguments such that the %+ operator is never
used in the last argument.
2012-08-07 15:24:34 +01:00
Mans Rullgard 180d43bc67 x86: use nop cpu directives only if supported
nasm does not support 'CPU foonop' directives.  This adds a configure
test for the directive and uses it only if supported.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-07 15:22:20 +01:00
Mans Rullgard 7238265052 x86: fix rNmp macros with nasm
For some reason, nasm requires this.  No harm done to yasm.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-07 15:21:58 +01:00
Mans Rullgard a3df4781f4 x86: add colons after labels
nasm prints a warning if the colon is missing.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-07 15:20:56 +01:00
Diego Biurrun 239fdf1b4a x86: build: replace mmx2 by mmxext
Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.
2012-08-03 22:51:05 +02:00
Diego Biurrun ca844b7be9 x86: Use consistent 3dnowext function and macro name suffixes
Currently there is a wild mix of 3dn2/3dnow2/3dnowext.  Switching to
"3dnowext", which is a more common name of the CPU flag, as reported
e.g. by the Linux kernel, unifies this.
2012-08-03 14:00:47 +02:00
Loren Merritt f8d8fe255d x86inc: clip num_args to 7 on x86-32.
This allows us to unconditionally set the cglobal num_args
parameter to a bigger value, thus making writing yasm code
even easier than before.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-07-28 08:29:45 -07:00
Ronald S. Bultje 96c9cc1094 x86inc: sync to latest version from x264. 2012-07-28 08:29:44 -07:00
Justin Ruggles 79687079a9 x86: add support for fmaddps fma4 instruction with abstraction to avx/sse 2012-07-27 11:25:48 -04:00
Ronald S. Bultje 30b45d9c38 x86inc: automatically insert vzeroupper for YMM functions. 2012-07-26 13:43:16 -07:00
Jason Garrett-Glaser 85a3c19ed1 dsputil: x86: add SHUFFLE_MASK_W macro
Simplifies pshufb masks that operate on words.
2012-07-22 16:56:58 -04:00
Ronald S. Bultje 358d854df8 x86/cpu: implement get/set_eflags using intrinsics
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-07-10 14:33:32 +03:00
Ronald S. Bultje c0ee695bd7 x86/cpu: implement support for cpuid through intrinsics
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-07-10 14:33:24 +03:00
Ronald S. Bultje 3f150ffba3 x86/cpu: implement support for xgetbv through intrinsics
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-07-10 14:33:17 +03:00
Ronald S. Bultje 07b287020c x86/timer: implement an intrinsic-based version for rdtsc (AV_READ_TIME). 2012-07-07 13:35:07 -07:00
Loren Merritt 4d4752366f x86inc: add SPLATB_LOAD, SPLATB_REG, PSHUFLW macros
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-07-05 17:37:11 +02:00
Loren Merritt 2cd1f5cadc x86inc: modify ALIGN to not generate long nops on i586
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-07-05 17:37:11 +02:00