Commit Graph

500 Commits

Author SHA1 Message Date
James Almer 35347e7e9b Merge commit '4cf84e254ae75b524e1cacae499a97d7cc9e5906'
* commit '4cf84e254ae75b524e1cacae499a97d7cc9e5906':
  Drop some unnecessary config.h #includes

Merged-by: James Almer <jamrial@gmail.com>
2018-02-11 23:08:48 -03:00
Diego Biurrun 4cf84e254a Drop some unnecessary config.h #includes 2018-02-06 10:03:15 +01:00
Henrik Gramner 6f62b0bd4f x86inc: Drop cpuflags_slowctz 2018-01-20 19:23:37 +01:00
Henrik Gramner eb5f063e7c x86inc: Correctly set mmreg variables 2018-01-20 19:23:37 +01:00
Henrik Gramner 6b6edd1216 x86inc: Support creating global symbols from local labels
On ELF platforms such symbols needs to be flagged as functions with the
correct visibility to please certain linkers in some scenarios.
2018-01-20 19:23:37 +01:00
Henrik Gramner 9e4b3675f2 x86inc: Use .rdata instead of .rodata on Windows
The standard section for read-only data on Windows is .rdata. Nasm will
flag non-standard sections as executable by default which isn't ideal.
2018-01-20 19:23:37 +01:00
Henrik Gramner 3a02cbe3fa x86inc: Enable AVX emulation for floating-point pseudo-instructions
There are 32 pseudo-instructions for each floating-point comparison
instruction, but only 8 of them are actually valid in legacy-encoded mode.
The remaining 24 requires the use of VEX-encoded (v-prefixed) instructions
and can therefore be disregarded for this purpose.
2018-01-20 19:23:37 +01:00
James Almer 90d216cb90 x86inc: set the correct amount of simd regs in x86_64 when avx512 is enabled but not used
Fixes compilation of libavresample/x86/audio_mix.asm

Reviewed-by: Gramner
Signed-off-by: James Almer <jamrial@gmail.com>
2017-12-24 23:02:54 -03:00
Henrik Gramner f7197f68dc x86inc: AVX-512 support
AVX-512 consists of a plethora of different extensions, but in order to keep
things a bit more manageable we group together the following extensions
under a single baseline cpu flag which should cover SKL-X and future CPUs:
 * AVX-512 Foundation (F)
 * AVX-512 Conflict Detection Instructions (CD)
 * AVX-512 Byte and Word Instructions (BW)
 * AVX-512 Doubleword and Quadword Instructions (DQ)
 * AVX-512 Vector Length Extensions (VL)

On x86-64 AVX-512 provides 16 additional vector registers, prefer using
those over existing ones since it allows us to avoid using `vzeroupper`
unless more than 16 vector registers are required. They also happen to
be volatile on Windows which means that we don't need to save and restore
existing xmm register contents unless more than 22 vector registers are
required.

Big thanks to Intel for their support.
2017-12-24 22:02:41 +01:00
James Darnley e2218ed8ce avutil: add alignment needed for AVX-512 2017-12-24 22:02:41 +01:00
James Darnley 4783a01c11 avutil: detect when AVX-512 is available 2017-12-24 22:02:41 +01:00
James Darnley 8b81eabe57 avutil: add AVX-512 flags 2017-12-24 22:02:41 +01:00
Martin Vignali b37196adff avutil/x86util : add macro for loading a 128 bits constants in an xmm or in each part of an ymm in order to simplify avx2 asm func 2017-12-02 18:25:15 +01:00
Dale Curtis 50e30d9bb7 Don't use _tzcnt instrinics with clang for windows w/o BMI.
Technically _tzcnt* intrinsics are only available when the BMI
instruction set is present. However the instruction encoding
degrades to "rep bsf" on older processors.

Clang for Windows debatably restricts the _tzcnt* instrinics behind
the __BMI__ architecture define, so check for its presence or
exclude the usage of these intrinics when clang is present.

See also:
https://ffmpeg.org/pipermail/ffmpeg-devel/2015-November/183404.html
https://bugs.llvm.org/show_bug.cgi?id=30506
http://lists.llvm.org/pipermail/cfe-dev/2016-October/051034.html

Signed-off-by: Dale Curtis <dalecurtis@chromium.org>
Reviewed-by: Matt Oliver <protogonoi@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-10-25 21:50:37 +02:00
James Almer 2904db9045 Merge commit '994c4bc10751e39c7ed9f67ffd0c0dea5223daf2'
* commit '994c4bc10751e39c7ed9f67ffd0c0dea5223daf2':
  x86util: Port all macros to cpuflags

See d5f8a642f6

Merged-by: James Almer <jamrial@gmail.com>
2017-10-21 12:15:57 -03:00
James Almer 3d828c9fd5 cpu: split flag checks per arch in av_cpu_max_align()
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2017-10-09 11:48:24 +02:00
James Almer 3b345d389b avutil/cpu: split flag checks per arch in av_cpu_max_align()
Signed-off-by: James Almer <jamrial@gmail.com>
2017-09-27 23:10:09 -03:00
James Almer 0c005fa86f Merge commit '7abdd026df6a9a52d07d8174505b33cc89db7bf6'
* commit '7abdd026df6a9a52d07d8174505b33cc89db7bf6':
  asm: Consistently uppercase SECTION markers

Merged-by: James Almer <jamrial@gmail.com>
2017-09-26 18:48:06 -03:00
Ivan Kalvachev 30ae07d7ef Add macros to x86util.asm .
Improved version of VBROADCASTSS that works like the avx2 instruction.
Emulation of vpbroadcastd.
Horizontal sum HSUMPS that places the result in all elements.
Emulation of blendvps and pblendvb.

Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>
2017-08-18 17:18:32 +01:00
James Almer 4d62ee6746 x86inc: don't use read-only data sections on COFF targets
Yasm:
src/libavfilter/x86/af_volume.asm:24: warning: Standard COFF does not support read-only data sections
src/libavfilter/x86/af_volume.asm:24: warning: Unrecognized qualifier `align'

Nasm:
src/libavfilter/x86/af_volume.asm:24: error: standard COFF does not support section alignment specification
src/libavutil/x86/x86inc.asm:92: ... from macro `SECTION_RODATA' defined here

Tested-by: Clément Bœsch <u@pkh.me>
Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-27 12:48:04 -03:00
Diego Biurrun fd502f4f5f build: Generalize yasm/nasm-related variable names
None of them are specific to the YASM assembler.

(Cherry-picked from libav commit 39e208f4d4)

Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-21 17:00:29 -03:00
James Almer e229df9478 x86/aacpsdsp: add ff_ps_hybrid_synthesis_deint_{sse,sse4}
About 2x faster than the c version.
2017-06-18 22:33:27 -03:00
Henrik Gramner aad1b6786e x86inc: Add some additional cpuflag relations
Simplifies writing assembly code that depends on available instructions.

LZCNT implies SSE2
BMI1 implies AVX+LZCNT
AVX2 implies BMI2
2017-06-12 11:41:25 +02:00
Anton Mitrofanov d991b3e8a8 x86inc: Remove argument from WIN64_RESTORE_XMM
The use of rsp was pretty much hardcoded there and probably didn't work
otherwise with stack_size > 0.
2017-06-09 13:43:01 +02:00
Henrik Gramner cd4ca82459 x86inc: Prefer r14/r15 over r12/r13 on x86-64
Due to a peculiarity in the ModR/M addressing encoding, the r12 and r13
registers sometimes requires an additional byte when used as a base register.

r14 and r15 doesn't have that issue, so prefer using them.
2017-06-09 13:43:00 +02:00
Henrik Gramner 88dcdfad09 x86inc: Make REP_RET identical to RET in SSSE3+ functions
There's no point in emitting a rep prefix before ret on modern CPUs.
2017-06-09 13:43:00 +02:00
Henrik Gramner 406e0ddc0b x86inc: Fix call with memory operands
We overload the `call` instruction with a macro, but it would misbehave when
the macro argument wasn't a valid identifier. Fix it by explicitly checking
if the argument is an identifier.
2017-06-09 13:43:00 +02:00
James Almer 0fbc7a2169 x86/float_dsp: remove usage of integer instructions 2017-05-12 23:34:49 -03:00
James Almer f1d80bc630 x86/float_dsp: add ff_vector_fmul_reverse_avx2
~20% faster than AVX.

Signed-off-by: James Almer <jamrial@gmail.com>
2017-04-11 21:35:35 -03:00
James Almer ed9b25a148 x86/float_dsp: add ff_vector_dmac_scalar_{sse2,avx,fma3} 2017-04-10 12:18:55 -03:00
Clément Bœsch f291a9a1ad Merge commit '99434f4df81b6801b2b535d5b9143305595784f6'
* commit '99434f4df81b6801b2b535d5b9143305595784f6':
  float_dsp: Have implementation match function pointer prototype

Merged-by: Clément Bœsch <cboesch@gopro.com>
2017-03-30 10:23:25 +02:00
James Almer c97e986e90 Merge commit '7911186ed616ae81dd8617d6d0e8b08c818db9d8'
* commit '7911186ed616ae81dd8617d6d0e8b08c818db9d8':
  emms: Give apriv_emms_yasm() a more general name

Merged-by: James Almer <jamrial@gmail.com>
2017-03-23 18:28:56 -03:00
James Almer 29db87af52 Merge commit '6be7944ee2ec2f045e6eb9a93237e992c8b20ac4'
* commit '6be7944ee2ec2f045e6eb9a93237e992c8b20ac4':
  x86: Add missing colons after assembly labels

Merged-by: James Almer <jamrial@gmail.com>
2017-03-23 18:05:27 -03:00
James Almer d8962ffbd8 avutil/x86util: don't use movss in VBROADCASTSS macro when src and dst args are the same
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2017-03-21 19:15:00 -03:00
Clément Bœsch 3898e346b3 Merge commit '07e1f99a1bb41d1a615676140eefc85cf69fa793'
* commit '07e1f99a1bb41d1a615676140eefc85cf69fa793':
  x86util: Document SBUTTERFLY macro

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 18:38:07 +01:00
Clément Bœsch 8200b16a9c Merge commit 'd7bc52bf456deba0f32d9fe5c288ec441f1ebef5'
* commit 'd7bc52bf456deba0f32d9fe5c288ec441f1ebef5':
  imgutils: add a function for copying image data from GPU mapped memory

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 08:34:10 +01:00
Diego Biurrun 994c4bc107 x86util: Port all macros to cpuflags
Also do some small cosmetic changes: Drop pointless _MMX suffix from ABSD2
macro name, drop pointless check for MMX support, we always assume MMX is
available in our SIMD code, fix spelling.
2017-03-14 17:23:32 +01:00
Diego Biurrun 39e208f4d4 build: Generalize yasm/nasm-related variable names
None of them are specific to the YASM assembler.
2017-03-01 10:18:15 +01:00
James Darnley 5336887867 avcodec/h264: sse2, avx h luma mbaff deblock/loop filter
x86-64 only

Yorkfield:
- sse2: ~2.17x (434 vs. 200 cycles)

Nehalem:
- sse2: ~2.94x (409 vs. 139 cycles)

Skylake:
- sse2: ~3.10x (370 vs. 119 cycles)
- avx:  ~3.29x (370 vs. 112 cycles)
2017-02-18 20:26:52 +01:00
James Darnley 7627df15d4 x86util: import MOVHL macro
Originally committed to x264 in 1637239a by Henrik Gramner who has
agreed to re-license it as LGPL.  Original commit message follows.

    x86: Avoid some bypass delays and false dependencies

    A bypass delay of 1-3 clock cycles may occur on some CPUs when transitioning
    between int and float domains, so try to avoid that if possible.
2017-02-18 20:26:51 +01:00
James Darnley 9d815b7424 avcodec/x86: deduplicate PASS8ROWS macro 2017-02-18 20:26:49 +01:00
Diego Biurrun 7abdd026df asm: Consistently uppercase SECTION markers 2017-02-03 11:37:53 +01:00
James Almer 8d5df204d0 Merge commit '8e9cd81d291b1010c625b2766058aadf4affb537'
* commit '8e9cd81d291b1010c625b2766058aadf4affb537':
  x86: cpu: Detect Conroe CPUs and their slow shuffle unit

Merged-by: James Almer <jamrial@gmail.com>
2017-01-31 15:20:54 -03:00
James Almer 2eab48177d Merge commit '7d7355aa92bb36ca0765c49a569a999bcb96f332'
* commit '7d7355aa92bb36ca0765c49a569a999bcb96f332':
  x86: Add SSSE3_SLOW CPU flag and related convenience macros

Merged-by: James Almer <jamrial@gmail.com>
2017-01-31 15:17:19 -03:00
Henrik Gramner cd09e3b349 x86inc: Avoid using eax/rax for storing the stack pointer
When allocating stack space with an alignment requirement that is larger
than the current stack alignment we need to store a copy of the original
stack pointer in order to be able to restore it later.

If we chose to use another register for this purpose we should not pick
eax/rax since it can be overwritten as a return value.
2017-01-09 16:00:29 +01:00
Henrik Gramner 3cba1ad76d x86inc: Avoid using eax/rax for storing the stack pointer
When allocating stack space with an alignment requirement that is larger
than the current stack alignment we need to store a copy of the original
stack pointer in order to be able to restore it later.

If we chose to use another register for this purpose we should not pick
eax/rax since it can be overwritten as a return value.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2017-01-09 13:21:12 +01:00
Diego Biurrun 99434f4df8 float_dsp: Have implementation match function pointer prototype
libavutil/x86/float_dsp_init.c(144) : warning C4028: formal parameter 1 different from declaration
libavutil/x86/float_dsp_init.c(144) : warning C4028: formal parameter 2 different from declaration
2016-11-03 17:43:55 +01:00
Michael Niedermayer 051517648b avutil/x86/emms: Document the emms_c() vs alloc/free relation.
Reviewed-by: Andreas Cadhalpun <andreas.cadhalpun@googlemail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-10-23 13:02:37 +02:00
Diego Biurrun 7911186ed6 emms: Give apriv_emms_yasm() a more general name 2016-10-18 13:09:09 +02:00
Diego Biurrun 6be7944ee2 x86: Add missing colons after assembly labels
This fixes many warnings of the sort
warning: label alone on a line without a colon might be in error
2016-10-17 16:31:26 +02:00