
380 Commits

Author SHA1 Message Date
Ganesh Ajjanagadde 531b0a316b avutil/x86/asm: rename REG_SP to REG_sp
REG_SP is defined by Solaris system headers.
This fixes a sea of warnings while building on Solaris:
http://fate.ffmpeg.org/report.cgi?time=20150820233505&slot=x86-opensolaris-gcc4.3

Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-08-22 02:56:53 +02:00
Anton Mitrofanov 8db0f71b49 x86inc: warn if XOP integer FMA instruction emulation is impossible
Signed-off-by: Henrik Gramner <henrik@gramner.com>
2015-08-05 16:15:40 +02:00
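The constraint behind this warning can be modeled in C. The sketch below uses a hypothetical `vec16` type standing in for an XMM register of 16-bit words, plus hypothetical `pmacsww_native`/`pmacsww_emulated` helpers. It is not x86inc code. It only illustrates why a multiply-then-add emulation (in the spirit of pmullw followed by paddw) cannot write its product into the register that holds the addend.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of an XMM register holding eight 16-bit words. */
typedef struct { uint16_t w[8]; } vec16;

/* Fused XOP-style form: dst = a * b + c in one step, so dst may alias any input.
 * Only the low 16 bits are kept, matching a wrapping word multiply-accumulate. */
static void pmacsww_native(vec16 *dst, const vec16 *a, const vec16 *b, const vec16 *c)
{
    vec16 r;
    for (int i = 0; i < 8; i++)
        r.w[i] = (uint16_t)((uint32_t)a->w[i] * b->w[i] + c->w[i]);
    *dst = r;
}

/* Two-step emulation: multiply into dst, then add c. If dst is the same
 * register as c (the fourth operand), step 1 destroys c before step 2 reads it,
 * so the emulation must be rejected. */
static bool pmacsww_emulated(vec16 *dst, const vec16 *a, const vec16 *b, const vec16 *c)
{
    if (dst == c)
        return false; /* emulation impossible: this is the case worth a warning */
    for (int i = 0; i < 8; i++)
        dst->w[i] = (uint16_t)((uint32_t)a->w[i] * b->w[i]); /* step 1: dst = a * b */
    for (int i = 0; i < 8; i++)
        dst->w[i] = (uint16_t)(dst->w[i] + c->w[i]);          /* step 2: dst += c */
    return true;
}
```

The native form tolerates any aliasing because it computes the whole result before storing it; the emulated form has to refuse the dst == c case instead of silently producing a wrong answer.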
Henrik Gramner f0b7882ceb x86inc: Drop SECTION_TEXT macro
The .text section is already 16-byte aligned by default on all supported
platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
2015-08-04 20:13:09 +02:00
Henrik Gramner 826790f596 x86inc: Support arbitrary stack alignments
Change ALLOC_STACK to always align the stack before allocating stack space for
consistency. Previously alignment would occur either before or after allocating
stack space depending on whether manual alignment was required or not.
2015-08-04 20:13:09 +02:00
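The "align first, then allocate" order can be illustrated with plain pointer arithmetic. This is a minimal C sketch with a hypothetical `alloc_aligned_frame()` helper, not the x86inc macro. Rounding the size up to a multiple of the alignment is an assumption made here so that the returned stack pointer stays aligned for any power-of-two alignment.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model of frame setup on a downward-growing stack.
 * `align` must be a power of two (16, 32, 64, ...). */
static uintptr_t alloc_aligned_frame(uintptr_t sp, size_t size, size_t align)
{
    /* Round the requested size up to a multiple of the alignment (assumption). */
    size_t padded = (size + align - 1) & ~(align - 1);
    /* Step 1: align the incoming stack pointer down to the boundary. */
    uintptr_t aligned = sp & ~(uintptr_t)(align - 1);
    /* Step 2: reserve the space; the result stays aligned. */
    return aligned - padded;
}
```

Because alignment always happens first, the same two steps work whether or not the incoming pointer was already aligned, which is the consistency the commit describes.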
James Almer 5750d6c5e9 x86: move XOP emulation code back to x86inc
Only two functions that use XOP multiply-accumulate instructions where the
first operand is the same as the fourth actually took advantage of the macros.

This further reduces differences with x264's x86inc.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-08-03 17:11:13 -03:00
Henrik Gramner 127203ba5a x86inc: Various minor backports from x264
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-08-03 04:08:33 +02:00
Henrik Gramner f151fbd9e5 x86inc: Disable vpbroadcastq workaround in newer yasm versions
The bug was fixed in 1.3.0, so only perform the workaround in earlier versions.

Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-08-03 03:13:20 +02:00
James Almer 4d2c014a8f x86/float_dsp: add missing colon to labels
Silences warnings with NASM.

Signed-off-by: James Almer <jamrial@gmail.com>
2015-07-26 02:51:08 -03:00
James Almer bd48764532 avutil/x86/bswap: force inline asm versions with ICC
Recent ICC versions that identify themselves as GCC >= 4.5 (such as ICC 13) apparently
cannot optimize the generic C versions of av_bswap*() on their own.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-07-18 20:48:09 -03:00
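As a rough illustration of the two approaches, the sketch below contrasts a portable shift-and-mask byte swap with one that forces the x86 `bswap` instruction through GCC-style inline assembly. The names `bswap32_c` and `bswap32_forced` are hypothetical and this is not the libavutil header; the point is only that the inline-asm path does not depend on the compiler recognizing the C idiom.

```c
#include <stdint.h>

/* Generic C byte swap: the compiler may or may not turn this into bswap. */
static inline uint32_t bswap32_c(uint32_t x)
{
    return (x << 24) | ((x << 8) & 0x00FF0000u) |
           ((x >> 8) & 0x0000FF00u) | (x >> 24);
}

/* Force the bswap instruction on x86 with GCC-compatible compilers
 * (GCC, Clang, ICC); fall back to the C version elsewhere. */
static inline uint32_t bswap32_forced(uint32_t x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("bswap %0" : "+r"(x));
    return x;
#else
    return bswap32_c(x);
#endif
}
```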
Michael Niedermayer 2ecbf44f21 Merge commit 'd1a6cb195f610978ba5d2351e60f938f7f261d59'
* commit 'd1a6cb195f610978ba5d2351e60f938f7f261d59':
  x86: Serialize rdtsc in read_time()

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-07-09 12:28:09 +02:00
Henrik Gramner d1a6cb195f x86: Serialize rdtsc in read_time()
Improves the accuracy of measurements, especially in short sections.

To quote the Intel 64 and IA-32 Architectures Software Developer's Manual:
"The RDTSC instruction is not a serializing instruction. It does not necessarily
wait until all previous instructions have been executed before reading the counter.
Similarly, subsequent instructions may begin execution before the read operation
is performed. If software requires RDTSC to be executed only after all previous
instructions have completed locally, it can either use RDTSCP (if the processor
supports that instruction) or execute the sequence LFENCE;RDTSC."

SSE2 is a requirement for lfence so only use it on SSE2-capable systems.
Prefer lfence;rdtsc over rdtscp since rdtscp is supported on fewer systems.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-07-09 00:10:13 +02:00
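A minimal C sketch of the LFENCE;RDTSC sequence, using the GCC/Clang intrinsics `_mm_lfence()` and `__rdtsc()` from `<x86intrin.h>`. The function name `read_time_serialized` is hypothetical, and the `clock_gettime()` fallback for other targets is an assumption added only so the sketch compiles everywhere; it is not FFmpeg's `read_time()`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() in the fallback */
#endif
#include <stdint.h>

#if defined(__x86_64__) || (defined(__i386__) && defined(__SSE2__))
#include <x86intrin.h>

/* LFENCE (an SSE2 instruction) waits until earlier instructions have completed
 * locally, so RDTSC is not executed ahead of the code being timed. */
static inline uint64_t read_time_serialized(void)
{
    _mm_lfence();
    return __rdtsc();
}
#else
#include <time.h>

/* Fallback for non-x86 or non-SSE2 builds: monotonic clock in nanoseconds. */
static inline uint64_t read_time_serialized(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif
```

The SSE2 guard mirrors the commit's reasoning: LFENCE needs SSE2, and LFENCE;RDTSC is preferred over RDTSCP because fewer systems support RDTSCP.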
James Almer 93e7b7fb34 avutil/x86/intmath: add missing check for inline assembly
Signed-off-by: James Almer <jamrial@gmail.com>
2015-06-27 14:33:53 -03:00
James Almer 1e51e517be avutil/x86/intmath: use bzhi gcc builtin in av_mod_uintp2()
Signed-off-by: James Almer <jamrial@gmail.com>
2015-06-27 12:56:55 -03:00
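For context, reducing an unsigned value modulo 2^p is the same as keeping its low p bits, which is exactly what BMI2's BZHI does. The sketch below is a hypothetical `mod_uintp2()` helper, not the libavutil function; it uses the `_bzhi_u32()` intrinsic from `<immintrin.h>` (available when compiling with `-mbmi2`) rather than the raw GCC builtin the commit mentions, and falls back to a mask otherwise.

```c
#include <stdint.h>
#if defined(__BMI2__)
#include <immintrin.h>
#endif

/* Return a mod 2^p, i.e. the low p bits of a. Valid here for 0 <= p <= 31. */
static inline uint32_t mod_uintp2(uint32_t a, unsigned p)
{
#if defined(__BMI2__)
    /* BZHI zeroes every bit at index >= p in a single instruction. */
    return _bzhi_u32(a, p);
#else
    /* Portable equivalent: build a mask of p ones and AND it in. */
    return a & ((1u << p) - 1);
#endif
}
```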
James Almer c16e99e3b3 x86: check for AV_CPU_FLAG_AVXSLOW where useful
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-06-01 00:15:35 +02:00
Michael Niedermayer 16c430e8ef Merge commit 'cae39851201b7781f1262e1c23627b45e6e80bb4'
* commit 'cae39851201b7781f1262e1c23627b45e6e80bb4':
  x86: Add helper macros to check for slow cpuflags

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-31 23:59:48 +02:00
James Almer cae3985120 x86: Add helper macros to check for slow cpuflags
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-05-31 12:07:11 +02:00
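The helper-macro pattern can be sketched as follows: an instruction-set flag only counts when the matching "slow" flag is clear. Everything here (`FLAG_*`, `HAS_FAST`, `select_kernel`, the kernel stubs) is hypothetical; libavutil's real flags are `AV_CPU_FLAG_AVX` and `AV_CPU_FLAG_AVXSLOW`, and its macro names and flag values differ.

```c
/* Hypothetical flag bits for this sketch only; not libavutil values. */
enum {
    FLAG_SSE2    = 1u << 0,
    FLAG_AVX     = 1u << 1,
    FLAG_AVXSLOW = 1u << 2, /* AVX present but not faster than SSE on this CPU */
};

/* True when `ext` is supported and not marked slow by `slow`. */
#define HAS_FAST(flags, ext, slow) (((flags) & (ext)) && !((flags) & (slow)))

typedef int (*kernel_fn)(void);
static int kernel_c(void)    { return 0; }
static int kernel_sse2(void) { return 1; }
static int kernel_avx(void)  { return 2; }

/* Pick the best kernel, skipping AVX on CPUs that flag it as slow. */
static kernel_fn select_kernel(unsigned flags)
{
    kernel_fn fn = kernel_c;
    if (flags & FLAG_SSE2)
        fn = kernel_sse2;
    if (HAS_FAST(flags, FLAG_AVX, FLAG_AVXSLOW))
        fn = kernel_avx;
    return fn;
}
```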
James Almer d68c05380c x86: check for AV_CPU_FLAG_AVXSLOW where useful
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-05-31 12:07:11 +02:00
James Almer f7cafb5d02 x86: add AV_CPU_FLAG_AVXSLOW flag
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-05-31 12:07:11 +02:00
Timothy Gu dd4d709be7 x86inc: Clear __SECT__
Silences warning(s) like:

    libavcodec/x86/fft.asm:93: warning: section flags ignored on
    section redeclaration

The cause of this warning is that `struc` and `endstruc`
attempt to revert to the previous section state [1].

The section state is stored in the macro __SECT__, defined by
x86inc.asm to be `.note.GNU-stack ...`, through the `SECTION`
directive [2].

Thus, the `.note.GNU-stack` section is defined twice
(once in x86inc.asm, once during `endstruc`), causing the warning.

That is the first part of the commit: using the primitive `[section]` format
for .note.GNU-stack etc., which does not update `__SECT__` [2].

That fixes only half of the problem. Even without any `SECTION` directives,
`__SECT__` is predefined as `.text`, which conflicts with the later
`SECTION_TEXT` (which expands to `.text align=16`).

[1]: http://www.nasm.us/doc/nasmdoc6.html#section-6.4
[2]: http://www.nasm.us/doc/nasmdoc6.html#section-6.3

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-05-28 11:40:15 +02:00
Timothy Gu 204b228a1d x86inc: Clear __SECT__
This commit silences warning(s) like:

    libavcodec/x86/fft.asm:93: warning: section flags ignored on section
    redeclaration

The cause of this warning is that `struc` and `endstruc` attempt to
revert to the previous section state [1]. The section state is stored in the
macro __SECT__, defined by x86inc.asm to be `.note.GNU-stack ...`, through the
`SECTION` directive [2].  Thus, the `.note.GNU-stack` section is defined twice
(once in x86inc.asm, once during `endstruc`), causing the warning.

That is the first part of the commit: using the primitive `[section]` format
for .note.GNU-stack etc., which does not update `__SECT__` [2].

That fixes only half of the problem. Even without any `SECTION` directives,
`__SECT__` is predefined as `.text`, which conflicts with the later
`SECTION_TEXT` (which expands to `.text align=16`).

[1]: http://www.nasm.us/doc/nasmdoc6.html#section-6.4
[2]: http://www.nasm.us/doc/nasmdoc6.html#section-6.3

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-28 00:08:37 +02:00
James Almer c312bfac4c x86/cpu: add AV_CPU_FLAG_AVXSLOW flag
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-05-27 03:31:11 -03:00
Michael Niedermayer d630f38f47 avutil/x86/Makefile: fix conditional x86/emms.o build
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-04-09 01:12:51 +02:00
Ronald S. Bultje b926f02e81 avutil/x86/Makefile: Make building and linking of emms.c conditional
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-04-08 17:25:35 +02:00
James Almer 60b9373dbd libavutil: add bmi2 optimized av_mod_uintp2
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-03-20 15:47:43 -03:00
Peter Cordes 9e5687adf2 pixelutils: Comment on (lack of) sad_8x8_sse2
Signed-off-by: Peter Cordes <peter@cordes.ca>
2015-03-04 21:58:53 +01:00
James Almer bc65abc8d7 libavutil: add x86 optimized av_popcount
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-25 19:58:00 -03:00
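For reference, the sketch below pairs a portable bit-twiddling population count with a path that lets GCC/Clang emit the POPCNT instruction via `__builtin_popcount()` when the target enables it (e.g. `-mpopcnt`). The function names are hypothetical and this is not the libavutil code; it only shows why an x86-specific path pays off when the hardware instruction is available.

```c
#include <stdint.h>

/* Portable SWAR population count: sums bits in 2-, 4-, 8-, then 16-bit fields. */
static inline int popcount32_c(uint32_t x)
{
    x -= (x >> 1) & 0x55555555u;                      /* 2-bit field counts */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit field counts */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit field counts */
    x += x >> 8;
    return (int)((x + (x >> 16)) & 0x3F);
}

/* Use POPCNT when the compiler targets it; otherwise fall back to SWAR. */
static inline int popcount32(uint32_t x)
{
#if defined(__GNUC__) && defined(__POPCNT__)
    return __builtin_popcount(x);
#else
    return popcount32_c(x);
#endif
}
```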
Christophe Gisquet d9293c776e x86inc: Correctly warn on use of SSE2 instructions in SSE functions
SSE2 instructions that are XMM-implementations of pre-existing MMX/MMX2
instructions did not issue warnings when used in SSE functions. Handle
it by also checking the register type when such instructions are used.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-17 12:35:58 +01:00
Christophe Gisquet e93d3a22cb x86: lavu/x264asm: fix ymm register instantiation
This mimics what is done for the other instruction sets.

Tested-by: James Almer <jamrial@gmail.com>
Tested-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-04 00:18:29 +01:00
James Darnley 12120174ce lavu/x86/x86inc: deprecate INIT_AVX
The same can be done with INIT_XMM avx

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-02 01:09:16 +01:00
Anton Mitrofanov a1684311b3 x264asm: warn when inappropriate instruction used in function with specified cpuflags
Requested-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Requested-by: "Ronald S. Bultje" <rsbultje@gmail.com>
2015-02-02 00:06:14 +01:00
James Almer 37b35feb64 x86/swr: add SSE2/AVX pack_8ch functions
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-30 23:05:27 -03:00
Kieran Kunhya 9a738c27dc v210enc: Add SIMD optimised 8-bit and 10-bit encoders
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2014-12-05 13:03:49 +00:00
Kieran Kunhya 36091742d1 v210enc: Add SIMD optimised 8-bit and 10-bit encoders
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-26 20:30:47 +01:00
Michael Niedermayer 579a0fdc21 avutil/lls: Make unchanged function arguments const
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-28 19:32:07 +02:00
lvqcl e58fc44649 avutil/x86/cpu: fix cpuid sub-leaf selection
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-27 13:21:31 +02:00
Henrik Gramner f629705b02 x86inc: Make INIT_CPUFLAGS support an arbitrary number of cpuflags
Previously there was a limit of two cpuflags.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-09-09 02:00:25 -07:00
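One way to picture this is a bitmask in which each instruction-set level also carries the bits of every level it implies, while tuning flags are independent bits, and the initializer simply ORs together however many flags it is given. The sketch below is a hypothetical C model of that idea (`CF_*`, `init_cpuflags`, `cpuflag` are illustrative names and values), not the NASM macro itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cumulative encoding: each ISA level includes the bits of the
 * levels it implies; tuning flags are standalone bits. */
enum {
    CF_MMX     = 1u << 0,
    CF_SSE     = (1u << 1) | CF_MMX,
    CF_SSE2    = (1u << 2) | CF_SSE,
    CF_SSSE3   = (1u << 3) | CF_SSE2,
    CF_AVX     = (1u << 4) | CF_SSSE3,
    CF_ATOM    = 1u << 5, /* tuning flag, implies nothing */
    CF_SLOWCTZ = 1u << 6, /* tuning flag, implies nothing */
};

/* Combine any number of requested flags (no fixed limit of two). */
static unsigned init_cpuflags(const unsigned *list, size_t n)
{
    unsigned flags = 0;
    for (size_t i = 0; i < n; i++)
        flags |= list[i];
    return flags;
}

/* A flag counts as enabled only if all of its (implied) bits are set. */
static bool cpuflag(unsigned flags, unsigned want)
{
    return (flags & want) == want;
}
```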
Loren Merritt ec217218c2 x86inc: Free up variable name "n" in global namespace
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-09-09 02:00:19 -07:00
Henrik Gramner 176a0fca3f x86inc: Make ym# behave the same way as xm#
This makes more sense for future implementations of templates with zmm registers.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-09-09 01:45:14 -07:00
Henrik Gramner 428aa14a48 x86inc: Make INIT_CPUFLAGS support an arbitrary number of cpuflags
Previously there was a limit of two cpuflags.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-05 14:06:03 +02:00
Henrik Gramner 720c21d11f x86inc: Make ym# behave the same way as xm#
This makes more sense for future implementations of templates with zmm registers.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-05 01:55:28 +02:00
Loren Merritt a4dbabc8b3 x86inc: free up variable name "n" in global namespace
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-05 01:41:50 +02:00
Clément Bœsch 554d819062 avutil/pixelutils: faster pixelutils_sad_16x16
501 to 439 decicycles.

See 45c7f3997e.
2014-08-23 20:12:56 +02:00
Clément Bœsch 45c7f3997e avutil/pixelutils: faster pixelutils_sad_[au]_16x16
~560 → ~500 decicycles

This is following the comments from Michael in
https://ffmpeg.org/pipermail/ffmpeg-devel/2014-August/160599.html

Using two registers for the accumulator didn't help. On the other hand,
reordering the movs and psadbw instructions brought it from ~538 to ~500.
2014-08-23 10:18:53 +02:00
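To make the decicycle figures concrete, here is a rough C sketch of what such a function computes: a scalar reference and an SSE2 version built on `_mm_sad_epu8()`, the intrinsic for PSADBW. The function names are hypothetical and this is not the pixelutils assembly; it just shows the PSADBW accumulation pattern these commits tune.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar reference: sum of absolute differences over a 16x16 block. */
static int sad_16x16_c(const uint8_t *a, ptrdiff_t a_stride,
                       const uint8_t *b, ptrdiff_t b_stride)
{
    int sum = 0;
    for (int y = 0; y < 16; y++, a += a_stride, b += b_stride)
        for (int x = 0; x < 16; x++)
            sum += abs(a[x] - b[x]);
    return sum;
}

#if defined(__SSE2__)
/* PSADBW produces one partial sum per 64-bit half of each 16-byte row;
 * accumulate them, then fold the upper half onto the lower one. */
static int sad_16x16_sse2(const uint8_t *a, ptrdiff_t a_stride,
                          const uint8_t *b, ptrdiff_t b_stride)
{
    __m128i acc = _mm_setzero_si128();
    for (int y = 0; y < 16; y++, a += a_stride, b += b_stride) {
        __m128i va = _mm_loadu_si128((const __m128i *)a); /* unaligned loads */
        __m128i vb = _mm_loadu_si128((const __m128i *)b);
        acc = _mm_add_epi64(acc, _mm_sad_epu8(va, vb));
    }
    acc = _mm_add_epi64(acc, _mm_srli_si128(acc, 8));
    return _mm_cvtsi128_si32(acc);
}
#endif
```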
Michael Niedermayer 70b8668fb5 drop LLS1, rename LLS2 to LLS
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-09 23:20:31 +02:00
Clément Bœsch 28a2107a8d avutil: add pixelutils API
2014-08-05 21:05:52 +02:00
James Almer d0f56ca071 x86/hevc_deblock: improve 8bit transpose store macros
Up to four fewer instructions, depending on the function and instruction set.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-03 04:24:15 +02:00
James Almer 1ace9573dc x86/hevc_idct: replace old and unused idct functions
Only 8-bit and 10-bit idct_dc() functions are included (adding others should be trivial).

Benchmarks on an Intel Core i5-4200U:

idct8x8_dc
       SSE2   MMXEXT  C
cycles 22     26      57

idct16x16_dc
       AVX2   SSE2    C
cycles 27     32      249

idct32x32_dc
       AVX2   SSE2    C
cycles 62     126     1375

Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-26 18:00:11 +02:00
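Why a DC-only IDCT is so cheap follows from the transform structure: when only the DC coefficient is nonzero, every output sample gets the same value. The C sketch below assumes the standard HEVC inverse-transform parameters (DC basis value 64 in both stages, first-stage shift 7, second-stage shift 20 minus the bit depth) and ignores intermediate clipping. It is a hypothetical `idct_dc()` helper, not the FFmpeg function, and it stops before adding the residual to the prediction.

```c
#include <stdint.h>

/* DC-only inverse transform for an n x n HEVC block (n = 4..32).
 * Stage 1: (64*c + 64) >> 7             == (c + 1) >> 1
 * Stage 2: (64*x + (1 << (19 - bd))) >> (20 - bd)
 *          == (x + (1 << (13 - bd))) >> (14 - bd)
 * so the whole transform collapses to one rounding expression. */
static void idct_dc(int16_t *coeffs, int n, int bit_depth)
{
    int shift = 14 - bit_depth;
    int add   = 1 << (shift - 1);
    int dc    = (((coeffs[0] + 1) >> 1) + add) >> shift;

    for (int i = 0; i < n * n; i++)
        coeffs[i] = (int16_t)dc; /* every residual sample is identical */
}
```

Since the whole block is one broadcast value, the SIMD versions reduce to computing a single scalar and storing it, which is consistent with the large gap over C in the 32x32 numbers.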
Michael Niedermayer 8d0c7031a8 Merge commit '79793f833784121d574454af4871866576c0749d'
* commit '79793f833784121d574454af4871866576c0749d':
  Update Fiona's name in copyright statements.

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-01 15:43:40 +02:00
Diego Biurrun 79793f8337 Update Fiona's name in copyright statements.
2014-07-01 03:26:51 -07:00
Christophe Gisquet 9107612818 x86util: add and use RSHIFT/LSHIFT macros
These macros take the shift amount as a number of bytes, because the unit of
the shift argument differs between the MMX and SSE2 instructions.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-15 13:19:27 +02:00
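The unit mismatch these macros hide can be modeled in C: a unified "shift right by N bytes" operation must convert N to a bit count for the MMX 64-bit qword shift (psrlq), but can pass it straight through for the SSE2 whole-register byte shift (psrldq). Both helpers are hypothetical illustrations, not the x86util macros; the 128-bit register is modeled as a little-endian byte array.

```c
#include <stdint.h>
#include <string.h>

/* MMX-style: psrlq counts in BITS, so a byte count is multiplied by 8. */
static uint64_t rshift_bytes_mmx(uint64_t reg, unsigned bytes)
{
    return bytes >= 8 ? 0 : reg >> (bytes * 8);
}

/* SSE2-style: psrldq counts in BYTES across the full 16-byte register.
 * Byte i of the result is byte i + bytes of the input; vacated bytes are 0. */
static void rshift_bytes_sse2(uint8_t reg[16], unsigned bytes)
{
    uint8_t out[16] = {0};
    if (bytes < 16)
        memcpy(out, reg + bytes, 16 - bytes);
    memcpy(reg, out, 16);
}
```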