Commit Graph

1440 Commits

Author SHA1 Message Date
Justin Ruggles
6092dafb5a lavr: x86: optimized 6-channel s16 to fltp conversion 2012-08-23 20:10:57 -04:00
Mans Rullgard
5b170c0bea x86: remove FASTDIV inline asm
GCC 4.3 and later do the right thing with the plain C code.  Earlier
versions in 32-bit mode generate one extra instruction, needlessly
zeroing what would be the high half of the shifted value.  At least
two gcc configurations miscompile the inline asm in some situations.

In 64-bit mode, all gcc versions generate imul r64, r64 followed by
shr.  On Intel i7 and later, this imul is faster 32-bit mul.  On
older Intel and all AMD, it is slightly slower.  On Atom it is much
slower.

Considering where the FASTDIV macro is used, any overall negative
performance impact of this change should be negligible.  If anyone
cares, they should file a bug against gcc and get the instruction
selection fixed.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-22 14:29:10 +01:00
Diego Biurrun
66baa45801 configure: Drop fastdiv option
There is no point in having the user disable any fastdiv macros.
Besides the condition implementation was broken and only disabled
the C implementation, but no platform specific assembly versions.
2012-08-22 01:02:18 +02:00
Martin Storsjö
33e112847d Add more missing includes after removing the implicit common.h
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-08-16 10:49:54 +03:00
Martin Storsjö
70766c2182 Add some more missing includes after removing the implicit common.h
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-08-15 23:48:48 +03:00
Martin Storsjö
1d9c2dc89a Don't include common.h from avutil.h
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-08-15 22:32:06 +03:00
Mans Rullgard
87fa05a0da ARM: intmath: use native-size return types for clipping functions
This avoids having the compiler redundantly mask the values to
the smaller size.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-13 14:51:52 +01:00
Mans Rullgard
6c4975eaaf libavutil: add saturating addition functions
Fixed-point audio codecs often use saturating arithmetic, and
special instructions for these operations are common.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-13 01:03:10 +01:00
Mans Rullgard
33de86db2b dict: move struct AVDictionary definition to dict.c
This makes struct AVDictionary fully opaque now that nothing
needs to access it directly any more.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-10 15:15:00 +01:00
Mans Rullgard
987170cb9d dict: add av_dict_count()
This adds a function to retrieve the number of entries in a
dictionary and updates the places directly accessing what should
be an opaque struct to use this new function instead.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-10 15:15:00 +01:00
Mans Rullgard
0d735ca214 ARM: add missing "cc" clobber in av_clipl_int32_arm()
Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-10 10:51:10 +01:00
Mans Rullgard
54918d0394 libavutil: remove unused av_abort() macro
Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-09 20:52:39 +01:00
Mans Rullgard
1c4ab37c38 libavutil: drop offsetof() fallback definition
The only compiler I have that does not define the standard
offsetof() macro is "Bruce's C Compiler", a simple compiler
for producing 8/16-bit 8086 code, usually for use in early
stages of PC booting.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-09 20:52:39 +01:00
Mans Rullgard
d913fd1f00 libavutil: drop fallback definitions of INTxx_MIN/MAX
This list is incomplete (we also use UINT16_MAX), so there does
not appear to be any system we care about that needs these.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-09 20:52:39 +01:00
Diego Biurrun
804d7a1aa6 doxygen: Fix function parameter names to match the code 2012-08-09 20:05:55 +02:00
Mans Rullgard
d7a4f8f8b9 Move MASK_ABS macro to libavcodec/mathops.h
This macro is only used in two places, both in libavcodec, so this
is a more sensible place for it.

Two small tweaks to the macro are made:

- removing the trailing semicolon
- dropping unnecessary 'volatile' from the x86 asm

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-09 00:58:20 +01:00
Mans Rullgard
070a402b60 x86: move MANGLE() and related macros to libavutil/x86/asm.h
These x86-specific macros do not belong in generic code.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-09 00:58:20 +01:00
Mans Rullgard
c318626ce2 x86: rename libavutil/x86_cpu.h to libavutil/x86/asm.h
This puts x86-specific things in the x86/ subdirectory where they
belong.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-09 00:58:20 +01:00
Mans Rullgard
ec9d2c15c1 ARM: use Q/R inline asm operand modifiers only if supported
Some compilers do not support the Q/R modifiers used to access
the low/high parts of a 64-bit register pair.  Check for this
and disable all uses of it when not supported.

Fixes bug #337.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-07 21:13:30 +01:00
Mans Rullgard
edd8226795 x86: fix build with nasm 2.08
It appears that something goes wrong in old nasm versions when the
%+ operator is used in the last argument of a macro invocation and
this argument is tested with %ifdef within the macro.  This patch
rearranges the macro arguments such that the %+ operator is never
used in the last argument.
2012-08-07 15:24:34 +01:00
Mans Rullgard
180d43bc67 x86: use nop cpu directives only if supported
nasm does not support 'CPU foonop' directives.  This adds a configure
test for the directive and uses it only if supported.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-07 15:22:20 +01:00
Mans Rullgard
7238265052 x86: fix rNmp macros with nasm
For some reason, nasm requires this.  No harm done to yasm.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-07 15:21:58 +01:00
Mans Rullgard
a3df4781f4 x86: add colons after labels
nasm prints a warning if the colon is missing.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-07 15:20:56 +01:00
Mans Rullgard
82494835c4 rational: add av_inv_q() returning the inverse of an AVRational
This allows simplifying a few expressions.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-05 17:46:41 +01:00
Diego Biurrun
239fdf1b4a x86: build: replace mmx2 by mmxext
Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.
2012-08-03 22:51:05 +02:00
Diego Biurrun
ca844b7be9 x86: Use consistent 3dnowext function and macro name suffixes
Currently there is a wild mix of 3dn2/3dnow2/3dnowext.  Switching to
"3dnowext", which is a more common name of the CPU flag, as reported
e.g. by the Linux kernel, unifies this.
2012-08-03 14:00:47 +02:00
Loren Merritt
f8d8fe255d x86inc: clip num_args to 7 on x86-32.
This allows us to unconditionally set the cglobal num_args
parameter to a bigger value, thus making writing yasm code
even easier than before.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-07-28 08:29:45 -07:00
Ronald S. Bultje
96c9cc1094 x86inc: sync to latest version from x264. 2012-07-28 08:29:44 -07:00
Justin Ruggles
79687079a9 x86: add support for fmaddps fma4 instruction with abstraction to avx/sse 2012-07-27 11:25:48 -04:00
Ronald S. Bultje
02ac28229a eval: fix printing of NaN in eval fate test.
This fixes "make fate-eval" on MSVC builds. Without this, the test outputs
"-1.#NaN" instead of "nan" on MSVS 2010.
2012-07-26 15:56:32 -07:00
Ronald S. Bultje
30b45d9c38 x86inc: automatically insert vzeroupper for YMM functions. 2012-07-26 13:43:16 -07:00
Jason Garrett-Glaser
85a3c19ed1 dsputil: x86: add SHUFFLE_MASK_W macro
Simplifies pshufb masks that operate on words.
2012-07-22 16:56:58 -04:00
Luca Barbato
f3e5e6f05b mem: introduce av_malloc_array and av_mallocz_array
Both function ease allocating large arrays implementing the overflow
check inside it.
2012-07-14 20:07:25 +02:00
Janne Grunau
2d497c141d eval: add gt(), gte(), lt() and lte() fate tests 2012-07-14 13:43:10 +02:00
Max Lazarov
caac3ab6ef eval: fix swapping of lt() and lte()
CC: libav-stable@libav.org
2012-07-14 13:33:25 +02:00
Ronald S. Bultje
183b1c2268 configure: Check for the math function rint
Add a fallback implementation if it doesn't exist.

Signed-off-by: Martin Storsjö <martin@martin.st>
2012-07-11 10:40:11 +03:00
Ronald S. Bultje
358d854df8 x86/cpu: implement get/set_eflags using intrinsics
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-07-10 14:33:32 +03:00
Ronald S. Bultje
c0ee695bd7 x86/cpu: implement support for cpuid through intrinsics
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-07-10 14:33:24 +03:00
Ronald S. Bultje
3f150ffba3 x86/cpu: implement support for xgetbv through intrinsics
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-07-10 14:33:17 +03:00
Ronald S. Bultje
f80ddd5bf7 lavu: use intrinsics for emms on systems lacking inline asm support
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-07-10 14:33:09 +03:00
Martin Storsjö
620b1e7e98 mem: Don't abort on av_malloc(0) in debug mode
This makes the behaviour consistent between debug and release mode.

Signed-off-by: Martin Storsjö <martin@martin.st>
2012-07-10 11:37:03 +03:00
Ronald S. Bultje
07b287020c x86/timer: implement an intrinsic-based version for rdtsc (AV_READ_TIME). 2012-07-07 13:35:07 -07:00
Loren Merritt
4d4752366f x86inc: add SPLATB_LOAD, SPLATB_REG, PSHUFLW macros
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-07-05 17:37:11 +02:00
Loren Merritt
2cd1f5cadc x86inc: modify ALIGN to not generate long nops on i586
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-07-05 17:37:11 +02:00
Samuel Pitoiset
983db9b2b4 xtea: Make the count parameter match the documentation
Previously it was interpreted as number of bytes, while the
documentation stated that it was the number of 8 byte blocks.
This makes it behave similarly to the existing AES code.

Signed-off-by: Martin Storsjö <martin@martin.st>
2012-07-05 12:45:18 +03:00
Samuel Pitoiset
e4a7fb3da3 blowfish: Make the count parameter match the documentation
Previously it was interpreted as number of bytes, while the
documentation stated that it was the number of 8 byte blocks.
This makes it behave similarly to the existing AES code.

Signed-off-by: Martin Storsjö <martin@martin.st>
2012-07-05 12:41:57 +03:00
Luca Barbato
f6687bf5f8 xtea: invert branch and loop precedence
Should slightly improve performance depending on the compiler used.
2012-07-05 10:42:00 +02:00
Luca Barbato
669bbedfa8 blowfish: invert branch and loop precedence
Should slightly improve performance depending on the compiler used.
2012-07-05 10:40:13 +02:00
Diego Biurrun
2047e40e6e Clarify Doxygen comment for FF_API_* #defines. 2012-07-04 15:10:10 +02:00
Diego Biurrun
86ab7b0f2f Create version.h headers for libraries that lack them 2012-07-04 15:10:06 +02:00