GCC 4.3 and later do the right thing with the plain C code. Earlier
versions in 32-bit mode generate one extra instruction, needlessly
zeroing what would be the high half of the shifted value. At least
two gcc configurations miscompile the inline asm in some situations.
In 64-bit mode, all gcc versions generate imul r64, r64 followed by
shr. On Intel i7 and later, this imul is faster 32-bit mul. On
older Intel and all AMD, it is slightly slower. On Atom it is much
slower.
Considering where the FASTDIV macro is used, any overall negative
performance impact of this change should be negligible. If anyone
cares, they should file a bug against gcc and get the instruction
selection fixed.
Signed-off-by: Mans Rullgard <mans@mansr.com>
There is no point in having the user disable any fastdiv macros.
Besides the condition implementation was broken and only disabled
the C implementation, but no platform specific assembly versions.
Fixed-point audio codecs often use saturating arithmetic, and
special instructions for these operations are common.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This adds a function to retrieve the number of entries in a
dictionary and updates the places directly accessing what should
be an opaque struct to use this new function instead.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The only compiler I have that does not define the standard
offsetof() macro is "Bruce's C Compiler", a simple compiler
for producing 8/16-bit 8086 code, usually for use in early
stages of PC booting.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This list is incomplete (we also use UINT16_MAX), so there does
not appear to be any system we care about that needs these.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This macro is only used in two places, both in libavcodec, so this
is a more sensible place for it.
Two small tweaks to the macro are made:
- removing the trailing semicolon
- dropping unnecessary 'volatile' from the x86 asm
Signed-off-by: Mans Rullgard <mans@mansr.com>
Some compilers do not support the Q/R modifiers used to access
the low/high parts of a 64-bit register pair. Check for this
and disable all uses of it when not supported.
Fixes bug #337.
Signed-off-by: Mans Rullgard <mans@mansr.com>
It appears that something goes wrong in old nasm versions when the
%+ operator is used in the last argument of a macro invocation and
this argument is tested with %ifdef within the macro. This patch
rearranges the macro arguments such that the %+ operator is never
used in the last argument.
nasm does not support 'CPU foonop' directives. This adds a configure
test for the directive and uses it only if supported.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.
Currently there is a wild mix of 3dn2/3dnow2/3dnowext. Switching to
"3dnowext", which is a more common name of the CPU flag, as reported
e.g. by the Linux kernel, unifies this.
This allows us to unconditionally set the cglobal num_args
parameter to a bigger value, thus making writing yasm code
even easier than before.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Previously it was interpreted as number of bytes, while the
documentation stated that it was the number of 8 byte blocks.
This makes it behave similarly to the existing AES code.
Signed-off-by: Martin Storsjö <martin@martin.st>
Previously it was interpreted as number of bytes, while the
documentation stated that it was the number of 8 byte blocks.
This makes it behave similarly to the existing AES code.
Signed-off-by: Martin Storsjö <martin@martin.st>