Commit Graph

348 Commits

Author SHA1 Message Date
Ben Avison 45e10e5c8d arm: Add assembly version of h264_find_start_code_candidate
Before          After
               Mean   StdDev   Mean   StdDev  Change
This function   508.8 23.4      185.4  9.0    +174.4%
Overall        3068.5 31.7     2752.1 29.4     +11.5%

In combination with the preceding patch:
                Before          After
                Mean   StdDev   Mean   StdDev  Change
Overall         2925.6 26.2     2752.1 29.4     +6.3%

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-08-08 12:08:34 +03:00
Martin Storsjö 54ba52077c arm: Comment out unused labels in simple_idct_arm
When building for iOS in thumb mode, gas-preprocessor.pl doesn't
mark unused labels as thumb functions (as it does for other
local labels, where it can figure out that they are functions
due to being referenced in branch instructions). This leads to
linker warnings for some of those local labels, such as:

ld: warning: ARM function not 4-byte aligned: __a_evaluation from
libavcodec/libavcodec.a(simple_idct_arm.o)

Therefore, comment them out since they don't have any function.
They do still have a value in documenting key points in the
assembly source though.

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-07-24 22:43:21 +03:00
Martin Storsjö 69e6702c01 arm: Mangle external symbols properly in new vfp assembly files
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-07-22 14:48:30 +03:00
Ben Avison ff30d12159 arm: Add VFP-accelerated version of qmf_32_subbands
Before           After
               Mean    StdDev   Mean    StdDev  Change
This function   1323.0  98.0      746.2  60.6   +77.3%
Overall        15400.0 336.4    14147.5 288.4    +8.9%

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-07-22 10:15:44 +03:00
Martin Storsjö 8b9eba664e arm: Add VFP-accelerated version of fft16
Before           After
               Mean    StdDev   Mean    StdDev  Change
This function   1389.3  4.2       967.8  35.1   +43.6%
Overall        15577.5 83.2     15400.0 336.4    +1.2%

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-07-22 10:15:41 +03:00
Martin Storsjö ba6836c966 arm: Add VFP-accelerated version of dca_lfe_fir
Before           After
               Mean    StdDev   Mean    StdDev  Change
This function    868.2  33.5      436.0  27.0   +99.1%
Overall        15973.0 223.2    15577.5  83.2    +2.5%

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-07-22 10:15:39 +03:00
Martin Storsjö b63bb251ea arm: Add VFP-accelerated version of imdct_half
Before           After
               Mean    StdDev   Mean    StdDev  Change
This function   2653.0  28.5     1108.8  51.4   +139.3%
Overall        17049.5 408.2    15973.0 223.2     +6.7%

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-07-22 10:15:37 +03:00
Ben Avison d6e4f5fef0 arm: Add VFP-accelerated version of int32_to_float_fmul_array8
Before           After
               Mean    StdDev   Mean    StdDev  Change
This function    366.2  18.3      277.8  13.7   +31.9%
Overall        18420.5 489.1    17049.5 408.2    +8.0%

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-07-22 10:15:36 +03:00
Ben Avison ce9ed10ac2 arm: Add VFP-accelerated version of int32_to_float_fmul_scalar
Before           After
               Mean    StdDev   Mean    StdDev  Change
This function   1175.0   4.4      366.2  18.3   +220.8%
Overall        19285.5 292.0    18420.5 489.1     +4.7%

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-07-22 10:15:30 +03:00
Ben Avison 41ef1d360b arm: Add VFP-accelerated version of synth_filter_float
Before           After
               Mean    StdDev   Mean    StdDev  Change
This function   9295.0 114.9     4853.2 83.5    +91.5%
Overall        23699.8 397.6    19285.5 292.0   +22.9%

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-07-22 10:15:17 +03:00
Christophe Gisquet b6293e2798 fmtconvert: Explicitly use int32_t instead of int
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-07-17 11:02:47 +03:00
Martin Storsjö 86113667c0 arm: Include hpeldsp_neon.o if h264qpel is enabled
A few of the h264qpel neon functions are shared with other
hpeldsp functions in this file.

This fixes standalone compilation of the h264 decoder on arm.

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-05-30 02:17:37 +03:00
Martin Storsjö efb7968cfe arm: Don't unconditionally build dsputil files
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-05-30 02:17:35 +03:00
Martin Storsjö 36a7df8cf1 arm: Only build the FFT init files if FFT is enabled
This fixes build errors in cases where FFT is disabled.

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-05-30 02:17:33 +03:00
Diego Biurrun 186599ffe0 build: cosmetics: Place unconditional before conditional OBJS lines
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-05-30 02:17:31 +03:00
Diego Biurrun 9b9b2e9f30 build: arm: cosmetics: Place all OBJS declarations in alphabetical order
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-05-30 02:17:27 +03:00
Diego Biurrun 383fd4d478 arm: Drop unnecessary ff_ name prefixes from static functions 2013-04-30 16:02:03 +02:00
Ronald S. Bultje 7384b7a713 arm: hpeldsp: Move half-pel assembly from dsputil to hpeldsp
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-04-19 23:19:08 +03:00
Ronald S. Bultje 015821229f vp3: Use full transpose for all IDCTs
This way, the special IDCT permutations are no longer needed. This
is similar to how H264 does it, and removes the dsputil dependency
imposed by the scantable code.

Also remove the unused type == 0 cases from the plain C version
of the idct.

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-04-15 12:32:05 +03:00
Ronald S. Bultje 62844c3fd6 h264: Integrate clear_blocks calls with IDCT
The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700
to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb
(in the decode_slice loop) goes from 1759 to 1733 cycles on the clip
tested (cathedral), i.e. almost 30 cycles per mb faster.

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-04-10 11:03:06 +03:00
Luca Barbato a8b6015823 dsputil: convert remaining functions to use ptrdiff_t strides
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-03-12 18:26:42 +01:00
Diego Biurrun c242bbd8b6 Remove unnecessary dsputil.h #includes 2013-02-26 00:51:34 +01:00
Diego Biurrun 76b19a3984 Fix a number of incorrect intmath.h #includes. 2013-02-26 00:51:34 +01:00
Diego Biurrun 3e85b46ecf arm: vp8: Add missing #includes for header to compile standalone 2013-02-20 14:24:07 +01:00
Diego Biurrun 82bd04b170 rv34: Drop now unnecessary dsputil dependencies 2013-02-06 11:30:54 +01:00
Diego Biurrun 79dad2a932 dsputil: Separate h264chroma 2013-02-06 11:30:53 +01:00
Diego Biurrun c9f933b5b6 Add av_cold attributes to arch-specific init functions 2013-02-05 17:01:05 +01:00
Diego Biurrun 25841dfe80 Use ptrdiff_t instead of int for {avg, put}_pixels line_size parameter.
This avoids SIMD-optimized functions having to sign-extend their
line size argument manually to be able to do pointer arithmetic.
2013-02-05 12:59:12 +01:00
Diego Biurrun 6c1a7d07eb Use proper "" quotes for local header #includes 2013-02-01 12:51:15 +01:00
Martin Storsjö 2026eb1408 arm: vp8: Fix the plain-armv6 version of vp8_luma_dc_wht
This makes the plain-armv6 version use the same registers as the
armv6t2 version above.

This fixes fate-vp8 on plain-armv6 devices.

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-01-27 13:17:25 +02:00
Diego Biurrun 33552a5f7b arm: Add mathops.h to ARCH_HEADERS list
It is an arch-specific header not suitable for standalone compilation.
2013-01-24 20:59:22 +01:00
Janne Grunau 8e148b8742 arm: h264qpel: use neon h264 qpel functions only if supported 2013-01-24 17:06:52 +01:00
Mans Rullgard e9d817351b dsputil: Separate h264 qpel
The sh4 optimizations are removed, because the code is
100% identical to the C code, so it is unlikely to
provide any real practical benefit.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-01-24 10:44:43 +01:00
Ronald S. Bultje baf35bb4bc dsputil: remove one array dimension from avg_no_rnd_pixels_tab. 2013-01-22 18:41:36 -08:00
Ronald S. Bultje 32ff643228 dsputil: remove avg_no_rnd_pixels8.
This is never used.
2013-01-22 18:41:36 -08:00
Diego Biurrun 88bd7fdc82 Drop DCTELEM typedef
It does not help as an abstraction and adds dsputil dependencies.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2013-01-22 18:32:56 -08:00
Diego Biurrun 73b704ac60 arm: Add some missing header #includes 2013-01-22 21:24:10 +01:00
Ronald S. Bultje d56668bd80 floatdsp: move scalarproduct_float from dsputil to avfloatdsp.
This makes the aac decoder and all voice codecs independent of dsputil.
2013-01-22 11:55:42 -08:00
Ronald S. Bultje 5959bfaca3 floatdsp: move butterflies_float from dsputil to avfloatdsp.
This makes wmadec/enc, twinvq and mpegaudiodec (i.e. mp2/mp3)
independent of dsputil.
2013-01-22 11:55:42 -08:00
Ronald S. Bultje 42d3246948 floatdsp: move vector_fmul_reverse from dsputil to avfloatdsp.
Now, nellymoserenc and aacenc no longer depends on dsputil. Independent
of this patch, wmaprodec also does not depend on dsputil, so I removed
it from there also.
2013-01-22 11:55:42 -08:00
Ronald S. Bultje 55aa03b9f8 floatdsp: move vector_fmul_add from dsputil to avfloatdsp. 2013-01-22 11:55:42 -08:00
Ronald S. Bultje 1768e43ceb vorbisdsp: change block_size type from int to intptr_t.
This saves one instruction in the x86-64 assembly.
2013-01-20 22:26:42 -08:00
Janne Grunau 68f18f0351 videodsp_armv5te: remove #if HAVE_ARMV5TE_EXTERNAL
libavutil/arm/asm.S sets '.arch' depending on HAVE_ARMV5TE so that
assembling armv5te code will always succeed even if the default -march
flag does not support it. HAVE_ARMV5TE_EXTERNAL tests assembling code
with the default arch.
Fixes the missing symbol ff_prefetch_arm with --cpu= not including
armv5te.

CC: libav-stable@libav.org
2013-01-20 15:20:00 +01:00
Ronald S. Bultje fef906c77c Move vorbis_inverse_coupling from dsputil to vorbisdspcontext.
Conveniently (together with Justin's earlier patches), this makes
our vorbis decoder entirely independent of dsputil.
2013-01-19 22:21:10 -08:00
Ronald S. Bultje aeaf268e52 vp3: integrate clear_blocks with idct of previous block.
This is identical to what e.g. vp8 does, and prevents the function call
overhead (plus dependency on dsputil for this particular function).

Arm asm updated by Janne Grunau <janne-libav@jannau.net>.

Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2013-01-19 22:04:55 -08:00
Justin Ruggles e034cc6c60 lavc: Move vector_fmul_window to AVFloatDSPContext
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-01-16 10:45:45 +01:00
Luca Barbato 6906b19346 lavc: add missing files for arm
Across the many retouches those did not make the main commit.
2012-12-20 14:07:23 +01:00
Ronald S. Bultje 8c53d39e7f lavc: introduce VideoDSPContext
Move some functions from dsputil. The idea is that videodsp contains
functions that are useful for a large and varied set of video decoders.
Currently, it contains emulated_edge_mc() and prefetch().

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2012-12-20 13:40:45 +01:00
Diego Biurrun 523c7bd23c misc typo, style and wording fixes 2012-12-18 13:36:51 +01:00
Mans Rullgard b326755989 arm: rename ARMVFP config symbol to VFP
This is consistent with usual ARM nomenclature as well as with the
VFPV3 and NEON symbols which both lack the ARM prefix.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-12-07 16:54:04 +00:00