Commit Graph

28 Commits

Author SHA1 Message Date
Hendrik Leppkes e754c8e8ca Merge commit 'e2710e790c09e49e86baa58c6063af0097cc8cb0'
* commit 'e2710e790c09e49e86baa58c6063af0097cc8cb0':
  arm: add a cpu flag for the VFPv2 vector mode

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-01-02 11:01:29 +01:00
Janne Grunau e2710e790c arm: add a cpu flag for the VFPv2 vector mode
The vector mode was deprecated in ARMv7-A/VFPv3 and various cpu
implementations do not support it in hardware. Vector mode code will
depending the OS either be emulated in software or result in an illegal
instruction on cpus which does not support it. This was not really
problem in practice since NEON implementations of the same functions are
preferred. It will however become a problem for checkasm which tests
every cpu flag separately.

Since this is a cpu feature newer cpu do not support anymore the
behaviour of this flag differs from the other flags. It can be only
activated by runtime cpu feature selection.
2015-12-14 16:42:35 +01:00
Michael Niedermayer c27adb37ef Merge commit '87552d54d3337c3241e8a9e1a05df16eaa821496'
* commit '87552d54d3337c3241e8a9e1a05df16eaa821496':
  armv6: Accelerate ff_fft_calc for general case (nbits != 4)

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-18 03:12:02 +02:00
Ben Avison 87552d54d3 armv6: Accelerate ff_fft_calc for general case (nbits != 4)
The previous implementation targeted DTS Coherent Acoustics, which only
requires nbits == 4 (fft16()). This case was (and still is) linked directly
rather than being indirected through ff_fft_calc_vfp(), but now the full
range from radix-4 up to radix-65536 is available. This benefits other codecs
such as AAC and AC3.

The implementaion is based upon the C version, with each routine larger than
radix-16 calling a hierarchy of smaller FFT functions, then performing a
post-processing pass. This pass benefits a lot from loop unrolling to
counter the long pipelines in the VFP. A relaxed calling standard also
reduces the overhead of the call hierarchy, and avoiding the excessive
inlining performed by GCC probably helps with I-cache utilisation too.

I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in the FFT routines (fft4() to fft512() and pass()) for the
same sample AAC stream:

              Before          After
              Mean   StdDev   Mean   StdDev  Confidence  Change
Audio decode  2245.5 53.1     1599.6 43.8    100.0%      +40.4%
FFT routines  940.6  22.0     348.1  20.8    100.0%      +170.2%

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-07-18 01:34:23 +03:00
Michael Niedermayer 99e7c702db Merge commit 'bd549cbaacd33dfb7be81d0619c9b107b8a85be7'
* commit 'bd549cbaacd33dfb7be81d0619c9b107b8a85be7':
  arm: dcadsp: Move synth filter initialization to dcadsp file

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-08-29 16:00:45 +02:00
Diego Biurrun bd549cbaac arm: dcadsp: Move synth filter initialization to dcadsp file 2013-08-29 11:24:14 +02:00
Michael Niedermayer 2305a6775d Merge commit 'b63bb251ea6d6ba23295294e37a92625c0192206'
* commit 'b63bb251ea6d6ba23295294e37a92625c0192206':
  arm: Add VFP-accelerated version of imdct_half

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-07-22 11:57:05 +02:00
Michael Niedermayer 0129026c4e Merge commit '41ef1d360bac65032aa32f6b43ae137666507ae5'
* commit '41ef1d360bac65032aa32f6b43ae137666507ae5':
  arm: Add VFP-accelerated version of synth_filter_float

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-07-22 11:41:37 +02:00
Martin Storsjö b63bb251ea arm: Add VFP-accelerated version of imdct_half
Before           After
               Mean    StdDev   Mean    StdDev  Change
This function   2653.0  28.5     1108.8  51.4   +139.3%
Overall        17049.5 408.2    15973.0 223.2     +6.7%

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-07-22 10:15:37 +03:00
Ben Avison 41ef1d360b arm: Add VFP-accelerated version of synth_filter_float
Before           After
               Mean    StdDev   Mean    StdDev  Change
This function   9295.0 114.9     4853.2 83.5    +91.5%
Overall        23699.8 397.6    19285.5 292.0   +22.9%

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-07-22 10:15:17 +03:00
Carl Eugen Hoyos cf36180143 Only set accelerated arm fft functions if fft is enabled.
Fixes lavc compilation (linking) for configurations without fft.

Reported-by: tyler wear
Tested-by: Gavin Kinsey
2013-02-17 17:29:55 +01:00
Michael Niedermayer 92ef4be4ab Merge remote-tracking branch 'qatar/master'
* qatar/master:
  ARM: allow runtime masking of CPU features
  dsputil: remove unused functions
  mov: Treat keyframe indexes as 1-origin if starting at non-zero.
  mov: Take stps entries into consideration also about key_off.
  Remove lowres video decoding

Conflicts:
	ffmpeg.c
	ffplay.c
	libavcodec/arm/vp8dsp_init_arm.c
	libavcodec/libopenjpegdec.c
	libavcodec/mjpegdec.c
	libavcodec/mpegvideo.c
	libavcodec/utils.c
	libavformat/mov.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-22 22:26:42 +02:00
Mans Rullgard d526c5338d ARM: allow runtime masking of CPU features
This allows masking CPU features with the -cpuflags avconv option
which is useful for testing different optimisations without rebuilding.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-04-22 12:30:45 +01:00
Michael Niedermayer 8f0768cc22 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  Add a tool that uses avio to read and write, doing a plain copy of data
  ARM: fix build with FFT enabled and MDCT disabled
  lavf: force single-threaded decoding in avformat_find_stream_info
  avidec: migrate last of lavf from FF_ER_* to AV_EF_*
  avserver: fix build after the next bump.

Conflicts:
	libavformat/Makefile
	libavformat/avidec.c
	libavformat/utils.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-01-21 01:33:31 +01:00
Felipe Contreras c3d5e290ca ARM: fix build with FFT enabled and MDCT disabled
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-01-20 16:14:01 +00:00
Michael Niedermayer d4a50a2100 Merge remote-tracking branch 'newdev/master'
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2011-03-21 03:33:28 +01:00
Mans Rullgard 0aded9484d Move dct and rdft definitions to separate files
This leaves fft.h with only the core FFT and MDCT definitions
thus making it more managable.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-03-20 17:15:33 +00:00
Mans Rullgard 2912e87a6c Replace FFmpeg with Libav in licence headers
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-03-19 13:33:20 +00:00
Loren Merritt 11ab1e409f FFT: factor a shuffle out of the inner loop and merge it into fft_permute.
6% faster SSE FFT on Conroe, 2.5% on Penryn.

Signed-off-by: Janne Grunau <janne-ffmpeg@jannau.net>
(cherry picked from commit e6b1ed693a)
2011-02-14 23:58:19 +01:00
Loren Merritt e6b1ed693a FFT: factor a shuffle out of the inner loop and merge it into fft_permute.
6% faster SSE FFT on Conroe, 2.5% on Penryn.

Signed-off-by: Janne Grunau <janne-ffmpeg@jannau.net>
2011-02-13 15:36:39 +01:00
Justin Ruggles a8ae4e0e7b Remove unneeded add bias from 3 functions.
DSPContext.vector_fmul_window()
DCADSPContext.lfe_fir()
SynthFilterContext.synth_filter_float()

Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit 80ba1ddb58)
2011-02-02 03:40:48 +01:00
Justin Ruggles 80ba1ddb58 Remove unneeded add bias from 3 functions.
DSPContext.vector_fmul_window()
DCADSPContext.lfe_fir()
SynthFilterContext.synth_filter_float()

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-31 20:28:42 +00:00
Måns Rullgård e73d1a5efc ARM: NEON optimised synth_filter_float
2.7x faster DCA decoding on Cortex-A8

Originally committed as revision 22828 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-04-10 16:27:56 +00:00
Måns Rullgård a8bb9ea532 ARM: NEON optimised RDFT
Originally committed as revision 22641 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-03-23 03:35:02 +00:00
Måns Rullgård 1429224b04 Move FFT parts from dsputil.h to fft.h
Originally committed as revision 22235 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-03-06 14:34:46 +00:00
Måns Rullgård f7a3b6030c ARM: interleave cos/sin tables for improved NEON MDCT
Originally committed as revision 19940 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-09-21 02:56:09 +00:00
Måns Rullgård 01b2214758 Merge FFTContext and MDCTContext
Originally committed as revision 19931 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-09-20 17:30:20 +00:00
Måns Rullgård f486321395 Move per-arch fft init bits into the corresponding subdirs
Originally committed as revision 19864 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-09-15 21:14:14 +00:00