Commit Graph

656 Commits

Author SHA1 Message Date
Michael Niedermayer 32cf26cc6a Merge commit 'f23d26a6864128001b03876b0b92fffe131f2060'
* commit 'f23d26a6864128001b03876b0b92fffe131f2060':
  h264: avoid using uninitialized memory in NEON chroma mc

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-23 20:35:33 +02:00
Janne Grunau f23d26a686 h264: avoid using uninitialized memory in NEON chroma mc
Adapt commit 982b596ea6 for the arm and
aarch64 NEON asm. 5-10% faster on Cortex-A9.
2014-06-23 16:32:15 +02:00
Michael Niedermayer 99497b4683 Merge commit '9a9e2f1c8aa4539a261625145e5c1f46a8106ac2'
* commit '9a9e2f1c8aa4539a261625145e5c1f46a8106ac2':
  dsputil: Split audio operations off into a separate context

Conflicts:
	configure
	libavcodec/takdec.c
	libavcodec/x86/Makefile
	libavcodec/x86/dsputil.asm
	libavcodec/x86/dsputil_init.c
	libavcodec/x86/dsputil_mmx.c
	libavcodec/x86/dsputil_x86.h

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-22 17:58:28 +02:00
Diego Biurrun 9a9e2f1c8a dsputil: Split audio operations off into a separate context 2014-06-22 06:20:15 -07:00
Michael Niedermayer 08c5859f17 avcodec: add simpleauto idct
This will pick the "best" simple idct compatible idct

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-19 14:28:01 +02:00
Michael Niedermayer 2b05db4f81 Merge commit 'e74433a8e6fc00c8dbde293c97a3e45384c2c1d9'
* commit 'e74433a8e6fc00c8dbde293c97a3e45384c2c1d9':
  dsputil: Split clear_block*/fill_block* off into a separate context

Conflicts:
	configure
	libavcodec/asvdec.c
	libavcodec/dnxhddec.c
	libavcodec/dnxhdenc.c
	libavcodec/dsputil.h
	libavcodec/eamad.c
	libavcodec/intrax8.c
	libavcodec/mjpegdec.c
	libavcodec/ppc/dsputil_ppc.c
	libavcodec/vc1dec.c
	libavcodec/x86/dsputil_init.c
	libavcodec/x86/dsputil_mmx.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-19 04:54:38 +02:00
Diego Biurrun e74433a8e6 dsputil: Split clear_block*/fill_block* off into a separate context 2014-06-18 14:07:23 -07:00
Christophe Gisquet ccff45a0d3 apedsp: move to llauddsp
APE is not the sole codec using scalarproduct_and_madd_int16.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-05 20:31:59 +02:00
Michael Niedermayer 83e8650f77 Merge commit '896a5bff64264f4d01ed98eacc97a67260c1e17e'
* commit '896a5bff64264f4d01ed98eacc97a67260c1e17e':
  arm: check if AS supports .dn

Conflicts:
	configure
	libavcodec/arm/vc1dsp_init_neon.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-03 18:19:21 +02:00
Janne Grunau 896a5bff64 arm: check if AS supports .dn
Move the GNU as check before the arch specific asm checks since the .dn
check requires gas compatible assembler.

Disable the VC-1 motion compensation NEON asm which is the only part
using that directive. The integrated assembler in the upcoming clang 3.5
does not support .dn/.qn without plans to change that. Too much effort
to implement it while it is rarely used.

http://llvm.org/bugs/show_bug.cgi?id=18199.
2014-06-03 14:23:03 +02:00
Michael Niedermayer 40f3a87c10 Merge commit '054013a0fc6f2b52c60cee3e051be8cc7f82cef3'
* commit '054013a0fc6f2b52c60cee3e051be8cc7f82cef3':
  dsputil: Move APE-specific bits into apedsp

Conflicts:
	libavcodec/arm/int_neon.S
	libavcodec/x86/dsputil.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-30 00:59:15 +02:00
Diego Biurrun 054013a0fc dsputil: Move APE-specific bits into apedsp 2014-05-29 06:41:15 -07:00
Michael Niedermayer 8b30702c44 Merge commit '6a13505c069890cb0e2a07e29fd819a0cf2e73c1'
* commit '6a13505c069890cb0e2a07e29fd819a0cf2e73c1':
  mpegvideo: move the MpegEncContext fields used from arm asm to the beginning

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-30 00:23:01 +02:00
Anton Khirnov 6a13505c06 mpegvideo: move the MpegEncContext fields used from arm asm to the beginning
This should reduce the frequency with which the offsets need to be
updated.
2014-04-29 14:49:42 +02:00
Ben Avison 9d8ecdd8ca vc-1: Add platform-specific start code search routine to VC1DSPContext.
Initialise VC1DSPContext for parser as well as for decoder.
Note, the VC-1 code doesn't actually use the function pointer yet.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-25 02:36:11 +02:00
Ben Avison 270cede3f3 h264: Move search code search functions into separate source files.
This permits re-use with parsers for codecs which use similar start codes.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-25 02:35:56 +02:00
Michael Niedermayer 06e664366a Merge commit 'a88e1d1c598e641eecd5d43730211d91c82787c6'
* commit 'a88e1d1c598e641eecd5d43730211d91c82787c6':
  lavu: add CHK_OFFS as AV_CHECK_OFFSET to check struct member offsets

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-25 00:55:40 +02:00
Janne Grunau a88e1d1c59 lavu: add CHK_OFFS as AV_CHECK_OFFSET to check struct member offsets 2014-04-24 18:28:26 +02:00
Michael Niedermayer af89a685c4 avcodec/arm/vc1dsp_init_neon: fix code so it compiles and passes fate-vc1
The original patch  seems to be missing a 16x16 function though

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-20 20:32:21 +02:00
Christophe Gisquet 319235c67c vc1dsp: introduce cases for 8x8 and 16x16
This allows further unrolling the DSP implementation where possible.

x86 and ARM DSP modified by simply moving the multiple calls from vc1dec
to the DSP code. Decoding improvements should only occurs because of the
compiler actually able to unroll more.

Decoding time: ~8.80s -> 8.64s (ie around 2%)

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-20 18:25:36 +02:00
Michael Niedermayer 5440151fa4 Merge commit '3dc6272bed7890a49080e18eacf3c7a4a6594b0d'
* commit '3dc6272bed7890a49080e18eacf3c7a4a6594b0d':
  Remove a number of unnecessary dsputil.h #includes

Conflicts:
	libavcodec/h264pred.c
	libavcodec/vc1dsp.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-05 18:54:15 +02:00
Diego Biurrun 3dc6272bed Remove a number of unnecessary dsputil.h #includes 2014-04-04 19:08:05 +02:00
Michael Niedermayer fb61ed1e9f Merge commit 'ac4b32df71bd932838043a4838b86d11e169707f'
* commit 'ac4b32df71bd932838043a4838b86d11e169707f':
  On2 VP7 decoder

Conflicts:
	Changelog
	libavcodec/arm/h264pred_init_arm.c
	libavcodec/arm/vp8dsp.h
	libavcodec/arm/vp8dsp_init_arm.c
	libavcodec/arm/vp8dsp_init_armv6.c
	libavcodec/arm/vp8dsp_init_neon.c
	libavcodec/avcodec.h
	libavcodec/h264pred.c
	libavcodec/version.h
	libavcodec/vp8.c
	libavcodec/vp8.h
	libavcodec/vp8data.h
	libavcodec/vp8dsp.c
	libavcodec/vp8dsp.h
	libavcodec/x86/h264_intrapred_init.c
	libavcodec/x86/vp8dsp_init.c

See: 89f2f5dbd7 and others
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-04 14:46:10 +02:00
Janne Grunau f37815b1d5 arm: asm decode_block_coeffs_internal is vp8 specific
Unbreaks compilation on arm due to conflicting types for
'ff_decode_block_coeffs_armv6'.
2014-04-04 10:39:29 +02:00
Peter Ross ac4b32df71 On2 VP7 decoder
Further performance improvements and security fixes by
Vittorio Giovara, Luca Barbato and Diego Biurrun.

Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-04-04 04:00:11 +02:00
Michael Niedermayer 68014c6ed9 Merge commit 'c3a0b3eb64be441ca897629e8ecd80d5b51fded7'
* commit 'c3a0b3eb64be441ca897629e8ecd80d5b51fded7':
  arm: build: Maintain decoder objects separate from infrastructure objects

Conflicts:
	libavcodec/arm/Makefile

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-27 20:10:51 +01:00
Diego Biurrun c3a0b3eb64 arm: build: Maintain decoder objects separate from infrastructure objects 2014-03-27 03:00:05 -07:00
Michael Niedermayer 50b68e323c Merge remote-tracking branch 'qatar/master'
* qatar/master:
  truehd: add hand-scheduled ARM asm version of ff_mlp_pack_output.

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-26 21:23:09 +01:00
Ben Avison 89135716fd truehd: add hand-scheduled ARM asm version of ff_mlp_rematrix_channel.
Profiling results for overall audio decode and the rematrix_channels function
in particular are as follows:

              Before          After
              Mean   StdDev   Mean   StdDev  Confidence  Change
6:2 total     370.8  17.0     348.8  20.1    99.9%       +6.3%
6:2 function  46.4   8.4      45.8   6.6     18.0%       +1.2%  (insignificant)
8:2 total     343.2  19.0     339.1  15.4    54.7%       +1.2%  (insignificant)
8:2 function  38.9   3.9      40.2   6.9     52.4%       -3.2%  (insignificant)
6:6 total     658.4  15.7     604.6  20.8    100.0%      +8.9%
6:6 function  109.0  8.7      59.5   5.4     100.0%      +83.3%
8:8 total     896.2  24.5     766.4  17.6    100.0%      +16.9%
8:8 function  223.4  12.8     93.8   5.0     100.0%      +138.3%

The assembly version has also been tested with a fuzz tester to ensure that
any combinations of inputs not exercised by my available test streams still
generate mathematically identical results to the C version.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-26 20:50:05 +01:00
Michael Niedermayer f38af0143c Merge commit '15a29c39d9ef15b0783c04b3228e1c55f6701ee3'
* commit '15a29c39d9ef15b0783c04b3228e1c55f6701ee3':
  truehd: add hand-scheduled ARM asm version of mlp_filter_channel.

Conflicts:
	libavcodec/arm/Makefile
	libavcodec/arm/mlpdsp_init_arm.c

See: 87b128d5ef
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-26 20:39:10 +01:00
Ben Avison 87b128d5ef truehd: add hand-scheduled ARM asm version of mlp_filter_channel.
Profiling results for overall audio decode and the mlp_filter_channel(_arm)
function in particular are as follows:

              Before          After
              Mean   StdDev   Mean   StdDev  Confidence  Change
6:2 total     380.4  22.0     370.8  17.0    87.4%       +2.6%  (insignificant)
6:2 function  60.7   7.2      36.6   8.1     100.0%      +65.8%
8:2 total     357.0  17.5     343.2  19.0    97.8%       +4.0%  (insignificant)
8:2 function  60.3   8.8      37.3   3.8     100.0%      +61.8%
6:6 total     717.2  23.2     658.4  15.7    100.0%      +8.9%
6:6 function  140.4  12.9     81.5   9.2     100.0%      +72.4%
8:8 total     981.9  16.2     896.2  24.5    100.0%      +9.6%
8:8 function  193.4  15.0     103.3  11.5    100.0%      +87.2%

Experiments with adding preload instructions to this function yielded no
useful benefit, so these have not been included.

The assembly version has also been tested with a fuzz tester to ensure that
any combinations of inputs not exercised by my available test streams still
generate mathematically identical results to the C version.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-26 20:22:18 +01:00
Ben Avison 3b5946bcce truehd: add hand-scheduled ARM asm version of ff_mlp_pack_output.
Profiling results for overall decode and the output_data function in
particular are as follows:

              Before          After
              Mean   StdDev   Mean   StdDev  Confidence  Change
6:2 total     339.6  15.1     329.3  16.0    95.8%       +3.1%  (insignificant)
6:2 function  24.6   6.0      9.9    3.1     100.0%      +148.5%
8:2 total     324.5  15.5     323.6  14.3    15.2%       +0.3%  (insignificant)
8:2 function  20.4   3.9      9.9    3.4     100.0%      +104.7%
6:6 total     572.8  20.6     539.9  24.2    100.0%      +6.1%
6:6 function  54.5   5.6      16.0   3.8     100.0%      +240.9%
8:8 total     741.5  21.2     702.5  18.5    100.0%      +5.6%
8:8 function  63.9   7.6      18.4   4.8     100.0%      +247.3%

The assembly version has also been tested with a fuzz tester to ensure that
any combinations of inputs not exercised by my available test streams still
generate mathematically identical results to the C version.

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-03-26 19:54:32 +02:00
Ben Avison 483321fe78 truehd: add hand-scheduled ARM asm version of ff_mlp_rematrix_channel.
Profiling results for overall audio decode and the rematrix_channels function
in particular are as follows:

              Before          After
              Mean   StdDev   Mean   StdDev  Confidence  Change
6:2 total     370.8  17.0     348.8  20.1    99.9%       +6.3%
6:2 function  46.4   8.4      45.8   6.6     18.0%       +1.2%  (insignificant)
8:2 total     343.2  19.0     339.1  15.4    54.7%       +1.2%  (insignificant)
8:2 function  38.9   3.9      40.2   6.9     52.4%       -3.2%  (insignificant)
6:6 total     658.4  15.7     604.6  20.8    100.0%      +8.9%
6:6 function  109.0  8.7      59.5   5.4     100.0%      +83.3%
8:8 total     896.2  24.5     766.4  17.6    100.0%      +16.9%
8:8 function  223.4  12.8     93.8   5.0     100.0%      +138.3%

The assembly version has also been tested with a fuzz tester to ensure that
any combinations of inputs not exercised by my available test streams still
generate mathematically identical results to the C version.

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-03-26 19:54:10 +02:00
Ben Avison 15a29c39d9 truehd: add hand-scheduled ARM asm version of mlp_filter_channel.
Profiling results for overall audio decode and the mlp_filter_channel(_arm)
function in particular are as follows:

              Before          After
              Mean   StdDev   Mean   StdDev  Confidence  Change
6:2 total     380.4  22.0     370.8  17.0    87.4%       +2.6%  (insignificant)
6:2 function  60.7   7.2      36.6   8.1     100.0%      +65.8%
8:2 total     357.0  17.5     343.2  19.0    97.8%       +4.0%  (insignificant)
8:2 function  60.3   8.8      37.3   3.8     100.0%      +61.8%
6:6 total     717.2  23.2     658.4  15.7    100.0%      +8.9%
6:6 function  140.4  12.9     81.5   9.2     100.0%      +72.4%
8:8 total     981.9  16.2     896.2  24.5    100.0%      +9.6%
8:8 function  193.4  15.0     103.3  11.5    100.0%      +87.2%

Experiments with adding preload instructions to this function yielded no
useful benefit, so these have not been included.

The assembly version has also been tested with a fuzz tester to ensure that
any combinations of inputs not exercised by my available test streams still
generate mathematically identical results to the C version.

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-03-26 19:53:52 +02:00
Peter Ross a490970af2 libavcodec/*/vp8dsp_init: indent
Signed-off-by: Peter Ross <pross@xvid.org>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-25 13:29:29 +01:00
Peter Ross 89f2f5dbd7 On2 VP7 decoder
Signed-off-by: Peter Ross <pross@xvid.org>
Reviewed-by: BBB
previous patch reviewed by jason
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-25 13:29:05 +01:00
Michael Niedermayer 77bc342975 Merge commit '322a1dda973e802db7b57f2007fad3efcd5bab81'
* commit '322a1dda973e802db7b57f2007fad3efcd5bab81':
  dsputil: Refactor duplicated CALL_2X_PIXELS / PIXELS16 macros

Conflicts:
	libavcodec/arm/hpeldsp_init_arm.c
	libavcodec/x86/dsputil_x86.h

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-22 22:53:33 +01:00
Diego Biurrun 322a1dda97 dsputil: Refactor duplicated CALL_2X_PIXELS / PIXELS16 macros 2014-03-22 06:17:29 -07:00
Michael Niedermayer e98bac82e5 Merge commit '82bb3048013201c0095d2853d4623633d912252f'
* commit '82bb3048013201c0095d2853d4623633d912252f':
  dsputil: Use correct type in me_cmp_func function pointer

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-20 22:36:40 +01:00
Michael Niedermayer 011d83de48 Merge commit '0e083d7e43805db1a978cb57bfa25fda62e8ff18'
* commit '0e083d7e43805db1a978cb57bfa25fda62e8ff18':
  build: Group general components separate from de/encoders in arch Makefiles

Conflicts:
	libavcodec/arm/Makefile
	libavcodec/x86/Makefile

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-20 22:26:31 +01:00
Michael Niedermayer ba85bfabf3 Merge commit '5169e688956be3378adb3b16a93962fe0048f1c9'
* commit '5169e688956be3378adb3b16a93962fe0048f1c9':
  dsputil: Propagate bit depth information to all (sub)init functions

Conflicts:
	libavcodec/arm/dsputil_init_arm.c
	libavcodec/arm/dsputil_init_armv5te.c
	libavcodec/arm/dsputil_init_armv6.c
	libavcodec/arm/dsputil_init_neon.c
	libavcodec/dsputil.c
	libavcodec/dsputil.h
	libavcodec/ppc/dsputil_ppc.c
	libavcodec/x86/dsputil_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-20 22:06:01 +01:00
Michael Niedermayer a87188ebdb Merge commit 'cf7a2167570e6ccb9dfbd62e9d8ba8f4f065b17e'
* commit 'cf7a2167570e6ccb9dfbd62e9d8ba8f4f065b17e':
  arm: dsputil: K&R formatting cosmetics

Conflicts:
	libavcodec/arm/dsputil_init_arm.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-20 21:58:16 +01:00
Diego Biurrun 82bb304801 dsputil: Use correct type in me_cmp_func function pointer 2014-03-20 05:03:23 -07:00
Diego Biurrun 0e083d7e43 build: Group general components separate from de/encoders in arch Makefiles
This is in line with how the top-level libavcodec Makefile is structured.
2014-03-20 05:03:23 -07:00
Diego Biurrun 5169e68895 dsputil: Propagate bit depth information to all (sub)init functions
This avoids recalculating the value over and over again.
2014-03-20 05:03:23 -07:00
Diego Biurrun cf7a216757 arm: dsputil: K&R formatting cosmetics 2014-03-20 05:03:23 -07:00
Michael Niedermayer 41d08ca575 avcodec/arm/cabac: fix inline cabac reader with the UNCHECKED bitstream reader
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-15 01:08:45 +01:00
Michael Niedermayer 5e5d8ace8a Merge commit '36b822b8be7f9ecd6f9d87acaa786b128a873cd9'
* commit '36b822b8be7f9ecd6f9d87acaa786b128a873cd9':
  arm: dsputil: Drop restrict keyword from add_pixels_clamped_armv6 prototype

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-14 18:25:04 +01:00
Michael Niedermayer 2c9e5cb1a6 avcodec/arm/fft_fixed_neon: reduce diff by 2 spaces to libav
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-14 14:28:28 +01:00
Diego Biurrun 36b822b8be arm: dsputil: Drop restrict keyword from add_pixels_clamped_armv6 prototype
The function is assigned to a function pointer that does not have the
restrict keyword for that parameter.

This fixes compilation for MSVC builds that don't recognize "restrict",
broken since ed9625eb62.
2014-03-14 13:45:40 +01:00