Commit Graph

644 Commits

Author SHA1 Message Date
Michael Niedermayer 8b30702c44 Merge commit '6a13505c069890cb0e2a07e29fd819a0cf2e73c1'
* commit '6a13505c069890cb0e2a07e29fd819a0cf2e73c1':
  mpegvideo: move the MpegEncContext fields used from arm asm to the beginning

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-30 00:23:01 +02:00
Anton Khirnov 6a13505c06 mpegvideo: move the MpegEncContext fields used from arm asm to the beginning
This should reduce the frequency with which the offsets need to be
updated.
2014-04-29 14:49:42 +02:00
Ben Avison 9d8ecdd8ca vc-1: Add platform-specific start code search routine to VC1DSPContext.
Initialise VC1DSPContext for parser as well as for decoder.
Note, the VC-1 code doesn't actually use the function pointer yet.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-25 02:36:11 +02:00
Ben Avison 270cede3f3 h264: Move start code search functions into separate source files.
This permits re-use with parsers for codecs which use similar start codes.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-25 02:35:56 +02:00
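
As a rough illustration of the two start code commits above: H.264 and VC-1 both mark units in the byte stream with the 00 00 01 prefix, so one search routine can serve several parsers, and keeping it behind a function pointer lets a platform-specific init swap in an optimised version. The sketch below is not the FFmpeg code; `SketchDSPContext`, `find_start_code_c`, `sketch_dsp_init` and `sketch_demo` are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context: one function pointer that an init routine could
 * repoint at a platform-specific implementation. */
typedef struct SketchDSPContext {
    /* Returns the index of the first 00 00 01 prefix, or size if none. */
    size_t (*find_start_code)(const uint8_t *buf, size_t size);
} SketchDSPContext;

/* Portable fallback: byte-by-byte scan for the 3-byte prefix 00 00 01. */
static size_t find_start_code_c(const uint8_t *buf, size_t size)
{
    for (size_t i = 0; i + 2 < size; i++)
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
            return i;
    return size;
}

/* Generic init: install the C version. An architecture-specific init
 * (assumed, not shown) would overwrite the pointer afterwards. */
static void sketch_dsp_init(SketchDSPContext *c)
{
    c->find_start_code = find_start_code_c;
}

/* Small usage example: both a decoder and a parser could call through
 * the same pointer after initialising their own context. */
static size_t sketch_demo(void)
{
    static const uint8_t data[] = { 0xff, 0x00, 0x00, 0x01, 0x0f };
    SketchDSPContext c;

    sketch_dsp_init(&c);
    assert(c.find_start_code != NULL);
    return c.find_start_code(data, sizeof(data));
}
```

Callers only ever go through `c.find_start_code`, so adding an optimised routine later would not require touching them.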
Michael Niedermayer 06e664366a Merge commit 'a88e1d1c598e641eecd5d43730211d91c82787c6'
* commit 'a88e1d1c598e641eecd5d43730211d91c82787c6':
  lavu: add CHK_OFFS as AV_CHECK_OFFSET to check struct member offsets

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-25 00:55:40 +02:00
Janne Grunau a88e1d1c59 lavu: add CHK_OFFS as AV_CHECK_OFFSET to check struct member offsets 2014-04-24 18:28:26 +02:00
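
A minimal sketch of the idea behind checking struct member offsets: hand-written assembly hard-codes the byte offsets of the fields it reads, so a compile-time check can fail the build when the C struct layout drifts. This is not the FFmpeg definition of `AV_CHECK_OFFSET`; `SKETCH_CHECK_OFFSET`, `SketchContext` and its fields are hypothetical, and the sketch assumes a C11 compiler for `_Static_assert`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro: break the build if a field is not at the offset
 * that the assembly side assumes. */
#define SKETCH_CHECK_OFFSET(type, field, expected)              \
    _Static_assert(offsetof(type, field) == (expected),         \
                   "offset of " #field " in " #type " changed")

/* Hypothetical context. Fields used from assembly are kept at the
 * beginning so that later additions do not shift their offsets. */
typedef struct SketchContext {
    int32_t field_a;    /* read from asm at offset 0 */
    int32_t field_b;    /* read from asm at offset 4 */
    int32_t field_c;    /* read from asm at offset 8 */
    /* ... fields that only C code touches would follow here ... */
    int32_t c_only;
} SketchContext;

/* Offsets the (imaginary) assembly code has hard-coded. */
SKETCH_CHECK_OFFSET(SketchContext, field_a, 0);
SKETCH_CHECK_OFFSET(SketchContext, field_b, 4);
SKETCH_CHECK_OFFSET(SketchContext, field_c, 8);

/* Runtime view of the same value, only for demonstration. */
static size_t sketch_field_c_offset(void)
{
    return offsetof(SketchContext, field_c);
}
```

Placing the assembly-visible fields first means new members can be appended without touching the checked offsets.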
Michael Niedermayer af89a685c4 avcodec/arm/vc1dsp_init_neon: fix code so it compiles and passes fate-vc1
The original patch seems to be missing a 16x16 function though

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-20 20:32:21 +02:00
Christophe Gisquet 319235c67c vc1dsp: introduce cases for 8x8 and 16x16
This allows further unrolling the DSP implementation where possible.

x86 and ARM DSP code is modified by simply moving the multiple calls from
vc1dec to the DSP code. Any decoding improvement should come only from the
compiler actually being able to unroll more.

Decoding time: ~8.80s -> 8.64s (i.e. around 2%)

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-20 18:25:36 +02:00
Michael Niedermayer 5440151fa4 Merge commit '3dc6272bed7890a49080e18eacf3c7a4a6594b0d'
* commit '3dc6272bed7890a49080e18eacf3c7a4a6594b0d':
  Remove a number of unnecessary dsputil.h #includes

Conflicts:
	libavcodec/h264pred.c
	libavcodec/vc1dsp.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-05 18:54:15 +02:00
Diego Biurrun 3dc6272bed Remove a number of unnecessary dsputil.h #includes 2014-04-04 19:08:05 +02:00
Michael Niedermayer fb61ed1e9f Merge commit 'ac4b32df71bd932838043a4838b86d11e169707f'
* commit 'ac4b32df71bd932838043a4838b86d11e169707f':
  On2 VP7 decoder

Conflicts:
	Changelog
	libavcodec/arm/h264pred_init_arm.c
	libavcodec/arm/vp8dsp.h
	libavcodec/arm/vp8dsp_init_arm.c
	libavcodec/arm/vp8dsp_init_armv6.c
	libavcodec/arm/vp8dsp_init_neon.c
	libavcodec/avcodec.h
	libavcodec/h264pred.c
	libavcodec/version.h
	libavcodec/vp8.c
	libavcodec/vp8.h
	libavcodec/vp8data.h
	libavcodec/vp8dsp.c
	libavcodec/vp8dsp.h
	libavcodec/x86/h264_intrapred_init.c
	libavcodec/x86/vp8dsp_init.c

See: 89f2f5dbd7 and others
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-04 14:46:10 +02:00
Janne Grunau f37815b1d5 arm: asm decode_block_coeffs_internal is vp8 specific
Unbreaks compilation on arm due to conflicting types for
'ff_decode_block_coeffs_armv6'.
2014-04-04 10:39:29 +02:00
Peter Ross ac4b32df71 On2 VP7 decoder
Further performance improvements and security fixes by
Vittorio Giovara, Luca Barbato and Diego Biurrun.

Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-04-04 04:00:11 +02:00
Michael Niedermayer 68014c6ed9 Merge commit 'c3a0b3eb64be441ca897629e8ecd80d5b51fded7'
* commit 'c3a0b3eb64be441ca897629e8ecd80d5b51fded7':
  arm: build: Maintain decoder objects separate from infrastructure objects

Conflicts:
	libavcodec/arm/Makefile

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-27 20:10:51 +01:00
Diego Biurrun c3a0b3eb64 arm: build: Maintain decoder objects separate from infrastructure objects 2014-03-27 03:00:05 -07:00
Michael Niedermayer 50b68e323c Merge remote-tracking branch 'qatar/master'
* qatar/master:
  truehd: add hand-scheduled ARM asm version of ff_mlp_pack_output.

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-26 21:23:09 +01:00
Ben Avison 89135716fd truehd: add hand-scheduled ARM asm version of ff_mlp_rematrix_channel.
Profiling results for overall audio decode and the rematrix_channels function
in particular are as follows:

              Before          After
              Mean   StdDev   Mean   StdDev  Confidence  Change
6:2 total     370.8  17.0     348.8  20.1    99.9%       +6.3%
6:2 function  46.4   8.4      45.8   6.6     18.0%       +1.2%  (insignificant)
8:2 total     343.2  19.0     339.1  15.4    54.7%       +1.2%  (insignificant)
8:2 function  38.9   3.9      40.2   6.9     52.4%       -3.2%  (insignificant)
6:6 total     658.4  15.7     604.6  20.8    100.0%      +8.9%
6:6 function  109.0  8.7      59.5   5.4     100.0%      +83.3%
8:8 total     896.2  24.5     766.4  17.6    100.0%      +16.9%
8:8 function  223.4  12.8     93.8   5.0     100.0%      +138.3%

The assembly version has also been tested with a fuzz tester to ensure that
any combinations of inputs not exercised by my available test streams still
generate mathematically identical results to the C version.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-26 20:50:05 +01:00
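
The Change column in the profiling tables appears to be consistent with the ratio before / after - 1, so a positive value means the new code is faster. That formula is an inference from the printed numbers, not something the commit message states. The sketch below recomputes a few rows under that assumption; `change_percent` and `close_to` are hypothetical helpers, and small deviations from the table (up to roughly 0.1 to 0.2 percentage points) are presumably due to the means being shown rounded.

```c
#include <assert.h>

/* Relative change in percent. Positive means "after" has the lower mean,
 * i.e. the new code is faster. Assumed formula: before / after - 1. */
static double change_percent(double before, double after)
{
    return (before / after - 1.0) * 100.0;
}

/* Returns 1 if value lies within tol of expected, 0 otherwise. */
static int close_to(double value, double expected, double tol)
{
    double diff = value - expected;
    if (diff < 0.0)
        diff = -diff;
    return diff <= tol;
}

/* Recompute three rows of the rematrix_channels table; returns 0 if
 * all of them agree with the printed Change column within 0.2. */
static int sketch_check_rows(void)
{
    int ok = 1;

    ok &= close_to(change_percent(370.8, 348.8),   6.3, 0.2); /* 6:2 total    */
    ok &= close_to(change_percent(38.9, 40.2),    -3.2, 0.2); /* 8:2 function */
    ok &= close_to(change_percent(223.4, 93.8),  138.3, 0.2); /* 8:8 function */
    assert(ok);
    return ok ? 0 : 1;
}
```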
Michael Niedermayer f38af0143c Merge commit '15a29c39d9ef15b0783c04b3228e1c55f6701ee3'
* commit '15a29c39d9ef15b0783c04b3228e1c55f6701ee3':
  truehd: add hand-scheduled ARM asm version of mlp_filter_channel.

Conflicts:
	libavcodec/arm/Makefile
	libavcodec/arm/mlpdsp_init_arm.c

See: 87b128d5ef
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-26 20:39:10 +01:00
Ben Avison 87b128d5ef truehd: add hand-scheduled ARM asm version of mlp_filter_channel.
Profiling results for overall audio decode and the mlp_filter_channel(_arm)
function in particular are as follows:

              Before          After
              Mean   StdDev   Mean   StdDev  Confidence  Change
6:2 total     380.4  22.0     370.8  17.0    87.4%       +2.6%  (insignificant)
6:2 function  60.7   7.2      36.6   8.1     100.0%      +65.8%
8:2 total     357.0  17.5     343.2  19.0    97.8%       +4.0%  (insignificant)
8:2 function  60.3   8.8      37.3   3.8     100.0%      +61.8%
6:6 total     717.2  23.2     658.4  15.7    100.0%      +8.9%
6:6 function  140.4  12.9     81.5   9.2     100.0%      +72.4%
8:8 total     981.9  16.2     896.2  24.5    100.0%      +9.6%
8:8 function  193.4  15.0     103.3  11.5    100.0%      +87.2%

Experiments with adding preload instructions to this function yielded no
useful benefit, so these have not been included.

The assembly version has also been tested with a fuzz tester to ensure that
any combinations of inputs not exercised by my available test streams still
generate mathematically identical results to the C version.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-26 20:22:18 +01:00
Ben Avison 3b5946bcce truehd: add hand-scheduled ARM asm version of ff_mlp_pack_output.
Profiling results for overall decode and the output_data function in
particular are as follows:

              Before          After
              Mean   StdDev   Mean   StdDev  Confidence  Change
6:2 total     339.6  15.1     329.3  16.0    95.8%       +3.1%  (insignificant)
6:2 function  24.6   6.0      9.9    3.1     100.0%      +148.5%
8:2 total     324.5  15.5     323.6  14.3    15.2%       +0.3%  (insignificant)
8:2 function  20.4   3.9      9.9    3.4     100.0%      +104.7%
6:6 total     572.8  20.6     539.9  24.2    100.0%      +6.1%
6:6 function  54.5   5.6      16.0   3.8     100.0%      +240.9%
8:8 total     741.5  21.2     702.5  18.5    100.0%      +5.6%
8:8 function  63.9   7.6      18.4   4.8     100.0%      +247.3%

The assembly version has also been tested with a fuzz tester to ensure that
any combinations of inputs not exercised by my available test streams still
generate mathematically identical results to the C version.

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-03-26 19:54:32 +02:00
Ben Avison 483321fe78 truehd: add hand-scheduled ARM asm version of ff_mlp_rematrix_channel.
Profiling results for overall audio decode and the rematrix_channels function
in particular are as follows:

              Before          After
              Mean   StdDev   Mean   StdDev  Confidence  Change
6:2 total     370.8  17.0     348.8  20.1    99.9%       +6.3%
6:2 function  46.4   8.4      45.8   6.6     18.0%       +1.2%  (insignificant)
8:2 total     343.2  19.0     339.1  15.4    54.7%       +1.2%  (insignificant)
8:2 function  38.9   3.9      40.2   6.9     52.4%       -3.2%  (insignificant)
6:6 total     658.4  15.7     604.6  20.8    100.0%      +8.9%
6:6 function  109.0  8.7      59.5   5.4     100.0%      +83.3%
8:8 total     896.2  24.5     766.4  17.6    100.0%      +16.9%
8:8 function  223.4  12.8     93.8   5.0     100.0%      +138.3%

The assembly version has also been tested with a fuzz tester to ensure that
any combinations of inputs not exercised by my available test streams still
generate mathematically identical results to the C version.

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-03-26 19:54:10 +02:00
Ben Avison 15a29c39d9 truehd: add hand-scheduled ARM asm version of mlp_filter_channel.
Profiling results for overall audio decode and the mlp_filter_channel(_arm)
function in particular are as follows:

              Before          After
              Mean   StdDev   Mean   StdDev  Confidence  Change
6:2 total     380.4  22.0     370.8  17.0    87.4%       +2.6%  (insignificant)
6:2 function  60.7   7.2      36.6   8.1     100.0%      +65.8%
8:2 total     357.0  17.5     343.2  19.0    97.8%       +4.0%  (insignificant)
8:2 function  60.3   8.8      37.3   3.8     100.0%      +61.8%
6:6 total     717.2  23.2     658.4  15.7    100.0%      +8.9%
6:6 function  140.4  12.9     81.5   9.2     100.0%      +72.4%
8:8 total     981.9  16.2     896.2  24.5    100.0%      +9.6%
8:8 function  193.4  15.0     103.3  11.5    100.0%      +87.2%

Experiments with adding preload instructions to this function yielded no
useful benefit, so these have not been included.

The assembly version has also been tested with a fuzz tester to ensure that
any combinations of inputs not exercised by my available test streams still
generate mathematically identical results to the C version.

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-03-26 19:53:52 +02:00
Peter Ross a490970af2 libavcodec/*/vp8dsp_init: indent
Signed-off-by: Peter Ross <pross@xvid.org>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-25 13:29:29 +01:00
Peter Ross 89f2f5dbd7 On2 VP7 decoder
Signed-off-by: Peter Ross <pross@xvid.org>
Reviewed-by: BBB
previous patch reviewed by jason
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-25 13:29:05 +01:00
Michael Niedermayer 77bc342975 Merge commit '322a1dda973e802db7b57f2007fad3efcd5bab81'
* commit '322a1dda973e802db7b57f2007fad3efcd5bab81':
  dsputil: Refactor duplicated CALL_2X_PIXELS / PIXELS16 macros

Conflicts:
	libavcodec/arm/hpeldsp_init_arm.c
	libavcodec/x86/dsputil_x86.h

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-22 22:53:33 +01:00
Diego Biurrun 322a1dda97 dsputil: Refactor duplicated CALL_2X_PIXELS / PIXELS16 macros 2014-03-22 06:17:29 -07:00
Michael Niedermayer e98bac82e5 Merge commit '82bb3048013201c0095d2853d4623633d912252f'
* commit '82bb3048013201c0095d2853d4623633d912252f':
  dsputil: Use correct type in me_cmp_func function pointer

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-20 22:36:40 +01:00
Michael Niedermayer 011d83de48 Merge commit '0e083d7e43805db1a978cb57bfa25fda62e8ff18'
* commit '0e083d7e43805db1a978cb57bfa25fda62e8ff18':
  build: Group general components separate from de/encoders in arch Makefiles

Conflicts:
	libavcodec/arm/Makefile
	libavcodec/x86/Makefile

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-20 22:26:31 +01:00
Michael Niedermayer ba85bfabf3 Merge commit '5169e688956be3378adb3b16a93962fe0048f1c9'
* commit '5169e688956be3378adb3b16a93962fe0048f1c9':
  dsputil: Propagate bit depth information to all (sub)init functions

Conflicts:
	libavcodec/arm/dsputil_init_arm.c
	libavcodec/arm/dsputil_init_armv5te.c
	libavcodec/arm/dsputil_init_armv6.c
	libavcodec/arm/dsputil_init_neon.c
	libavcodec/dsputil.c
	libavcodec/dsputil.h
	libavcodec/ppc/dsputil_ppc.c
	libavcodec/x86/dsputil_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-20 22:06:01 +01:00
Michael Niedermayer a87188ebdb Merge commit 'cf7a2167570e6ccb9dfbd62e9d8ba8f4f065b17e'
* commit 'cf7a2167570e6ccb9dfbd62e9d8ba8f4f065b17e':
  arm: dsputil: K&R formatting cosmetics

Conflicts:
	libavcodec/arm/dsputil_init_arm.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-20 21:58:16 +01:00
Diego Biurrun 82bb304801 dsputil: Use correct type in me_cmp_func function pointer 2014-03-20 05:03:23 -07:00
Diego Biurrun 0e083d7e43 build: Group general components separate from de/encoders in arch Makefiles
This is in line with how the top-level libavcodec Makefile is structured.
2014-03-20 05:03:23 -07:00
Diego Biurrun 5169e68895 dsputil: Propagate bit depth information to all (sub)init functions
This avoids recalculating the value over and over again.
2014-03-20 05:03:23 -07:00
Diego Biurrun cf7a216757 arm: dsputil: K&R formatting cosmetics 2014-03-20 05:03:23 -07:00
Michael Niedermayer 41d08ca575 avcodec/arm/cabac: fix inline cabac reader with the UNCHECKED bitstream reader
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-15 01:08:45 +01:00
Michael Niedermayer 5e5d8ace8a Merge commit '36b822b8be7f9ecd6f9d87acaa786b128a873cd9'
* commit '36b822b8be7f9ecd6f9d87acaa786b128a873cd9':
  arm: dsputil: Drop restrict keyword from add_pixels_clamped_armv6 prototype

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-14 18:25:04 +01:00
Michael Niedermayer 2c9e5cb1a6 avcodec/arm/fft_fixed_neon: reduce diff by 2 spaces to libav
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-14 14:28:28 +01:00
Diego Biurrun 36b822b8be arm: dsputil: Drop restrict keyword from add_pixels_clamped_armv6 prototype
The function is assigned to a function pointer that does not have the
restrict keyword for that parameter.

This fixes compilation for MSVC builds that don't recognize "restrict",
broken since ed9625eb62.
2014-03-14 13:45:40 +01:00
Michael Niedermayer 1c788eaca9 Merge commit '831a1180785a786272cdcefb71566a770bfb879e'
* commit '831a1180785a786272cdcefb71566a770bfb879e':
  Update dsputil- and SIMD-related comments to match reality more closely

Conflicts:
	libavcodec/x86/hpeldsp.asm
	libavutil/arm/float_dsp_init_arm.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-13 23:59:56 +01:00
Michael Niedermayer be879af217 Merge commit 'd1184b8110b4847013bf25222e6809eb3462913c'
* commit 'd1184b8110b4847013bf25222e6809eb3462913c':
  arm: dsputil: Add a bunch of missing #includes

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-13 23:41:40 +01:00
Michael Niedermayer 1306359ea9 Merge commit '49676eb7301e775d08bdbba5380159b106ee258f'
* commit '49676eb7301e775d08bdbba5380159b106ee258f':
  dsputil: Remove prototypes for nonexisting optimization functions

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-13 22:56:08 +01:00
Diego Biurrun 831a118078 Update dsputil- and SIMD-related comments to match reality more closely 2014-03-13 05:50:29 -07:00
Diego Biurrun d1184b8110 arm: dsputil: Add a bunch of missing #includes 2014-03-13 05:50:28 -07:00
Diego Biurrun 49676eb730 dsputil: Remove prototypes for nonexisting optimization functions 2014-03-13 05:50:28 -07:00
Michael Niedermayer e5920425b0 Merge commit '5a7f382a5d33d9a26890affe6c8c5070a48dfc22'
* commit '5a7f382a5d33d9a26890affe6c8c5070a48dfc22':
  armv6: vp8: use explicit labels in motion compensation asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-12 22:03:00 +01:00
Janne Grunau 5a7f382a5d armv6: vp8: use explicit labels in motion compensation asm
The integrated arm assembler in clang-503.0.38 (Xcode-5.1) fails
to assemble a branch to 'label + offset' in thumb mode.
2014-03-12 15:06:05 +01:00
Michael Niedermayer fa4f573997 Merge commit '634d9d8b398982647b3d7160641198744901d8d8'
* commit '634d9d8b398982647b3d7160641198744901d8d8':
  arm: get_cabac inline asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-09 13:37:29 +01:00
Michael Niedermayer fc1d7811ef Merge commit '4506a854a4d846692ba71daeeff661dc214c8fa2'
* commit '4506a854a4d846692ba71daeeff661dc214c8fa2':
  arm: vp3: remove incorrect const in ff_vp3_idct_dc_add_neon declaration

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-09 13:06:31 +01:00
Michael Niedermayer b39e895024 Merge commit '61985ad72c47bbb668f2d3923bf5c9df83e79323'
* commit '61985ad72c47bbb668f2d3923bf5c9df83e79323':
  arm: hpeldsp: fix put_pixels8_y2_{,no_rnd_}armv6

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-09 01:16:21 +01:00
Janne Grunau 634d9d8b39 arm: get_cabac inline asm
Based on the aarch64 asm. CPU cycle counts on cortex-a9 compared to
gcc 4.8.2:
before: 475 decicycles in get_cabac_noinline, 67106035 runs, 2829 skips
after:  393 decicycles in get_cabac_noinline, 67106474 runs, 2390 skips

Overall speedup is above 2%. Code generated by clang 3.4 is slower on
the same hardware and the relative change is a little larger.
2014-03-09 00:45:34 +01:00