ffmpeg/libavcodec/arm
Martin Storsjö 9c8bc74c2b arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32
This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with
the same runtime:

                                     Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub16_add_neon:   3188.1   2435.4   2499.0   1969.0
vp9_inv_dct_dct_32x32_sub32_add_neon:  18531.7  16582.3  14207.6  12000.3

By skipping individual 4x16 or 4x32 pixel slices in the first pass,
we reduce the runtime of these functions like this:

vp9_inv_dct_dct_16x16_sub1_add_neon:     274.6    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    2064.0   1534.8   1719.4   1248.7
vp9_inv_dct_dct_16x16_sub4_add_neon:    2135.0   1477.2   1736.3   1249.5
vp9_inv_dct_dct_16x16_sub8_add_neon:    2446.7   1828.7   1993.6   1494.7
vp9_inv_dct_dct_16x16_sub12_add_neon:   2832.4   2118.3   2266.5   1735.1
vp9_inv_dct_dct_16x16_sub16_add_neon:   3211.7   2475.3   2523.5   1983.1
vp9_inv_dct_dct_32x32_sub1_add_neon:     756.2    456.7    862.0    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:   10682.2   8190.4   8539.2   6762.5
vp9_inv_dct_dct_32x32_sub4_add_neon:   10813.5   8014.9   8518.3   6762.8
vp9_inv_dct_dct_32x32_sub8_add_neon:   11859.6   9313.0   9347.4   7514.5
vp9_inv_dct_dct_32x32_sub12_add_neon:  12946.6  10752.4  10192.2   8280.2
vp9_inv_dct_dct_32x32_sub16_add_neon:  14074.6  11946.5  11001.4   9008.6
vp9_inv_dct_dct_32x32_sub20_add_neon:  15269.9  13662.7  11816.1   9762.6
vp9_inv_dct_dct_32x32_sub24_add_neon:  16327.9  14940.1  12626.7  10516.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17462.7  15776.1  13446.2  11264.7
vp9_inv_dct_dct_32x32_sub32_add_neon:  18575.5  17157.0  14249.3  12015.1

That is, the full subpartition case generally incurs a very minor overhead
from the additional loads and cmps, while the cases that only need to
process a small part of the actual input data see a significant speedup.
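
The slice-skipping idea can be illustrated with a minimal C sketch. It is not the NEON code from this commit. The names `transform_1d`, `transform_2d` and `count_nonzero_slices`, and the constants `N` and `SLICE`, are hypothetical. The 1-D transform is an arbitrary linear stand-in rather than the VP9 inverse DCT. The number of nonzero slices is found here by scanning the block, which is a simplification: the assembly presumably decides this far more cheaply, since the message above only mentions additional loads and cmps. The only property the sketch relies on is that a linear transform maps an all-zero input to an all-zero output, so a skipped slice can simply be written as zeros in the intermediate buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N     16 /* transform size, e.g. 16x16 (assumption for this sketch) */
#define SLICE 4  /* input vectors handled per first-pass iteration */

/* Hypothetical stand-in for a 1-D transform: any linear map works here,
 * because the sketch only needs "zero input gives zero output". */
static void transform_1d(const int32_t *in, int in_stride,
                         int32_t *out, int out_stride)
{
    for (int k = 0; k < N; k++) {
        int32_t acc = 0;
        for (int j = 0; j < N; j++)
            acc += in[j * in_stride] * (((k * j) % 7) - 3);
        out[k * out_stride] = acc;
    }
}

/* Number of leading slices (groups of SLICE rows) that must be processed:
 * index of the last slice holding a nonzero coefficient, plus one. */
static int count_nonzero_slices(const int32_t *in)
{
    int count = 0;
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++)
            if (in[r * N + c])
                count = r / SLICE + 1;
    return count;
}

/* Two-pass separable transform. The first pass skips every slice at or
 * beyond nonzero_slices and writes zeros instead; the second pass always
 * runs over the whole intermediate buffer. */
static void transform_2d(const int32_t *in, int32_t *out, int nonzero_slices)
{
    int32_t tmp[N * N];

    assert(nonzero_slices >= 0 && nonzero_slices <= N / SLICE);

    for (int s = 0; s < N / SLICE; s++) {
        if (s >= nonzero_slices) {
            /* Empty slice: its transformed output is known to be zero. */
            memset(tmp + s * SLICE * N, 0, sizeof(int32_t) * SLICE * N);
            continue;
        }
        for (int r = s * SLICE; r < (s + 1) * SLICE; r++)
            transform_1d(in + r * N, 1, tmp + r * N, 1);
    }

    for (int c = 0; c < N; c++)
        transform_1d(tmp + c, N, out + c, N);
}
```

Calling `transform_2d(in, out, count_nonzero_slices(in))` gives the same output as `transform_2d(in, out, N / SLICE)`, but only the slices that contain data go through the first pass. That matches the shape of the numbers above: runtime grows with the number of slices processed, and the full case pays only for the extra check.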

In a few inspected clips of common VP9 content, 70-90% of the non-dc-only
16x16 and 32x32 IDCTs have nonzero coefficients only in the upper left
8x8 or 16x16 subpartition, respectively.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-30 23:54:07 +02:00
Makefile arm: vp9: Add NEON loop filters 2016-11-11 14:16:42 +02:00
aac.h
aacpsdsp_init_arm.c
aacpsdsp_neon.S
ac3dsp_arm.S
ac3dsp_armv6.S
ac3dsp_init_arm.c
ac3dsp_neon.S
apedsp_init_arm.c
apedsp_neon.S
asm-offsets.h
audiodsp_arm.h
audiodsp_init_arm.c
audiodsp_init_neon.c audiodsp: reorder arguments for vector_clipf 2016-09-22 09:47:52 +02:00
audiodsp_neon.S audiodsp: reorder arguments for vector_clipf 2016-09-22 09:47:52 +02:00
blockdsp_arm.h blockdsp: drop the high_bit_depth parameter 2016-09-22 09:47:52 +02:00
blockdsp_init_arm.c blockdsp: drop the high_bit_depth parameter 2016-09-22 09:47:52 +02:00
blockdsp_init_neon.c blockdsp: drop the high_bit_depth parameter 2016-09-22 09:47:52 +02:00
blockdsp_neon.S
cabac.h
dca.h
dcadsp_init_arm.c dca: remove unused decode_hf function and quant_d tables 2015-12-24 13:58:18 +01:00
dcadsp_neon.S dca: remove unused decode_hf function and quant_d tables 2015-12-24 13:58:18 +01:00
dcadsp_vfp.S
fft_fixed_init_arm.c fft: Split MDCT bits off from FFT 2016-03-01 10:18:28 +01:00
fft_fixed_neon.S arm: Use .data.rel.ro for const data with relocations 2014-12-09 11:43:25 +02:00
fft_init_arm.c fft: Split MDCT bits off from FFT 2016-03-01 10:18:28 +01:00
fft_neon.S arm: Use .data.rel.ro for const data with relocations 2014-12-09 11:43:25 +02:00
fft_vfp.S arm: Use .data.rel.ro for const data with relocations 2014-12-09 11:43:25 +02:00
flacdsp_arm.S
flacdsp_init_arm.c
fmtconvert_init_arm.c arm: add ff_int32_to_float_fmul_array8_neon 2015-12-14 16:45:02 +01:00
fmtconvert_neon.S arm: add ff_int32_to_float_fmul_array8_neon 2015-12-14 16:45:02 +01:00
fmtconvert_vfp.S
g722dsp_init_arm.c g722: Add ARM NEON implementation for g722_apply_qmf() 2015-02-15 22:47:21 +02:00
g722dsp_neon.S g722: Add ARM NEON implementation for g722_apply_qmf() 2015-02-15 22:47:21 +02:00
h264chroma_init_arm.c h264chroma: Change type of stride parameters to ptrdiff_t 2016-09-29 14:48:04 +02:00
h264cmc_neon.S h264chroma: Change type of stride parameters to ptrdiff_t 2016-09-29 14:48:04 +02:00
h264dsp_init_arm.c h264: Move start code search functions into separate source files. 2014-08-04 22:22:54 +02:00
h264dsp_neon.S
h264idct_neon.S
h264pred_init_arm.c h264: arm: use intra pred8x8 functions only for chroma_format_idc <= 1 2015-07-18 00:28:49 +02:00
h264pred_neon.S
h264qpel_init_arm.c qpeldsp: Mark source pointer in qpel_mc_func function pointer const 2014-07-25 02:52:54 -07:00
h264qpel_neon.S
hpeldsp_arm.S hpeldsp: arm: Update comments left behind in 25841dfe80 2016-09-29 14:48:03 +02:00
hpeldsp_arm.h
hpeldsp_armv6.S
hpeldsp_init_arm.c
hpeldsp_init_armv6.c
hpeldsp_init_neon.c
hpeldsp_neon.S
idct.h idct: Change type of array stride parameters to ptrdiff_t 2016-09-29 14:48:03 +02:00
idctdsp_arm.S idct: Change type of array stride parameters to ptrdiff_t 2016-09-29 14:48:03 +02:00
idctdsp_arm.h
idctdsp_armv6.S
idctdsp_init_arm.c idct: Change type of array stride parameters to ptrdiff_t 2016-09-29 14:48:03 +02:00
idctdsp_init_armv5te.c idct: Move arm-specific declarations to a header in the arm directory 2014-07-20 13:02:17 -07:00
idctdsp_init_armv6.c idct: Change type of array stride parameters to ptrdiff_t 2016-09-29 14:48:03 +02:00
idctdsp_init_neon.c idct: Move arm-specific declarations to a header in the arm directory 2014-07-20 13:02:17 -07:00
idctdsp_neon.S
int_neon.S
jrevdct_arm.S
mathops.h
mdct_fixed_init_arm.c fft: Split MDCT bits off from FFT 2016-03-01 10:18:28 +01:00
mdct_fixed_neon.S
mdct_init_arm.c fft: Split MDCT bits off from FFT 2016-03-01 10:18:28 +01:00
mdct_neon.S
mdct_vfp.S armv6: Accelerate ff_imdct_half for general case (mdct_bits != 6) 2014-07-18 01:34:08 +03:00
me_cmp_armv6.S dsputil: Split motion estimation compare bits off into their own context 2014-07-17 09:07:10 -07:00
me_cmp_init_arm.c motion_est: convert stride to ptrdiff_t 2014-11-24 01:30:10 +00:00
mlpdsp_armv5te.S arm: mlpdsp: handle pic offset calculation in a macro 2014-12-09 22:00:08 +01:00
mlpdsp_armv6.S cosmetics: Fix spelling mistakes 2016-05-04 18:16:21 +02:00
mlpdsp_init_arm.c
mpegaudiodsp_fixed_armv6.S
mpegaudiodsp_init_arm.c
mpegvideo_arm.c mpegvideo: cosmetics: Lowercase ugly uppercase MPV_ function name prefixes 2014-08-15 01:26:33 -07:00
mpegvideo_arm.h mpegvideo: cosmetics: Lowercase ugly uppercase MPV_ function name prefixes 2014-08-15 01:26:33 -07:00
mpegvideo_armv5te.c cosmetics: Fix spelling mistakes 2016-05-04 18:16:21 +02:00
mpegvideo_armv5te_s.S
mpegvideo_neon.S
mpegvideoencdsp_armv6.S
mpegvideoencdsp_init_arm.c
neon.S
neontest.c lavc: add clobber tests for the new encoding/decoding API 2016-09-28 10:01:52 +02:00
pixblockdsp_armv6.S
pixblockdsp_init_arm.c pixblockdsp: Change type of stride parameters to ptrdiff_t 2016-09-14 14:12:36 +02:00
rdft_init_arm.c rdft: arm: Split RDFT initialization into a separate file 2016-02-26 14:34:58 +01:00
rdft_neon.S
rv34dsp_init_arm.c
rv34dsp_neon.S
rv40dsp_init_arm.c qpeldsp: Mark source pointer in qpel_mc_func function pointer const 2014-07-25 02:52:54 -07:00
rv40dsp_neon.S
sbrdsp_init_arm.c
sbrdsp_neon.S
simple_idct_arm.S cosmetics: Fix spelling mistakes 2016-05-04 18:16:21 +02:00
simple_idct_armv5te.S simple_idct: arm: Drop disabled code variant 2016-08-17 12:21:54 +02:00
simple_idct_armv6.S idct: Change type of array stride parameters to ptrdiff_t 2016-09-29 14:48:03 +02:00
simple_idct_neon.S idct: Change type of array stride parameters to ptrdiff_t 2016-09-29 14:48:03 +02:00
startcode.h h264: Move start code search functions into separate source files. 2014-08-04 22:22:54 +02:00
startcode_armv6.S h264: Move start code search functions into separate source files. 2014-08-04 22:22:54 +02:00
synth_filter_neon.S
synth_filter_vfp.S arm: cosmetics: Consistently use lowercase for shift operators 2014-07-18 11:17:40 +03:00
vc1dsp.h
vc1dsp_init_arm.c vc-1: Add platform-specific start code search routine to VC1DSPContext. 2014-08-04 22:22:54 +02:00
vc1dsp_init_neon.c h264chroma: Change type of stride parameters to ptrdiff_t 2016-09-29 14:48:04 +02:00
vc1dsp_neon.S idct: Change type of array stride parameters to ptrdiff_t 2016-09-29 14:48:03 +02:00
videodsp_arm.h
videodsp_armv5te.S arm: use a local label instead of the function symbol in ff_prefetch_arm 2015-07-20 23:10:29 +02:00
videodsp_init_arm.c
videodsp_init_armv5te.c
vorbisdsp_init_arm.c
vorbisdsp_neon.S
vp3dsp_init_arm.c vp3: Change type of stride parameters to ptrdiff_t 2016-08-26 11:36:26 +02:00
vp3dsp_neon.S
vp6dsp_init_arm.c vp56: Separate VP5 and VP6 dsp initialization 2016-08-26 11:50:22 +02:00
vp6dsp_neon.S
vp8.h
vp8_armv6.S
vp8dsp.h
vp8dsp_armv6.S vp8: Update some assembly comments left unchanged in bd66f073fe 2016-08-26 11:36:53 +02:00
vp8dsp_init_arm.c
vp8dsp_init_armv6.c
vp8dsp_init_neon.c
vp8dsp_neon.S arm: Fix a typo in a comment 2016-07-06 22:58:51 +03:00
vp9dsp_init_arm.c arm: vp9: Add NEON loop filters 2016-11-11 14:16:42 +02:00
vp9itxfm_neon.S arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32 2016-11-30 23:54:07 +02:00
vp9lpf_neon.S arm: vp9: Add NEON loop filters 2016-11-11 14:16:42 +02:00
vp9mc_neon.S arm: vp9mc: Use a different helper register for PIC loads 2016-11-10 14:01:04 +02:00
vp56_arith.h