mirror of https://git.ffmpeg.org/ffmpeg.git
cad42fadcd
This work is sponsored by, and copyright, Google. Previously all subpartitions except the eob=1 (DC) case ran with the same runtime: vp9_inv_dct_dct_16x16_sub16_add_neon: 1373.2 vp9_inv_dct_dct_32x32_sub32_add_neon: 8089.0 By skipping individual 8x16 or 8x32 pixel slices in the first pass, we reduce the runtime of these functions like this: vp9_inv_dct_dct_16x16_sub1_add_neon: 235.3 vp9_inv_dct_dct_16x16_sub2_add_neon: 1036.7 vp9_inv_dct_dct_16x16_sub4_add_neon: 1036.7 vp9_inv_dct_dct_16x16_sub8_add_neon: 1036.7 vp9_inv_dct_dct_16x16_sub12_add_neon: 1372.1 vp9_inv_dct_dct_16x16_sub16_add_neon: 1372.1 vp9_inv_dct_dct_32x32_sub1_add_neon: 555.1 vp9_inv_dct_dct_32x32_sub2_add_neon: 5190.2 vp9_inv_dct_dct_32x32_sub4_add_neon: 5180.0 vp9_inv_dct_dct_32x32_sub8_add_neon: 5183.1 vp9_inv_dct_dct_32x32_sub12_add_neon: 6161.5 vp9_inv_dct_dct_32x32_sub16_add_neon: 6155.5 vp9_inv_dct_dct_32x32_sub20_add_neon: 7136.3 vp9_inv_dct_dct_32x32_sub24_add_neon: 7128.4 vp9_inv_dct_dct_32x32_sub28_add_neon: 8098.9 vp9_inv_dct_dct_32x32_sub32_add_neon: 8098.8 I.e. in general a very minor overhead for the full subpartition case due to the additional cmps, but a significant speedup for the cases when we only need to process a small part of the actual input data. Signed-off-by: Martin Storsjö <martin@martin.st> |
||
---|---|---|
.. | ||
Makefile | ||
asm-offsets.h | ||
cabac.h | ||
dcadsp_init.c | ||
dcadsp_neon.S | ||
fft_init_aarch64.c | ||
fft_neon.S | ||
fmtconvert_init.c | ||
fmtconvert_neon.S | ||
h264chroma_init_aarch64.c | ||
h264cmc_neon.S | ||
h264dsp_init_aarch64.c | ||
h264dsp_neon.S | ||
h264idct_neon.S | ||
h264pred_init.c | ||
h264pred_neon.S | ||
h264qpel_init_aarch64.c | ||
h264qpel_neon.S | ||
hpeldsp_init_aarch64.c | ||
hpeldsp_neon.S | ||
imdct15_init.c | ||
imdct15_neon.S | ||
mdct_init.c | ||
mdct_neon.S | ||
mpegaudiodsp_init.c | ||
mpegaudiodsp_neon.S | ||
neon.S | ||
neontest.c | ||
rv40dsp_init_aarch64.c | ||
synth_filter_neon.S | ||
vc1dsp_init_aarch64.c | ||
videodsp.S | ||
videodsp_init.c | ||
vorbisdsp_init.c | ||
vorbisdsp_neon.S | ||
vp9dsp_init_aarch64.c | ||
vp9itxfm_neon.S | ||
vp9lpf_neon.S | ||
vp9mc_neon.S |