ffmpeg

Martin Storsjö 388f6e6715 arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32

This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with the same runtime:

                                Cortex       A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub16_add_neon:    3188.1   2435.4   2499.0   1969.0
vp9_inv_dct_dct_32x32_sub32_add_neon:   18531.7  16582.3  14207.6  12000.3

By skipping individual 4x16 or 4x32 pixel slices in the first pass, we reduce the runtime of these functions like this:

                                Cortex       A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub1_add_neon:      274.6    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:     2064.0   1534.8   1719.4   1248.7
vp9_inv_dct_dct_16x16_sub4_add_neon:     2135.0   1477.2   1736.3   1249.5
vp9_inv_dct_dct_16x16_sub8_add_neon:     2446.7   1828.7   1993.6   1494.7
vp9_inv_dct_dct_16x16_sub12_add_neon:    2832.4   2118.3   2266.5   1735.1
vp9_inv_dct_dct_16x16_sub16_add_neon:    3211.7   2475.3   2523.5   1983.1
vp9_inv_dct_dct_32x32_sub1_add_neon:      756.2    456.7    862.0    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:    10682.2   8190.4   8539.2   6762.5
vp9_inv_dct_dct_32x32_sub4_add_neon:    10813.5   8014.9   8518.3   6762.8
vp9_inv_dct_dct_32x32_sub8_add_neon:    11859.6   9313.0   9347.4   7514.5
vp9_inv_dct_dct_32x32_sub12_add_neon:   12946.6  10752.4  10192.2   8280.2
vp9_inv_dct_dct_32x32_sub16_add_neon:   14074.6  11946.5  11001.4   9008.6
vp9_inv_dct_dct_32x32_sub20_add_neon:   15269.9  13662.7  11816.1   9762.6
vp9_inv_dct_dct_32x32_sub24_add_neon:   16327.9  14940.1  12626.7  10516.0
vp9_inv_dct_dct_32x32_sub28_add_neon:   17462.7  15776.1  13446.2  11264.7
vp9_inv_dct_dct_32x32_sub32_add_neon:   18575.5  17157.0  14249.3  12015.1

I.e. in general a very minor overhead for the full subpartition case due to the additional loads and cmps, but a significant speedup for the cases when we only need to process a small part of the actual input data.

In common VP9 content in a few inspected clips, 70-90% of the non-dc-only 16x16 and 32x32 IDCTs only have nonzero coefficients in the upper left 8x8 or 16x16 subpartitions respectively.

This is cherrypicked from libav commit `9c8bc74c2b`.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-01-14 21:13:30 +01:00
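The idea of skipping empty slices in the first pass can be illustrated with a minimal C sketch. This is not the NEON code from the commit: `transform_1d` is a placeholder linear transform standing in for the real VP9 IDCT, and the names `transform_2d`, `nonzero_cols`, `N` and `SLICE` are hypothetical. The sketch assumes the caller already knows how many leading columns can hold nonzero coefficients; how the actual assembly derives that from the eob value is not shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N     16  /* block size: 16x16 coefficients */
#define SLICE 4   /* columns handled per first-pass iteration */

/* Placeholder 1-D transform (NOT the VP9 IDCT). It is linear, so an
 * all-zero input always yields an all-zero output, which is the only
 * property the slice skipping relies on. */
static void transform_1d(const int32_t *in, int in_stride,
                         int32_t *out, int out_stride)
{
    for (int k = 0; k < N; k++) {
        int32_t acc = 0;
        for (int n = 0; n < N; n++)
            acc += in[n * in_stride] * (int32_t)((k + 1) * (n + 1) % 7 - 3);
        out[k * out_stride] = acc;
    }
}

/* Two-pass separable transform. nonzero_cols is a hypothetical hint:
 * every column with index >= nonzero_cols must contain only zeros. */
static void transform_2d(const int32_t *coef, int32_t *dst, int nonzero_cols)
{
    int32_t tmp[N * N];

    assert(nonzero_cols >= 0 && nonzero_cols <= N);

    /* First pass: transform columns in slices of SLICE, writing the
     * results transposed into tmp. */
    for (int c = 0; c < N; c += SLICE) {
        if (c >= nonzero_cols) {
            /* All remaining slices are empty: their output is zero,
             * so fill the rest of tmp and stop early. */
            memset(tmp + c * N, 0, (size_t)(N - c) * N * sizeof(*tmp));
            break;
        }
        for (int i = 0; i < SLICE; i++)
            transform_1d(coef + c + i, N, tmp + (c + i) * N, 1);
    }

    /* Second pass: always runs over the full intermediate buffer. */
    for (int k = 0; k < N; k++)
        transform_1d(tmp + k, N, dst + k * N, 1);
}
```

Calling `transform_2d(coef, dst, N)` processes every slice, while `transform_2d(coef, dst, 8)` skips the last two slices. For a block whose nonzero coefficients all lie in the upper left 8x8 region, both calls should produce the same output, which matches the commit's observation that only the first pass gets cheaper while the full case pays a small extra cost for the checks.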
..
api	tests/api/api-seek-test: check all compute_crc_of_packets() calls	2016-12-06 15:42:07 +01:00
checkasm	arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32	2017-01-14 21:13:30 +01:00
fate	fate/psd : add test for bitmap and duotone	2017-01-14 04:52:43 +01:00
filtergraphs	avfilter/af_firequalizer: add fft2 option	2016-11-04 09:45:01 +07:00
ref	lavf/matroskaenc: Do not write two CodecID elements for rawvideo.	2017-01-14 06:06:05 +01:00
.gitignore	…
Makefile	tests: Fix running ffserver under qemu	2016-11-28 23:50:01 +01:00
audiogen.c	…
audiomatch.c	…
base64.c	…
copycooker.sh	…
extended.ffconcat	…
fate-run.sh	avformat/flvenc: add add_keyframe_index option	2016-11-10 10:30:48 +08:00
fate-valgrind.supp	…
fate.sh	…
ffserver-regression.sh	tests: drop -d option from ffserver invocation	2016-11-30 22:38:10 +01:00
ffserver.conf	tests/ffserver.conf: Force bitexactness in the ffmpeg command	2016-11-27 23:28:23 +01:00
ffserver.regression.ref	tests/ffserver.regression.ref: Update ffserver checksums	2016-12-01 23:43:31 +01:00
lavf-regression.sh	avformat/apngenc: use the stream parameters extradata if available	2016-11-18 12:26:44 -03:00
md5.sh	…
reference.pnm	…
regression-funcs.sh	tests: add -nostdin flag when calling ffmpeg	2016-10-06 18:31:07 -05:00
rotozoom.c	…
simple1.ffconcat	…
simple2.ffconcat	…
test.ffmeta	…
tiny_psnr.c	…
tiny_ssim.c	…
utils.c	…
videogen.c	…