mirror of https://git.ffmpeg.org/ffmpeg.git synced 2025-03-24 11:59:38 +00:00

FFmpeg git repo

Martin Storsjö 9d2afd1eb8 aarch64: vp9: Implement NEON loop filters

This work is sponsored by, and copyright, Google.

These are ported from the ARM version; thanks to the larger amount of registers available, we can do the loop filters with 16 pixels at a time. The implementation is fully templated, with a single macro which can generate versions for both 8 and 16 pixels wide, for both 4, 8 and 16 pixels loop filters (and the 4/8 mixed versions as well).

For the 8 pixel wide versions, it is pretty close in speed (the v_4_8 and v_8_8 filters are the best examples of this; the h_4_8 and h_8_8 filters seem to get some gain in the load/transpose/store part). For the 16 pixels wide ones, we get a speedup of around 1.2-1.4x compared to the 32 bit version.

Examples of runtimes vs the 32 bit version, on a Cortex A53:

                                        ARM   AArch64
vp9_loop_filter_h_4_8_neon:           144.0     127.2
vp9_loop_filter_h_8_8_neon:           207.0     182.5
vp9_loop_filter_h_16_8_neon:          415.0     328.7
vp9_loop_filter_h_16_16_neon:         672.0     558.6
vp9_loop_filter_mix2_h_44_16_neon:    302.0     203.5
vp9_loop_filter_mix2_h_48_16_neon:    365.0     305.2
vp9_loop_filter_mix2_h_84_16_neon:    365.0     305.2
vp9_loop_filter_mix2_h_88_16_neon:    376.0     305.2
vp9_loop_filter_mix2_v_44_16_neon:    193.2     128.2
vp9_loop_filter_mix2_v_48_16_neon:    246.7     218.4
vp9_loop_filter_mix2_v_84_16_neon:    248.0     218.5
vp9_loop_filter_mix2_v_88_16_neon:    302.0     218.2
vp9_loop_filter_v_4_8_neon:            89.0      88.7
vp9_loop_filter_v_8_8_neon:           141.0     137.7
vp9_loop_filter_v_16_8_neon:          295.0     272.7
vp9_loop_filter_v_16_16_neon:         546.0     453.7

The speedup vs C code in checkasm tests is around 2-7x, which is pretty much the same as for the 32 bit version. Even if these functions are faster than their 32 bit equivalent, the C version that we compare to also became around 1.3-1.7x faster than the C version in 32 bit.

Based on START_TIMER/STOP_TIMER wrapping around a few individual functions, the speedup vs C code is around 4-5x.

Examples of runtimes vs C on a Cortex A57 (for a slightly older version of the patch):

A57                             gcc-5.3      neon
loop_filter_h_4_8_neon:           256.6      93.4
loop_filter_h_8_8_neon:           307.3     139.1
loop_filter_h_16_8_neon:          340.1     254.1
loop_filter_h_16_16_neon:         827.0     407.9
loop_filter_mix2_h_44_16_neon:    524.5     155.4
loop_filter_mix2_h_48_16_neon:    644.5     173.3
loop_filter_mix2_h_84_16_neon:    630.5     222.0
loop_filter_mix2_h_88_16_neon:    697.3     222.0
loop_filter_mix2_v_44_16_neon:    598.5     100.6
loop_filter_mix2_v_48_16_neon:    651.5     127.0
loop_filter_mix2_v_84_16_neon:    591.5     167.1
loop_filter_mix2_v_88_16_neon:    855.1     166.7
loop_filter_v_4_8_neon:           271.7      65.3
loop_filter_v_8_8_neon:           312.5     106.9
loop_filter_v_16_8_neon:          473.3     206.5
loop_filter_v_16_16_neon:         976.1     327.8

The speed-up compared to the C functions is 2.5 to 6 and the cortex-a57 is again 30-50% faster than the cortex-a53.

Signed-off-by: Martin Storsjö <martin@martin.st>		2016-11-14 00:10:13 +02:00
compat	Add a compat dummy stdatomic.h used when threading is disabled	2016-10-02 18:57:56 +02:00
doc	examples/decode_audio: Add missing header for av_free()	2016-11-10 10:33:19 +01:00
libavcodec	aarch64: vp9: Implement NEON loop filters	2016-11-14 00:10:13 +02:00
libavdevice	Use avpriv_report_missing_feature() where appropriate	2016-11-08 17:54:34 +01:00
libavfilter	vf_drawtext: Drop wrong void* cast	2016-11-12 16:47:07 +01:00
libavformat	Drop pointless void* casts	2016-11-13 18:44:01 +01:00
libavresample	build: Change structure of the linker version script templates	2016-05-29 16:43:11 +02:00
libavutil	arm: Clear the gp register alias at the end of functions	2016-11-10 14:01:04 +02:00
libswscale	swscale: Add GRAY12	2016-11-07 22:42:00 +01:00
presets
tests	checkasm: add vp9dsp.itxfm_add tests.	2016-11-11 11:09:05 +02:00
tools	aviocat: Support avio options	2016-10-25 15:43:56 +02:00
.gitattributes
.gitignore	build: Ignore generated mapfile and remove it on distclean	2016-05-27 11:27:24 +02:00
.travis.yml	travis: Enable OSX integration	2015-11-17 16:51:00 +01:00
arch.mak
avconv_dxva2.c	avconv_dxva2: add a profile check for hevc	2016-07-20 16:33:09 +02:00
avconv_filter.c	avconv: make sure the filtergraph is freed on init failure	2016-10-02 11:41:45 +02:00
avconv_opt.c	avconv_opt: Consistently iterate through hwaccels array in all cases	2016-11-13 19:06:38 +01:00
avconv_qsv.c	avconv_qsv: use the actual pixel format provided by lavc	2016-07-22 19:08:12 +02:00
avconv_vaapi.c	avconv_vaapi: Convert to use hw_frames_ctx only	2016-08-30 22:16:01 +01:00
avconv_vda.c	avconv: vda: Unlock the pixel buffer once it is accessed	2015-07-09 00:10:13 +02:00
avconv_vdpau.c	avconv_vdpau: use the hwcontext device creation API	2016-05-26 15:40:34 +02:00
avconv.c	avconv: Drop stray leftover debug output	2016-11-09 20:51:55 +01:00
avconv.h	avconv: support parsing bitstream filter options	2016-11-02 10:08:28 +01:00
avplay.c	avplay: Correct function pointer assignments in options array	2016-11-08 17:20:30 +01:00
avprobe.c	avprobe: Add -show_stream_entry to get a single stream property	2016-11-01 11:27:52 -04:00
Changelog	Changelog: mark the release 12 branch	2016-08-31 08:08:32 +02:00
cmdutils_common_opts.h
cmdutils.c	avconv: switch to the new BSF API	2016-03-20 08:15:01 +01:00
cmdutils.h	avconv: use read_file() for reading the 2pass stats	2015-07-19 09:37:11 +02:00
common.mak	build: Simplify postprocessing of linker version script files	2016-05-29 16:49:16 +02:00
configure	libxvid: Require availability of mkstemp()	2016-11-11 10:17:07 +01:00
COPYING.GPLv2
COPYING.GPLv3
COPYING.LGPLv2.1
COPYING.LGPLv3
CREDITS
INSTALL
library.mak	build: Drop duplicate asm recipe	2016-10-17 16:25:35 +02:00
LICENSE	Remove the legacy X11 screen grabber	2016-07-29 19:03:10 +02:00
Makefile	build: Hardcode avversion.h dependency	2016-10-27 11:54:06 +02:00
README
README.md	doc: Add travis badge	2015-09-14 00:19:08 +02:00
RELEASE	Make the RELEASE file match with the most recent tag	2016-10-14 13:52:51 -04:00
version.sh	build: remove hardcoded name of version header	2016-09-15 21:59:15 +02:00

README.md

Libav

Libav is a collection of libraries and tools to process multimedia content such as audio, video, subtitles and related metadata.

Libraries

libavcodec provides implementations of a wide range of codecs.
libavformat implements streaming protocols, container formats and basic I/O access.
libavutil includes hashers, decompressors and miscellaneous utility functions.
libavfilter provides a means to alter decoded audio and video through a chain of filters.
libavdevice provides an abstraction to access capture and playback devices.
libavresample implements audio mixing and resampling routines.
libswscale implements color conversion and scaling routines.

Tools

avconv is a command line toolbox to manipulate, convert and stream multimedia content.
avplay is a minimalistic multimedia player.
avprobe is a simple analysis tool to inspect multimedia content.
Additional small tools such as aviocat, ismindex and qt-faststart.

Documentation

The offline documentation is available in the doc/ directory.

The online documentation is available on the main website and in the wiki.

Examples

Coding examples are available in the doc/examples directory.

License

The Libav codebase is mainly LGPL-licensed, with optional components licensed under the GPL. Please refer to the LICENSE file for detailed information.