ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2024-12-24 00:02:52 +00:00

Sebastian Pop bd83191271 swscale/aarch64: use multiply accumulate and increase vector factor to 4	2019-12-17 23:41:47 +01:00

This patch implements ff_hscale_8_to_15_neon with NEON fused multiply accumulate
and bumps the vectorization factor from 2 to 4.

The speedup is of 25% on Graviton1 A1 instances based on A-72 cpus:

$ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
before: t:0.040303 avg:0.040287 max:0.040371 min:0.039214
after:  t:0.032168 avg:0.032215 max:0.033081 min:0.032146

The speedup is of 39% on Graviton2 m6g instances based on Neoverse-N1 cpus:

$ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
before: t:0.019446 avg:0.019423 max:0.019493 min:0.019181
after:  t:0.014015 avg:0.014096 max:0.015018 min:0.013971

Tested with `make check` on aarch64-linux.

Signed-off-by: Sebastian Pop <spop@amazon.com>
Reviewed-by: Jean-Baptiste Kempf <jb@videolan.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
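The commit message above describes the change only in words. As a rough illustration of what "multiply accumulate" with a "vectorization factor of 4" means for an 8-bit to 15-bit horizontal scaler, here is a minimal scalar sketch in C. It is not the NEON assembly from the patch: the function names `hscale_8_to_15_ref` and `hscale_8_to_15_x4` are hypothetical, and the argument layout (one filter row of `filterSize` coefficients per output pixel, a `filterPos` start offset per output pixel, a right shift by 7 and a clip to 15 bits) is an assumption modeled on the usual C reference scaler.

```c
#include <stdint.h>

/* Scalar reference (hypothetical name): each output pixel is the dot product
 * of filterSize source bytes with 16-bit coefficients, scaled down by 7 bits
 * and clipped to the 15-bit maximum. */
static void hscale_8_to_15_ref(int16_t *dst, int dstW, const uint8_t *src,
                               const int16_t *filter, const int32_t *filterPos,
                               int filterSize)
{
    for (int i = 0; i < dstW; i++) {
        int val = 0;
        for (int j = 0; j < filterSize; j++)
            val += (int)src[filterPos[i] + j] * filter[filterSize * i + j];
        val >>= 7;
        dst[i] = val > 32767 ? 32767 : (int16_t)val;
    }
}

/* Same computation with four outputs per iteration (hypothetical name).
 * Each output keeps its own accumulator, which is the scalar analogue of
 * four independent multiply-accumulate chains in a vectorized loop. */
static void hscale_8_to_15_x4(int16_t *dst, int dstW, const uint8_t *src,
                              const int16_t *filter, const int32_t *filterPos,
                              int filterSize)
{
    int i = 0;
    for (; i + 4 <= dstW; i += 4) {
        int acc[4] = { 0, 0, 0, 0 };
        for (int j = 0; j < filterSize; j++)
            for (int k = 0; k < 4; k++)
                acc[k] += (int)src[filterPos[i + k] + j] *
                          filter[filterSize * (i + k) + j];
        for (int k = 0; k < 4; k++) {
            int val = acc[k] >> 7;
            dst[i + k] = val > 32767 ? 32767 : (int16_t)val;
        }
    }
    /* Tail: handle the remaining outputs when dstW is not a multiple of 4. */
    for (; i < dstW; i++) {
        int val = 0;
        for (int j = 0; j < filterSize; j++)
            val += (int)src[filterPos[i] + j] * filter[filterSize * i + j];
        val >>= 7;
        dst[i] = val > 32767 ? 32767 : (int16_t)val;
    }
}
```

Both functions should produce identical output for the same inputs; the unrolled form only exposes more independent work per loop iteration, which is the property a wider vectorized loop relies on.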
..
aarch64	swscale/aarch64: use multiply accumulate and increase vector factor to 4	2019-12-17 23:41:47 +01:00
arm
ppc	swscale: Fix AltiVec/VSX build with recent GCC	2019-10-04 08:58:17 +03:00
tests
x86	swscale/x86/swscale: Fix undefined left shifts of negative numbers	2019-09-28 17:24:32 +02:00
alphablend.c
bayer_template.c
gamma.c
hscale_fast_bilinear.c
hscale.c
input.c
libswscale.v
log2_tab.c
Makefile
options.c
output.c	swscale/output: Avoid 64bit in Alpha in yuv2ya16_X_c_template()	2019-10-16 19:17:57 +02:00
rgb2rgb_template.c
rgb2rgb.c
rgb2rgb.h
slice.c
swscale_internal.h
swscale_unscaled.c	swscale/swscale_unscaled: add AV_PIX_FMT_GBRAP10 for LE and BE conversion wrapper	2019-12-10 16:09:14 +01:00
swscale.c	swscale/swscale: cosmetics	2019-09-27 10:58:30 +02:00
swscale.h
swscaleres.rc
utils.c	swscale/utils: Fix invalid left shifts of negative numbers	2019-09-28 17:24:32 +02:00
version.h
vscale.c
yuv2rgb.c