ffmpeg/libswscale/aarch64
Swinney, Jonathan 3e708722a2 swscale/aarch64: vscale optimization
Use scalar times vector multiply accumlate instructions instead of
vector times vector to remove the need for replicating load instructions
which are slightly slower.

On AWS c7g (Graviton 3, Neoverse V1) instances:
yuv2yuvX_8_0_512_accurate_neon:  1144.8  987.4
yuv2yuvX_16_0_512_accurate_neon: 2080.5 1869.4

Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-16 13:40:42 +03:00
..
hscale.S libswscale/aarch64: add another hscale specialization 2022-08-16 12:08:38 +03:00
Makefile swscale: aarch64: Add a NEON implementation of interleaveBytes 2020-05-15 23:38:17 +03:00
output.S swscale/aarch64: vscale optimization 2022-08-16 13:40:42 +03:00
rgb2rgb_neon.S swscale: aarch64: Add a NEON implementation of interleaveBytes 2020-05-15 23:38:17 +03:00
rgb2rgb.c swscale: aarch64: Add a NEON implementation of interleaveBytes 2020-05-15 23:38:17 +03:00
swscale_unscaled.c sws: rename SwsContext.swscale to convert_unscaled 2021-07-03 15:57:53 +02:00
swscale.c libswscale/aarch64: add another hscale specialization 2022-08-16 12:08:38 +03:00
yuv2rgb_neon.S aarch64/yuv2rgb_neon: fix return value 2020-07-09 10:33:14 +01:00