ffmpeg/libswscale
Swinney, Jonathan 0ea61725b1 swscale/aarch64: add hscale specializations
This patch adds code to support specializations of the hscale function
and adds a specialization for filterSize == 4.

ff_hscale8to15_4_neon is a complete rewrite. Since the main bottleneck
here is loading the data from src, this data is loaded a whole block
ahead and stored back to the stack to be loaded again with ld4. This
arranges the data for most efficient use of the vector instructions and
removes the need for completion adds at the end. The number of C loop
iterations covered by each iteration of the assembly loop is increased
from 4 to 8, but because of the prefetching, a separate section without
prefetching is needed when dstW < 16.
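
The loop structure described above can be modelled in scalar C. This is
only an illustrative sketch, not the code from the patch: all names
(hscale4_one, gather_block, mac_block, hscale4_sketch) are hypothetical,
the stack staging area is modelled as a small tap-major array (the layout
an ld4 de-interleaving load would produce), and the arithmetic is assumed
to follow the generic C hscale loop (multiply-accumulate, shift right by
7, clip to 15 bits).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { TAPS = 4, BLOCK = 8 };   /* filterSize == 4, 8 outputs per pass */

/* Assumed output stage: shift by 7 and clip to the 15-bit maximum. */
static int16_t clip15(int32_t v)
{
    v >>= 7;
    return (int16_t)(v > 32767 ? 32767 : v);
}

/* Scalar reference for output sample i (hypothetical helper). */
static int16_t hscale4_one(const uint8_t *src, const int16_t *filter,
                           const int32_t *pos, int i)
{
    int32_t val = 0;
    for (int j = 0; j < TAPS; j++)
        val += (int32_t)src[pos[i] + j] * filter[TAPS * i + j];
    return clip15(val);
}

/* Gather the source bytes for 8 outputs into a tap-major staging area:
 * stage[j][k] holds tap j of output k. */
static void gather_block(uint8_t stage[TAPS][BLOCK], const uint8_t *src,
                         const int32_t *pos)
{
    for (int k = 0; k < BLOCK; k++)
        for (int j = 0; j < TAPS; j++)
            stage[j][k] = src[pos[k] + j];
}

/* One accumulator lane per output, so no cross-lane add is needed. */
static void mac_block(int16_t *dst, uint8_t stage[TAPS][BLOCK],
                      const int16_t *filter)
{
    int32_t acc[BLOCK] = { 0 };
    for (int j = 0; j < TAPS; j++)
        for (int k = 0; k < BLOCK; k++)
            acc[k] += (int32_t)stage[j][k] * filter[TAPS * k + j];
    for (int k = 0; k < BLOCK; k++)
        dst[k] = clip15(acc[k]);
}

static void hscale4_sketch(int16_t *dst, int dst_w, const uint8_t *src,
                           const int16_t *filter, const int32_t *pos)
{
    uint8_t cur[TAPS][BLOCK], next[TAPS][BLOCK];
    int i = 0;

    assert(dst_w >= 0);
    if (dst_w >= 16) {
        /* Pipelined section: source data is gathered one block ahead. */
        gather_block(next, src, pos);
        for (; i + 16 <= dst_w; i += BLOCK) {
            memcpy(cur, next, sizeof(cur));
            gather_block(next, src, pos + i + BLOCK);
            mac_block(dst + i, cur, filter + TAPS * i);
        }
        /* Drain the block that was gathered last. */
        mac_block(dst + i, next, filter + TAPS * i);
        i += BLOCK;
    }
    /* Section without prefetching (used when dst_w < 16). */
    for (; i + BLOCK <= dst_w; i += BLOCK) {
        gather_block(cur, src, pos + i);
        mac_block(dst + i, cur, filter + TAPS * i);
    }
    /* Scalar tail for widths that are not a multiple of 8 (a
     * simplification of this sketch). */
    for (; i < dst_w; i++)
        dst[i] = hscale4_one(src, filter, pos, i);
}
```

In this model each of the 8 accumulator lanes belongs to exactly one
output sample, which is why no horizontal add is needed once the taps
have been summed.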

This improves speed on Graviton 2 (Neoverse N1) by roughly 1.6x in the
case where fs=8 would previously have been required.

before: hscale_8_to_15__fs_8_dstW_512_neon: 1962.8
after : hscale_8_to_15__fs_4_dstW_512_neon: 1220.9

Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-05-28 01:09:05 +03:00
aarch64 swscale/aarch64: add hscale specializations 2022-05-28 01:09:05 +03:00
arm
ppc
tests
x86
Makefile lib*/version: Move library version functions into files of their own 2022-05-10 06:49:32 +02:00
alphablend.c
bayer_template.c
gamma.c
hscale.c
hscale_fast_bilinear.c
input.c
libswscale.v
log2_tab.c
options.c
output.c
rgb2rgb.c
rgb2rgb.h
rgb2rgb_template.c
slice.c
swscale.c
swscale.h Keep including the full version.h when headers are included externally 2022-03-19 00:01:57 +02:00
swscale_internal.h
swscale_unscaled.c
swscaleres.rc
utils.c swscale/aarch64: add hscale specializations 2022-05-28 01:09:05 +03:00
version.c lib*/version: Move library version functions into files of their own 2022-05-10 06:49:32 +02:00
version.h doc: Add an entry to APIchanges about changes to version.h and version_major.h 2022-03-16 14:12:46 +02:00
version_major.h libswscale: Split version.h 2022-03-16 14:05:26 +02:00
vscale.c
yuv2rgb.c