ffmpeg/libswscale
Martin Storsjö 70db14376c swscale: aarch64: Optimize the final summation in the hscale routine
Before:                     Cortex A53      A72      A73  Graviton 2  Graviton 3
hscale_8_to_15_width8_neon:     8273.0   4602.5   4289.5      2429.7      1629.1
hscale_8_to_15_width16_neon:   12405.7   6803.0   6359.0      3549.0      2378.4
hscale_8_to_15_width32_neon:   21258.7  11491.7  11469.2      5797.2      3919.6
hscale_8_to_15_width40_neon:   25652.0  14173.7  12488.2      6893.5      4810.4

After:
hscale_8_to_15_width8_neon:     7633.0   3981.5   3350.2      1980.7      1261.1
hscale_8_to_15_width16_neon:   11666.7   5951.0   5512.0      3080.7      2131.4
hscale_8_to_15_width32_neon:   20900.7  10733.2   9481.7      5275.2      3862.1
hscale_8_to_15_width40_neon:   24826.0  13536.2  11502.0      6397.2      4731.9

Thus, this gives an overall 8-29% speedup for the smaller filter
sizes, and around 1-8% for the larger filter sizes.
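
As a rough check of the figures quoted for the smallest filter size, the sketch below recomputes the speedup from the width8 rows of the two tables. It assumes the numbers are timings where lower is better, and that "speedup" means before/after - 1; neither is stated explicitly in the message. The helper name speedup_percent is made up for this illustration.

```python
# Figures copied from the hscale_8_to_15_width8_neon rows above.
# Assumption: these are timings, so lower is better.
CORES = ["Cortex A53", "A72", "A73", "Graviton 2", "Graviton 3"]
BEFORE = [8273.0, 4602.5, 4289.5, 2429.7, 1629.1]
AFTER = [7633.0, 3981.5, 3350.2, 1980.7, 1261.1]


def speedup_percent(before: float, after: float) -> float:
    """Return how much faster `after` is than `before`, in percent.

    Assumption: speedup is defined as before/after - 1.
    """
    return (before / after - 1.0) * 100.0


# Print one speedup figure per core for the width8 case.
for core, b, a in zip(CORES, BEFORE, AFTER):
    print(f"{core:>10}: {speedup_percent(b, a):5.1f}%")
```

Under that definition, the Cortex A53 and Graviton 3 columns give the two ends of the 8-29% range.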

Inspired by a patch by Jonathan Swinney <jswinney@amazon.com>.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-04-22 10:49:46 +03:00
aarch64 swscale: aarch64: Optimize the final summation in the hscale routine 2022-04-22 10:49:46 +03:00
arm
ppc
tests
x86
Makefile libswscale: Split version.h 2022-03-16 14:05:26 +02:00
alphablend.c
bayer_template.c
gamma.c
hscale.c
hscale_fast_bilinear.c
input.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
libswscale.v
log2_tab.c
options.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
output.c
rgb2rgb.c
rgb2rgb.h Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
rgb2rgb_template.c
slice.c
swscale.c Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
swscale.h Keep including the full version.h when headers are included externally 2022-03-19 00:01:57 +02:00
swscale_internal.h Remove unnecessary libavutil/(avutil|common|internal).h inclusions 2022-02-24 12:56:49 +01:00
swscale_unscaled.c
swscaleres.rc
utils.c libswscale: Split version.h 2022-03-16 14:05:26 +02:00
version.h doc: Add an entry to APIchanges about changes to version.h and version_major.h 2022-03-16 14:12:46 +02:00
version_major.h libswscale: Split version.h 2022-03-16 14:05:26 +02:00
vscale.c
yuv2rgb.c