ffmpeg/libswscale
Andreas Rheinhardt 888a02a126 swscale/output: Don't call av_pix_fmt_desc_get() in a loop
Up until now, libswscale/output.c used a macro to write
an output pixel which involved a call to av_pix_fmt_desc_get()
to find out whether the output pixel format is BE or LE
despite this being known at compile-time (there are templates
per pixfmt). Even worse, these calls are made in a loop,
so that e.g. there are eight calls to av_pix_fmt_desc_get()
for every pixel processed in yuv2rgba64_X_c_template()
for 64-bit RGB formats.

This commit modifies these macros to ensure that isBE()
is evaluated at compile-time. This saved 41184B of .text
for me (GCC 11.2, -O3). Of course, it also improved performance.
For example, with ffmpeg_g -f lavfi -i testsrc2,format=yuva420p -pix_fmt rgba64le \
-threads 1 -t 1:00 -f null - (which uses yuv2rgba64le_X_c,
an invocation of the yuv2rgba64_X_c_template() mentioned above),
performance improved from 95589 to 41387 decicycles for one call
to yuv2packedX; for the BE variant the numbers went down from
76087 to 43024 decicycles.
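
The general technique described above can be illustrated with a minimal,
self-contained sketch. This is not the code from libswscale/output.c: all
names below (write_be16, write_le16, store_row16_template,
DEFINE_STORE_ROW16) are hypothetical and exist only for illustration. The
idea is that each per-format instantiation passes the endianness to an
inline template as a literal constant, so the compiler can fold the branch
instead of looking up the format descriptor at run time for every pixel.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-order helpers (not FFmpeg API). */
static inline void write_be16(uint8_t *p, unsigned v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

static inline void write_le16(uint8_t *p, unsigned v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

/*
 * Template function: is_be is a literal constant at every call site below,
 * so after inlining the compiler can drop the branch that is never taken.
 * No per-pixel lookup of the pixel format is needed.
 */
static inline void store_row16_template(uint8_t *dst, const uint16_t *src,
                                        size_t n, int is_be)
{
    for (size_t i = 0; i < n; i++) {
        if (is_be)
            write_be16(dst + 2 * i, src[i]);
        else
            write_le16(dst + 2 * i, src[i]);
    }
}

/* One concrete function per endianness, with the constant baked in. */
#define DEFINE_STORE_ROW16(name, is_be)                                  \
static void store_row16_ ## name(uint8_t *dst, const uint16_t *src,      \
                                 size_t n)                               \
{                                                                        \
    store_row16_template(dst, src, n, is_be);                            \
}

DEFINE_STORE_ROW16(le, 0)
DEFINE_STORE_ROW16(be, 1)
```

Whether the branch is actually eliminated depends on the compiler and the
optimization level; the sketch only shows how the information is made
available at compile time.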

Reviewed-by: Anton Khirnov <anton@khirnov.net>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-19 23:40:41 +02:00
aarch64 swscale/aarch64: add vscale specializations 2022-08-16 13:40:42 +03:00
arm
loongarch swscale/la: Add output_lasx.c file. 2022-09-10 22:56:39 +02:00
ppc
tests
x86 swscale/x86/rgb_2_rgb: Empty MMX state in ff_shuffle_bytes_2103_mmxext 2022-08-23 12:21:00 +02:00
alphablend.c
bayer_template.c
gamma.c
half2float.c swscale/input: add rgbaf16 input support 2022-08-19 22:09:36 +02:00
hscale_fast_bilinear.c
hscale.c swscale: add opaque parameter to input functions 2022-08-19 22:09:36 +02:00
input.c swscale/input: Avoid calls to av_pix_fmt_desc_get() 2022-09-19 23:40:41 +02:00
libswscale.v
log2_tab.c
Makefile swscale/input: add rgbaf16 input support 2022-08-19 22:09:36 +02:00
options.c
output.c swscale/output: Don't call av_pix_fmt_desc_get() in a loop 2022-09-19 23:40:41 +02:00
rgb2rgb_template.c
rgb2rgb.c swscale/la: Add yuv2rgb_lasx.c and rgb2rgb_lasx.c files 2022-09-10 22:56:38 +02:00
rgb2rgb.h swscale/la: Add yuv2rgb_lasx.c and rgb2rgb_lasx.c files 2022-09-10 22:56:38 +02:00
slice.c swscale/input: add rgbaf16 input support 2022-08-19 22:09:36 +02:00
swscale_internal.h swscale/la: Optimize hscale functions with lasx. 2022-09-10 22:56:38 +02:00
swscale_unscaled.c
swscale.c swscale/la: Optimize hscale functions with lasx. 2022-09-10 22:56:38 +02:00
swscale.h
swscaleres.rc
utils.c swscale/la: Optimize hscale functions with lasx. 2022-09-10 22:56:38 +02:00
version_major.h
version.c
version.h swscale/output: add support for Y210LE and Y212LE 2022-09-10 12:29:12 -07:00
vscale.c
yuv2rgb.c swscale/la: Add yuv2rgb_lasx.c and rgb2rgb_lasx.c files 2022-09-10 22:56:38 +02:00