Commit Graph

115664 Commits

Author SHA1 Message Date
Rémi Denis-Courmont
417957ec5e sws/range_convert: R-V V to/from JPEG
C908   X60
chrRangeFromJpeg_8_c:          2.7    2.5
chrRangeFromJpeg_8_rvv_i32:    1.7    1.5
chrRangeFromJpeg_24_c:         7.5    6.7
chrRangeFromJpeg_24_rvv_i32:   1.7    1.5
chrRangeFromJpeg_128_c:       55.2   34.7
chrRangeFromJpeg_128_rvv_i32:  6.5    3.0
chrRangeFromJpeg_144_c:       44.0   39.2
chrRangeFromJpeg_144_rvv_i32:  7.7    4.5
chrRangeFromJpeg_256_c:       78.2   69.5
chrRangeFromJpeg_256_rvv_i32: 12.2    6.0
chrRangeFromJpeg_512_c:      172.2  138.5
chrRangeFromJpeg_512_rvv_i32: 24.5   11.7
chrRangeToJpeg_8_c:            4.7    4.2
chrRangeToJpeg_8_rvv_i32:      2.0    1.7
chrRangeToJpeg_24_c:          13.7   12.2
chrRangeToJpeg_24_rvv_i32:     2.0    1.5
chrRangeToJpeg_128_c:         72.0   63.7
chrRangeToJpeg_128_rvv_i32:    6.7    3.2
chrRangeToJpeg_144_c:         80.7   71.7
chrRangeToJpeg_144_rvv_i32:    8.5    4.7
chrRangeToJpeg_256_c:        143.2  127.2
chrRangeToJpeg_256_rvv_i32:   13.5    6.5
chrRangeToJpeg_512_c:        285.7  253.7
chrRangeToJpeg_512_rvv_i32:   27.0   13.0
lumRangeFromJpeg_8_c:          1.7    1.5
lumRangeFromJpeg_8_rvv_i32:    1.2    1.0
lumRangeFromJpeg_24_c:         4.2    3.7
lumRangeFromJpeg_24_rvv_i32:   1.2    1.0
lumRangeFromJpeg_128_c:       21.7   19.2
lumRangeFromJpeg_128_rvv_i32:  3.7    1.7
lumRangeFromJpeg_144_c:       24.7   22.0
lumRangeFromJpeg_144_rvv_i32:  4.7    2.7
lumRangeFromJpeg_256_c:       43.7   39.0
lumRangeFromJpeg_256_rvv_i32:  7.5    3.2
lumRangeFromJpeg_512_c:       87.0   77.2
lumRangeFromJpeg_512_rvv_i32: 14.5    6.7
lumRangeToJpeg_8_c:            2.7    2.2
lumRangeToJpeg_8_rvv_i32:      1.0    1.0
lumRangeToJpeg_24_c:           7.2    6.5
lumRangeToJpeg_24_rvv_i32:     1.2    1.0
lumRangeToJpeg_128_c:         37.7   33.7
lumRangeToJpeg_128_rvv_i32:    3.7    2.0
lumRangeToJpeg_144_c:         42.5   37.7
lumRangeToJpeg_144_rvv_i32:    4.7    2.7
lumRangeToJpeg_256_c:         75.0   66.7
lumRangeToJpeg_256_rvv_i32:    7.5    3.5
lumRangeToJpeg_512_c:        149.5  133.0
lumRangeToJpeg_512_rvv_i32:   14.7    7.0
2024-06-10 22:48:52 +03:00
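For readers unfamiliar with the functions benchmarked above, the following is a minimal sketch of limited-range ("MPEG") to full-range ("JPEG") luma conversion and back. It assumes plain 8-bit samples and the usual 16..235 limited luma range; swscale itself works on higher-precision intermediate samples with fixed-point constants, which this sketch does not reproduce. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an int into [lo, hi]. */
static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Limited range (16..235) to full range (0..255), rounded to nearest.
 * Hypothetical helper, not the swscale implementation. */
static void luma_range_to_jpeg8(uint8_t *buf, int width)
{
    assert(buf && width >= 0);
    for (int i = 0; i < width; i++) {
        int v = clamp_int(buf[i], 16, 235);
        buf[i] = (uint8_t)(((v - 16) * 255 + 109) / 219);
    }
}

/* Full range (0..255) to limited range (16..235), rounded to nearest.
 * Hypothetical helper, not the swscale implementation. */
static void luma_range_from_jpeg8(uint8_t *buf, int width)
{
    assert(buf && width >= 0);
    for (int i = 0; i < width; i++)
        buf[i] = (uint8_t)(16 + (buf[i] * 219 + 127) / 255);
}
```

Both loops are element-wise with no dependency between samples, which is the kind of loop that maps well onto vector instructions.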
Zhao Zhili
9dac8495b0 swscale/aarch64: Add rgb24 to yuv implementation
Test on Apple M1:

rgb24_to_uv_8_c: 0.0
rgb24_to_uv_8_neon: 0.2
rgb24_to_uv_128_c: 1.0
rgb24_to_uv_128_neon: 0.5
rgb24_to_uv_1080_c: 7.0
rgb24_to_uv_1080_neon: 5.7
rgb24_to_uv_1920_c: 12.5
rgb24_to_uv_1920_neon: 9.5
rgb24_to_uv_half_8_c: 0.2
rgb24_to_uv_half_8_neon: 0.2
rgb24_to_uv_half_128_c: 1.0
rgb24_to_uv_half_128_neon: 0.5
rgb24_to_uv_half_1080_c: 6.2
rgb24_to_uv_half_1080_neon: 3.0
rgb24_to_uv_half_1920_c: 11.2
rgb24_to_uv_half_1920_neon: 5.2
rgb24_to_y_8_c: 0.2
rgb24_to_y_8_neon: 0.0
rgb24_to_y_128_c: 0.5
rgb24_to_y_128_neon: 0.5
rgb24_to_y_1080_c: 4.7
rgb24_to_y_1080_neon: 3.2
rgb24_to_y_1920_c: 8.0
rgb24_to_y_1920_neon: 5.7

On Pixel 6:

rgb24_to_uv_8_c: 30.7
rgb24_to_uv_8_neon: 56.9
rgb24_to_uv_128_c: 213.9
rgb24_to_uv_128_neon: 173.2
rgb24_to_uv_1080_c: 1649.9
rgb24_to_uv_1080_neon: 1424.4
rgb24_to_uv_1920_c: 2907.9
rgb24_to_uv_1920_neon: 2480.7
rgb24_to_uv_half_8_c: 36.2
rgb24_to_uv_half_8_neon: 33.4
rgb24_to_uv_half_128_c: 167.9
rgb24_to_uv_half_128_neon: 99.4
rgb24_to_uv_half_1080_c: 1293.9
rgb24_to_uv_half_1080_neon: 778.7
rgb24_to_uv_half_1920_c: 2292.7
rgb24_to_uv_half_1920_neon: 1328.7
rgb24_to_y_8_c: 19.7
rgb24_to_y_8_neon: 27.7
rgb24_to_y_128_c: 129.9
rgb24_to_y_128_neon: 96.7
rgb24_to_y_1080_c: 995.4
rgb24_to_y_1080_neon: 767.7
rgb24_to_y_1920_c: 1747.4
rgb24_to_y_1920_neon: 1337.2

Note that both tests use clang as the compiler, which has
auto-vectorization enabled by default at -O3.

Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-06-11 01:12:09 +08:00
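As an illustration of what an rgb24_to_y routine computes, here is a scalar sketch that converts packed RGB24 to 8-bit limited-range luma using the widely quoted BT.601 integer approximation. This is an assumption chosen for illustration: swscale uses its own coefficient tables and a higher-precision output, and the function name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Convert `width` packed RGB24 pixels (R, G, B byte order) to 8-bit luma.
 * Uses the common BT.601 integer approximation; hypothetical helper,
 * not the swscale code path. */
static void rgb24_to_y8(uint8_t *dst, const uint8_t *src, int width)
{
    assert(dst && src && width >= 0);
    for (int i = 0; i < width; i++) {
        int r = src[3 * i + 0];
        int g = src[3 * i + 1];
        int b = src[3 * i + 2];
        /* +128 rounds before the shift; +16 is the limited-range offset. */
        dst[i] = (uint8_t)(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);
    }
}
```

Because the source stride is three bytes per pixel, a vector version has to deinterleave the R, G and B components before the multiply-accumulate step.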
Zhao Zhili
b1240c983f tests/checkasm: Fix build error when enable linux perf on Android
B0 is defined by system header, see f0f596dbc6 for ref.

Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-06-11 01:11:46 +08:00
Zhao Zhili
33e4cc963d avutil/timer: Add clock_gettime as a fallback of AV_READ_TIME
Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-06-11 01:11:36 +08:00
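A timer fallback of this kind can be sketched as follows. This is a minimal example assuming a POSIX system with CLOCK_MONOTONIC; the helper name is hypothetical and the clock id and return unit used in libavutil/timer.h may differ.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Return a monotonic timestamp in nanoseconds.
 * Hypothetical helper illustrating a clock_gettime-based timer. */
static uint64_t read_time_ns(void)
{
    struct timespec ts;
    int ret = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(ret == 0);
    (void)ret; /* silence unused warning when NDEBUG is defined */
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
```

Such a timer is coarser than a hardware cycle counter, but it is available from user space on platforms where the counter cannot be read directly.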
Zhao Zhili
6a18c0bc87 avutil/aarch64: Skip define AV_READ_TIME for apple
It will fall back to mach_absolute_time inside libavutil/timer.h.

Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-06-11 01:10:42 +08:00
James Almer
94f2274a8b x86/aacencdsp: fix ff_aac_quantize_bands_avx on unix64 ABI
Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 17:16:02 -03:00
James Almer
17c3cc5bb6 swscale/x86/rgb_2_rgb: add missing wrap to ff_uyvytoyuv422_avx2
Fixes old yasm.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 16:04:36 -03:00
James Almer
03546f49a3 swscale/x86/rgb2rgb: add missing wrap for ff_uyvytoyuv422_avx2
Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 15:56:52 -03:00
James Almer
287d139b77 checkasm/sw_rgb: fix alignment of buffers for rgb_to_yuv tests
src is apparently not guaranteed to be >8 byte aligned, but align to 16
nonetheless as the x86 asm will do unaligned loads anyway.
dst is guaranteed to be 32 byte aligned for the Y plane, but 16 byte for UV.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 14:12:51 -03:00
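The alignment requirements described above can be expressed in plain C11 as in the sketch below. The buffer names and sizes are hypothetical, and checkasm uses FFmpeg's own alignment macros rather than alignas.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical test buffers: 16-byte aligned source, 32-byte aligned
 * Y destination, 16-byte aligned UV destination. */
static alignas(16) uint8_t src_buf[64 * 3];
static alignas(32) uint8_t dst_y[64 * 2];
static alignas(16) uint8_t dst_uv[64 * 2];

/* Return nonzero if `p` is aligned to `align`, which must be a power of two. */
static int is_aligned(const void *p, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}
```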
James Almer
e8cef5e152 swscale/x86/rgb2rgb: remove mmxext version of shuffle_bytes_2103
Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 13:43:11 -03:00
James Almer
c578bb9864 swscale/x86/input: add AVX2 optimized uyvytoyuv422
uyvytoyuv422_c: 23991.8
uyvytoyuv422_sse2: 2817.8
uyvytoyuv422_avx: 2819.3
uyvytoyuv422_avx2: 1972.3

Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 13:43:11 -03:00
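Benchmark figures like these are easiest to compare as ratios. A small sketch, assuming the numbers are timer counts where lower is better (the helper name is hypothetical):

```c
#include <assert.h>

/* Speedup of an optimized routine relative to a reference routine,
 * given their measured costs (lower cost is better). */
static double speedup(double ref_cost, double opt_cost)
{
    assert(ref_cost > 0.0 && opt_cost > 0.0);
    return ref_cost / opt_cost;
}
```

With the figures above, speedup(23991.8, 1972.3) puts the AVX2 version at roughly 12 times faster than C, and speedup(2817.8, 1972.3) at roughly 1.4 times faster than SSE2.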
James Almer
e9cfd53257 swscale/x86/input: add AVX2 optimized RGB32 to YUV functions
abgr_to_uv_8_c: 43.3
abgr_to_uv_8_sse2: 14.3
abgr_to_uv_8_avx: 15.3
abgr_to_uv_8_avx2: 18.8
abgr_to_uv_128_c: 650.3
abgr_to_uv_128_sse2: 110.8
abgr_to_uv_128_avx: 112.3
abgr_to_uv_128_avx2: 64.8
abgr_to_uv_1080_c: 5456.3
abgr_to_uv_1080_sse2: 888.8
abgr_to_uv_1080_avx: 900.8
abgr_to_uv_1080_avx2: 518.3
abgr_to_uv_1920_c: 9692.3
abgr_to_uv_1920_sse2: 1593.8
abgr_to_uv_1920_avx: 1613.3
abgr_to_uv_1920_avx2: 864.8
abgr_to_y_8_c: 23.3
abgr_to_y_8_sse2: 12.8
abgr_to_y_8_avx: 13.3
abgr_to_y_8_avx2: 17.3
abgr_to_y_128_c: 308.3
abgr_to_y_128_sse2: 67.3
abgr_to_y_128_avx: 66.8
abgr_to_y_128_avx2: 44.8
abgr_to_y_1080_c: 2371.3
abgr_to_y_1080_sse2: 512.8
abgr_to_y_1080_avx: 505.8
abgr_to_y_1080_avx2: 314.3
abgr_to_y_1920_c: 4177.3
abgr_to_y_1920_sse2: 915.8
abgr_to_y_1920_avx: 926.8
abgr_to_y_1920_avx2: 519.3
bgra_to_uv_8_c: 37.3
bgra_to_uv_8_sse2: 13.3
bgra_to_uv_8_avx: 14.8
bgra_to_uv_8_avx2: 19.8
bgra_to_uv_128_c: 563.8
bgra_to_uv_128_sse2: 111.3
bgra_to_uv_128_avx: 112.3
bgra_to_uv_128_avx2: 64.8
bgra_to_uv_1080_c: 4691.8
bgra_to_uv_1080_sse2: 893.8
bgra_to_uv_1080_avx: 899.8
bgra_to_uv_1080_avx2: 517.8
bgra_to_uv_1920_c: 8332.8
bgra_to_uv_1920_sse2: 1590.8
bgra_to_uv_1920_avx: 1605.8
bgra_to_uv_1920_avx2: 867.3
bgra_to_y_8_c: 22.3
bgra_to_y_8_sse2: 12.8
bgra_to_y_8_avx: 12.8
bgra_to_y_8_avx2: 17.3
bgra_to_y_128_c: 291.3
bgra_to_y_128_sse2: 67.8
bgra_to_y_128_avx: 69.3
bgra_to_y_128_avx2: 45.3
bgra_to_y_1080_c: 2357.3
bgra_to_y_1080_sse2: 508.3
bgra_to_y_1080_avx: 518.3
bgra_to_y_1080_avx2: 399.8
bgra_to_y_1920_c: 4202.8
bgra_to_y_1920_sse2: 906.8
bgra_to_y_1920_avx: 907.3
bgra_to_y_1920_avx2: 526.3

Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 13:43:11 -03:00
James Almer
d5fe99dc5f swscale/x86/input: add AVX2 optimized RGB24 to YUV functions
rgb24_to_uv_8_c: 39.3
rgb24_to_uv_8_sse2: 14.3
rgb24_to_uv_8_ssse3: 13.3
rgb24_to_uv_8_avx: 12.8
rgb24_to_uv_8_avx2: 14.3
rgb24_to_uv_128_c: 582.8
rgb24_to_uv_128_sse2: 127.3
rgb24_to_uv_128_ssse3: 107.3
rgb24_to_uv_128_avx: 111.3
rgb24_to_uv_128_avx2: 62.3
rgb24_to_uv_1080_c: 4981.3
rgb24_to_uv_1080_sse2: 1048.3
rgb24_to_uv_1080_ssse3: 876.8
rgb24_to_uv_1080_avx: 887.8
rgb24_to_uv_1080_avx2: 492.3
rgb24_to_uv_1280_c: 5906.8
rgb24_to_uv_1280_sse2: 1263.3
rgb24_to_uv_1280_ssse3: 1048.3
rgb24_to_uv_1280_avx: 1045.8
rgb24_to_uv_1280_avx2: 579.8
rgb24_to_uv_1920_c: 8665.3
rgb24_to_uv_1920_sse2: 1888.8
rgb24_to_uv_1920_ssse3: 1571.8
rgb24_to_uv_1920_avx: 1558.8
rgb24_to_uv_1920_avx2: 869.3
rgb24_to_y_8_c: 20.3
rgb24_to_y_8_sse2: 11.8
rgb24_to_y_8_ssse3: 10.3
rgb24_to_y_8_avx: 10.3
rgb24_to_y_8_avx2: 10.8
rgb24_to_y_128_c: 284.8
rgb24_to_y_128_sse2: 83.3
rgb24_to_y_128_ssse3: 66.8
rgb24_to_y_128_avx: 64.8
rgb24_to_y_128_avx2: 39.3
rgb24_to_y_1080_c: 2451.3
rgb24_to_y_1080_sse2: 696.3
rgb24_to_y_1080_ssse3: 516.8
rgb24_to_y_1080_avx: 518.8
rgb24_to_y_1080_avx2: 301.8
rgb24_to_y_1280_c: 2892.8
rgb24_to_y_1280_sse2: 816.8
rgb24_to_y_1280_ssse3: 623.3
rgb24_to_y_1280_avx: 616.3
rgb24_to_y_1280_avx2: 350.8
rgb24_to_y_1920_c: 4338.8
rgb24_to_y_1920_sse2: 1210.8
rgb24_to_y_1920_ssse3: 928.3
rgb24_to_y_1920_avx: 920.3
rgb24_to_y_1920_avx2: 534.8

Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 13:42:09 -03:00
James Almer
6743c2fc6a checkasm/sw_rgb: test rgb32/rgb32_1 to yuv
Test all four pixel formats, but only bench the two native endian ones for a
given target.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 12:29:49 -03:00
James Almer
91b9af0058 x86/aacencdsp: add AVX version of quantize_bands
quant_bands_signed_c: 1928.0
quant_bands_signed_sse2: 406.0
quant_bands_signed_avx: 207.0
quant_bands_unsigned_c: 1702.0
quant_bands_unsigned_sse2: 404.0
quant_bands_unsigned_avx: 209.0

Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 12:29:49 -03:00
Rémi Denis-Courmont
7a3369398f sws/input: R-V V 32-bit RGB to halved UV
T-Head C908:
abgr_to_uv_half_8_c:            2.2
abgr_to_uv_half_8_rvv_i32:      3.5
abgr_to_uv_half_128_c:         44.0
abgr_to_uv_half_128_rvv_i32:   13.0
abgr_to_uv_half_1080_c:       245.0
abgr_to_uv_half_1080_rvv_i32: 107.2
abgr_to_uv_half_1920_c:       406.2
abgr_to_uv_half_1920_rvv_i32: 188.7
bgra_to_uv_half_8_c:            2.2
bgra_to_uv_half_8_rvv_i32:      3.5
bgra_to_uv_half_128_c:         26.5
bgra_to_uv_half_128_rvv_i32:   13.0
bgra_to_uv_half_1080_c:       219.7
bgra_to_uv_half_1080_rvv_i32: 107.0
bgra_to_uv_half_1920_c:       406.7
bgra_to_uv_half_1920_rvv_i32: 188.7

SpacemiT X60:
abgr_to_uv_half_8_c:           2.2
abgr_to_uv_half_8_rvv_i32:     3.0
abgr_to_uv_half_128_c:        28.2
abgr_to_uv_half_128_rvv_i32:   5.7
abgr_to_uv_half_1080_c:      235.5
abgr_to_uv_half_1080_rvv_i32: 47.7
abgr_to_uv_half_1920_c:      418.2
abgr_to_uv_half_1920_rvv_i32: 84.0
bgra_to_uv_half_8_c:           2.0
bgra_to_uv_half_8_rvv_i32:     3.0
bgra_to_uv_half_128_c:        23.7
bgra_to_uv_half_128_rvv_i32:   5.7
bgra_to_uv_half_1080_c:      195.5
bgra_to_uv_half_1080_rvv_i32: 47.7
bgra_to_uv_half_1920_c:      346.5
bgra_to_uv_half_1920_rvv_i32: 84.0
2024-06-09 14:33:04 +03:00
Rémi Denis-Courmont
e2f069905e sws/input: R-V V 32-bit RGB to UV
2024-06-09 14:33:04 +03:00
Rémi Denis-Courmont
f5555cb106 sws/input: R-V V 32-bit RGB to Y
T-Head C908:
abgr_to_y_8_c:            2.5
abgr_to_y_8_rvv_i32:      2.2
abgr_to_y_128_c:         37.0
abgr_to_y_128_rvv_i32:    8.5
abgr_to_y_1080_c:       327.0
abgr_to_y_1080_rvv_i32:  69.5
abgr_to_y_1920_c:       552.0
abgr_to_y_1920_rvv_i32: 122.2
bgra_to_y_8_c:            2.5
bgra_to_y_8_rvv_i32:      2.2
bgra_to_y_128_c:         37.2
bgra_to_y_128_rvv_i32:    8.5
bgra_to_y_1080_c:       310.2
bgra_to_y_1080_rvv_i32:  69.5
bgra_to_y_1920_c:       568.2
bgra_to_y_1920_rvv_i32: 122.5

SpacemiT X60:
abgr_to_y_8_c:            2.5
abgr_to_y_8_rvv_i32:      2.0
abgr_to_y_128_c:         33.0
abgr_to_y_128_rvv_i32:    3.7
abgr_to_y_1080_c:       276.0
abgr_to_y_1080_rvv_i32:  31.5
abgr_to_y_1920_c:       493.7
abgr_to_y_1920_rvv_i32:  55.5
bgra_to_y_8_c:            2.2
bgra_to_y_8_rvv_i32:      2.0
bgra_to_y_128_c:         33.0
bgra_to_y_128_rvv_i32:    3.7
bgra_to_y_1080_c:       276.0
bgra_to_y_1080_rvv_i32:  31.5
bgra_to_y_1920_c:       490.7
bgra_to_y_1920_rvv_i32:  55.5
2024-06-09 14:33:04 +03:00
Andreas Rheinhardt
8b62fb231a swscale/x86/rgb2rgb: Detemplatize
Every function in rgb2rgb_template.c is only compiled exactly
once; there is no overlap at all between the MMXEXT and the
SSE2 functions, so detemplatize it.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 12:03:47 +02:00
Andreas Rheinhardt
5421dee0e7 swscale/x86/rgb2rgb_template: Remove unused uyvytoyv12
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 12:03:47 +02:00
Andreas Rheinhardt
c1c35380a7 swscale/x86/rgb2rgb: Don't unnecessarily check for inline ASM
The SSE2 and AVX versions of deinterleaveBytes are external ASM.
Move them out of the inline ASM template.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 12:03:47 +02:00
Andreas Rheinhardt
f7305eb3b3 swscale/x86/rgb2rgb_template: Remove unnecessary SFENCE
The ff_nv12ToUV_* functions don't use non-temporal stores
at all.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 12:03:47 +02:00
Andreas Rheinhardt
fca796ac3b tests/checkasm/sw_rgb: Be more strict about clobbering MMX state
The MMXEXT versions of the rgb2rgb functions tested here
always emit emms on their own. Therefore one can use
a stricter test to ensure that it stays that way.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 12:03:47 +02:00
Andreas Rheinhardt
3af6136669 avcodec/dnxhdenc: Simplify padding
It is unnecessary to first pad to 32 bits; the memset later
will pad everything with zeroes anyway.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 10:59:33 +02:00
Andreas Rheinhardt
b0e0b3c58a avcodec/dnxhdenc: Move PutBitContext from ctx to stack
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 10:59:33 +02:00
Andreas Rheinhardt
542abee213 avcodec/cbs_h266_syntax_template: Use correct format specifier
H266RawSliceHeader.num_entry_points is a uint32_t.
Fixes -Wformat warnings:
https://fate.ffmpeg.org/log.cgi?slot=aarch64-osx-clang-1200.0.32.29&time=20240604151047&log=compile

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 10:59:33 +02:00
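One portable way to print a uint32_t without triggering -Wformat is the PRIu32 macro from <inttypes.h>. The sketch below only illustrates that technique: the helper is hypothetical and the exact specifier chosen in the commit is not shown in this log.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a uint32_t field with the matching specifier from <inttypes.h>.
 * Returns what snprintf returns. Hypothetical helper. */
static int format_entry_points(char *buf, size_t size, uint32_t num_entry_points)
{
    assert(buf && size > 0);
    return snprintf(buf, size, "num_entry_points=%" PRIu32, num_entry_points);
}
```

Using "%d" or "%u" happens to work where int is 32 bits wide, but the compiler may still warn when uint32_t is defined as a different underlying type on the target.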
Andreas Rheinhardt
8f199cfb5b avformat/evc: Fix format specifiers
Fixes -Wformat warnings; see e.g.
https://fate.ffmpeg.org/log.cgi?slot=aarch64-osx-clang-1200.0.32.29&time=20240604151047&log=compile

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 10:59:33 +02:00
Andreas Rheinhardt
5f31a4fd16 avformat/vvc: Don't use uint8_t iterators, fix shadowing
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 10:59:33 +02:00
Andreas Rheinhardt
1c4362cce9 avformat/vvc: Fix comment
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 10:59:33 +02:00
Andreas Rheinhardt
fa77dc8c44 avformat/vvc: Reindent after the previous commit
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 10:59:33 +02:00
Andreas Rheinhardt
8b6c7e7cda avformat/vvc: Fix crash on allocation failure, avoid allocations
This is the VVC version of 8b5d155301.

(Hint: This ensures that the order of NALU arrays is OPI-VPS-SPS-PPS-
Prefix-SEI-Suffix-SEI, regardless of the order in the original
extradata. I hope this is right.)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 10:59:33 +02:00
Andreas Rheinhardt
4482b3353d avformat/vvc: Don't use ff_copy_bits()
There is no benefit in using it: The fast path of copying
is not taken because of misalignment; furthermore we are
only dealing with a few bytes here anyway, so simply copy
the bytes manually, avoiding the dependency on bitstream.c
in lavf (which also contains a function that is completely
unused in lavf).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 10:59:33 +02:00
Andreas Rheinhardt
52fb49a8a3 avformat/vvc: Use put_bytes_output()
The PutBitContext has just been flushed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 10:59:33 +02:00
Andreas Rheinhardt
dd8fb0aaae avcodec/hevc/Makefile: Move rules for lavc/* files to lavc/Makefile
If any of these files (say A) would be changed in such a way
that A acquires a new dependency on another file B, building B
would need to be added to all the rules that lead to A being built.
Yet currently the rules for several files are spread over
the lavc Makefile and the Makefile of the lavc/hevc subdir, making
it more likely to be forgotten. So move the rules for these files
to the lavc/Makefile.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 10:59:33 +02:00
Rémi Denis-Courmont
daac101e61 lavc/aacencdsp: fix rounding in R-V V quantize_bands
We need to round toward zero here.
2024-06-08 18:30:43 +03:00
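The difference between rounding toward zero and rounding to nearest can be seen in scalar C, where a float-to-int cast always truncates toward zero. This is a generic sketch with hypothetical helper names, not the vector code touched by the commit.

```c
#include <assert.h>

/* Round toward zero: a C float-to-int cast discards the fraction. */
static int to_int_toward_zero(float v)
{
    return (int)v;
}

/* Round to nearest, halves away from zero, for comparison. */
static int to_int_nearest(float v)
{
    return (int)(v + (v >= 0.0f ? 0.5f : -0.5f));
}
```

For 2.7 the two helpers give 2 and 3, and for -2.7 they give -2 and -3, so an optimized routine that uses a different rounding mode than the C reference would produce mismatching output.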
Rémi Denis-Courmont
658439934b lavc/vp8dsp: R-V V vp8_idct_add
T-Head C908 (cycles):
vp8_idct_add_c:       312.2
vp8_idct_add_rvv_i32: 117.0
2024-06-08 18:30:43 +03:00
Rémi Denis-Courmont
e0f4d185f1 sws/input: R-V V rgb24ToUV_half and bgr24ToUV_half
T-Head C908:
rgb24_to_uv_half_4_c:           2.0
rgb24_to_uv_half_4_rvv_i32:     3.5
rgb24_to_uv_half_64_c:         27.0
rgb24_to_uv_half_64_rvv_i32:   12.5
rgb24_to_uv_half_540_c:       223.7
rgb24_to_uv_half_540_rvv_i32: 105.2
rgb24_to_uv_half_640_c:       265.5
rgb24_to_uv_half_640_rvv_i32: 123.7
rgb24_to_uv_half_960_c:       414.5
rgb24_to_uv_half_960_rvv_i32: 249.5

SpacemiT X60:
rgb24_to_uv_half_4_c:           1.7
rgb24_to_uv_half_4_rvv_i32:     4.2
rgb24_to_uv_half_64_c:         24.0
rgb24_to_uv_half_64_rvv_i32:    8.7
rgb24_to_uv_half_540_c:       199.2
rgb24_to_uv_half_540_rvv_i32:  72.5
rgb24_to_uv_half_640_c:       235.7
rgb24_to_uv_half_640_rvv_i32:  85.2
rgb24_to_uv_half_960_c:       353.5
rgb24_to_uv_half_960_rvv_i32: 127.5
2024-06-08 18:30:43 +03:00
Rémi Denis-Courmont
3ef5867e4b sws/input: R-V V rgb24ToUV and bgr24ToUV
T-Head C908:
rgb24_to_uv_8_c:            2.7
rgb24_to_uv_8_rvv_i32:      3.2
rgb24_to_uv_128_c:         41.0
rgb24_to_uv_128_rvv_i32:   12.7
rgb24_to_uv_1080_c:       342.5
rgb24_to_uv_1080_rvv_i32: 105.7
rgb24_to_uv_1280_c:       406.0
rgb24_to_uv_1280_rvv_i32: 124.2
rgb24_to_uv_1920_c:       626.0
rgb24_to_uv_1920_rvv_i32: 186.0

SpacemiT X60:
rgb24_to_uv_8_c:            2.5
rgb24_to_uv_8_rvv_i32:      3.0
rgb24_to_uv_128_c:         36.5
rgb24_to_uv_128_rvv_i32:    5.7
rgb24_to_uv_1080_c:       304.2
rgb24_to_uv_1080_rvv_i32:  49.0
rgb24_to_uv_1280_c:       360.5
rgb24_to_uv_1280_rvv_i32:  57.5
rgb24_to_uv_1920_c:       540.7
rgb24_to_uv_1920_rvv_i32:  86.2
2024-06-08 18:30:43 +03:00
Rémi Denis-Courmont
79dfdac4db sws/input: R-V V rgb24ToY & bgr24ToY
T-Head C908:
rgb24_to_y_8_c:            2.0
rgb24_to_y_8_rvv_i32:      2.7
rgb24_to_y_128_c:         26.2
rgb24_to_y_128_rvv_i32:    9.2
rgb24_to_y_1080_c:       219.5
rgb24_to_y_1080_rvv_i32:  76.2
rgb24_to_y_1280_c:       276.2
rgb24_to_y_1280_rvv_i32:  89.7
rgb24_to_y_1920_c:       389.7
rgb24_to_y_1920_rvv_i32: 134.2

SpacemiT X60:
rgb24_to_y_8_c:            1.7
rgb24_to_y_8_rvv_i32:      2.2
rgb24_to_y_128_c:         23.2
rgb24_to_y_128_rvv_i32:    4.2
rgb24_to_y_1080_c:       195.0
rgb24_to_y_1080_rvv_i32:  33.7
rgb24_to_y_1280_c:       231.0
rgb24_to_y_1280_rvv_i32:  40.0
rgb24_to_y_1920_c:       346.2
rgb24_to_y_1920_rvv_i32:  59.7
2024-06-08 18:30:43 +03:00
Wenbin Chen
7560db937d libavfi/dnn: enable LibTorch xpu device option support
Add xpu device support to libtorch backend.
To enable xpu support you need to add
 "-Wl,--no-as-needed -lintel-ext-pt-gpu -Wl,--as-needed" to
"--extra-libs" when configuring ffmpeg.

Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
2024-06-08 19:45:21 +08:00
Nuo Mi
f68f40736f avcodec/vvcdec: support mv wraparound
A 360 video specific tool
see https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9503377

passed files:
    DMVR_A_Huawei_3.bit
    WRAP_D_InterDigital_4.bit
    WRAP_A_InterDigital_4.bit
    WRAP_B_InterDigital_4.bit
    WRAP_C_InterDigital_4.bit
    ERP_A_MediaTek_3.bit
2024-06-08 17:45:55 +08:00
Nuo Mi
685174069f avcodec/vvcdec: misc, reindent inter.c
2024-06-08 17:45:55 +08:00
Nuo Mi
a4013e748a avcodec/vvcdec: refact out emulated_edge_no_wrap
Prepare for reference wraparound.
2024-06-08 17:45:55 +08:00
Nuo Mi
8abdf0a28e avcodec/vvcdec: misc, move src offset inside emulated_edge
2024-06-08 17:45:55 +08:00
Nuo Mi
2d98786fee avcodec/vvcdec: refact, remove emulated_edge_dmvr and emulated_edge_bilinear to simplify code
2024-06-08 17:45:55 +08:00
Lynne
714596bcbf aacdec_usac: zero out alpha values for the current frame
2024-06-08 00:22:41 +02:00
Lynne
c2d459cb51 aacdec_usac: fix stereo alpha values for transients
Typo.
Also added comments and fixed the branch underneath.
2024-06-08 00:22:40 +02:00
Lynne
7223523335 aacdec_usac: use correct TNS values
The standard slightly modified the maximum TNS bands allowed.
2024-06-08 00:22:40 +02:00
Lynne
9b41cc0430 aacdec_usac: do not round noise amplitude values
Use floating point division instead of integer division.
2024-06-08 00:22:40 +02:00
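The class of bug fixed here is easy to reproduce in isolation: dividing two integers in C discards the fractional part before the result is converted to float. The sketch below uses a hypothetical divisor of 16 and hypothetical helper names; the actual expression in the decoder is not shown in this log.

```c
#include <assert.h>

/* Integer division: the fraction is lost before conversion to float. */
static float scale_int_div(int level)
{
    return (float)(level / 16);
}

/* Floating-point division: the fraction is preserved. */
static float scale_float_div(int level)
{
    return level / 16.0f;
}
```

For level = 24 the first helper returns 1.0 and the second returns 1.5.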
Lynne
a18d0659f4 aacdec_usac: skip coeff decoding if the number to be decoded is 0
Yet another thing not mentioned in the spec.
2024-06-08 00:22:39 +02:00