7 Commits

Author SHA1 Message Date
Ramiro Polla
52887683e9 swscale/aarch64: add nv24/nv42 to yuv420p unscaled converter
                          A55               A76
nv24_yuv420p_128_c:       4956.1            1267.0
nv24_yuv420p_128_neon:    3109.1 ( 1.59x)    640.0 ( 1.98x)
nv24_yuv420p_1920_c:     35728.4           11736.2
nv24_yuv420p_1920_neon:   8011.1 ( 4.46x)   2436.0 ( 4.82x)
nv42_yuv420p_128_c:       4956.4            1270.5
nv42_yuv420p_128_neon:    3074.6 ( 1.61x)    639.5 ( 1.99x)
nv42_yuv420p_1920_c:     35685.9           11732.5
nv42_yuv420p_1920_neon:   7995.1 ( 4.46x)   2437.2 ( 4.81x)
2024-08-26 11:04:46 +02:00
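For readers unfamiliar with the formats involved: NV24 stores a full-resolution Y plane followed by a full-resolution plane of interleaved U,V pairs, while yuv420p stores three separate planes with chroma subsampled by two in both directions. The scalar sketch below illustrates that layout change. It is not the swscale code: the function name and signature are hypothetical, even dimensions are assumed, and averaging each 2x2 chroma block with rounding is an assumption about how the downsampling is done.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the swscale API): NV24 -> yuv420p.
 * Assumes width and height are even; strides are in bytes. */
static void nv24_to_yuv420p_sketch(const uint8_t *src_y, ptrdiff_t src_y_stride,
                                   const uint8_t *src_uv, ptrdiff_t src_uv_stride,
                                   uint8_t *dst_y, ptrdiff_t dst_y_stride,
                                   uint8_t *dst_u, uint8_t *dst_v,
                                   ptrdiff_t dst_c_stride,
                                   int width, int height)
{
    /* Luma has the same resolution in both formats: copy row by row. */
    for (int y = 0; y < height; y++)
        memcpy(dst_y + y * dst_y_stride, src_y + y * src_y_stride, (size_t)width);

    /* Chroma: average each 2x2 block (assumed) and split U from V. */
    for (int y = 0; y < height / 2; y++) {
        const uint8_t *row0 = src_uv + (2 * y) * src_uv_stride;
        const uint8_t *row1 = row0 + src_uv_stride;
        for (int x = 0; x < width / 2; x++) {
            const uint8_t *p0 = row0 + 4 * x; /* U0 V0 U1 V1 on the top row */
            const uint8_t *p1 = row1 + 4 * x; /* U0 V0 U1 V1 on the bottom row */
            dst_u[y * dst_c_stride + x] =
                (uint8_t)((p0[0] + p0[2] + p1[0] + p1[2] + 2) >> 2);
            dst_v[y * dst_c_stride + x] =
                (uint8_t)((p0[1] + p0[3] + p1[1] + p1[3] + 2) >> 2);
        }
    }
}
```

NV42 differs from NV24 only in the order of the interleaved pair (V first, then U), so under the same assumptions the sketch handles it by swapping the `dst_u` and `dst_v` arguments.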
Ramiro Polla
75f1a8e071 swscale/aarch64: add neon {lum,chr}ConvertRange
chrRangeFromJpeg_8_c: 29.2
chrRangeFromJpeg_8_neon: 19.5
chrRangeFromJpeg_24_c: 80.5
chrRangeFromJpeg_24_neon: 34.0
chrRangeFromJpeg_128_c: 413.7
chrRangeFromJpeg_128_neon: 156.0
chrRangeFromJpeg_144_c: 471.0
chrRangeFromJpeg_144_neon: 174.2
chrRangeFromJpeg_256_c: 842.0
chrRangeFromJpeg_256_neon: 305.5
chrRangeFromJpeg_512_c: 1699.0
chrRangeFromJpeg_512_neon: 608.0
chrRangeToJpeg_8_c: 51.7
chrRangeToJpeg_8_neon: 22.7
chrRangeToJpeg_24_c: 149.7
chrRangeToJpeg_24_neon: 38.0
chrRangeToJpeg_128_c: 761.7
chrRangeToJpeg_128_neon: 176.7
chrRangeToJpeg_144_c: 866.2
chrRangeToJpeg_144_neon: 198.7
chrRangeToJpeg_256_c: 1516.5
chrRangeToJpeg_256_neon: 348.7
chrRangeToJpeg_512_c: 3067.2
chrRangeToJpeg_512_neon: 692.7
lumRangeFromJpeg_8_c: 24.0
lumRangeFromJpeg_8_neon: 17.0
lumRangeFromJpeg_24_c: 56.7
lumRangeFromJpeg_24_neon: 21.0
lumRangeFromJpeg_128_c: 294.5
lumRangeFromJpeg_128_neon: 76.7
lumRangeFromJpeg_144_c: 332.5
lumRangeFromJpeg_144_neon: 86.7
lumRangeFromJpeg_256_c: 586.0
lumRangeFromJpeg_256_neon: 152.2
lumRangeFromJpeg_512_c: 1190.0
lumRangeFromJpeg_512_neon: 298.0
lumRangeToJpeg_8_c: 31.7
lumRangeToJpeg_8_neon: 19.5
lumRangeToJpeg_24_c: 83.5
lumRangeToJpeg_24_neon: 24.2
lumRangeToJpeg_128_c: 440.5
lumRangeToJpeg_128_neon: 91.0
lumRangeToJpeg_144_c: 504.2
lumRangeToJpeg_144_neon: 101.0
lumRangeToJpeg_256_c: 879.7
lumRangeToJpeg_256_neon: 177.2
lumRangeToJpeg_512_c: 1794.2
lumRangeToJpeg_512_neon: 354.0
2024-06-18 23:12:41 +02:00
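As background for the benchmark names above: in swscale, "Jpeg" range means full range (0..255 at 8 bits), so `ToJpeg` expands limited-range samples and `FromJpeg` compresses full-range ones. The sketch below shows the idea for 8-bit luma only, where limited range spans 16..235. It is an illustration under those assumptions, not the swscale routines, which work on wider intermediate samples with their own fixed-point constants; the function names here are hypothetical.

```c
#include <stdint.h>

/* Clamp an int to the 0..255 range of an 8-bit sample. */
static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Hypothetical: limited range (16..235) -> full range (0..255). */
static void luma_to_full_range(uint8_t *dst, const uint8_t *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = clip_u8(((src[i] - 16) * 255 + 109) / 219);
}

/* Hypothetical: full range (0..255) -> limited range (16..235). */
static void luma_to_limited_range(uint8_t *dst, const uint8_t *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (uint8_t)(16 + (src[i] * 219 + 127) / 255);
}
```

Chroma follows the same pattern but with a limited range of 16..240 centred on 128, which is why the luma and chroma cases are benchmarked separately.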
Zhao Zhili
9dac8495b0 swscale/aarch64: Add rgb24 to yuv implementation
Tested on Apple M1:

rgb24_to_uv_8_c: 0.0
rgb24_to_uv_8_neon: 0.2
rgb24_to_uv_128_c: 1.0
rgb24_to_uv_128_neon: 0.5
rgb24_to_uv_1080_c: 7.0
rgb24_to_uv_1080_neon: 5.7
rgb24_to_uv_1920_c: 12.5
rgb24_to_uv_1920_neon: 9.5
rgb24_to_uv_half_8_c: 0.2
rgb24_to_uv_half_8_neon: 0.2
rgb24_to_uv_half_128_c: 1.0
rgb24_to_uv_half_128_neon: 0.5
rgb24_to_uv_half_1080_c: 6.2
rgb24_to_uv_half_1080_neon: 3.0
rgb24_to_uv_half_1920_c: 11.2
rgb24_to_uv_half_1920_neon: 5.2
rgb24_to_y_8_c: 0.2
rgb24_to_y_8_neon: 0.0
rgb24_to_y_128_c: 0.5
rgb24_to_y_128_neon: 0.5
rgb24_to_y_1080_c: 4.7
rgb24_to_y_1080_neon: 3.2
rgb24_to_y_1920_c: 8.0
rgb24_to_y_1920_neon: 5.7

On Pixel 6:

rgb24_to_uv_8_c: 30.7
rgb24_to_uv_8_neon: 56.9
rgb24_to_uv_128_c: 213.9
rgb24_to_uv_128_neon: 173.2
rgb24_to_uv_1080_c: 1649.9
rgb24_to_uv_1080_neon: 1424.4
rgb24_to_uv_1920_c: 2907.9
rgb24_to_uv_1920_neon: 2480.7
rgb24_to_uv_half_8_c: 36.2
rgb24_to_uv_half_8_neon: 33.4
rgb24_to_uv_half_128_c: 167.9
rgb24_to_uv_half_128_neon: 99.4
rgb24_to_uv_half_1080_c: 1293.9
rgb24_to_uv_half_1080_neon: 778.7
rgb24_to_uv_half_1920_c: 2292.7
rgb24_to_uv_half_1920_neon: 1328.7
rgb24_to_y_8_c: 19.7
rgb24_to_y_8_neon: 27.7
rgb24_to_y_128_c: 129.9
rgb24_to_y_128_neon: 96.7
rgb24_to_y_1080_c: 995.4
rgb24_to_y_1080_neon: 767.7
rgb24_to_y_1920_c: 1747.4
rgb24_to_y_1920_neon: 1337.2

Note that both tests were built with clang, which enables
auto-vectorization by default at -O3, so the C baseline is already vectorized.

Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-06-11 01:12:09 +08:00
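To make the `rgb24_to_y` benchmarks concrete, here is a scalar sketch of what an RGB24 to luma conversion computes per pixel. It uses the widely quoted 8-bit integer approximation of the BT.601 limited-range luma equation. This is an assumption chosen for illustration: swscale itself takes its coefficients from a table and writes higher-precision intermediate samples, and the function name below is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper (not the swscale API): packed RGB24 -> 8-bit luma,
 * using the common BT.601 limited-range integer approximation. */
static void rgb24_to_y_sketch(uint8_t *dst, const uint8_t *src, int width)
{
    for (int i = 0; i < width; i++) {
        int r = src[3 * i + 0];
        int g = src[3 * i + 1];
        int b = src[3 * i + 2];
        /* Weighted sum with rounding, then the limited-range offset of 16. */
        dst[i] = (uint8_t)(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);
    }
}
```

Each output sample depends on three interleaved input bytes, which is the kind of access pattern that NEON structure loads are designed for and that compilers can also auto-vectorize, consistent with the modest gains over the clang-built C code reported above.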
Martin Storsjö
e0604d508e swscale: aarch64: Add a NEON implementation of interleaveBytes
This allows speeding up format conversions from yuv420 to nv12.

                             Cortex A53      A72      A73
interleave_bytes_c:             86077.5  51433.0  66972.0
interleave_bytes_neon:          19701.7  23019.2  15859.2
interleave_bytes_aligned_c:     86603.0  52017.2  67484.2
interleave_bytes_aligned_neon:   9061.0   7623.0   6309.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2020-05-15 23:38:17 +03:00
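The operation being accelerated is simple to state in scalar form: take one byte from each of two source planes in turn and write them as adjacent pairs, which is how separate U and V planes of yuv420p become the single interleaved chroma plane of nv12. The sketch below is a minimal reference under that reading; the function name and parameter order are hypothetical and are not claimed to match the swscale prototype.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference: interleave two byte planes into one.
 * dst must hold 2 * width bytes per row; strides are in bytes. */
static void interleave_bytes_sketch(const uint8_t *src1, const uint8_t *src2,
                                    uint8_t *dst, int width, int height,
                                    ptrdiff_t src1_stride, ptrdiff_t src2_stride,
                                    ptrdiff_t dst_stride)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            dst[2 * x + 0] = src1[x]; /* e.g. U sample */
            dst[2 * x + 1] = src2[x]; /* e.g. V sample */
        }
        src1 += src1_stride;
        src2 += src2_stride;
        dst  += dst_stride;
    }
}
```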
Clément Bœsch
c921f4f687 sws/aarch64: add ff_yuv2planeX_8_neon
2016-04-11 16:27:19 +02:00
Clément Bœsch
263eb76bdf sws/aarch64: add ff_hscale_8_to_15_neon
./ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -

    before: t:0.489726 avg:0.489883 max:0.491852 min:0.489482
    after:  t:0.256515 avg:0.256458 max:0.256999 min:0.253755
2016-03-31 10:12:55 +02:00
Clément Bœsch
f1148390d7 sws/aarch64: add {nv12,nv21,yuv420p,yuv422p}_to_{argb,rgba,abgr,rgba}_neon
2016-03-01 17:53:33 +01:00