Commit Graph

18 Commits

Author SHA1 Message Date
Rémi Denis-Courmont 6d60cc7baf sws/rgb2rgb: fix unaligned accesses in R-V V YUYV to I422p
In my personal opinion, we should not need to support unaligned YUY2
pixel maps. They should always be aligned to at least 32 bits, and the
current code assumes just 16 bits. However checkasm does test for
unaligned input bitmaps. QEMU accepts it, but real hardware dose not.

In this particular case, we can at the same time improve performance and
handle unaligned inputs, so do just that.

uyvytoyuv422_c:      104379.0
uyvytoyuv422_c:      104060.0
uyvytoyuv422_rvv_i32: 25284.0 (before)
uyvytoyuv422_rvv_i32: 19303.2 (after)
2023-11-13 18:34:29 +02:00
Rémi Denis-Courmont 5b8b5ec9c5 sws/rgb2rgb: rework R-V V YUY2 to 4:2:2 planar
This saves three scratch registers and three instructions per line. The
performance gains are mostly negligible. The main point is to free up
registers for further rework.
2023-11-13 18:34:29 +02:00
Rémi Denis-Courmont 19baf4e009 swscale/rgb2rgb: R-V V deinterleaveBytes 2023-10-03 22:53:20 +03:00
Rémi Denis-Courmont ede3215115 swscale/rgb2rgb: fix extra iteration in R-V V interleave
There was an additional iteration doing nothing for each line,
due to checking the selected vector length instead of the available
vector length.
2023-10-03 22:53:20 +03:00
Rémi Denis-Courmont d14130aea3 swscale/rgb2rgb: unroll R-V V interleave_bytes 2023-10-03 20:48:47 +03:00
Rémi Denis-Courmont 6269c4a440 swscale/rgb2rgb: unroll RISC-V V uyvytoyuv422 2023-10-03 20:48:39 +03:00
Rémi Denis-Courmont e50f8e861b swscale/rgb2rgb: avoid S-regs in RISC-V V uyvytoyuv422
We can make do with callee-clobbered registers only now.
As an added bonus, this makes the code XLEN-independent.
2023-10-03 20:48:39 +03:00
Rémi Denis-Courmont be37a2e364 swscale/rgb2rgb: rework RISC-V V uyvytoyuv422
This avoids using relatively slow register strides.
2023-10-03 20:48:39 +03:00
Rémi Denis-Courmont 1a4bd76ea5 swscale/rgb2rgb: remove R-V V shuffle_bytes_3012
This is slower than the Zbb version on real hardware due to register
strides. Proper support for vector byte-swap requires the Zvbb
extension, but it's much too early for me to worry about it.
2023-10-02 22:28:38 +03:00
Rémi Denis-Courmont c4a144c29d swscale/rgb2rgb: add R-V Zbb shuffle_bytes_3210 2023-10-02 22:28:25 +03:00
Rémi Denis-Courmont c2b38619c0 swscale/rgb2rgb2: rework RISC-V V shuffle_bytes_{1230,3012}
This avoids strided loads.

Before:
shuffle_bytes_1230_rvv_i32: 308.7
shuffle_bytes_3012_rvv_i32: 308.7

After:
shuffle_bytes_1230_rvv_i32: 46.7
shuffle_bytes_3012_rvv_i32: 46.7
2023-07-21 22:18:02 +03:00
Rémi Denis-Courmont 15982554e6 swscale/rgb2rgb2: rework RISC-V V shuffle_bytes_{0321,2103}
This avoids strided loads.

Before:
shuffle_bytes_0321_rvv_i32: 307.7
shuffle_bytes_2103_rvv_i32: 308.7

After:
shuffle_bytes_0321_rvv_i32: 59.7
shuffle_bytes_2103_rvv_i32: 61.5
2023-07-21 22:18:02 +03:00
Rémi Denis-Courmont d3948e4db5 swscale: inline ff_shuffle_bytes_3210_rvv
No functional changes.
2023-07-21 22:18:02 +03:00
Rémi Denis-Courmont b6585eb04c lavu: add/use flag for RISC-V Zba extension
The code was blindly assuming that Zbb or V implied Zba. While the
earlier is practically always true, the later broke some QEMU setups,
as V was introduced earlier than Zba.
2023-07-19 19:29:35 +03:00
Khem Raj a7b3c0203f libswscale/riscv: fix syntax of vsetvli
Add missing operand which clang complains about but GCC assumes it to be
'm1' if not specified.

Works around build failure with Clang:
| src/libswscale/riscv/rgb2rgb_rvv.S:88:25: error: operand must be e[8|16|32|64|128|256|512|1024],m[1|2|4|8|f2|f4|f8],[ta|tu],[ma|mu]
|         vsetvli t4, t3, e8, ta, ma
|                         ^

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2023-07-13 22:01:24 +03:00
Rémi Denis-Courmont a1bfb5290e sws/rgb2rgb: RISC-V 64-bit V packed YUYV/UYVY to planar 4:2:2
This is currently 64-bit only because the stack spilling code would not
assemble on RV32I (and it would corrupt s0 and s1 on RV128I, in theory).

This could be added later in the unlikely that someone wants it.
2022-09-30 07:25:44 +02:00
Rémi Denis-Courmont 9181835a24 sws/rgb2rgb: RISC-V V interleaveBytes 2022-09-30 07:24:09 +02:00
Rémi Denis-Courmont 66a03f4053 sws/rgb2rgb: RISC-V V shuffle_bytes_xxxx functions 2022-09-30 07:24:09 +02:00