Commit Graph

62 Commits

Author SHA1 Message Date
Rémi Denis-Courmont ee1526c05f lavu/riscv: add assembler macros for adjusting vector LMUL
vtype_vli computes the VTYPE value with the optimal LMUL for a given
element width, tail and mask policies and a run-time vector length.

vtype_ivli does the same, but with the compile-time constant vector
length.

vwtypei and vntypei can be used to widen or narrow a VTYPE value for
use in mixed-width vector-optimised functions.
2024-05-19 18:37:33 +03:00
Rémi Denis-Courmont 83e5fdd3f4 lavu/riscv: fix parsing the unaligned access capability
Pointed-out-by: Stefan O'Rear <sorear@fastmail.com>
2024-05-15 20:04:08 +03:00
Rémi Denis-Courmont 20fbc07af1 lavu/riscv: remove bogus B extension
The B Bit manipulation extension was not defined to this day, and
probably never will. Instead it was broken down into Zba, Zbb, Zbc and
Zbs with no particular blessed set to make up B.

This removes the bogus field test. Linux never set this bit, nor
(AFAICT) did FreeBSD or any other OS. We can always add it back in the
unlikely event that it gets taken into use.
2024-05-14 19:50:00 +03:00
Rémi Denis-Courmont b410439263 lavu/riscv: CPU flag for fast misaligned accesses 2024-05-14 19:50:00 +03:00
Rémi Denis-Courmont 61ec7450ff lavu/riscv: fallback to raw hwprobe() system call
Not all C run-times support this, and even then, it will be a while
before distributions provide recent enough versions thereof.

Since this is a trivial system call wrapper, we might just as well call
the corresponding kernel system call directly where the C run-time lacks
support but the kernel headers are new enough (as is the case on Debian
Unstable at the time of writing). In doing so, we need to add a few more
guards as the first suitable kernel (headers) release did not expose the
V, Zba and Zbb extensions.
2024-05-14 19:50:00 +03:00
Rémi Denis-Courmont 247c5b2b97 lavu/riscv: add ff_rv_vlen_least()
This inline function checks that the vector length is at least a given
value. With this, most run-time VLEN checks can be optimised away.
2024-05-13 18:36:07 +03:00
Rémi Denis-Courmont 5d8f62feb5 lavu/riscv: add Zvbb CPU capability detection
This requires Linux kernel version 6.8 or later.
2024-05-11 11:38:49 +03:00
Rémi Denis-Courmont 5afe734b6d lavu/riscv: remove bespoke assembler for MIN
This is no longer necessary as Zbb is now always explicitly required.
2024-05-10 18:59:06 +03:00
Rémi Denis-Courmont 89029baebd lavu/riscv: allow requesting a second extension 2024-05-10 18:59:06 +03:00
Rémi Denis-Courmont 1f150a68ac lavu/riscv: fix build without <sys/hwprobe.h> 2024-05-08 18:26:32 +03:00
Rémi Denis-Courmont 95d1052fba lavu/riscv: add hwprobe() for CPU detection
This adds the Linux-specific function call to detect CPU features. Unlike
the more portable auxillary vector, this supports extensions other than
single lettered ones. At this point, FFmpeg already needs this to detect
Zba and Zbb at run-time, and probably will need it for Zvbb in the near
future.

Support will be available in glibc 2.40 onward.
2024-05-06 22:09:41 +03:00
Rémi Denis-Courmont d7333ba6f2 lavu/riscv: indent code
This reindents code to prepare for the next changeset.
No functional changes.
2024-05-06 22:09:41 +03:00
Rémi Denis-Courmont e33ce0d9dd lavu/fixed_dsp: R-V V fmul_window_scaled
vector_fmul_window_scaled_fixed_c:       4393.7
vector_fmul_window_scaled_fixed_rvv_i64: 1642.7
2023-11-23 18:57:18 +02:00
Rémi Denis-Courmont e49f41fb27 lavu/float_dsp: optimise R-V V fmul_reverse & fmul_window
Roll the loop to avoid slow gathers.

Before:
vector_fmul_reverse_c:       1561.7
vector_fmul_reverse_rvv_f32: 2410.2
vector_fmul_window_c:        2068.2
vector_fmul_window_rvv_f32:  1879.5

After:
vector_fmul_reverse_c:       1561.7
vector_fmul_reverse_rvv_f32:  916.2
vector_fmul_window_c:        2068.2
vector_fmul_window_rvv_f32:  1202.5
2023-11-23 18:57:18 +02:00
Rémi Denis-Courmont 3a134e8299 lavu/fixed_dsp: optimise R-V V fmul_reverse
Gathers are (unsurprisingly) a notable exception to the rule that R-V V
gets faster with larger group multipliers. So roll the function to speed
it up.

Before:
vector_fmul_reverse_fixed_c:       2840.7
vector_fmul_reverse_fixed_rvv_i32: 2430.2

After:
vector_fmul_reverse_fixed_c:       2841.0
vector_fmul_reverse_fixed_rvv_i32:  962.2

It might be possible to further optimise the function by moving the
reverse-subtract out of the loop and adding ad-hoc tail handling.
2023-11-23 18:57:18 +02:00
Rémi Denis-Courmont cd6089dc9c riscv: fix builds without Zbb support 2023-11-18 22:01:59 +02:00
Rémi Denis-Courmont 04b49fb3c5 lavu/riscv: fix typo 2023-10-29 22:15:15 +02:00
Rémi Denis-Courmont f39a8790e1 lavu/fixed_dsp: R-V V vector_fmul_window 2023-10-09 19:52:28 +03:00
Rémi Denis-Courmont 10eb3b9c9f lavu/fixed_dsp: R-V V vector_fmul
vector_fmul_fixed_c: 4.0
vector_fmul_fixed_rvv_i64: 0.5
2023-10-09 19:52:28 +03:00
Rémi Denis-Courmont da7a77fb0a lavu/fixed_dsp: R-V V vector_fmul_reverse 2023-10-09 19:52:28 +03:00
Rémi Denis-Courmont bf911cc1bf lavu/fixed_dsp: R-V V vector_fmul_add
vector_fmul_add_fixed_c: 2.2
vector_fmul_add_fixed_rvv_i64: 0.5
2023-10-09 19:52:28 +03:00
Rémi Denis-Courmont 9091ffb006 lavu/float_dsp: adjust multipler in R-V V fmul_window
The gather index vector is only used as double-length (due to register
pressure), so no need to initialise it for quad-length. Basically this
matches the multiplier in the prologue to the the multipler in the loop.
2023-10-09 19:52:28 +03:00
Rémi Denis-Courmont eb73d178ea lavu/fixed_dsp: R-V V scalarproduct 2023-10-07 17:45:39 +03:00
Rémi Denis-Courmont 9240035c0e lavu/float_dsp: avoid reg-stride in R-V V fmul_window 2023-10-03 22:48:10 +03:00
Rémi Denis-Courmont 446b0090cb lavu/float_dsp: avoid reg-stride in R-V V reverse_fmul
This revectors the inner loop to reverse vectors element in vectors,
thus eliminating the negative register stride. Note that RVV does not
have a vector reverse instruction, so this uses a gather.
2023-10-03 20:48:47 +03:00
Rémi Denis-Courmont cec48e3b32 riscv: factor out the bswap32 assembler 2023-10-02 22:28:21 +03:00
Rémi Denis-Courmont 7a24d794f6 Revert "lavu/timer: remove gratuitous volatile"
It does not make much sense to me, but GCC somehow optimises the
inline assembler even though the output is very obviously used and
having observable side effects.

This reverts commit 09731fbfc3.
2023-09-28 17:48:18 +03:00
Rémi Denis-Courmont 6f8ac298da lavu/timer: specify RISC-V time unit 2023-08-24 20:58:57 +03:00
Rémi Denis-Courmont 09731fbfc3 lavu/timer: remove gratuitous volatile
AV_READ_TIME has no side effects. It does not need to be volatile.
2023-08-24 20:58:57 +03:00
Rémi Denis-Courmont 05115a77e0 lavu/timer: use time for AV_READ_TIME on RISC-V
So far, AV_READ_TIME would return the cycle counter. This posed two
problems:
1) On recent systems, it would just raise an illegal instruction
   exception. Indeed RDCYCLE is blocked in user space to ward off some
   side channel attacks. In particular, this would cause the random
   number generator to crash.
2) It does not match the x86 behaviour and the apparent original intent
   of AV_READ_TIME in the functional code base (outside test cases).

So this replaces the cycle counter with the time counter. The unit is
a platform-dependent constant fraction of time, and the value should be
stable across harts (RISC-V lingo for physical CPU thread).
2023-08-24 20:58:57 +03:00
Rémi Denis-Courmont 29b9d616c2 lavu/float_dsp: rework RISC-V V scalar product
1) Take the reductive sum out of the loop,
   leaving a regular vector addition in the loop.
2) Merge the addition and the multiplication.
3) Unroll.

Before:
scalarproduct_float_rvv_f32: 832.5

After:
scalarproduct_float_rvv_f32: 275.2
2023-07-20 22:54:34 +03:00
Rémi Denis-Courmont b710f881ce lavu/float_dsp: unroll RISC-V V loops
butterflies_float_c: 1057.0
butterflies_float_rvv_f32: 351.0 (before)
butterflies_float_rvv_f32: 329.5 (after)

vector_dmac_scalar_c: 819.0
vector_dmac_scalar_rvv_f64: 670.5 (before)
vector_dmac_scalar_rvv_f64: 431.0 (after)

vector_dmul_c: 800.2
vector_dmul_rvv_f64: 541.5 (before)
vector_dmul_rvv_f64: 426.0 (after)

vector_dmul_scalar_c: 545.7
vector_dmul_scalar_rvv_f64: 670.7 (before)
vector_dmul_scalar_rvv_f64: 324.7 (after)

vector_fmac_scalar_c: 804.5
vector_fmac_scalar_rvv_f32: 412.7 (before)
vector_fmac_scalar_rvv_f32: 214.5 (after)

vector_fmul_c: 811.2
vector_fmul_rvv_f32: 285.7 (before)
vector_fmul_rvv_f32: 214.2 (after)

vector_fmul_add_c: 1313.0
vector_fmul_add_rvv_f32: 349.0 (before)
vector_fmul_add_rvv_f32: 290.2 (after)

vector_fmul_reverse_c: 815.7
vector_fmul_reverse_rvv_f32: 529.2 (before)
vector_fmul_reverse_rvv_f32: 515.7 (after)

vector_fmul_scalar_c: 546.0
vector_fmul_scalar_rvv_f32: 350.2 (before)
vector_fmul_scalar_rvv_f32: 169.5 (after)
2023-07-20 22:54:34 +03:00
Rémi Denis-Courmont b6585eb04c lavu: add/use flag for RISC-V Zba extension
The code was blindly assuming that Zbb or V implied Zba. While the
earlier is practically always true, the later broke some QEMU setups,
as V was introduced earlier than Zba.
2023-07-19 19:29:35 +03:00
Rémi Denis-Courmont 3d79afbe70 lavu/fixed_dsp: unroll RISC-V V loop
Before:
butterflies_fixed_c: 804.7
butterflies_fixed_rvv_i32: 348.2

After:
butterflies_fixed_rvv_i32: 308.7
2023-07-17 18:48:42 +03:00
Rémi Denis-Courmont 0e580806d8 riscv/intmath: use builtins for counting ones
As with the earlier bswap change, all versions of GCC and Clang that
support RISC-V support the popcount built-ins, so we can just use them
instead of inline assembler.
2023-05-02 22:08:25 +02:00
Rémi Denis-Courmont 7dcb5e1ab0 riscv/bswap: use compiler builtins
av_bswapXX() are used in context that expect exact size types, notably
variable arguments to av_log(). On Linux RV64, uint_fast32_t is an
unsigned long, so the current inline assembler does not work properly.

Since GCC and Clang gained their byte-swap built-ins before they
supported RISC-V, we can simply defer to them. As an added bonus, the
compiler can do instruction scheduling, which it couldn't with the Zbb
inline assembler.
2023-05-02 22:08:21 +02:00
Rémi Denis-Courmont 96a83ceea4 riscv: fix scalar product initialisation
VSETVLI xd, x0, ...' has rather nonobvious semantics:
- If xd is x0, then it preserves the current vector length.
- If xd is not x0, it sets the vector length to the supported maximum.

Also somewhat confusingly, while VMV.X.S always does its thing
regardless of the selected vector length, VMV.S.X does _nothing_ if the
selected vector length is zero.

So the current code breaks fails to initialise the accumulator if we
are unlucky to have a selected vector length of zero on entry. Fix it
by forcing the vector length to one.
2022-10-13 10:17:38 +02:00
Rémi Denis-Courmont f59a767ccd lavu/riscv: helper macro for VTYPE encoding
On most cases, the vector type (VTYPE) for the RISC-V Vector extension
is supplied as an immediate value, with either of the VSETVLI or
VSETIVLI instructions. There is however a third instruction VSETVL
which takes the vector type from a general purpose register. That is so
the type can be selected at run-time.

This introduces a macro to load a (valid) vector type into a register.
The syntax follows that of VSETVLI and VSETIVLI, with element size,
group multiplier, then tail and mask policies.
2022-10-10 02:22:12 +02:00
Rémi Denis-Courmont 37d5ddc317 lavu/riscv: CPU flag for the Zbb extension
Unfortunately, it is common, and will remain so, that the Bit
manipulations are not enabled at compilation time. This is an official
policy for Debian ports in general (though they do not support RISC-V
officially as of yet) to stick to the minimal target baseline, which
does not include the B extension or even its Zbb subset.

For inline helpers (CPOP, REV8), compiler builtins (CTZ, CLZ) or
even plain C code (MIN, MAX, MINU, MAXU), run-time detection seems
impractical. But at least it can work for the byte-swap DSP functions.
2022-10-05 08:26:19 +02:00
Rémi Denis-Courmont 3ba5579e55 riscv: remove unnecessary #include's
Pointed out by Andreas Rheinhardt.
2022-10-05 06:54:56 +02:00
Rémi Denis-Courmont c47ebfa141 lavu/riscv: helper to read the vector length 2022-09-28 11:43:17 +02:00
Rémi Denis-Courmont c1bb19e263 lavu/fixeddsp: RISC-V V butterflies_fixed 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont cd77662953 lavu/floatdsp: RISC-V V scalarproduct_float 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont b493370662 lavu/floatdsp: RISC-V V vector_fmul_window 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont 9aeb6aca3a lavu/floatdsp: RISC-V V vector_fmul_reverse 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont 47ce9735cc lavu/floatdsp: RISC-V V butterflies_float 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont f4ea45040f lavu/floatdsp: RISC-V V vector_fmul_add 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont d120ab5b91 lavu/floatdsp: RISC-V V vector_dmac_scalar 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont c3db27ba95 lavu/floatdsp: RISC-V V vector_fmac_scalar 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont da169a210d lavu/floatdsp: RISC-V V vector_dmul 2022-09-27 13:19:52 +02:00