
19 Commits

Author SHA1 Message Date
Rémi Denis-Courmont e49f41fb27 lavu/float_dsp: optimise R-V V fmul_reverse & fmul_window
Roll the loop to avoid slow gathers.

Before:
vector_fmul_reverse_c:       1561.7
vector_fmul_reverse_rvv_f32: 2410.2
vector_fmul_window_c:        2068.2
vector_fmul_window_rvv_f32:  1879.5

After:
vector_fmul_reverse_c:       1561.7
vector_fmul_reverse_rvv_f32:  916.2
vector_fmul_window_c:        2068.2
vector_fmul_window_rvv_f32:  1202.5
2023-11-23 18:57:18 +02:00
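A scalar C model of what the two functions compute shows why reversal (and hence gathers) comes up at all. It is a sketch modelled on the generic C fallbacks in libavutil, not the R-V V code; the `ref_` names are hypothetical, and the window layout (2 * len coefficients, 2 * len outputs) is an assumption to check against libavutil/float_dsp.h.

```c
#include <assert.h>

/* Scalar model of vector_fmul_reverse: dst[i] = src0[i] * src1[len - 1 - i].
 * Reading src1 backwards is what the vector code has to emulate. */
static void ref_fmul_reverse(float *dst, const float *src0,
                             const float *src1, int len)
{
    assert(len >= 0);
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[len - 1 - i];
}

/* Scalar model of vector_fmul_window: src0 and src1 hold len samples each,
 * win holds 2 * len coefficients and dst receives 2 * len outputs.
 * Output i and its mirror j = 2 * len - 1 - i are produced together. */
static void ref_fmul_window(float *dst, const float *src0, const float *src1,
                            const float *win, int len)
{
    assert(len >= 0);
    for (int i = 0; i < len; i++) {
        int j = 2 * len - 1 - i;       /* mirrored index into dst and win */
        float s0 = src0[i];
        float s1 = src1[len - 1 - i];  /* src1 is also read backwards */
        float wi = win[i];
        float wj = win[j];
        dst[i] = s0 * wj - s1 * wi;
        dst[j] = s0 * wi + s1 * wj;
    }
}
```

Each fmul_window iteration writes two outputs from opposite ends of dst, so a straightforward vector version has to reverse one half of every chunk.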
Rémi Denis-Courmont 9091ffb006 lavu/float_dsp: adjust multiplier in R-V V fmul_window
The gather index vector is only used at double length (due to register
pressure), so there is no need to initialise it at quad length. Basically,
this matches the multiplier (LMUL) in the prologue to the multiplier in the loop.
2023-10-09 19:52:28 +03:00
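In RVV terms the "multiplier" is LMUL, the register-group multiplier; "double-length" and "quad-length" presumably mean LMUL=2 and LMUL=4. One register group holds VLMAX = LMUL * VLEN / SEW elements, so an index vector prepared at LMUL=4 has twice as many entries as an LMUL=2 loop can consume. A small helper for that arithmetic (the name `vlmax` is hypothetical; fractional LMUL is left out):

```c
#include <assert.h>

/* Elements per register group: VLMAX = LMUL * VLEN / SEW.
 * Only integer LMUL values (1, 2, 4, 8) are modelled here. */
static unsigned vlmax(unsigned vlen_bits, unsigned sew_bits, unsigned lmul)
{
    assert(lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
    assert(sew_bits != 0 && vlen_bits % sew_bits == 0);
    return lmul * vlen_bits / sew_bits;
}
```

For example, with VLEN = 128 and 32-bit floats, `vlmax(128, 32, 2)` is 8 while `vlmax(128, 32, 4)` is 16, so initialising at quad length would compute 8 indices that the loop never uses.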
Rémi Denis-Courmont 9240035c0e lavu/float_dsp: avoid reg-stride in R-V V fmul_window 2023-10-03 22:48:10 +03:00
Rémi Denis-Courmont 446b0090cb lavu/float_dsp: avoid reg-stride in R-V V reverse_fmul
This revectors the inner loop to reverse elements within vectors,
thus eliminating the negative register stride. Note that RVV does not
have a vector reverse instruction, so this uses a gather.
2023-10-03 20:48:47 +03:00
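The gather-based reversal can be modelled in C. This sketch assumes the index vector is built as vid.v followed by vrsub.vx with vl - 1 (giving idx[i] = vl - 1 - i), which is one common way to do it but not necessarily what the commit uses; all function names are hypothetical and the 64-element scratch cap is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VL 64 /* arbitrary cap for this model's scratch buffers */

/* Model of vid.v + vrsub.vx with scalar vl - 1: idx[i] = vl - 1 - i. */
static void make_reverse_index(unsigned *idx, size_t vl)
{
    for (size_t i = 0; i < vl; i++)
        idx[i] = (unsigned)(vl - 1 - i);
}

/* Model of vrgather.vv vd, vs2, vs1 over the first vl elements:
 * vd[i] = vs2[vs1[i]], or 0 when the index is VLMAX or larger. */
static void gather(float *vd, const float *vs2, const unsigned *vs1,
                   size_t vl, size_t vlmax)
{
    for (size_t i = 0; i < vl; i++)
        vd[i] = vs1[i] >= vlmax ? 0.0f : vs2[vs1[i]];
}

/* fmul_reverse strip by strip: each chunk of src1 is fetched with a
 * unit-stride load from the far end, then reversed with a gather,
 * instead of being read with a negative-stride load. */
static void fmul_reverse_by_chunks(float *dst, const float *src0,
                                   const float *src1, size_t len,
                                   size_t vlmax)
{
    float chunk[MAX_VL], rev[MAX_VL];
    unsigned idx[MAX_VL];

    assert(vlmax > 0 && vlmax <= MAX_VL);
    for (size_t off = 0; off < len; ) {
        size_t vl = len - off < vlmax ? len - off : vlmax; /* like vsetvli */
        const float *tail = src1 + (len - off - vl);
        for (size_t i = 0; i < vl; i++)   /* unit-stride load */
            chunk[i] = tail[i];
        make_reverse_index(idx, vl);
        gather(rev, chunk, idx, vl, vlmax);
        for (size_t i = 0; i < vl; i++)   /* element-wise multiply */
            dst[off + i] = src0[off + i] * rev[i];
        off += vl;
    }
}
```

With `len = 5` and `vlmax = 2`, the chunks of src1 are taken from offsets 3, 1 and 0, and each is reversed before the multiply.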
Rémi Denis-Courmont 29b9d616c2 lavu/float_dsp: rework RISC-V V scalar product
1) Take the reductive sum out of the loop,
   leaving a regular vector addition in the loop.
2) Merge the addition and the multiplication.
3) Unroll.

Before:
scalarproduct_float_rvv_f32: 832.5

After:
scalarproduct_float_rvv_f32: 275.2
2023-07-20 22:54:34 +03:00
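Steps 1 and 2 can be illustrated with a simplified scalar C model in which `LANES` stands in for the vector length. `LANES = 4` and both function names are hypothetical, and step 3 (unrolling) is not modelled. Keeping per-lane partial sums turns the loop body into a plain multiply-add, and the cross-lane reduction runs once after the loop.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4 /* hypothetical vector length, in float elements */

/* "Before" shape: reduce each chunk of products to a scalar inside the loop. */
static float dot_reduce_in_loop(const float *a, const float *b, size_t len)
{
    float sum = 0.0f;
    for (size_t off = 0; off < len; off += LANES) {
        size_t vl = len - off < LANES ? len - off : LANES;
        float partial = 0.0f;
        for (size_t i = 0; i < vl; i++)        /* reduction every iteration */
            partial += a[off + i] * b[off + i];
        sum += partial;
    }
    return sum;
}

/* "After" shape: per-lane accumulators updated with a multiply-add, and a
 * single reduction after the loop. Lanes past vl in the last iteration keep
 * their old partial sums (a tail-undisturbed policy, in RVV terms). */
static float dot_reduce_after_loop(const float *a, const float *b, size_t len)
{
    float acc[LANES] = { 0.0f };
    for (size_t off = 0; off < len; off += LANES) {
        size_t vl = len - off < LANES ? len - off : LANES;
        for (size_t i = 0; i < vl; i++)        /* lane-wise multiply-add */
            acc[i] += a[off + i] * b[off + i];
    }
    float sum = 0.0f;
    for (size_t i = 0; i < LANES; i++)         /* one final reduction */
        sum += acc[i];
    return sum;
}
```

The two versions add the products in a different order, so for general inputs their results may differ in the last bits; with small integers, as in `a = b = {1, 2, 3, 4, 5}`, both return exactly 55.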
Rémi Denis-Courmont b710f881ce lavu/float_dsp: unroll RISC-V V loops
butterflies_float_c: 1057.0
butterflies_float_rvv_f32: 351.0 (before)
butterflies_float_rvv_f32: 329.5 (after)

vector_dmac_scalar_c: 819.0
vector_dmac_scalar_rvv_f64: 670.5 (before)
vector_dmac_scalar_rvv_f64: 431.0 (after)

vector_dmul_c: 800.2
vector_dmul_rvv_f64: 541.5 (before)
vector_dmul_rvv_f64: 426.0 (after)

vector_dmul_scalar_c: 545.7
vector_dmul_scalar_rvv_f64: 670.7 (before)
vector_dmul_scalar_rvv_f64: 324.7 (after)

vector_fmac_scalar_c: 804.5
vector_fmac_scalar_rvv_f32: 412.7 (before)
vector_fmac_scalar_rvv_f32: 214.5 (after)

vector_fmul_c: 811.2
vector_fmul_rvv_f32: 285.7 (before)
vector_fmul_rvv_f32: 214.2 (after)

vector_fmul_add_c: 1313.0
vector_fmul_add_rvv_f32: 349.0 (before)
vector_fmul_add_rvv_f32: 290.2 (after)

vector_fmul_reverse_c: 815.7
vector_fmul_reverse_rvv_f32: 529.2 (before)
vector_fmul_reverse_rvv_f32: 515.7 (after)

vector_fmul_scalar_c: 546.0
vector_fmul_scalar_rvv_f32: 350.2 (before)
vector_fmul_scalar_rvv_f32: 169.5 (after)
2023-07-20 22:54:34 +03:00
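In RVV code, "unrolling" is often done by raising LMUL, so that each instruction covers a larger register group, rather than by duplicating instructions; the commit message does not say which form is used here. Either way, the reduction in loop overhead can be estimated by counting strip-mining iterations, assuming each vsetvli grants vl = min(remaining, VLMAX). The helper name is hypothetical:

```c
#include <assert.h>

/* Strip-mining iterations needed to process len elements when each
 * vsetvli grants at most VLMAX = lmul * vlen / sew elements. */
static unsigned loop_iterations(unsigned len, unsigned vlen, unsigned sew,
                                unsigned lmul)
{
    assert(sew != 0 && lmul != 0);
    unsigned vlmax = lmul * vlen / sew;
    assert(vlmax != 0);
    return (len + vlmax - 1) / vlmax;   /* ceiling division */
}
```

With VLEN = 128 and 32-bit floats, a 256-element array takes 64 iterations at LMUL = 1 but only 8 at LMUL = 8, so the per-iteration vsetvli, pointer increments and branch are executed 8 times less often.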
Rémi Denis-Courmont 96a83ceea4 riscv: fix scalar product initialisation
'VSETVLI xd, x0, ...' has rather non-obvious semantics:
- If xd is x0, then it preserves the current vector length.
- If xd is not x0, it sets the vector length to the supported maximum.

Also somewhat confusingly, while VMV.X.S always does its thing
regardless of the selected vector length, VMV.S.X does _nothing_ if the
selected vector length is zero.

So the current code fails to initialise the accumulator if we are
unlucky enough to have a selected vector length of zero on entry. Fix it
by forcing the vector length to one.
2022-10-13 10:17:38 +02:00
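The failure mode can be reproduced with a tiny C model of the relevant state. Everything here is a hypothetical simplification: `struct vstate`, the function names and the 32-bit accumulator element are illustrative, and only the vl-related behaviour described above is modelled. Forcing vl to 1 is shown with an explicit AVL; the exact instruction used by the fix is not stated here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the vector state involved in the bug. */
struct vstate {
    size_t vl;          /* current vector length */
    size_t vlmax;       /* VLMAX for the current SEW/LMUL */
    uint32_t acc0;      /* element 0 of the accumulator register */
};

/* vsetvli rd, x0, ...:
 *   rd == x0 -> keep the current vl (VLMAX assumed unchanged);
 *   rd != x0 -> set vl to VLMAX. */
static void vsetvli_x0(struct vstate *s, int rd_is_x0)
{
    if (!rd_is_x0)
        s->vl = s->vlmax;
}

/* vsetvli with an explicit AVL: vl = min(AVL, VLMAX). */
static void vsetvli_avl(struct vstate *s, size_t avl)
{
    s->vl = avl < s->vlmax ? avl : s->vlmax;
}

/* vmv.s.x: writes element 0, but does nothing when vl == 0
 * (vstart assumed to be 0). */
static void vmv_s_x(struct vstate *s, uint32_t x)
{
    if (s->vl > 0)
        s->acc0 = x;
}

/* vmv.x.s: reads element 0 regardless of vl. */
static uint32_t vmv_x_s(const struct vstate *s)
{
    return s->acc0;
}
```

Starting from vl = 0, the rd == x0 form leaves vl at zero, so vmv.s.x is skipped and the accumulator keeps whatever it held (0xdeadbeef in this model); after forcing vl = 1, the initialisation takes effect.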
Rémi Denis-Courmont 3ba5579e55 riscv: remove unnecessary #include's
Pointed out by Andreas Rheinhardt.
2022-10-05 06:54:56 +02:00
Rémi Denis-Courmont cd77662953 lavu/floatdsp: RISC-V V scalarproduct_float 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont b493370662 lavu/floatdsp: RISC-V V vector_fmul_window 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont 9aeb6aca3a lavu/floatdsp: RISC-V V vector_fmul_reverse 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont 47ce9735cc lavu/floatdsp: RISC-V V butterflies_float 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont f4ea45040f lavu/floatdsp: RISC-V V vector_fmul_add 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont d120ab5b91 lavu/floatdsp: RISC-V V vector_dmac_scalar 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont c3db27ba95 lavu/floatdsp: RISC-V V vector_fmac_scalar 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont da169a210d lavu/floatdsp: RISC-V V vector_dmul 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont 7058af9969 lavu/floatdsp: RISC-V V vector_fmul 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont 89b7ec65a8 lavu/floatdsp: RISC-V V vector_dmul_scalar 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont a6c10d05fe lavu/floatdsp: RISC-V V vector_fmul_scalar
This is based on existing code from the VLC git tree with two minor
changes to account for the different function prototypes.
2022-09-27 13:19:52 +02:00
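Most kernels added in this series are simple element-wise loops. The scalar models below sketch their semantics; the `ref_` names are hypothetical and the argument order is an assumption modelled on the callbacks in libavutil/float_dsp.h, so check the header before relying on it. The double-precision variants (vector_dmul, vector_dmul_scalar, vector_dmac_scalar) follow the same pattern with `double`.

```c
#include <assert.h>

static void ref_vector_fmul(float *dst, const float *src0,
                            const float *src1, int len)
{
    assert(len >= 0);
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[i];
}

static void ref_vector_fmul_scalar(float *dst, const float *src,
                                   float mul, int len)
{
    assert(len >= 0);
    for (int i = 0; i < len; i++)
        dst[i] = src[i] * mul;
}

/* Accumulates into dst instead of overwriting it. */
static void ref_vector_fmac_scalar(float *dst, const float *src,
                                   float mul, int len)
{
    assert(len >= 0);
    for (int i = 0; i < len; i++)
        dst[i] += src[i] * mul;
}

static void ref_vector_fmul_add(float *dst, const float *src0,
                                const float *src1, const float *src2, int len)
{
    assert(len >= 0);
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[i] + src2[i];
}

/* In-place butterfly: v1 receives the sums, v2 the differences. */
static void ref_butterflies_float(float *v1, float *v2, int len)
{
    assert(len >= 0);
    for (int i = 0; i < len; i++) {
        float t = v1[i] - v2[i];
        v1[i] += v2[i];
        v2[i] = t;
    }
}
```

Because every output element depends only on inputs at the same index, these loops vectorise directly with strip mining; only fmul_reverse, fmul_window and the scalar product need extra work (reversal or reduction).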