ffmpeg

Commit Graph

Author	SHA1	Message	Date
Rémi Denis-Courmont	e33ce0d9dd	lavu/fixed_dsp: R-V V fmul_window_scaled vector_fmul_window_scaled_fixed_c: 4393.7 vector_fmul_window_scaled_fixed_rvv_i64: 1642.7	2023-11-23 18:57:18 +02:00
Rémi Denis-Courmont	e49f41fb27	lavu/float_dsp: optimise R-V V fmul_reverse & fmul_window Roll the loop to avoid slow gathers. Before: vector_fmul_reverse_c: 1561.7 vector_fmul_reverse_rvv_f32: 2410.2 vector_fmul_window_c: 2068.2 vector_fmul_window_rvv_f32: 1879.5 After: vector_fmul_reverse_c: 1561.7 vector_fmul_reverse_rvv_f32: 916.2 vector_fmul_window_c: 2068.2 vector_fmul_window_rvv_f32: 1202.5	2023-11-23 18:57:18 +02:00
Rémi Denis-Courmont	3a134e8299	lavu/fixed_dsp: optimise R-V V fmul_reverse Gathers are (unsurprisingly) a notable exception to the rule that R-V V gets faster with larger group multipliers. So roll the function to speed it up. Before: vector_fmul_reverse_fixed_c: 2840.7 vector_fmul_reverse_fixed_rvv_i32: 2430.2 After: vector_fmul_reverse_fixed_c: 2841.0 vector_fmul_reverse_fixed_rvv_i32: 962.2 It might be possible to further optimise the function by moving the reverse-subtract out of the loop and adding ad-hoc tail handling.	2023-11-23 18:57:18 +02:00
Rémi Denis-Courmont	cd6089dc9c	riscv: fix builds without Zbb support	2023-11-18 22:01:59 +02:00
Rémi Denis-Courmont	04b49fb3c5	lavu/riscv: fix typo	2023-10-29 22:15:15 +02:00
Rémi Denis-Courmont	f39a8790e1	lavu/fixed_dsp: R-V V vector_fmul_window	2023-10-09 19:52:28 +03:00
Rémi Denis-Courmont	10eb3b9c9f	lavu/fixed_dsp: R-V V vector_fmul vector_fmul_fixed_c: 4.0 vector_fmul_fixed_rvv_i64: 0.5	2023-10-09 19:52:28 +03:00
Rémi Denis-Courmont	da7a77fb0a	lavu/fixed_dsp: R-V V vector_fmul_reverse	2023-10-09 19:52:28 +03:00
Rémi Denis-Courmont	bf911cc1bf	lavu/fixed_dsp: R-V V vector_fmul_add vector_fmul_add_fixed_c: 2.2 vector_fmul_add_fixed_rvv_i64: 0.5	2023-10-09 19:52:28 +03:00
Rémi Denis-Courmont	9091ffb006	lavu/float_dsp: adjust multiplier in R-V V fmul_window The gather index vector is only used as double-length (due to register pressure), so no need to initialise it for quad-length. Basically this matches the multiplier in the prologue to the multiplier in the loop.	2023-10-09 19:52:28 +03:00
Rémi Denis-Courmont	eb73d178ea	lavu/fixed_dsp: R-V V scalarproduct	2023-10-07 17:45:39 +03:00
Rémi Denis-Courmont	9240035c0e	lavu/float_dsp: avoid reg-stride in R-V V fmul_window	2023-10-03 22:48:10 +03:00
Rémi Denis-Courmont	446b0090cb	lavu/float_dsp: avoid reg-stride in R-V V reverse_fmul This revectors the inner loop to reverse the elements within each vector, thus eliminating the negative register stride. Note that RVV does not have a vector reverse instruction, so this uses a gather.	2023-10-03 20:48:47 +03:00
Rémi Denis-Courmont	cec48e3b32	riscv: factor out the bswap32 assembler	2023-10-02 22:28:21 +03:00
Rémi Denis-Courmont	7a24d794f6	Revert "lavu/timer: remove gratuitous volatile" It does not make much sense to me, but GCC somehow optimises the inline assembler even though the output is very obviously used and has observable side effects. This reverts commit `09731fbfc3`.	2023-09-28 17:48:18 +03:00
Rémi Denis-Courmont	6f8ac298da	lavu/timer: specify RISC-V time unit	2023-08-24 20:58:57 +03:00
Rémi Denis-Courmont	09731fbfc3	lavu/timer: remove gratuitous volatile AV_READ_TIME has no side effects. It does not need to be volatile.	2023-08-24 20:58:57 +03:00
Rémi Denis-Courmont	05115a77e0	lavu/timer: use time for AV_READ_TIME on RISC-V So far, AV_READ_TIME would return the cycle counter. This posed two problems: 1) On recent systems, it would just raise an illegal instruction exception. Indeed RDCYCLE is blocked in user space to ward off some side channel attacks. In particular, this would cause the random number generator to crash. 2) It does not match the x86 behaviour and the apparent original intent of AV_READ_TIME in the functional code base (outside test cases). So this replaces the cycle counter with the time counter. The unit is a platform-dependent constant fraction of time, and the value should be stable across harts (RISC-V lingo for physical CPU thread).	2023-08-24 20:58:57 +03:00
Rémi Denis-Courmont	29b9d616c2	lavu/float_dsp: rework RISC-V V scalar product 1) Take the reductive sum out of the loop, leaving a regular vector addition in the loop. 2) Merge the addition and the multiplication. 3) Unroll. Before: scalarproduct_float_rvv_f32: 832.5 After: scalarproduct_float_rvv_f32: 275.2	2023-07-20 22:54:34 +03:00
Rémi Denis-Courmont	b710f881ce	lavu/float_dsp: unroll RISC-V V loops butterflies_float_c: 1057.0 butterflies_float_rvv_f32: 351.0 (before) butterflies_float_rvv_f32: 329.5 (after) vector_dmac_scalar_c: 819.0 vector_dmac_scalar_rvv_f64: 670.5 (before) vector_dmac_scalar_rvv_f64: 431.0 (after) vector_dmul_c: 800.2 vector_dmul_rvv_f64: 541.5 (before) vector_dmul_rvv_f64: 426.0 (after) vector_dmul_scalar_c: 545.7 vector_dmul_scalar_rvv_f64: 670.7 (before) vector_dmul_scalar_rvv_f64: 324.7 (after) vector_fmac_scalar_c: 804.5 vector_fmac_scalar_rvv_f32: 412.7 (before) vector_fmac_scalar_rvv_f32: 214.5 (after) vector_fmul_c: 811.2 vector_fmul_rvv_f32: 285.7 (before) vector_fmul_rvv_f32: 214.2 (after) vector_fmul_add_c: 1313.0 vector_fmul_add_rvv_f32: 349.0 (before) vector_fmul_add_rvv_f32: 290.2 (after) vector_fmul_reverse_c: 815.7 vector_fmul_reverse_rvv_f32: 529.2 (before) vector_fmul_reverse_rvv_f32: 515.7 (after) vector_fmul_scalar_c: 546.0 vector_fmul_scalar_rvv_f32: 350.2 (before) vector_fmul_scalar_rvv_f32: 169.5 (after)	2023-07-20 22:54:34 +03:00
Rémi Denis-Courmont	b6585eb04c	lavu: add/use flag for RISC-V Zba extension The code was blindly assuming that Zbb or V implied Zba. While the former is practically always true, the latter broke some QEMU setups, as V was introduced earlier than Zba.	2023-07-19 19:29:35 +03:00
Rémi Denis-Courmont	3d79afbe70	lavu/fixed_dsp: unroll RISC-V V loop Before: butterflies_fixed_c: 804.7 butterflies_fixed_rvv_i32: 348.2 After: butterflies_fixed_rvv_i32: 308.7	2023-07-17 18:48:42 +03:00
Rémi Denis-Courmont	0e580806d8	riscv/intmath: use builtins for counting ones As with the earlier bswap change, all versions of GCC and Clang that support RISC-V support the popcount built-ins, so we can just use them instead of inline assembler.	2023-05-02 22:08:25 +02:00
Rémi Denis-Courmont	7dcb5e1ab0	riscv/bswap: use compiler builtins av_bswapXX() are used in contexts that expect exact size types, notably variable arguments to av_log(). On Linux RV64, uint_fast32_t is an unsigned long, so the current inline assembler does not work properly. Since GCC and Clang gained their byte-swap built-ins before they supported RISC-V, we can simply defer to them. As an added bonus, the compiler can do instruction scheduling, which it couldn't with the Zbb inline assembler.	2023-05-02 22:08:21 +02:00
Rémi Denis-Courmont	96a83ceea4	riscv: fix scalar product initialisation 'VSETVLI xd, x0, ...' has rather nonobvious semantics: - If xd is x0, then it preserves the current vector length. - If xd is not x0, it sets the vector length to the supported maximum. Also somewhat confusingly, while VMV.X.S always does its thing regardless of the selected vector length, VMV.S.X does _nothing_ if the selected vector length is zero. So the current code fails to initialise the accumulator if we are unlucky to have a selected vector length of zero on entry. Fix it by forcing the vector length to one.	2022-10-13 10:17:38 +02:00
Rémi Denis-Courmont	f59a767ccd	lavu/riscv: helper macro for VTYPE encoding In most cases, the vector type (VTYPE) for the RISC-V Vector extension is supplied as an immediate value, with either of the VSETVLI or VSETIVLI instructions. There is however a third instruction VSETVL which takes the vector type from a general purpose register. That is so the type can be selected at run-time. This introduces a macro to load a (valid) vector type into a register. The syntax follows that of VSETVLI and VSETIVLI, with element size, group multiplier, then tail and mask policies.	2022-10-10 02:22:12 +02:00
Rémi Denis-Courmont	37d5ddc317	lavu/riscv: CPU flag for the Zbb extension Unfortunately, it is common, and will remain so, that the Bit manipulations are not enabled at compilation time. This is an official policy for Debian ports in general (though they do not support RISC-V officially as of yet) to stick to the minimal target baseline, which does not include the B extension or even its Zbb subset. For inline helpers (CPOP, REV8), compiler builtins (CTZ, CLZ) or even plain C code (MIN, MAX, MINU, MAXU), run-time detection seems impractical. But at least it can work for the byte-swap DSP functions.	2022-10-05 08:26:19 +02:00
Rémi Denis-Courmont	3ba5579e55	riscv: remove unnecessary #include's Pointed out by Andreas Rheinhardt.	2022-10-05 06:54:56 +02:00
Rémi Denis-Courmont	c47ebfa141	lavu/riscv: helper to read the vector length	2022-09-28 11:43:17 +02:00
Rémi Denis-Courmont	c1bb19e263	lavu/fixeddsp: RISC-V V butterflies_fixed	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	cd77662953	lavu/floatdsp: RISC-V V scalarproduct_float	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	b493370662	lavu/floatdsp: RISC-V V vector_fmul_window	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	9aeb6aca3a	lavu/floatdsp: RISC-V V vector_fmul_reverse	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	47ce9735cc	lavu/floatdsp: RISC-V V butterflies_float	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	f4ea45040f	lavu/floatdsp: RISC-V V vector_fmul_add	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	d120ab5b91	lavu/floatdsp: RISC-V V vector_dmac_scalar	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	c3db27ba95	lavu/floatdsp: RISC-V V vector_fmac_scalar	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	da169a210d	lavu/floatdsp: RISC-V V vector_dmul	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	7058af9969	lavu/floatdsp: RISC-V V vector_fmul	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	89b7ec65a8	lavu/floatdsp: RISC-V V vector_dmul_scalar	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	a6c10d05fe	lavu/floatdsp: RISC-V V vector_fmul_scalar This is based on existing code from the VLC git tree with two minor changes to account for the different function prototypes.	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	39357cad37	lavu/riscv: fallback macros for SH{1, 2, 3}ADD Those mnemonics require the very latest binutils release at the time of writing. These macros provide seamless backward compatibility.	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	0c0a3deb18	lavu/cpu: CPU flags for the RISC-V Vector extension RVV defines a total of 12 different extensions, including: - 5 different instruction subsets: - Zve32x: 8-, 16- and 32-bit integers, - Zve32f: Zve32x plus single precision floats, - Zve64x: Zve32x plus 64-bit integers, - Zve64f: Zve32f plus Zve64x, - Zve64d: Zve64f plus double precision floats. - 6 different vector lengths: - Zvl32b (embedded only), - Zvl64b (embedded only), - Zvl128b, - Zvl256b, - Zvl512b, - Zvl1024b, - and the V extension proper: equivalent to Zve64d and Zvl128b. In total, there are 6 different possible sets of supported instructions (including the empty set), but for convenience we allocate one bit for each type set: up-to-32-bit ints (RVV_I32), floats (RVV_F32), 64-bit ints (RVV_I64) and doubles (RVV_F64). When the vector size is needed, it can be retrieved by reading the unprivileged read-only vlenb CSR. This should probably be a separate helper macro if needed at a later point.	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	746f1ff36a	lavu/riscv: initial common header for assembler macros	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	b95e2fbd85	lavu/cpu: detect RISC-V base extensions This introduces compile-time and run-time CPU detection on RISC-V. In practice, I doubt that FFmpeg will ever see a RISC-V CPU without all of I, F and D extensions, and if it does, it probably won't have run-time detection. So the flags are essentially always set. But as things stand, checkasm wants them that way. Compare the ARMV8 flag on AArch64. We are nowhere near running short on CPU flag bits.	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	6df3ad9687	lavu/riscv: fix off-by-one in bit-magnitude clip	2022-09-15 18:11:12 -03:00
Rémi Denis-Courmont	a5ce44f301	lavu/riscv: fix av_clip_int16 Some serious copy-paste / squash / rebase mismanipulation here. Signed-off-by: James Almer <jamrial@gmail.com>	2022-09-14 14:37:21 -03:00
Rémi Denis-Courmont	c177108ae1	lavu/riscv: add <intmath.h> optimisations This provides some micro-optimisations for signed integer clipping, and support for bit weight with the Zbb extension.	2022-09-13 16:50:43 -03:00
Rémi Denis-Courmont	df2057041b	lavu/riscv: byte-swap operations If the target supports the Basic bit-manipulation (Zbb) extension, then the REV8 instruction is available to reverse byte order. Note that this instruction only exists at the "XLEN" register size, so we need to right shift the result down to the data width. If Zbb is not supported, then this patchset does nothing. Support for run-time detection is left for the future. Currently, there are no bits in auxv/ELF HWCAP for Z-extensions, so there are no clean ways to do this.	2022-09-13 16:50:43 -03:00
Rémi Denis-Courmont	d808070547	lavu/riscv: AV_READ_TIME cycle counter This uses the architected RISC-V 64-bit cycle counter from the RISC-V unprivileged instruction set. In 64-bit and 128-bit, this is a straightforward CSR read. In 32-bit mode, the 64-bit value is exposed as two CSRs, which cannot be read atomically, so a loop is necessary to detect and fix up the race condition where the bottom half wraps exactly between the two reads.	2022-09-13 16:50:43 -03:00

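Commit df2057041b notes that REV8 only exists at the XLEN register size, so the result has to be shifted right down to the data width. A minimal sketch of that arithmetic, assuming XLEN is 64: the function `rev8_xlen64` is a hypothetical portable model of the instruction, and the two wrappers are illustrative names, not FFmpeg functions.

```c
#include <stdint.h>

/* Hypothetical model of REV8 on a 64-bit register: reverse all 8 bytes. */
static uint64_t rev8_xlen64(uint64_t x)
{
    uint64_t r = 0;

    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (x & 0xFF);
        x >>= 8;
    }
    return r;
}

/*
 * A 32-bit value sits in the low half of the register. After the full-width
 * reversal its bytes end up in the high half, so shift right by 64 - 32.
 */
static uint32_t bswap32_via_rev8(uint32_t x)
{
    return (uint32_t)(rev8_xlen64(x) >> 32);
}

/* Same idea for 16 bits: shift right by 64 - 16. */
static uint16_t bswap16_via_rev8(uint16_t x)
{
    return (uint16_t)(rev8_xlen64(x) >> 48);
}
```

On a 32-bit target the shift amounts would be 0 and 16 instead, since the shift is always XLEN minus the data width.
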
50 Commits
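
Commit f59a767ccd mentions loading a vector type value into a register for VSETVL. As a rough illustration of what such a value contains, the sketch below packs the fields following the vtype layout of the RVV 1.0 specification (vlmul in bits 2:0, vsew in bits 5:3, vta in bit 6, vma in bit 7). The enum and function names are hypothetical and this is not the macro from the commit.

```c
#include <stdint.h>

/* Selected element width encodings (vsew field). */
enum rvv_sew  { RVV_E8 = 0, RVV_E16 = 1, RVV_E32 = 2, RVV_E64 = 3 };

/* Group multiplier encodings (vlmul field); MF* are fractional. */
enum rvv_lmul {
    RVV_M1  = 0, RVV_M2  = 1, RVV_M4  = 2, RVV_M8 = 3,
    RVV_MF8 = 5, RVV_MF4 = 6, RVV_MF2 = 7
};

/*
 * Pack a vtype value. tail_agnostic and mask_agnostic are booleans:
 * non-zero selects the agnostic policy (ta / ma), zero the undisturbed
 * policy (tu / mu).
 */
static uint32_t rvv_vtype(enum rvv_sew sew, enum rvv_lmul lmul,
                          int tail_agnostic, int mask_agnostic)
{
    return ((uint32_t)lmul & 7u)
         | (((uint32_t)sew & 7u) << 3)
         | ((tail_agnostic ? 1u : 0u) << 6)
         | ((mask_agnostic ? 1u : 0u) << 7);
}
```

For example, `rvv_vtype(RVV_E32, RVV_M4, 1, 1)` corresponds to the immediate form `e32, m4, ta, ma` and evaluates to 0xD2.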