Commit Graph

410 Commits

Author SHA1 Message Date
Andreas Rheinhardt
a84fe06112 avcodec/idctdsp: Avoid inclusion of avcodec.h
Not every user of idctdsp.h wants to initialize an IDCTDSPContext;
e.g. the proresdsp only uses ff_init_scantable_permutation()
and the IDCT permutation enum; similarly for cavsdsp and wmv2dsp.
Using a forward declaration here avoids an avcodec.h dependency
in the relevant files.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-09-11 00:26:34 +02:00
Andreas Rheinhardt
f8503b4c33 avutil/internal: Don't auto-include emms.h
Instead include emms.h wherever it is needed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-09-04 11:04:45 +02:00
Andreas Rheinhardt
a39d6e81fa tests/checkasm/sw_scale: Avoid declare_func_emms where possible
This makes the test stricter because it is checked that the
MMX registers are not accidentally clobbered.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-09-04 11:04:45 +02:00
Andreas Rheinhardt
e7866e00c8 tests/checkasm/llvidencdsp: Don't use declare_func_emms
Only sub_media_pred has an MMXEXT version, so one can use
the version with the stricter check (that checks that
the MMX registers have not been clobbered) for sub_left_predict.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-09-04 11:04:45 +02:00
Andreas Rheinhardt
3f82b38516 tests/checkasm/hevc_*: Avoid using declare_func_emms where possible
Only the idct_dc and add_residual functions have MMX versions,
so one can use the version with the stricter check (that checks
that the MMX registers have not been clobbered) for all the other
checks.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-09-04 11:04:45 +02:00
Matthias Dressel
e41bd6e65e checkasm: hevc_sao: Fix a regression in hevc_sao_edge
check_func() might return NULL, in which case the function is not to be
benched. Introduced in cc679054c7.

Signed-off-by: Matthias Dressel <code@deadcode.eu>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-08-24 22:09:37 +03:00
Rémi Denis-Courmont
f25ad0fe02 checkasm: improve Linux perf error message
Report the failing system call name, as is convention, rather than just
a rather unhelpful "syscall".
2023-07-22 21:35:15 +03:00
Rémi Denis-Courmont
b6585eb04c lavu: add/use flag for RISC-V Zba extension
The code was blindly assuming that Zbb or V implied Zba. While the
earlier is practically always true, the later broke some QEMU setups,
as V was introduced earlier than Zba.
2023-07-19 19:29:35 +03:00
Rémi Denis-Courmont
98e4dd39c5 checkasm: test Zbb before V
Without this, Zbb functions get shadowed by V functions on devices
supporting both extensions, and never tested.
2023-07-19 19:29:35 +03:00
Rémi Denis-Courmont
d8ea5f50e2 checkasm: print usage on invalid arguments
This checks that arguments are handled. If not, then this prints a
short usage notice and returns an error.
2023-07-17 18:48:42 +03:00
John Cox
697533e76d avfilter/vf_bwdif: Add a filter_line3 method for optimisation
Add an optional filter_line3 to the available optimisations.

filter_line3 is equivalent to filter_line, memcpy, filter_line

filter_line shares quite a number of loads and some calculations in
common with its next iteration and testing shows that using aarch64
neon filter_line3s performance is 30% better than two filter_lines
and a memcpy.

Adds a test for vf_bwdif filter_line3 to checkasm

Rounds job start lines down to a multiple of 4. This means that if
filter_line3 exists then filter_line will not sometimes be called
once at the end of a slice depending on thread count. The final slice
may do up to 3 extra lines but filter_edge is faster than filter_line
so it is unlikely to create any noticable thread load variation.

Signed-off-by: John Cox <jc@kynesim.co.uk>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-07-06 00:21:05 +03:00
John Cox
7ed7c00f55 tests/checkasm: Add test for vf_bwdif filter_edge
Signed-off-by: John Cox <jc@kynesim.co.uk>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-07-06 00:21:05 +03:00
John Cox
7caa8d6b91 tests/checkasm: Add test for vf_bwdif filter_intra
Signed-off-by: John Cox <jc@kynesim.co.uk>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-07-06 00:21:05 +03:00
Martin Storsjö
397cb623c8 aarch64: Add cpu flags for the dotprod and i8mm extensions
Set these available if they are available unconditionally for
the compiler.

Signed-off-by: Martin Storsjö <martin@martin.st>
2023-06-06 12:40:42 +03:00
Lynne
783270bfd1
checkasm: add h264chroma tests
Checks all variants of put_h264_chroma and avg_h264_chroma.
2023-05-20 20:07:21 +02:00
xufuji456
1e91a39502 checkasm: pass context as pointer
Signed-off-by: xufuji456 <839789740@qq.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-04-13 15:17:04 +03:00
xufuji456
30def6365d checkasm/hevc: add transform_luma test
Signed-off-by: xufuji456 <839789740@qq.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-04-13 15:17:04 +03:00
J. Dekker
68c151cb1b checkasm: add hevc_deblock chroma test
Signed-off-by: J. Dekker <jdek@itanimul.li>
2023-04-06 06:16:57 +02:00
James Darnley
087faf8cac checkasm: add test for bwdif 2023-03-25 02:38:17 +01:00
Lynne
bbe95f7353
x86: replace explicit REP_RETs with RETs
From x86inc:
> On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
> a branch or a branch target. So switch to a 2-byte form of ret in that case.
> We can automatically detect "follows a branch", but not a branch target.
> (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.)

x86inc can automatically determine whether to use REP_RET rather than
REP in most of these cases, so impact is minimal. Additionally, a few
REP_RETs were used unnecessary, despite the return being nowhere near a
branch.

The only CPUs affected were AMD K10s, made between 2007 and 2011, 16
years ago and 12 years ago, respectively.

In the future, everyone involved with x86inc should consider dropping
REP_RETs altogether.
2023-02-01 04:23:55 +01:00
James Darnley
eef763c705 checkasm/v210dec: add extra space to the destination arrays 2022-12-21 00:36:49 +01:00
James Darnley
6af453ca38 avcodec/x86: add avx512icl function for v210dec
Ice Lake (Xeon Silver 4316): 2.01x faster (1147±36.8 vs. 571±38.2 decicycles) compared with avx2
2022-12-20 15:02:45 +01:00
James Darnley
cfd1c3c0a1 checkasm/v210enc: test the entire width of 10-bit planar input arrays 2022-12-01 18:19:03 +01:00
bwang30
3ab11dc5bb libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI
This commit enabled assembly code with intel AVX512 VNNI and added unit test for sobel filter

sobel_c: 4537
sobel_avx512icl 2136

Signed-off-by: bwang30 <bin.wang@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2022-11-14 10:04:16 +08:00
Lynne
e0661fc805
dca_core: convert to lavu/tx
Thanks to Martin Storsjö <martin@martin.st> for fixing and testing the
arm32 and aarch64 changes.
2022-11-06 14:39:36 +01:00
James Darnley
1936c06f02 checkasm: add a verbose check function for uint32_t data 2022-11-04 19:37:46 +01:00
Andreas Rheinhardt
37ee36f689 checkasm/idctdsp: Use declare_func_emms only when needed
There is no MMX code for (add|put|put_signed)_pixels_clamped
since commit bfb28b5ce8, so use
declare_func instead of declare_func_emms() to also test that
we are not in MMX mode after return.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-11 14:18:54 +02:00
Andreas Rheinhardt
5102b98b7a checkasm/llviddspenc: Use declare_func_emms only when needed
There is no MMX code for diff_bytes since commit
230ea38de1, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-11 14:18:54 +02:00
Andreas Rheinhardt
e814569c8d checkasm/huffyuvdsp: Use declare_func_emms only when needed
There is no MMX code for add_int16 since commit
4b6ffc2880, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-11 14:18:54 +02:00
Andreas Rheinhardt
cd8a33bcce checkasm/llviddsp: Be strict about MMX
There is no MMX code for llviddsp after commit
fed07efcde, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-11 14:18:54 +02:00
Andreas Rheinhardt
b4e2d67636 checkasm/pixblockdsp: Be strict about MMX
There is no MMX code for pixblockdsp after commit
92b58002776edd3a3df03c90e8a3ab24b8f987de, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-11 14:18:54 +02:00
Andreas Rheinhardt
42921190cb checkasm/audiodsp: Be strict about MMX
There is no MMX code for audiodsp after commit
3d716d38ab, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-11 14:18:54 +02:00
Andreas Rheinhardt
18afaa20f1 checkasm/blockdsp: Be strict about MMX
There is no MMX code for blockdsp after commit
ee551a21dd, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-11 14:18:54 +02:00
Andreas Rheinhardt
f224c195e0 checkasm/vc1dsp: Use declare_func_emms only when needed
There is no MMX code for vc1_inv_trans_8x8 or
vc1_unescape_buffer, so use declare_func instead of
declare_func_emms() to also test that we are not in MMX
mode after return.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-11 14:18:54 +02:00
Rémi Denis-Courmont
c962c78901 checkasm: RISC-V 64-bit assembler test harness 2022-10-10 02:23:18 +02:00
Andreas Rheinhardt
bcfa427c8f checkasm/vp8dsp: Use declare_func_emms only when needed
There is no MMX code for loop filters since commit
6a551f1405, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-08 09:33:36 +02:00
Rémi Denis-Courmont
37d5ddc317 lavu/riscv: CPU flag for the Zbb extension
Unfortunately, it is common, and will remain so, that the Bit
manipulations are not enabled at compilation time. This is an official
policy for Debian ports in general (though they do not support RISC-V
officially as of yet) to stick to the minimal target baseline, which
does not include the B extension or even its Zbb subset.

For inline helpers (CPOP, REV8), compiler builtins (CTZ, CLZ) or
even plain C code (MIN, MAX, MINU, MAXU), run-time detection seems
impractical. But at least it can work for the byte-swap DSP functions.
2022-10-05 08:26:19 +02:00
Rémi Denis-Courmont
0c0a3deb18 lavu/cpu: CPU flags for the RISC-V Vector extension
RVV defines a total of 12 different extensions, including:

- 5 different instruction subsets:
  - Zve32x: 8-, 16- and 32-bit integers,
  - Zve32f: Zve32x plus single precision floats,
  - Zve64x: Zve32x plus 64-bit integers,
  - Zve64f: Zve32f plus Zve64x,
  - Zve64d: Zve64f plus double precision floats.

- 6 different vector lengths:
  - Zvl32b (embedded only),
  - Zvl64b (embedded only),
  - Zvl128b,
  - Zvl256b,
  - Zvl512b,
  - Zvl1024b,

- and the V extension proper: equivalent to Zve64f and Zvl128b.

In total, there are 6 different possible sets of supported instructions
(including the empty set), but for convenience we allocate one bit for
each type sets: up-to-32-bit ints (RVV_I32), floats (RVV_F32),
64-bit ints (RVV_I64) and doubles (RVV_F64).

Whence the vector size is needed, it can be retrieved by reading the
unprivileged read-only vlenb CSR. This should probably be a separate
helper macro if needed at a later point.
2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
b95e2fbd85 lavu/cpu: detect RISC-V base extensions
This introduces compile-time and run-time CPU detection on RISC-V. In
practice, I doubt that FFmpeg will ever see a RISC-V CPU without all of
I, F and D extensions, and if it does, it probably won't have run-time
detection. So the flags are essentially always set.

But as things stand, checkasm wants them that way. Compare the ARMV8
flag on AArch64. We are nowhere near running short on CPU flag bits.
2022-09-27 13:19:52 +02:00
Lynne
ace42cf581
x86/tx_float: add 15xN PFA FFT AVX SIMD
~4x faster than the C version.
The shuffles in the 15pt dim1 are seriously expensive. Not happy with it,
but I'm contempt.

Can be easily converted to pure AVX by removing all vpermpd/vpermps
instructions.
2022-09-23 12:35:27 +02:00
Lynne
668f43af20
tests/checkasm/lpc: correct arithmetic when randomizing buffers
Results weren't signed.
2022-09-23 01:50:59 +02:00
Lynne
6ad39f01df
tests/checkasm/lpc: reduce range and use signed values
This is more similar to its regular use, and prevents inaccuracies
of huge float*float multiplications from failing the tests.
2022-09-23 01:42:34 +02:00
James Almer
9cbfffa0d4 tests/checkasm/lpc: print mismatching values
Will help debugging.

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-22 18:18:52 -03:00
James Almer
a1c6f4b653 tests/checkasm/lpc: randomize buffer length
Simplifies the test, while trying more values and preventing pointlessly
running benchmarks in a loop.

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-22 18:17:26 -03:00
James Almer
c8c4a162fc avcodec/lpc: use ptrdiff_t for length parameters
Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-22 18:17:26 -03:00
Lynne
b67776e12f
x86/lpc: fix even scalar loop overreads/writes
Passes checkasm with valgrind, tested to sizes of more than 4000 samples.
2022-09-22 04:27:19 +02:00
Andreas Rheinhardt
9beba05311 avcodec/fmtconvert: Remove unused AVCodecContext parameter
Unused since d74a8cb7e4.

Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-21 20:26:40 +02:00
Andreas Rheinhardt
fd72d8aea3 avcodec/blockdsp: Remove unused AVCodecContext parameter
Possible since be95df12bb.

Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-21 20:24:40 +02:00
Lynne
3ade6a8644
x86/lpc: implement a new Welch windowing function
Old one was written with the assumption only even inputs would be given.
This very messy replacement supports even and odd inputs, and supports
AVX2 for extra speed. The buffers given are usually quite big (4k samples),
so the speedup is worth it.
The new SSE version is still faster than the old inline asm version by 33%.

Also checkasm is provided to make sure this monstrosity works.

This fixes some FATE tests.
2022-09-21 07:12:39 +02:00
James Almer
8f119b501e tests/checkasm: add a test for VorbisDSPContext
Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-19 21:28:23 -03:00