Commit Graph

113 Commits

Author SHA1 Message Date
Rémi Denis-Courmont b3825bbe45 riscv: test for assembler support
This should fix the build on LLVM 16 and earlier, at the cost of turning
all non-RVV optimisations off.
2023-12-08 17:21:09 +02:00
sunyuechi d0ec826077 checkasm/ac3dsp: add float_to_fixed24 test
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2023-12-01 20:26:48 +02:00
Rémi Denis-Courmont 7212466e73 checkasm/riscv: report an error upon SIGILL
Terminating the whole checkasm process is not very helpful. This will
report if an illegal instruction occurs while executing a tested
function. This is a common occurrence whilst developping RISC-V
assembler, due to the compatibility between vector configuration and
instruction done at run-time.
2023-11-23 19:04:07 +02:00
Rémi Denis-Courmont 286d674221 checkasm: add helper to report a fatal signal 2023-11-23 18:57:18 +02:00
Rémi Denis-Courmont 6720a509a7 checkasm: add lossless audio DSP 2023-11-16 16:53:44 +02:00
Andreas Rheinhardt f8503b4c33 avutil/internal: Don't auto-include emms.h
Instead include emms.h wherever it is needed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-09-04 11:04:45 +02:00
Lynne 783270bfd1
checkasm: add h264chroma tests
Checks all variants of put_h264_chroma and avg_h264_chroma.
2023-05-20 20:07:21 +02:00
J. Dekker 68c151cb1b checkasm: add hevc_deblock chroma test
Signed-off-by: J. Dekker <jdek@itanimul.li>
2023-04-06 06:16:57 +02:00
James Darnley 087faf8cac checkasm: add test for bwdif 2023-03-25 02:38:17 +01:00
bwang30 3ab11dc5bb libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI
This commit enabled assembly code with intel AVX512 VNNI and added unit test for sobel filter

sobel_c: 4537
sobel_avx512icl 2136

Signed-off-by: bwang30 <bin.wang@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2022-11-14 10:04:16 +08:00
James Darnley 1936c06f02 checkasm: add a verbose check function for uint32_t data 2022-11-04 19:37:46 +01:00
Rémi Denis-Courmont c962c78901 checkasm: RISC-V 64-bit assembler test harness 2022-10-10 02:23:18 +02:00
Lynne 3ade6a8644
x86/lpc: implement a new Welch windowing function
Old one was written with the assumption only even inputs would be given.
This very messy replacement supports even and odd inputs, and supports
AVX2 for extra speed. The buffers given are usually quite big (4k samples),
so the speedup is worth it.
The new SSE version is still faster than the old inline asm version by 33%.

Also checkasm is provided to make sure this monstrosity works.

This fixes some FATE tests.
2022-09-21 07:12:39 +02:00
James Almer 8f119b501e tests/checkasm: add a test for VorbisDSPContext
Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-19 21:28:23 -03:00
Martin Storsjö 5cdf4c0bed checkasm: Silence warnings about unused return value from read()
This codepath is enabled by default on arm, if the linux perf API
is available, unless disabled with --disable-linux-perf.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-08 23:39:13 +03:00
Swinney, Jonathan c471cc7474 lavc/aarch64: motion estimation functions in neon
- ff_pix_abs16_neon
 - ff_pix_abs16_xy2_neon

In direct micro benchmarks of these ff functions verses their C implementations,
these functions performed as follows on AWS Graviton 3.

ff_pix_abs16_neon:
pix_abs_0_0_c: 141.1
pix_abs_0_0_neon: 19.6

ff_pix_abs16_xy2_neon:
pix_abs_0_3_c: 269.1
pix_abs_0_3_neon: 39.3

Tested with:
./tests/checkasm/checkasm --test=motion --bench --disable-linux-perf

Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-06-28 00:51:39 +03:00
Ben Avison bd3615a81a checkasm: Add idctdsp add/put-pixels-clamped tests
Signed-off-by: Ben Avison <bavison@riscosopen.org>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-04-01 10:03:33 +03:00
Ben Avison 20cb43ea8b checkasm: Add vc1dsp in-loop deblocking filter tests
Note that the benchmarking results for these functions are highly dependent
upon the input data. Therefore, each function is benchmarked twice,
corresponding to the best and worst case complexity of the reference C
implementation. The performance of a real stream decode will fall somewhere
between these two extremes.

Signed-off-by: Ben Avison <bavison@riscosopen.org>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-04-01 10:03:33 +03:00
Mark Reid 9e445a5be2 swscale/x86/output.asm: add x86-optimized planer gbr yuv2anyX functions
changes since v2:
 * fixed label
changes since v1:
 * remove vex intruction on sse4 path
 * some load/pack marcos use less intructions
 * fixed some typos

yuv2gbrp_full_X_4_512_c: 12757.6
yuv2gbrp_full_X_4_512_sse2: 8946.6
yuv2gbrp_full_X_4_512_sse4: 5138.6
yuv2gbrp_full_X_4_512_avx2: 3889.6
yuv2gbrap_full_X_4_512_c: 15368.6
yuv2gbrap_full_X_4_512_sse2: 11916.1
yuv2gbrap_full_X_4_512_sse4: 6294.6
yuv2gbrap_full_X_4_512_avx2: 3477.1
yuv2gbrp9be_full_X_4_512_c: 14381.6
yuv2gbrp9be_full_X_4_512_sse2: 9139.1
yuv2gbrp9be_full_X_4_512_sse4: 5150.1
yuv2gbrp9be_full_X_4_512_avx2: 2834.6
yuv2gbrp9le_full_X_4_512_c: 12990.1
yuv2gbrp9le_full_X_4_512_sse2: 9118.1
yuv2gbrp9le_full_X_4_512_sse4: 5132.1
yuv2gbrp9le_full_X_4_512_avx2: 2833.1
yuv2gbrp10be_full_X_4_512_c: 14401.6
yuv2gbrp10be_full_X_4_512_sse2: 9133.1
yuv2gbrp10be_full_X_4_512_sse4: 5126.1
yuv2gbrp10be_full_X_4_512_avx2: 2837.6
yuv2gbrp10le_full_X_4_512_c: 12718.1
yuv2gbrp10le_full_X_4_512_sse2: 9106.1
yuv2gbrp10le_full_X_4_512_sse4: 5120.1
yuv2gbrp10le_full_X_4_512_avx2: 2826.1
yuv2gbrap10be_full_X_4_512_c: 18535.6
yuv2gbrap10be_full_X_4_512_sse2: 33617.6
yuv2gbrap10be_full_X_4_512_sse4: 6264.1
yuv2gbrap10be_full_X_4_512_avx2: 3422.1
yuv2gbrap10le_full_X_4_512_c: 16724.1
yuv2gbrap10le_full_X_4_512_sse2: 11787.1
yuv2gbrap10le_full_X_4_512_sse4: 6282.1
yuv2gbrap10le_full_X_4_512_avx2: 3441.6
yuv2gbrp12be_full_X_4_512_c: 13723.6
yuv2gbrp12be_full_X_4_512_sse2: 9128.1
yuv2gbrp12be_full_X_4_512_sse4: 7997.6
yuv2gbrp12be_full_X_4_512_avx2: 2844.1
yuv2gbrp12le_full_X_4_512_c: 12257.1
yuv2gbrp12le_full_X_4_512_sse2: 9107.6
yuv2gbrp12le_full_X_4_512_sse4: 5142.6
yuv2gbrp12le_full_X_4_512_avx2: 2837.6
yuv2gbrap12be_full_X_4_512_c: 18511.1
yuv2gbrap12be_full_X_4_512_sse2: 12156.6
yuv2gbrap12be_full_X_4_512_sse4: 6251.1
yuv2gbrap12be_full_X_4_512_avx2: 3444.6
yuv2gbrap12le_full_X_4_512_c: 16687.1
yuv2gbrap12le_full_X_4_512_sse2: 11785.1
yuv2gbrap12le_full_X_4_512_sse4: 6243.6
yuv2gbrap12le_full_X_4_512_avx2: 3446.1
yuv2gbrp14be_full_X_4_512_c: 13690.6
yuv2gbrp14be_full_X_4_512_sse2: 9120.6
yuv2gbrp14be_full_X_4_512_sse4: 5138.1
yuv2gbrp14be_full_X_4_512_avx2: 2843.1
yuv2gbrp14le_full_X_4_512_c: 14995.6
yuv2gbrp14le_full_X_4_512_sse2: 9119.1
yuv2gbrp14le_full_X_4_512_sse4: 5126.1
yuv2gbrp14le_full_X_4_512_avx2: 2843.1
yuv2gbrp16be_full_X_4_512_c: 12367.1
yuv2gbrp16be_full_X_4_512_sse2: 8233.6
yuv2gbrp16be_full_X_4_512_sse4: 4820.1
yuv2gbrp16be_full_X_4_512_avx2: 2666.6
yuv2gbrp16le_full_X_4_512_c: 10904.1
yuv2gbrp16le_full_X_4_512_sse2: 8214.1
yuv2gbrp16le_full_X_4_512_sse4: 4824.1
yuv2gbrp16le_full_X_4_512_avx2: 2629.1
yuv2gbrap16be_full_X_4_512_c: 26569.6
yuv2gbrap16be_full_X_4_512_sse2: 10884.1
yuv2gbrap16be_full_X_4_512_sse4: 5488.1
yuv2gbrap16be_full_X_4_512_avx2: 3272.1
yuv2gbrap16le_full_X_4_512_c: 14010.1
yuv2gbrap16le_full_X_4_512_sse2: 10562.1
yuv2gbrap16le_full_X_4_512_sse4: 5463.6
yuv2gbrap16le_full_X_4_512_avx2: 3255.1
yuv2gbrpf32be_full_X_4_512_c: 14524.1
yuv2gbrpf32be_full_X_4_512_sse2: 8552.6
yuv2gbrpf32be_full_X_4_512_sse4: 4636.1
yuv2gbrpf32be_full_X_4_512_avx2: 2474.6
yuv2gbrpf32le_full_X_4_512_c: 13060.6
yuv2gbrpf32le_full_X_4_512_sse2: 9682.6
yuv2gbrpf32le_full_X_4_512_sse4: 4298.1
yuv2gbrpf32le_full_X_4_512_avx2: 2453.1
yuv2gbrapf32be_full_X_4_512_c: 18629.6
yuv2gbrapf32be_full_X_4_512_sse2: 11363.1
yuv2gbrapf32be_full_X_4_512_sse4: 15201.6
yuv2gbrapf32be_full_X_4_512_avx2: 3727.1
yuv2gbrapf32le_full_X_4_512_c: 16677.6
yuv2gbrapf32le_full_X_4_512_sse2: 10221.6
yuv2gbrapf32le_full_X_4_512_sse4: 5693.6
yuv2gbrapf32le_full_X_4_512_avx2: 3656.6

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2022-01-11 16:33:17 -03:00
J. Dekker b492cacffd checkasm: collapse hevc pel tests
Also add to `make fate-checkasm' target.

Signed-off-by: J. Dekker <jdek@itanimul.li>
2021-08-24 22:12:06 +02:00
J. Dekker 9a727235fd lavu/checkasm: add (private) kperf timing for macOS
Signed-off-by: J. Dekker <jdek@itanimul.li>
2021-07-20 19:40:03 +02:00
Lynne 1978b143eb
checkasm: add av_tx FFT SIMD testing code
This sadly required making changes to the code itself,
due to the same context needing to be reused for both versions.
The lookup table had to be duplicated for both versions.
2021-04-24 17:19:17 +02:00
Josh Dekker 9c513edb79 checkasm: add hevc_pel tests
Co-authored-by: Niklas Haas <git@haasn.xyz>
Signed-off-by: Josh Dekker <josh@itanimul.li>
2021-01-25 09:24:11 +01:00
Martin Storsjö ed7d73355e checkasm: aarch64: Check for stack overflows
Also fill x8-x17 with garbage before calling the function.

Figure out the number of stack parameters and make sure that the
value on the stack after those is untouched.

Signed-off-by: Martin Storsjö <martin@martin.st>
2020-05-15 21:22:36 +03:00
Martin Storsjö 6cb2d4d94b checkasm: arm: Check for stack overflows
Figure out the number of stack parameters and make sure that the
value on the stack after those is untouched.

Signed-off-by: Martin Storsjö <martin@martin.st>
2020-05-15 21:22:34 +03:00
Josh de Kock 5913cd4e6c checkasm: add hscale test
This tests the hscale 8bpp to 14/18bpp functions with different filter
sizes.

Signed-off-by: Josh de Kock <josh@itanimul.li>
2020-05-15 10:29:30 +01:00
Martin Storsjö 3ce1b2bf8d checkasm: add function to check and diff memory
This was ported from dav1d (c950e7101bdf5f7117bfca816984a21e550509f0).

Signed-off-by: Josh de Kock <josh@itanimul.li>
2020-05-15 10:29:30 +01:00
Ting Fu 9691e2a426 checkasm/vf_eq: add test for vf_eq
Signed-off-by: Ting Fu <ting.fu@intel.com>
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
2019-09-26 08:10:31 +08:00
Lynne 4ce1e13b54 checkasm: add opusdsp tests 2019-09-11 03:28:22 +01:00
Ruiling Song 8f4963ad25 checkasm/vf_gblur: add test for horiz_slice simd
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
2019-06-12 08:54:05 +08:00
James Darnley 76c370af64 checkasm: add test for v210dec 2019-05-02 19:21:37 +02:00
James Almer ba89dc27b5 checkasm: add an af_afir test
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2019-01-03 10:12:18 -03:00
Clément Bœsch f679711c1b checkasm: add vf_nlmeans test for ssd_integral_image 2018-05-08 10:28:06 +02:00
Martin Vignali a9a7ed4f27 checkasm/swscale : add test for rgb shuffle_bytes func 2018-03-24 20:22:12 +01:00
Yingming Fan 80798e3857 checkasm/hevc_sao : add hevc_sao for checkasm
Signed-off-by: James Almer <jamrial@gmail.com>
2018-03-07 23:53:32 -03:00
Martin Vignali 78b982d3b9 checkasm : add test for losslessvideoencdsp for diff bytes and sub_left_pred 2018-01-28 20:23:16 +01:00
James Almer da03242778 Revert "checkasm/vf_interlace : add test for lowpass_line 8 and 16"
This reverts commit adff97be5e.

It currently fails on Windows targets.

Signed-off-by: James Almer <jamrial@gmail.com>
2017-12-19 19:07:24 -03:00
Martin Vignali adff97be5e checkasm/vf_interlace : add test for lowpass_line 8 and 16 2017-12-19 20:59:51 +01:00
Martin Vignali cefb7e0060 checkasm/vf_hflip : add test for vf_hflip byte and short simd 2017-12-13 11:34:29 +01:00
Martin Vignali cfce442750 checkasm/vf_threshold : add checkasm test for threshold8 2017-12-03 19:17:15 +01:00
Martin Vignali 4a6aa6d1b2 checkasm : add test for huffyuvdsp add_int16 2017-11-21 09:41:42 +01:00
Martin Vignali 6a7eb65e1b checkasm : add utvideodsp test 2017-11-21 09:00:27 +01:00
James Almer 7323c896b2 checkasm: add an exrdsp test
Signed-off-by: James Almer <jamrial@gmail.com>
2017-09-17 19:01:40 -03:00
Clément Bœsch e0d56f097f checkasm: use perf API on Linux ARM*
On ARM platforms, accessing the PMU registers requires special user
access permissions. Since there is no other way to get accurate timers,
the current implementation of timers in FFmpeg rely on these registers.
Unfortunately, enabling user access to these registers on Linux is not
trivial, and generally involve compiling a random and unreliable github
kernel module, or patching somehow your kernel.

Such module is very unlikely to reach the upstream anytime soon. Quoting
Robin Murphin from ARM:

> Say you do give userspace direct access to the PMU; now run two or more
> programs at once that believe they can use the counters for their own
> "minimal-overhead" profiling. Have fun interpreting those results...
>
> And that's not even getting into the implications of scheduling across
> different CPUs, CPUidle, etc. where the PMU state is completely beyond
> userspace's control. In general, the plan to provide userspace with
> something which might happen to just about work in a few corner cases,
> but is meaningless, misleading or downright broken in all others, is to
> never do so.

As a result, the alternative is to use the Performance Monitoring Linux
API which makes use of these registers internally (assuming the PMU of
your ARM board is supported in the kernel, which is definitely not a
given...).

While the Linux API is obviously cross platform, it does have a
significant overhead which needs to be taken into account. As a result,
that mode is only weakly enabled on ARM platforms exclusively.

Note on the non flexibility of the implementation: the timers (native
FFmpeg vs Linux API) are selected at compilation time to prevent the
need of function calls, which would result in a negative impact on the
cycle counters.
2017-09-08 18:51:05 +02:00
James Almer 823cc7e25f checkasm: add a g722dsp test
Signed-off-by: James Almer <jamrial@gmail.com>
2017-07-13 17:00:19 -03:00
Matthieu Bouron 7864e07f4a checkasm: add sbrdsp tests 2017-07-03 14:28:17 +02:00
Clément Bœsch edd041e64c checkasm: add AAC PS tests
This includes various fixes and improvements from James Almer.

Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-28 12:22:39 +02:00
Diego Biurrun fd502f4f5f build: Generalize yasm/nasm-related variable names
None of them are specific to the YASM assembler.

(Cherry-picked from libav commit 39e208f4d4)

Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-21 17:00:29 -03:00
James Almer 5b10f484e2 checkasm: add float_dsp tests
Ported from libavutil/tests/float_dsp.c

Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-14 19:20:10 -03:00
James Almer 37388b119c checkasm: add a checkasm_checked_call function that doesn't issue emms
Meant for DSP functions returning a float or double, as they'd fail if emms
is called after every run on x86_32.

Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-14 19:18:56 -03:00