Commit Graph

87 Commits

Author SHA1 Message Date
Martin Storsjö 516c479172 checkasm: Test more h264 idct variants
Signed-off-by: Martin Storsjö <martin@martin.st>
2017-09-27 13:58:39 +03:00
Martin Storsjö e12f1cd616 Revert "checkasm: Test more h264 idct variants"
This reverts commit 547db1eaec.

This commit wasn't supposed to be pushed (yet) since it hasn't
been reviewed.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-09-02 22:23:30 +03:00
Martin Storsjö 547db1eaec checkasm: Test more h264 idct variants
2017-08-31 14:55:34 +03:00
Martin Storsjö d05c9cde0e checkasm: aarch64: Specify alignment for the register_init const array
Loads from this array don't strictly require alignment, but specify it
anyway for consistency with the arm version.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-05-15 10:19:46 +03:00
Martin Storsjö e00db9f78b checkasm: hevc: Add a hevc_ prefix to the add_residual functions
This makes it easier to group them with the rest when running e.g.
--bench=hevc.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-04-21 13:32:44 +03:00
Diego Biurrun dcc39ee10e lavc: Remove deprecated XvMC support hacks
Deprecated in 11/2013.
2017-03-23 10:09:14 +01:00
Diego Biurrun 39e208f4d4 build: Generalize yasm/nasm-related variable names
None of them are specific to the YASM assembler.
2017-03-01 10:18:15 +01:00
Diego Biurrun 7cb1d9e2db build: Fine-grained link-time dependency settings
Previously, all link-time dependencies were added for all libraries,
resulting in bogus link-time dependencies since not all dependencies
are shared across libraries. Also, in some cases like libavutil, not
all dependencies were taken into account, resulting in some cases of
underlinking.

To address this, add machinery that tracks which dependency belongs to
which library component, and use it to determine the correct
dependencies for each individual library.
2017-03-01 09:00:40 +01:00
Diego Biurrun 3794062ab1 Remove Plan 9 support
Supporting the system was a nice joke for the 9 release, but it has
run its course. Nowadays Plan 9 receives no testing and has no
practical usefulness.
2016-12-03 09:15:01 +01:00
Martin Storsjö 9c8bc74c2b arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32
This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with
the same runtime:

                                     Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub16_add_neon:   3188.1   2435.4   2499.0   1969.0
vp9_inv_dct_dct_32x32_sub32_add_neon:  18531.7  16582.3  14207.6  12000.3

By skipping individual 4x16 or 4x32 pixel slices in the first pass,
we reduce the runtime of these functions like this:

vp9_inv_dct_dct_16x16_sub1_add_neon:     274.6    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    2064.0   1534.8   1719.4   1248.7
vp9_inv_dct_dct_16x16_sub4_add_neon:    2135.0   1477.2   1736.3   1249.5
vp9_inv_dct_dct_16x16_sub8_add_neon:    2446.7   1828.7   1993.6   1494.7
vp9_inv_dct_dct_16x16_sub12_add_neon:   2832.4   2118.3   2266.5   1735.1
vp9_inv_dct_dct_16x16_sub16_add_neon:   3211.7   2475.3   2523.5   1983.1
vp9_inv_dct_dct_32x32_sub1_add_neon:     756.2    456.7    862.0    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:   10682.2   8190.4   8539.2   6762.5
vp9_inv_dct_dct_32x32_sub4_add_neon:   10813.5   8014.9   8518.3   6762.8
vp9_inv_dct_dct_32x32_sub8_add_neon:   11859.6   9313.0   9347.4   7514.5
vp9_inv_dct_dct_32x32_sub12_add_neon:  12946.6  10752.4  10192.2   8280.2
vp9_inv_dct_dct_32x32_sub16_add_neon:  14074.6  11946.5  11001.4   9008.6
vp9_inv_dct_dct_32x32_sub20_add_neon:  15269.9  13662.7  11816.1   9762.6
vp9_inv_dct_dct_32x32_sub24_add_neon:  16327.9  14940.1  12626.7  10516.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17462.7  15776.1  13446.2  11264.7
vp9_inv_dct_dct_32x32_sub32_add_neon:  18575.5  17157.0  14249.3  12015.1

I.e. in general a very minor overhead for the full subpartition case due
to the additional loads and cmps, but a significant speedup for the cases
when we only need to process a small part of the actual input data.

In common VP9 content in a few inspected clips, 70-90% of the non-dc-only
16x16 and 32x32 IDCTs only have nonzero coefficients in the upper left
8x8 or 16x16 subpartitions respectively.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-30 23:54:07 +02:00
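Note on 9c8bc74c2b: the message mentions "additional loads and cmps", which suggests the first pass compares the end-of-block position (eob) against a per-slice threshold before processing each slice. The following C sketch shows that idea only. The helper name `first_pass_slices` and the threshold values are illustrative placeholders, not taken from the commit, and the real code is NEON assembly.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SLICES 4 /* a 16x16 block split into four 4x16 slices */

/* Placeholder thresholds (an assumption for illustration): slice i can
 * only contain nonzero coefficients when eob is larger than min_eob[i]. */
static const int min_eob[NUM_SLICES] = { 0, 10, 38, 89 };

/* Return how many leading slices the first pass has to process.
 * The thresholds increase monotonically, so the slices that need
 * processing always form a prefix; the rest can be treated as zero. */
static int first_pass_slices(int eob)
{
    int n = 0;

    assert(eob >= 1);
    for (size_t i = 0; i < NUM_SLICES; i++)
        if (eob > min_eob[i])
            n++;
    return n;
}
```

With these placeholder thresholds, a block with eob = 11 needs only two of the four slices in the first pass, which is the kind of saving the benchmark table above reflects for the small subpartitions.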
Ronald S. Bultje 06fec74cac checkasm: vp9dsp: benchmark all sub-IDCTs (but not WHT or ADST).
Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-23 23:55:38 +02:00
Martin Storsjö effc1430b2 Revert "checkasm: vp9dsp: Benchmark the dc-only version of idct_idct separately"
This reverts commit 81d7f0bbca.

Instead of just benchmarking dc separately, test all relevant subparts
(in the next commit).

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-23 23:55:26 +02:00
Martin Storsjö 81d7f0bbca checkasm: vp9dsp: Benchmark the dc-only version of idct_idct separately
The dc-only mode is already checked to work correctly above, but this
allows benchmarking this mode for performance tuning, and allows making
sure that it actually is correctly hooked up.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-16 10:06:32 +02:00
Ronald S. Bultje 0b37cd09a6 checkasm: add vp9dsp.itxfm_add tests.
This includes fixes by Henrik Gramner.

The forward transforms are derived from the reference encoder.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-11 11:09:05 +02:00
Diego Biurrun 9498237049 checkasm: Add --test parameter to check only specific components
Inspired by a patch from Martin Storsjö <martin@martin.st>.
2016-11-08 17:32:25 +01:00
Martin Storsjö 2e55e26b40 vp9: Flip the order of arguments in MC functions
This makes it match the pattern already used for VP8 MC functions.

This also makes the signature match ffmpeg's version of these
functions, easing porting of code in both directions.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-03 09:12:02 +02:00
Alexandra Hájková ed48a9d814 checkasm: Add a test for HEVC add_residual
2016-10-22 17:33:35 +02:00
Martin Storsjö dd5d4a0e1e checkasm: aarch64: Don't clobber x29 in checkasm_stack_clobber
x29 (FP) is a callee saved register and should be restored on
return. Instead of backing up x29 and restoring it here, back up
sp in a register that we are allowed to overwrite.

This fixes crashes in checkasm on aarch64 since f1b3e13138.
For some reason, gcc builds didn't crash, but clang builds do.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-10-18 16:17:12 +03:00
Diego Biurrun 2816f8a8bb build: Drop arch-specific checkasm Makefiles
They only contain one line and will never contain more.
2016-10-17 16:25:38 +02:00
Diego Biurrun 93d5b022a9 build: Drop duplicate asm recipe
And move the asm recipe to the top-level Makefile next to the other
local pattern rules for .o files.
2016-10-17 16:25:35 +02:00
Martin Storsjö c91d6a33f8 checkasm: aarch64: Add filler args to make sure all parameters are passed on the stack
This, combined with clobbering the stack space prior to the call,
increases the chances of finding cases where 32 bit parameters
are erroneously treated as 64 bit.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-10-16 23:26:33 +03:00
Martin Storsjö f1b3e13138 checkasm: aarch64: Clobber the stack before calling functions
Signed-off-by: Martin Storsjö <martin@martin.st>
2016-10-16 23:26:22 +03:00
Martin Storsjö a05cc56124 checkasm: arm/aarch64: Fix the amount of space reserved for stack parameters
Even if MAX_ARGS - 2 (for arm) or MAX_ARGS - 7 (for aarch64) parameters
are passed on the stack to checkasm_checked_call, we actually only
need to store MAX_ARGS - 4 (for arm) or MAX_ARGS - 8 (for aarch64)
parameters on the stack when calling the tested function.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-10-16 23:26:15 +03:00
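Note on a05cc56124: the numbers in the message follow from the calling conventions. AAPCS passes the first four integer or pointer arguments in r0-r3, and AAPCS64 passes the first eight in x0-x7. The sketch below redoes the arithmetic. It assumes that checkasm_checked_call takes two extra leading parameters on arm and one on aarch64 (inferred from the counts in the message, not stated there), and `MAX_ARGS = 15` and `stack_params` are illustrative choices.

```c
#include <assert.h>

#define MAX_ARGS 15 /* assumed value, for illustration only */

/* Number of parameters that spill to the stack when the first
 * reg_params of total_params are passed in registers. */
static int stack_params(int total_params, int reg_params)
{
    assert(total_params >= 0 && reg_params >= 0);
    return total_params > reg_params ? total_params - reg_params : 0;
}
```

Under those assumptions, `stack_params(MAX_ARGS + 2, 4)` gives the MAX_ARGS - 2 parameters the arm wrapper receives on the stack, while `stack_params(MAX_ARGS, 4)` gives the MAX_ARGS - 4 that the tested function needs there. The aarch64 case works the same way with 8 registers and one extra wrapper parameter.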
Alexandra Hájková e3f941cb03 checkasm: add a test for HEVC IDCT
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-11 18:15:40 +02:00
Ronald S. Bultje c935b54bd6 checkasm: add VP9 loopfilter tests.
The randomize_buffer() implementation assures that "most of the time",
we'll do a good mix of wide16/wide8/hev/regular/no filters for complete
code coverage. However, this is not mathematically assured because that
would make the code either much more complex, or much less random.

Some fixes and improvements by Rodger Combs <rodger.combs@gmail.com>

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:07 +02:00
Alexandra Hájková 22c3ab1864 checkasm: Add test for huffyuvdsp add_bytes
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-10-02 17:13:26 +02:00
Diego Biurrun ba479f3daa hevc: Change type of array stride parameters to ptrdiff_t
ptrdiff_t is the correct type for array strides and similar.
2016-09-29 17:54:23 +02:00
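Note on ba479f3daa: a stride is a signed distance between rows and may be negative (for example when walking a bottom-up image), so ptrdiff_t represents it without truncation or sign problems on 64-bit targets. A minimal sketch, with a hypothetical `copy_block` function that is not part of the codebase:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy h rows of w bytes each. The strides are signed byte offsets
 * between consecutive rows and may be negative. */
static void copy_block(uint8_t *dst, ptrdiff_t dst_stride,
                       const uint8_t *src, ptrdiff_t src_stride,
                       int w, int h)
{
    assert(w >= 0 && h >= 0);
    for (int y = 0; y < h; y++)
        /* y is converted to ptrdiff_t, so the row offset is computed
         * at full pointer width. */
        memcpy(dst + y * dst_stride, src + y * src_stride, (size_t) w);
}
```

Passing a pointer to the last row together with a negative source stride copies the rows in reverse order, which would not work reliably if the stride were an unsigned type.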
Anton Khirnov 683da86aab audiodsp: reorder arguments for vector_clipf
This will make the x86 asm simpler.

ARM conversion by Martin Storsjö <martin@martin.st> and Janne Grunau
<janne-libav@jannau.net>
2016-09-22 09:47:52 +02:00
Anton Khirnov e9ef617139 checkasm: add tests for audiodsp
2016-09-22 09:47:52 +02:00
Anton Khirnov 2eb97af66a checkasm: add a test for blockdsp
2016-09-22 09:47:52 +02:00
Luca Barbato e89cef4050 checkasm: Read the unsigned value as it should
Reading a value larger than int using atoi() may give the wrong result.
2016-09-11 14:12:18 +02:00
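Note on e89cef4050: atoi() returns an int and has undefined behaviour when the value does not fit, so an unsigned value above INT_MAX cannot be read with it reliably. A minimal sketch of reading such a value with strtoul() instead. The helper name `parse_seed` and its error handling are assumptions made for illustration and are not taken from the commit.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a decimal string into an unsigned value.
 * Returns 0 on success, -1 on malformed or out-of-range input. */
static int parse_seed(const char *arg, unsigned *seed)
{
    char *end;
    unsigned long v;

    errno = 0;
    v = strtoul(arg, &end, 10);
    if (end == arg || *end != '\0') /* no digits, or trailing characters */
        return -1;
    if (errno == ERANGE || v > UINT_MAX) /* does not fit in unsigned */
        return -1;
    *seed = (unsigned) v;
    return 0;
}
```

Unlike atoi(), this reports bad input instead of silently returning a wrong number.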
Diego Biurrun 87c6c78604 vp8: Change type of stride parameters to ptrdiff_t
ptrdiff_t is the correct type for array strides and similar.
2016-08-26 11:36:53 +02:00
Ronald S. Bultje e99ecda550 checkasm: add vp9 MC tests.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-08-03 11:07:01 +02:00
Luca Barbato 40ad05bab2 checkasm: Cast unsigned to signed
Avoid a warning about passing an unsigned value to abs(); some compilers
might also optimize the abs() call away.
2016-07-23 08:27:32 +02:00
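Note on 40ad05bab2: the difference of two unsigned values is itself unsigned, so it is never negative and abs() on it is meaningless, which is why a compiler may warn about the call or drop it. A minimal sketch of the cast-to-signed approach. The helper `bits_distance` is hypothetical and only illustrates the pattern.

```c
#include <stdint.h>
#include <stdlib.h>

/* Absolute distance between two 32-bit patterns, taking the wrapped
 * difference as a signed value. */
static int bits_distance(uint32_t a, uint32_t b)
{
    /* a - b wraps modulo 2^32. The cast reinterprets the result as
     * signed (two's complement on all common platforms), so abs()
     * sees a value that can actually be negative. */
    int32_t diff = (int32_t) (a - b);

    return abs(diff);
}
```

One caveat: abs(INT32_MIN) is undefined, so this sketch assumes the two inputs never differ by exactly 2^31.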
Alexandra Hájková 9064777dbb checkasm: add HEVC test for testing IDCT DC
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-07-22 19:08:12 +02:00
Martin Storsjö 6f9e34baea arm: Check for support for the .fpu directive
When targeting COFF (Windows), clang doesn't support this
directive (while binutils supports it for all targets).

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-07-21 12:52:10 +03:00
Martin Storsjö 37961044c6 checkasm: arm: Ignore changes to bits 0-4 and 7 of FPSCR
These bits are set by exceptions in NEON instructions.

Also print the differing bits when FPSCR is clobbered,
and use bic instead of lsl, for clearing the topmost bits.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-07-17 21:48:17 +03:00
Janne Grunau 59aeed93e4 checkasm/arm: remove NEON instructions from checkasm_checked_call_vfp
Fixes the assembler error on non-NEON builds introduced in 71a0472114.
Also set the fpu explicitly to vfp in checkasm.S, so that stray NEON
instructions cause build errors on NEON builds too.
2016-07-17 11:28:21 +02:00
Martin Storsjö 446353ea18 checkasm: arm: Don't start new const blocks for each string
Each const block needs to be terminated by exactly one endconst
invocation, so either call endconst after each block, or just
declare plain labels for the later strings.

This fixes errors such as this, on some binutils versions:

checkasm.S:38: Error: Macro `endconst' was already defined

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-07-17 12:21:19 +03:00
Janne Grunau 71a0472114 checkasm: arm: report the first clobbered register in checkasm_checked_call
2016-07-16 12:57:18 +02:00
Janne Grunau 7b1ae0e73a checkasm/arm: preserve the stack alignment checkasm_checked_call
The stack pointer in checkasm_checked_call_vfp was only aligned to a
multiple of 4 bytes when the checked function was called. AAPCS requires
a double word (8 byte) aligned stack at public interfaces. Since both
calls are public interfaces, the stack was misaligned when the checked
function was called.

Might fix the SIGBUS error in the armv7-linux-clang-3.7 fate config.
2016-07-13 22:18:53 +02:00
Janne Grunau 80fbb7beca checkasm: vp8.mc: initialize the full src buffer after ec32574209
Fixes "Use of uninitialised value" valgrind warnings in checkasm.
2016-07-13 22:18:52 +02:00
Janne Grunau 8c816c0c9b checkasm/arm: align the clobber check data properly for ldrd
Should fix the SIGBUS in the armv7-linux-clang-3.7 fate target.
2016-07-10 13:35:41 +02:00
Janne Grunau ec32574209 checkasm: vp8: mc: test unequal width/height for partitions
2016-07-10 13:35:41 +02:00
Martin Storsjö f8d17d5395 checkasm: Add tests for vp8dsp
The tests are inspired by similar tests for vp9 by
Ronald Bultje.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-07-08 14:10:46 +03:00
Martin Storsjö 67cb2c0f73 checkasm: hevc: Iterate over features first, then over bitdepths
This avoids listing the same feature multiple times in the
test output. Previously the output contained something like this:

SSE2:
 - hevc_mc.qpel              [OK]
 - hevc_mc.epel              [OK]
 - hevc_mc.unweighted_pred   [OK]
 - hevc_mc.qpel              [OK]
 - hevc_mc.epel              [OK]
 - hevc_mc.unweighted_pred   [OK]

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-06-29 21:12:05 +03:00
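Note on 67cb2c0f73: the duplicated output comes from the loop nesting. If the bit depth loop is outermost and each feature is reported inside it, every feature is listed once per bit depth. The sketch below shows the features-first order. The `run_tests` function, the feature list, and the two bit depths are assumptions for illustration and do not reproduce the checkasm API.

```c
#include <stddef.h>
#include <stdio.h>

static const char *const features[] = { "qpel", "epel", "unweighted_pred" };
static const int bit_depths[] = { 8, 10 };

#define NUM_FEATURES (sizeof(features) / sizeof(features[0]))
#define NUM_DEPTHS   (sizeof(bit_depths) / sizeof(bit_depths[0]))

/* Iterate over features first, then over bit depths, and emit one
 * report line per feature. Returns the number of lines written and
 * stores the number of (feature, bit depth) checks in *checks. */
static int run_tests(char *out, size_t out_size, int *checks)
{
    size_t used = 0;
    int reports = 0;

    *checks = 0;
    if (out_size)
        out[0] = '\0';
    for (size_t f = 0; f < NUM_FEATURES; f++) {
        for (size_t d = 0; d < NUM_DEPTHS; d++) {
            /* A real harness would compare the C and asm versions of
             * features[f] at bit_depths[d] here. */
            (*checks)++;
        }
        /* Report once per feature, after all bit depths are done. */
        int n = snprintf(out + used, out_size - used,
                         " - hevc_mc.%s\n", features[f]);
        if (n < 0 || (size_t) n >= out_size - used)
            break;
        used += (size_t) n;
        reports++;
    }
    return reports;
}
```

With this nesting, six checks produce three report lines rather than six.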
Martin Storsjö e48746deec checkasm: h264dsp: Move the x and y variables into the randomize_buffer macro
This avoids the risk of accidentally clobbering such variables outside
of the macro if the same variables are used there.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-06-28 14:24:04 +03:00
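Note on e48746deec: a macro that relies on loop counters declared by its caller silently overwrites them. Declaring the counters inside the macro's own block avoids this. A minimal sketch of the pattern, using a hypothetical `fill_buffer` macro instead of the real randomize_buffer:

```c
#include <stdint.h>

#define W 4
#define H 4

/* The loop counters x and y live inside the do/while block, so any
 * x or y in the calling scope is shadowed rather than modified. */
#define fill_buffer(buf, value)                 \
    do {                                        \
        int x, y;                               \
        for (y = 0; y < H; y++)                 \
            for (x = 0; x < W; x++)             \
                (buf)[y * W + x] = (value);     \
    } while (0)
```

A caller that keeps its own x and y across a `fill_buffer(buf, 3);` call finds them unchanged afterwards. The macro arguments themselves should not refer to x or y, since they would then pick up the inner variables.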
Martin Storsjö e57de6faa1 checkasm: h264dsp: Initialize the padding area
This fixes valgrind warnings about conditional jumps based on
uninitialized data (even though the uninitialized data only ever
was compared with a direct copy of the same uninitialized data).

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-06-28 14:24:01 +03:00
Martin Storsjö dc7501e524 checkasm: Issue emms after benchmarking functions
The functions may not clean up properly after using MMX
registers. For the normal testing calls, the checkasm_checked_call
functions will do the cleanup (and check that functions that
should clean up do it as well), but when benchmarking functions
that don't clean up, we don't currently properly clean up at all.

This causes issues if a benchmarked function is followed by testing
of a function that is supposed to not clobber the MMX/FPU state but
doesn't touch it at all.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-06-21 22:09:29 +03:00
Martin Storsjö 105998fb5c checkasm: Add tests for h264 idct
The tests are inspired by similar tests for vp9 by
Ronald Bultje.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-06-17 21:37:56 +03:00