27 Commits

Author SHA1 Message Date
James Almer 6747fc436e Merge commit 'effc1430b2fe5997d9d55bf28dc507c27125eb27'
* commit 'effc1430b2fe5997d9d55bf28dc507c27125eb27':
  Revert "checkasm: vp9dsp: Benchmark the dc-only version of idct_idct separately"

Merged-by: James Almer <jamrial@gmail.com>
2017-04-04 15:26:18 -03:00
Clément Bœsch edfa7ac8ec Merge commit '81d7f0bbca837afda1f7e60d3ae52ab1360ab44b'
* commit '81d7f0bbca837afda1f7e60d3ae52ab1360ab44b':
  checkasm: vp9dsp: Benchmark the dc-only version of idct_idct separately

Merged-by: Clément Bœsch <u@pkh.me>
2017-04-01 11:54:29 +02:00
Clément Bœsch 1c9f4b5078 lavc/vp9: split into vp9{block,data,mvs}
This follows the Libav layout to ease merges.
2017-03-27 21:38:21 +02:00
Martin Storsjö 388f6e6715 arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32
This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with
the same runtime:

                                     Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub16_add_neon:   3188.1   2435.4   2499.0   1969.0
vp9_inv_dct_dct_32x32_sub32_add_neon:  18531.7  16582.3  14207.6  12000.3

By skipping individual 4x16 or 4x32 pixel slices in the first pass,
we reduce the runtime of these functions like this:

vp9_inv_dct_dct_16x16_sub1_add_neon:     274.6    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    2064.0   1534.8   1719.4   1248.7
vp9_inv_dct_dct_16x16_sub4_add_neon:    2135.0   1477.2   1736.3   1249.5
vp9_inv_dct_dct_16x16_sub8_add_neon:    2446.7   1828.7   1993.6   1494.7
vp9_inv_dct_dct_16x16_sub12_add_neon:   2832.4   2118.3   2266.5   1735.1
vp9_inv_dct_dct_16x16_sub16_add_neon:   3211.7   2475.3   2523.5   1983.1
vp9_inv_dct_dct_32x32_sub1_add_neon:     756.2    456.7    862.0    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:   10682.2   8190.4   8539.2   6762.5
vp9_inv_dct_dct_32x32_sub4_add_neon:   10813.5   8014.9   8518.3   6762.8
vp9_inv_dct_dct_32x32_sub8_add_neon:   11859.6   9313.0   9347.4   7514.5
vp9_inv_dct_dct_32x32_sub12_add_neon:  12946.6  10752.4  10192.2   8280.2
vp9_inv_dct_dct_32x32_sub16_add_neon:  14074.6  11946.5  11001.4   9008.6
vp9_inv_dct_dct_32x32_sub20_add_neon:  15269.9  13662.7  11816.1   9762.6
vp9_inv_dct_dct_32x32_sub24_add_neon:  16327.9  14940.1  12626.7  10516.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17462.7  15776.1  13446.2  11264.7
vp9_inv_dct_dct_32x32_sub32_add_neon:  18575.5  17157.0  14249.3  12015.1

I.e. in general a very minor overhead for the full subpartition case due
to the additional loads and cmps, but a significant speedup for the cases
when we only need to process a small part of the actual input data.

In a few inspected clips of common VP9 content, 70-90% of the non-dc-only
16x16 and 32x32 IDCTs only have nonzero coefficients in the upper left
8x8 or 16x16 subpartitions, respectively.

This is cherry-picked from libav commit 9c8bc74c2b.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-01-14 21:13:30 +01:00
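A minimal C sketch of the slice-skipping idea, under stated assumptions: a slice is taken to be 4 rows of 16 coefficients, `itx1d()` is a hypothetical linear stand-in for the real 1-D IDCT, and emptiness is detected by scanning the slice instead of comparing eob against per-slice thresholds as the NEON code does. Because any linear 1-D transform maps an all-zero slice to all zeros, writing zeros into the intermediate buffer gives exactly the same result as transforming the slice.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TX_SIZE    16  /* 16x16 transform */
#define SLICE_ROWS 4   /* first-pass slice: 4 rows of TX_SIZE coefficients (assumed orientation) */

/* Hypothetical stand-in for a 1-D inverse transform. Any linear transform
 * maps zero input to zero output; a running sum is used for illustration. */
static void itx1d(const int32_t *in, int32_t *out, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++) {
        acc += in[i];
        out[i] = acc;
    }
}

/* Two-pass separable inverse transform: rows first, then columns.
 * With skip_empty set, all-zero slices are not transformed in the first
 * pass; their part of the intermediate buffer is simply zeroed. */
static void itx2d(const int32_t coef[TX_SIZE * TX_SIZE],
                  int32_t out[TX_SIZE * TX_SIZE], int skip_empty)
{
    int32_t tmp[TX_SIZE * TX_SIZE];
    int32_t col[TX_SIZE], res[TX_SIZE];

    for (int s = 0; s < TX_SIZE; s += SLICE_ROWS) {
        const int32_t *src = &coef[s * TX_SIZE];
        int nonzero = 0;
        for (int i = 0; i < SLICE_ROWS * TX_SIZE && !nonzero; i++)
            nonzero = src[i] != 0;
        if (skip_empty && !nonzero) {
            memset(&tmp[s * TX_SIZE], 0, sizeof(int32_t) * SLICE_ROWS * TX_SIZE);
            continue;
        }
        for (int r = s; r < s + SLICE_ROWS; r++)
            itx1d(&coef[r * TX_SIZE], &tmp[r * TX_SIZE], TX_SIZE);
    }

    /* Second pass always runs over every column of the intermediate buffer. */
    for (int c = 0; c < TX_SIZE; c++) {
        for (int r = 0; r < TX_SIZE; r++)
            col[r] = tmp[r * TX_SIZE + c];
        itx1d(col, res, TX_SIZE);
        for (int r = 0; r < TX_SIZE; r++)
            out[r * TX_SIZE + c] = res[r];
    }
}
```

Calling `itx2d()` with and without `skip_empty` on a block whose nonzero coefficients sit in the first slice should give identical output. Per the commit message, the real code pays only a few extra loads and compares per slice to make this decision.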
Ronald S. Bultje 1c8fbd7b90 checkasm/vp9: benchmark all sub-IDCTs (but not WHT or ADST).
2016-12-27 10:02:33 -05:00
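Benchmarking a sub-IDCT means feeding a coefficient block whose nonzero values are confined to its upper-left `sub x sub` corner; the sizes below match the `subN` names in the 388f6e6715 benchmark table. A hypothetical C helper that prepares such input might look like this. `fill_subpartition()` and the small LCG are illustrative assumptions, not checkasm's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sub-partition sizes taken from the subN benchmark names listed earlier. */
static const int subs_16x16[] = { 1, 2, 4, 8, 12, 16 };
static const int subs_32x32[] = { 1, 2, 4, 8, 12, 16, 20, 24, 28, 32 };

/* Tiny LCG so the sketch does not depend on a platform-specific RNG. */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state >> 16;
}

/* Zero an n x n coefficient block, then put random values in [-256, 255]
 * into the upper-left sub x sub corner only (hypothetical helper). */
static void fill_subpartition(int16_t *coef, int n, int sub, uint32_t *seed)
{
    assert(sub >= 1 && sub <= n);
    memset(coef, 0, sizeof(*coef) * (size_t)n * n);
    for (int y = 0; y < sub; y++)
        for (int x = 0; x < sub; x++)
            coef[y * n + x] = (int16_t)((int)(lcg_next(seed) % 512) - 256);
}
```

With `sub = 1` only the DC coefficient can be nonzero, which corresponds to the dc-only (eob=1) case benchmarked as `sub1`.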
Martin Storsjö effc1430b2 Revert "checkasm: vp9dsp: Benchmark the dc-only version of idct_idct separately"
This reverts commit 81d7f0bbca.

Instead of just benchmarking dc separately, test all relevant subparts
(in the next commit).

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-23 23:55:26 +02:00
Martin Storsjö 81d7f0bbca checkasm: vp9dsp: Benchmark the dc-only version of idct_idct separately
The dc-only mode is already checked to work correctly above, but this
allows benchmarking this mode for performance tuning, and allows making
sure that it actually is correctly hooked up.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-16 10:06:32 +02:00
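For reference, a sketch of what a dc-only (eob=1) inverse DCT amounts to, assuming 8-bit pixels: each 1-D pass scales the single DC coefficient by cos(π/4), which is 11585 in Q14 (11585/16384 ≈ 0.7071), and the resulting value is added to every pixel of the block. The final rounding shift per block size is passed in as `shift` because its exact value is an assumption here; `idct_dc_add()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* cos(pi/4) in Q14: 16384 * 0.70710678 ~= 11585 */
#define COS_PI_4_Q14 11585

static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* DC-only inverse DCT plus add for an n x n block of 8-bit pixels.
 * 'shift' is the final rounding shift of the 2-D transform (assumed). */
static void idct_dc_add(uint8_t *dst, ptrdiff_t stride, int n,
                        int16_t *block, int shift)
{
    assert(shift >= 1);
    int t = (block[0] * COS_PI_4_Q14 + (1 << 13)) >> 14; /* first 1-D pass */
    t = (t * COS_PI_4_Q14 + (1 << 13)) >> 14;             /* second 1-D pass */
    int add = (t + (1 << (shift - 1))) >> shift;          /* final rounding */
    block[0] = 0; /* leave the coefficient buffer cleared for the next block */
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            dst[y * stride + x] = clip_u8(dst[y * stride + x] + add);
}
```

Since the whole block reduces to two multiplies and one constant add per pixel, this is consistent with the `sub1` rows in the 388f6e6715 table being far cheaper than the others.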
Ronald S. Bultje 0b37cd09a6 checkasm: add vp9dsp.itxfm_add tests.
This includes fixes by Henrik Gramner.

The forward transforms are derived from the reference encoder.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-11 11:09:05 +02:00
Martin Storsjö 2e55e26b40 vp9: Flip the order of arguments in MC functions
This makes it match the pattern already used for VP8 MC functions.

This also makes the signature match ffmpeg's version of these
functions, easing porting of code in both directions.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-03 09:12:02 +02:00
Ronald S. Bultje c935b54bd6 checkasm: add VP9 loopfilter tests.
The randomize_buffer() implementation ensures that, "most of the time",
we get a good mix of wide16/wide8/hev/regular/no filters for complete
code coverage. However, this is not mathematically guaranteed, because that
would make the code either much more complex or much less random.

Some fixes and improvements by Rodger Combs <rodger.combs@gmail.com>

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-10-04 10:54:07 +02:00
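To see which filter paths a randomized buffer exercises, each edge can be classified. The sketch below follows the usual VP9/libvpx decision order as understood here (filter mask, then flatness for wide16/wide8, then high edge variance); `classify_edge()`, the E/I/H limits and the 8-bit flatness threshold F = 1 are assumptions for illustration, not code from checkasm.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum lf_mode { LF_NONE, LF_REGULAR, LF_HEV, LF_WIDE8, LF_WIDE16 };

/* px[0..7] holds p7..p0 and px[8..15] holds q0..q7 across one edge.
 * E, I, H are the edge, interior and hev limits; wd is the filter width. */
static enum lf_mode classify_edge(const uint8_t px[16], int E, int I, int H, int wd)
{
    int p[8], q[8];
    for (int k = 0; k < 8; k++) {
        p[k] = px[7 - k];
        q[k] = px[8 + k];
    }

    /* Filter mask: each side smooth enough, and a small step across the edge. */
    for (int k = 0; k < 3; k++)
        if (abs(p[k + 1] - p[k]) > I || abs(q[k + 1] - q[k]) > I)
            return LF_NONE;
    if (abs(p[0] - q[0]) * 2 + (abs(p[1] - q[1]) >> 1) > E)
        return LF_NONE;

    const int F = 1; /* flatness threshold assumed for 8-bit content */
    int flat8in = 1, flat8out = 1;
    for (int k = 1; k <= 3; k++)
        flat8in &= abs(p[k] - p[0]) <= F && abs(q[k] - q[0]) <= F;
    for (int k = 4; k <= 7; k++)
        flat8out &= abs(p[k] - p[0]) <= F && abs(q[k] - q[0]) <= F;

    if (wd >= 16 && flat8in && flat8out)
        return LF_WIDE16;
    if (wd >= 8 && flat8in)
        return LF_WIDE8;
    if (abs(p[1] - p[0]) > H || abs(q[1] - q[0]) > H)
        return LF_HEV;
    return LF_REGULAR;
}
```

Counting the returned classes over many randomized edges gives a quick check of the "good mix" the randomizer aims for, without it being guaranteed.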
Ronald S. Bultje e99ecda550 checkasm: add vp9 MC tests.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-08-03 11:07:01 +02:00
James Almer 54a0a52be1 checkasm/vp9dsp: use declare_func_emms in check_loopfilter
Fixes checkasm failures on mmxext functions

Signed-off-by: James Almer <jamrial@gmail.com>
2016-07-26 22:16:21 -03:00
Hendrik Leppkes 69ead86027 Merge commit '711781d7a1714ea4eb0217eb1ba04811978c43d1'
* commit '711781d7a1714ea4eb0217eb1ba04811978c43d1':
  x86: checkasm: check for or handle missing cleanup after MMX instructions

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-01-02 11:55:44 +01:00
Ronald S. Bultje eb4b5ff738 vp9: add itxfm_add eob shortcuts to 10/12bpp functions.
These aren't quite as helpful as the ones in 8bpp, since there we can
use pmulhrsw, whereas here the coefficients have too many bits to take
advantage of pmulhrsw. However, we can still skip columns for which all
coefficients are 0, and instead just zero the input data for the row
itx. This improves overall decoding speed by a few percent.
2015-10-13 11:06:01 -04:00
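A scalar model of why the 8bpp trick does not carry over, assuming the 8bpp code multiplies 16-bit coefficients by Q15 constants (for example 23170, roughly 2^15 cos(π/4)) with `pmulhrsw`: each lane computes a rounded high product of two int16 values, so an input that does not fit in 16 bits needs a wider multiply instead. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of SSSE3 PMULHRSW: rounded (a * b) / 2^15 on int16 inputs. */
static int16_t pmulhrsw_lane(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * b + (1 << 14)) >> 15);
}

/* The same rounded multiply by a Q14 constant in 32/64-bit precision,
 * usable when the coefficient itself does not fit in int16. */
static int32_t mul_round_q14(int32_t a, int32_t c)
{
    return (int32_t)(((int64_t)a * c + (1 << 13)) >> 14);
}
```

For in-range inputs the two agree (`pmulhrsw_lane(1000, 23170)` and `mul_round_q14(1000, 11585)` both give 707), but a coefficient like 40000 can only go through the wider path.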
Henrik Gramner 69e456d7fb checkasm/vp9dsp: Fix iszero() to read the correct data
2015-09-28 18:50:13 +02:00
Ronald S. Bultje 0b227c6d47 checkasm: add vp9dsp.itxfm_add tests.
2015-09-28 10:51:53 -04:00
James Almer 4e03f0ab08 checkasm/vp9dsp: add const to suppress "discards const qualifier" warnings
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-09-26 16:35:39 -03:00
Ronald S. Bultje 7a4b97e946 checkasm: clip vp9 loopfilter test pixels inside allowed bitdepth range.
2015-09-26 06:42:33 -04:00
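Clipping keeps randomized test pixels legal for the bit depth under test, so that offsets added around an edge never produce values a real decoder could not see. A minimal sketch; both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a sample into the legal range [0, (1 << bit_depth) - 1]. */
static uint16_t clip_to_bitdepth(int v, int bit_depth)
{
    assert(bit_depth >= 8 && bit_depth <= 12);
    int max = (1 << bit_depth) - 1;
    return (uint16_t)(v < 0 ? 0 : v > max ? max : v);
}

/* Fill n samples as base + offset[i], each clipped to the bit depth. */
static void fill_clipped(uint16_t *dst, const int *offset, int n,
                         int base, int bit_depth)
{
    for (int i = 0; i < n; i++)
        dst[i] = clip_to_bitdepth(base + offset[i], bit_depth);
}
```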
Rodger Combs f559812a84 tests/checkasm: make randomize_buffers a function for easier debugging
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-26 02:47:53 +02:00
Michael Niedermayer 5ba40c3c71 tests/checkasm/vp9dsp: Revert first hunk of bddcf758d3
The change was wrong; also add a comment explaining it.

Found-by: BBB
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-24 18:34:43 +02:00
Ronald S. Bultje 350e9c6765 vp9: fix loopfilter test code to address Hendrik's comments.
(I forgot to actually merge them into the patch I just pushed.)
2015-09-21 20:44:14 -04:00
Rodger Combs df2a2643fe tests/checkasm: fix stack smash in check_loopfilter
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-20 20:26:09 +02:00
Michael Niedermayer bddcf758d3 tests/checkasm/vp9dsp: Add () to protect macro arguments
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-20 11:37:57 +02:00
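The general C pitfall this commit title refers to, shown with a generic macro rather than the actual vp9dsp one: without parentheses, an expression argument is pasted textually and re-associates with the surrounding operators.

```c
#include <assert.h>

/* Unprotected: SQUARE_BAD(1 + 2) expands to 1 + 2 * 1 + 2. */
#define SQUARE_BAD(x) x * x

/* Protected: parentheses around each use of the argument and the whole body. */
#define SQUARE(x) ((x) * (x))
```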
Ronald S. Bultje b074367405 checkasm: add VP9 loopfilter tests.
The randomize_buffer() implementation ensures that, "most of the time",
we get a good mix of wide16/wide8/hev/regular/no filters for complete
code coverage. However, this is not mathematically guaranteed, because that
would make the code either much more complex or much less random.
2015-09-20 10:33:04 +02:00
Michael Niedermayer a860adb49c tests/checkasm/vp9dsp: Use snprintf() for safety
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-16 14:19:37 +02:00
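A sketch of why `snprintf()` is the safer choice for building names into a fixed-size buffer: it never writes more than `size` bytes (including the terminating NUL) and returns the length the full string would have had, so truncation can be detected. The helper and the name format are hypothetical, not the ones checkasm uses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a name like "vp9_put_16x16_8bpp"; returns 1 if it fit, 0 if truncated. */
static int build_name(char *buf, size_t size, const char *op, int w, int bpp)
{
    int len = snprintf(buf, size, "vp9_%s_%dx%d_%dbpp", op, w, w, bpp);
    return len >= 0 && (size_t)len < size;
}
```

With a too-small buffer the result is cut short but still NUL-terminated, where `sprintf()` would have overrun the buffer.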
Ronald S. Bultje bbd44e124a checkasm: add vp9 intra pred tests.
2015-09-15 16:43:29 -04:00
Ronald S. Bultje 084451e1e4 checkasm: add vp9 MC tests.
2015-09-15 16:43:28 -04:00