Commit Graph

45225 Commits

Author SHA1 Message Date
Janne Grunau 186bd30aa3 h264/arm64: implement missing 4:2:2 chroma loop filter neon functions 2019-02-27 21:57:05 +01:00
Martin Storsjö 7e42d5f0ab aarch64: vp8: Optimize vp8_idct_add_neon for aarch64
The previous version was a pretty exact translation of the arm
version. This version does do some unnecessary arithmetic (it does
more operations on vectors that are only half filled; it does 4
uaddw and 4 sqxtun instead of 2 of each), but it reduces the overhead
of packing data together (which could be done for free in the arm
version).

This gives a decent speedup on Cortex A53, a minor speedup on
A72 and a very minor slowdown on Cortex A73.

Before:        Cortex A53    A72    A73
vp8_idct_add_neon:   79.7   67.5   65.0
After:
vp8_idct_add_neon:   67.7   64.8   66.7

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:46:28 +02:00
Martin Storsjö 49f9c4272c aarch64: vp8: Skip saturating in shrn in ff_vp8_idct_add_neon
The original arm version didn't do saturation here. This probably
doesn't make any difference for performance, but reduces the
differences.

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:46:24 +02:00
Martin Storsjö 37394ef01b aarch64: vp8: Optimize put_epel16_h6v6 with vp8_epel8_v6_y2
This makes it similar to put_epel16_v6, and gives a large speedup
on Cortex A53, a minor speedup on A72 and a very minor slowdown on
A73.

Before:                 Cortex A53     A72     A73
vp8_put_epel16_h6v6_neon:   2211.4  1586.5  1431.7
After:
vp8_put_epel16_h6v6_neon:   1736.9  1522.0  1448.1

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:46:21 +02:00
Martin Storsjö cef914e083 arm: vp8: Optimize put_epel16_h6v6 with vp8_epel8_v6_y2
This makes it similar to put_epel16_v6, and gives a 10-25%
speedup of this function.

Before:                   Cortex A7       A8       A9      A53     A72
vp8_put_epel16_h6v6_neon:    3058.0   2218.5   2459.8   2183.0  1572.2
After:
vp8_put_epel16_h6v6_neon:    2670.8   1934.2   2244.4   1729.4  1503.9

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:46:18 +02:00
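
The before/after cycle counts in the commit above can be turned into percentages in more than one way, so it helps to state which ratio is meant. The following is a minimal sketch, not part of the commit; the helper names `cycles_saved` and `throughput_gain` are hypothetical and chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical helper: fraction of cycles saved,
 * i.e. (before - after) / before. */
static double cycles_saved(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before;
}

/* Hypothetical helper: throughput gain, i.e. how much more work
 * fits in the same number of cycles: before / after - 1. */
static double throughput_gain(double before, double after)
{
    assert(after > 0.0);
    return before / after - 1.0;
}
```

For the Cortex A53 column of vp8_put_epel16_h6v6_neon (2183.0 before, 1729.4 after), `cycles_saved` gives roughly 0.21 and `throughput_gain` roughly 0.26. The two figures differ because they divide by different baselines.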
Martin Storsjö e39a9212ab aarch64: vp8: Port bilin functions from arm version
                      Cortex A53     A72     A73
vp8_put_bilin4_h_c:        303.8   102.2   161.8
vp8_put_bilin4_h_neon:     100.0    40.9    41.2
vp8_put_bilin4_hv_c:       322.8   201.0   305.9
vp8_put_bilin4_hv_neon:    156.8    72.6    77.0
vp8_put_bilin4_v_c:        304.7   101.7   166.5
vp8_put_bilin4_v_neon:      82.7    41.2    33.0
vp8_put_bilin8_h_c:       1192.7   352.5   623.8
vp8_put_bilin8_h_neon:     213.5    70.2    87.8
vp8_put_bilin8_hv_c:      1098.6   769.2  1041.9
vp8_put_bilin8_hv_neon:    324.0   123.5   146.0
vp8_put_bilin8_v_c:       1193.9   350.4   617.7
vp8_put_bilin8_v_neon:     183.9    60.7    64.7
vp8_put_bilin16_h_c:      2353.1   671.2  1223.3
vp8_put_bilin16_h_neon:    261.9   140.7   145.0
vp8_put_bilin16_hv_c:     2453.2  1470.9  2355.2
vp8_put_bilin16_hv_neon:   383.9   196.0   217.0
vp8_put_bilin16_v_c:      2349.3   669.8  1251.2
vp8_put_bilin16_v_neon:    202.9   110.7    96.2

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:46:14 +02:00
Martin Storsjö 58d1549227 aarch64: vp8: Port epel4 functions from arm version
                      Cortex A53    A72    A73
vp8_put_epel4_h4_c:        631.4  291.7  367.8
vp8_put_epel4_h4_neon:     241.0  131.0  155.7
vp8_put_epel4_h4v4_c:      967.5  529.3  667.7
vp8_put_epel4_h4v4_neon:   429.3  241.8  279.7
vp8_put_epel4_h4v6_c:     1374.7  657.5  864.5
vp8_put_epel4_h4v6_neon:   515.5  295.5  334.7
vp8_put_epel4_h6_c:        851.0  421.0  486.0
vp8_put_epel4_h6_neon:     321.5  195.0  217.7
vp8_put_epel4_h6v4_c:     1111.3  621.1  781.2
vp8_put_epel4_h6v4_neon:   539.2  328.0  365.3
vp8_put_epel4_h6v6_c:     1561.3  763.3  999.7
vp8_put_epel4_h6v6_neon:   645.5  401.0  434.7
vp8_put_epel4_v4_c:        663.8  298.3  357.0
vp8_put_epel4_v4_neon:     116.0   81.5   72.5
vp8_put_epel4_v6_c:        870.5  437.0  507.4
vp8_put_epel4_v6_neon:     147.7  108.8   92.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:46:11 +02:00
Martin Storsjö cc7ba00c35 aarch64: vp8: Port missing epel8 functions from arm version
                      Cortex A53     A72     A73
vp8_put_epel8_h4_c:       2594.8  1159.6  1374.8
vp8_put_epel8_h4_neon:     506.4   244.2   314.0
vp8_put_epel8_h6_c:       3445.8  1677.1  1811.3
vp8_put_epel8_h6_neon:     634.4   371.7   433.0
vp8_put_epel8_v4_c:       2614.0  1174.8  1378.0
vp8_put_epel8_v4_neon:     321.0   221.7   235.8
vp8_put_epel8_v6_c:       3635.5  1703.0  2079.2
vp8_put_epel8_v6_neon:     416.9   317.0   295.5

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:46:08 +02:00
Martin Storsjö 52c9b0a6c0 aarch64: vp8: Port vp8_luma_dc_wht and vp8_idct_dc_add4uv from arm version
                     Cortex A53    A72    A73
vp8_luma_dc_wht_c:        115.7   75.7   90.7
vp8_luma_dc_wht_neon:      60.7   41.2   45.7
vp8_idct_dc_add4uv_c:     376.1  262.9  282.5
vp8_idct_dc_add4uv_neon:   52.0   29.0   37.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:46:04 +02:00
Martin Storsjö c513fcd7d2 aarch64: vp8: Fix a typo in a comment
Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:46:00 +02:00
Martin Storsjö f1011ea28a aarch64: vp8: Reorder the function pointer inits to match the arm original
Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:45:56 +02:00
Martin Storsjö b4b27dce95 aarch64: vp8: Move the vp8dsp makefile entries to the right places
Even if NEON would be disabled, the init functions should be built
as they are called as long as ARCH_AARCH64 is set.

These functions are part of a generic DSP subsystem, not tied directly
to one decoder. (They should be built if the vp7 decoder is enabled,
even if the vp8 decoder is disabled.)

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:45:53 +02:00
Martin Storsjö ad32f7b126 aarch64: vp8: Remove superfluous includes
This fixes building with MSVC, which lacks unistd.h.

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:45:50 +02:00
Martin Storsjö 85bfaa4949 aarch64: vp8: Use the proper aarch64 form for conditional branches
The previous form also does seem to assemble on current tools,
but I think it might fail on some older aarch64 tools.

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:45:47 +02:00
Martin Storsjö 2eeac79936 aarch64: vp8: Fix assembling with armasm64
Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:45:44 +02:00
Martin Storsjö 26d7af4c38 aarch64: vp8: Fix assembling with clang
This also partially fixes assembling with MS armasm64 (via
gas-preprocessor).

The movrel macro invocations need to pass the offset via a separate
parameter. Mach-o and COFF relocations don't allow a negative
offset to a symbol, which is handled properly if the offset is passed
via the parameter. If no offset parameter is given, the macro
evaluates to something like "adrp x17, subpel_filters-16+(0)", which
older clang versions also fail to parse (the older clang versions
only support one single offset term, although it can be a parenthesis).

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:45:41 +02:00
Magnus Röös 0801853e64 libavcodec: vp8 neon optimizations for aarch64
Partial port of the ARM Neon for aarch64.

Benchmarks from fate:

benchmarking with Linux Perf Monitoring API
nop: 58.6
checkasm: using random seed 1760970128
NEON:
 - vp8dsp.idct       [OK]
 - vp8dsp.mc         [OK]
 - vp8dsp.loopfilter [OK]
checkasm: all 21 tests passed
vp8_idct_add_c: 201.6
vp8_idct_add_neon: 83.1
vp8_idct_dc_add_c: 107.6
vp8_idct_dc_add_neon: 33.8
vp8_idct_dc_add4y_c: 426.4
vp8_idct_dc_add4y_neon: 59.4
vp8_loop_filter8uv_h_c: 688.1
vp8_loop_filter8uv_h_neon: 216.3
vp8_loop_filter8uv_inner_h_c: 649.3
vp8_loop_filter8uv_inner_h_neon: 195.3
vp8_loop_filter8uv_inner_v_c: 544.8
vp8_loop_filter8uv_inner_v_neon: 131.3
vp8_loop_filter8uv_v_c: 706.1
vp8_loop_filter8uv_v_neon: 141.1
vp8_loop_filter16y_h_c: 668.8
vp8_loop_filter16y_h_neon: 242.8
vp8_loop_filter16y_inner_h_c: 647.3
vp8_loop_filter16y_inner_h_neon: 224.6
vp8_loop_filter16y_inner_v_c: 647.8
vp8_loop_filter16y_inner_v_neon: 128.8
vp8_loop_filter16y_v_c: 721.8
vp8_loop_filter16y_v_neon: 154.3
vp8_loop_filter_simple_h_c: 387.8
vp8_loop_filter_simple_h_neon: 187.6
vp8_loop_filter_simple_v_c: 384.1
vp8_loop_filter_simple_v_neon: 78.6
vp8_put_epel8_h4v4_c: 3971.1
vp8_put_epel8_h4v4_neon: 855.1
vp8_put_epel8_h4v6_c: 5060.1
vp8_put_epel8_h4v6_neon: 989.6
vp8_put_epel8_h6v4_c: 4320.8
vp8_put_epel8_h6v4_neon: 1007.3
vp8_put_epel8_h6v6_c: 5449.3
vp8_put_epel8_h6v6_neon: 1158.1
vp8_put_epel16_h6_c: 6683.8
vp8_put_epel16_h6_neon: 831.8
vp8_put_epel16_h6v6_c: 11110.8
vp8_put_epel16_h6v6_neon: 2214.8
vp8_put_epel16_v6_c: 7024.8
vp8_put_epel16_v6_neon: 799.6
vp8_put_pixels8_c: 112.8
vp8_put_pixels8_neon: 78.1
vp8_put_pixels16_c: 131.3
vp8_put_pixels16_neon: 129.8

This contains a fix to include guards by Carl Eugen Hoyos.

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:45:33 +02:00
Luca Barbato 899ee03088 Unbreak travis on macos 2019-02-19 10:05:10 +01:00
Diego Biurrun f8df5e2f31 tests: Add a convenience function for video-only lavf tests
Rename a test in the process for consistency and simplicity and
remove the remnants of the now-unused lavf regression test scripts.
2019-02-16 18:15:55 +01:00
Diego Biurrun 618d02c1fa tests: Convert lavf container tests to non-legacy test scripts
Rename some tests in the process for consistency and simplicity.
2019-02-16 18:15:46 +01:00
Diego Biurrun 896fe15dbb tests: Convert lavf pixfmt conversion tests to non-legacy test scripts
Also split monolithic lavf-pixfmt test into individual tests.
2019-02-16 18:15:38 +01:00
Diego Biurrun a957e9379d tests: Convert lavf image tests to non-legacy test scripts
Rename some tests in the process for consistency and simplicity.
2019-02-16 18:15:30 +01:00
Diego Biurrun eb8a811599 tests: Convert audio-only lavf tests to non-legacy test scripts
Rename some tests in the process for consistency and simplicity.
2019-02-16 18:15:22 +01:00
Diego Biurrun a70eac7a9b tests: Convert image2pipe tests to non-legacy test scripts 2019-02-16 18:15:11 +01:00
Diego Biurrun 5846b496f0 tests: Use a predefined function for lavf-rm test 2019-02-16 13:09:35 +01:00
Diego Biurrun dad5fd59f3 tests: Enable CRC test for yuv4mpeg 2019-02-16 13:09:35 +01:00
Diego Biurrun 8629149816 tests: Drop duplicate variable declaration 2019-02-16 13:09:35 +01:00
Diego Biurrun e22ffb3805 tests: Unify output directory creation 2019-02-16 13:09:35 +01:00
Diego Biurrun 7e5bde93a1 build: Rename OBJDIRS variable to OUTDIRS
These directories are not just for object files.
2019-02-16 13:09:35 +01:00
Sven Dueking 90b15f60bf srt: Set srto_sender flag to sender srt socket
SRT API Documentation:
This flag is superfluous if both parties are at least version 1.3.0
(this shall be enforced by setting this value to SRTO_MINVERSION if
you expect that it be true) and therefore support HSv5 handshake,
where the SRT extended handshake is done with the overall handshake
process.

This flag is however obligatory if at least one party may be using
SRT below version 1.3.0 and does not support HSv5.
2019-02-12 11:59:29 +01:00
Janne Grunau 156ea66c91 h264/x86: sign extend int stride in deblock functions
Fixes checkasm errors after adding the h264 deblock tests.
2019-01-27 11:16:31 +01:00
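
Why an int stride needs sign extension can be illustrated in C. The sketch below is not taken from the commit; it assumes a 64-bit target where a 32-bit argument arrives in a 64-bit register whose upper half the assembly must not trust, and it models the unextended case as zero-filled upper bits. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Models assembly that uses the full 64-bit register as-is, with the
 * upper 32 bits assumed to be zero (they are not guaranteed to be). */
static int64_t stride_zero_extended(int stride)
{
    return (int64_t)(uint32_t)stride;
}

/* Models an explicit sign extension, as done by movsxd on x86-64
 * or sxtw on aarch64. */
static int64_t stride_sign_extended(int stride)
{
    return (int64_t)stride;
}

/* Byte offset of the row `rows` lines away from the current one. */
static int64_t row_offset(int64_t stride, int rows)
{
    assert(rows >= 0);
    return stride * rows;
}
```

With a negative stride such as -16 (a bottom-up picture), the sign-extended value keeps the offset of the next row at -16 bytes, while the zero-extended value turns it into a positive offset of about 4 GiB, which is the kind of out-of-bounds access a checkasm test with negative strides would be expected to expose.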
Martin Storsjö eec93e5709 libopenh264dec: Use a newer decoding entry point function
The "new" entry point actually has existed since OpenH264 1.4 in
2015 and is the recommended decoding entry point.

The name of this function, DecodeFrameNoDelay, is rather backwards
considering that it doesn't return the latest decoded frame immediately,
but actually does proper delaying and reordering of frames.

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-01-26 21:13:03 +02:00
Janne Grunau 28a8b5413b h264/aarch64: add intra loop filter neon asm
Add my neon asm from x264 relicensed under the LGPL 2.1 or later. Ported
(x264 uses nv12 chroma) and optimized.

Cycle count for checkasm --bench on a Snapdragon 820e:
h264_h_loop_filter_luma_intra_8bpp_c: 60.0
h264_h_loop_filter_luma_intra_8bpp_neon: 54.2
h264_v_loop_filter_luma_intra_8bpp_c: 148.3
h264_v_loop_filter_luma_intra_8bpp_neon: 73.8
h264_h_loop_filter_chroma_intra_8bpp_c: 27.8
h264_h_loop_filter_chroma_intra_8bpp_neon: 21.4
h264_h_loop_filter_chroma_mbaff_intra_8bpp_c: 15.8
h264_h_loop_filter_chroma_mbaff_intra_8bpp_neon: 15.7
h264_v_loop_filter_chroma_intra_8bpp_c: 45.8
h264_v_loop_filter_chroma_intra_8bpp_neon: 17.3
2019-01-26 12:05:10 +01:00
Janne Grunau 846c3d6aca h264/aarch64: optimize neon loop filter
Exit as soon as possible if no filtering will be done.

Improves the checkasm --bench cycle count on a Snapdragon 820e:
h264_h_loop_filter_luma_8bpp_c:      72.4 ->  72.5
h264_h_loop_filter_luma_8bpp_neon:   97.1 ->  56.3
h264_v_loop_filter_luma_8bpp_c:     174.0 -> 173.5
h264_v_loop_filter_luma_8bpp_neon:   62.9 ->  60.9
h264_h_loop_filter_chroma_8bpp_c:    30.2 ->  30.3
h264_h_loop_filter_chroma_8bpp_neon: 51.6 ->  25.7
h264_v_loop_filter_chroma_8bpp_c:    57.3 ->  57.3
h264_v_loop_filter_chroma_8bpp_neon: 28.0 ->  24.0
2019-01-26 12:05:10 +01:00
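
The idea of exiting early when no filtering will be done can be sketched in C. This is an illustration under stated assumptions, not the logic of the neon code in the commit: it assumes a normal (non-intra) edge made of four segments, that a negative tc0 entry marks a segment to be skipped, and that alpha or beta equal to zero means no pixel can pass the filter thresholds. The helper name `loop_filter_can_skip` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pre-check: returns 1 when the edge cannot modify any
 * pixel, so the caller may return before loading pixel data. */
static int loop_filter_can_skip(int alpha, int beta, const int8_t tc0[4])
{
    assert(alpha >= 0 && beta >= 0);

    /* The thresholds compare absolute differences with "<", so a
     * zero threshold can never be satisfied. */
    if (alpha == 0 || beta == 0)
        return 1;

    /* Any segment with tc0 >= 0 may still need filtering. */
    for (int i = 0; i < 4; i++)
        if (tc0[i] >= 0)
            return 0;

    return 1;
}
```

Doing this check once up front costs a few scalar instructions, and in the skipped case it avoids the loads and transposes that dominate the horizontal filters.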
Janne Grunau d7f4f5c4a1 checkasm/h264: add loop filter tests 2019-01-26 12:05:10 +01:00
Janne Grunau bb515e3a73 h264/aarch64: sign extend int stride in loop filter asm 2019-01-26 12:05:10 +01:00
Martin Storsjö 41cf3e3b1c arm: Create proper .rdata sections for COFF
As .rodata isn't one of the default created sections for COFF, it was
created as a read-write data section. By using the default .rdata
section name for COFF, it automatically becomes a read-only data section.
The existing ".section .rodata" works as intended for ELF though.

This is based on an original patch and diagnosis by Tom Tan
<Tom.Tan@microsoft.com>.

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-01-25 23:53:37 +02:00
James Almer ca44fa5d7f avcodec/libdav1d: properly free all output picture references
Dav1dPictures contain more than one buffer reference, so we're forced to use the
API properly to free them all.

Signed-off-by: James Almer <jamrial@gmail.com>
2019-01-23 17:39:20 -03:00
Luca Barbato 90adbf4abf cook: Use the correct table for 6-bit stereo coupling
Thanks to Kostya for digging it out and telling me.
2019-01-17 14:58:03 +01:00
James Almer 70ab2778be libdav1d: update API usage to the first stable release
The color fields were moved to another struct, and a way to propagate
timestamps and other input metadata was introduced, so the packet
fifo can be removed.

Add support for 12bit streams, an option to disable film grain, and
read the profile from the sequence header referenced by the output
picture instead of guessing based on output pix_fmt.

Signed-off-by: James Almer <jamrial@gmail.com>
2018-12-12 19:56:16 -03:00
James Almer 56f50183f3 libdav1d: fix build after a recent API break
Signed-off-by: James Almer <jamrial@gmail.com>
2018-11-14 22:04:35 -03:00
Linjie Fu e716323fa8 qsvenc: Add VDENC support for H264 and HEVC
Add VDENC (low-power mode) support for QSV H264 and HEVC.

It's an experimental function (like lowpower in vaapi) with
some limitations:
- CBR/VBR require HuC, which should be explicitly loaded via the i915
module parameter (i915.enable_guc=2 for Linux kernel version >= 4.16)
- HEVC VDENC is supported on ICE LAKE and later

Use option "-low_power 1" to enable VDENC.

Signed-off-by: Linjie Fu <linjie.fu@intel.com>
2018-11-13 16:36:04 +00:00
James Almer 9bf9358b61 avcodec: libdav1d AV1 decoder wrapper.
Originally written by Ronald S. Bultje, with fixes, optimizations and
improvements by James Almer.

Signed-off-by: James Almer <jamrial@gmail.com>
2018-11-06 12:40:27 -03:00
Carl Eugen Hoyos f149a4a5fc swscale: Add GRAY10
Based on ab839054 by Luca Barbato.

Signed-off-by: James Almer <jamrial@gmail.com>
2018-11-06 12:39:15 -03:00
Carl Eugen Hoyos ee3f62a90c pixfmt: Add GRAY10
Based on 7471352f by Luca Barbato.

Signed-off-by: James Almer <jamrial@gmail.com>
2018-11-06 12:39:15 -03:00
Martin Storsjö 80f85a95da libx264: Pass the reordered_opaque field through the encoder
libx264 does have a field for opaque data to pass along with frames
through the encoder, but it is a pointer, while the libavcodec
reordered_opaque field is an int64_t. Therefore, allocate an array
within the libx264 wrapper, where reordered_opaque values in flight
are stored, and pass a pointer to this array to libx264.

Update the public libavcodec documentation for the AVCodecContext
field to explain this usage, and add a codec capability that allows
detecting whether an encoder handles this field.

Signed-off-by: Martin Storsjö <martin@martin.st>
2018-11-05 15:41:14 +02:00
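
The approach described in the commit above, storing int64_t values in an array and handing the encoder a pointer into it, can be sketched generically in C. This is a minimal sketch, not the libx264 wrapper code: the type `OpaqueRing` and all function names are hypothetical, and it assumes the caller sizes the array larger than the maximum number of frames in flight so that a slot is never overwritten while still referenced.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical ring of slots holding opaque values in flight. */
typedef struct OpaqueRing {
    int64_t *slots;
    int      nb_slots;
    int      next;
} OpaqueRing;

/* Allocates the slots; returns 0 on success, -1 on allocation failure. */
static int opaque_ring_init(OpaqueRing *r, int nb_slots)
{
    assert(nb_slots > 0);
    r->slots = calloc((size_t)nb_slots, sizeof(*r->slots));
    if (!r->slots)
        return -1;
    r->nb_slots = nb_slots;
    r->next     = 0;
    return 0;
}

/* Stores a value and returns the pointer to pass along with the
 * input frame as its pointer-sized opaque field. */
static void *opaque_ring_store(OpaqueRing *r, int64_t value)
{
    int64_t *slot = &r->slots[r->next];
    *slot   = value;
    r->next = (r->next + 1) % r->nb_slots;
    return slot;
}

/* Reads the value back from the pointer returned with an output
 * picture; a NULL pointer yields 0. */
static int64_t opaque_ring_load(const void *opaque)
{
    return opaque ? *(const int64_t *)opaque : 0;
}

static void opaque_ring_free(OpaqueRing *r)
{
    free(r->slots);
    r->slots    = NULL;
    r->nb_slots = 0;
    r->next     = 0;
}
```

Because each frame carries a pointer to its own slot, values come back correctly even when the encoder reorders frames, as long as the slot count assumption holds.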
Martin Storsjö a3a501df24 libavutil: Undeprecate the AVFrame reordered_opaque field
This was marked as deprecated (but only in the doxygen, not with an
actual deprecation attribute) in 81c623fae0 in 2011, but was
undeprecated in ad1ee5fa7.

Signed-off-by: Martin Storsjö <martin@martin.st>
2018-11-05 15:41:08 +02:00
James Almer 8d80046a0f libaom: remove references to yuva444p pixfmt
Support for it was apparently never in the codebase, and the enum
value was recently removed from the public headers [1]

[1] https://aomedia.googlesource.com/aom/+/f1570f0c2f70832dd170285f8de60bd2379c8efa

Signed-off-by: James Almer <jamrial@gmail.com>
2018-10-27 00:02:17 -03:00
James Almer cacb62f9cb Revert "decode: copy the output parameters from the last bsf in the chain back to the AVCodecContext"
This reverts commit 662558f985.

The avcodec_parameters_to_context() call was freeing and reallocating
AVCodecContext->extradata, essentially taking ownership of it, which according
to the doxy is user owned. This is an API break and has produced crashes in
some library users like Firefox.
Revert until a better solution is found to internally propagate the filtered
extradata back into the decoder context.

Signed-off-by: James Almer <jamrial@gmail.com>
2018-10-27 00:02:13 -03:00
Zhong Li 1ff6cb2ca6 lavc/qsvenc_jpeg: set a default quality
Keep alignment with vaapi mjpeg encoder.

Signed-off-by: Zhong Li <zhong.li@intel.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2018-10-13 15:57:06 +02:00