Commit Graph

116786 Commits

Author SHA1 Message Date
J. Dekker
e758b24396 checkasm: add wildcompares for test & functions
Added:

  --test=<pattern>    Filter tests by glob style pattern.
  --bench[=<pattern>] Run benchmark and optionally filter functions
                      by glob style pattern.

Example:

$ ./tests/checkasm/checkasm --bench=yuva*
[...]
yuva420p_bgr24_8_c:                                     34.5 ( 1.00x)
yuva420p_bgr24_8_ssse3:                                 31.1 ( 1.11x)
yuva420p_bgr24_128_c:                                  310.6 ( 1.00x)
yuva420p_bgr24_128_ssse3:                              178.1 ( 1.74x)
yuva420p_bgr24_1080_c:                                2509.6 ( 1.00x)
yuva420p_bgr24_1080_ssse3:                            1471.5 ( 1.71x)
yuva420p_bgr24_1920_c:                                4462.6 ( 1.00x)
yuva420p_bgr24_1920_ssse3:                            2331.1 ( 1.91x)
[...]

Ported from dav1d.

Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-08-28 11:45:46 +02:00
J. Dekker
d0986709a8 checkasm: improve print format
Port dav1d's checkasm output format to FFmpeg's checkasm, includes
relative speedups and aligns results.

Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-08-28 11:45:46 +02:00
J. Dekker
03f26549cd checkasm: print only results to stdout
Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-08-28 11:45:46 +02:00
J. Dekker
42528ff835 checkasm: add csv/tsv bench output
When collecting performance information from checkasm it is common
to parse the output for use in graphs to compare vs different
architectures.

Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-08-28 11:45:46 +02:00
Anton Khirnov
d89930f866 lavu/opt: add API for retrieving array-type option values
Previously one could only convert the entire array to a string, not
access individual elements.
2024-08-27 16:53:16 +02:00
Anton Khirnov
4a5bb84515 lavu/opt: forward av_opt_get_video_rate() to av_opt_get_q()
The two functions are exactly the same.
2024-08-27 16:53:16 +02:00
Anton Khirnov
efe38286d1 lavu/opt: document underlying C types for enum AVOptionType 2024-08-27 16:53:16 +02:00
Ramiro Polla
7e4784e40c avcodec/mpegvideoencdsp: speed up draw_edges_8_c by inlining it for all used edge widths
This commit also restricts w to 4, 8, or 16.

Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz:
                                    before    after
draw_edges_8_1724_4_c:             46796.5   7141.7  ( 6.55x)
draw_edges_8_1724_8_c:             43584.5   7216.5  ( 6.04x)
draw_edges_8_1724_16_c:            47007.2  10080.5  ( 4.66x)
draw_edges_128_407_4_c:            11199.0   4185.0  ( 2.68x)
draw_edges_128_407_8_c:            10660.2   4418.0  ( 2.41x)
draw_edges_128_407_16_c:           11800.2   4634.5  ( 2.55x)
draw_edges_1080_31_4_c:             1356.5    634.7  ( 2.14x)
draw_edges_1080_31_8_c:             1972.0   1430.2  ( 1.38x)
draw_edges_1080_31_16_c:            4621.0   4009.7  ( 1.15x)
draw_edges_1920_4_4_c:               834.5    795.2  ( 1.05x)
draw_edges_1920_4_4_negstride_c:     821.7    802.0  ( 1.02x)
draw_edges_1920_4_8_c:              2782.2   2650.7  ( 1.05x)
draw_edges_1920_4_8_negstride_c:    2724.7   2670.0  ( 1.02x)
draw_edges_1920_4_16_c:             6437.5   6327.7  ( 1.02x)
draw_edges_1920_4_16_negstride_c:   6395.2   6349.5  ( 1.01x)

A55:
                                    before    after
draw_edges_8_1724_4_c:             52540.4  19739.2  ( 2.66x)
draw_edges_8_1724_8_c:             45386.9  19847.4  ( 2.29x)
draw_edges_8_1724_16_c:            51995.4  23284.7  ( 2.23x)
draw_edges_128_407_4_c:            13401.1   6988.2  ( 1.92x)
draw_edges_128_407_8_c:            12218.4   7527.9  ( 1.62x)
draw_edges_128_407_16_c:           13695.9   8207.2  ( 1.67x)
draw_edges_1080_31_4_c:             3702.9   3110.4  ( 1.19x)
draw_edges_1080_31_8_c:             6015.6   5643.2  ( 1.07x)
draw_edges_1080_31_16_c:           12281.9  11901.4  ( 1.03x)
draw_edges_1920_4_4_c:              3957.9   3970.2  ( 1.00x)
draw_edges_1920_4_4_negstride_c:    3964.1   3825.2  ( 1.04x)
draw_edges_1920_4_8_c:              7757.9   7676.4  ( 1.01x)
draw_edges_1920_4_8_negstride_c:    7923.6   7812.4  ( 1.01x)
draw_edges_1920_4_16_c:            14791.6  15143.9  ( 0.98x)
draw_edges_1920_4_16_negstride_c:  14788.6  15163.4  ( 0.98x)

A76:
                                    before   after
draw_edges_8_1724_4_c:             39786.0  4968.5  ( 8.01x)
draw_edges_8_1724_8_c:             32971.5  5069.5  ( 6.50x)
draw_edges_8_1724_16_c:            40056.0  6017.2  ( 6.66x)
draw_edges_128_407_4_c:             9517.2  1210.5  ( 7.86x)
draw_edges_128_407_8_c:             8035.7  1346.2  ( 5.97x)
draw_edges_128_407_16_c:            9946.5  1648.2  ( 6.03x)
draw_edges_1080_31_4_c:             1308.0   660.7  ( 1.98x)
draw_edges_1080_31_8_c:             1785.5  1270.7  ( 1.41x)
draw_edges_1080_31_16_c:            3266.7  2591.5  ( 1.26x)
draw_edges_1920_4_4_c:              1151.0  1090.7  ( 1.06x)
draw_edges_1920_4_4_negstride_c:    1153.7  1096.5  ( 1.05x)
draw_edges_1920_4_8_c:              2220.7  2186.5  ( 1.02x)
draw_edges_1920_4_8_negstride_c:    2218.5  2193.5  ( 1.01x)
draw_edges_1920_4_16_c:             4324.2  4230.0  ( 1.02x)
draw_edges_1920_4_16_negstride_c:   4310.7  4233.0  ( 1.02x)
2024-08-26 12:50:26 +02:00
Ramiro Polla
3bfce2a104 avcodec/x86/mpegvideoencdsp: speed up draw_edges_mmx by using memcpy()
The mmx memory copy code is not nearly as efficient as memcpy(), which
would make draw_edges_mmx much slower than draw_edges_8_c.

Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz:
                                      before    after
draw_edges_8_1724_4_mmx:              8700.5   8751.8  ( 0.99x)
draw_edges_8_1724_8_mmx:             10441.7  10558.0  ( 0.99x)
draw_edges_8_1724_16_mmx:            10660.7  10799.5  ( 0.99x)
draw_edges_128_407_4_mmx:             4202.2   4099.3  ( 1.03x)
draw_edges_128_407_8_mmx:             4579.0   4511.3  ( 1.02x)
draw_edges_128_407_16_mmx:            5479.7   4729.5  ( 1.16x)
draw_edges_1080_31_4_mmx:             1546.7    658.0  ( 2.35x)
draw_edges_1080_31_8_mmx:             2745.5   1442.5  ( 1.90x)
draw_edges_1080_31_16_mmx:           12511.5   4901.0  ( 2.55x)
draw_edges_1920_4_4_mmx:              2659.0    705.0  ( 3.77x)
draw_edges_1920_4_4_negstride_mmx:    2643.0    729.0  ( 3.63x)
draw_edges_1920_4_8_mmx:              7845.0   2819.0  ( 2.78x)
draw_edges_1920_4_8_negstride_mmx:    7777.0   2747.3  ( 2.83x)
draw_edges_1920_4_16_mmx:            24583.7   6358.3  ( 3.87x)
draw_edges_1920_4_16_negstride_mmx:  24589.0   6367.0  ( 3.86x)
2024-08-26 12:50:21 +02:00
Ramiro Polla
9cdcbb639a avcodec/x86/mpegvideoencdsp: fix comment for draw_edges_mmx
Not only w == 8 and w == 16 are supported, but also w == 4.
2024-08-26 12:49:24 +02:00
Ramiro Polla
8c203ea7c7 avcodec/aarch64/mpegvideoencdsp: add dotprod implementation for pix_norm1
A55             A76
pix_norm1_c:        484.3           235.2
pix_norm1_neon:     193.8 ( 2.50x)   44.7 ( 5.26x)
pix_norm1_dotprod:   91.8 ( 5.28x)   21.2 (11.09x)
2024-08-26 12:49:04 +02:00
Ramiro Polla
9f68a3712e avcodec/aarch64/mpegvideoencdsp: add neon implementations for pix_sum and pix_norm1
A55             A76
pix_norm1_c:     478.2           234.2
pix_norm1_neon:  188.2 ( 2.54x)   41.2 ( 5.68x)
pix_sum_c:       304.2           244.0
pix_sum_neon:     77.2 ( 3.94x)   21.5 (11.35x)
2024-08-26 12:48:31 +02:00
Ramiro Polla
834964ce1a checkasm/mpegvideoencdsp: add pix_sum, pix_norm1, and draw_edges 2024-08-26 12:48:09 +02:00
Ramiro Polla
f9074427db avcodec/x86/mpegvideoencdsp: support negative strides in draw_edges_mmx() 2024-08-26 12:44:02 +02:00
Ramiro Polla
98610fe95f fate/checkasm: run the sw_yuv2yuv test 2024-08-26 12:16:40 +02:00
Zhao Zhili
12cdb30e37 avcodec/videotoolboxenc: Fix leaking of supported_props
There are two VTCompressionSessionRef been created, one for generating
extradata, and another for normal encoding. supported_props was been
overwritten without release.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-08-26 17:09:46 +08:00
Ramiro Polla
420d443600 swscale/aarch64: cosmetics fix (spaces inside curly braces) 2024-08-26 11:07:49 +02:00
Ramiro Polla
52887683e9 swscale/aarch64: add nv24/nv42 to yuv420p unscaled converter
A55               A76
nv24_yuv420p_128_c:       4956.1            1267.0
nv24_yuv420p_128_neon:    3109.1 ( 1.59x)    640.0 ( 1.98x)
nv24_yuv420p_1920_c:     35728.4           11736.2
nv24_yuv420p_1920_neon:   8011.1 ( 4.46x)   2436.0 ( 4.82x)
nv42_yuv420p_128_c:       4956.4            1270.5
nv42_yuv420p_128_neon:    3074.6 ( 1.61x)    639.5 ( 1.99x)
nv42_yuv420p_1920_c:     35685.9           11732.5
nv42_yuv420p_1920_neon:   7995.1 ( 4.46x)   2437.2 ( 4.81x)
2024-08-26 11:04:46 +02:00
Ramiro Polla
88a563ad18 swscale: export ff_copyPlane so it may be used by simd code 2024-08-26 11:04:46 +02:00
Ramiro Polla
a2e01cade8 checkasm/yuv2yuv: add tests for semiplanar unscaled converters 2024-08-26 11:04:46 +02:00
Ramiro Polla
4eb5594295 swscale: add nv24/nv42 to yuv420p unscaled converter 2024-08-26 11:04:46 +02:00
Zhao Zhili
aa14f9fe63 avcodec/mediacodecdec: Skip dequeue buffer in draining state
There is no more packet to queue in draining state.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-08-26 16:59:07 +08:00
Zhao Zhili
2e370805da avfilter/unsharp: Merge header into .c
It was shared with opencl implementation.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-08-26 16:58:25 +08:00
Stefan Oltmanns
d42cd5b75b avformat/vapoursynth: load library at runtime
Signed-off-by: Stefan Oltmanns <stefan-oltmanns@gmx.net>
2024-08-26 10:30:52 +02:00
Stefan Oltmanns
eac611f1a4 avformat/vapoursynth: Update to API version 4
Signed-off-by: Stefan Oltmanns <stefan-oltmanns@gmx.net>
2024-08-26 10:30:50 +02:00
Ramiro Polla
abb4e13a0a avutil/aarch64: add AV_COPY128 and AV_ZERO128 macros 2024-08-26 10:26:59 +02:00
Zhao Zhili
40dda881d6 avcodec/filter_units: Fix extradata and packets can have different bitstream format
Filter init can change extradata from avcc/hvcc to annexb format.
With different passthrough logic, packets can still in avcc/hvcc
format. Use same passthrough logic for init and filter.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-08-24 00:27:15 +08:00
Zhao Zhili
523189c744 fftools/ffplay: handle flip in display matrix
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-08-24 00:26:59 +08:00
Gnattu OC
30f090b4f8 avfilter: inherit input color range for videotoolbox filters
The color range should be set to match the input when creating
the VideoToolbox context. Otherwise, the new context will default
to limited range, creates inconsistencies with full range inputs.

Signed-off-by: Gnattu OC <gnattuoc@me.com>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-08-24 00:24:06 +08:00
Martin Storsjö
cfe0a36352 libswscale: aarch64: Fix the indentation of some macro invocations
Signed-off-by: Martin Storsjö <martin@martin.st>
2024-08-22 14:40:30 +03:00
James Almer
9d15fe77e3 avcodec/container_fifo: add missing stddef.h include
Fixes make checkheaders

Signed-off-by: James Almer <jamrial@gmail.com>
2024-08-21 15:12:46 -03:00
James Almer
a754ee0844 avcodec/h2645_parse: replace three bool arguments in ff_h2645_packet_split with a single flags one
Signed-off-by: James Almer <jamrial@gmail.com>
2024-08-19 20:23:20 -03:00
James Almer
8060644237 avcodec/shorten: Fix discard of ‘const’ qualifier
Signed-off-by: James Almer <jamrial@gmail.com>
2024-08-19 17:40:00 -03:00
Martin Storsjö
507c2a5774 libswscale: arm: Don't assume aligned output in yuv2rgb functions
This fixes failures in recently added checkasm tests.

While the buffers in most cases are aligned, libswscale in general
can't assume the output to be aligned.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-08-19 23:04:52 +03:00
Anton Khirnov
52471b56ba lavfi: make FFFilterContext private to generic code
Nothing in it needs to be visible to filters.
2024-08-19 21:48:11 +02:00
Anton Khirnov
f19c988911 lavfi/filters: move functions only used by generic code to avfilter_internal.h 2024-08-19 21:48:11 +02:00
Anton Khirnov
6d75d44d90 lavfi: drop internal.h
All that remains in it are things that belong in avfilter_internal.h.

Move them there and remove internal.h
2024-08-19 21:48:04 +02:00
Anton Khirnov
90e4af65e1 lavfi/f_streamselect: remove a no-op ff_filter_config_links() call
It does not do anything when the links are already configured.
2024-08-19 21:45:25 +02:00
Anton Khirnov
a2314308f2 lavfi/inernal: move ff_fmt_is_regular_yuv() declaration to video.h 2024-08-19 21:45:25 +02:00
Anton Khirnov
a83a30e899 lavfi: move ff_parse_{sample_rate,channel_layout}() to audio.[ch]
That is a more appropriate place for those functions.
2024-08-19 21:45:25 +02:00
Anton Khirnov
f4bfdf7893 lavfi: move ff_parse_pixel_format() to vf_format, its only caller
The only thing this function does beyond calling av_get_pix_fmt() is
falling back onto parsing the argument as a number. No other filters
should need to do this.
2024-08-19 21:45:25 +02:00
Anton Khirnov
1afe42852b lavfi/internal: move functions used by filters to filters.h
internal.h currently mixes interfaces intended to be used by filters
with those that should be limited to generic filter- or graph-level
code.
2024-08-19 21:45:25 +02:00
Rémi Denis-Courmont
d8fb44c0aa lavc/mpegvideoencdsp: R-V V add_8x8basis
T-Head C908:
add_8x8basis_c:      440.6
add_8x8basis_rvv_i32: 70.3

SpacemiT X60:
add_8x8basis_c:      436.3
add_8x8basis_rvv_i32: 40.5
2024-08-19 22:41:13 +03:00
Rémi Denis-Courmont
1907dd7f23 lavc/mpegvideoencdsp: R-V V try_8x8basis
T-Head C908:
try_8x8basis_c:       922.5
try_8x8basis_rvv_i32: 135.3

SpacemiT X60:
try_8x8basis_c:       926.1
try_8x8basis_rvv_i32: 103.1
2024-08-19 22:41:13 +03:00
Rémi Denis-Courmont
0fd37c00d7 lavc/mpegvideoencdsp: R-V V pix_norm1
T-Head C908:
pix_norm1_c:       480.2
pix_norm1_rvv_i64: 146.9

SpacemiT X60:
pix_norm1_c:       478.2
pix_norm1_rvv_i64:  92.7
2024-08-19 22:41:13 +03:00
Rémi Denis-Courmont
63d016aea5 lavc/mpegvideoencdsp: R-V V pix_sum
T-Head C908:
pix_sum_c:      332.2
pix_sum_rvv_i64: 91.2

SpacemiT X60:
pix_sum_c:      321.2
pix_sum_rvv_i64: 60.9
2024-08-19 22:41:13 +03:00
Anton Khirnov
631a725670 lavc/hevcdec: call ff_thread_finish_setup() even if hwaccel is in use
Serializing frame threading for non-threadsafe hwaccels is handled at the
generic level, the decoder does not need to care about it.
2024-08-19 21:37:22 +02:00
Anton Khirnov
4b9adb35b6 lavc/hevcdec: simplify output logic
Current code is written around the "simple" decode API's limitation that
a single input packet (AU/coded frame) triggers the output of at most
one output frame. However the spec contains two cases where a coded
frame may cause multiple frames to be output (cf. C.5.2.2.2):
* start of a new sequence
* overflowing sps_max_dec_pic_buffering

The decoder currently contains rather convoluted logic to handle these
cases:
* decode/output/per-frame sequence counters,
* HEVC_FRAME_FLAG_BUMPING
* ff_hevc_bump_frame()
* special clauses in ff_hevc_output_frame()

However, with the receive_frame() API none of that is necessary, as we
can just output multiple frames at once. Previously added ContainerFifo
allows that to be done in a straightforward and efficient manner.
2024-08-19 21:37:22 +02:00
Anton Khirnov
79afc45c03 lavc/hevcdec: use a ContainerFifo to hold frames scheduled for output
Instead of a single AVFrame.

Will be useful in future commits, where we will want to produce multiple
output frames for a single coded frame.
2024-08-19 21:37:22 +02:00
Anton Khirnov
4bda7f288c lavc/videotoolbox: drop HEVC cropping from start_frame rather than end_frame
HEVCContext.output_frame will be removed in following commits.

Reported-By: Max Bykov
2024-08-19 21:37:22 +02:00