J. Dekker
42528ff835
checkasm: add csv/tsv bench output
...
When collecting performance information from checkasm, it is common
to parse the output for use in graphs that compare different
architectures.
Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-08-28 11:45:46 +02:00
Anton Khirnov
d89930f866
lavu/opt: add API for retrieving array-type option values
...
Previously one could only convert the entire array to a string, not
access individual elements.
2024-08-27 16:53:16 +02:00
Anton Khirnov
4a5bb84515
lavu/opt: forward av_opt_get_video_rate() to av_opt_get_q()
...
The two functions are exactly the same.
2024-08-27 16:53:16 +02:00
Anton Khirnov
efe38286d1
lavu/opt: document underlying C types for enum AVOptionType
2024-08-27 16:53:16 +02:00
Ramiro Polla
7e4784e40c
avcodec/mpegvideoencdsp: speed up draw_edges_8_c by inlining it for all used edge widths
...
This commit also restricts w to 4, 8, or 16.
Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz:
                                    before     after
draw_edges_8_1724_4_c:             46796.5    7141.7  ( 6.55x)
draw_edges_8_1724_8_c:             43584.5    7216.5  ( 6.04x)
draw_edges_8_1724_16_c:            47007.2   10080.5  ( 4.66x)
draw_edges_128_407_4_c:            11199.0    4185.0  ( 2.68x)
draw_edges_128_407_8_c:            10660.2    4418.0  ( 2.41x)
draw_edges_128_407_16_c:           11800.2    4634.5  ( 2.55x)
draw_edges_1080_31_4_c:             1356.5     634.7  ( 2.14x)
draw_edges_1080_31_8_c:             1972.0    1430.2  ( 1.38x)
draw_edges_1080_31_16_c:            4621.0    4009.7  ( 1.15x)
draw_edges_1920_4_4_c:               834.5     795.2  ( 1.05x)
draw_edges_1920_4_4_negstride_c:     821.7     802.0  ( 1.02x)
draw_edges_1920_4_8_c:              2782.2    2650.7  ( 1.05x)
draw_edges_1920_4_8_negstride_c:    2724.7    2670.0  ( 1.02x)
draw_edges_1920_4_16_c:             6437.5    6327.7  ( 1.02x)
draw_edges_1920_4_16_negstride_c:   6395.2    6349.5  ( 1.01x)
A55:
                                    before     after
draw_edges_8_1724_4_c:             52540.4   19739.2  ( 2.66x)
draw_edges_8_1724_8_c:             45386.9   19847.4  ( 2.29x)
draw_edges_8_1724_16_c:            51995.4   23284.7  ( 2.23x)
draw_edges_128_407_4_c:            13401.1    6988.2  ( 1.92x)
draw_edges_128_407_8_c:            12218.4    7527.9  ( 1.62x)
draw_edges_128_407_16_c:           13695.9    8207.2  ( 1.67x)
draw_edges_1080_31_4_c:             3702.9    3110.4  ( 1.19x)
draw_edges_1080_31_8_c:             6015.6    5643.2  ( 1.07x)
draw_edges_1080_31_16_c:           12281.9   11901.4  ( 1.03x)
draw_edges_1920_4_4_c:              3957.9    3970.2  ( 1.00x)
draw_edges_1920_4_4_negstride_c:    3964.1    3825.2  ( 1.04x)
draw_edges_1920_4_8_c:              7757.9    7676.4  ( 1.01x)
draw_edges_1920_4_8_negstride_c:    7923.6    7812.4  ( 1.01x)
draw_edges_1920_4_16_c:            14791.6   15143.9  ( 0.98x)
draw_edges_1920_4_16_negstride_c:  14788.6   15163.4  ( 0.98x)
A76:
                                    before     after
draw_edges_8_1724_4_c:             39786.0    4968.5  ( 8.01x)
draw_edges_8_1724_8_c:             32971.5    5069.5  ( 6.50x)
draw_edges_8_1724_16_c:            40056.0    6017.2  ( 6.66x)
draw_edges_128_407_4_c:             9517.2    1210.5  ( 7.86x)
draw_edges_128_407_8_c:             8035.7    1346.2  ( 5.97x)
draw_edges_128_407_16_c:            9946.5    1648.2  ( 6.03x)
draw_edges_1080_31_4_c:             1308.0     660.7  ( 1.98x)
draw_edges_1080_31_8_c:             1785.5    1270.7  ( 1.41x)
draw_edges_1080_31_16_c:            3266.7    2591.5  ( 1.26x)
draw_edges_1920_4_4_c:              1151.0    1090.7  ( 1.06x)
draw_edges_1920_4_4_negstride_c:    1153.7    1096.5  ( 1.05x)
draw_edges_1920_4_8_c:              2220.7    2186.5  ( 1.02x)
draw_edges_1920_4_8_negstride_c:    2218.5    2193.5  ( 1.01x)
draw_edges_1920_4_16_c:             4324.2    4230.0  ( 1.02x)
draw_edges_1920_4_16_negstride_c:   4310.7    4233.0  ( 1.02x)
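The gain from inlining for a fixed set of edge widths can be illustrated with a minimal sketch. It is not the code from this commit: pad_sides() and pad_sides_dispatch() are hypothetical names, and only the left/right padding is shown. The idea is that each call site passes a literal width, so after inlining the compiler sees constant-length fills. Plain "static inline" is only a hint, so a real implementation would probably need a forced-inline attribute.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Generic worker (hypothetical name). Once inlined into a caller that passes
 * a literal w, each memset() has a constant length the compiler can expand. */
static inline void pad_sides(uint8_t *buf, ptrdiff_t stride,
                             int width, int height, int w)
{
    for (int y = 0; y < height; y++) {
        uint8_t *row = buf + y * stride;
        memset(row - w, row[0], w);             /* replicate leftmost pixel  */
        memset(row + width, row[width - 1], w); /* replicate rightmost pixel */
    }
}

/* Dispatcher: one call site per supported width, anything else is rejected. */
int pad_sides_dispatch(uint8_t *buf, ptrdiff_t stride,
                       int width, int height, int w)
{
    switch (w) {
    case 4:  pad_sides(buf, stride, width, height, 4);  return 0;
    case 8:  pad_sides(buf, stride, width, height, 8);  return 0;
    case 16: pad_sides(buf, stride, width, height, 16); return 0;
    default: return -1;
    }
}
```

Restricting w to 4, 8, or 16 keeps the number of specialised copies small; the -1 return for other widths is a choice made for this sketch only.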
2024-08-26 12:50:26 +02:00
Ramiro Polla
3bfce2a104
avcodec/x86/mpegvideoencdsp: speed up draw_edges_mmx by using memcpy()
...
The mmx memory copy code is not nearly as efficient as memcpy(), which
would make draw_edges_mmx much slower than draw_edges_8_c.
Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz:
                                      before     after
draw_edges_8_1724_4_mmx:              8700.5    8751.8  ( 0.99x)
draw_edges_8_1724_8_mmx:             10441.7   10558.0  ( 0.99x)
draw_edges_8_1724_16_mmx:            10660.7   10799.5  ( 0.99x)
draw_edges_128_407_4_mmx:             4202.2    4099.3  ( 1.03x)
draw_edges_128_407_8_mmx:             4579.0    4511.3  ( 1.02x)
draw_edges_128_407_16_mmx:            5479.7    4729.5  ( 1.16x)
draw_edges_1080_31_4_mmx:             1546.7     658.0  ( 2.35x)
draw_edges_1080_31_8_mmx:             2745.5    1442.5  ( 1.90x)
draw_edges_1080_31_16_mmx:           12511.5    4901.0  ( 2.55x)
draw_edges_1920_4_4_mmx:              2659.0     705.0  ( 3.77x)
draw_edges_1920_4_4_negstride_mmx:    2643.0     729.0  ( 3.63x)
draw_edges_1920_4_8_mmx:              7845.0    2819.0  ( 2.78x)
draw_edges_1920_4_8_negstride_mmx:    7777.0    2747.3  ( 2.83x)
draw_edges_1920_4_16_mmx:            24583.7    6358.3  ( 3.87x)
draw_edges_1920_4_16_negstride_mmx:  24589.0    6367.0  ( 3.86x)
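As a rough illustration of the memcpy() approach, the row replication for the top and bottom edges could look like the sketch below. pad_top_bottom() is a hypothetical helper, not the function touched by this commit, and it assumes the caller has already padded the sides so that row_bytes covers the full padded row.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: replicate the first image row h times above the image
 * and the last row h times below it. buf points at the first byte to copy in
 * the top row; row_bytes must not exceed the distance between rows. */
void pad_top_bottom(uint8_t *buf, ptrdiff_t stride, size_t row_bytes,
                    int height, int h)
{
    uint8_t *top    = buf;
    uint8_t *bottom = buf + (height - 1) * stride;

    for (int i = 1; i <= h; i++) {
        memcpy(top - i * stride, top, row_bytes);
        memcpy(bottom + i * stride, bottom, row_bytes);
    }
}
```

Because every address is derived from stride, the same loop also covers a negative stride, where "above" simply lies at higher addresses.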
2024-08-26 12:50:21 +02:00
Ramiro Polla
9cdcbb639a
avcodec/x86/mpegvideoencdsp: fix comment for draw_edges_mmx
...
Not only w == 8 and w == 16 are supported, but also w == 4.
2024-08-26 12:49:24 +02:00
Ramiro Polla
8c203ea7c7
avcodec/aarch64/mpegvideoencdsp: add dotprod implementation for pix_norm1
...
                    A55              A76
pix_norm1_c:        484.3            235.2
pix_norm1_neon:     193.8 ( 2.50x)    44.7 ( 5.26x)
pix_norm1_dotprod:   91.8 ( 5.28x)    21.2 (11.09x)
2024-08-26 12:49:04 +02:00
Ramiro Polla
9f68a3712e
avcodec/aarch64/mpegvideoencdsp: add neon implementations for pix_sum and pix_norm1
...
                    A55              A76
pix_norm1_c:        478.2            234.2
pix_norm1_neon:     188.2 ( 2.54x)    41.2 ( 5.68x)
pix_sum_c:          304.2            244.0
pix_sum_neon:        77.2 ( 3.94x)    21.5 (11.35x)
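For readers unfamiliar with the two routines being vectorised, a scalar sketch of what they compute follows. It assumes a 16x16 block of 8-bit pixels; the _ref names are made up for this example and are not the FFmpeg symbols.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of all pixels in a 16x16 block (block size is an assumption). */
int pix_sum_ref(const uint8_t *pix, ptrdiff_t stride)
{
    int sum = 0;
    for (int y = 0; y < 16; y++, pix += stride)
        for (int x = 0; x < 16; x++)
            sum += pix[x];
    return sum;
}

/* Sum of squared pixels in the same block. The worst case, 256 * 255 * 255,
 * still fits comfortably in a 32-bit int. */
int pix_norm1_ref(const uint8_t *pix, ptrdiff_t stride)
{
    int sum = 0;
    for (int y = 0; y < 16; y++, pix += stride)
        for (int x = 0; x < 16; x++)
            sum += pix[x] * pix[x];
    return sum;
}
```

Both loops are plain reductions over independent elements, which is why they map well onto SIMD accumulate instructions.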
2024-08-26 12:48:31 +02:00
Ramiro Polla
834964ce1a
checkasm/mpegvideoencdsp: add pix_sum, pix_norm1, and draw_edges
2024-08-26 12:48:09 +02:00
Ramiro Polla
f9074427db
avcodec/x86/mpegvideoencdsp: support negative strides in draw_edges_mmx()
2024-08-26 12:44:02 +02:00
Ramiro Polla
98610fe95f
fate/checkasm: run the sw_yuv2yuv test
2024-08-26 12:16:40 +02:00
Zhao Zhili
12cdb30e37
avcodec/videotoolboxenc: Fix leaking of supported_props
...
Two VTCompressionSessionRef instances are created, one for generating
extradata and another for normal encoding. supported_props was
overwritten without being released.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-08-26 17:09:46 +08:00
Ramiro Polla
420d443600
swscale/aarch64: cosmetics fix (spaces inside curly braces)
2024-08-26 11:07:49 +02:00
Ramiro Polla
52887683e9
swscale/aarch64: add nv24/nv42 to yuv420p unscaled converter
...
                          A55                A76
nv24_yuv420p_128_c:       4956.1             1267.0
nv24_yuv420p_128_neon:    3109.1 ( 1.59x)     640.0 ( 1.98x)
nv24_yuv420p_1920_c:     35728.4            11736.2
nv24_yuv420p_1920_neon:   8011.1 ( 4.46x)    2436.0 ( 4.82x)
nv42_yuv420p_128_c:       4956.4             1270.5
nv42_yuv420p_128_neon:    3074.6 ( 1.61x)     639.5 ( 1.99x)
nv42_yuv420p_1920_c:     35685.9            11732.5
nv42_yuv420p_1920_neon:   7995.1 ( 4.46x)    2437.2 ( 4.81x)
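The figures in parentheses appear to be the C timing divided by the NEON timing on the same core. A small sketch of that arithmetic, with speedup() as a hypothetical helper:

```c
/* Speedup of an optimised routine relative to its reference.
 * Both arguments are timing counts, where lower means faster. */
double speedup(double ref_count, double opt_count)
{
    return ref_count / opt_count;
}
```

For the 1920-wide nv24 case on the A55, 35728.4 / 8011.1 rounds to the 4.46x shown above.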
2024-08-26 11:04:46 +02:00
Ramiro Polla
88a563ad18
swscale: export ff_copyPlane so it may be used by simd code
2024-08-26 11:04:46 +02:00
Ramiro Polla
a2e01cade8
checkasm/yuv2yuv: add tests for semiplanar unscaled converters
2024-08-26 11:04:46 +02:00
Ramiro Polla
4eb5594295
swscale: add nv24/nv42 to yuv420p unscaled converter
2024-08-26 11:04:46 +02:00
Zhao Zhili
aa14f9fe63
avcodec/mediacodecdec: Skip dequeue buffer in draining state
...
There is no more packet to queue in draining state.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-08-26 16:59:07 +08:00
Zhao Zhili
2e370805da
avfilter/unsharp: Merge header into .c
...
It was shared with the opencl implementation.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-08-26 16:58:25 +08:00
Stefan Oltmanns
d42cd5b75b
avformat/vapoursynth: load library at runtime
...
Signed-off-by: Stefan Oltmanns <stefan-oltmanns@gmx.net>
2024-08-26 10:30:52 +02:00
Stefan Oltmanns
eac611f1a4
avformat/vapoursynth: Update to API version 4
...
Signed-off-by: Stefan Oltmanns <stefan-oltmanns@gmx.net>
2024-08-26 10:30:50 +02:00
Ramiro Polla
abb4e13a0a
avutil/aarch64: add AV_COPY128 and AV_ZERO128 macros
2024-08-26 10:26:59 +02:00
Zhao Zhili
40dda881d6
avcodec/filter_units: Fix extradata and packets can have different bitstream format
...
Filter init can change extradata from avcc/hvcc to annexb format.
With different passthrough logic, packets can still be in avcc/hvcc
format. Use the same passthrough logic for init and filter.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-08-24 00:27:15 +08:00
Zhao Zhili
523189c744
fftools/ffplay: handle flip in display matrix
...
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-08-24 00:26:59 +08:00
Gnattu OC
30f090b4f8
avfilter: inherit input color range for videotoolbox filters
...
The color range should be set to match the input when creating
the VideoToolbox context. Otherwise, the new context will default
to limited range, which creates inconsistencies with full range inputs.
Signed-off-by: Gnattu OC <gnattuoc@me.com>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-08-24 00:24:06 +08:00
Martin Storsjö
cfe0a36352
libswscale: aarch64: Fix the indentation of some macro invocations
...
Signed-off-by: Martin Storsjö <martin@martin.st>
2024-08-22 14:40:30 +03:00
James Almer
9d15fe77e3
avcodec/container_fifo: add missing stddef.h include
...
Fixes make checkheaders
Signed-off-by: James Almer <jamrial@gmail.com>
2024-08-21 15:12:46 -03:00
James Almer
a754ee0844
avcodec/h2645_parse: replace three bool arguments in ff_h2645_packet_split with a single flags one
...
Signed-off-by: James Almer <jamrial@gmail.com>
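The general pattern of folding several boolean parameters into one flags argument can be sketched as follows. The flag and type names are hypothetical and chosen for illustration; they are not taken from the FFmpeg headers.

```c
/* Hypothetical flag names, for illustration only. */
enum {
    SPLIT_FLAG_IS_NALFF      = 1 << 0,
    SPLIT_FLAG_SMALL_PADDING = 1 << 1,
    SPLIT_FLAG_USE_REF       = 1 << 2,
};

typedef struct SplitOptions {
    int is_nalff;
    int small_padding;
    int use_ref;
} SplitOptions;

/* Unpack a flags word into the three switches it replaces. A callee that
 * takes flags directly would simply test the bits where it needs them. */
SplitOptions split_options_from_flags(int flags)
{
    SplitOptions o;
    o.is_nalff      = !!(flags & SPLIT_FLAG_IS_NALFF);
    o.small_padding = !!(flags & SPLIT_FLAG_SMALL_PADDING);
    o.use_ref       = !!(flags & SPLIT_FLAG_USE_REF);
    return o;
}
```

Call sites become self-describing (a named flag instead of a bare 0 or 1), and adding a fourth option later does not change the function signature.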
2024-08-19 20:23:20 -03:00
James Almer
8060644237
avcodec/shorten: Fix discard of ‘const’ qualifier
...
Signed-off-by: James Almer <jamrial@gmail.com>
2024-08-19 17:40:00 -03:00
Martin Storsjö
507c2a5774
libswscale: arm: Don't assume aligned output in yuv2rgb functions
...
This fixes failures in recently added checkasm tests.
While the buffers in most cases are aligned, libswscale in general
can't assume the output to be aligned.
Signed-off-by: Martin Storsjö <martin@martin.st>
2024-08-19 23:04:52 +03:00
Anton Khirnov
52471b56ba
lavfi: make FFFilterContext private to generic code
...
Nothing in it needs to be visible to filters.
2024-08-19 21:48:11 +02:00
Anton Khirnov
f19c988911
lavfi/filters: move functions only used by generic code to avfilter_internal.h
2024-08-19 21:48:11 +02:00
Anton Khirnov
6d75d44d90
lavfi: drop internal.h
...
All that remains in it are things that belong in avfilter_internal.h.
Move them there and remove internal.h
2024-08-19 21:48:04 +02:00
Anton Khirnov
90e4af65e1
lavfi/f_streamselect: remove a no-op ff_filter_config_links() call
...
It does not do anything when the links are already configured.
2024-08-19 21:45:25 +02:00
Anton Khirnov
a2314308f2
lavfi/internal: move ff_fmt_is_regular_yuv() declaration to video.h
2024-08-19 21:45:25 +02:00
Anton Khirnov
a83a30e899
lavfi: move ff_parse_{sample_rate,channel_layout}() to audio.[ch]
...
That is a more appropriate place for those functions.
2024-08-19 21:45:25 +02:00
Anton Khirnov
f4bfdf7893
lavfi: move ff_parse_pixel_format() to vf_format, its only caller
...
The only thing this function does beyond calling av_get_pix_fmt() is
falling back onto parsing the argument as a number. No other filters
should need to do this.
2024-08-19 21:45:25 +02:00
Anton Khirnov
1afe42852b
lavfi/internal: move functions used by filters to filters.h
...
internal.h currently mixes interfaces intended to be used by filters
with those that should be limited to generic filter- or graph-level
code.
2024-08-19 21:45:25 +02:00
Rémi Denis-Courmont
d8fb44c0aa
lavc/mpegvideoencdsp: R-V V add_8x8basis
...
T-Head C908:
add_8x8basis_c: 440.6
add_8x8basis_rvv_i32: 70.3
SpacemiT X60:
add_8x8basis_c: 436.3
add_8x8basis_rvv_i32: 40.5
2024-08-19 22:41:13 +03:00
Rémi Denis-Courmont
1907dd7f23
lavc/mpegvideoencdsp: R-V V try_8x8basis
...
T-Head C908:
try_8x8basis_c: 922.5
try_8x8basis_rvv_i32: 135.3
SpacemiT X60:
try_8x8basis_c: 926.1
try_8x8basis_rvv_i32: 103.1
2024-08-19 22:41:13 +03:00
Rémi Denis-Courmont
0fd37c00d7
lavc/mpegvideoencdsp: R-V V pix_norm1
...
T-Head C908:
pix_norm1_c: 480.2
pix_norm1_rvv_i64: 146.9
SpacemiT X60:
pix_norm1_c: 478.2
pix_norm1_rvv_i64: 92.7
2024-08-19 22:41:13 +03:00
Rémi Denis-Courmont
63d016aea5
lavc/mpegvideoencdsp: R-V V pix_sum
...
T-Head C908:
pix_sum_c: 332.2
pix_sum_rvv_i64: 91.2
SpacemiT X60:
pix_sum_c: 321.2
pix_sum_rvv_i64: 60.9
2024-08-19 22:41:13 +03:00
Anton Khirnov
631a725670
lavc/hevcdec: call ff_thread_finish_setup() even if hwaccel is in use
...
Serializing frame threading for non-threadsafe hwaccels is handled at the
generic level, the decoder does not need to care about it.
2024-08-19 21:37:22 +02:00
Anton Khirnov
4b9adb35b6
lavc/hevcdec: simplify output logic
...
Current code is written around the "simple" decode API's limitation that
a single input packet (AU/coded frame) triggers the output of at most
one output frame. However the spec contains two cases where a coded
frame may cause multiple frames to be output (cf. C.5.2.2.2):
* start of a new sequence
* overflowing sps_max_dec_pic_buffering
The decoder currently contains rather convoluted logic to handle these
cases:
* decode/output/per-frame sequence counters,
* HEVC_FRAME_FLAG_BUMPING
* ff_hevc_bump_frame()
* special clauses in ff_hevc_output_frame()
However, with the receive_frame() API none of that is necessary, as we
can just output multiple frames at once. Previously added ContainerFifo
allows that to be done in a straightforward and efficient manner.
2024-08-19 21:37:22 +02:00
Anton Khirnov
79afc45c03
lavc/hevcdec: use a ContainerFifo to hold frames scheduled for output
...
Instead of a single AVFrame.
Will be useful in future commits, where we will want to produce multiple
output frames for a single coded frame.
2024-08-19 21:37:22 +02:00
Anton Khirnov
4bda7f288c
lavc/videotoolbox: drop HEVC cropping from start_frame rather than end_frame
...
HEVCContext.output_frame will be removed in following commits.
Reported-By: Max Bykov
2024-08-19 21:37:22 +02:00
Anton Khirnov
6174818252
lavc: add private container FIFO API
...
It provides a FIFO for "container" objects like AVFrame/AVPacket and
features an integrated FFRefStructPool-based pool to avoid allocating and
freeing them repeatedly.
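The combination of a FIFO with an integrated pool can be sketched in a few lines. This is a toy model under stated assumptions, not the API added by this commit: Item stands in for AVFrame/AVPacket, the capacity is fixed, and a plain array of spare pointers replaces the FFRefStructPool.

```c
#include <stddef.h>
#include <stdlib.h>

#define FIFO_CAP 8

typedef struct Item { int value; } Item;   /* stand-in for a frame or packet */

typedef struct ItemFifo {
    Item  *queue[FIFO_CAP]; /* ring buffer of queued items   */
    size_t head, count;
    Item  *spare[FIFO_CAP]; /* recycled items awaiting reuse */
    size_t nb_spare;
} ItemFifo;

/* Queue a value, reusing a recycled item when one is available. */
int fifo_write(ItemFifo *f, int value)
{
    Item *it;
    if (f->count == FIFO_CAP)
        return -1;
    it = f->nb_spare ? f->spare[--f->nb_spare] : malloc(sizeof(*it));
    if (!it)
        return -1;
    it->value = value;
    f->queue[(f->head + f->count) % FIFO_CAP] = it;
    f->count++;
    return 0;
}

/* Dequeue the oldest value and hand its container back to the pool. */
int fifo_read(ItemFifo *f, int *value)
{
    Item *it;
    if (!f->count)
        return -1;
    it = f->queue[f->head];
    f->head = (f->head + 1) % FIFO_CAP;
    f->count--;
    *value = it->value;
    f->spare[f->nb_spare++] = it;
    return 0;
}

/* Release everything still held by the queue and the pool. */
void fifo_free(ItemFifo *f)
{
    while (f->count) {
        free(f->queue[f->head]);
        f->head = (f->head + 1) % FIFO_CAP;
        f->count--;
    }
    while (f->nb_spare)
        free(f->spare[--f->nb_spare]);
}
```

After a warm-up period the write path stops calling malloc() altogether, which is the effect the pool is meant to achieve.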
2024-08-19 21:37:22 +02:00
Anton Khirnov
2fdecbb239
lavc/hevcdec: switch to receive_frame()
...
Required by following commits, where we will want to output multiple
frames per packet.
2024-08-19 21:37:22 +02:00
sunyuechi
4e7b5ac48f
lavc/vp9dsp: R-V V mc bilin hv
...
                                      C908      X60
vp9_avg_bilin_4hv_8bpp_c        :     10.7      9.5
vp9_avg_bilin_4hv_8bpp_rvv_i32  :      4.0      3.5
vp9_avg_bilin_8hv_8bpp_c        :     38.5     34.2
vp9_avg_bilin_8hv_8bpp_rvv_i32  :      7.2      6.5
vp9_avg_bilin_16hv_8bpp_c       :    147.2    130.5
vp9_avg_bilin_16hv_8bpp_rvv_i32 :     14.5     12.7
vp9_avg_bilin_32hv_8bpp_c       :    574.2    509.7
vp9_avg_bilin_32hv_8bpp_rvv_i32 :     42.5     38.0
vp9_avg_bilin_64hv_8bpp_c       :   2321.2   2017.7
vp9_avg_bilin_64hv_8bpp_rvv_i32 :    163.5    131.0
vp9_put_bilin_4hv_8bpp_c        :     10.0      8.7
vp9_put_bilin_4hv_8bpp_rvv_i32  :      3.5      3.0
vp9_put_bilin_8hv_8bpp_c        :     35.2     31.2
vp9_put_bilin_8hv_8bpp_rvv_i32  :      6.5      5.7
vp9_put_bilin_16hv_8bpp_c       :    134.0    119.0
vp9_put_bilin_16hv_8bpp_rvv_i32 :     12.7     11.5
vp9_put_bilin_32hv_8bpp_c       :    538.5    464.2
vp9_put_bilin_32hv_8bpp_rvv_i32 :     39.7     35.2
vp9_put_bilin_64hv_8bpp_c       :   2111.7   1833.2
vp9_put_bilin_64hv_8bpp_rvv_i32 :    138.5    122.5
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-08-19 22:29:20 +03:00