Commit Graph

112581 Commits

Author SHA1 Message Date
Andreas Rheinhardt a99285aedf avcodec/asvdec: Avoid superfluous VLC structures
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:46:59 +01:00
Andreas Rheinhardt ab8a8246c8 avcodec/h264_cavlc: Remove code duplication
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:46:59 +01:00
Andreas Rheinhardt bd4c778e19 avcodec/h264_cavlc: Avoid indirection for coefficient table VLCs
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:46:59 +01:00
Andreas Rheinhardt fe748ddf62 avcodec/h264_cavlc: Avoid superfluous VLC structures
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:46:59 +01:00
Andreas Rheinhardt c630d76b27 avcodec/vp3: Increase some VLC tables
These are quite small and therefore force reloads
that can be avoided by modest increases in the number of bits used.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:46:59 +01:00
Andreas Rheinhardt 1fee3a3dce avcodec/vp3: Make VLC tables static where possible
This is especially important for frame-threaded decoders like
this one, because up until now each thread had an identical
copy of all these VLC tables.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:46:59 +01:00
Andreas Rheinhardt edc50658d9 avcodec/vlc: Add functions to init static VLCElem[] without VLC
For lots of static VLCs, the number of bits is not read from
VLC.bits, but rather a compile-constant that is hardcoded
at the callsite of get_vlc2(). Only VLC.table is ever used
and not using it directly is just an unnecessary indirection.

This commit adds helper functions and macros to avoid the VLC
structure when initializing VLC tables; there are 2x2 functions:
Two choices for init_sparse or from_lengths and two choices
for "overlong" initialization (as used when multiple VLCs are
initialized that share the same underlying table).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:46:59 +01:00
Rémi Denis-Courmont 424c8ceb08 lavc/huffyuvdsp: R-V V add_int16
add_int16_128_c:      2390.5
add_int16_128_rvv_i32: 832.0
add_int16_rnd_width_c:      2390.2
add_int16_rnd_width_rvv_i32: 832.5
2023-10-31 21:33:25 +02:00
Rémi Denis-Courmont 7e1cdc69fb lavc/utvideodsp: R-V V restore_rgb_planes10
restore_rgb_planes10_c:      185852.2
restore_rgb_planes10_rvv_i32: 90130.5
2023-10-31 21:33:25 +02:00
Rémi Denis-Courmont 4aea0da230 lavc/utvideodsp: R-V V restore_rgb_planes
restore_rgb_planes_c:      133065.7
restore_rgb_planes_rvv_i32: 33317.2
2023-10-31 21:33:25 +02:00
Niklas Haas 6aff17a451 avformat/vf_vapoursynth: simplify xyz format check 2023-10-31 15:46:38 +01:00
Niklas Haas 93f07d98d9 avutil/pixdesc: simplify xyz pixfmt check 2023-10-31 15:46:38 +01:00
Niklas Haas d312a33ed2 avfilter/drawutils: remove redundant xyz format check
The code above this does a whitelist on desc->flags, which now includes
the (disallowed) AV_PIX_FMT_FLAG_XYZ for XYZ formats. So there is no
more need for a separate check, here.
2023-10-31 15:46:38 +01:00
Niklas Haas 57c16323f2 avutil/pixdesc: add AV_PIX_FMT_FLAG_XYZ
There are already several places in the codebase that match desc->name
against "xyz", and many downstream clients replicate this behavior.
I have no idea why this is not just a flag.

Motivated by my desire to add yet another check for XYZ to the codebase,
and I'd rather not keep copy/pasting a string comparison hack.
2023-10-31 15:46:07 +01:00
Niklas Haas 96dfc4481b avfilter/drawutils: ban XYZ formats
These are not supported by the drawing functions at all, and were
incorrectly advertised as supported in the past.

Note: This check is added only to separate the logic change from the API
change in the following commit, and will be removed again after it
becomes redundant.
2023-10-31 15:43:30 +01:00
Logan Lyu 55f28eb627 lavc/aarch64: new optimization for 8-bit hevc_qpel_hv
checkasm bench:
put_hevc_qpel_hv4_8_c: 422.1
put_hevc_qpel_hv4_8_i8mm: 101.6
put_hevc_qpel_hv6_8_c: 756.4
put_hevc_qpel_hv6_8_i8mm: 225.9
put_hevc_qpel_hv8_8_c: 1189.9
put_hevc_qpel_hv8_8_i8mm: 296.6
put_hevc_qpel_hv12_8_c: 2407.4
put_hevc_qpel_hv12_8_i8mm: 552.4
put_hevc_qpel_hv16_8_c: 4021.4
put_hevc_qpel_hv16_8_i8mm: 886.6
put_hevc_qpel_hv24_8_c: 8992.1
put_hevc_qpel_hv24_8_i8mm: 1968.9
put_hevc_qpel_hv32_8_c: 15197.9
put_hevc_qpel_hv32_8_i8mm: 3209.4
put_hevc_qpel_hv48_8_c: 32811.1
put_hevc_qpel_hv48_8_i8mm: 7442.1
put_hevc_qpel_hv64_8_c: 58106.1
put_hevc_qpel_hv64_8_i8mm: 12423.9

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-31 14:14:21 +02:00
Logan Lyu 97a9d12657 lavc/aarch64: new optimization for 8-bit hevc_qpel_v
checkasm bench:

put_hevc_qpel_v4_8_c: 138.1
put_hevc_qpel_v4_8_neon: 41.1
put_hevc_qpel_v6_8_c: 276.6
put_hevc_qpel_v6_8_neon: 60.9
put_hevc_qpel_v8_8_c: 478.9
put_hevc_qpel_v8_8_neon: 72.9
put_hevc_qpel_v12_8_c: 1072.6
put_hevc_qpel_v12_8_neon: 203.9
put_hevc_qpel_v16_8_c: 1852.1
put_hevc_qpel_v16_8_neon: 264.1
put_hevc_qpel_v24_8_c: 4137.6
put_hevc_qpel_v24_8_neon: 586.9
put_hevc_qpel_v32_8_c: 7579.1
put_hevc_qpel_v32_8_neon: 1036.6
put_hevc_qpel_v48_8_c: 16355.6
put_hevc_qpel_v48_8_neon: 2326.4
put_hevc_qpel_v64_8_c: 33545.1
put_hevc_qpel_v64_8_neon: 4126.4

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-31 14:14:21 +02:00
Logan Lyu 265450b89e lavc/aarch64: new optimization for 8-bit hevc_epel_hv
checkasm bench:
put_hevc_epel_hv4_8_c: 213.7
put_hevc_epel_hv4_8_i8mm: 59.4
put_hevc_epel_hv6_8_c: 350.9
put_hevc_epel_hv6_8_i8mm: 130.2
put_hevc_epel_hv8_8_c: 548.7
put_hevc_epel_hv8_8_i8mm: 136.9
put_hevc_epel_hv12_8_c: 1126.7
put_hevc_epel_hv12_8_i8mm: 302.2
put_hevc_epel_hv16_8_c: 1925.2
put_hevc_epel_hv16_8_i8mm: 459.9
put_hevc_epel_hv24_8_c: 4301.9
put_hevc_epel_hv24_8_i8mm: 1024.9
put_hevc_epel_hv32_8_c: 7509.2
put_hevc_epel_hv32_8_i8mm: 1680.4
put_hevc_epel_hv48_8_c: 16566.9
put_hevc_epel_hv48_8_i8mm: 3945.4
put_hevc_epel_hv64_8_c: 29134.2
put_hevc_epel_hv64_8_i8mm: 6567.7

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-31 14:02:53 +02:00
Logan Lyu 22c7291506 lavc/aarch64: new optimization for 8-bit hevc_epel_v
checkasm bench:
put_hevc_epel_v4_8_c: 79.9
put_hevc_epel_v4_8_neon: 25.7
put_hevc_epel_v6_8_c: 151.4
put_hevc_epel_v6_8_neon: 46.4
put_hevc_epel_v8_8_c: 250.9
put_hevc_epel_v8_8_neon: 41.7
put_hevc_epel_v12_8_c: 542.7
put_hevc_epel_v12_8_neon: 108.7
put_hevc_epel_v16_8_c: 939.4
put_hevc_epel_v16_8_neon: 169.2
put_hevc_epel_v24_8_c: 2104.9
put_hevc_epel_v24_8_neon: 307.9
put_hevc_epel_v32_8_c: 3713.9
put_hevc_epel_v32_8_neon: 524.2
put_hevc_epel_v48_8_c: 8175.2
put_hevc_epel_v48_8_neon: 1197.2
put_hevc_epel_v64_8_c: 16049.4
put_hevc_epel_v64_8_neon: 2094.9

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-31 14:02:53 +02:00
Logan Lyu 772865717b lavc/aarch64: new optimization for 8-bit hevc_epel_pixels and and hevc_qpel_pixels
checkasm bench:
put_hevc_pel_pixels4_8_c: 33.7
put_hevc_pel_pixels4_8_neon: 20.2
put_hevc_pel_pixels6_8_c: 61.4
put_hevc_pel_pixels6_8_neon: 25.4
put_hevc_pel_pixels8_8_c: 121.4
put_hevc_pel_pixels8_8_neon: 16.9
put_hevc_pel_pixels12_8_c: 199.9
put_hevc_pel_pixels12_8_neon: 40.2
put_hevc_pel_pixels16_8_c: 355.9
put_hevc_pel_pixels16_8_neon: 43.4
put_hevc_pel_pixels24_8_c: 774.7
put_hevc_pel_pixels24_8_neon: 78.9
put_hevc_pel_pixels32_8_c: 1345.2
put_hevc_pel_pixels32_8_neon: 152.2
put_hevc_pel_pixels48_8_c: 2963.7
put_hevc_pel_pixels48_8_neon: 309.4
put_hevc_pel_pixels64_8_c: 5236.2
put_hevc_pel_pixels64_8_neon: 514.2

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-31 14:02:53 +02:00
Martin Storsjö 2c3d2a0245 configure: Improve aarch64 feature detection on older, broken Clang versions
Clang versions before 17 (Xcode versions up to and including 15.0)
had a very annoying bug in its behaviour of the ".arch" directive
in assembly. If the directive only contained a level, such as
".arch armv8.2-a", it did validate the name of the level, but it
didn't apply the level to what instructions are allowed. The level
was applied if the directive contained an extra feature enabled,
such as ".arch armv8.2-a+crc" though. It was also applied on the
next ".arch_extension" directive.

This bug, combined with the fact that the same versions of Clang
didn't support the dotprod/i8mm extension names in either
".arch <level>+<feature>" or in ".arch_extension", could lead to
unexepcted build failures.

As the dotprod/i8mm extensions couldn't be enabled dynamically
via the ".arch_extension" directive, someone building ffmpeg could
try to enable them by configuring their build with
--extra-cflags="-march=armv8.6-a".

During configure, we test for support for the i8mm instructions
like this:

    # Built with -march=armv8.6-a
    .arch armv8.2-a             # Has no visible effect here
    #.arch_extension i8mm       # Omitted as the extension name isn't known
    usdot v0.4s, v0.16b, v0.16b
    # Successfully assembled as armv8.6-a is the effective level,
    # and i8mm is enabled implicitly in armv8.6-a.

Thus, we would enable assembling those instructions. However if
we later check for another extension, such as sve (which those
versions of Clang actually do support), we can later run into the
following situation when building actual code:

    # Built with -march=armv8.6-a
    .arch armv8.2-a             # Has no visible effect here
    #.arch_extension i8mm       # Omitted as the extension name isn't known
    .arch_extension sve         # Included as "sve" is as supported extension name
    # .arch_extension effectively activates the previous .arch directive,
    # so the effective level is armv8.2-a+sve now.
    usdot v0.4s, v0.16b, v0.16b
    # Fails to build the instructions that require i8mm. Despite the
    # configure check, the unrelated ".arch_extension sve" directive
    # breaks the functionality of the i8mm feature.

This patch avoids this situation:
- By adding a dummy feature such as "+crc" on the .arch directive
  (if supported), we make sure that it does get applied immediately,
  avoiding it taking effect spuriously at a later unrelated
  ".arch_extension" directive.
- By checking for higher arch levels such as armv8.4-a and armv8.6-a,
  we can assemble the dotprod and i8mm extensions without the user
  needing to pass -march=armv8.6-a. This allows using the dotprod/i8mm
  codepaths via runtime detection while keeping the binary runnable
  on older versions. I.e. this enables the i8mm codepaths on Apple M2
  machines while built with Xcode's Clang.

TL;DR: Enable the I8MM extensions for Apple M2 without the user needing
to do a custom configuration; avoid potential build breakage if a user
does such a custom configuration.

Once Xcode versions that have these issues fixed are prevalent, we
can consider reverting this change.

Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-31 12:23:31 +02:00
Martin Storsjö f05948ada4 aarch64: Simplify the linux runtime cpu detection code
Skip doing the whole getauxval(AT_HWCAP) if HWCAP_CPUID isn't
defined.

Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-31 12:23:27 +02:00
Rémi Denis-Courmont ae72412aa8 lavc/idctdsp: improve R-V V put_pixels_clamped 2023-10-30 18:14:16 +02:00
Rémi Denis-Courmont d48810f3a5 lavc/idctdsp: improve R-V V add_pixels_clamped 2023-10-30 18:14:16 +02:00
Rémi Denis-Courmont 600c6f1b55 lavc/idctdsp: improve R-V V put_signed_pixels_clamped
This follows the same idea as with pixblockdsp, but applied at the
other end, whilst writing data at the end of the function.
2023-10-30 18:14:16 +02:00
Rémi Denis-Courmont 3ea2310e89 lavc/idctdsp: require Zve64x for R-V V functions
This will be required for the following changesets.
2023-10-30 18:14:16 +02:00
Rémi Denis-Courmont 300ee8b02d lavc/pixblockdsp: aligned R-V V 8-bit functions
If the scan lines are aligned, we can load each row as a 64-bit value,
thus avoiding segmentation. And then we can factor the conversion or
subtraction.

In principle, the same optimisation should be possible for high depth,
but would require 128-bit elements, for which no FFmpeg CPU flag
exists.
2023-10-30 18:14:16 +02:00
Rémi Denis-Courmont 722765687b lavc/pixblockdsp: rename unaligned R-V V functions 2023-10-30 18:14:16 +02:00
Paul B Mahol 6323ca5902 avfilter/vf_feedback: add timeline support 2023-10-30 16:06:46 +01:00
Paul B Mahol 2f268505b9 doc/filters: add one more example for feedback filter 2023-10-30 15:12:12 +01:00
James Almer 4cba3e0f07 avutil/video_enc_params: fix doxy for av_video_enc_params_block()
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: James Almer <jamrial@gmail.com>
2023-10-30 10:30:05 -03:00
Kieran Kunhya 2532e832d2 libavcodec/mpeg12: Reindent 2023-10-29 22:12:05 +00:00
Kieran Kunhya 7d497a1119 libavcodec/mpeg12: Remove "fast" mode 2023-10-29 22:12:02 +00:00
Rémi Denis-Courmont 04b49fb3c5 lavu/riscv: fix typo 2023-10-29 22:15:15 +02:00
TADANO Tokumei a824c6f2f6 lavc/libaribcaption: rename `replace_fullwidth_ascii` to `replace_msz_ascii`
This should hopefully clarify that the option only affects MSZ
full-width characters, and not all full-width ASCII. Additionally,
this matches the prefix with the upstream option.

Signed-off-by: TADANO Tokumei <aimingoff@pc.nifty.jp>
2023-10-29 18:21:05 +02:00
TADANO Tokumei 21bfadd9b4 lavc/libaribcaption: add MSZ character related options
This patch adds two MSZ (Middle Size; half width) character
related options, mapping against newly added upstream
functionality:

* `replace_msz_japanese`, which was introduced in version 1.0.1
  of libaribcaption.
* `replace_msz_glyph`, which was introduced in version 1.1.0
  of libaribcaption.

The latter option improves bitmap type rendering if specified
fonts contain half-width glyphs (e.g., BIZ UDGothic), even
if both ASCII and Japanese MSZ replacement options are set
to false.

As these options require newer versions of libaribcaption, the
configure requirement has been bumped accordingly.

Signed-off-by: TADANO Tokumei <aimingoff@pc.nifty.jp>
2023-10-29 18:20:43 +02:00
TADANO Tokumei 82faba8a6c lavc/libaribcaption: switch all `bool` context variables to `int`
On some environments, a `bool` variable is of smaller size than `int`.
As AV_OPT_TYPE_BOOL is internally handled as sizeof(int), if a `bool`
option was set on such an environment, the memory of following
variables would be filled. Additionally, set values may be destroyed
by av_opt_copy().

Signed-off-by: TADANO Tokumei <aimingoff@pc.nifty.jp>
2023-10-29 18:19:58 +02:00
James Almer ff3429991e Changelog: mark 6.1
Signed-off-by: James Almer <jamrial@gmail.com>
2023-10-29 12:55:37 -03:00
Michael Niedermayer 47e784f881
Bump versions after 6.1
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-10-29 16:19:14 +01:00
Michael Niedermayer 0e15b2b828
doc/APIchanges: Add 6.1 cut point
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-10-29 16:19:14 +01:00
Michael Niedermayer 9d3a7d30c4
Bump versions prior to 6.1
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-10-29 15:34:05 +01:00
Zhao Zhili 105657540b avutil/hwcontext_vaapi: return ENOSYS for unsupported operation
av_hwframe_transfer_data try with src_ctx first. If the operation
failed with AVERROR(ENOSYS), it will try again with dst_ctx. Return
AVERROR(EINVAL) makes the second step being skipped.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-10-29 13:58:52 +08:00
Zhao Zhili 63078b4599 avutil/hwcontext_vulkan: cuda doesn't belong to valid_sw_formats
Move it to transfer_get_formats.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-10-29 13:58:30 +08:00
Zhao Zhili 891f70c6d5 avutil/hwcontext_vulkan: fix memleak when device_create is skipped
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-10-29 13:57:43 +08:00
Lynne 1a8e766984
vulkan: return VK_NOT_READY when no queries are available
Fixes a validation issue.
The issue is that the function gets called before we've sumitted a frame
for decoding to that context. However, we cannot run queries before
they've been reset, which happens at submission time.
As we'd need to otherwise run a command queue at init-time, just check
if submissions have happened.
2023-10-28 21:16:15 +02:00
James Almer 1ad7bd0fe5 avutil/channel_layout: simplify 22.2 layout bitmask define
Signed-off-by: James Almer <jamrial@gmail.com>
2023-10-28 12:49:48 -03:00
Kyle Swanson e5f774268a avfilter/libvmaf: fix broken cuda build
Signed-off-by: Kyle Swanson <kswanson@netflix.com>
2023-10-27 15:00:58 -07:00
Michael Niedermayer e4d5ac8d7d
avformat/rtsp: Use rtsp_st->stream_index
Fixes: out of array access
Fixes: rtpdec_h264.c149/poc

Found-by: Hardik Shah of Vehere
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-10-27 18:10:47 +02:00
Michael Niedermayer 907743239d
avutil/tx_template: fix integer ovberflwo in fft3()
Fixes: signed integer overflow: -1028966111 + -1314089526 cannot be represented in type 'int'
Fixes: 63174/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_AAC_FIXED_fuzzer-5853273711837184

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-10-27 18:10:47 +02:00
Michael Niedermayer 88453250db
avcodec/jpeg2000dec: Check image offset
Fixes: left shift of negative value -538967841
Fixes: 62447/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_JPEG2000_fuzzer-6427134337613824

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Tomas Härdin <git@haerdin.se>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-10-27 18:10:47 +02:00