Commit Graph

112706 Commits

Author SHA1 Message Date
Andreas Rheinhardt
fd4cb6ebee avcodec/speedhqdec: Avoid unnecessary VLC structure
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
0a610e22c1 avcodec/lagarith: Avoid unnecessary VLC structure
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
e3ad5b9784 avcodec/imm4: Avoid unnecessary VLC structure
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
f6c5d04b6d avcodec/mimic: Avoid unnecessary VLC structure
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
827d0325a9 avcodec/mobiclip: Avoid unnecessary VLC structure
Everything besides VLC.table is basically write-only
and only VLC.table needs to be retained.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
36e7f9b339 avcodec/vqcdec: Avoid unnecessary VLC structure
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
99ed510d4b avcodec/mv30: Avoid unnecessary VLC structure
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
b60a3f70be avcodec/wnv1: Avoid unnecessary VLC structure
Everything besides VLC.table is basically write-only
and even VLC.table can be removed by accessing the
underlying table directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
1ae750a16e avcodec/rv34: Constify pointer to static object
Said object is only allowed to be modified during its
initialization and is immutable afterwards.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
716ddc8c62 avcodec/rv34: Avoid superfluous VLC structures
For most VLCs here, the number of bits of the VLC is
write-only, because it is hardcoded at the call site.
Therefore one can replace these VLC structures with
the only thing that is actually used: The pointer
to the VLCElem table.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
73fa6d486d avcodec/vp3: Reindent after the previous commits
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
75c6a253a4 avcodec/vp3: Avoid complete VLC struct, only use VLCElem*
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
6c7a344b65 avcodec/vp3: Share coefficient VLCs between threads
These VLCs are very big: The VP3 one have 164382 elements
but due to the overallocation enough memory for 313344 elements
are allocated (1.195 MiB with sizeof(VLCElem) == 4);
for VP4 the numbers are very similar, namely 311296 and 164392
elements. Since 1f4cf92cfb, each
frame thread has its own copy of these VLCs.

This commit fixes this by sharing these VLCs across threads.
The approach used here will also make it easier to support
stream reconfigurations in case of frame-multithreading
in the future.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
7fee90efac avcodec/imc: Avoid superfluous VLC structures
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
6fb96ef755 avcodec/atrac9dec: Avoid superfluous VLC structures
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
4c7e8b969e avcodec/clearvideo: Avoid superfluous VLC structures
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
c95e123e8c avcodec/intrax8: Avoid superfluous VLC structures
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
886fbec82f avcodec/mpc7: Avoid superfluous VLC structures
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
e5e05fd3c8 avcodec/rv40: Avoid superfluous VLC structures
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
460c6ae597 avcodec/svq1dec: Increase size of VLC
It allows to reduce the number of maximum reloads by one.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
7d542e26a9 avcodec/svq1dec: Avoid superfluous VLC structures
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:47:00 +01:00
Andreas Rheinhardt
7902c0df4c avcodec/msmpeg4_vc1_data: Avoid superfluous VLC structures
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.
Also combine the ff_msmp4_dc_(luma|chroma)_vlcs as well
as the tables used to generate them to simplify the code.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:46:59 +01:00
Andreas Rheinhardt
ff886fc282 avcodec/ituh263dec: Avoid superfluous VLC structures
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:46:59 +01:00
Andreas Rheinhardt
5a9e185dfc avcodec/h261dec: Avoid superfluous VLC structures
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:46:59 +01:00
Andreas Rheinhardt
363837de0e avcodec/faxcompr: Avoid superfluous VLC structures
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:46:59 +01:00
Andreas Rheinhardt
a99285aedf avcodec/asvdec: Avoid superfluous VLC structures
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:46:59 +01:00
Andreas Rheinhardt
ab8a8246c8 avcodec/h264_cavlc: Remove code duplication
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:46:59 +01:00
Andreas Rheinhardt
bd4c778e19 avcodec/h264_cavlc: Avoid indirection for coefficient table VLCs
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:46:59 +01:00
Andreas Rheinhardt
fe748ddf62 avcodec/h264_cavlc: Avoid superfluous VLC structures
Of all these VLCs here, only VLC.table was really used
after init, so use the ff_vlc_init_tables API
to get rid of them.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:46:59 +01:00
Andreas Rheinhardt
c630d76b27 avcodec/vp3: Increase some VLC tables
These are quite small and therefore force reloads
that can be avoided by modest increases in the number of bits used.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:46:59 +01:00
Andreas Rheinhardt
1fee3a3dce avcodec/vp3: Make VLC tables static where possible
This is especially important for frame-threaded decoders like
this one, because up until now each thread had an identical
copy of all these VLC tables.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:46:59 +01:00
Andreas Rheinhardt
edc50658d9 avcodec/vlc: Add functions to init static VLCElem[] without VLC
For lots of static VLCs, the number of bits is not read from
VLC.bits, but rather a compile-constant that is hardcoded
at the callsite of get_vlc2(). Only VLC.table is ever used
and not using it directly is just an unnecessary indirection.

This commit adds helper functions and macros to avoid the VLC
structure when initializing VLC tables; there are 2x2 functions:
Two choices for init_sparse or from_lengths and two choices
for "overlong" initialization (as used when multiple VLCs are
initialized that share the same underlying table).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-10-31 20:46:59 +01:00
Rémi Denis-Courmont
424c8ceb08 lavc/huffyuvdsp: R-V V add_int16
add_int16_128_c:      2390.5
add_int16_128_rvv_i32: 832.0
add_int16_rnd_width_c:      2390.2
add_int16_rnd_width_rvv_i32: 832.5
2023-10-31 21:33:25 +02:00
Rémi Denis-Courmont
7e1cdc69fb lavc/utvideodsp: R-V V restore_rgb_planes10
restore_rgb_planes10_c:      185852.2
restore_rgb_planes10_rvv_i32: 90130.5
2023-10-31 21:33:25 +02:00
Rémi Denis-Courmont
4aea0da230 lavc/utvideodsp: R-V V restore_rgb_planes
restore_rgb_planes_c:      133065.7
restore_rgb_planes_rvv_i32: 33317.2
2023-10-31 21:33:25 +02:00
Niklas Haas
6aff17a451 avformat/vf_vapoursynth: simplify xyz format check 2023-10-31 15:46:38 +01:00
Niklas Haas
93f07d98d9 avutil/pixdesc: simplify xyz pixfmt check 2023-10-31 15:46:38 +01:00
Niklas Haas
d312a33ed2 avfilter/drawutils: remove redundant xyz format check
The code above this does a whitelist on desc->flags, which now includes
the (disallowed) AV_PIX_FMT_FLAG_XYZ for XYZ formats. So there is no
more need for a separate check, here.
2023-10-31 15:46:38 +01:00
Niklas Haas
57c16323f2 avutil/pixdesc: add AV_PIX_FMT_FLAG_XYZ
There are already several places in the codebase that match desc->name
against "xyz", and many downstream clients replicate this behavior.
I have no idea why this is not just a flag.

Motivated by my desire to add yet another check for XYZ to the codebase,
and I'd rather not keep copy/pasting a string comparison hack.
2023-10-31 15:46:07 +01:00
Niklas Haas
96dfc4481b avfilter/drawutils: ban XYZ formats
These are not supported by the drawing functions at all, and were
incorrectly advertised as supported in the past.

Note: This check is added only to separate the logic change from the API
change in the following commit, and will be removed again after it
becomes redundant.
2023-10-31 15:43:30 +01:00
Logan Lyu
55f28eb627 lavc/aarch64: new optimization for 8-bit hevc_qpel_hv
checkasm bench:
put_hevc_qpel_hv4_8_c: 422.1
put_hevc_qpel_hv4_8_i8mm: 101.6
put_hevc_qpel_hv6_8_c: 756.4
put_hevc_qpel_hv6_8_i8mm: 225.9
put_hevc_qpel_hv8_8_c: 1189.9
put_hevc_qpel_hv8_8_i8mm: 296.6
put_hevc_qpel_hv12_8_c: 2407.4
put_hevc_qpel_hv12_8_i8mm: 552.4
put_hevc_qpel_hv16_8_c: 4021.4
put_hevc_qpel_hv16_8_i8mm: 886.6
put_hevc_qpel_hv24_8_c: 8992.1
put_hevc_qpel_hv24_8_i8mm: 1968.9
put_hevc_qpel_hv32_8_c: 15197.9
put_hevc_qpel_hv32_8_i8mm: 3209.4
put_hevc_qpel_hv48_8_c: 32811.1
put_hevc_qpel_hv48_8_i8mm: 7442.1
put_hevc_qpel_hv64_8_c: 58106.1
put_hevc_qpel_hv64_8_i8mm: 12423.9

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-31 14:14:21 +02:00
Logan Lyu
97a9d12657 lavc/aarch64: new optimization for 8-bit hevc_qpel_v
checkasm bench:

put_hevc_qpel_v4_8_c: 138.1
put_hevc_qpel_v4_8_neon: 41.1
put_hevc_qpel_v6_8_c: 276.6
put_hevc_qpel_v6_8_neon: 60.9
put_hevc_qpel_v8_8_c: 478.9
put_hevc_qpel_v8_8_neon: 72.9
put_hevc_qpel_v12_8_c: 1072.6
put_hevc_qpel_v12_8_neon: 203.9
put_hevc_qpel_v16_8_c: 1852.1
put_hevc_qpel_v16_8_neon: 264.1
put_hevc_qpel_v24_8_c: 4137.6
put_hevc_qpel_v24_8_neon: 586.9
put_hevc_qpel_v32_8_c: 7579.1
put_hevc_qpel_v32_8_neon: 1036.6
put_hevc_qpel_v48_8_c: 16355.6
put_hevc_qpel_v48_8_neon: 2326.4
put_hevc_qpel_v64_8_c: 33545.1
put_hevc_qpel_v64_8_neon: 4126.4

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-31 14:14:21 +02:00
Logan Lyu
265450b89e lavc/aarch64: new optimization for 8-bit hevc_epel_hv
checkasm bench:
put_hevc_epel_hv4_8_c: 213.7
put_hevc_epel_hv4_8_i8mm: 59.4
put_hevc_epel_hv6_8_c: 350.9
put_hevc_epel_hv6_8_i8mm: 130.2
put_hevc_epel_hv8_8_c: 548.7
put_hevc_epel_hv8_8_i8mm: 136.9
put_hevc_epel_hv12_8_c: 1126.7
put_hevc_epel_hv12_8_i8mm: 302.2
put_hevc_epel_hv16_8_c: 1925.2
put_hevc_epel_hv16_8_i8mm: 459.9
put_hevc_epel_hv24_8_c: 4301.9
put_hevc_epel_hv24_8_i8mm: 1024.9
put_hevc_epel_hv32_8_c: 7509.2
put_hevc_epel_hv32_8_i8mm: 1680.4
put_hevc_epel_hv48_8_c: 16566.9
put_hevc_epel_hv48_8_i8mm: 3945.4
put_hevc_epel_hv64_8_c: 29134.2
put_hevc_epel_hv64_8_i8mm: 6567.7

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-31 14:02:53 +02:00
Logan Lyu
22c7291506 lavc/aarch64: new optimization for 8-bit hevc_epel_v
checkasm bench:
put_hevc_epel_v4_8_c: 79.9
put_hevc_epel_v4_8_neon: 25.7
put_hevc_epel_v6_8_c: 151.4
put_hevc_epel_v6_8_neon: 46.4
put_hevc_epel_v8_8_c: 250.9
put_hevc_epel_v8_8_neon: 41.7
put_hevc_epel_v12_8_c: 542.7
put_hevc_epel_v12_8_neon: 108.7
put_hevc_epel_v16_8_c: 939.4
put_hevc_epel_v16_8_neon: 169.2
put_hevc_epel_v24_8_c: 2104.9
put_hevc_epel_v24_8_neon: 307.9
put_hevc_epel_v32_8_c: 3713.9
put_hevc_epel_v32_8_neon: 524.2
put_hevc_epel_v48_8_c: 8175.2
put_hevc_epel_v48_8_neon: 1197.2
put_hevc_epel_v64_8_c: 16049.4
put_hevc_epel_v64_8_neon: 2094.9

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-31 14:02:53 +02:00
Logan Lyu
772865717b lavc/aarch64: new optimization for 8-bit hevc_epel_pixels and and hevc_qpel_pixels
checkasm bench:
put_hevc_pel_pixels4_8_c: 33.7
put_hevc_pel_pixels4_8_neon: 20.2
put_hevc_pel_pixels6_8_c: 61.4
put_hevc_pel_pixels6_8_neon: 25.4
put_hevc_pel_pixels8_8_c: 121.4
put_hevc_pel_pixels8_8_neon: 16.9
put_hevc_pel_pixels12_8_c: 199.9
put_hevc_pel_pixels12_8_neon: 40.2
put_hevc_pel_pixels16_8_c: 355.9
put_hevc_pel_pixels16_8_neon: 43.4
put_hevc_pel_pixels24_8_c: 774.7
put_hevc_pel_pixels24_8_neon: 78.9
put_hevc_pel_pixels32_8_c: 1345.2
put_hevc_pel_pixels32_8_neon: 152.2
put_hevc_pel_pixels48_8_c: 2963.7
put_hevc_pel_pixels48_8_neon: 309.4
put_hevc_pel_pixels64_8_c: 5236.2
put_hevc_pel_pixels64_8_neon: 514.2

Co-Authored-By: J. Dekker <jdek@itanimul.li>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-31 14:02:53 +02:00
Martin Storsjö
2c3d2a0245 configure: Improve aarch64 feature detection on older, broken Clang versions
Clang versions before 17 (Xcode versions up to and including 15.0)
had a very annoying bug in its behaviour of the ".arch" directive
in assembly. If the directive only contained a level, such as
".arch armv8.2-a", it did validate the name of the level, but it
didn't apply the level to what instructions are allowed. The level
was applied if the directive contained an extra feature enabled,
such as ".arch armv8.2-a+crc" though. It was also applied on the
next ".arch_extension" directive.

This bug, combined with the fact that the same versions of Clang
didn't support the dotprod/i8mm extension names in either
".arch <level>+<feature>" or in ".arch_extension", could lead to
unexepcted build failures.

As the dotprod/i8mm extensions couldn't be enabled dynamically
via the ".arch_extension" directive, someone building ffmpeg could
try to enable them by configuring their build with
--extra-cflags="-march=armv8.6-a".

During configure, we test for support for the i8mm instructions
like this:

    # Built with -march=armv8.6-a
    .arch armv8.2-a             # Has no visible effect here
    #.arch_extension i8mm       # Omitted as the extension name isn't known
    usdot v0.4s, v0.16b, v0.16b
    # Successfully assembled as armv8.6-a is the effective level,
    # and i8mm is enabled implicitly in armv8.6-a.

Thus, we would enable assembling those instructions. However if
we later check for another extension, such as sve (which those
versions of Clang actually do support), we can later run into the
following situation when building actual code:

    # Built with -march=armv8.6-a
    .arch armv8.2-a             # Has no visible effect here
    #.arch_extension i8mm       # Omitted as the extension name isn't known
    .arch_extension sve         # Included as "sve" is as supported extension name
    # .arch_extension effectively activates the previous .arch directive,
    # so the effective level is armv8.2-a+sve now.
    usdot v0.4s, v0.16b, v0.16b
    # Fails to build the instructions that require i8mm. Despite the
    # configure check, the unrelated ".arch_extension sve" directive
    # breaks the functionality of the i8mm feature.

This patch avoids this situation:
- By adding a dummy feature such as "+crc" on the .arch directive
  (if supported), we make sure that it does get applied immediately,
  avoiding it taking effect spuriously at a later unrelated
  ".arch_extension" directive.
- By checking for higher arch levels such as armv8.4-a and armv8.6-a,
  we can assemble the dotprod and i8mm extensions without the user
  needing to pass -march=armv8.6-a. This allows using the dotprod/i8mm
  codepaths via runtime detection while keeping the binary runnable
  on older versions. I.e. this enables the i8mm codepaths on Apple M2
  machines while built with Xcode's Clang.

TL;DR: Enable the I8MM extensions for Apple M2 without the user needing
to do a custom configuration; avoid potential build breakage if a user
does such a custom configuration.

Once Xcode versions that have these issues fixed are prevalent, we
can consider reverting this change.

Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-31 12:23:31 +02:00
Martin Storsjö
f05948ada4 aarch64: Simplify the linux runtime cpu detection code
Skip doing the whole getauxval(AT_HWCAP) if HWCAP_CPUID isn't
defined.

Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-31 12:23:27 +02:00
Rémi Denis-Courmont
ae72412aa8 lavc/idctdsp: improve R-V V put_pixels_clamped 2023-10-30 18:14:16 +02:00
Rémi Denis-Courmont
d48810f3a5 lavc/idctdsp: improve R-V V add_pixels_clamped 2023-10-30 18:14:16 +02:00
Rémi Denis-Courmont
600c6f1b55 lavc/idctdsp: improve R-V V put_signed_pixels_clamped
This follows the same idea as with pixblockdsp, but applied at the
other end, whilst writing data at the end of the function.
2023-10-30 18:14:16 +02:00