Commit Graph

115819 Commits

Author SHA1 Message Date
Rémi Denis-Courmont
f6d0a41c8c lavu/riscv: use Zbb CLZ/CTZ/CLZW/CTZW at run-time
          Zbb static    Zbb dynamic   I baseline
clz       0.668032642   1.336072283   19.552376803
clzl      0.668092643   1.336181786   26.110855571
ctz       1.336208533   3.340209702   26.054869008
ctzl      1.336247784   3.340362457   26.055266290
(seconds for 1 billion iterations on a SiFive-U74 core)
2024-06-11 20:12:37 +03:00
Rémi Denis-Courmont
98db140910 lavu/riscv: use Zbb CPOP/CPOPW at run-time
          Zbb static    Zbb dynamic   I baseline
popcount  1.336129286   3.469067758   20.146362909
popcountl 1.336322291   3.340292968   20.224829821
(seconds for 1 billion iterations on a SiFive-U74 core)
2024-06-11 20:12:37 +03:00
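For context on the "I baseline" column, here is a minimal sketch of a portable bit-weight (population count) routine that uses only base integer operations. This is one common formulation and not necessarily what libavutil or the compiler run-time uses; the name example_popcount32 is hypothetical.

```c
#include <stdint.h>

/* Portable 32-bit population count (SWAR technique).
 * Hypothetical helper for illustration only. */
static int example_popcount32(uint32_t x)
{
    /* Sum adjacent bits into 2-bit fields. */
    x = x - ((x >> 1) & 0x55555555u);
    /* Sum adjacent 2-bit fields into 4-bit fields. */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    /* Sum adjacent 4-bit fields into 8-bit fields. */
    x = (x + (x >> 4)) & 0x0f0f0f0fu;
    /* Add the four byte counts; the total lands in the top byte. */
    return (int)((x * 0x01010101u) >> 24);
}
```

With Zbb, the same result comes from a single CPOP or CPOPW instruction, which is consistent with the large gap between the baseline and Zbb columns.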
Rémi Denis-Courmont
324899b748 lavu/riscv: use Zbb REV8 at run-time
This adds runtime support to use Zbb REV8 for 32- and 64-bit byte-wise
swaps. The result is about five times slower than if targeting Zbb
statically, but still a lot faster than the default bespoke C code or a
call to GCC run-time functions.

For 16-bit swap, this is however unsurprisingly a lot worse, and so this
sticks to the baseline. In fact, even using REV8 statically does not
seem to be beneficial in that case.

         Zbb static    Zbb dynamic   I baseline
bswap16:  0.668184765   3.340764069   0.668029012
bswap32:  0.668174014   3.340763319   9.353855435
bswap64:  0.668221765   3.340496313  14.698672283
(seconds for 1 billion iterations on a SiFive-U74 core)
2024-06-11 20:12:37 +03:00
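As an illustration of the kind of plain C byte swap that the baseline column measures, here is a minimal sketch using shifts and masks. The example_bswap* names are hypothetical and this is not claimed to be the exact libavutil code. It also hints at why the 16-bit case gains nothing: the swap is already just two shifts and an OR.

```c
#include <stdint.h>

/* 16-bit swap: two shifts and an OR, then truncate to 16 bits. */
static uint16_t example_bswap16(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}

/* 32-bit swap: move each byte to its mirrored position. */
static uint32_t example_bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* 64-bit swap: swap each 32-bit half, then exchange the halves. */
static uint64_t example_bswap64(uint64_t x)
{
    return ((uint64_t)example_bswap32((uint32_t)x) << 32) |
           example_bswap32((uint32_t)(x >> 32));
}
```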
Rémi Denis-Courmont
378d1b06c3 riscv: probe for Zbb extension at load time
Due to hysterical raisins, most RISC-V Linux distributions target a
RV64GC baseline excluding the Bit-manipulation ISA extensions, most
notably:
- Zba: address generation extension and
- Zbb: basic bit manipulation extension.
Most CPUs that would make sense to run FFmpeg on support Zba and Zbb
(including the current FATE runner), so it makes sense to optimise for
them. In fact a large chunk of existing assembler optimisations relies
on Zba and/or Zbb.

Since we cannot patch shared library code, the next best thing is to
carry a flag initialised at load time and check it as needed.
This results in 3 instructions overhead on isolated use, e.g.:
1:  AUIPC rd, %pcrel_hi(ff_rv_zbb_supported)
    LBU   rd, %pcrel_lo(1b)(rd)
    BEQZ  rd, non_Zbb_fallback_code
    // Zbb code here

The C compiler will typically load the flag ahead of time to reduce
latency, and can also keep it around if Zbb is used multiple times in a
single optimisation scope. For this to work, the flag symbol must be
hidden; otherwise the optimisation degrades with a GOT look-up to
support interposition:
1:  AUIPC rd, GOT_OFFSET_HI
    LD    rd, GOT_OFFSET_LO(rd)
    LBU   rd, (rd)
    BEQZ  rd, non_Zbb_fallback_code
    // Zbb code here

This patch adds code to provision the flag in libraries using bit
manipulation functions from libavutil: byte-swap, bit-weight and
counting leading or trailing zeroes.
2024-06-11 20:12:37 +03:00
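The flag check described in this commit can be sketched in C as follows, assuming GCC or Clang. All example_* names are hypothetical: the real flag is ff_rv_zbb_supported, and __builtin_ctz merely stands in for the Zbb CTZ instruction that the real fast path would use. How the flag is actually set at load time is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for ff_rv_zbb_supported. Hidden visibility
 * prevents interposition, so the compiler can address the flag
 * PC-relatively instead of going through the GOT. */
__attribute__((visibility("hidden"))) bool example_zbb_supported = false;

/* Fallback path: count trailing zero bits one at a time.
 * The argument must be nonzero. */
static int example_ctz32_generic(uint32_t x)
{
    int n = 0;

    while (!(x & 1)) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Fast path: the compiler builtin stands in for the Zbb instruction.
 * The argument must be nonzero. */
static int example_ctz32_fast(uint32_t x)
{
    return __builtin_ctz(x);
}

/* Test the flag at each use and branch to the matching path. */
static inline int example_ctz32(uint32_t x)
{
    if (example_zbb_supported)
        return example_ctz32_fast(x);
    return example_ctz32_generic(x);
}
```

Because the helper is inline and the flag is an ordinary variable, a compiler is free to load the flag once and reuse it when several such helpers appear in the same function, which is the effect the commit message describes.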
Rémi Denis-Courmont
18adaf9fe5 checkasm/lls: adjust buffer sizes and alignments
var must be padded.
param has `order + 1`, not `order` elements and is *not* over-aligned.
2024-06-11 20:07:55 +03:00
Anton Khirnov
08ea7d6b8e lavc/hevcdec: constify source frame in hevc_ref_frame() 2024-06-11 17:39:35 +02:00
Anton Khirnov
ccd391d6a3 lavc/hevcdec: do not unref current frame on frame_end() failure
It's a race with frame threading.
2024-06-11 17:39:35 +02:00
Anton Khirnov
d725c737fe lavc/hevcdec: move some frame-end code to hevc_frame_end()
Specifically, calling hwaccel end_frame, verifying frame checksum,
and printing the frame-was-decoded message.
2024-06-11 17:39:35 +02:00
Anton Khirnov
edb6a471c4 lavc/hevcdec: factor decoding a slice NALU out of decode_nal_unit() 2024-06-11 17:39:35 +02:00
Anton Khirnov
90e75c4ec9 lavc/hevcdec: drop a redundant multiple-frame-per-packet check 2024-06-11 17:39:35 +02:00
Anton Khirnov
3cd6492fb5 lavc/hevcdec: move the check for multiple frames in a packet
Do not do it in hls_slice_header(), which is the wrong place for it.
Avoids special magic return value of 1 in that function. The comment
mentioning potential corrupted state is no longer relevant, as
hls_slice_header() modifies no state beyond SliceHeader, which will only
get used for a valid frame.
2024-06-11 17:39:35 +02:00
Anton Khirnov
a8f9d52c22 lavc/hevcdec: move setting slice_initialized out of hls_slice_header()
hls_slice_header() no longer modifies anything in HEVCContext besides
SliceHeader.
2024-06-11 17:39:35 +02:00
Anton Khirnov
82ded1ad3a lavc/hevcdec: move sequence increment/IDR handling to hevc_frame_start()
From hls_slice_header(). It is only done once per frame, so that is a
more appropriate place for this code.
2024-06-11 17:39:35 +02:00
Anton Khirnov
a2e77caf37 lavc/hevcdec: set active PPS/SPS in hevc_frame_start()
Not in hls_slice_header(), as it should only be done once per frame.
2024-06-11 17:39:35 +02:00
Anton Khirnov
47d34ba7fb lavc/hevcdec: move constructing slice RPL to decode_slice_data() 2024-06-11 17:39:35 +02:00
Anton Khirnov
fe171a3b51 lavc/hevcdec: move calling hwaccel decode_slice to decode_slice_data()
From decode_nal_unit(), as that is a more appropriate place for it.
2024-06-11 17:39:35 +02:00
Anton Khirnov
6ee550d83d lavc/hevcdec: move calling hwaccel start_frame to hevc_frame_start()
From decode_nal_unit(), as that is a more appropriate place for it.
2024-06-11 17:39:35 +02:00
Anton Khirnov
3bbb5d78c7 lavc/hevcdec: move per-slice local_ctx setup out of hls_slice_header()
Into decode_slice_data(). This is a step towards constifying
HEVCContext in hls_slice_header().
2024-06-11 17:39:35 +02:00
Anton Khirnov
efc827bf6f lavc/hevcdec: move slice decoding dispatch to its own function
Also move there a sanity check from hls_decode_entry() that should also
be performed when WPP is active (note that the check is not moved to
hls_slice_header() because it requires the HEVCContext.tab_slice_address
to be set up).
2024-06-11 17:39:35 +02:00
Anton Khirnov
7cce612a26 lavc/hevcdec: move a slice segment sanity check to hls_slice_header()
Combine it with an existing similar check.
2024-06-11 17:39:35 +02:00
Anton Khirnov
d43527a1a0 lavc/hevcdec: store slice header POC in SliceHeader
Rather than decoding directly into HEVCContext.poc.

This is a step towards constifying HEVCContext in hls_slice_header().
2024-06-11 17:39:35 +02:00
Anton Khirnov
e4e9e1da15 lavc/hevcdec: drop redundant HEVCContext.threads_{type,number}
They are useless duplicates of corresponding AVCodecContext fields.
2024-06-11 17:39:35 +02:00
Anton Khirnov
b0c29a45dc lavc/hevc/cabac: do not infer WPP use based on HEVCContext.threads_number
Pass this information explicitly instead.
2024-06-11 17:39:35 +02:00
Anton Khirnov
d86ac94df2 lavc/hevcdec: output RASL frames based on the value of no_rasl_output_flag
Instead of an ad-hoc scheme. Also, combine skipping RASL frames with
skip_frame handling - current code seems flawed as it only executes for
the first slice of a RASL frame and unnecessarily unsets is_decoded,
which should not be set at this point anyway.

Some RASL frames in fate-hevc-afd-tc-sei that were previously discarded
are now output.
2024-06-11 17:39:35 +02:00
Anton Khirnov
3115c84015 lavc/hevcdec: only set no_rasl_output_flag for IRAP frames
Its meaning is only specified for IRAP frames.

As it's currently never used otherwise, this should not change decoder
behaviour, but will be useful in future commits.
2024-06-11 17:39:35 +02:00
Anton Khirnov
381b70e173 lavc/hevcdec: do not pass HEVCContext to ff_hevc_frame_nb_refs()
Pass the only things required from it - slice header and PPS -
explicitly.

Will be useful in the following commits to avoid modifying HEVCContext in
hls_slice_header().
2024-06-11 17:39:35 +02:00
Anton Khirnov
07eb60c0da lavc/hevcdec: only call export_stream_params_from_sei() once per frame
Not once per slice header, as it makes no sense and may cause races
with frame threading.
2024-06-11 17:39:35 +02:00
Anton Khirnov
01b379a93e lavc/hevcdec: move pocTid0 computation to hevc_frame_start()
It is only done once per frame. Also, rename the variable to poc_tid0 to
be consistent with our naming conventions.
2024-06-11 17:39:35 +02:00
Anton Khirnov
5e438511ab lavc/hevcdec: do not pass HEVCContext to decode_lt_rps()
Pass the two numbers needed from it explicitly.

Makes it clear that HEVCContext is not modified by this function.
2024-06-11 17:39:35 +02:00
Anton Khirnov
0892ec947c lavc/hevcdec: pass SliceHeader explicitly to pred_weight_table()
And replace the HEVCContext* parameter by void *logctx.

Makes it clear that only SliceHeader is modified by this function.
2024-06-11 17:39:35 +02:00
Anton Khirnov
90fc331b0f lavc/hevcdec: only ignore INVALIDDATA in decode_nal_unit()
All other errors should cause a failure, regardless of the value of
err_recognition. Also, print a warning message when skipping invalid NAL
units.
2024-06-11 17:39:35 +02:00
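The error policy described in this commit might look roughly like the sketch below. The constants and the function name are hypothetical stand-ins (FFmpeg's real names are AVERROR_INVALIDDATA and the err_recognition flags), and the assumption that an "explode" flag disables the skipping is inferred from the commit message rather than taken from the code.

```c
#include <stdio.h>

/* Hypothetical stand-ins for FFmpeg's error codes and flags. */
enum {
    EXAMPLE_ERR_INVALIDDATA = -1,
    EXAMPLE_ERR_NOMEM       = -2,
};
#define EXAMPLE_EF_EXPLODE 1

/* Decide what to do with the result of decoding one NAL unit.
 * Returns 0 to continue with the next unit, or a negative error
 * code that the caller should propagate. */
static int example_filter_nal_error(int err, int err_recognition)
{
    if (err >= 0)
        return 0;

    /* Only invalid data may be skipped, and only when the caller
     * did not ask for strict failure. */
    if (err == EXAMPLE_ERR_INVALIDDATA &&
        !(err_recognition & EXAMPLE_EF_EXPLODE)) {
        fprintf(stderr, "Skipping invalid NAL unit\n");
        return 0;
    }

    /* Every other error fails regardless of err_recognition. */
    return err;
}
```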
Anton Khirnov
8eb134f4f9 lavc/hevcdec: drop an always-zero variable 2024-06-11 17:39:35 +02:00
Anton Khirnov
8c8072c29c lavc/hevcdec: move active PPS from HEVCParamSets to HEVCContext
"Currently active PPS" is a property of the decoding process, not of the
list of available parameter sets.
2024-06-11 17:39:34 +02:00
Anton Khirnov
0f47342c12 lavc/hevcdec: stop accessing parameter sets through HEVCParamSets
Instead, accept PPS/SPS as function arguments.

Makes the code shorter and significantly reduces diff in future commits.
2024-06-11 17:39:34 +02:00
Anton Khirnov
38b8ae4112 lavc/hevc/pred: stop accessing parameter sets through HEVCParamSets
Instead, accept PPS/SPS as function arguments.

Makes the code shorter and significantly reduces diff in future commits.
2024-06-11 17:39:34 +02:00
Anton Khirnov
d0868d70ea lavc/hevc/cabac: stop accessing parameter sets through HEVCParamSets
Instead, accept PPS/SPS as function arguments.

Makes the code shorter and significantly reduces diff in future commits.
2024-06-11 17:39:34 +02:00
Anton Khirnov
b38aecffec lavc/hevc/filter: stop accessing parameter sets through HEVCParamSets
Instead, accept PPS as a function argument and retrieve SPS through it.

Makes the code shorter and significantly reduces diff in future commits.
2024-06-11 17:39:34 +02:00
Anton Khirnov
fb873a05b3 lavc/hevc/mvs: stop accessing parameter sets through HEVCParamSets
Instead, accept PPS as a function argument and retrieve SPS through it.

Makes the code shorter and significantly reduces diff in future commits.
2024-06-11 17:39:34 +02:00
Anton Khirnov
6ddba110eb lavc/hevc/parser: stop using HEVCParamSets.[psv]ps
The parser does not need to preserve these between frames.
2024-06-11 17:39:34 +02:00
Anton Khirnov
2e46d68f55 lavc/hevc_ps: make SPS hold a reference to its VPS
SPS and its dependent PPSes depend on, and are parsed for, specific VPS data.

This will be useful in following commits.
2024-06-11 17:39:34 +02:00
Anton Khirnov
c879165b39 lavc/hevc_ps: make PPS hold a reference to its SPS
PPS depends on, and is parsed for, specific SPS data.

This will be useful in following commits.
2024-06-11 17:39:34 +02:00
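The idea of a PPS holding a reference to its SPS, so that callers can take a PPS and reach the SPS through it, can be illustrated with the sketch below. The Example* types and functions are hypothetical, and a plain non-atomic counter stands in for whatever reference-counting mechanism libavcodec actually uses, so this version is not thread-safe.

```c
#include <stdlib.h>

/* Hypothetical, heavily reduced parameter sets. */
typedef struct ExampleSPS {
    int refcount;
    int width, height;
} ExampleSPS;

typedef struct ExamplePPS {
    ExampleSPS *sps; /* owned reference to the SPS this PPS was parsed for */
    int init_qp;
} ExamplePPS;

static ExampleSPS *example_sps_alloc(int width, int height)
{
    ExampleSPS *sps = calloc(1, sizeof(*sps));

    if (!sps)
        return NULL;
    sps->refcount = 1;
    sps->width    = width;
    sps->height   = height;
    return sps;
}

/* Drop one reference and clear the caller's pointer. */
static void example_sps_unref(ExampleSPS **psps)
{
    if (*psps && --(*psps)->refcount == 0)
        free(*psps);
    *psps = NULL;
}

/* The new PPS takes its own reference to the SPS. */
static ExamplePPS *example_pps_alloc(ExampleSPS *sps, int init_qp)
{
    ExamplePPS *pps = calloc(1, sizeof(*pps));

    if (!pps)
        return NULL;
    sps->refcount++;
    pps->sps     = sps;
    pps->init_qp = init_qp;
    return pps;
}

static void example_pps_free(ExamplePPS **ppps)
{
    if (*ppps) {
        example_sps_unref(&(*ppps)->sps);
        free(*ppps);
    }
    *ppps = NULL;
}

/* A consumer needs only the PPS; the SPS is retrieved through it. */
static int example_luma_samples(const ExamplePPS *pps)
{
    const ExampleSPS *sps = pps->sps;

    return sps->width * sps->height;
}
```

Since the PPS keeps its SPS alive, the SPS stays valid for as long as the PPS does, even after the parameter-set list drops its own reference.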
Anton Khirnov
e12fd62d1d lavc/hevcdec: drop a redundant assignment in hevc_decode_frame()
The exact same code is executed at the beginning of decode_nal_units().
2024-06-11 17:39:34 +02:00
Anton Khirnov
a82f2b0924 lavc/hevcdec: simplify condition 2024-06-11 17:39:34 +02:00
Anton Khirnov
0407556716 lavc/hevcdec: do not free SliceHeader arrays in pic_arrays_free()
SliceHeader.{entry_point_offset,size,offset} are not derived from frame
size and do not need to be freed here.
2024-06-11 17:39:34 +02:00
sfan5
0455a62d84 lavf/tls_mbedtls: handle session ticket error code as no-op
When TLSv1.3 and session tickets are enabled mbedtls_ssl_read()
will return an error code to inform about a received session ticket.
This can simply be handled like EAGAIN instead of erroneously
aborting the connection.

ref: https://github.com/Mbed-TLS/mbedtls/issues/8749
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2024-06-11 17:00:35 +02:00
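A rough sketch of the mapping this commit describes, where a "session ticket received" result is treated as a retry rather than a fatal error. The EXAMPLE_ERR_* values are hypothetical placeholders for the codes that mbedtls_ssl_read() returns, and plain negative errno values stand in for FFmpeg's error codes.

```c
#include <errno.h>

/* Hypothetical placeholders for TLS library return codes. */
enum {
    EXAMPLE_ERR_WANT_READ          = -100,
    EXAMPLE_ERR_NEW_SESSION_TICKET = -101,
    EXAMPLE_ERR_FATAL              = -102,
};

/* Translate the result of a TLS read into a caller-facing code:
 * non-negative byte counts pass through, retryable conditions
 * become -EAGAIN, and everything else becomes -EIO. */
static int example_map_read_result(int ret)
{
    if (ret >= 0)
        return ret;

    switch (ret) {
    case EXAMPLE_ERR_WANT_READ:
    case EXAMPLE_ERR_NEW_SESSION_TICKET:
        /* Nothing was lost; the caller should simply read again. */
        return -EAGAIN;
    default:
        return -EIO;
    }
}
```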
sfan5
1b1e9cadc5 lavf/tls_mbedtls: fix handling of certification validation failures
We manually check the verification status after the handshake has completed
using mbedtls_ssl_get_verify_result(). However with VERIFY_REQUIRED
mbedtls_ssl_handshake() already returns an error, so this code is never reached.
Fix that by using VERIFY_OPTIONAL, which performs the verification but
does not abort the handshake.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2024-06-11 16:58:22 +02:00
sfan5
827578ca76 lavf/tls_mbedtls: hook up debug message callback
Unfortunately this won't work out-of-the-box because mbedTLS
only provides a global (not per-context) debug toggle.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2024-06-11 16:58:15 +02:00
sfan5
807d1505bf lavf/tls_mbedtls: add missing call to psa_crypto_init
This is mandatory depending on configuration or at least with mbedTLS 3.6.0.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2024-06-11 16:35:46 +02:00
sfan5
63b6620ad3 lavf/tls_mbedtls: handle more error codes for human-readable messages
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2024-06-11 16:35:31 +02:00
Rémi Denis-Courmont
b6f37ffba7 lavc/vc1dsp: match C block layout in inv_trans_4x8_rvv
Although checkasm does not verify this, the decoder requires that the
transform updates the input block exactly like the C code does.

This fixes vc1-ism, vc1_ilaced_twomv, vc1_sa00040, vc1_sa10091,
vc1_sa10143, vc1_sa20021, vc1test_smm0005 and wmv3-drm-dec tests.
2024-06-11 17:15:09 +03:00