Commit Graph

112938 Commits

Author SHA1 Message Date
Rémi Denis-Courmont 286d674221 checkasm: add helper to report a fatal signal 2023-11-23 18:57:18 +02:00
Rémi Denis-Courmont 0fa421c8f1 lavc/llvidencdsp: add R-V V diff_bytes
diff_bytes_c:      163.0
diff_bytes_rvv_i32: 52.7
2023-11-23 18:57:18 +02:00
Rémi Denis-Courmont 0183c2c830 lavc/aacpsdsp: use LMUL=2 and amortise strides
The input is laid out in 16 segments, of which 13 actually need to be
loaded. There are no really efficient ways to deal with this:
1) If we load 8 segments wit unit stride, then narrow to 16 segments with
   right shifts, we can only get one half-size vector per segment, or just 2
   elements per vector (EMUL=1/2) - at least with 128-bit vectors.
   This ends up unsurprisingly about as fas as the C code.
2) The current approach is to load with strides. We keep that approach,
   but improve it using three 4-segmented loads instead of 12 single-segment
   loads. This divides the number of distinct loaded addresses by 4.
3) A potential third approach would be to avoid segmentation altogether
   and splat the scalar coefficient into vectors. Then we can use a
   unit-stride and maximum EMUL. But the downside then is that we have to
   multiply the 3 (of 16) unused segments with zero as part of the
   multiply-accumulate operations.

In addition, we also reuse vectors mid-loop so as to increase the EMUL
from 1 to 2, which also improves performance a little bit.

Oeverall the gains are quite small with the device under test, as it does
not deal with segmented loads very well. But at least the code is tidier,
and should enjoy bigger speed-ups on better hardware implementation.

Before:
ps_hybrid_analysis_c:       1819.2
ps_hybrid_analysis_rvv_f32: 1037.0 (before)
ps_hybrid_analysis_rvv_f32:  990.0 (after)
2023-11-23 18:57:18 +02:00
Rémi Denis-Courmont b88d4058f9 lavc/g722dsp: optimise R-V V apply_qmf
This stores the constant coefficients deinterleaved, so that they can be
loaded directly with NF=0. Unfortunately, we cannot optimise loading the
input, due to insufficient memory alignment (not 32-bit).

Before:
g722_apply_qmf_c:       82.5
g722_apply_qmf_rvv_i32: 78.2

After:
g722_apply_qmf_c:       82.5
g722_apply_qmf_rvv_i32: 65.2
2023-11-23 18:57:18 +02:00
Rémi Denis-Courmont e33ce0d9dd lavu/fixed_dsp: R-V V fmul_window_scaled
vector_fmul_window_scaled_fixed_c:       4393.7
vector_fmul_window_scaled_fixed_rvv_i64: 1642.7
2023-11-23 18:57:18 +02:00
Rémi Denis-Courmont e49f41fb27 lavu/float_dsp: optimise R-V V fmul_reverse & fmul_window
Roll the loop to avoid slow gathers.

Before:
vector_fmul_reverse_c:       1561.7
vector_fmul_reverse_rvv_f32: 2410.2
vector_fmul_window_c:        2068.2
vector_fmul_window_rvv_f32:  1879.5

After:
vector_fmul_reverse_c:       1561.7
vector_fmul_reverse_rvv_f32:  916.2
vector_fmul_window_c:        2068.2
vector_fmul_window_rvv_f32:  1202.5
2023-11-23 18:57:18 +02:00
Rémi Denis-Courmont 3a134e8299 lavu/fixed_dsp: optimise R-V V fmul_reverse
Gathers are (unsurprisingly) a notable exception to the rule that R-V V
gets faster with larger group multipliers. So roll the function to speed
it up.

Before:
vector_fmul_reverse_fixed_c:       2840.7
vector_fmul_reverse_fixed_rvv_i32: 2430.2

After:
vector_fmul_reverse_fixed_c:       2841.0
vector_fmul_reverse_fixed_rvv_i32:  962.2

It might be possible to further optimise the function by moving the
reverse-subtract out of the loop and adding ad-hoc tail handling.
2023-11-23 18:57:18 +02:00
Paul B Mahol 4adb93dff0 avfilter/asrc_afirsrc: fix by one smaller allocation of buffer 2023-11-23 15:01:55 +01:00
James Almer 0008e1c5d5 avfilter/asrc_anullsrc: fix allowed range for sample_rate
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-22 19:41:15 -03:00
James Almer 567c67c6c8 avcodec/ac3dsp: make len a size_t in float_to_fixed24
Should simplify asm implementations, and prevent UB on at least win64.

Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-22 18:33:00 -03:00
Paul B Mahol 4af412be71 avfilter: use AV_OPT_TYPE_CHLAYOUT 2023-11-22 19:28:40 +01:00
James Almer 707e46dc54 test/checkasm: test llauddsp
Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-22 14:22:19 -03:00
James Almer 2d9fd814d0 x86/: clear the high bits for order in scalarproduct_and_madd functions
Should fix checkasm failures on win64.

Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-22 14:18:42 -03:00
Michael Niedermayer 3c154e8579 doc/git-howto: use less weird username for git URL
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: J. Dekker <jdek@itanimul.li>
2023-11-22 10:21:50 +01:00
Zhao Zhili e8a49b1424 avcodec/mmaldec: Fix build error
Fix #10670.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-11-22 21:02:04 +08:00
Zhao Zhili bec6dfcd5c avformat/rtmpproto: Pass rw_timeout to underlying transport protocol
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-11-22 21:02:04 +08:00
Zhao Zhili e1d6b3cb5a avformat/flvenc: add extract_extradata bsf for new video codecs
When encoders don't support global header like MediaCodec, FLV
muxer needs to add extract_extradata bsf automatically. The codec
list doesn't include VP9 since it's not supported by
extract_extradata.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-11-22 21:02:04 +08:00
Zhao Zhili f27fce0c0c avcodec/mediacodecdec: fix return EAGAIN after EOF
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-11-22 21:02:04 +08:00
Dmitry Rogozhkin e9c93009fc avcodec/decode: validate hw_frames_ctx when AVHWAccel.free_frame_priv is used
Validate that a hw_frames_ctx is available before using it for
the AVHWAccel.free_frame_priv callback, and don't require it to
be present when the callback is not in use by the HWAccel.

v2: check for free_frame_priv (Hendrik)
v3: return EINVAL (Christoph Reiter)
v4: better commit message (Hendrik)
v5: fix typo with missed frames_ctx (Lynne)

See[1]: https://github.com/msys2/MINGW-packages/pull/19050
Fixes: be07145109 ("avcodec: add AVHWAccel.free_frame_priv callback")
CC: Lynne <dev@lynne.ee>
CC: Christoph Reiter <reiter.christoph@gmail.com>
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
2023-11-22 05:01:16 +01:00
Zhao Zhili 641f8a71fb fate/h264: move mp4toannexb_ticket5927 test to fate-h264
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-11-22 19:42:15 +08:00
Zhao Zhili aa3b857101 avcodec/h264_mp4toannexb_bsf: process new extradata
For fate-h264_mp4toannexb_ticket5927 and
fate-h264_mp4toannexb_ticket5927_2, they work by accident
previously. The sample file has two 'avc1' entries, and video
samples use the second one. It means packets should be decoded with
new extradata in side data. Before this patch, only extradata was
kept in the output, new extradata has been dropped. The output can
be decoded because the two extradata are almost the same, except
level indication. This patch fixed the issue, and add another
fate test.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-11-22 19:42:14 +08:00
Zhao Zhili d3aa0cd16f avcodec/h264_mp4toannexb_bsf: fix missing PS before IDR frames
If there is a single group of SPS/PPS before an IDR frame, but no
SPS/PPS after that, we will miss the chance to reset
idr_sps_seen/idr_pps_seen. No SPS/PPS are inserted afterwards.

This patch saves in-band SPS/PPS and insert them before IDR frames
when necessary.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-11-22 19:42:14 +08:00
Zhao Zhili 4c4b833abd avcodec/h264_mp4toannexb_bsf: remove pass padding size as argument
It's a fixed value. There is no use case to change that.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-11-22 19:42:14 +08:00
Zhao Zhili 91cbae2f6c avcodec/h264_mp4toannexb_bsf: refactor start_code_size handling
start_code_size depends on whether PS comes from out-of-band or
in-band. Make the code more readable.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2023-11-22 19:42:14 +08:00
Michael Niedermayer fb52070848
avcodec/h264dec: use BOOL for skip_gray, noref_gray
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-11-22 01:22:31 +01:00
Jun Zhao c961ac4b0c vulkan_decode: fix the print format of VkDeviceSize
VkDeviceSize represents device memory size and offset
values as uint64_t in Spec.

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2023-11-21 08:02:43 +08:00
Jun Zhao ab3bd5ead0 avdevice/decklink_dec: add explicit specifier
The explicit specifier used with a single argument constructor
to prevent implicit type conversions.

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2023-11-21 08:02:29 +08:00
James Almer 1258f99978 avcodec: bump version after EVC additions
Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-20 11:55:51 -03:00
Dawid Kozinski cfe2947887 avcodec/evc_decoder: Provided support for EVC decoder
- Added EVC decoder wrapper
- Changes in project configuration file and libavcodec Makefile
- Added documentation for xevd wrapper

Signed-off-by: Dawid Kozinski <d.kozinski@samsung.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-20 11:55:51 -03:00
Dawid Kozinski c59a96fd08 avcodec/evc_encoder: Provided support for EVC encoder
- Added EVC encoder wrapper
- Changes in project configuration file and libavcodec Makefile
- Added documentation for xeve wrapper

Signed-off-by: Dawid Kozinski <d.kozinski@samsung.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2023-11-20 11:55:51 -03:00
Michael Niedermayer e56d91f8a8
avcodec/h264dec: Support skipping frames that used gray gap frames
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-11-20 00:19:25 +01:00
Michael Niedermayer 6364fa9e9a
avcodec/h264: Avoid using gray gap frames as references
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-11-20 00:19:25 +01:00
Michael Niedermayer 29f6c9b04d
avcodec/h264: keep track of which frames used gray references
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-11-20 00:19:04 +01:00
Michael Niedermayer e4337606e1
avcodec/h264dec: More elaborate documentation for frame_recovered
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-11-20 00:12:30 +01:00
Michael Niedermayer 68e1cf204a
avcodec/h264: Use FRAME_RECOVERED_HEURISTIC instead of IDR/SEI
This keeps IDR/SEI and heuristically detected recovery points cleaner seperated

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-11-20 00:12:30 +01:00
Michael Niedermayer 3f4a1a24a5
avcodec/h264: Seperate SEI and IDR recovery handling
This avoids SEI and IDR recovery flags affecting each other

Also eliminate litteral numbers from recovery handling
This should make the code clearer

Improves: tickets/4738/tickets_cut.ts

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-11-20 00:12:29 +01:00
Paul B Mahol d55d0bba48 avfilter/af_afir: remove flag that is not needed 2023-11-19 23:59:23 +01:00
Paul B Mahol 28a43cf7fe avfilter/af_afir: no need to dynamically add outpad 2023-11-19 23:55:54 +01:00
Paul B Mahol 6579d95df3 avfilter/af_afir: refactor crossfade code 2023-11-19 23:47:52 +01:00
Paul B Mahol bbdd604b9e avfilter/af_afir: add timeline support 2023-11-19 23:47:51 +01:00
Rémi Denis-Courmont 954d50e2ae riscv: set fast half-precision conversion
This is only supported at compilation time. If Zfhmin is supported, then
conversions are fast, which is what the flag is used for. At this time,
run-tiem detection is not possible, as in not supported by Linux. But even
if it were, the current FFmpeg approach seems unable to deal with it (same
problem as on x86, really).
2023-11-19 20:06:20 +02:00
Paul B Mahol a9205620b1 avfilter/af_afir: remove IR response video rendering support
And deprecate related options.
The same functionality can be done with specialized audio visualization filters.
2023-11-19 13:41:13 +01:00
Paul B Mahol 496df68815 doc/filters: add one more example for afir filter usage 2023-11-19 13:40:35 +01:00
Anton Khirnov 6fb1eaf73a tools/general_assembly: update to conform to new rules 2023-11-19 12:58:47 +01:00
Anton Khirnov 4cad7c0522 tools/general_assembly: make the script executable 2023-11-19 12:58:47 +01:00
Paul B Mahol 7c16bf0829 avfilter/avf_showvolume: improve step for vertical orientation 2023-11-18 23:50:39 +01:00
Paul B Mahol 3ed2225a9d avfilter/avf_showvolume: draw channel names directly into output frame 2023-11-18 23:50:38 +01:00
Rémi Denis-Courmont fbc7adba67 lavc/llviddsp: R-V V add_bytes
add_bytes_c:      2077.2
add_bytes_rvv_i32: 105.0
2023-11-18 22:07:14 +02:00
Rémi Denis-Courmont ca664f2254 lavc/flacdsp: R-V V LPC16 function
In this case, the inner loop computing the scalar product can be reduced
to just one multiplication and one sum even with 128-bit vectors. The
result is a lot simpler, but also brings more modest performance gains:

flac_lpc_16_13_c:       15241.0
flac_lpc_16_13_rvv_i32: 11230.0
flac_lpc_16_16_c:       17884.0
flac_lpc_16_16_rvv_i32: 12125.7
flac_lpc_16_29_c:       27847.7
flac_lpc_16_29_rvv_i32: 10494.0
flac_lpc_16_32_c:       30051.5
flac_lpc_16_32_rvv_i32: 10355.0
2023-11-18 22:06:57 +02:00
Rémi Denis-Courmont 295092b46d lavc/flacdsp: R-V V LPC32
The entire set of 32 coefficients and corresponding past 32 samples can
fit in a single vector (with LMUL=8) exactly, but... since widening
double the needed vector sizes, we still end up too short with 128-bit
vectors. This adds a very simple version for future 256+-bit hardware,
and for pred_orders values up to 16, and a bit more involved loop for
for 128-bit hardware with pred_orders between 17 and 32.

With 128-bit hardware, the benchmarks look like this:
flac_lpc_32_13_c:       30152.0
flac_lpc_32_13_rvv_i32: 10244.7
flac_lpc_32_16_c:       37314.2
flac_lpc_32_16_rvv_i32: 10126.2
flac_lpc_32_29_c:       61910.0
flac_lpc_32_29_rvv_i32: 14495.2
flac_lpc_32_32_c:       68204.0
flac_lpc_32_32_rvv_i32: 13273.7
2023-11-18 22:05:43 +02:00