Commit Graph

108225 Commits

Author SHA1 Message Date
Rémi Denis-Courmont
ff14e37393 configure/riscv: detect fast CLZ
RISC-V defines the CLZ instruction as part of the ratified Zbb subset
of the (not yet ratified) bit mapulation extension (B). We can detect
it from the __riscv_zbb predefined constant. At least GCC 12 already
supports this correctly.

Note that the macro will be non-zero if supported, zero if enabled
in the compiler flags (e.g. -march=rv64gzbb) but not known to the
compiler, and undefined otherwise.
2022-09-13 16:50:43 -03:00
Rémi Denis-Courmont
d808070547 lavu/riscv: AV_READ_TIME cycle counter
This uses the architected RISC-V 64-bit cycle counter from the
RISC-V unprivileged instruction set.

In 64-bit and 128-bit, this is a straightforward CSR read.
In 32-bit mode, the 64-bit value is exposed as two CSRs, which
cannot be read atomically, so a loop is necessary to detect and fix up
the race condition where the bottom half wraps exactly between the two
reads.
2022-09-13 16:50:43 -03:00
Rémi Denis-Courmont
092ce9712f doc: reference the RISC-V specification 2022-09-13 16:50:43 -03:00
James Almer
bda3a9faf4 x86/float_dsp: use three operand form for some instructions
Fixes compilation with old yasm

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-13 13:50:09 -03:00
Paul B Mahol
37a503ac87 avcodec/x86/audiodsp: add scalarproduct avx2 2022-09-13 17:43:16 +02:00
Paul B Mahol
72acff9f59 avutil/x86/float_dsp: add fma3 for scalarproduct 2022-09-13 17:43:15 +02:00
Paul B Mahol
cf2cf31805 avcodec/flac_parser: avoid returning too negative number
If return value is very small parser code will assert.
2022-09-13 17:43:15 +02:00
Andreas Rheinhardt
9ad3db3ad9 fate/spdif: Add spdif tests
These tests test both the demuxer as well as the muxer
wherever possible. It is not always possible due to the fact
that the muxer supports more codecs than the demuxer.

The spdif demuxer does currently not set the need_parsing flag.
If one were to set this to AVSTREAM_PARSE_FULL, the test results
would change as follows:
- For spdif-aac-remux, the packets are currently padded to 16bits,
i.e. if the actual packet size is odd, there is a padding byte.
The parser splits this byte away into a one byte packet of its own.
Insanely, these one byte packets get the same duration as normal
packets, i.e. timing is ruined.
- The DCA-remux tests get proper duration/timestamps.
- In the spdif-mp2-remux test the demuxer marks the stream as
being MP2; the parser sets it to MP3 and this triggers
the "Codec change in IEC 61937" codepath; this test therefore
returns only two packets with the parser.
- For spdif-mp3-remux some bytes end up in different packets:
Some input packets of this file have an odd length (417B instead
of 418B like all the other packets) and are padded to 418B.
Without a parser, all returned packets from the spdif-demuxer
are 418B. With a parser, the packets that were originally 417B
are 417B again, but the padding byte has not been discarded,
but added to the next packet which is now 419B.
This fixes "Multiple frames in a packet" warning and avoids
an "Invalid data found when processing input" error when decoding.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-13 14:50:01 +02:00
James Cowgill
50a4dff69f avcodec/arm/sbcenc: avoid callee preserved vfp registers
When compiling FFmpeg with GCC-9, some very random segfaults were
observed in code which had previously called down into the SBC encoder
NEON assembly routines. This was caused by these functions clobbering
some of the vfp callee saved registers (d8 - d15 aka q4 - q7). GCC was
using these registers to save local variables, but after these
functions returned, they would contain garbage.

Fix by reallocating the registers in the two affected functions in
the following way:
 ff_sbc_analyze_4_neon: q2-q5 => q8-q11, then q1-q4 => q8-q11
 ff_sbc_analyze_8_neon: q2-q9 => q8-q15

The reason for using these replacements is to keep closely related
sets of registers consecutively numbered which hopefully makes the
code more easy to follow. Since this commit only reallocates
registers, it should have no performance impact.

Signed-off-by: James Cowgill <jcowgill@debian.org>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-13 09:51:51 +03:00
Andreas Rheinhardt
8d12f3de14 avcodec/bonk: Actually clip when using av_clip()
Also fixes a "statement with no effect [-Wunused-value]"
warning from GCC.

Reviewed-by: James Almer <jamrial@gmail.com>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-12 23:51:51 +02:00
Andreas Rheinhardt
f6448133e7 fate/subtitles: Add PGS remux test
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-12 22:26:27 +02:00
Andreas Rheinhardt
3a783fc8cb fate/id3v2: Add test for reading and writing UTF-16 BOM tags
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-12 22:26:27 +02:00
Paul B Mahol
3ce6fa6b6d avformat: add bonk demuxer 2022-09-12 11:35:43 +02:00
Paul B Mahol
88170070c4 avcodec: add bonk audio decoder 2022-09-12 11:34:27 +02:00
Andreas Rheinhardt
5c19cb3f92 avcodec/ralf: Move variable from context to stack
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-11 21:27:14 +02:00
Andreas Rheinhardt
dcbb7e8a30 avcodec/ralf: Move frame allocation after error checks
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-11 21:27:14 +02:00
Andreas Rheinhardt
df215e5758 avcodec/dca_core: Only call emms_c() if needed
It is not needed on x64, because the AV_COPY* and AV_ZERO*
macros never use MMX on x64.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-11 21:08:04 +02:00
Andreas Rheinhardt
29c4c0886d avutil/x86/intreadwrite: Add ability to detect whether MMX code is used
It can be used to call emms_c() only when needed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-11 21:08:04 +02:00
Andreas Rheinhardt
a54e53a1c4 avcodec/vp8dsp: Constify src in vp8_mc_func
Reviewed-by: Peter Ross <pross@xvid.org>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-11 20:57:51 +02:00
Andreas Rheinhardt
4130789f4f avcodec/vp8: Move fade_present from context to stack
It is only an auxiliary value used for parsing the VP7 frame header.

Reviewed-by: Peter Ross <pross@xvid.org>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-11 20:56:03 +02:00
Andreas Rheinhardt
b3591ccdf1 avcodec/vp8dsp: Remove declarations of inexistent functions
Forgotten in d6f8476be4.

Reviewed-by: Peter Ross <pross@xvid.org>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-11 20:55:19 +02:00
Andreas Rheinhardt
361c875340 avcodec/vp8: Remove unused macros
Reviewed-by: Peter Ross <pross@xvid.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-11 20:49:35 +02:00
James Almer
60d8c2019f avformat/riffdec: don't unconditionally overwrite WAVEFORMATEXTENSIBLE layout
Do it only if the value conflicts with the previous channels value.

Fixes ticket #9912

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-11 09:52:02 -03:00
Lynne
f1b35fc8f0
lavu/tx: remove av_cold from table definitions
How did this get here?
2022-09-11 03:18:40 +02:00
Hao Chen
925ac0da32
swscale/la: Add output_lasx.c file.
ffmpeg -i 1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -s 640x480 -pix_fmt
rgb24 -y /dev/null -an
before: 150fps
after:  183fps

Signed-off-by: Hao Chen <chenhao@loongson.cn>
Reviewed-by: yinshiyou-hf@loongson.cn
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-09-10 22:56:39 +02:00
Hao Chen
74d09b068d
swscale/la: Add yuv2rgb_lasx.c and rgb2rgb_lasx.c files
ffmpeg -i 1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -pix_fmt rgb24 -y /dev/null -an
before: 178fps
after:  210fps

Signed-off-by: Hao Chen <chenhao@loongson.cn>
Reviewed-by: yinshiyou-hf@loongson.cn
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-09-10 22:56:38 +02:00
Hao Chen
38cacce22a
swscale/la: Optimize hscale functions with lasx.
ffmpeg -i 1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -s 640x480 -y /dev/null -an
before: 101fps
after:  138fps

Signed-off-by: Hao Chen <chenhao@loongson.cn>
Reviewed-by: yinshiyou-hf@loongson.cn
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-09-10 22:56:38 +02:00
Paul B Mahol
09cce81245 avfilter/vf_gblur: allow filtering with zero horizontal sigma 2022-09-10 22:11:38 +02:00
Philip Langdale
09a8e5debb swscale/output: add support for Y210LE and Y212LE 2022-09-10 12:29:12 -07:00
Philip Langdale
68181623e9 swscale/output: add support for XV30LE 2022-09-10 12:29:12 -07:00
Philip Langdale
366f073c62 swscale/output: add support for XV36LE 2022-09-10 12:29:12 -07:00
Philip Langdale
caf8d4d256 swscale/output: add support for P012
This generalises the existing P010 support.
2022-09-10 12:29:12 -07:00
Michael Niedermayer
d32a9f3137
libavformat/hls: Free keys
Fixes: memleak
Fixes: 50703/clusterfuzz-testcase-minimized-ffmpeg_dem_HLS_fuzzer-6399058578636800

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Steven Liu <lingjiujianke@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-09-10 17:32:47 +02:00
Michael Niedermayer
9af7de0867
tools/target_dec_fuzzer: Adjust threshold for UTVIDEO
Fixes: Timeout
Fixes: 47969/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_UTVIDEO_fuzzer-5097256832860160

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-09-10 17:32:47 +02:00
Michael Niedermayer
9783749c66
avcodec/fmvc: Move frame allocation to a later stage
This way more things are checked before allocation

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-09-10 17:32:38 +02:00
Lynne
c92edd969a
lavu/tx: rotate 3 & 15-point exptabs
This just inverts their signs. Simplifies SIMD.
2022-09-10 02:37:17 +02:00
Lynne
51172223fd
lavu/tx: generalize MDCTs
The same code can perform any-length MDCTs with minimal changes.
2022-09-10 02:37:16 +02:00
Lynne
645a1f4422
lavu/tx: add the inplace flag to PFA FFTs
They support in-place, because they have to use a temporary buffer.
2022-09-10 02:37:14 +02:00
Lynne
8c283e8fe6
lavu/tx: propagate the codelet flags into the context
The field is documented as a combination of both.
2022-09-10 02:37:11 +02:00
Andreas Rheinhardt
91e9a6df33 fate/matroska: Add test for updating AV1 extradata
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-10 01:38:07 +02:00
Andreas Rheinhardt
a5ab4be081 tests/fate-run: Allow to set input options for encoding pass
This will be useful in the next commit.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-10 01:38:07 +02:00
Hubert Mazur
06b98e396a lavc/aarch64: Provide neon implementation of nsse16
Add vectorized implementation of nsse16 function.

Performance comparison tests are shown below.
- nsse_0_c: 682.2
- nsse_0_neon: 116.5

Benchmarks and tests run with checkasm tool on AWS Graviton 3.

Co-authored-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-09 10:19:46 +03:00
Hubert Mazur
908abe8032 lavc/aarch64: Add neon implementation for vsse_intra16
Provide optimized implementation for vsse_intra16 for arm64.

Performance tests are shown below.
- vsse_4_c: 155.2
- vsse_4_neon: 36.2

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-09 10:19:46 +03:00
Hubert Mazur
ce03ea3e79 lavc/aarch64: Add neon implementation for vsad_intra16
Provide optimized implementation for vsad_intra16 function for arm64.

Performance comparison tests are shown below.
- vsad_4_c: 177.5
- vsad_4_neon: 23.5

Benchmarks and tests are run with checkasm tool on AWS Gravtion 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-09 10:19:46 +03:00
Hubert Mazur
c495a4b32d lavc/aarch64: Add neon implementation of vsse16
Provide optimized implementation of vsse16 for arm64.

Performance comparison tests are shown below.
- vsse_0_c: 257.7
- vsse_0_neon: 59.2

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-09 10:19:46 +03:00
Hubert Mazur
200f5e578f lavc/aarch64: Add neon implementation for vsad16
Provide optimized implementation of vsad16 function for arm64.

Performance comparison tests are shown below.
- vsad_0_c: 285.2
- vsad_0_neon: 39.5

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Co-authored-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-09 10:19:46 +03:00
Wenbin Chen
a2fd553dd3 libavcodec/qsvenc: Add low_delay_brc reset support to qsvenc
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
2022-09-09 09:39:44 +08:00
Wenbin Chen
005c7a4f61 libavcodec/qsvenc: Add max/min qp reset support in qsvenc
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
2022-09-09 09:39:44 +08:00
Wenbin Chen
9155ec096b libavcodec/qsvenc: Add intra refresh reset support to qsvenc
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
2022-09-09 09:39:44 +08:00
Wenbin Chen
f3ba1458b6 libavcodec/qsvenc: Add "slice" intra refresh type to qsvenc
Add "slice" intra refresh type to h264_qsv and hevc_qsv. This type means
horizontal refresh by slices without overlapping. Also update the doc.

Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
2022-09-09 09:39:44 +08:00