Commit Graph

116514 Commits

Author SHA1 Message Date
Rémi Denis-Courmont
54b1970c60 lavu/riscv: fix return type 2024-08-01 18:44:01 +03:00
Rémi Denis-Courmont
54ae270213 lavc/rv34dsp: use saturating add/sub for R-V V DC add
T-Head C908 (cycles):
rv34_idct_dc_add_c:      113.2
rv34_idct_dc_add_rvv_i32: 48.5 (before)
rv34_idct_dc_add_rvv_i32: 39.5 (after)
2024-08-01 18:43:04 +03:00
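The DC-add kernel named in the entry above adds one DC value to every pixel of a block and keeps the result inside the 8-bit range. The scalar C sketch below shows the idea of doing this with unsigned saturating add and subtract instead of widening and clipping. It assumes a 4x4 block and an already-scaled DC value; the helper names (`sat_add_u8`, `sat_sub_u8`, `dc_add_4x4_sat`) are hypothetical, and the real code is RISC-V vector assembly, not this loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unsigned 8-bit saturating add: results above 255 stick at 255. */
static uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return sum > 255 ? 255 : (uint8_t)sum;
}

/* Unsigned 8-bit saturating subtract: results below 0 stick at 0. */
static uint8_t sat_sub_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Hypothetical helper: add an already-scaled DC value to a 4x4 block. */
static void dc_add_4x4_sat(uint8_t *dst, ptrdiff_t stride, int dc)
{
    uint8_t mag;

    assert(dst != NULL);

    /* A magnitude beyond 255 saturates every pixel anyway, so clamp it. */
    if (dc > 255)
        dc = 255;
    if (dc < -255)
        dc = -255;
    mag = (uint8_t)(dc < 0 ? -dc : dc);

    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++)
            dst[x] = dc < 0 ? sat_sub_u8(dst[x], mag)
                            : sat_add_u8(dst[x], mag);
        dst += stride;
    }
}
```

Splitting on the sign of the DC value is what lets a single unsigned saturating operation per pixel replace the widen, add, clip and narrow sequence.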
Rémi Denis-Courmont
952b426f3b lavc/bswapdsp: add RV Zvbb bswap16 and bswap32 2024-08-01 18:43:04 +03:00
James Almer
f4daf633b2 avcodec/aacps_tablegen_template: don't redefine CONFIG_HARDCODED_TABLES
Fixes relevant warnings when compiling with --enable-hardcoded-tables

Signed-off-by: James Almer <jamrial@gmail.com>
2024-08-01 12:13:53 -03:00
James Almer
6f8e365a2a avutil/hwcontext_vaapi: use the correct type for VASurfaceAttribExternalBuffers.buffers
Should fix ticket #11115.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-08-01 12:13:53 -03:00
Marvin Scholz
ca7fcf5089 avutil/hwcontext_videotoolbox: Fix build with older SDKs
The previous fix was not sufficient.
To make things easier to reason about, split the function and
add the guards there instead of complicating the call site more.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-08-01 20:58:27 +08:00
Anton Khirnov
bcf08c1171 lavc/ffv1: change FFV1SliceContext.plane into a RefStruct object
Frame threading in the FFV1 decoder works in a very unusual way - the
state that needs to be propagated from the previous frame is not decoded
pixels(¹), but each slice's entropy coder state after decoding the slice.

For that purpose, the decoder's update_thread_context() callback stores
a pointer to the previous frame thread's private data. Then, when
decoding each slice, the frame thread uses the standard progress
mechanism to wait for the corresponding slice in the previous frame to
be completed, then copies the entropy coder state from the
previously-stored pointer.

This approach is highly dubious, as update_thread_context() should be
the only point where frame-thread contexts come into direct contact.
There are no guarantees that the stored pointer will be valid at all, or
will contain any particular data after update_thread_context() finishes.

More specifically, this code can break due to the fact that keyframes
reset entropy coder state and thus do not need to wait for the previous
frame. As an example, consider a decoder process with 2 frame threads -
thread 0 with its context 0, and thread 1 with context 1 - decoding a
previous frame P, current frame F, followed by a keyframe K. Then
consider concurrent execution consistent with the following sequence of
events:
* thread 0 starts decoding P
* thread 0 reads P's slice header, then calls
  ff_thread_finish_setup() allowing the next frame thread to start
* main thread calls update_thread_context() to transfer state from
  context 0 to context 1; context 1 stores a pointer to context 0's private
  data
* thread 1 starts decoding F
* thread 1 reads F's slice header, then calls
  ff_thread_finish_setup() allowing the next frame thread to start
  decoding
* thread 0 finishes decoding P
* thread 0 starts decoding K; since K is a keyframe, it does not
  wait for F and reallocates the arrays holding entropy coder state
* thread 0 finishes decoding K
* thread 1 reads entropy coder state from its stored pointer to context
  0, however it finds state from K rather than from P

This execution is currently prevented by special-casing FFV1 in the
generic frame threading code, however that is supremely ugly. It also
involves unnecessary copies of the state arrays, when in fact they can
only be used by one thread at a time.

This commit addresses these deficiencies by changing the array of
PlaneContext (each of which contains the allocated state arrays)
embedded in FFV1SliceContext into a RefStruct object. This object can
then be propagated across frame threads in the standard manner. Since the
code structure guarantees only one thread accesses it at a time, no
copies are necessary. It is also re-created for keyframes, solving the
above issue cleanly.

Special-casing of FFV1 in the generic frame threading code will be
removed in a later commit.

(¹) except in the case of a damaged slice, when previous frame's pixels
    are used directly
2024-08-01 10:09:26 +02:00
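The ownership scheme described in the entry above can be illustrated with a small reference-counting sketch in C. All names here (`SliceState`, `FrameCtx`, `state_ref` and so on) are hypothetical, and this is not FFmpeg's RefStruct API. The sketch is single-threaded and uses a plain integer counter; it only shows that a context which takes its own reference at hand-off time is unaffected when another context later re-creates the state for a keyframe.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one slice's entropy coder state. */
typedef struct SliceState {
    int refcount;
    int generation; /* which frame produced this state */
} SliceState;

static SliceState *state_alloc(int generation)
{
    SliceState *s = calloc(1, sizeof(*s));
    if (!s)
        return NULL;
    s->refcount   = 1;
    s->generation = generation;
    return s;
}

static SliceState *state_ref(SliceState *s)
{
    assert(s && s->refcount > 0);
    s->refcount++;
    return s;
}

/* Drop one reference and clear the caller's pointer; free on the last one. */
static void state_unref(SliceState **ps)
{
    SliceState *s = *ps;
    *ps = NULL;
    if (s && --s->refcount == 0)
        free(s);
}

/* Hypothetical per-thread context that owns a reference to the state. */
typedef struct FrameCtx {
    SliceState *state;
} FrameCtx;

/* Hand-off: dst takes its own reference instead of pointing into src. */
static void ctx_update(FrameCtx *dst, const FrameCtx *src)
{
    SliceState *next = state_ref(src->state);
    state_unref(&dst->state);
    dst->state = next;
}

/* Keyframe: install a fresh object; other holders keep the old one. */
static int ctx_reset_for_keyframe(FrameCtx *c, int generation)
{
    SliceState *fresh = state_alloc(generation);
    if (!fresh)
        return -1;
    state_unref(&c->state);
    c->state = fresh;
    return 0;
}
```

In terms of the event sequence above: if context 1 calls `ctx_update()` from context 0 while P's state is installed, a later `ctx_reset_for_keyframe()` on context 0 for K leaves context 1 still holding P's state.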
Anton Khirnov
c335218a81 lavc/ffv1dec: inline copy_fields() into update_thread_context()
It is now only called from a single place, so there is no point in it
being a separate function.
2024-08-01 10:09:26 +02:00
Anton Khirnov
d44812f7cf lavc/ffv1dec: stop using per-slice FFV1Context
All remaining accesses to them are for fields that have the same value
in the main decoder context.

Drop now-unused FFV1Context.slice_contexts.
2024-08-01 10:09:26 +02:00
Anton Khirnov
2b21cdff6e lavc/ffv1dec: move slice_damaged to per-slice context 2024-08-01 10:09:26 +02:00
Anton Khirnov
f2aeba56c4 lavc/ffv1dec: move slice_reset_contexts to per-slice context 2024-08-01 10:09:26 +02:00
Anton Khirnov
84dda32202 lavc/ffv1enc: stop using per-slice FFV1Context
All remaining accesses to them are for fields that have the same value
in the main encoder context.
2024-08-01 10:09:26 +02:00
Anton Khirnov
96e8af6c4d lavc/ffv1: move ac_byte_count to per-slice context 2024-08-01 10:09:26 +02:00
Anton Khirnov
e7d0f44138 lavc/ffv1enc: store per-slice rc_stat(2?) in FFV1SliceContext
Instead of the per-slice FFV1Context, which will be removed in future
commits.
2024-08-01 10:09:26 +02:00
Anton Khirnov
7b2bfba55d lavc/ffv1: move RangeCoder to per-slice context 2024-08-01 10:09:26 +02:00
Anton Khirnov
28769f6bc1 lavc/ffv1: move FFV1Context.plane to per-slice context 2024-08-01 10:09:26 +02:00
Anton Khirnov
9b86ba5a92 lavc/ffv1: always use the main context values of ac
It cannot change between slices.
2024-08-01 10:09:26 +02:00
Anton Khirnov
a57c88d67b lavc/ffv1: move FFV1Context.slice_{coding_mode,rct_.y_coef} to per-slice context 2024-08-01 10:09:26 +02:00
Anton Khirnov
39486a2b29 lavc/ffv1: always use the main context values of plane_count/transparency
They cannot change between slices.
2024-08-01 10:09:26 +02:00
Anton Khirnov
492df65201 lavc/ffv1: drop write-only PlaneContext.interlace_bit_state 2024-08-01 10:09:26 +02:00
Anton Khirnov
a411fc5a84 lavc/ffv1: drop redundant PlaneContext.quant_table
It is a copy of FFV1Context.quant_tables[quant_table_index].
2024-08-01 10:09:26 +02:00
Anton Khirnov
4b9f7c7e3a lavc/ffv1: drop redundant FFV1Context.quant_table
In all cases except decoding version 1 it's either not used, or contains
a copy of a table from quant_tables, which we can just as well use
directly.

When decoding version 1, we can just as well decode into
quant_tables[0], which would otherwise be unused.
2024-08-01 10:09:26 +02:00
Anton Khirnov
d2f507233a lavc/ffv1enc: move bit writer to per-slice context 2024-08-01 10:09:26 +02:00
Anton Khirnov
889faedd26 lavc/ffv1dec: move the bitreader to stack
There is no reason to place it in persistent state.
2024-08-01 10:09:25 +02:00
Anton Khirnov
19e9f3d5f2 lavc/ffv1: move run_index to the per-slice context 2024-08-01 10:09:25 +02:00
Anton Khirnov
91d3c1ac47 lavc/ffv1: move sample_buffer to the per-slice context 2024-08-01 10:09:25 +02:00
Anton Khirnov
54aa33f116 lavc/ffv1: add a per-slice context
FFV1 decoder and encoder currently use the same struct - FFV1Context -
both as codec private data and per-slice context. For this purpose
FFV1Context contains an array of pointers to per-slice FFV1Context
instances.

This pattern is highly confusing, as it is not clear which fields are
per-slice and which per-codec.

Address this by adding a new struct storing only per-slice data. Start
by moving slice_{x,y,width,height} to it.
2024-08-01 10:09:25 +02:00
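As a rough picture of the split described in the entry above, the C sketch below separates codec-wide fields from per-slice ones. Only the field names `slice_x`, `slice_y`, `slice_width` and `slice_height` come from the commit message; the `Demo` struct names, the remaining fields and the grid helper are assumptions made for illustration and do not mirror the real FFV1 structs.

```c
#include <assert.h>
#include <stdlib.h>

/* Per-slice data only (field names taken from the commit message). */
typedef struct DemoSliceContext {
    int slice_x, slice_y;
    int slice_width, slice_height;
} DemoSliceContext;

/* Codec-wide data; owns an array of per-slice contexts. */
typedef struct DemoContext {
    int width, height;
    int num_h_slices, num_v_slices;
    DemoSliceContext *slices;
} DemoContext;

/* Hypothetical helper: split the frame into a grid of slices. */
static int demo_init_slices(DemoContext *f)
{
    int n = f->num_h_slices * f->num_v_slices;

    assert(n > 0);
    f->slices = calloc((size_t)n, sizeof(*f->slices));
    if (!f->slices)
        return -1;

    for (int j = 0; j < f->num_v_slices; j++) {
        for (int i = 0; i < f->num_h_slices; i++) {
            DemoSliceContext *sc = &f->slices[j * f->num_h_slices + i];
            int x_start = i       * f->width  / f->num_h_slices;
            int x_end   = (i + 1) * f->width  / f->num_h_slices;
            int y_start = j       * f->height / f->num_v_slices;
            int y_end   = (j + 1) * f->height / f->num_v_slices;

            sc->slice_x      = x_start;
            sc->slice_y      = y_start;
            sc->slice_width  = x_end - x_start;
            sc->slice_height = y_end - y_start;
        }
    }
    return 0;
}
```

With a dedicated per-slice struct, a reader can tell from the type alone whether a field is shared by the whole codec or belongs to one slice.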
Anton Khirnov
d845ea49c5 lavc/ffv1dec: move copy_fields() under HAVE_THREADS
It is unused otherwise.
2024-08-01 10:09:25 +02:00
Anton Khirnov
3a5c814b19 lavc/ffv1dec: drop a pointless variable in decode_slice()
fsdst is by construction always equal to fs, there is even an
av_assert1() checking that. Just use fs directly.
2024-08-01 10:09:25 +02:00
Anton Khirnov
4da146ba83 lavc/ffv1dec: drop FFV1Context.cur
It is merely a pointer to FFV1Context.picture.f, which can just as well
be used directly.
2024-08-01 10:09:25 +02:00
Anton Khirnov
e1fa107fd1 lavc/ffv1dec: simplify slice index calculation 2024-08-01 10:09:25 +02:00
Anton Khirnov
d776fa4e4d lavc/ffv1dec: declare loop variables in the loop where possible 2024-08-01 10:09:25 +02:00
Anton Khirnov
8e19c24634 tests/fate/vcodec: add vsynth tests for FFV1 version 2 2024-08-01 10:09:25 +02:00
Michael Niedermayer
06f5ed40f8 avcodec/snow: Fix off by 1 error in run_buffer
Fixes: out of array access
Fixes: 70741/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_SNOW_fuzzer-5703668010647552

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-08-01 00:18:02 +02:00
Michael Niedermayer
58fbeb59e7 avcodec/utils: apply the same alignment to YUV410 as we do to YUV420 for snow
The snow encoder uses block based motion estimation which can read out of array if
insufficient alignment is used

It may be better to only apply this for the encoder, as it would save a few bytes of memory
for the decoder. Until then, this fixes the issue in a simple way.

Fixes: out of array access
Fixes: 68963/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_SNOW_fuzzer-4979988435632128
Fixes: 68969/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_SNOW_fuzzer-6239933667803136.fuzz
Fixes: 70497/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_SNOW_fuzzer-5751882631413760

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-08-01 00:18:02 +02:00
Michael Niedermayer
ed96ac87a9 avformat/iamf_parse: Check for 0 samples
Fixes: division by zero
Fixes: 70561/clusterfuzz-testcase-minimized-ffmpeg_IO_DEMUXER_fuzzer-6199435013455872
Fixes: 70565/clusterfuzz-testcase-minimized-ffmpeg_dem_MOV_fuzzer-5783790316748800

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-08-01 00:18:02 +02:00
Nathan E. Egge
8280ec7a32 lavu/riscv: Revert d808070, removing AV_READ_TIME
The implementation of ff_read_time() for RISC-V uses rdtime, whose
precision on existing hardware is too low (!) for benchmarking purposes.
Deleting this implementation falls back on clock_gettime(), which was
added as the default ff_read_time() implementation in 33e4cc9.
Below are metrics gathered on SpacemiT K1, before and after this commit:

Before:

$ tests/checkasm/checkasm --bench
benchmarking with native FFmpeg timers
nop: 0.0
checkasm: using random seed 3473665261
checkasm: bench runs 1024 (1 << 10)
RVI:
 - pixblockdsp.get_pixels                [OK]
 - vc1dsp.mspel_pixels                   [OK]
RVF:
 - audiodsp.audiodsp                     [OK]
checkasm: all 4 tests passed
audiodsp.vector_clipf_c: 1388.7
audiodsp.vector_clipf_rvf: 261.5
get_pixels_c: 2.0
get_pixels_rvi: 1.5
vc1dsp.put_vc1_mspel_pixels_tab[0][0]_c: 8.0
vc1dsp.put_vc1_mspel_pixels_tab[0][0]_rvi: 1.0
vc1dsp.put_vc1_mspel_pixels_tab[1][0]_c: 2.0
vc1dsp.put_vc1_mspel_pixels_tab[1][0]_rvi: 0.5

After:

$ tests/checkasm/checkasm --bench
benchmarking with native FFmpeg timers
nop: 56.4
checkasm: using random seed 1021411603
checkasm: bench runs 1024 (1 << 10)
RVI:
 - pixblockdsp.get_pixels                [OK]
 - vc1dsp.mspel_pixels                   [OK]
RVF:
 - audiodsp.audiodsp                     [OK]
checkasm: all 4 tests passed
audiodsp.vector_clipf_c: 23236.4
audiodsp.vector_clipf_rvf: 11038.4
get_pixels_c: 79.6
get_pixels_rvi: 48.4
vc1dsp.put_vc1_mspel_pixels_tab[0][0]_c: 329.6
vc1dsp.put_vc1_mspel_pixels_tab[0][0]_rvi: 38.1
vc1dsp.put_vc1_mspel_pixels_tab[1][0]_c: 89.9
vc1dsp.put_vc1_mspel_pixels_tab[1][0]_rvi: 17.1

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-07-31 17:48:50 +03:00
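For context, a timer built on clock_gettime() might look like the C sketch below. It is an assumption for illustration only: the function name `read_time_ns`, the choice of `CLOCK_MONOTONIC` and the nanosecond unit are not taken from FFmpeg's ff_read_time().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* request clock_gettime() from <time.h> */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical timer: monotonic clock reading in nanoseconds. */
static uint64_t read_time_ns(void)
{
    struct timespec ts;
    int ret = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(ret == 0);
    (void)ret; /* keep builds with NDEBUG free of unused warnings */

    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
```

A benchmark would read the timer before and after the code under test and subtract the two values. The non-zero "nop" figure in the "After" log above is consistent with a timer fine-grained enough to register even an empty measurement.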
James Almer
ab5c612137 avcodec/Makefile: use the correct path for aacdec_fixed.o when setting its dependencies
Fixes ticket #11112

Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-31 11:32:56 -03:00
Anton Khirnov
43f702a253 lavfi/framesync: avoid forcing frame writability unnecessarily
Callers of ff_framesync_get_frame() generally do not expect the result
to be writable, those that do (e.g. ff_framesync_dualinput_get_writable())
ensure writability themselves.

Significantly reduces memory consumption in complex graphs with
framesync-based filters (e.g. scale, ssim).

Reported-By: Mark Shwartzman
2024-07-31 11:12:45 +02:00
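The memory saving in the entry above comes from skipping the copy that a "make writable" step performs whenever a frame's buffer is shared. A minimal copy-on-write sketch in C, using hypothetical names (`Buf`, `buf_make_writable`) that are not libavutil API and a plain, non-atomic reference count:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted buffer. */
typedef struct Buf {
    int refcount;
    size_t size;
    unsigned char *data;
} Buf;

static Buf *buf_alloc(size_t size)
{
    Buf *b = calloc(1, sizeof(*b));
    if (!b)
        return NULL;
    b->data = calloc(1, size ? size : 1);
    if (!b->data) {
        free(b);
        return NULL;
    }
    b->refcount = 1;
    b->size     = size;
    return b;
}

static Buf *buf_ref(Buf *b)
{
    assert(b && b->refcount > 0);
    b->refcount++;
    return b;
}

static void buf_unref(Buf **pb)
{
    Buf *b = *pb;
    *pb = NULL;
    if (b && --b->refcount == 0) {
        free(b->data);
        free(b);
    }
}

/*
 * Make *pb exclusively owned. Returns 0 if it already was (no copy),
 * 1 if a private copy had to be made, -1 on allocation failure.
 */
static int buf_make_writable(Buf **pb)
{
    Buf *b = *pb, *copy;

    assert(b);
    if (b->refcount == 1)
        return 0;

    copy = buf_alloc(b->size);
    if (!copy)
        return -1;
    memcpy(copy->data, b->data, b->size);
    buf_unref(pb);
    *pb = copy;
    return 1;
}
```

If every consumer forces writability, each shared frame is duplicated even when nobody writes to it; leaving the call to the consumers that actually modify the frame avoids those duplicates.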
Rémi Denis-Courmont
262168b04e lavc/videodsp: RISC-V zicbop prefetch
There is currently no way to detect this CPU capability at run time, so we
take it for granted (in the worst case, it will execute NOPs).
2024-07-30 18:41:51 +03:00
Rémi Denis-Courmont
4570b9f3c4 configure: check if assembler supports RV zicbop
zicbop is the Cache Block Operation, Prefetch extension to RVI.
2024-07-30 18:41:51 +03:00
Rémi Denis-Courmont
324eba69f7 lavc/vc1dsp: use saturating arithmetic for RVV inv_trans_dc
T-Head C908 (cycles):
vc1dsp.vc1_inv_trans_4x4_dc_c:      113.7
vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 46.5 (before)
vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 45.5 (after)
vc1dsp.vc1_inv_trans_4x8_dc_c:      230.7
vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 65.7 (before)
vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 52.5 (after)
vc1dsp.vc1_inv_trans_8x4_dc_c:      246.7
vc1dsp.vc1_inv_trans_8x4_dc_rvv_i64: 56.7 (before)
vc1dsp.vc1_inv_trans_8x4_dc_rvv_i64: 45.5 (after)
vc1dsp.vc1_inv_trans_8x8_dc_c:      419.7
vc1dsp.vc1_inv_trans_8x8_dc_rvv_i64: 81.2 (before)
vc1dsp.vc1_inv_trans_8x8_dc_rvv_i64: 53.5 (after)
2024-07-30 18:41:51 +03:00
Rémi Denis-Courmont
784a72a116 lavc/vc1dsp: unify R-V V DC bypass functions 2024-07-30 18:41:51 +03:00
Rémi Denis-Courmont
bd0c3edb13 lavu/riscv: count bytes rather than words for bswap32
This removes the dependency on Zba at essentially zero cost.
2024-07-30 18:41:51 +03:00
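One plausible reading of the entry above: when the loop counter holds bytes, it can be added to the buffer pointers directly, whereas a word count has to be scaled by 4 first, which is the kind of shift-and-add that Zba provides. The scalar C sketch below only illustrates that byte-counting loop shape. The function names are hypothetical, and the real code is RISC-V assembly that may differ in every detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of one 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Hypothetical helper: byte-swap a buffer, iterating by byte offset. */
static void bswap32_buf_bytes(uint32_t *dst, const uint32_t *src, size_t words)
{
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d       = (unsigned char *)dst;
    size_t bytes           = words * sizeof(uint32_t);

    assert((dst && src) || words == 0);

    /* The offset is in bytes, so it is added to the pointers unscaled. */
    for (size_t off = 0; off < bytes; off += sizeof(uint32_t)) {
        uint32_t v;
        memcpy(&v, s + off, sizeof(v));
        v = swap32(v);
        memcpy(d + off, &v, sizeof(v));
    }
}
```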
Rémi Denis-Courmont
5171baa228 lavc/ac3dsp: fix R-V CPU requirements
It probably will not matter on any real hardware, but the Zbb optimisations
do not require Zba. And then, we need HAVE_RVV to build the RVV stuff.
2024-07-30 18:41:51 +03:00
Peter Ross
0e09f6d690 avcodec/adpcm: only process right samples when decoding stereo
Fixes Coverity issue #1610760.
2024-07-30 19:55:31 +10:00
Leo Izen
7bb5626fa7 avcodec/pngenc: fix sBIT writing for indexed-color PNGs
We currently write invalid sBIT entries for indexed PNGs, which by PNG
specification[1] must be 3-bytes long. The values also are capped at 8
for indexed-color PNGs, not the palette depth. This patch fixes both of
these issues, which were previously fixed in the decoder but not in the encoder.

[1]: https://www.w3.org/TR/png-3/#11sBIT

Regression since: c125860892.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
Reported-by: Ramiro Polla <ramiro.polla@gmail.com>
2024-07-30 05:43:36 -04:00
Leo Izen
825606641b avcodec/pngdec: use 8-bit sBIT cap for indexed PNGs per spec
The PNG specification[1] says that sBIT entries must be at most the bit
depth specified in IHDR, unless the PNG is indexed-color, in which case
sBIT must be between 1 and 8. We should not reject valid sBITs on PNGs
with indexed color.

[1]: https://www.w3.org/TR/png-3/#11sBIT

Regression since 84b454935f.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
Reported-by: Ramiro Polla <ramiro.polla@gmail.com>
2024-07-30 05:43:31 -04:00
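The rule quoted from the specification in the entry above can be written as a small check. The C sketch below is an illustration with hypothetical names (`sbit_is_valid`, `PNG_COLOR_INDEXED`), not the decoder's actual code: each sBIT entry must be at least 1 and at most a cap, where the cap is 8 for indexed-color images and the IHDR bit depth otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PNG_COLOR_INDEXED 3 /* PNG colour type 3: indexed-colour */

/* Hypothetical check: returns 1 if all sBIT entries are in range, else 0. */
static int sbit_is_valid(const uint8_t *sbit, size_t count,
                         int color_type, int bit_depth)
{
    int cap = color_type == PNG_COLOR_INDEXED ? 8 : bit_depth;

    assert(sbit || count == 0);

    for (size_t i = 0; i < count; i++)
        if (sbit[i] < 1 || sbit[i] > cap)
            return 0;
    return 1;
}
```

For example, an indexed-color PNG with a 4-bit palette index may still carry sBIT values of 8, which a check against the IHDR bit depth would wrongly reject.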
Marth64
e2105b2800 avcodec/aacenc: Correct option description for aac_coder fast
The description advertises fast as "Default fast search", but
this has not been the default for a long time (current default
is twoloop).

Signed-off-by: Marth64 <marth64@proxyid.net>
2024-07-30 05:42:50 -04:00
Fei Wang
79b4869959 lavu/hwcontext_qsv: Derive bind flag from frame type if no valid surface
Fix cmd:
ffmpeg.exe -init_hw_device d3d11va=d3d -init_hw_device qsv=qsv@d3d \
-filter_hw_device d3d -hwaccel qsv -hwaccel_output_format qsv      \
-i in.h264 -vf "hwmap,format=d3d11,hwdownload,format=nv12" -y out.yuv

Signed-off-by: Fei Wang <fei.w.wang@intel.com>
Tested-by: Tong Wu <wutong1208@outlook.com>
2024-07-30 13:41:15 +08:00