coded_block is only used for I-frames, so it is unnecessary
to reset it in ff_clean_intra_table_entries() (which
cleans certain tables for a non-intra MB).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
When !CONFIG_SMALL, we create separate functions for FMT_MPEG1
(i.e. for MPEG-1/2); given that there are only three possibilities
for out_format (FMT_MPEG1, FMT_H263 and FMT_H261 -- MJPEG and SpeedHQ
are both intra-only and do not have motion vectors at all, ergo
they don't call this function), one can optimize MPEG-1/2-only code
away in mpeg_motion_internal().
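
A minimal sketch of the idea, assuming a constant is_mpeg12 argument that
the compiler can propagate into an inlined helper. The enum, the function
names and the returned path ids are hypothetical stand-ins, not the actual
mpegvideo_motion.c code:

```c
#include <assert.h>

/* Local stand-in for the three out_format values named above;
 * this is not FFmpeg's own enum. */
enum SketchFormat { SK_FMT_MPEG1, SK_FMT_H261, SK_FMT_H263 };

/* Returns an id of the path taken; real code would do motion compensation. */
static inline int motion_internal_sketch(enum SketchFormat fmt, int is_mpeg12)
{
    if (is_mpeg12)
        return 1;                 /* MPEG-1/2-only path */
    /* Only H.261 and H.263 can reach this point, so any MPEG-1/2-only
     * branch below would be dead code the compiler can drop. */
    assert(fmt != SK_FMT_MPEG1);
    return fmt == SK_FMT_H261 ? 2 : 3;
}

/* Separate instantiations, each with a constant is_mpeg12. */
static int motion_mpeg12_sketch(void)
{
    return motion_internal_sketch(SK_FMT_MPEG1, 1);
}

static int motion_generic_sketch(enum SketchFormat fmt)
{
    return motion_internal_sketch(fmt, 0);
}
```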
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These references now always exist due to dummy frames.
Also remove the corresponding checks in the lowres code
in mpegvideo_dec.c.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
MPEG-2 allows pairing an intra field (as first field) together
with a P-field. In this case a conformant bitstream has to satisfy
certain restrictions in order to ensure that only the I field
is used for prediction. See section 7.6.3.5 of the MPEG-2
specification.
We do not check these restrictions; normally we simply allocate
dummy frames for reference in order to avoid checks later on.
This happens in ff_mpv_frame_start() and therefore does not happen
for a second field. This is inconsistent. Fix this by allocating
these dummy frames for the second field, too.
This already fixes two bugs:
1. Undefined pointer arithmetic in prefetch_motion() in
mpegvideo_motion.c where it is simply presumed that the reference
frame exists.
2. Several MPEG-2 hardware accelerations rely on last_picture
being allocated for P pictures and next_picture for B pictures;
e.g. VDPAU returns VDP_STATUS_INVALID_HANDLE when decoding
an I-P fields pair because the forward_reference was set incorrectly.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This will allow reusing it to allocate dummy frames for
the second field (which can be a P-field even if the first
field was an intra field).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
linesize and uvlinesize are supposed to be the common linesize of all
the Y/UV-planes of all the currently cached pictures.
ff_mpeg_update_thread_context() syncs the pictures, yet it did not sync
linesize and uvlinesize. This mostly works, because ff_alloc_picture()
only accepts new pictures if they coincide with the linesize of the
already provided pictures (if any). Yet there is a catch: Linesize
changes are accepted when the dimensions change (in which case the
cached frames are discarded).
So imagine a scenario where all frame threads use the same dimension A
until a frame with a different dimension B is encountered in the
bitstream, only to be instantly reverted to A in the next picture. If
the user changes the linesize of the frames upon the change to dimension
B and keeps the linesize thereafter (possible if B > A),
ff_alloc_picture() will report an error when frame-threading is in use:
The thread decoding B will perform a frame size change, and so will the
next thread, both in ff_mpeg_update_thread_context() and when decoding
its picture. But the thread after that will (presuming it is not the same
thread that decoded B, i.e. presuming >= 3 threads) not perform a frame
size change, because the new frame size coincides with its old frame
size, yet the linesize it expects from ff_alloc_picture() is outdated,
so that it errors out.
It is also possible for the user to use the original linesizes for
the frame after the frame that reverted back to A; this will be
accepted, yet the assumption that the linesizes of all pictures are
the same will be broken, leading to segfaults.
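
The first failure scenario can be simulated with a minimal sketch. All
names below are hypothetical and the model is heavily simplified (one
linesize per context, three threads, made-up dimensions and linesizes);
it only mirrors the accept/reject rule described above:

```c
#include <assert.h>

typedef struct SketchCtx {
    int width, height;
    int linesize;            /* 0 means "no cached pictures" */
} SketchCtx;

/* A dimension change discards the cached pictures. */
static void size_change(SketchCtx *c, int w, int h)
{
    c->width = w; c->height = h; c->linesize = 0;
}

/* Accept a frame only if its linesize matches the cached one. */
static int alloc_picture(SketchCtx *c, int frame_linesize)
{
    assert(frame_linesize > 0);
    if (c->linesize && c->linesize != frame_linesize)
        return -1;
    c->linesize = frame_linesize;
    return 0;
}

static void update_thread_context(SketchCtx *dst, const SketchCtx *src,
                                  int sync_linesize)
{
    if (dst->width != src->width || dst->height != src->height)
        size_change(dst, src->width, src->height);
    if (sync_linesize)
        dst->linesize = src->linesize;
}

static int decode(SketchCtx *c, int w, int h, int frame_linesize)
{
    if (c->width != w || c->height != h)
        size_change(c, w, h);
    return alloc_picture(c, frame_linesize);
}

/* Dimension A = 352x288, B = 704x576; the user switches to linesize 704
 * at B and keeps it afterwards. Returns the result of the last decode. */
static int run_scenario(int sync_linesize)
{
    SketchCtx t[3] = { {352, 288, 352}, {352, 288, 352}, {352, 288, 352} };

    update_thread_context(&t[1], &t[0], sync_linesize);
    if (decode(&t[1], 704, 576, 704) < 0)   /* frame with dimension B */
        return -1;
    update_thread_context(&t[2], &t[1], sync_linesize);
    if (decode(&t[2], 352, 288, 704) < 0)   /* reverted to A */
        return -1;
    update_thread_context(&t[0], &t[2], sync_linesize);
    return decode(&t[0], 352, 288, 704);    /* A again, no size change */
}
```

In this model run_scenario(0) fails in the last decode because thread 0
still expects the old linesize, while run_scenario(1) succeeds.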
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The mpegvideo-based codecs currently require the linesize to be
constant (except when the frame dimensions change); one reason
for this is that certain scratch buffers whose size depends on
linesize are only allocated once and are presumed to be correctly
sized if the pointers are != NULL.
This commit changes this by storing the actual linesize these
buffers belong to and reallocating the buffers if it does not
suffice. This is not enough to actually support changing linesizes,
but it is a start. And it is a prerequisite for the next patch.
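
A minimal sketch of this pattern, assuming a single buffer and a made-up
sizing rule; the struct, the function names and SKETCH_ROWS are
hypothetical and do not reflect the actual MpegEncContext layout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed sizing rule for illustration only. */
#define SKETCH_ROWS 64

typedef struct ScratchSketch {
    unsigned char *buf;   /* NULL if never allocated */
    int linesize;         /* linesize the buffer was sized for, 0 if none */
} ScratchSketch;

/* Returns 0 on success, -1 on allocation failure. The buffer is kept
 * if it already suffices for the requested linesize. */
static int scratch_ensure(ScratchSketch *sb, int linesize)
{
    unsigned char *tmp;

    assert(sb && linesize > 0);
    if (sb->buf && sb->linesize >= linesize)
        return 0;

    tmp = malloc((size_t)linesize * SKETCH_ROWS);
    if (!tmp)
        return -1;
    free(sb->buf);
    sb->buf      = tmp;
    sb->linesize = linesize;
    return 0;
}

static void scratch_free(ScratchSketch *sb)
{
    free(sb->buf);
    sb->buf      = NULL;
    sb->linesize = 0;
}
```

The point of storing the linesize is that a non-NULL pointer alone no
longer counts as proof that the buffer is large enough.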
Also don't emit an error message in case the source ctx's
edge_emu_buffer is unset in ff_mpeg_update_thread_context().
It need not be an error at all; e.g. it is a perfectly normal
state in case a hardware acceleration is used as the scratch
buffers are not allocated in this case (it is easy to run into
this issue with MPEG-4) or if the src context was not initialized
at all (e.g. because the first packet contained garbage).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is unnecessary to check whether the number of planes
of an already existing audio pool coincides with the number
of planes to use for the frame: If the common format of both
is planar, then the number of planes coincides with the number
of channels for which there is already a check*; if not,
then both the existing pool as well as the frame use one pool.
*: In fact, one could reuse the pool in this case even if the
number of channels changes.
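
The argument can be checked with a small sketch. The struct and function
names are hypothetical; the only rule taken from the text is that planar
audio uses one plane per channel and packed audio uses a single plane:

```c
#include <assert.h>

typedef struct AudioParamsSketch {
    int planar;     /* 1 if the sample format is planar, 0 if packed */
    int channels;
} AudioParamsSketch;

static int audio_planes(const AudioParamsSketch *p)
{
    assert(p->channels > 0);
    return p->planar ? p->channels : 1;
}

/* The checks that remain: same format and same channel count. */
static int pool_matches(const AudioParamsSketch *pool,
                        const AudioParamsSketch *frame)
{
    return pool->planar == frame->planar &&
           pool->channels == frame->channels;
}

/* Returns 1 if a matching pool always implies an equal plane count
 * for all combinations up to max_channels. */
static int planes_check_is_redundant(int max_channels)
{
    for (int pp = 0; pp <= 1; pp++)
        for (int fp = 0; fp <= 1; fp++)
            for (int pc = 1; pc <= max_channels; pc++)
                for (int fc = 1; fc <= max_channels; fc++) {
                    AudioParamsSketch pool  = { pp, pc };
                    AudioParamsSketch frame = { fp, fc };
                    if (pool_matches(&pool, &frame) &&
                        audio_planes(&pool) != audio_planes(&frame))
                        return 0;
                }
    return 1;
}
```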
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is currently done inconsistently: Only one error path
(namely the one from init_pass2()) made ff_rate_control_init()
call ff_rate_control_uninit(); in other error paths
cleanup was left to the caller.
Given that the only caller of this function already performs
the necessary cleanup this commit changes this to always
rely on the caller to perform cleanup on error.
Also return the error code from init_pass2().
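
A minimal sketch of the resulting convention, with hypothetical names and
made-up error codes standing in for the real rate control code: the init
function only propagates errors, and its single caller cleans up.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct RcSketch {
    double *entries;
    double *pass2;
} RcSketch;

static void rc_uninit_sketch(RcSketch *rc)
{
    free(rc->entries);
    free(rc->pass2);
    rc->entries = rc->pass2 = NULL;
}

/* fail is a test hook that forces an error; -22 and -12 are
 * placeholder error codes. */
static int rc_init_pass2_sketch(RcSketch *rc, int fail)
{
    if (fail)
        return -22;
    rc->pass2 = calloc(16, sizeof(*rc->pass2));
    return rc->pass2 ? 0 : -12;
}

static int rc_init_sketch(RcSketch *rc, int fail_pass2)
{
    int ret;

    assert(rc);
    rc->entries = calloc(16, sizeof(*rc->entries));
    if (!rc->entries)
        return -12;
    ret = rc_init_pass2_sketch(rc, fail_pass2);
    if (ret < 0)
        return ret;   /* no cleanup here; return the callee's error code */
    return 0;
}

/* The only caller performs the cleanup on every error path. */
static int encoder_init_sketch(RcSketch *rc, int fail_pass2)
{
    int ret = rc_init_sketch(rc, fail_pass2);
    if (ret < 0)
        rc_uninit_sketch(rc);
    return ret;
}
```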
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The issue is that if a frame has no complex stereo prediction,
the alpha values must all be assumed to be zero if the next frame
has complex prediction and uses delta coding.
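
A minimal sketch of the rule, assuming delta coding relative to the
previous frame's value in the same band. The state struct, the function
and the band limit are hypothetical simplifications and not the decoder's
actual data structures:

```c
#include <assert.h>
#include <string.h>

#define SK_MAX_BANDS 8   /* assumed limit for the sketch */

typedef struct CplxPredSketch {
    int alpha_prev[SK_MAX_BANDS];   /* alpha values of the previous frame */
} CplxPredSketch;

static void decode_alpha_sketch(CplxPredSketch *st, int has_cplx_pred,
                                int delta_coded, const int *coded,
                                int *alpha, int nb_bands)
{
    assert(nb_bands >= 0 && nb_bands <= SK_MAX_BANDS);

    if (!has_cplx_pred) {
        /* No complex prediction in this frame: a following delta-coded
         * frame must see all-zero history. */
        memset(st->alpha_prev, 0, sizeof(st->alpha_prev));
        return;
    }
    for (int b = 0; b < nb_bands; b++) {
        alpha[b] = delta_coded ? st->alpha_prev[b] + coded[b] : coded[b];
        st->alpha_prev[b] = alpha[b];
    }
}
```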
The LC part of the decoder combines scalefactor application with
spectrum decoding, and this was the plan here, but that's not possible,
so change the function name.
The issue here is that the spec implied that the offset is applied
to the dequantized scalefactor, but in fact it is applied to the
scalefactor offset. Delay dequantizing the scalefactors until
after noise synthesis is performed, and apply the offset to the
scalefactor offset instead.
Otherwise nothing is written into the destination when a write mapping
is requested.
For example, when a vulkan frame mapped from a drm frame (which is
wrapped as a vaapi frame in the example) is used as the output of the
scale_vulkan filter, the result is always a green screen without this
patch.
ffmpeg -init_hw_device vaapi=va -init_hw_device vulkan=vulkan@va \
    -filter_hw_device vulkan -f lavfi -i testsrc=size=352x288,format=nv12 \
    -vf "hwupload,scale_vulkan,hwmap=derive_device=vaapi:reverse=1,format=vaapi,hwdownload,format=nv12" \
    -f nut - | ffplay -
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
This changes the behavior to what was probably intended.
Either way, this is unlikely to result in any user-visible change.
Fixes: CID1494637 Missing break in switch
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This also makes the code more robust
Fixes: CID1512414 Uninitialized pointer read
Sponsored-by: Sovereign Tech Fund
Reviewed-by: Pierre-Anthony Lemieux <pal@sandflow.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
A lot more input checking can be performed; this only checks the obvious missing case
Fixes: CID1598562 Unchecked return value
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This adds runtime support to use Zbb REV8 for 32- and 64-bit byte-wise
swaps. The result is about five times slower than if targeting Zbb
statically, but still a lot faster than the default bespoke C code or a
call to GCC run-time functions.
For 16-bit swap, this is however unsurprisingly a lot worse, and so this
sticks to the baseline. In fact, even using REV8 statically does not
seem to be beneficial in that case.
           Zbb static    Zbb dynamic   I baseline
bswap16:   0.668184765   3.340764069    0.668029012
bswap32:   0.668174014   3.340763319    9.353855435
bswap64:   0.668221765   3.340496313   14.698672283
(seconds for 1 billion iterations on a SiFive-U74 core)
Due to hysterical raisins, most RISC-V Linux distributions target a
RV64GC baseline excluding the Bit-manipulation ISA extensions, most
notably:
- Zba: address generation extension and
- Zbb: basic bit manipulation extension.
Most CPUs that would make sense to run FFmpeg on support Zba and Zbb
(including the current FATE runner), so it makes sense to optimise for
them. In fact a large chunk of existing assembler optimisations relies
on Zba and/or Zbb.
Since we cannot patch shared library code, the next best thing is to
carry a flag initialised at load-time and check it on an as-needed basis.
This results in 3 instructions overhead on isolated use, e.g.:
    1:  AUIPC rd, %pcrel_hi(ff_rv_zbb_supported)
        LBU   rd, %pcrel_lo(1b)(rd)
        BEQZ  rd, non_Zbb_fallback_code
        // Zbb code here
The C compiler will typically load the flag ahead of time to reduce
latency, and can also keep it around if Zbb is used multiple times in a
single optimisation scope. For this to work, the flag symbol must be
hidden; otherwise the optimisation degrades with a GOT look-up to
support interposition:
    1:  AUIPC rd, GOT_OFFSET_HI
        LD    rd, GOT_OFFSET_LO(rd)
        LBU   rd, (rd)
        BEQZ  rd, non_Zbb_fallback_code
        // Zbb code here
This patch adds code to provision the flag in libraries using bit
manipulation functions from libavutil: byte-swap, bit-weight and
counting leading or trailing zeroes.
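
At the C level, the flag check can be sketched as follows. This is a
portable illustration only: all names are hypothetical, the flag is set
by hand rather than by real CPU detection, and the fast path uses a
compiler builtin as a stand-in for the REV8-based code:

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__)
#define SKETCH_HIDDEN __attribute__((visibility("hidden")))
#else
#define SKETCH_HIDDEN
#endif

/* Hidden so that accesses need no GOT look-up; modeled on the
 * ff_rv_zbb_supported flag from the text, but a hypothetical name. */
SKETCH_HIDDEN unsigned char sketch_zbb_supported;

/* Would run once at load time with the result of CPU detection. */
static void sketch_zbb_init(int detected)
{
    assert(detected == 0 || detected == 1);
    sketch_zbb_supported = (unsigned char)detected;
}

/* Baseline fallback in plain C. */
static uint32_t sketch_bswap32_generic(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Stand-in for the Zbb path; real code would emit REV8 here. */
static uint32_t sketch_bswap32_fast(uint32_t x)
{
#if defined(__GNUC__)
    return __builtin_bswap32(x);
#else
    return sketch_bswap32_generic(x);
#endif
}

static uint32_t sketch_bswap32(uint32_t x)
{
    /* One flag load and one branch in front of the fast path. */
    if (sketch_zbb_supported)
        return sketch_bswap32_fast(x);
    return sketch_bswap32_generic(x);
}
```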
Do not do it in hls_slice_header(), which is the wrong place for it.
Avoids special magic return value of 1 in that function. The comment
mentioning potential corrupted state is no longer relevant, as
hls_slice_header() modifies no state beyond SliceHeader, which will only
get used for a valid frame.