This moves all VP3-specific function pointers from dsputil to a
new vp3dsp context. There is no reason to ever use the VP3 IDCT
where an MPEG2 IDCT is expected or vice versa.
Signed-off-by: Mans Rullgard <mans@mansr.com>
It turns out that the reference decoder subtracts 128 from DC during block
decode but adds it back during reordering block with zigzag pattern.
Transforming block with incorrect DC caused heavy visual artifacts for
many quantisers.
Passing a cutoff value < sample_rate/256 will cause a crash.
Also, values >20000 will have no effect and 20000 will be used anyway.
Signed-off-by: Mohammad Alsaleh <msal@tormail.org>
Signed-off-by: Martin Storsjö <martin@martin.st>
This was unnoticed on linux, since stdlib.h apparently includes
files declaring the pthread_mutex_t and pthread_cond_t types.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Testing gives 25-30% gain on HD clips with two threads and
up to 50% gain with eight threads.
Sliced threading uses more memory than single or frame threading.
Frame threading and single threading keep the previous memory
layout.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
The number of pixel formats outgrew the number of available bits in
the bitmask used in avcodec_find_best_pix_fmt().
avcodec_find_best_pix_fmt2() uses a PIX_FMT_NONE terminated list
of pixel formats instead.
This allows compiling and running these tests on systems lacking a built-
in version of getopt(), such as MSVC.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The SPLATB_REG macro already adds the 'd' suffix internally.
This fixes building on Win64, which has been broken since 878e66902.
This worked for unix, where r2 happened to be rdx in this case, which
with the first suffix rdxd was mapped to eax, and eaxd is defined back
to eax. On win64 however, r2 happened to be R8 in this case, and
R8d mapps to R8D just fine, but there's no mapping for R8Dd to anything.
Signed-off-by: Martin Storsjö <martin@martin.st>
Instead of inlining everything into ff_h264_hl_decode_mb(), use
explicit templating to create versions of the called functions
with constant parameters filled in. This greatly speeds up
compilation of h264.c and reduces the code size without any
measurable impact on performance.
Compilation time for h264.c on an i7 goes from 30s to 5.5s.
Code size is reduced by 430kB.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Using ff_mspel_motion assumes that s (a MpegEncContext
poiinter) really is a Wmv2Context.
This fixes crashes in error resilience on vc1/wmv3 videos.
CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
The buffers are only allocated once, although it can happen from
any of a few different places, so there is no need to use realloc.
Using av_malloc() ensures they are aligned suitably for SIMD
optimisations.
Signed-off-by: Mans Rullgard <mans@mansr.com>
ff_wma_init is used only by wmadec and wmaenc, and neither of them
can handle more than 2 channels.
This fixes crashes with invalid files.
Based on patch by Piotr Bandurski and Michael Niedermayer.
Signed-off-by: Martin Storsjö <martin@martin.st>
This creates proper position independent code when accessing
data symbols if CONFIG_PIC is set.
References to external symbols should now use the movrelx macro.
Some additional code changes are required since this macro may
need a register to hold the GOT pointer.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The problem is that the ssse3 psign instruction does the wrong
thing here. Commit ea60dfe incorrectly removed a macro emulating
this instruction for pre-ssse3 code. However, the emulation is
incorrect, and the code relies on the behaviour of the macro.
Specifically, the psign sets destination elements to zero where
the corresponding source element is zero, whereas the emulation
only negates destination elements where the source is negative.
Furthermore, the PSIGNW_MMX macro in x86util.asm is totally bogus,
which is why the original VC-1 code had an additional right shift
when using it. Since the psign instruction cannot be used here,
skip all the macro hell and use the working instruction sequence
directly.
None of this was noticed due a stray return statement in
ff_vc1dsp_init_mmx() which meant that only the mmx version of the
loop filter was ever used (before being removed in ea60dfe).
Signed-off-by: Mans Rullgard <mans@mansr.com>
The function call was a mess to handle, and memcpy cannot make
the assumptions we do in the new code.
Tested on an IMC sample: 430c -> 370c.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Apparently, some build environments require dxva.h even for dxva2,
while others lack this header entirely. Including it conditionally
allows building in both cases.
Signed-off-by: Martin Storsjö <martin@martin.st>
The MBAFF flag may only be signaled if we're actually dealing with
a full frame, and not singular fields, as it can happen in mixed content.
Signed-off-by: Martin Storsjö <martin@martin.st>
This removes a dependency on implementation details from generic
code and allows easy addition of the equivalent optimisation for
other architectures than x86.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Its documentation states that it is allocated/freed by the caller, but
it is declared as an AV_OPT_TYPE_STRING AVOption. Since
367732832f the AVOptions system frees
strings automatically. This can be considered an API break, since it
won't work when the caller doesn't use av_malloc() to allocate the
memory or wants to use the string after closing the codec.
Since there is not much value in this field being an AVOption, the best
solution is to remove it from the options table.
Fixes infinite loop in FLAC decoding in case of a truncated bitstream due to
the safe bitstream reader returning 0's at the end.
Fixes Bug 310.
CC:libav-stable@libav.org
Override the frame size from the SPS with AVCodecContext values
if the latter specify a size smaller by less than one macroblock.
This is required for correct cropping of MOV files from Canon cameras.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Unlike its predecessor, Indeo Audio codec generates tables depending on
sampling rate. Previously decoder used pre-generated tables for 22050 Hz
which obviously doesn't work with other frequencies.
Many thanks to Maxim Poliakovsky for providing all needed information
for this.
This ensures that these functions are inlined into the per-position
entry points, allowing constant propagation as needed for proper
optimisation.
18% faster VC1 decoding on Cortex-A9.
Signed-off-by: Mans Rullgard <mans@mansr.com>
In Musepack SV8 codec property tell the maximum nonzero band, but every
frame codes maximum band as a limit (i.e. strictly less than given value).
Synthesis also expects maximum nonzero band, so there's a need to convert
frame maximum band limit value.
This prevents gcc from assuming that contents of it may have changed
between calls to vp56_range_get_prob(), thus preventing countless (and
unnecessary) movs. Decoding of sintel trailer goes from (avg+SG) 9.796
+/- 0.003 to 9.635 +/- 0.010.
We support every defined value for channel layout, bitrate and sample depth.
All other values are not unsupported, but reserved.
Update comments to say "are used" instead of "are known or exist".
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
This silences some valgrind warnings.
CC: libav-stable@libav.org
Fixes second half of http://ffmpeg.org/trac/ffmpeg/ticket/794
Bug found by: Oana Stratulat
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Reinhard Tartler <siretart@tauware.de>
(cherry picked from commit f85334f58e)