No difference in PSNR or bitrate at the printed precision with the matrix lobby scene at 322x242
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
We're shifting individual components (8-bit, unsigned) left by 24,
so making them unsigned should give the same results without the
overflow.
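
A minimal sketch of the issue, not taken from the patch itself: the helper
names pack_top_byte and pack_argb are hypothetical. An 8-bit component is
promoted to signed int before the shift, so shifting a value >= 128 left by
24 overflows; converting to unsigned first gives the same bit pattern
without the overflow.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only: place an 8-bit component
 * in the top byte of a 32-bit word. */
static uint32_t pack_top_byte(uint8_t c)
{
    /* (c << 24) would promote c to signed int and overflow for c >= 128.
     * Converting to an unsigned type first keeps the shift well defined. */
    return (uint32_t)c << 24;
}

/* Hypothetical helper: assemble four 8-bit components into one word,
 * with every shift done on an unsigned value. */
static uint32_t pack_argb(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |  (uint32_t)b;
}
```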
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
For certain types of filters where the intermediate sum of coefficients
can go above the fixed-point equivalent of 1.0 in the middle of a filter,
the sum of a 31-bit calculation can overflow in both directions and can
thus not be represented in a 32-bit signed or unsigned integer. To work
around this, we subtract 0x40000000 from a signed integer base, so that
we're halfway signed/unsigned, which makes it fit even if it overflows.
After the filter finishes, we add the scaled bias back after a shift.
We use the same trick for 16-bit bpc YUV output routines.
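
The bias trick described above can be sketched as follows. This is an
illustration under stated assumptions, not the code from the patch: the
function names are hypothetical, inputs are assumed to be 19-bit unsigned
intermediates, coefficients are 12-bit fixed point (4096 == 1.0), and the
output is 16 bits, which gives a shift of 15 and a scaled bias of
0x40000000 >> 15 == 0x8000. It also assumes that >> on a negative int is
an arithmetic shift, as on all common compilers.

```c
#include <stdint.h>

#define SKETCH_SHIFT 15 /* assumed: 19-bit input + 12-bit coeffs - 16-bit output */

static int sketch_clip_int16(int v)
{
    return v < -32768 ? -32768 : v > 32767 ? 32767 : v;
}

/* Biased accumulation: the running sum is offset by -0x40000000 so that
 * moderate overshoot in either direction still fits in a signed 32-bit
 * integer. The scaled bias (0x8000) is added back after the shift. */
static uint16_t filter_pixel_biased(const int32_t *src, const int16_t *coeff,
                                    int taps)
{
    int32_t val = 1 << (SKETCH_SHIFT - 1); /* rounding term */
    val -= 0x40000000;                     /* move to the "halfway" base */
    for (int j = 0; j < taps; j++)
        val += src[j] * coeff[j];
    return (uint16_t)(sketch_clip_int16(val >> SKETCH_SHIFT) + 0x8000);
}

/* Reference using a 64-bit accumulator, for comparison only. */
static uint16_t filter_pixel_ref(const int32_t *src, const int16_t *coeff,
                                 int taps)
{
    int64_t val = 1 << (SKETCH_SHIFT - 1);
    for (int j = 0; j < taps; j++)
        val += (int64_t)src[j] * coeff[j];
    val >>= SKETCH_SHIFT;
    return (uint16_t)(val < 0 ? 0 : val > 65535 ? 65535 : val);
}
```

With coefficients such as { -512, 2560, 2560, -512 } and full-scale input,
the unbiased 32-bit sum would exceed INT_MAX, while the biased sum stays in
range and produces the same clipped output as the 64-bit reference.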
Signed-off-by: Mans Rullgard <mans@mansr.com>
The buffer splicing relies on the bitstream reader over-reading
the end of the buffer as declared in init_get_bits(), although
more data is actually present. Manually moving the bitstream
boundary after init_get_bits() allows this to work as expected.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The sample has an incomplete last frame. Decoding it is pointless.
The garbage produced was changed by the bitstream reader now
protecting against over-reads.
Signed-off-by: Mans Rullgard <mans@mansr.com>
When turned on, H264/CAVLC gets ~15% (CVPCMNL1_SVA_C.264) slower for
ultra-high-bitrate files, or ~2.5% (CVFI1_SVA_C.264) for lower-bitrate
files. Other codecs are affected to a lesser extent because they are
less optimized; e.g., VC-1 slows down by less than 1% (all on x86).
The patch generated 3 extra instructions (cmp, cmovae and mov) per
call to get_bits().
The performance penalty on ARM is within the error margin for most
files, up to 4% in extreme cases such as CVPCMNL1_SVA_C.264.
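
To illustrate where the extra compare and conditional move come from, here
is a toy bit reader with a clamped index. It is a simplified sketch and not
the get_bits.h implementation: the ToyBitReader type and its functions are
hypothetical, and it checks each bit individually where a real reader would
load whole words and rely on buffer padding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy reader, for illustration only. */
typedef struct ToyBitReader {
    const uint8_t *buf;
    unsigned index;        /* current position, in bits */
    unsigned size_in_bits; /* declared payload size */
    unsigned index_limit;  /* index is never advanced past this value */
} ToyBitReader;

static void toy_init(ToyBitReader *r, const uint8_t *buf, unsigned size_bytes)
{
    r->buf          = buf;
    r->index        = 0;
    r->size_in_bits = size_bytes * 8;
    r->index_limit  = r->size_in_bits + 8;
}

/* Read n bits, MSB first. Bits past the declared end read as zero. */
static unsigned toy_get_bits(ToyBitReader *r, unsigned n)
{
    unsigned val = 0;
    assert(n >= 1 && n <= 25);
    for (unsigned i = 0; i < n; i++) {
        unsigned pos = r->index + i;
        unsigned bit = 0;
        if (pos < r->size_in_bits)
            bit = (r->buf[pos >> 3] >> (7 - (pos & 7))) & 1;
        val = (val << 1) | bit;
    }
    /* The clamp: one compare plus one conditional move per call, so the
     * index can never run away past the end of the buffer. */
    unsigned next = r->index + n;
    r->index = next < r->index_limit ? next : r->index_limit;
    return val;
}
```

Once the index reaches the limit, further reads return zeros and leave the
position unchanged instead of walking into unrelated memory.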
Based on work (for GCI) by Aneesh Dogra <lionaneesh@gmail.com>, and
inspired by a patch in Chromium by Chris Evans <cevans@chromium.org>.
* qatar/master:
get_bits: remove A32 variant
avconv: support stream specifiers in -metadata and -map_metadata
wavpack: Fix 32-bit clipping
wavpack: Clip samples after shifting
h264: don't drop B-frames after next keyframe on POC reset.
get_bits: remove useless pointer casts
configure: refactor lists of tests and components into variables
rv40: NEON optimised weak loop filter
mpegts: replace some magic numbers with the existing define
swscale: add unscaled packed 16 bit per component endianess conversion
Conflicts:
libavcodec/get_bits.h
libavcodec/h264.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The A32 bitstream reader variant is only used on ARMv5 and for
Prores due to the larger bit cache this decoder requires.
In benchmarks on ARMv5 (Marvell Sheeva) with gcc 4.6, the only
statistically significant difference between ALT and A32 is
a 4% advantage for ALT in FLAC decoding. There is thus no longer
any reason to keep the A32 reader from this point of view.
This patch adds an option to the ALT reader increasing the bit
cache to 32 bits as required by the Prores decoder. Benchmarking
shows no significant change in speed on Intel i7. Again, the
A32 reader fails to justify its existence.
Signed-off-by: Mans Rullgard <mans@mansr.com>
In the case that (frame_flags & 0x03) == 3, hybrid_maxclip
may have had a signed integer overflow.
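
A sketch of the overflow, assuming the low two bits of frame_flags encode
bytes per sample minus one, so that a value of 3 means 32-bit samples. The
helper names are hypothetical and this is not the decoder's code. With 32
bits per sample, computing (1 << 31) - 1 in int overflows; doing the shift
in 64 bits avoids that.

```c
#include <stdint.h>

/* Hypothetical helper: bits per sample from the low two flag bits
 * (assumed to hold bytes per sample minus one). */
static int sketch_orig_bpp(uint32_t frame_flags)
{
    return (int)(((frame_flags & 0x03) + 1) << 3);
}

/* Upper clip bound. Shifting a 64-bit one avoids the signed overflow
 * that (1 << 31) would cause when bpp == 32. */
static int64_t sketch_hybrid_maxclip(uint32_t frame_flags)
{
    int bpp = sketch_orig_bpp(frame_flags);
    return (INT64_C(1) << (bpp - 1)) - 1;
}

/* Lower clip bound, computed the same way. */
static int64_t sketch_hybrid_minclip(uint32_t frame_flags)
{
    int bpp = sketch_orig_bpp(frame_flags);
    return -(INT64_C(1) << (bpp - 1));
}
```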
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
It doesn't make much sense to clip pre-shift,
nor is it correct for proper decoding.
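
A generic sketch of why the order matters, with hypothetical function names
and no claim to match the decoder's actual variables: clipping before the
shift lets the shifted result leave the clip range again, while clipping
after the shift bounds the final sample.

```c
#include <stdint.h>

static int64_t sketch_clip(int64_t v, int64_t lo, int64_t hi)
{
    return v < lo ? lo : v > hi ? hi : v;
}

/* Clip first, then scale: the result can land outside [lo, hi]. */
static int64_t clip_then_shift(int32_t s, int shift, int64_t lo, int64_t hi)
{
    /* Multiplication is used instead of << to stay well defined for
     * negative samples. */
    return sketch_clip(s, lo, hi) * (INT64_C(1) << shift);
}

/* Scale first, then clip: the final value is always within [lo, hi]. */
static int64_t shift_then_clip(int32_t s, int shift, int64_t lo, int64_t hi)
{
    return sketch_clip((int64_t)s * (INT64_C(1) << shift), lo, hi);
}
```

For a sample of 30000, a shift of 2 and a 16-bit range, the first variant
returns 120000 and the second returns 32767.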
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The keyframe after a POC reset may not be the first to be returned to
the user. Therefore, don't reset the expected next POC once we return
a keyframe to the user, but once we know that the next frame in the
return-queue is a keyframe.
Width and height may be passed as 0, which would cause floating point
exceptions in decode_frame.
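
A sketch of the kind of guard this implies. The function name and the
division shown are hypothetical, since the exact computation in
decode_frame is not given here. On x86, an integer division by zero
typically raises SIGFPE, which is reported as a "floating point exception".

```c
#include <errno.h>

/* Hypothetical helper: reject zero or negative dimensions before they
 * are used as divisors. Returns 0 on success, a negative errno value
 * otherwise. */
static int sketch_bytes_per_row(int payload_size, int width, int height,
                                int *bytes_per_row)
{
    if (width <= 0 || height <= 0)
        return -EINVAL;
    *bytes_per_row = payload_size / height; /* safe: height > 0 here */
    return 0;
}
```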
Fixes bugzilla #149
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This uses the old demuxing code for OP1a and separate demuxing code for OPAtom.
Timestamp output is added to the old demuxing code.
The seeking code is made to seek to the start of the desired EditUnit only,
from which the normal demuxing code takes over (if OP1a). This means we don't
use delta entries or slices, only StreamOffsets.
OPAtom seeking basically works like before.
This also makes D-10 seeking behave the same way as OP1a and OPAtom. In other
words, we allow seeking before the start or past the end for D-10 too.
This fixes ticket #746.