This will be useful to test more aggressively for failures to mark XMM
registers as clobbered in Win64 builds, and prevent regressions thereof.
Based on a patch by Ramiro Polla <ramiro.polla@gmail.com>
Return the correct number of consumed bytes and set *data_size = 0.
Returned size is 1 too small, leading to that 1 byte being read as the next
frame, which results in an extra blank frame at the beginning of the stream.
Avoids doing malloc/free for each frame.
Also fixes valgrind errors due to use of uninitialized padding bytes.
Based on a patch by Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Wrapper around av_fast_malloc() that keeps FF_INPUT_BUFFER_PADDING_SIZE
zero-padded bytes at the end of the used buffer.
Based on a patch by Reimar Döffinger <Reimar.Doeffinger@gmx.de>.
get_ue_golomb_long() is only tested for values up to 2^15 - 2 since
we can not write larger values.
Silence the test on success and return a non-zero value on error.
Use an heap scratch buffer instead of large stack buffer.
Remove unneeded includes.
This way, if the AVCodecContext is allocated for a specific codec, the
caller doesn't need to store this codec separately and then pass it
again to avcodec_open2().
It also allows to set codec private options using av_opt_set_* before
opening the codec.
It allows to check whether an AVCodecContext is open in a documented
way. Right now the undocumented way this check is done in lavf/lavc is
by checking whether AVCodecContext.codec is NULL. However it's desirable
to be able to set AVCodecContext.codec before avcodec_open2().
In some cases, what is left to read from ptr is smaller than EXTRABYTES.
Based on a patch by Thierry Foucu <tfoucu@gmail.com>.
Signed-off-by: Alex Converse <alex.converse@gmail.com>
Provide MMX, SSE2 and SSSE3 versions, with a fast-path when the weights are
multiples of 512 (which is often the case when the values round up nicely).
*_TIMER report for the 16x16 and 8x8 cases:
C:
9015 decicycles in 16, 524257 runs, 31 skips
2656 decicycles in 8, 524271 runs, 17 skips
MMX:
4156 decicycles in 16, 262090 runs, 54 skips
1206 decicycles in 8, 262131 runs, 13 skips
MMX on fast-path:
2760 decicycles in 16, 524222 runs, 66 skips
995 decicycles in 8, 524252 runs, 36 skips
SSE2:
2163 decicycles in 16, 262131 runs, 13 skips
832 decicycles in 8, 262137 runs, 7 skips
SSE2 with fast path:
1783 decicycles in 16, 524276 runs, 12 skips
711 decicycles in 8, 524283 runs, 5 skips
SSSE3:
2117 decicycles in 16, 262136 runs, 8 skips
814 decicycles in 8, 262143 runs, 1 skips
SSSE3 with fast path:
1315 decicycles in 16, 524285 runs, 3 skips
578 decicycles in 8, 524286 runs, 2 skips
This means around a 4% speedup for some sequences.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Currently, any samples in the final frame are not decoded because they are
only represented by one frame instead of two. So we encode two final frames to
cover both the analysis delay and the MDCT delay.