This moves the clamping of the thread count to a minimum of 1 into
frame_thread_init(), allowing a value of zero to propagate
through to the codec if frame threading is not used. This
makes auto-threads work in libx264.
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit ff1efc524c)
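A minimal sketch of the thread-count behaviour described in the
frame_thread_init() change above. The struct and helper names are
hypothetical and are not libavcodec API; the point is only that the
clamp happens where frame threading is set up, while the stored value
stays untouched for codecs that treat zero as "auto".

```c
#include <assert.h>

/* Hypothetical stand-in for the codec context; only the field used here. */
struct codec_ctx {
    int thread_count;   /* 0 is assumed to mean "let the codec decide" */
};

/* Frame-threading setup clamps locally and does not write back. */
static int frame_thread_count(const struct codec_ctx *c)
{
    assert(c);
    return c->thread_count < 1 ? 1 : c->thread_count;
}

/* A wrapper codec that threads on its own still sees the raw value. */
static int codec_wants_auto_threads(const struct codec_ctx *c)
{
    assert(c);
    return c->thread_count == 0;
}
```

With thread_count left at 0, frame_thread_count() yields 1 while
codec_wants_auto_threads() still reports that auto mode was requested.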
For intra codecs, ff_thread_finish_setup() is called automatically before
decoding starts. However, get_buffer can only be used before that call, so
adding this requirement broke frame threading for them. Fixed by delaying the
call until after get_buffer has finished.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
(cherry picked from commit ad9791e12b)
The assembler emits literal pools too far from the load instructions,
so we must do it explicitly at a suitable location.
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit 8b454c352f)
decode_init sets bands[0] == 2, so this loop always sets the band table
index (k) to zero.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
(cherry picked from commit a304def1dc)
This avoids the core substream extensions scan when the EXT_AUDIO_ID
field indicates no extensions or only unsupported extensions. The scan
is done only if the value of EXT_AUDIO_ID is unknown or indicates a
present XCh extension which we can decode.
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit 7e06e0ede3)
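The scan decision for the EXT_AUDIO_ID change above can be sketched as
a small predicate. The enum is an illustrative classification invented
for this example; its names and values are assumptions and do not
mirror the bitstream encoding of EXT_AUDIO_ID.

```c
#include <assert.h>

/* Illustrative classification of what EXT_AUDIO_ID announces. */
enum ext_audio_kind {
    EXT_KIND_NONE,        /* no core extension present */
    EXT_KIND_XCH,         /* XCh, which the decoder can handle */
    EXT_KIND_UNSUPPORTED, /* a known extension that cannot be decoded */
    EXT_KIND_UNKNOWN      /* value not recognised */
};

/* Returns nonzero when the core substream should be scanned. */
static int should_scan_core_extensions(enum ext_audio_kind kind)
{
    assert(kind <= EXT_KIND_UNKNOWN);
    return kind == EXT_KIND_XCH || kind == EXT_KIND_UNKNOWN;
}
```

Only the XCh and unknown cases trigger the scan; the other two skip it.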
There is no need to expand to 16 bits. Just use memcpy() to copy the raw data.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
(cherry picked from commit 1108f8998c)
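A rough sketch of the memcpy() approach described above, assuming the
samples are plain bytes. The function name and the truncating
behaviour are hypothetical choices for this example, not the decoder's
actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies up to len raw bytes into dst without widening them, never
 * writing more than dst_size bytes. Returns the number copied. */
static size_t copy_raw_samples(uint8_t *dst, size_t dst_size,
                               const uint8_t *src, size_t len)
{
    assert(dst && src);
    if (len > dst_size)
        len = dst_size;
    memcpy(dst, src, len);
    return len;
}
```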
This also adds output buffer size checks for AUDIO and SILENCE block types.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
(cherry picked from commit 1574eff3d2)
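The kind of output size check mentioned for SILENCE blocks might look
like the sketch below. The helper name, the error convention and the
0x80 fill value (the midpoint of unsigned 8-bit audio) are assumptions
made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Appends count bytes of silence at *pos. Returns 0 on success or -1
 * if the block would not fit; nothing is written in that case. */
static int write_silence(uint8_t *out, size_t out_size,
                         size_t *pos, size_t count)
{
    assert(out && pos && *pos <= out_size);
    if (count > out_size - *pos)
        return -1;
    memset(out + *pos, 0x80, count);
    *pos += count;
    return 0;
}
```

Comparing against the remaining space (out_size - *pos) rather than
adding count to *pos avoids an overflow in the check itself.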
The size should depend on the output sample size, not the internal bit depth.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
(cherry picked from commit a58bcb40b164b92957db73e702465808a9180126)
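To illustrate the sizing rule above: the required output size follows
from the width of the samples actually written, whatever precision the
decoder uses internally. The function and parameter names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for the output, based on the output sample width. */
static size_t output_buffer_bytes(size_t nb_samples, unsigned channels,
                                  unsigned out_bytes_per_sample)
{
    assert(channels > 0 && out_bytes_per_sample > 0);
    return nb_samples * channels * out_bytes_per_sample;
}
```

For example, a decoder that works at a higher internal bit depth but
emits 16-bit samples would pass 2 for out_bytes_per_sample.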
This allows the values to be used without changing C code and is closer to how
the other DEBUG flags work.
If this causes a problem for any user of this flag, please tell me and
I'll split the flag in two.
It was doubled in size for the LTP implementation. This brings it back
down to its original size.
(cherry picked from commit e22910b21a6c78b0159f98426b10c204f12bc15a)
The 4-tap filters should only access one row/column before the
reference block.
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit e0e46cae377347cbe1cd27c0d85568921b12c2ad)
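As a sketch of the access pattern behind the 4-tap fix above: for a
conventional even-length interpolation filter, the rows (or columns)
touched outside the reference block can be derived from the tap count.
The helper is hypothetical and assumes the filter window covers
taps/2 - 1 samples before the block and taps/2 after it.

```c
#include <assert.h>

/* Computes how many rows/columns an even-length filter reads before
 * and after the reference block. */
static void filter_margins(int taps, int *before, int *after)
{
    assert(before && after);
    assert(taps >= 2 && taps % 2 == 0);
    *before = taps / 2 - 1;
    *after  = taps / 2;
}
```

Under that assumption a 4-tap filter needs one row before the block,
while a 6-tap filter needs two.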
GCC 4.3 and later are more particular about signedness matching
in vector operations. The operations under if(rangered) were
missing assignments and thus had no effect.
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit 381efba0ecedd41575f99ba9e9bd3826551079f6)
Merging these functions allows merging some loops, which makes the
resulting code (particularly after SIMD optimizations) much faster.
(cherry picked from commit f8bed30d8b176fa030f6737765338bb4a2bcabc9)
The advantage is that it allows us to combine several loops into a single
one, and these can eventually be merged into the IDCT itself. Also, it
allows us to remove vc1_put_block(), and makes CODEC_FLAG_GRAY faster.
(cherry picked from commit bbfd2e7ab4e2ae0b934657fe51afdbbbaead52b7)
lsf_r is an array of int16_t, not float.
Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit 1efa772e20be5869817b2370a557bb14e7ce2fff)
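The element-type mistake fixed above is easy to avoid by deriving the
size from the destination pointer instead of naming a type. This is a
generic sketch; LSF_COUNT and save_lsf() are hypothetical and not
taken from the decoder.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LSF_COUNT 10   /* hypothetical number of coefficients */

/* Copies the coefficients using the size of the actual element type,
 * so the byte count stays right if the type ever changes. */
static void save_lsf(int16_t *dst, const int16_t *src)
{
    assert(dst && src);
    memcpy(dst, src, LSF_COUNT * sizeof(*dst));
}
```

Writing LSF_COUNT * sizeof(float) here would copy more bytes than the
int16_t arrays hold wherever float is wider than int16_t.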
Advanced profile never uses "range reduction", so vc1_put_block() quite
literally just calls put_pixels_clamped() from vc1_decode_i_blocks_adv().
By inlining the function, we can prevent calling IDCT8x8 if
CODEC_FLAG_GRAY is set, and we don't have to scale the coeffs in the
[0,256] range, but can instead use put_signed_pixels_clamped().
(cherry picked from commit 70aa916e4630bcec14439a2d703074b6d4c890a8)
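A scalar sketch of the signed store mentioned above, assuming that
put_signed_pixels_clamped() adds a bias of 128 to each coefficient and
clamps the result to [0,255]. The row helper below is a hypothetical
illustration, not the dsputil implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Stores n signed coefficients as unsigned pixels: bias by 128, then
 * clamp to the 8-bit range. */
static void put_signed_row(const int16_t *block, uint8_t *dst, int n)
{
    int i;

    assert(block && dst && n >= 0);
    for (i = 0; i < n; i++) {
        int v = block[i] + 128;
        dst[i] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
    }
}
```

Because the bias is applied at store time, the coefficients do not
need to be rescaled into an unsigned range beforehand.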
With negative stride, the start of the edge_emu buffer should be pointing to
the last line, not the end of the buffer.
With positive stride, pointing to the end of the buffer was completely wrong.
(cherry picked from commit a89f4ca005)
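The pointer arithmetic behind the stride fix above can be sketched as
follows. The helper is hypothetical and assumes a buffer of height
lines, each |stride| bytes apart, where line i is addressed as
start + i * stride.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the address of line 0. With a negative stride that is the
 * last line in memory; with a positive stride it is the buffer start. */
static uint8_t *first_line(uint8_t *buf, int height, ptrdiff_t stride)
{
    assert(buf && height > 0);
    if (stride < 0)
        return buf + (height - 1) * -stride;
    return buf;
}
```

With a negative stride, stepping height - 1 lines from the returned
pointer lands exactly on buf, so every access stays inside the buffer.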