Optimized memset with mmi in following functions:
1. ff_h264_add_pixels4_8_mmi.
2. ff_h264_idct_add_8_mmi.
3. ff_h264_idct8_add_8_mmi.
This optimization improved h264 decoding performance about 1.3%(tested on loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Reoptimize function ff_put_h264_chroma_mc8_mmi and ff_avg_h264_chroma_mc8_mmi.
Performance of h264 decoding improved about 5%(from 69fps to 73fps, tested on loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Performance of mpeg4 decoding improved about 23%(from 128fps to 158fps, tested on loongson 3A3000).
Reoptimized following functions with mmi.
1. ff_simple_idct_put_8_mmi
2. ff_simple_idct_add_8_mmi
3. ff_simple_idct_8_mmi
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Also make sure we set the URL context max packet size accordingly.
Based on a patch by Tudor Suciu <tudor.suciu@gmail.com>
Signed-off-by: Marton Balint <cus@passwd.hu>
Entries are always at least 8 bytes per the parsing code, so if we
see an impossible entry count avoid massive allocations. This is
similar to an existing check in mov_read_stsc().
Since ff_mov_read_stsd_entries() does eof checks, an alternative
approach could be to clamp the entry count to atom.size / 8.
Signed-off-by: Dale Curtis <dalecurtis@chromium.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
AV_CODEC_FLAG_GLOBAL_HEADER should be set before calling avcodec_open2() to have any effect.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
- Allow to add deps in any order rather than "in linking order".
- Expand deps chains as required rather than just once.
- Validate that there are no cycles.
- Validate that [after expansion] deps are limited to other fflibs.
- Remove expectation for a specific output order of unique().
Previously when adding items to <fflib>_deps, developers were
required to add them in linking order. This can be awkward and
bug-prone, especially when a list is not empty, e.g. when adding
conditional deps.
It also implicitly expected unique() to keep the last instance of
recurring items such that these lists maintain their linking order
after removing duplicate items.
This patch mainly allows to add deps in any order by keeping just
one master list in linking order, and then reordering all the
<fflib>_deps lists to align with the master list order.
This master list is LIBRARY_LIST itself, where otherwise its order
doesn't matter.
The patch also removes a limit where these deps lists were expanded
only once. This could have resulted in incomplete expanded lists,
or forcing devs to add already-deducable deps to avoid this issue.
Note: it is possible to deduce the master list order automatically
from the deps lists, but in this case it's probably not worth the
added complexity, even if minor. Maintaining one list should be OK.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes the following warnings:
libavcodec/v4l2_m2m_enc.c:51:12: warning: missing braces around initializer
libavcodec/v4l2_m2m_enc.c:71:12: warning: missing braces around initializer
PTS is in microseconds, so correct field name is out_time_us.
Old field out_time_ms kept for now - will be removed after a suitable transition
period.
Fixes#7345
Just remove some dead variable assignments, unneeded variables and
change the FFMAX order to something more readable. Still identical.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Much simpler than regular decoding, does allow for 5.1 and 7.1
streams to be decoded without desync.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Unlike the range, the gradient start value does not have to be lower
than the end value.
Does allow more files to be correctly decoded without errors.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
x4 - x25 faster.
check_deps() recursively enables/disables components, and its loop is
iterated nearly 6000 times. It's particularly slow in bash - currently
consuming more than 50% of configure runtime, and about 20% with other
shells.
This commit applies few local optimizations, most effective first:
- Use $1 $2 ... instead of pushvar/popvar, and same at enable_deep*
- Abort early in one notable case - empty deps, to avoid costly no-op.
- Smaller changes which do add up:
- Handle ${cfg}_checking locally instead of via enable[d]/disable
- ${cfg}_checking: test done before inprogress - x2 faster in 50%+
- one eval instead of several at the empty-deps early abort path.
- The "actual work" part is unmodified - just its surroundings.
Biggest speedups (relative and absolute) are observed with bash.
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Tested-by: Helmut K. C. Tessarek <tessarek@evermeet.cx>
Tested-by: Dave Yeo <daveryeo@telus.net>
Tested-by: Reino Wijnsma <rwijnsma@xs4all.nl>
Signed-off-by: James Almer <jamrial@gmail.com>
x4 - x10 faster.
Inside print_enabled components, the filter_list case invokes sed
about 350 times to parse the same source file and extract different
info for each arg. This is never instant, and on systems where fork is
slow (notably MSYS2/Cygwin on windows) it takes many seconds.
Change it to use sed once on the source file and set env vars with the
parse results, then use these results inside the loop.
Additionally, the cases of indev_list and outdev_list are very
infrequent, but nevertheless they're faster, and arguably cleaner, with
shell parameter substitutions than with command substitutions.
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Tested-by: Helmut K. C. Tessarek <tessarek@evermeet.cx>
Tested-by: Dave Yeo <daveryeo@telus.net>
Tested-by: Reino Wijnsma <rwijnsma@xs4all.nl>
Signed-off-by: James Almer <jamrial@gmail.com>
x50 - x200 faster.
Currently configure spends 50-70% of its runtime inside a single
function: flatten_extralibs[_wrapper] - which does string processing.
During its run, nearly 20K command substitutions (subshells) are used,
including its callees unique() and resolve(), which is the reason
for its lengthy run.
This commit avoids all subshells during its execution, speeding it up
by about two orders of magnitude, and reducing the overall configure
runtime by 50-70% .
resolve() is rewritten to avoid subshells, and in unique() and
flatten_extralibs() we "inline" the filter[_out] functionality.
Note that logically, "unique" functionality has more than one possible
output (depending on which of the recurring items is kept). As it
turns out, other parts expect the last recurring item to be kept
(which was the original behavior of uniqie()). This patch preservs
its output order.
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Tested-by: Helmut K. C. Tessarek <tessarek@evermeet.cx>
Tested-by: Dave Yeo <daveryeo@telus.net>
Tested-by: Reino Wijnsma <rwijnsma@xs4all.nl>
Signed-off-by: James Almer <jamrial@gmail.com>
Encoder frame_number may be double-counted if some frames are cached and then flushed.
Take qsv encoder (some frames are cached firsty for asynchronism) as example,
./ffmpeg -loglevel verbose -hwaccel qsv -c:v h264_qsv -i in.mp4 -vframes 100 -c:v h264_qsv out.mp4
frame_number passed to encoder is double-counted and larger than the accurate value.
Libx264 encoding with B frames can also reproduce it.
Signed-off-by: Zhong Li <zhong.li@intel.com>
Fixes: signed integer overflow: -19818 + -2147483648 cannot be represented in type 'int'
Fixes: 9545/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_SNOW_fuzzer-4928769537081344
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>