Commit Graph

107834 Commits

Author SHA1 Message Date
Derek Buitenhuis e1e981c65e mov: Compare frag times in correct time base when seeking a stream without a corresponding sidx
Some muxers, such as GPAC, create files with only one sidx, but two streams
muxed into the same fragments pointed to by this sidx.

Prevously, in such a case, when we seeked in such files, we fell back
to, for example, using the sidx associated with the video stream, to
seek the audio stream, leaving the seekhead in the wrong place.

We can still do this, but we need to take care to compare timestamps
in the same time base.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2022-08-19 17:02:55 +01:00
Andreas Rheinhardt 8bec225c3c swscale/x86/yuv2yuvX: Remove unused ff_yuv2yuvX_mmx()
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-19 12:01:34 +02:00
Andreas Rheinhardt afd9da24d9 avcodec/mpegvideo_dec: Don't sync AVCodecContext fields manually
They are already synced generically in update_context_from_thread()
in pthread_frame.c.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-18 20:20:50 +02:00
Andreas Rheinhardt 22e157c1c6 avcodec/mpegvideo_dec: Remove commented-out cruft
The fields in question were removed in
759001c534.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-18 17:25:10 +02:00
Chema Gonzalez 59225b459f doc: fix binary values of SI prefixes
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-18 17:10:42 +02:00
Andreas Rheinhardt 3553b70d6d avcodec/ffv1enc: Remove redundant wrapper
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-18 16:57:47 +02:00
Andreas Rheinhardt 7e9a790441 avcodec/ffv1enc: Don't create and keep unnecessary reference
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-18 16:57:47 +02:00
Andreas Rheinhardt f76cef5c51 avcodec/get_buffer: Don't get AVPixFmtDescriptor unnecessarily
It is unused since 3575a495f6
(and the error message is dangerous: av_get_pix_fmt_name(format)
returns NULL iff av_pix_fmt_desc_get(format) returns NULL
and using a NULL string for %s would be UB).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-18 16:57:47 +02:00
Andreas Rheinhardt e506843183 avcodec/mpegpicture: Reset fields explicitly instead of memsetting them
Improves the grepability of the code.
(Furthermore, I hope that no compiler will really call memset
for 28 bytes.)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-18 16:57:47 +02:00
Andreas Rheinhardt f0ea5094af avcodec/h263dec: Don't set frame parameters redundantly
This frame will be reset later in ff_mpv_frame_start()
anyway.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-18 16:57:47 +02:00
Andreas Rheinhardt 74d623914f avcodec/h263dec: Remove redundant code to set cur_pic_ptr
It is done later in ff_mpv_frame_start() (and nobody uses
current_picture_ptr between setting it in ff_mpv_frame_start()).

(The reason the vsynth*-h263-obmc ref files change is because
the call to ff_find_unused_picture() now happens after the older
pictures have been unreferenced in ff_mpv_frame_start(),
so that their slots in the picture array can be immediately
reused; the obmc code is somehow buggy and changes its output
depending on the earlier contents of the motion_val buffer.)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-18 16:53:41 +02:00
Alan Kelly da0a37bab7 checkasm/sw_scale: hscale does not requires cpuflag test.
This is done in ff_shuffle_filter_coefficients.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2022-08-18 16:24:48 +02:00
Alan Kelly a38293e444 libswscale: Enable hscale_avx2 for all input sizes.
ff_shuffle_filter_coefficients shuffles the tail as required.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2022-08-18 16:24:48 +02:00
Alan Kelly a6724285fd sws: allow avx2 hscale to process inputs of any size.
The main loop processes blocks of 16 pixels. The tail processes blocks
of size 4.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2022-08-18 16:24:48 +02:00
Alan Kelly 51a34e8525 sws: Replace call to yuv2yuvX_mmx by yuv2yuvX_mmxext
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-18 16:19:13 +02:00
J. Dekker ce2f47318b lavc/aarch64: hevc_add_res add 12bit variants
hevc_add_res_4x4_12_c: 46.0
hevc_add_res_4x4_12_neon: 18.7
hevc_add_res_8x8_12_c: 194.7
hevc_add_res_8x8_12_neon: 25.2
hevc_add_res_16x16_12_c: 716.0
hevc_add_res_16x16_12_neon: 69.7
hevc_add_res_32x32_12_c: 3820.7
hevc_add_res_32x32_12_neon: 261.0

Signed-off-by: J. Dekker <jdek@itanimul.li>
2022-08-18 15:04:43 +02:00
Martin Storsjö 48be6616d0 aarch64: me_cmp: Remove a leftover unnecessary instruction
This was missed in a2e45ad407.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-18 12:14:53 +03:00
Hubert Mazur 70efa4d011 lavc/aarch64: Add neon implementation for pix_abs8
Provide optimized implementation of pix_abs8 function for arm64.

Performance comparison tests are shown below.
- pix_abs_1_0_c: 101.2
- pix_abs_1_0_neon: 22.5
- sad_1_c: 101.2
- sad_1_neon: 22.5

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-18 12:07:26 +03:00
Hubert Mazur 74312e80d7 lavc/aarch64: Add neon implementation for sse8
Provide optimized implementation of sse8 function for arm64.

Performance comparison tests are shown below.
- sse_1_c: 130.7
- sse_1_neon: 29.7

Benchmarks and tests run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-18 12:07:26 +03:00
Hubert Mazur a2e45ad407 lavc/aarch64: Add neon implementation for pix_abs16_y2
Provide optimized implementation of pix_abs16_y2 function for arm64.

Performance comparison tests are shown below.
pix_abs_0_2_c: 317.2
pix_abs_0_2_neon: 37.5

Benchmarks and tests run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-18 12:07:26 +03:00
Hubert Mazur d7abb7d143 lavc/aarch64: Add neon implementation for sse4
Provide neon implementation for sse4 function.

Performance comparison tests are shown below.
- sse_2_c: 80.7
- sse_2_neon: 31.0

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-18 12:07:26 +03:00
Hubert Mazur ad251fd262 lavc/aarch64: Add neon implementation for sse16
Provide neon implementation for sse16 function.

Performance comparison tests are shown below.
- sse_0_c: 268.2
- sse_0_neon: 43.5

Benchmarks and tests run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-18 12:07:26 +03:00
Martin Storsjö 60109d5b3d aarch64: me_cmp: Fix the indentation of function declarations
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-18 12:07:26 +03:00
Gyan Doshi d5544f6457 ffprobe: restore reporting error code for failed inputs
c11fb46731 led to a regression whereby the return code for missing
input or input probe is overridden by writer close return code and
hence not conveyed in the exit code.
2022-08-17 16:46:05 +05:30
Andreas Rheinhardt 444d80bd87 avcodec/me_cmp: Remove now incorrect av_assert2()
Since d69d12a5b9 these av_assert2()
(or more exactly, the ones in hadamard8_diff8x8_c() and
hadamard8_intra8x8_c()) are hit. So just remove all of these asserts.
(If the test were improved to know which functions expect h == 8
and which support any value, the asserts could be readded
at the appropriate places.)

Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-17 11:07:53 +02:00
Martin Storsjö 1eaa575cf1 tools: Make sure to create the tools directory before building decode_simple.o
This directory dependency is normally added implicitly by rules
in ffbuild/common.mak; for tools it's created by a rule for TOOLOBJS.
TOOLOBJS is populated implicitly from TOOLS, and decode_simple.o
doesn't end up there because it's an odd occurrance of a lone
object file in the tools subdirectory, not belonging to any other
tool.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-17 00:00:50 +03:00
Martin Storsjö d69d12a5b9 checkasm: motion: Test different h parameters
Previously, the checkasm test always passed h=8, so no other cases
were tested.

Out of the me_cmp functions, in practice, some functions are hardcoded
to always assume a 8x8 block (ignoring the h parameter), while others
do use the parameter. For those with hardcoded height, both the
reference C function and the assembly implementations ignore the
parameter similarly.

The documentation for the functions indicate that heights between
w/2 and 2*w, within the range of 4 to 16, should be supported. This
patch just tests random heights in that range, without knowing what
width the current function actually uses.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-17 00:00:50 +03:00
Martin Storsjö dc55e63578 x86: Don't hardcode the height to 8 in sad8_xy2_mmx
The height is hardcoded in some of the me_cmp functions, but not
in all of them. But in the case of all other functions, it's hardcoded
in the same place in SIMD functions as in the C reference functions,
while this one function differs from the behaviour of the C code.

(Before 542765ce3e, there were a
couple other sad8_*_mmx functions with similar hardcoded height.)

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-17 00:00:50 +03:00
Martin Storsjö 21c2c57ba5 checkasm: Provide enough alignment in the new yuv2plane1 test
This fixes the checkasm test in some setups on x86.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-16 23:47:16 +03:00
J. Dekker aa9eabb7a5 lavc/aarch64: reformat add_res funcs
Signed-off-by: J. Dekker <jdek@itanimul.li>
2022-08-16 14:00:34 +02:00
J. Dekker ea6ecb12aa checkasm/hevc_add_res: add 12bit test
Also fix the bug where in every other byte only the lower 2 bits were
used in the 8bit test.

Signed-off-by: J. Dekker <jdek@itanimul.li>
2022-08-16 14:00:34 +02:00
Swinney, Jonathan 0d7caa5b09 swscale/aarch64: add vscale specializations
This commit adds new code paths for vscale when filterSize is 2, 4, or
8. By using specialized code with unrolling to match the filterSize we
can improve performance.

On AWS c7g (Graviton 3, Neoverse V1) instances:
                                 before   after
yuv2yuvX_2_0_512_accurate_neon:  558.8    268.9
yuv2yuvX_4_0_512_accurate_neon:  637.5    434.9
yuv2yuvX_8_0_512_accurate_neon:  1144.8   806.2
yuv2yuvX_16_0_512_accurate_neon: 2080.5   1853.7

Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-16 13:40:42 +03:00
Swinney, Jonathan 3e708722a2 swscale/aarch64: vscale optimization
Use scalar times vector multiply accumlate instructions instead of
vector times vector to remove the need for replicating load instructions
which are slightly slower.

On AWS c7g (Graviton 3, Neoverse V1) instances:
yuv2yuvX_8_0_512_accurate_neon:  1144.8  987.4
yuv2yuvX_16_0_512_accurate_neon: 2080.5 1869.4

Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-16 13:40:42 +03:00
Swinney, Jonathan 4dcd191a50 checkasm: updated tests for sw_scale
Change the reference to exactly match the C reference in swscale,
instead of exactly matching the x86 SIMD implementations (which
differs slightly). Test with and without SWS_ACCURATE_RND - if this
flag isn't set, the output must match the C reference exactly,
otherwise it is allowed to be off by 2.

Mark a couple x86 functions as unavailable when SWS_ACCURATE_RND
is set - apparently this discrepancy hasn't been noticed in other
exact tests before.

Add a test for yuv2plane1.

Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-16 13:40:42 +03:00
Timo Rothenpieler 317f5252c0 doc/APIchanges: add missing rgbaf16 pixfmt entry 2022-08-16 12:31:03 +02:00
Anton Khirnov ab31473830 fftools/ffmpeg: store a separate copy of input codec parameters
Use it instead of AVStream.codecpar in the main thread. While
AVStream.codecpar is documented to only be updated when the stream is
added or avformat_find_stream_info(), it is actually updated during
demuxing. Accessing it from a different thread then constitutes a race.

Ideally, some mechanism should eventually be provided for signalling
parameter updates to the user. Then the demuxing thread could pick up
the changes and propagate them to the decoder.
2022-08-16 11:09:09 +02:00
Swinney, Jonathan 75ffca7eef libswscale/aarch64: add another hscale specialization
This specialization handles the case where filtersize is 4 mod 8, e.g.
12, 20, etc. Aarch64 was previously using the c function for this case.
This implementation speeds up that case significantly.

hscale_8_to_15__fs_12_dstW_512_c: 6234.1
hscale_8_to_15__fs_12_dstW_512_neon: 1505.6

Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-08-16 12:08:38 +03:00
Zhao Zhili 1af7797d21 avformat/mov: fix encryption index in the case of multiple trun
frag_stream_info->index_entry isn't the first sample/trun index.
cenc.frag_index_entry_base failed to catch the case since
current_index > 0.

Fix ticket #9807.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2022-08-16 18:47:40 +08:00
Zhao Zhili 98dcdd1868 avformat/mov: fix frag_index.current out of sync
frag_index.current is used by cenc_filter, and is updated inside
mov_read_moof. It can out of sync regarding to mov_read_packet.

Partly fix ticket #9807.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2022-08-16 18:47:33 +08:00
Lynne ae66a9db7b
lavu/tx: optimize and simplify inverse MDCTs
Convert the input from a scatter to a gather instead,
which is faster and better for SIMD.
Also, add a pre-shuffled exptab version to avoid
gathering there at all. This doubles the exptab size,
but the speedup makes it worth it. In SIMD, the
exptab will likely be purged to a higher cache
anyway because of the FFT in the middle, and
the amount of loads stays identical.

For a 960-point inverse MDCT, the speedup is 10%.

This makes it possible to write sane and fast SIMD
versions of inverse MDCTs.
2022-08-16 01:22:38 +02:00
Derek Buitenhuis 412922cc6f ipfsgateway: Remove default gateway
A gateway can see everything, and we should not be shipping a hardcoded
default from a third party company; it's a security risk.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2022-08-15 20:09:13 +01:00
Andreas Rheinhardt 6789b73a81 avcodec/mpegvideo: Don't zero unnecessarily
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-15 18:19:19 +02:00
Andreas Rheinhardt 9703f5d87d avcodec/mpegvideodec: Constify some functions
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-15 18:19:19 +02:00
Andreas Rheinhardt b645138a34 avcodec/mpegpicture: Don't copy unnecessarily, fix race
mpegvideo uses an array of Pictures and when it is done with using
them, it only unreferences them incompletely: Some buffers are kept
so that they can be reused lateron if the same slot in the Picture
array is reused, making this a sort of a bufferpool.
(Basically, a Picture is considered used if the AVFrame's buf is set.)
Yet given that other pieces of the decoder may have a reference to
these buffers, they need not be writable and are made writable using
av_buffer_make_writable() when preparing a new Picture. This involves
reading the buffer's data, although the old content of the buffer
need not be retained.

Worse, this read can be racy, because the buffer can be used by another
thread at the same time. This happens for Real Video 3 and 4.

This commit fixes this race by no longer copying the data;
instead the old buffer is replaced by a new, zero-allocated buffer.

(Here are the details of what happens with three or more decoding threads
when decoding rv30.rm from the FATE-suite as happens in the rv30 test:
The first decoding thread uses the first slot of its picture array
to store its current pic; update_thread_context copies this for the
second thread that decodes a P-frame. It uses the second slot in its
Picture array to store its P-frame. This arrangement is then copied
to the third decode thread, which decodes a B-frame. It uses the third
slot in its Picture array for its current frame.
update_thread_context copies this to the next thread. It unreferences
the third slot containing the other B-frame and then it reuses this
slot for its current frame. Because the pic array slots are only
incompletely unreferenced, the buffers of the previous B-frame are
still in there and they are not writable; in fact the previous
thread is concurrently writing to them, causing races when making
the buffer writable.)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-15 18:10:31 +02:00
Andreas Rheinhardt 70f3035482 avcodec/avcodec: Remove redundant check
At this point active_thread_type is set iff active_thread_type
is set to FF_THREAD_FRAME iff AVCodecInternal.frame_thread_encoder
is set.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-15 18:10:31 +02:00
Andreas Rheinhardt 3040876833 avcodec/avcodec: Move initializing frame-thrd encoder to encode_preinit
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-15 18:10:31 +02:00
Timo Rothenpieler c469c3c3b1 avfilter/vsrc_ddagrab: add options for more control over output format fallback 2022-08-13 15:22:14 +02:00
Timo Rothenpieler 6a574e3901 avfilter/vsrc_ddagrab: add rgbaf16 output support 2022-08-13 15:21:59 +02:00
Timo Rothenpieler dd94a03468 avutil/hwcontext_d3d11va: add support for rgbaf16 pixel format 2022-08-13 15:21:59 +02:00
Timo Rothenpieler e95b08a7dd lavu/pixfmt: add packed RGBA float16 format
This is the default format of the Windows compositor and what DXGI
Desktop Duplication will give you for any kind of HDR output.
2022-08-13 15:21:46 +02:00