The audio format neogitation code was pretty complicated, although the
idea was simple: when the format changes (or on the first audio frame),
filter only the new frame through the entire filter chain, discard the
resulting frame, but use the format to initialize the AO.
This was useful for "fudging" the channel remix behavior (upmix or
downmix), and moving it before other filters. Apparently this was useful
for things like DRC filters, which might work better in stereo, and
which also can only achieve the desired volume levels by doing it before
a downmix, which would modify the volume. This mechanism was introduced
in commit 60048b7eb9 (which the commit message also describes as
"idiotic heuristic"). Knowing the output format is inherently necessary
for this, because otherwise we can't know what the hell the user defined
filters will do.
There were problems with robustness. Some filters needed more than one
frame. Resampling in particular would discard initial audio at high
resampling ratios. Some filters might drop audio intentionally (like
clipping data on timestamp ranges). There were also allegations that
some decoders output 0 length frames (although that is invalid in
libavcodec). The state machine was excessively complex and hard to
understand too.
There are 3 things that could have been done:
1. Fix robustness problems by doing more heuristics, like repeating
audio frames or simply decoding several frames. Since filters can
behave differently, this would have added lots of complexity.
2. Make use of libavfilter's format negotiation, and add the same to
mpv builtin filters. This is sort of annoying, because the format
negotiation in libavfilter changes the state of the filters. It also
reports only some parameters (mostly all for audio, but a lot of
holes for video). It would remove some of the state machine, but not
all.
3. Drop the channel remix fudging, and do the same as the video chain.
This would not require format negotiation, but instead you can just
filter the audio frames, and look what comes out of it. If nothing
comes out, simply never create an AO.
This commit selects option 3. It removes the remix fudging, which means
the loss of a feature. Users can instead add "--af=format=channels=2"
before their DRC filter, or something. I'm also considering changing the
default for --audio-channels back to stereo, and downmix in the decoder
or at the start of the filter chain, which would give the same results,
except requiring more configuration.
Implementation-wise, this is still a bit different from the video path.
The VO always remains the same instance, while the AO might have to be
recreated on configuration changes. This still requires explicit format
change handling + draining old data, but by putting it into
f_autoconvert, not much new code is needed.
This tried to avoid running the audio/video functions depending on
whether any of the audio or video related format restrictions were
called (so the filter would show an error if a mismatching media type
was passed in). It was a shit idea anyway, so fuck it.
At least the libavcodec MediaCodec wrapper sometimes seems to break the
libavcodec API, and does the following sequence:
send_packet() -> EAGAIN
receive_frame() -> EAGAIN
send_packet() -> OK
The libavcodec API never allows returning EAGAIN from both functions, so
we discarded packets in this case. Change it to retrying decoding, for
the sake of MediaCodec. Note that it could also happen due to internal
bugs in the vd_lavc.c hw fallback code, but if there are any remaining,
they should be fixed properly instead.
Requested.
Just bail out immediately (and disable audio) if format probing has no
result, instead of doing nothing and then apparently freezing.
This can happen with bogus filters, cases where the first audio frame is
essentially dropped by filters (can happen with large resampling
factors), and if the audio track contains no packets at all, or all
packets fail to decode.
The rest of the function should be executed only if both are set. It
seems like in practice this didn't happen yet with only one of them
unset, but in theory it's possible. Found by Coverity.
Also print type and help string. Also print AV_OPT_TYPE_CONST, which are
like the mpv choice option type, except different. Print them as
separate lines because FFmpeg usually has help strings for them too.
There was the following problem: if a filter graph had asynchronous
filters, and the filter graph user did not call mp_filter_run() (and
only accessed the mp_pins), then filtering could stall, because using
mp_pin_out_request_data() only recursively invoked filtering if the
data_requested flag wasn't already set. The latter can happen if a
request was tried earlier but failed, and then an asynchronous filter
actually produced output that would satisfy the request. Obviously, it
has to invoke filtering again to get the requested frame.
Fix this by organizing the code differently, and making sure to invoke
mp_filter_run() on every request (if there's nothing to do, it doesn't
do anything anyway). Simplify it a bit by removing things which are not
needed, like connecting filter graphs with different root filters.
Avoids that the audio decoder shows up with a "[root/ad]" log prefix.
This is an annoying consequence of mp_log_new(): if a log parent doesn't
have a prefix with "!", it'll add the prefix to all mp_logs created from
it. This should probably be fixed in the mp_log code itself, but doing
so would be a big deal as we'd have to make sure all the other log
prefixes are what we want. So work it around for now.
Before this, we made deinterlacing dependent on the video codec metadata
(AVFrame.interlaced_frame for libavcodec). So even if --deinterlace=yes
was set, we skipped deinterlacing if the flag wasn't set. This is very
unreliable and there are many streams with flags incorrectly set.
The potential problem is that this might upset people who alwase enabled
deinterlace and hoped it worked. But it's likely these people were
screwed by this setting anyway. The new behavior is less tricky and
easier to understand, and this preferable. Maybe one day we could
introduce a --deinterlace=auto, which does the right thing, but of
course this would be hard to implement (esecially with hwdec).
Fixes#5219.
The recent changes to player/audio.c moved PTS jump detection to after
audio filtering. This was mostly done for convenience, because dataflow
between decoder and filters was made "automatic", and jump detection
would have to be done as filter. Now move it back to after decoders,
again out of convenience. The future direction is to make the dataflow
between filters and AO automatic, so this is a bit in the way. Another
reason is that speed changes tend to cause jumps - these are legitimate,
but get annoying quickly.
Another "what was I thinking" thing - destroying filters explicitly
skipped async wakeups for no reason. These were notifications for
filters that are not going to be destroyed too, and so their wakeup will
be lost, leading to stalled playback. This is completely unnecessary and
the special code can be removed.
Fixes#5488. (This case destroyed all audio filters due to AO init
failure, which could make clear out the f_demux_in.c wakeup for video,
and "freeze" playback.)
We called mp_pin_out_request_data() if there was input _and_ output.
This is not how it should be: we should request new input only if output
was requested, but we could not produce any output.
On the other hand, the upper half of the process() function will request
new input if output is required, but all input was consumed. But this
requires calling mp_filter_internal_mark_progress(), as otherwise the
general filter logic would not know that we can continue.
If the speed is changed by a large amount, we need to effectively change
the output rate by a large amount, and swr_set_compensation() is
apparently not designed to handle such large changes well. So it's
better to reinitialize the resampler on all large changes.
Also, strictly reinitialize the resampler if the rate changes, otherwise
it could happen that libavresample (which does not automatically
initialize resampling if avresample_set_compensation() is used) would
never apply speed changes properly.
Also document some conditions better that handle corner cases (remove
the inline condition from the if gating the compensation code).
It also appears that we crashed with very large compensation ratios
(when raising audio speed quickly by keeping the "[" key down), and this
commit accidentally mitigates it by not allowing large compensation.
Similar to the previous commit, and for the same reasons. Unlike with
af_scaletempo, resampling does not have a natural frame size, so we set
an arbitrary size limit on output frames. We add a new option to control
this size, although I'm not sure whether anyone will use it, so mark it
for testing only.
Note that we go through some effort to avoid buffering data in
libswresample itself. One reason is that we might have to reinitialize
the resampler completely when changing speed, which drops the buffered
data. Another is that I'm not sure whether the resampler will do the
right thing when applying dynamic speed changes.
--vf=help will now list libavfilter filters, and e.g. --vf=yadif=help
will list libavfilter filter options.
The latter is rather bare, because the AVOption API is really awful
(holy shit how is it so bad), and would require us to handle _every_
option type manually.
Alternatively we could call av_opt_show2(), which ffmpeg uses for help
output in its CLI tools and which is much more detailed. But it's rather
foreign and forces output through av_log(), so I don't really want to
use it.
Use the decoder wrapper that was introduced for video. This removes all
code duplication the old audio decoder wrapper had with the video code.
(The audio wrapper was copy pasted from the video one over a decade ago,
and has been kept in sync ever since by the power of copy&paste. Since
the original copy&paste was possibly done by someone who did not answer
to the LGPL relicensing, this should also remove all doubts about
whether any of this code is left, since we now completely remove any
code that could possibly have been based on it.)
There is some complication with spdif handling, and a minor behavior
change (it will restrict the list of codecs to spdif if spdif is to be
used), but there should not be any difference in practice.
Move dec_video.c to filters/f_decoder_wrapper.c. It essentially becomes
a source filter. vd.h mostly disappears, because mp_filter takes care of
the dataflow, but its remains are in struct mp_decoder_fns.
One goal is to simplify dataflow by letting the filter framework handle
it (or more accurately, using its conventions). One result is that the
decode calls disappear from video.c, because we simply connect the
decoder wrapper and the filter chain with mp_pin_connect().
Another goal is to eventually remove the code duplication between the
audio and video paths for this. This commit prepares for this by trying
to make f_decoder_wrapper.c extensible, so it can be used for audio as
well later.
Decoder framedropping changes a bit. It doesn't seem to be worse than
before, and it's an obscure feature, so I'm content with its new state.
Some special code that was apparently meant to avoid dropping too many
frames in a row is removed, though.
I'm not sure how the source code tree should be organized. For one,
video/decode/vd_lavc.c is the only file in its directory, which is a bit
annoying.
Get rid of the old vf.c code. Replace it with a generic filtering
framework, which can potentially handle more than just --vf. At least
reimplementing --af with this code is planned.
This changes some --vf semantics (including runtime behavior and the
"vf" command). The most important ones are listed in interface-changes.
vf_convert.c is renamed to f_swscale.c. It is now an internal filter
that can not be inserted by the user manually.
f_lavfi.c is a refactor of player/lavfi.c. The latter will be removed
once --lavfi-complex is reimplemented on top of f_lavfi.c. (which is
conceptually easy, but a big mess due to the data flow changes).
The existing filters are all changed heavily. The data flow of the new
filter framework is different. Especially EOF handling changes - EOF is
now a "frame" rather than a state, and must be passed through exactly
once.
Another major thing is that all filters must support dynamic format
changes. The filter reconfig() function goes away. (This sounds complex,
but since all filters need to handle EOF draining anyway, they can use
the same code, and it removes the mess with reconfig() having to predict
the output format, which completely breaks with libavfilter anyway.)
In addition, there is no automatic format negotiation or conversion.
libavfilter's primitive and insufficient API simply doesn't allow us to
do this in a reasonable way. Instead, filters can use f_autoconvert as
sub-filter, and tell it which formats they support. This filter will in
turn add actual conversion filters, such as f_swscale, to perform
necessary format changes.
vf_vapoursynth.c uses the same basic principle of operation as before,
but with worryingly different details in data flow. Still appears to
work.
The hardware deint filters (vf_vavpp.c, vf_d3d11vpp.c, vf_vdpaupp.c) are
heavily changed. Fortunately, they all used refqueue.c, which is for
sharing the data flow logic (especially for managing future/past
surfaces and such). It turns out it can be used to factor out most of
the data flow. Some of these filters accepted software input. Instead of
having ad-hoc upload code in each filter, surface upload is now
delegated to f_autoconvert, which can use f_hwupload to perform this.
Exporting VO capabilities is still a big mess (mp_stream_info stuff).
The D3D11 code drops the redundant image formats, and all code uses the
hw_subfmt (sw_format in FFmpeg) instead. Although that too seems to be a
big mess for now.
f_async_queue is unused.