lavc_process() calls the receive/send callbacks, which mirror
libavcodec's send/receive API. The receive API in particular can return
both a status code and a frame. Normally, libavcodec is pretty explicit
that it can't return both a frame and an error. But the receive callback
does more stuff in addition (vd_lavc does hardware decoding fallbacks
etc.). The previous commit shows an instance where this happened, and
where it leaked a frame in this case.
For robustness, check whether the frame is set first, i.e. trust it over
the status code. Before this, it checked for an EOF status first.
Hopefully is of no consequence otherwise. I made this change after
testing everything (can someone implement a test suite which tests this
exhaustively).
ad_lavc and vd_lavc use the lavc_process() helper to translate the
FFmpeg push/pull API to the internal filter API (which completely
mismatch, even though I'm responsible for both, just fucking kill me).
This interface was "slightly" too tight. It returned only a bool
indicating "progress", which was not enough to handle some cases (see
following commit).
While we're at it, move all state into a struct. This is only a single
bool, but we get the chance to add more if needed.
This fixes mpv falling asleep if decoding returns an error during
draining. If decoding fails when we already sent EOF, the state machine
stopped making progress. This left mpv just sitting around and doing
nothing.
A test case can be created with: echo $RANDOM >> image.png
This makes libavformat read a proper packet plus a packet of garbage.
libavcodec will decode a frame, and then return an error code. The
lavc_process() wrapper could not deal with this, because there was no
way to differentiate between "retry" and "send new packet". Normally, it
would send a new packet, so decoding would make progress anyway. If
there was "progress", we couldn't just retry, because it'd retry
forever.
This is made worse by the fact that it tries to decode at least two
frames before starting display, meaning it will "sit around and do
nothing" before the picture is displayed.
Change it so that on error return, "receiving" a frame is retried. This
will make it return the EOF, so everything works properly.
This is a high-risk change, because all these funny bullshit exceptions
for hardware decoding are in the way, and I didn't retest them. For
example, if hardware decoding is enabled, it keeps a list of packets,
that are fed into the decoder again if hardware decoding fails, and a
software fallback is performed. Another case of horrifying accidental
complexity.
Fixes: #6618
Form some reason (and because of my fault), vf_format converts image
formats, but nothing else. For example, setting the "colormatrix"
sub-parameter would not convert it to the new value, but instead
overwrite the metadata (basically "reinterpreting" the image data
without changing it).
Make the historical mistake worse, and go all the way and extend it such
that it can perform a conversion. For compatibility reasons, this needs
to be requested explicitly. (Maybe this would deserve a separate filter
to begin with, but things are messed up anyway. Feel free to suggest an
elegant and simple solution.)
This demonstrates how zimg can properly perform some conversions which
swscale cannot (see examples added to vf.rst).
Stupidly this requires 2 code paths, one for conversion, and one for
overriding the parameters.
Due to the filter bullshit (what was I thinking), this requires quite
some acrobatics that would not be necessary without these abstractions.
On the other hand, it'd definitely be more of a mess without it. Oh
whatever.
With the previous commit, this is dead code.
This also makes the f_autoconvert.c code for this dead code
(fortunately). Will probably remove this later.
Instead of using custom code.
Now if only f_lavfi knew what formats FFmpeg's vf_yadif accepts, this
could look much nicer, and wouldn't require the additional converter
filter setup.
This adds the function as seen in the f_autoconvert.h part of the patch.
It's pretty simple, but goes along with an intrusive code move. I guess
the resulting code is slightly nicer anyway.
Before this commit, enabling hardware deinterlacing via the
"deinterlace" option/property just failed if no hardware deinterlacing
was available. An error message was logged, and playback continued
without deinterlacing.
Change this, and try to copy the hardware surface to memory, and then
use yadif. This will have approximately the same effect as
--hwdec=auto-copy. Technically it's implemented differently, because
changing the hwdec mode is much more convoluted than just inserting a
filter for performing the "download". But the low level code for
actually performing the download is the same again.
Although performance won't be as good as with a hardware deinterlacer
(probably), it's more convenient than forcing the user to switch hwdec
modes separately. The "deinterlace" property is supposed to be a
convenience thing after all.
As far as the code architecture goes, it would make sense to auto-insert
such a download filter for all software-only filters that need it.
However, libavfilter does not tell us what formats a filter supports
(isn't that fucking crazy?), so all attempts to work towards this are
kind of hopeless. In this case, we merely have hardcoded knowledge that
vf_yadif definitely does not support hardware formats. But yes, this
sucks ass.
This just wraps the mp_image_hw_download() function as a filter and adds
some minor caching/error logging. (Shame that it needs to much
boilerplate, I guess.)
Will be used by the following commit. Wrapping it as filter seemed more
convenient than other choices.
Can be used with mp_chain_filters() to combine multiple filters into a
single one. This is a bit silly, but whatever. I'm making it an explicit
separate filter, because it lets the user access mp_filter.ppins against
all conventions.
This reverts commit 6385a5fd1b, and in
addition removes the code that automatically inserts the vavpp filter.
The reason is the same as the commit that is being reverted: this
filter seems to trigger driver bugs. It can cause GPU freezes or
just doesn't work.
This variant of disabling the filter is better. There was no way to
add the "force" parameter to the automatically inserted filter, so
the old approach just made manual filter insertion (with the --vf
option or "vf" command) more cumbersome.
skip-logo.lua is just what I wanted to have. Explanations are on the top
of that file. As usual, all documentation threatens to remove this stuff
all the time, since this stuff is just for me, and unlike a normal user
I can afford the luxuary of hacking the shit directly into the player.
vf_fingerprint is needed to support this script. It needs to scale down
video frames as part of its operation. For that, it uses zimg. zimg is
much faster than libswscale and generates more correct output. (The
filter includes a runtime fallback, but it doesn't even work because
libswscale fucks up and can't do YUV->Gray with range adjustment.)
Note on the algorithm: seems almost too simple, but was suggested to me.
It seems to be pretty effective, although long time experience with
false positives is missing. At first I wanted to use dHash [1][2], which
is also pretty simple and effective, but might actually be worse than
the implemented mechanism. dHash has the advantage that the fingerprint
is smaller. But exact matching is too unreliable, and you'd still need
to determine the number of different bits for fuzzier comparison. So
there wasn't really a reason to use it.
[1] https://pypi.org/project/dhash/
[2] http://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html
I once created this because someone wanted to use vapoursynth without
the Python dependency. No idea if anyone ever actually used it. It's
sort of icky (it calls itself "lazy" to preempt complaints about how
much it sucks), and complicates the build process. Kill it.
It seems much more promising to have something like this:
https://github.com/vapoursynth/vapoursynth/issues/386
This would either solve the build distribution problem by relaxing the
Python dependency, and/or allow a Lua backend to be included without
pain.
Some state wasn't reset when decoding was started without a seek reset
before it. The code used to rely on reset_decoder() resetting this
state, but since the commit referenced below, reset_decoder() does less
than reset().
Fix this by explicitly calling reset() on initialization.
Fixes: "f_decoder_wrapper: avoid full reset on timeline switch etc."
Before this commit, there was a single process_decoded_frame() function.
It handled various aspects of dealing with a newly decoded frame. Move
some of these to a separate process_output_frame() function.
This new function is called in the order the frames are returned to the
playback core. Some correct_audio_pts() (was process_audio_frame())
becomes slightly less awkward due to this, and the timestamp smoothing
can actually work in backward playback mode now (thus moving p->pts out
of reset_decoder()).
Behavior for normal playback also changes subtly. This shouldn't matter
in sane cases, but if you mix broken files, --no-correct-pts, and
timeline stuff, differences in behavior might be visible.
Timeline clipping (EDL/ordered chapters) works now, because it's done
before "transforming" the timestamps. Audio timestamp smoothing happens
after it, which is a behavior change, but should be more correct. This
still runs crazy_video_pts_stuff() before everything else. On the pther
hand, --no-correct-pts or missing timestamp processing is done last. But
these things didn't really work with timeline before.
Slightly cleaner. We don't need to awkwardly backup "some" state on
backwards playback. Due to not resetting last_format, normal timeline
switches don't unconditionally trigger recomputing of certain image
parameters. Also probably doesn't reset framedrop parameters, although I
don't care about that part.
This could lead to nonsense when backward playback is involved. Better
reduce the possible interactions. Besides, it's better to fully reset
things on seeks in general.
The only exception is has_broken_packet_pts, which enables hr-seek if
everything looks good. It's intended to trigger at the second hr-seek or
so if the file is normal, and to disable it if the file is broken. It
tries to avoid enabling the hr-seek logic before it can know about
whether things are "good", so resetting it on seeks would obviously
never enable it. Document it as explicit exception.
Some audio codecs will discard or cut the first frames when starting
decoding. While some of that works through well-defined mechanisms (like
initial padding), it's in general very codec/decoder specific, and not
really predictable. In addition, libavcodec isn't very good with
reporting "dropped" frames (and our internal interface reflects this).
It seems our only chance to handle this is through timestamps.
In theory, it would be best to discard frames that have timestamps
before the "resume" position. But since video has reordered timestamps,
we'd need to put some effort into finding this position. The first video
packet doesn't necessarily contain this timestamp. (In theory, we could
just do this in the demuxer with some trivial additional work, and set
it on the packet's kf_seek_pts field. Although this field is supposed to
contain just this value, the field is considered demuxer-internal, and I
didn't want to make matters worse by reusing it for the interface to the
decoder. With some more effort and buffering, we could calculate this
value within the decoder, but fuck that.)
The approach chosen in this commit is setting the timestamp to NOPTS.
This will break in some obscure situations, but backward playback is a
pretty obscure feature to begin with, so I considered this a reasonable
implementation choice.
Before passing a preroll packet to the decoder, its timestamps are set
to NOPTS. Frames that are returned from the decoder and have the NOPTS
timestamp are considered preroll and are discarded. This happens only
during "preroll" mode (preroll_discard==true), so it doesn't affect
normal forward playback. It's disabled on the first packet with a
timestamp, so it can tolerate some crap even in backward playback mode.
We don't check the dts fields out of laziness (decoded audio frames
don't even have this field).
I considered using an approach using the EDL clipping infrastructure (as
mentioned in the last two paragraphs in the commit message of commit
" demux_lavf: implement bad hack for backward playback of wav"). This
didn't work, and I blamed timestamp rounding within mpv for it. But the
problem was actually due to Matroska-rounded timestamps. Since the audio
frame size isn't exactly aligned to 1ms, there will be an overlap (or
gap) in the timestamps. This overlap is much smaller than 1ms, since
it's just the sub-millisecond remainder part of the audio frame size.
This makes the timestamps discontinuous and unreliable for the purpose
we wanted to use it. We can't just smooth the timestamps in the demuxer
either.
Shitty ancient hack that wastes my time all the time.
demux.c: always return the coverart packet as soon as possible, and
don't let the backward demux state machine possibly stop it.
f_decoder_wrapper.c: mess with some shit until it somehow starts to
work. I think the old code tried to let it cleverly fall through so the
packet was processed "normally"; just make it run the "usual" code
instead.
See manpage additions. This is a huge hack. You can bet there are shit
tons of bugs. It's literally forcing square pegs into round holes.
Hopefully, the manpage wall of text makes it clear enough that the whole
shit can easily crash and burn. (Although it shouldn't literally crash.
That would be a bug. It possibly _could_ start a fire by entering some
sort of endless loop, not a literal one, just something where it tries
to do work without making progress.)
(Some obvious bugs I simply ignored for this initial version, but
there's a number of potential bugs I can't even imagine. Normal playback
should remain completely unaffected, though.)
How this works is also described in the manpage. Basically, we demux in
reverse, then we decode in reverse, then we render in reverse.
The decoding part is the simplest: just reorder the decoder output. This
weirdly integrates with the timeline/ordered chapter code, which also
has special requirements on feeding the packets to the decoder in a
non-straightforward way (it doesn't conflict, although a bugmessmass
breaks correct slicing of segments, so EDL/ordered chapter playback is
broken in backward direction).
Backward demuxing is pretty involved. In theory, it could be much
easier: simply iterating the usual demuxer output backward. But this
just doesn't fit into our code, so there's a cthulhu nightmare of shit.
To be specific, each stream (audio, video) is reversed separately. At
least this means we can do backward playback within cached content (for
example, you could play backwards in a live stream; on that note, it
disables prefetching, which would lead to losing new live video, but
this could be avoided).
The fuckmess also meant that I didn't bother trying to support
subtitles. Subtitles are a problem because they're "sparse" streams.
They need to be "passively" demuxed: you don't try to read a subtitle
packet, you demux audio and video, and then look whether there was a
subtitle packet. This means to get subtitles for a time range, you need
to know that you demuxed video and audio over this range, which becomes
pretty messy when you demux audio and video backwards separately.
Backward display is the most weird (and potentially buggy) part. To
avoid that we need to touch a LOT of timing code, we negate all
timestamps. The basic idea is that due to the navigation, all
comparisons and subtractions of timestamps keep working, and you don't
need to touch every single of them to "reverse" them.
E.g.:
bool before = pts_a < pts_b;
would need to be:
bool before = forward
? pts_a < pts_b
: pts_a > pts_b;
or:
bool before = pts_a * dir < pts_b * dir;
or if you, as it's implemented now, just do this after decoding:
pts_a *= dir;
pts_b *= dir;
and then in the normal timing/renderer code:
bool before = pts_a < pts_b;
Consequently, we don't need many changes in the latter code. But some
assumptions inhererently true for forward playback may have been broken
anyway. What is mainly needed is fixing places where values are passed
between positive and negative "domains". For example, seeking and
timestamp user display always uses positive timestamps. The main mess is
that it's not obvious which domain a given variable should or does use.
Well, in my tests with a single file, it suddenly started to work when I
did this. I'm honestly surprised that it did, and that I didn't have to
change a single line in the timing code past decoder (just something
minor to make external/cached text subtitles display). I committed it
immediately while avoiding thinking about it. But there really likely
are subtle problems of all sorts.
As far as I'm aware, gstreamer also supports backward playback. When I
looked at this years ago, I couldn't find a way to actually try this,
and I didn't revisit it now. Back then I also read talk slides from the
person who implemented it, and I'm not sure if and which ideas I might
have taken from it. It's possible that the timestamp reversal is
inspired by it, but I didn't check. (I think it claimed that it could
avoid large changes by changing a sign?)
VapourSynth has some sort of reverse function, which provides a backward
view on a video. The function itself is trivial to implement, as
VapourSynth aims to provide random access to video by frame numbers (so
you just request decreasing frame numbers). From what I remember, it
wasn't exactly fluid, but it worked. It's implemented by creating an
index, and seeking to the target on demand, and a bunch of caching. mpv
could use it, but it would either require using VapourSynth as demuxer
and decoder for everything, or replacing the current file every time
something is supposed to be played backwards.
FFmpeg's libavfilter has reversal filters for audio and video. These
require buffering the entire media data of the file, and don't really
fit into mpv's architecture. It could be used by playing a libavfilter
graph that also demuxes, but that's like VapourSynth but worse.
Manual changes done:
* Merged the interface-changes under the already master'd changes.
* Moved the hwdec-related option changes to video/decode/vd_lavc.c.
Historically, there's been no way to offer deinterlacing with nvdec,
and for cuviddec, it required a command line flag, with no way to
toggle while playing.
Now that we have a cuda deinterlacing filter in ffmpeg, we can hook
it up hook it up as the cuda auto-deinterlacer. In practice, this
isn't going to be present very often, due to the licensing mess with
the cuda sdk, but we can support it when it is there.
This was always a legacy thing. Remove it by applying an orgy of
mp_get_config_group() calls, and sometimes m_config_cache_alloc() or
mp_read_option_raw().
win32 changes untested.
AVFilterContext instances support some additional AVOptions over the
actual filter. This includes useful options like "threads". We didn't
support setting those for the "direct" wrapper (--vf=yadif:threads=1
failed). Change this. It requires setting options on the AVFilterContext
directly, except the code for positional parameters still needs to
access the actual filter's AVOptions.
Sometimes this hints that there's a bug, but sometimes it's normal.
Since the code for --end/--frames puts frames that should not be shown
anymore back into the pin, using those options will show this warning
when playback ends. This is a minor annoyance. We could change how it's
done (e.g. set an explicit flag somewhere), but that seems bothersome,
so just change the message from warning to verbose.
Also rename stereo3d to stereo_in. The only real change is that the
vo_gpu OSD code now uses the actual stereo 3D mode, instead of the
--video-steroe-mode value. (Why does this vo_gpu code even exist?)
I think this is more intuitive. This requires a dedicated "out" dummy
filter. But keep the "in" dummy filter for symmetry, like in the old
filter code. (We could remove the "in" dummy filter, because the first
actual filter would still show the real input format.)
This means vf_vapoursynth doesn't need a hack to work around the filter
code, and libavfilter filters now actually get the frame_rate field on
input pads set.
The libavfilter doxygen says the frame_rate field is only to be set if
the frame rate is known to be constant, and uses the word "must" (which
probably means they really mean it?) - but ffmpeg.c sets the field to
mere guesses anyway, and it looks like this normally won't lead to
problems.
Normally we don't even try this, but in corner cases it can happen. For
example when inserting lavcac3enc at runtime, and display-sync-resample
was active.
The audio format neogitation code was pretty complicated, although the
idea was simple: when the format changes (or on the first audio frame),
filter only the new frame through the entire filter chain, discard the
resulting frame, but use the format to initialize the AO.
This was useful for "fudging" the channel remix behavior (upmix or
downmix), and moving it before other filters. Apparently this was useful
for things like DRC filters, which might work better in stereo, and
which also can only achieve the desired volume levels by doing it before
a downmix, which would modify the volume. This mechanism was introduced
in commit 60048b7eb9 (which the commit message also describes as
"idiotic heuristic"). Knowing the output format is inherently necessary
for this, because otherwise we can't know what the hell the user defined
filters will do.
There were problems with robustness. Some filters needed more than one
frame. Resampling in particular would discard initial audio at high
resampling ratios. Some filters might drop audio intentionally (like
clipping data on timestamp ranges). There were also allegations that
some decoders output 0 length frames (although that is invalid in
libavcodec). The state machine was excessively complex and hard to
understand too.
There are 3 things that could have been done:
1. Fix robustness problems by doing more heuristics, like repeating
audio frames or simply decoding several frames. Since filters can
behave differently, this would have added lots of complexity.
2. Make use of libavfilter's format negotiation, and add the same to
mpv builtin filters. This is sort of annoying, because the format
negotiation in libavfilter changes the state of the filters. It also
reports only some parameters (mostly all for audio, but a lot of
holes for video). It would remove some of the state machine, but not
all.
3. Drop the channel remix fudging, and do the same as the video chain.
This would not require format negotiation, but instead you can just
filter the audio frames, and look what comes out of it. If nothing
comes out, simply never create an AO.
This commit selects option 3. It removes the remix fudging, which means
the loss of a feature. Users can instead add "--af=format=channels=2"
before their DRC filter, or something. I'm also considering changing the
default for --audio-channels back to stereo, and downmix in the decoder
or at the start of the filter chain, which would give the same results,
except requiring more configuration.
Implementation-wise, this is still a bit different from the video path.
The VO always remains the same instance, while the AO might have to be
recreated on configuration changes. This still requires explicit format
change handling + draining old data, but by putting it into
f_autoconvert, not much new code is needed.
This tried to avoid running the audio/video functions depending on
whether any of the audio or video related format restrictions were
called (so the filter would show an error if a mismatching media type
was passed in). It was a shit idea anyway, so fuck it.
At least the libavcodec MediaCodec wrapper sometimes seems to break the
libavcodec API, and does the following sequence:
send_packet() -> EAGAIN
receive_frame() -> EAGAIN
send_packet() -> OK
The libavcodec API never allows returning EAGAIN from both functions, so
we discarded packets in this case. Change it to retrying decoding, for
the sake of MediaCodec. Note that it could also happen due to internal
bugs in the vd_lavc.c hw fallback code, but if there are any remaining,
they should be fixed properly instead.
Requested.
Just bail out immediately (and disable audio) if format probing has no
result, instead of doing nothing and then apparently freezing.
This can happen with bogus filters, cases where the first audio frame is
essentially dropped by filters (can happen with large resampling
factors), and if the audio track contains no packets at all, or all
packets fail to decode.