In display-sync mode, the very first video frame is idiotically fully
timed, even though audio has not been synced yet at this point, and the
video frame is more like a "preview" frame. But since it's fully timed,
an underflow is detected if audio takes longer than the display time of
the frame (we send the second frame only after audio is done).
The timing code will try to compensate for the determined desync, but it
really shouldn't. So explicitly discard the timing info in this specific
case. On the other hand, if the first frame still hasn't finished
display, we can pretend everything is ok.
This is a hack - ideally, we either would send a frame without timing
info (and then send it again or so when playback starts properly), or we
would add real pause support to the VO, and pause it during syncing.
Otherwise it behaves dumb. (Although you could argue it shouldn't try to
guess whether speed changes work, but instead simply disable DS if they
don't work.)
--deinterlace=auto is the default, and has the obscure semantics that
deinterlacing is disabled, unless the user has manually inserted a
deinterlacing filter.
While in software decoding this doesn't matter, and we will happily
insert 2 yadif filters (if the user has already added one), or not
remove the yadif filter (if deinterlacing is disabled, but the user has
added the filter manually), this is different with hardware deinterlacer
filters. These support VFCTRL_SET_DEINTERLACE for toggling deinterlacing
filtering at runtime. It exists mainly for legacy reasons, and possibly
because it makes switching deinterlacing modes more efficient. It might
also gives us an entry-point for VO deinterlacing, maybe. For whatever
reasons this mechanism exists, we still support and use it.
This commit fixes that video.c always used VFCTRL_SET_DEINTERLACE to
disable deinterlacing, even if --deinterlace=auto was set. Fix this by
checking the value of the option directly.
Commit 771a8bf5 added code to avoid unnecessary vf_reconfig() calls for
unrelated reasons, but forget to consider that it has to be called at
least once if the input format changes. As a consequence it got "stuck"
due to not being able to decode more frames.
vo_frame can have more than 1 frame - the extra frames are future
references, which are sometimes useful for filtering (vo_opengl
interpolation). There's no harm in reducing the number of frames sent to
the VO requested amount of future frames, so do that.
Doesn't actually reduce the number of concurrently in use frames in
practice.
Some filters support VFCTRL_SET_DEINTERLACE. This affects most hardware
deinterlace filters. They can be inserted by the user manually, or auto-
inserted by vf.c itself as conversion filter (vf_d3d11vpp). In these
cases, we shouldn't insert or remove filters outselves, and instead
VFCTRL_SET_DEINTERLACE should be invoked to switch the mode.
This wasn't done correctly in the recently refactored code and could
have broken with --deinterlace. (The refactor only considered switching
via property in this case.) Fix it by making it a proper part of the
filter_reconfig() function, and making set_deinterlacing() (which is
called by the property handler) merely call filter_reconfig() in all
cases to do the real work.
We can even avoid rebuilding the filter chain - though only if no other
auto-filters are inserted. It probably also provides a slightly cleaner
way to implement functionality in the VO while still inserting video
filter fallbacks correctly if required.
The test scenario at hand was hardware decoding a file with d3d11 and
with deinterlacing enabled. The file switches to a non-hardware
dedocdeable format mid-stream. This failed, because it tried to call
vf_reconfig() with the old filters inserted, with was fatal due to
vf_d3d11vpp accepting only hardware input formats.
Fix this by always strictly removing all auto-inserted filters
(including the deinterlacing one), and reconfiguring only after that.
Note that this change is good for other situations too, because we
generally don't want to use a hardware deinterlacer for software
decoding by default. They're not necessarily optimal, and VAAPI VPP even
has incomprehensible deinterlacer bugs specifically with software frames
not coming from a hardware decoder.
Instead of using the "vf" command code (which changes filters at runtime
on user input), use the general filter-insertion code. The latter was
added later, and is more suitable for automatically inserted filters.
The old code failed in particular when using watch-later saving, which
stored the filter list in the resume config file. If a user changed the
hardware decoding mode via command line, the stored filter chain was out
of date and could cause failure due to not working with hardware or
software decoding mode. Storing the deinterlace filter in the filter
list was unavoidable, because it was part of the user state. (The new
code only edits the actually instantiated filters.)
The main change is with video/hwdec.h. mp_hwdec_info is made opaque (and
renamed to mp_hwdec_devices). Its accessors are mainly thread-safe (or
documented where not), which makes the whole thing saner and cleaner. In
particular, thread-safety rules become less subtle and more obvious.
The new internal API makes it easier to support multiple OpenGL interop
backends. (Although this is not done yet, and it's not clear whether it
ever will.)
This also removes all the API-specific fields from mp_hwdec_ctx and
replaces them with a "ctx" field. For d3d in particular, we drop the
mp_d3d_ctx struct completely, and pass the interfaces directly.
Remove the emulation checks from vaapi.c and vdpau.c; they are
pointless, and the checks that matter are done on the VO layer.
The d3d hardware decoders might slightly change behavior: dxva2-copy
will not use the VO device anymore if the VO supports proper interop.
This pretty much assumes that any in such cases the VO will not use any
form of exclusive mode, which makes using the VO device in copy mode
unnecessary.
This is a big refactor. Some things may be untested and could be broken.
Another crappy fix for timestamp reset issues. This time, we try to fix
files which have very weird but legitimate frame durations, such as
cdgraphics. It can have many short frames, but once in a while there are
potentially very long frames.
Fixes#3027.
This would get stuck in reconfiguring the filter chain forever, because
params was mutated ("params.rotate = 0;"). This was used as input for
vf_reconfig(), but the filter chain input must always be equivalent to
the decoder output, or filter chain reconfiguration will be triggered.
The line of code to reset the rotation is from a time when this used to
work differently.
Also remove the unnecessary try_filter() parameter.
Unfortunately I see no better solution.
The refresh seek is skipped if the amount of buffered audio is not
overly huge.
Unfortunately softvol af_volume insertion still can cause this issue,
because it's outside of the normal dynamic filter chain changing code.
Move the video refresh call to reinit_video_filters() to make it more
uniform along with the audio code.
See --lavfi-complex option.
This is still quite rough. There's no support for dynamic configuration
of any kind. There are probably corner cases where playback might freeze
or burn 100% CPU (due to dataflow problems when interaction with
libavfilter).
Future possible plans might include:
- freely switch tracks by providing some sort of default track graph
label
- automatically enabling audio visualization
- automatically mix audio or stack video when multiple tracks are
selected at once (similar to how multiple sub tracks can be selected)
Will be helpful for the coming filter support. I planned on merging
audio/video decoding, but this will have to wait a bit longer, so only
remove the duplicate status codes.
These changes don't make too much sense without context, but are
preparation for later. Then the audio_src/video_src fields will be
actually be NULL under circumstances.
Regression caused by commit 3b95dd47. Also see commit 4c25b000. We can
either use video_next_pts and add "delay", or we just use video_pts. Any
other combination breaks. The reason why the assumption that delay==0 at
this point was wrong exactly because after displaying the first video
frame (usually done before audio resync) a new frame might be "added"
immediately, resulting in a new video_next_pts and "delay", which will
still amount to video_pts.
Fixes#2770. (The reason why display-sync was blamed in this issue is
because enabling display-sync in the options forces a prefetch by 2
instead of 1 frames for seeks/playback restart, which triggers the
issue, even if display-sync is not actually enabled. In this case,
display-sync is never enabled because the frames have a unusually high
frame duration. This is also what exposed the initial desync issue.)
Even though the timing logic is correct, it tends to mess with looping
videos and such in unappreciated ways.
It also has to be admitted that most file formats seem not to properly
define the duration of the last video frame (or libavformat does not
export it in a useful way), so whether or not we should use the demuxer
reported framerate for the last frame is questionable. (Still, why would
you essentially just discard the last frame?)
The timing logic is kept, but disabled for video with "normal" FPS
values. In particular, we want to keep it for displaying images, which
implicitly set the frame duration to 1 second by reporting 1 FPS. It's
also good for slide shows with mf://.
Fixes#2745.
vo_chain_uninit() isn't supposed to care much about the decoder
(although decoders and outputs still go strictly together, so there is
not much of an actual difference now).
Also unset track.d_video correctly.
Remove a stale declaration from dec_video.h as well.
Similar to the video path. dec_audio.c now handles decoding only. It
also looks very similar to dec_video.c, and actually contains some of
the rewritten code from it. (A further goal might be unifying the
decoders, I guess.)
High potential for regressions.
Eventually we want the VO be driven by a A->V filter, so a decoder
doesn't even have to exist. Some features definitely require a decoder
though (like reporting the decoder in use, hardware decoding, etc.), so
for each thing which accessed d_video, it has to be redecided if and how
it can access decoder state.
At least the "framedrop" property slightly changes semantics: you can
now always set this property, even if no video is active.
Some untested changes in this commit, but our bio-based distributed
test suite has to take care of this.
This moves some code related to decoding from video.c to dec_video.c,
and also removes some accesses to dec_video.c from the filtering code.
dec_video.ch is starting to make sense, and simply returns video frames
from a demuxer stream. The API exposed is also somewhat intended to be
easily changeable to move decoding to a separate thread, if we ever want
this (due to libavcodec already being threaded, I don't see much of a
reason, but it might still be helpful).
Lots of noise to remove the vfilter/vo fields from dec_video.
From now on, video filtering and output will still be done together,
summarized under struct vo_chain.
There is the question where exactly the vf_chain should go in such a
decoupled architecture. The end goal is being able to place a "complex"
filter between video decoders and output (which will culminate in
natural integration of A->V filters for natural integration of
libavfilter audio visualizations). The vf_chain is still useful for
"final" processing, such as format conversions and deinterlacing. Also,
there's only 1 VO and 1 --vf option. So having 1 vf_chain for a VO seems
ideal, since otherwise there would be no natural way to handle all these
existing options and mechanisms.
There is still some work required to truly decouple decoding.
Instead of handling this on filter chain reinit, do it directly after
the decoder. This makes the code less entangled. In particular, this
gets rid of the really weird "override params" concept in the video
filter code.
The last_format/fixed_formats have some redundance with decoder_output,
but unfortunately the latter has a slightly different use.
Basically reimplement it. The old implementation was quite stupid, and
was probably done this way because video filtering and output used to be
way less decoupled. Now we can reimplement it in a very simple way: when
backstepping, seek to current time, but keep the last frame that was
supposed to be discarded when reaching the target time. When the seek
finishes, prepend the saved frame to the video frame queue.
A disadvantage is that the new implementation fails to skip over
timeline boundaries (ordered chapters etc.), but this never worked
properly anyway. It's possible that this will be fixed some time in the
future.
This is mainly a refactor. I'm hoping it will make some things easier
in the future due to cleanly separating codec metadata and stream
metadata.
Also, declare that the "codec" field can not be NULL anymore. demux.c
will set it to "" if it's NULL when added. This gets rid of a corner
case everything had to handle, but which rarely happened.
This is another attempt at making files with sparse video frames work
better.
The problem is that you generally can't know whether a jump in video
timestamps is just a (very) long video frame, or a timestamp reset. Due
to the existence of files with sparse video frames (new frame only every
few seconds or longer), every heuristic will be arbitrary (in general,
at least).
But we can use the fact that if video is continuous, audio should also
be continuous. Audio discontinuities can be easily detected, and if that
happens, reset some of the playback state.
The way the playback state is reset is rather radical (resets decoders
as well), but it's just better not to cause too much obscure stuff to
happen here. If the A/V sync code were to be rewritten, it should
probably strictly use PTS values (not this strange time_frame/delay
stuff), which would make it much easier to detect such situations and
to react to them.
Slightly change how it is decided when a new packet should be read.
Switch to demux_read_packet_async(), and let the player "wait properly"
until required subtitle packets arrive, instead of blocking everything.
Move distinguishing the cases of passive and active reading into the
demuxer, where it belongs.
MPlayer traditionally always used the display aspect ratio, e.g. 16:9,
while FFmpeg uses the sample (aka pixel) aspect ratio.
Both have a bunch of advantages and disadvantages. Actually, it seems
using sample aspect ratio is generally nicer. The main reason for the
change is making mpv closer to how FFmpeg works in order to make life
easier. It's also nice that everything uses integer fractions instead
of floats now (except --video-aspect option/property).
Note that there is at least 1 user-visible change: vf_dsize now does
not set the display size, only the display aspect ratio. This is
because the image_params d_w/d_h fields did not just set the display
aspect, but also the size (except in encoding mode).