These changes don't make too much sense without context, but are
preparation for later. Then the audio_src/video_src fields will be
actually be NULL under circumstances.
Regression caused by commit 3b95dd47. Also see commit 4c25b000. We can
either use video_next_pts and add "delay", or we just use video_pts. Any
other combination breaks. The reason why the assumption that delay==0 at
this point was wrong exactly because after displaying the first video
frame (usually done before audio resync) a new frame might be "added"
immediately, resulting in a new video_next_pts and "delay", which will
still amount to video_pts.
Fixes#2770. (The reason why display-sync was blamed in this issue is
because enabling display-sync in the options forces a prefetch by 2
instead of 1 frames for seeks/playback restart, which triggers the
issue, even if display-sync is not actually enabled. In this case,
display-sync is never enabled because the frames have a unusually high
frame duration. This is also what exposed the initial desync issue.)
Even though the timing logic is correct, it tends to mess with looping
videos and such in unappreciated ways.
It also has to be admitted that most file formats seem not to properly
define the duration of the last video frame (or libavformat does not
export it in a useful way), so whether or not we should use the demuxer
reported framerate for the last frame is questionable. (Still, why would
you essentially just discard the last frame?)
The timing logic is kept, but disabled for video with "normal" FPS
values. In particular, we want to keep it for displaying images, which
implicitly set the frame duration to 1 second by reporting 1 FPS. It's
also good for slide shows with mf://.
Fixes#2745.
vo_chain_uninit() isn't supposed to care much about the decoder
(although decoders and outputs still go strictly together, so there is
not much of an actual difference now).
Also unset track.d_video correctly.
Remove a stale declaration from dec_video.h as well.
Similar to the video path. dec_audio.c now handles decoding only. It
also looks very similar to dec_video.c, and actually contains some of
the rewritten code from it. (A further goal might be unifying the
decoders, I guess.)
High potential for regressions.
Eventually we want the VO be driven by a A->V filter, so a decoder
doesn't even have to exist. Some features definitely require a decoder
though (like reporting the decoder in use, hardware decoding, etc.), so
for each thing which accessed d_video, it has to be redecided if and how
it can access decoder state.
At least the "framedrop" property slightly changes semantics: you can
now always set this property, even if no video is active.
Some untested changes in this commit, but our bio-based distributed
test suite has to take care of this.
This moves some code related to decoding from video.c to dec_video.c,
and also removes some accesses to dec_video.c from the filtering code.
dec_video.ch is starting to make sense, and simply returns video frames
from a demuxer stream. The API exposed is also somewhat intended to be
easily changeable to move decoding to a separate thread, if we ever want
this (due to libavcodec already being threaded, I don't see much of a
reason, but it might still be helpful).
Lots of noise to remove the vfilter/vo fields from dec_video.
From now on, video filtering and output will still be done together,
summarized under struct vo_chain.
There is the question where exactly the vf_chain should go in such a
decoupled architecture. The end goal is being able to place a "complex"
filter between video decoders and output (which will culminate in
natural integration of A->V filters for natural integration of
libavfilter audio visualizations). The vf_chain is still useful for
"final" processing, such as format conversions and deinterlacing. Also,
there's only 1 VO and 1 --vf option. So having 1 vf_chain for a VO seems
ideal, since otherwise there would be no natural way to handle all these
existing options and mechanisms.
There is still some work required to truly decouple decoding.
Instead of handling this on filter chain reinit, do it directly after
the decoder. This makes the code less entangled. In particular, this
gets rid of the really weird "override params" concept in the video
filter code.
The last_format/fixed_formats have some redundance with decoder_output,
but unfortunately the latter has a slightly different use.
Basically reimplement it. The old implementation was quite stupid, and
was probably done this way because video filtering and output used to be
way less decoupled. Now we can reimplement it in a very simple way: when
backstepping, seek to current time, but keep the last frame that was
supposed to be discarded when reaching the target time. When the seek
finishes, prepend the saved frame to the video frame queue.
A disadvantage is that the new implementation fails to skip over
timeline boundaries (ordered chapters etc.), but this never worked
properly anyway. It's possible that this will be fixed some time in the
future.
This is mainly a refactor. I'm hoping it will make some things easier
in the future due to cleanly separating codec metadata and stream
metadata.
Also, declare that the "codec" field can not be NULL anymore. demux.c
will set it to "" if it's NULL when added. This gets rid of a corner
case everything had to handle, but which rarely happened.
This is another attempt at making files with sparse video frames work
better.
The problem is that you generally can't know whether a jump in video
timestamps is just a (very) long video frame, or a timestamp reset. Due
to the existence of files with sparse video frames (new frame only every
few seconds or longer), every heuristic will be arbitrary (in general,
at least).
But we can use the fact that if video is continuous, audio should also
be continuous. Audio discontinuities can be easily detected, and if that
happens, reset some of the playback state.
The way the playback state is reset is rather radical (resets decoders
as well), but it's just better not to cause too much obscure stuff to
happen here. If the A/V sync code were to be rewritten, it should
probably strictly use PTS values (not this strange time_frame/delay
stuff), which would make it much easier to detect such situations and
to react to them.
Slightly change how it is decided when a new packet should be read.
Switch to demux_read_packet_async(), and let the player "wait properly"
until required subtitle packets arrive, instead of blocking everything.
Move distinguishing the cases of passive and active reading into the
demuxer, where it belongs.
MPlayer traditionally always used the display aspect ratio, e.g. 16:9,
while FFmpeg uses the sample (aka pixel) aspect ratio.
Both have a bunch of advantages and disadvantages. Actually, it seems
using sample aspect ratio is generally nicer. The main reason for the
change is making mpv closer to how FFmpeg works in order to make life
easier. It's also nice that everything uses integer fractions instead
of floats now (except --video-aspect option/property).
Note that there is at least 1 user-visible change: vf_dsize now does
not set the display size, only the display aspect ratio. This is
because the image_params d_w/d_h fields did not just set the display
aspect, but also the size (except in encoding mode).
Helps with files that have occasional broken timestamps. For larger
discontinuities, e.g. caused by actual timestamp resets, we still want
to realign audio.
(I guess in general, this should be removed and replaced by a more
general resync-on-desync logic, but not now.)
At least I hope so.
Deriving the duration from the pts was not really correct. It doesn't
include speed adjustments, and becomes completely wrong of the user e.g.
changes the playback speed by a huge amount. Pass through the accurate
duration value by adding a new vo_frame field.
The value for vsync_offset was not correct either. We don't need the
error for the next frame, but the error for the current one. This wasn't
noticed because it makes no difference in symmetric cases, like 24 fps
on 60 Hz.
I'm still not entirely confident in the correctness of this, but it sure
is an improvement.
Also, remove the MP_STATS() calls - they're not really useful to debug
anything anymore.
This was just converting back and forth between int64_t/microseconds and
double/seconds. Remove this stupidity. The pts/duration fields are still
in microseconds, but they have no meaning in the display-sync case (also
drop printing the pts field from opengl/video.c - it's always 0).
Instead of periodically trying to enable it again. There are two cases
that can happen:
1. A random discontinuity messed everything up,
2. Things are just broken and will desync all the time
Until now, it tried to deal with case 1 - but maybe this is really rare,
and we don't really need to care about it. On the other hand, case 2 is
kind of hard to diagnose if the user doesn't use the terminal.
Seeking will reenable display-sync, so you can fix playback if case 1
happens, but still get predictable behavior in case 2.
Use the demux_set_ts_offset() added in the previous commit to base each
timeline segment to use timestamps according to its relative position
within the overall timeline. As a consequence we don't need to care
about these timestamps anymore, and everything becomes simpler.
(Another minor but delicious nugget of sanity.)
If the player sends a frame with duration==0 to the VO, it can trivially
underrun. Don't panic, but keep the correct time.
Also, returning the absolute time from vo_get_next_frame_start_time()
just to turn it into a float with relative time was silly. Rename it and
make it return what the caller needs.
80ms allowable desync was a bit too much. It'd allow for a range of
160ms, which everyone can notice. It might also be a bother to apply
compensation resampling speed for that long.
We always let audio slowly desync until a threshold is reached, and then
pushed it back by applying a maximum compensation speed. Refine what
comes afterwards: instead of playing with the nominal video speed, use
the actual required audio speed for keeping sync as measured by the A/V
difference. (The "actual" speed is the ideal speed with A/V differences
added.)
Although this works in theory, it's somewhat questionable how much this
works in practice. The ideal time value is actually not exact, but is
the time at which the frame is scheduled (could be compensated by using
the time_left calculations in handle_display_sync_frame()). It doesn't
account for speed changes or catastrophic discontinuities. It uses only
10 past frames.
As long as it's within the desync tolerance, do not change the audio
speed at all for resampling. This reduces speed changes which might be
caused by jittering timestamps and similar cases.
(While in theory you could just not care and change speed every single
frame, I'm afraid that such changes could possibly cause audio
artifacts. So better just avoid it in the first place.)
This is very "illustrative", unlike the video-speed-correction
property, and thus useful. It can also be used to observe scheduling
errors, which are not detected by the core. (These happen due to
rounding errors; possibly not evne our fault, but coming from
files with rounded timestamps and so on.)
Instead of looking at the current frame duration for the intended
speedup, look at all past frames, and find a good average speed. This
ties in with not wanting to average _all_ frame durations, which
doesn't make sense in VFR situations.
This is currently done in the most naive way possible, but already sort
of works for VFR which switches between frame durations that are
integer multiples of a base rate. Certainly more improvements could
be made, such as trying to adjust directly on FPS changes, instead of
averaging everything, but for now this is not needed at all.
Helps somewhat with muxer-rounded timestamps.
There is some danger that this introduces a timestamp drift. But since
they are averaged values (unlike as when using an incorrect container
framerate hint), any potential drift shouldn't be too brutal, or
compensate itself soon. So I won't bother yet with comparing the results
with the real timestamp, unless we run into actual problems.
Of course we still prefer potentially real timestamps over the
approximated ones. But unless the timestamps match the container FPS,
we can't know whether they are (no, checking whether the they have
microsecond components would be cheating). Perhaps in future, we could
let the demuxer export the timebase - if the timebase is not 1000 (or
divisible by it), we know that millisecond-rounded timestamps won't
happen.
Get rid of get_past_frame_durations(), which was a bit too messy. Add
a past_frames array, which contains the same information in a more
reasonable way. This also means that we can get the exact current and
past frame durations without going through awful stuff. (The main
problem is that vo_pts_history contains future frames as well, which is
needed for frame backstepping etc., but gets in the way here.)
Also disable the automatic disabling of display-sync if the frame
duration changes, and extend the frame durations allowed for display
sync. To allow arbitrarily high durations, vo.c needs to be changed
to pause and potentially redraw OSD while showing a single frame, so
they're still limited.
In an attempt to deal with VFR, calculate the overall speed using the
average FPS. The frame scheduling itself does not use the average FPS,
but the duration of the current frame. This does not work too well,
but provides a good base for further improvements.
Where this commit actually helps a lot is dealing with rounded
timestamps, e.g. if the container framerate is wrong or unknown, or
if the muxer wrote incorrectly rounded timestamps. While the rounding
errors apparently can't be get rid of completely in the general case,
this is still much better than e.g. disabling display-sync completely
just because some frame durations go out of bounds.
We need a frame duration even on start, because the number of vsyncs
the frame is shown is predetermined. (vo_opengl actually makes use of
this property in certain cases.)
This check disables the display-sync resample method. If the filters
convert PCM to AC3, we can still insert a filter to change speed. This
is because filters are inserted at the beginning of the filter chain.
Move it (in a cosmetic sense), and also move its invocation to below all
the video handling.
All other changes remain cosmetic, including moving the framedrop
calculation code, and getting rid of the video_speed_correction
variable.
update_av_diff() works on the timestamps, while time_left is in real
time. When playing at not-1 speed, these are very different, and cause
the A/V difference to jitter. Fix this by scaling the expected A/V
desync to the correct range.
This didn't show up with cases where the frame pattern has a cycle of 1
or 2 like it is the case with 24-on-24 fps, or 24-on-60 fps. It did show
up with 25-on-60 fps. (We don't slow down 25 fps video to 24 on default
settings.)
In this case, we must not add the timing error of the next frame to the
A/V difference estimation of the current frame. Use the previous timing
error instead.
This is another bug resulting from the confusion about whether we
calculate parameters for the currently playing frame, or the one we're
about to queue.
Commit a1315c76 broke this slightly. Frame drops got counted multiple
times, and also vo.c was actually trying to "render" the dropped frame
over and over again (normally not a problem, since frames are always
queued "tightly" in display-sync mode, but could have caused 100% CPU
usage in some rare corner cases).
Do not repeat already dropped frames, but still treat new frames with
num_vsyncs==0 as dropped frames. Also, strictly count dropped frames in
the VO. This means we don't count "soft" dropped frames anymore (frames
that are shown, but for fewer vsyncs than intended). This will be
adjusted in the next commit.
Bump it to 80, and 2 vsyncs. This is another measure against vsync
jitter. Admittedly this is a bit simplistic (and we should probably
estimate a stable estimated vsync phase instead), but for now this will
do.
Calculate the A/V difference directly in the display sync code, instead
of the awkward current way, which reuses the fields for audio sync.
We still set time_frame, because it makes falling back to audio sync
somewhat smoother.
When dropping or repeating frames, we essentially influence when the
frame after the next frame will be shown, not the next frame. This led
to dropping/repeating frames 2 times, because the A/V difference had a
delay of one frame. Compensate it with the expected value.
This is all kinds of stupid - update_avsync_after_frame() will multiply
this value with the speed at a later point, and we only update this
field for this function. (This should be refactored.)
Apparently this function caused weird problems to me. I have no idea
why. The usage of the function looks perfectly fine to me, and even
rounding issues can be excluded. In any case, getting rid of this solved
my problem, and makes the code actually more readable.
This adjustment is supposed to improve the audio speed calculation in
case of unexpected desync. The flipped sign made it actually worse,
although the total impact of this bug was very minor.
If video EOF happens during playback restart, and audio is syncing, and
the demuxer packet queue overflows (i.e. no new packets will be read),
then it could happen that the player accidentally enters sleeping, and
continues playing anything only after e.g. user input wakes it up.
This parameter has been unused for years (the last flag was removed in
commit d658b115). Get rid of it.
This affects the general VO API, as well as the vo_opengl backend API,
so it touches a lot of files.
The VOFLAGs are still used to control OpenGL context creation, so move
them to the OpenGL backend code.
The vf_format suboption is replaced with --video-output-levels (a global
option and property). In particular, the parameter is removed from
mp_image_params. The mechanism is moved to the "video equalizer", which
also handles common video output customization like brightness and
contrast controls.
The new code is slightly cleaner, and the top-level option is slightly
more user-friendly than as vf_format sub-option.
Caused by one of the --force-window commits. The unconditional
uninit_video_out() call (which normally should be idempotent) raised
sporadic MPV_EVENT_VIDEO_RECONFIG events. This is ok, except for the
fact that clients (like a Lua script or libmpv users) would cause the
event loop to run again after receiving it, triggering a feedback loop.
Fix it by sending the events really only on a change.
If this mode is enabled, the player tries to strictly synchronize video
to display refresh. It will adjust playback speed to match the display,
so if you play 23.976 fps video on a 24 Hz screen, playback speed is
increased by approximately 1/1000. Audio wll be resampled to keep up
with playback.
This is different from the default sync mode, which will sync video to
audio, with the consequence that video might skip or repeat a frame once
in a while to make video keep up with audio.
This is still unpolished. There are some major problems as well; in
particular, mkv VFR files won't work well. The reason is that Matroska
is terrible and rounds timestamps to milliseconds. This makes it rather
hard to guess the framerate of a section of video that is playing. We
could probably fix this by just accepting jittery timestamps (instead
of explicitly disabling the sync code in this case), but I'm not ready
to accept such a solution yet.
Another issue is that we are extremely reliant on OS video and audio
APIs working in an expected manner, which of course is not too often
the case. Consequently, the new sync mode is a bit fragile.
For video sync, we want separate playback speed controls for user-
requested speed and the "correction" speed for video timing. Further, we
use this separation to make sure only a resampler is inserted if
playback speed is only changed for video sync correction.
As of this commit, this is basically inactive code. It's just
preparation for the video sync code (the following commit).
Additionally to taking the average, this tries to use the demuxer FPS to
eliminate jitter, and applies some other heuristics to check if the
result is sane.
This code will also be used for the display sync code (it will actually
make use of the require_exact parameter).
(The value of doing this over keeping the simpler demux_mkv hack is
somewhat questionable. But at least it allows us to deal with other
container formats that use jittery timestamps, such as mp4 remuxed
from mkv.)
Normally when there's a timestamp reset, we make audio resync to make
sure audio and video line up (again). But in video-only mode, just
setting audio to resyncing breaks EOF detection, because there's no code
which would get audio_status out of this bogus state.
Remove the exception for decoding only 1 frame if VO framedrop is
disabled. This was originally done to be able to test potential
regressions when we enabled VO framedrop and decoding 2 frames by
default. It's not needed anymore.
If filters are disabled or reconfigured, attempt to remove and probe the
deinterlace filter again. This fixes behavior if e.g. a software deint
filter was automatically inserted, and then hardware decoding is enabled
during playback. Without this commit, initializing hw decoding would
fail because of the software filter; with this commit, it'll replace it
with the hw deinterlacer instead.
Instead of calling it "future frames" and adding or subtracting 1 from
it, always call it "requested frames". This simplifies it a bit.
MPContext.next_frames had 2 added to it; this was mainly to ensure a
minimum size of 2. Drop it and assume VO_MAX_REQ_FRAMES is at least 2;
together with the other changes, this can be the exact size of the
array.
This is a real pain: if a quit command is received, it's set to PT_QUIT.
And then other code could overwrite it, making it not quit. The annoying
bit is that stop_play is written and read in many places. Just not
overwriting it unconditionally seems to be the best course of action.
draw_image_timed is renamed to draw_frame. struct frame_timing is
renamed to vo_frame. flip_page_timed is merged into draw_frame (the
additional parameters are part of struct vo_frame). draw_frame also
deprecates VOCTRL_REDRAW_FRAME, and replaces it with a method that
works for both VOs which can cache the current frame, and VOs which
need to redraw it anyway.
This is preparation to making the interpolation and (work in progress)
display sync code saner.
Lots of other refactoring, and also some simplifications.
Now the VO can request a number of future frames with the last parameter
of vo_set_queue_params(). This will be helpful to fix the interpolation
code.
Note that the first frame (after playback start or seeking) will usually
not have any future frames (to make seeking fast). Near the end of the
file, the number of future frames will become lower as well.
When showing cover art, the decoding logic pretends that the source has
an infinite number of frames. This slightly simplifies dealing with
filter data flow. It was done by feeding the same packet repeatedly to
the decoder (each decode run produces new output).
Change this by decoding once at the video initialization. This is easier
to follow, and increases robustness in case of broken images. Usually,
we try to tolerate decoding errors, so decoding normally continues, but
in this case it would just burn the CPU for no reason.
Fixes#2056.
Reduce the default tolerance for timestamp jumps from 60 to 15 seconds.
For .ts files, where ts_resets_possible coming from AVFMT_TS_DISCONT is
set, apply a more sophisticated heuristic. It's clear that such a file
wouldn't have a framerate below, say, 23hz. If the demuxer reports a
lower fps, we allow longer PTS jumps.
This should replace long pauses on discontinuities with .ts files with
at most a short stutter.
Of course, all kinds of things could go wrong anyway if the source is
VFR, or FFmpeg's frame rate detection fails in some other way. I haven't
found such a file yet, though.
Fixes PNG cover art not showing up immediately (for example when running
with --pause).
libavformat exports embedded cover art as a single packet. For example,
a PNG attachment simply contains the PNG image, which can be sent to the
decoder. Normally you would expect that the PNG decoder would return 1
frame for 1 packet, without any delays. But this stopped working, and it
incurs a 1 frame delay.
This is perfectly legal (even if unexpected), so let our code feed the
decoder packets until we get something back. (In theory feeding the
packet instead of a real flush packet is still somewhat questionable.)
last_av_difference can be MP_NOPTS_VALUE under certain circumstances
(like no video timestamp yet). This triggered the desync message,
because fabs(MP_NOPTS_VALUE) is quite a large value. We don't want to
show a message in this situation.
libavcodec makes it impossible to distinguish dropped frames (requested
with AVCodecContext.skip_frame), and cases when the decoder simply does
not return a frame by default (such as with VP9, which has invisible
reference frames).
This confuses users when decoding VP9 video. It's basically a cosmetic
issue, so just paint it over by ignoring them if framedropping is
disabled.
When playing cover art, it conceptually reaches EOF as soon as the image
was put on the VO, causing the EOF message to be repeated every time new
audio was decoded. Just silence the message.
Use OPT_CHOICE_C() instead of the custom parser. The functionality is
pretty much equivalent.
(On a side note, it seems --video-stereo-mode can't be removed, because
it controls whether to "reduce" stereo video to mono, which is also the
default. In fact I'm not sure how this should be handled at all.)