At least I hope so.
Deriving the duration from the pts was not really correct. It doesn't
include speed adjustments, and becomes completely wrong of the user e.g.
changes the playback speed by a huge amount. Pass through the accurate
duration value by adding a new vo_frame field.
The value for vsync_offset was not correct either. We don't need the
error for the next frame, but the error for the current one. This wasn't
noticed because it makes no difference in symmetric cases, like 24 fps
on 60 Hz.
I'm still not entirely confident in the correctness of this, but it sure
is an improvement.
Also, remove the MP_STATS() calls - they're not really useful to debug
anything anymore.
This was just converting back and forth between int64_t/microseconds and
double/seconds. Remove this stupidity. The pts/duration fields are still
in microseconds, but they have no meaning in the display-sync case (also
drop printing the pts field from opengl/video.c - it's always 0).
If we switched away from the system FPS, we were remaining in this mode
ssentially forever. There's no reason to do so; switch back if the
estimated FPS gets worse again. Improvement over commit 41f2c653.
Commit 12eb8b2d accidentally disabled framedropping in the audio timing
case. It tried to replace the last_flip field with the prev_vsync one,
which didn't work because prev_sync is reset to 0 if the timing code is
used. Fix it by always setting it properly. This field must (or should)
be reinitialized to something sensible when switching to display sync
timing mode; since prev_vsync is not reset anymore, the check when to
reinitialize this field has to be adjusted as well.
It's a bit weird that update_vsync_timing_after_swap() now does some
minor work for timing mode too, but I guess it's ok, if only to avoid
additional fields and timer calls.
If the system-reported display FPS (returned by the VO backends, or
forced with --display-fps) is too imprecise (deviating frame duration by
more than 1%). This works if the display FPS is off by almost 1 (typical
for old/bad/broken OS APIs). Actually it even works if the FPs is
completely wrong.
Is it a good idea? Probably not. It might be better to only output a
warning message. But unless there are reports about it going terribly
wrong, I'll go with this for now.
Actually I'm not content with the detection added in commit 44376d2d. It
triggers too often if vsync is very jittery. It's easy to avoid this: we
add more samples to the detection by reusing the drift computation loop.
If there's a significant step in the drift, we consider it a drop.
This adds basic support for ICC profiles. Per-monitor profiles are
supported. WCS profiles are not supported, but there is an API for
converting WCS profiles to ICC, so they might be supported in future.
I'm just not sure if anyone actually uses them.
Reloading the ICC profile when it's changed in the control panel is also
not supported. This might be possible by using the WCS APIs and watching
the registry for changes, but there is no official API for it, and as
far as I can tell, no other Windows programs can do it.
Return the estimated/ideal flip time to the timing logic (meaning
vo_get_delay() returns a smoothed out time). In addition to this add
some lame but working drift compensation. (Useful especially if the
display FPS is wrong by a factor such as 1.001.)
Also remove some older leftovers. The vsync_interval_approx and
last_flip fields are redundant or unneeded.
For the vo-delayed-frame-count property.
Slightly less dumb than the previous one (which was removed earlier),
but still pretty dumb. But this also seems to be relatively robust, even
with strong vsync jittering.
This logic was kind of questionable anyway, and --display-sync should
give much better results. (I would even go as far as saying that the
FPS-dependent framedrop code made things worse in some situations. Not
all, though.)
This is simply the average refresh rate. Including "bad" samples is
actually an advantage, because the property exists only for
informational purposes, and will reflect problems such as the driver
skipping a vsync.
Also export the standard deviation of the vsync frame duration
(normalized to the range 0-1) as vsync-jitter property.
The OSD takes up an entire fullscreen dispmanx layer. Although the GPU
should be able to handle it (possibly even without any disadvantages),
it'll still be useful for debugging performance issues.
This is a hack, but unfortunately the DwmGetCompositionTimingInfo
heuristic does not work in all cases (with multiple-monitors on Windows
8.1 and even with a single monitor in Windows 10.) See the comment in
mp_w32_is_in_exclusive_mode() for more details.
It should go without saying that if any better method of doing this
reveals itself, this hack should be dropped.
It doesn't have any real purpose anymore. Up until now, it was still
implemented by vo_wayland, but since we changed how the frame callbacks
work, even that appears to be pointless.
Originally, the plan was to somehow extend this mechanism to all
backends and to magically fix frame scheduling, but since we can't hope
for proper mechanisms even on wayland, this idea looks way less
interesting.
10 bit HEVC would require DXVA2_ModeHEVC_VLD_Main10, and most a
different surface type (judging by lavfsplitter source code, both
P010 and P016 would work). Since I'm unable to test this stuff,
exclude 10 bit for now.
See #2516.
The D3D9 backend does not support GLES 3, which makes it pretty useless.
But it still might be a legitimate replacement of vo_direct3d.c on
Windows 7 machines.
Note that we could just use:
eglGetDisplay(EGL_D3D11_ELSE_D3D9_DISPLAY_ANGLE)
But for now I'll leave the old code. Maybe this can exclude use of
software rendering backends (EGL_PLATFORM_ANGLE_DEVICE_TYPE_WARP_ANGLE).
Since I'm not sure, I won't touch it.
Running mpv with default config will now pick up ANGLE by default. Since
some think ANGLE is still not good enough for hq features, extend the
"es" option to reject GLES backends, and add to to the opengl-hq preset.
One consequence is that mpv will by default use libswscale to convert
10 bit video to 8 bit, before it reaches the VO.
I decided that I actually can't stand how vo_opengl unnecessarily puts
the video through 3 shader stages (instead of 1). Thus, what was meant
to be a fallback for weak OpenGL implementations, the dumb-mode, now
becomes default if the user settings allow it.
The code required to check for the settings isn't so wild, so I guess
it's manageable. I still hope that one day, our rendering logic can
generate ideal shader stages for this case too.
Note that in theory, dumb-mode could be reenabled at runtime due to a
color management 3D LUT being set, so a separate dumb_mode field is
required. The dumb-mode option can't just be overwritten.
Unfortunately, color management can still not work, because no GLES
version specified so far support fixed-point 16 bit textures. Maybe
we could use integer textures, but these don't support filtering.
Using float textures would be another possibility.
Polar scalers use 1D textures, because they're slightly faster on some
GPUs than 2D textures. But 2D textures work too, so add support for
them.
Allows using these scalers with ANGLE.
Just like commit f9a2fc59. There are probably some more such cases.
The vec2 constructor calls are probably fine, but don't bother with
confusing inconsistencies.
While desktop GL's glTexImage2D() essentially accepts anything, GLES is
much stricter. The combination of allowed formats/types/internal formats
is exactly specified. The GLES 3.0.4 specification lists them in
table 3.2. (The ANGLE API validation code references this table.)
The table could probably be extended into a general declarative table
about GL formats covering other uses, but this would be a big
non-trivial project, so don't bother and accept a minor degree
of duplication with other tables.
Note that the format and type do (or should) not matter here, because
no image data is transferred to the GPU.
We don't only need float textures for advanced scaling - we also need
them to be filterable with GL_LINEAR. On GLES, this is not supported
until GLES 3.1, but some implementation expose them with extensions.
This makes advanced scaling sort-of work for GLES 3.0 (on ANGLE). It's
still not very advisable, as 8 bits might not be enough to avoid
debanding. (Ironically, the debanding filter can be enabled, and does
not raise any GL errors - but probably doesn't do anything useful.)
Turns out glGetTexLevelParameter, which is missing in ANGLE, is a
GLES3.1 function. Removing it from the list of core GLES3 functions
makes ANGLE work in GLES3 mode.
ANGLE is a GLES2 implementation for Windows that uses Direct3D 11 for
rendering, enabling vo_opengl to work on systems with poor OpenGL
drivers and bypassing some of the problems with native GL, such as VSync
in fullscreen mode.
Unfortunately, using GLES2 means that most of vo_opengl's advanced
features will not work, however ANGLE is under rapid development and
GLES3 support is supposed to be coming soon.
Something goes wrong somewhere. Don't bother, it's only needed for
compatibility with our absolute baseline (GL 2.1/GLES 2).
On the other hand, we can process nv12 formats just fine.
For the sake of vaapi interop, we want to use EGL, but on the other
hand, but because driver developers are full of shit, vdpau interop will
not work on EGL (even if the driver supports EGL). The latter happens
with both nvidia and AMD Mesa drivers.
Additionally, EGL vaapi interop support can apparently only detected at
runtime by actually using it. While hwdec_vaegl.c already does this, it
would require initializing libva on _every_ system, which will cause
libav to print an unpreventable bullshit message to the terminal.
Try to counter these huge loads of bullshit by adding more fucking
bullshit.
We want the following behavior:
- VO probed, backend probed: only accept non-sw, fail completely
otherwise
- VO forced, backend probed: use the first non-sw, or if none is found,
fall back to the first working sw backend
- VO probed, backend forced: (I don't care about this case)
- VO forced, backend forced: just use that backend
Also, on backend probe failure the vo->probed field was left in its old
state.
This adds support for the progress indicator taskbar extension
that was introduced with Windows 7 and Windows Server 2008 R2.
I don’t like this solution because it keeps its own state and
introduces another VOCTRL, but I couldn’t come up with anything
less messy.
closes#2399
In the display-sync, non-interpolation case, and if the display refresh
rate is higher than the video framerate, we duplicate display frames by
rendering exactly the same screen again. The redrawing is cached with a
FBO to speed up the repeat.
Use glBlitFramebuffer() instead of another shader pass. It should be
faster.
For some reason, post-process was run again on each display refresh.
Stop doing this, which should also be slightly faster. The only
disadvantage is that temporal dithering will be run only once per video
frame, but I can live with this.
One aspect is messy: clearing the background is done at the start on the
target framebuffer, so to avoid clearing twice and duplicating the code,
only copy the part of the framebuffer that contains the rendered video.
(Which also gets slightly messy - needs to compensate for coordinate
system flipping.)
Currently, vo.c will always continue to render the currently queued
frame, which sets last_flip, which in turn confuses vo_get_delay(),
which in turn will show a bogus A/V desync message on unpause. So just
reset it again on unpause.
I guess the removed code is an old leftover, and makes no sense anymore.
Should fix weird A/V diff dropouts when frames are being dropped with
display-sync.
If the player sends a frame with duration==0 to the VO, it can trivially
underrun. Don't panic, but keep the correct time.
Also, returning the absolute time from vo_get_next_frame_start_time()
just to turn it into a float with relative time was silly. Rename it and
make it return what the caller needs.
"Missed" implies the frame was dropped, but what really happens is that
the following frame will be shown later than intended (due to the
current frame skipping a vsync).
(As of this commit, this property is still inactive and always
returns 0. See git blame for details.)
Apparently Windows treats windows that use OpenGL, cover an entire
screen and have the WS_POPUP style set or are topmost windows as
exclusive fullscreen windows that bypass DWM and cannot be covered
by other windows.
This means we can’t use dwmflush in fullscreen mode, and it also
means that no other window can cover mpv, and it makes the screen
flicker when switching to fullscreen mode.
This can be avoided by not setting the WS_POPUP flag.
Users can still access the old behavior by enabling stay-on-top
(which IMO at least makes sense—now we just need to get dwmflush
autodetection right to avoid nasty surprises).
fixes#2177
Until now, we've relied on the following things:
- you can send flush packets to the decoder even if it's fully flushed,
- you can send new packets to a flushed decoder,
- you can send new packers to a partially flushed decoder.
("flushing" refers to sending flush packets to the decoder until the
decoder does not return new pictures, not avcodec_flush_buffers().)
All of these are questionable. The libavcodec API probably doesn't
guarantee that these work well or at all, even though most decoders have
no issue with these. But especially with hardware decoding wrappers
(like MMAL), real problems can be expected. Isolate us from these corner
cases by handling them explicitly.
This applies to unexpected freezes or deadlocks, not e.g. normal
framedrops. The verbose messages also might remind an API user if the
API usage is incorrect, such as not calling mpv_opengl_cb_draw() when a
redraw request was issued.
The nnedi3 prescaler requires a normalized range to work properly,
but the original implementation did the range normalization after
the first step of the first pass. This could lead to severe quality
degradation when debanding is not enabled for NNEDI3.
Fix this issue by passing `tex_mul` into the shader code.
Fixes#2464
vo_opengl_cb is a special case, because we somehow have to render video
asynchronously, all while "trusting" the API user to do it correctly.
This didn't quite work, and a while ago a compromise using a timeout to
prevent theoretically possible deadlocks was added.
Make it even more synchronous. Basically, go all the way, and
synchronize rendering between VO and user renderer thread to the
full extent possible.
This means the silly frame queue is dropped, and we event attempt to
synchronize the GL SwapBuffer call (via mpv_opengl_cb_report_flip()).
The changes introduced with commit dc33eb56 are effectively dropped. I
don't even remember if they mattered.
In the future, we might make all VOs fetch asynchronously from a frame
queue, which would mostly remove the differences between vo_opengl and
vo_opengl_cb, but this will take a while (if it will even be done).
Pick the correct GLSL version from the GL_SHADING_LANGUAGE_VERSION
string. Might be somewhat questionable, as we expect the minor version
number not to have leading 0s.
Should help with cases when the reported GLSL version is much higher
than the equivalent of the reported GL version. This problem was
observed in combination with GL_ARB_uniform_buffer_object, which
can't be used if the declared GLSL version is too low.
Simplifies some auto detection matters.
I _still_ don't want to remove the lazy loading mechanism, because it's
still slightly useful for filters using the hwdec APIs. My main
motivation for not always preloading them is actually that libva prints
random useless crap to the terminal with no way to prevent this.
Notes:
- Unfortunately the only way to talk to EGL from within DRM I could find
involves linking with GBM (generic buffer management for Mesa.)
Because of this, I'm pretty sure it won't work with proprietary NVidia
drivers, but then again, last time I checked NVidia didn't offer
proper screen resolution for VT.
- VT switching doesn't seem to work at all. It's worth mentioning that
using vo_drm before introduction of VT switcher had an anomaly where
user could switch to another VT and input text to it, while video
played on top of that VT. However, that isn't the case with drm_egl:
I can't switch to other VT during playback like this. This makes me
think that it's either a limitation coming from my firmware or from
EGL/KMS itself rather than a bug with my code. Nonetheless, I still
left (untestable) VT switching code in place, in case it's useful to
someone else.
- The mode_id, connector_id and device_path should be configurable for
power users and people who wish to watch videos on nonprimary screen.
Unfortunately I didn't see anything that would allow OpenGL backends
to register their own set of options. At the same time, adding them to
global namespace is pointless.
- A few dozens of lines could be shared with vo_drm (setting up VT
switching, most of code behind page flipping). I don't have any strong
opinion on this.
- Sometimes I get minor visual glitches. I'm not sure if there's a race
condition of some sort, unitialized variable (doubtful), or if it's
buggy driver. (I'm using integrated Intel HD Graphics 4400 with Mesa)
- .config and .control are very minimal.
Signed-off-by: wm4 <wm4@nowhere>
glXCreateContextAttribsARB() by design can throw some X11 errors. We
ignore these, but we generally still print error messages to the
terminal. This was confusing/annoying users, so silence it. The stupid
part is that the Xlib error handler is global, so we have to be slightly
careful here.
They are evil and should be eradicated. Some of these were pretty dumb
anyway.
There are probably some more around in platform specific code or other
code not enabled by default on Linux.
This is based on an older patch by James Ross-Gowan. It was rebased and
cleaned up. Also, the DWM API usage present in the older patch was
removed, because DWM reports nonsense rates at least on Windows 8.1
(they are rounded to integers, just like with the old GDI API - except
the GDI API had a good excuse, as it could report only integers).
Signed-off-by: wm4 <wm4@nowhere>
This simplifies update_screen_rect a bit. Unless --fs-screen=all is
used, it will always get an HMONITOR and call GetMonitorInfo to
determine its dimensions. This will make it easier for the next few
commits to determine the colour profile and the refresh rate from the
HMONITOR.
There is a slight change in behaviour. When selecting a screen that is
out of range, such as --screen=9 on a machine with only two monitors,
the old code would silently select the last existing monitor. The new
code prints an error message and falls back to the default screen (same
as the Cocoa code.)
Signed-off-by: wm4 <wm4@nowhere>
The call to EnumDisplaySettings seems to be a relic from when MPlayer
ran on systems that didn't have GetMonitorInfo or SM_CX/CYVIRTUALSCREEN.
GetMonitorInfo was loaded dynamically, so it was possible for MPlayer to
run without it and use the values returned by EnumDisplaySettings.
These are always present in modern versions of Windows, so the values
returned from EnumDisplaySettings are always overwritten. Remove the
call to EnumDisplaySettings and assume SM_CX/CYVIRTUALSCREEN is always
present.
Signed-off-by: wm4 <wm4@nowhere>
Commit 27dc834f added it as such.
Also remove the check for glUniformBlockBinding() - it's part of an
extension, and the check glGetUniformBlockIndex() already checks whether
the extension is fully available.
Implement NNEDI3, a neural network based deinterlacer.
The shader is reimplemented in GLSL and supports both 8x4 and 8x6
sampling window now. This allows the shader to be licensed
under LGPL2.1 so that it can be used in mpv.
The current implementation supports uploading the NN weights (up to
51kb with placebo setting) in two different way, via uniform buffer
object or hard coding into shader source. UBO requires OpenGL 3.1,
which only guarantee 16kb per block. But I find that 64kb seems to be
a default setting for recent card/driver (which nnedi3 is targeting),
so I think we're fine here (with default nnedi3 setting the size of
weights is 9kb). Hard-coding into shader requires OpenGL 3.3, for the
"intBitsToFloat()" built-in function. This is necessary to precisely
represent these weights in GLSL. I tried several human readable
floating point number format (with really high precision as for
single precision float), but for some reason they are not working
nicely, bad pixels (with NaN value) could be produced with some
weights set.
We could also add support to upload these weights with texture, just
for compatibility reason (etc. upscaling a still image with a low end
graphics card). But as I tested, it's rather slow even with 1D
texture (we probably had to use 2D texture due to dimension size
limitation). Since there is always better choice to do NNEDI3
upscaling for still image (vapoursynth plugin), it's not implemented
in this commit. If this turns out to be a popular demand from the
user, it should be easy to add it later.
For those who wants to optimize the performance a bit further, the
bottleneck seems to be:
1. overhead to upload and access these weights, (in particular,
the shader code will be regenerated for each frame, it's on CPU
though).
2. "dot()" performance in the main loop.
3. "exp()" performance in the main loop, there are various fast
implementation with some bit tricks (probably with the help of the
intBitsToFloat function).
The code is tested with nvidia card and driver (355.11), on Linux.
Closes#2230
Add the Super-xBR filter for image doubling, and the prescaling framework
to support it.
The shader code was ported from MPDN extensions project, with
modification to process luma only.
This commit is largely inspired by code from #2266, with
`gl_transform_trans()` authored by @haasn taken directly.
The noframe event is logged whenever there is no new frame. This can
happen due to normal redraws, but also due to video frame queue
underflow.
The mpv_opengl_cb_report_flip() API function is currently pretty
useless, because blocking on the video frame queue is more reliable and
simpler. But at least we can log the actual vsync.
next_vsync/prev_vsync was only used to retrieve the vsync duration. We
can get this in a simpler way.
This also removes the vsync duration estimation from vo_opengl_cb.c,
which is probably worthless anyway. (And once interpolation is made
display-sync only, this won't matter at all.)
This affects only the display-sync code path, as for normal timing the
wakeup_pts stuff handles proper wakeup. It's probably mostly a
theoretical issue.
A hw decoder might fail to decode a frame for multiple reasons, and not
always just because decoding is impossible. We can't generally
distinguish these reasons well. Make it more tolerant by accepting
failures of 3 frames, but not more. The threshold can be adjusted by the
repurposed --vd-lavc-software-fallback option.
(This behavior was suggested much earlier in some PR, but at the time
the "proper" hwdec fallback was indistinguishable from decoding error.
With the current situation, "proper" fallback is still instantious.)
The uninit() function was called twice if the uninit() function failed
(once by init(), once by vd_lavc.c code), which caused crashes due to
double-free. (This failure is a corner case, and all other hwdec
backends appear to handle this case gracefully.)
I do not think this code should be able to deal with uninit() being
called other than once. Guarantee that it's called exactly once.
Quoting MSDN: "Notifies the Desktop Window Manager (DWM) to opt in to or
out of Multimedia Class Schedule Service (MMCSS) scheduling while the
calling process is alive.". Whatever this means. (An application can
change the scheduling priority of the window manager?)
Does this improve anything? I have no idea. Certainly this is a program
that does multimedia and graphics, so we seem to be a good match for
this.
Is it bad if we enable this even while playback is inactive or paused? I
have no idea either.
Is there a magic cargo cult function that will mark our renderer thread
as multimedia thing? I have no idea. (We use a function to enable MMCSS
for our audio thread in ao_wasapi.)