This replaces `vo-performance` with `vo-passes`, bringing with it a number
of changes and improvements:
1. mpv users can now introspect the vo_opengl passes, which is something
that has been requested multiple times.
2. performance data is now measured per-pass, which helps both
development and debugging.
3. since adding more passes is cheap, we can now report information for
more passes (e.g. the blit pass, and the osd pass). Note: we also
switch to nanosecond scale, to be able to measure these passes
better.
4. `--user-shaders` authors can now describe their own passes, helping
users both identify which user shaders are active at any given time
as well as helping shader authors identify performance issues.
5. the timing data per pass is now exported as a full list of samples,
so projects like Argon-/mpv-stats can immediately read out all of the
samples and render a graph without having to manually poll this
option constantly. (A sketch of the per-pass data follows this list.)
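To make item 5 concrete, here is a rough sketch of what a per-pass
record could look like; all names and sizes are illustrative, not mpv's
actual structs:

```c
#include <stdint.h>

#define PERF_SAMPLE_COUNT 256   /* hypothetical ring buffer size */

/* Hypothetical per-pass record, as exposed through vo-passes: */
struct pass_perf {
    char desc[64];                        /* pass description, e.g. set
                                             by a user shader */
    uint64_t samples[PERF_SAMPLE_COUNT];  /* raw timings in nanoseconds */
    int num_samples;                      /* valid entries in samples[] */
    uint64_t last, avg, peak;             /* convenience aggregates */
};
```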
Due to gl_timer's design being complicated (directly reading performance
data would block, so we delay the actual read-back until the next _start
command), it's vital not to conflate different passes that might be
doing different things from one frame to another. To accomplish this,
the actual timers are stored as part of the gl_shader_cache's sc_entry,
which makes them unique for that exact shader.
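The delayed read-back itself is plain GL; a minimal sketch using
standard timer queries (GL_TIME_ELAPSED, GL 3.3 / ARB_timer_query,
functions assumed to come from a loader), not mpv's actual gl_timer
code:

```c
#include <stdbool.h>

struct gl_timer_sketch {
    GLuint query;
    GLuint64 result_ns;  /* result of the *previous* measurement */
    bool started;
};

static void timer_start(struct gl_timer_sketch *t)
{
    if (!t->query)
        glGenQueries(1, &t->query);
    /* Reading the result right after timer_stop() would block until the
     * GPU finishes; by the time the next _start comes around, the old
     * result is complete and the read-back is (nearly) free. */
    if (t->started)
        glGetQueryObjectui64v(t->query, GL_QUERY_RESULT, &t->result_ns);
    glBeginQuery(GL_TIME_ELAPSED, t->query);
    t->started = true;
}

static void timer_stop(struct gl_timer_sketch *t)
{
    glEndQuery(GL_TIME_ELAPSED);
}
```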
Starting and stopping the time measurement is easy to unify with the
gl_sc architecture, because the existing API already relies on a
"generate, render, reset" flow, so we can just put timer_start and
timer_stop in sc_generate and sc_reset, respectively.
The ugliest thing about this code is that due to the need to keep pass
information relatively stable in between frames, we need to distinguish
between "new" and "redrawn" frames, which bloats the code somewhat and
also feels hacky and vo_opengl-specific. (But then again, this entire
thing is vo_opengl-specific.)
Might be useful for other backends too. For context_vdpau, resize
handling, presentation, and handling of the mapping state become
somewhat less awkward.
In some cases, such as when using the libmpv opengl-cb API, or with
certain vo_opengl backends, the main framebuffer is never accessed.
Instead, rendering is done to a FBO that acts as back buffer. This meant
an incorrect/broken bit depth could be used for dithering.
Change it to read the framebuffer depth lazily on the first render call.
Also move the main FBO field out of the GL struct to MPGLContext,
because the renderer's init function does not need to access it anymore.
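The lazy query itself is a one-liner in plain GL; a sketch (mpv's
actual code differs, and on GLES the attachment is GL_BACK rather than
GL_BACK_LEFT):

```c
/* Query the default framebuffer's per-channel depth on first use.
 * Framebuffer 0 must be bound when this runs. */
static int query_fb_depth(void)
{
    GLint bits = 0;
    glGetFramebufferAttachmentParameteriv(GL_FRAMEBUFFER, GL_BACK_LEFT,
            GL_FRAMEBUFFER_ATTACHMENT_GREEN_SIZE, &bits);
    return bits > 0 ? bits : 8;   /* assume 8 bits if the query fails */
}
```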
Useful for testing. Unfortunately, the nVidia EGL driver ignores this,
and returns a GLES 3.2 context anyway (which it is allowed to do). Might
still be usable with ANGLE, which will really give you a GLES 2 context
if you ask for it.
This fixes a race condition created by the previous commit, and possibly
others. Sometimes interpolated frames weren't redrawn as uninterpolated
ones.
The problem is that redrawing/drawing a frame can't reset the VO
want_redraw flags, because logically these have to happen after the core
acknowledged and explicitly reissued a redraw. The core needs to be
involved because the OSD text and drawings could depend on the playback
or window state.
Change it such that it always goes through the core.
Also, VOs inconsistently called vo_wakeup() when setting want_redraw,
which is also taken care of by this commit.
vo_opengl used to have it as sub-option, which made it very hard to pass
down option values to backends in a generic way (even if these options
were completely backend-specific). For --opengl-dcomposition we used a
VOFLAG to deal with this. Fortunately, sub-options are gone, and we can
just add it as a global option.
Move the option to context_angle.c and add it as a global option. I
thought about adding a mechanism to let backends declare options, which
would get magically picked up by m_config instead of having to add them
to the global option list manually (similar to VO vo_driver.options),
but decided against this complexity just for 1 or 2 backends. Likewise,
it could have been added as a single option to avoid the boilerplate of
an option struct, but then again there are probably going to be more
angle suboptions, and the struct approach is cleaner.
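The result looks roughly like this (a sketch following mpv's
m_sub_options pattern; the exact option name and flags are
illustrative):

```c
struct angle_opts {
    int dcomposition;
};

#define OPT_BASE_STRUCT struct angle_opts
const struct m_sub_options angle_conf = {
    .opts = (const struct m_option[]) {
        OPT_FLAG("angle-dcomposition", dcomposition, 0),
        {0}
    },
    .size = sizeof(struct angle_opts),
    .defaults = &(const struct angle_opts){
        .dcomposition = 1,
    },
};
```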
Introduce the --opengl-hwdec-interop option, which replaces
--hwdec-preload. The new option allows explicit selection of the interop
backend.
This is relatively complex, and I would have preferred not to add this,
but it's probably useful to debug certain problems. In exchange, the
"new" option documents that pretty much any but the simplest use of it
will not be forward compatible.
Apparently we don't always set the viewport to window dimensions
anymore, e.g. if nothing is actually rendered. This means the viewport
can contain old values.
The window screenshot code uses the viewport values to guess the default
framebuffer dimensions. With --force-window --idle --no-osc (which draws
nothing and issues a glClear() command only), taking a screenshot would
yield an image with the wrong size and possibly garbage in it. Fix this
by explicitly passing the currently known window dimensions. Abusing the
values stored in the viewport was questionable anyway.
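Conceptually the fix boils down to this (simplified sketch):

```c
/* Read back the window using the known window size, not whatever
 * happens to be left in the GL viewport. */
static void read_window_contents(int w, int h, unsigned char *rgba)
{
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    glReadPixels(0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE, rgba);
    /* rows arrive bottom-up and still need to be flipped */
}
```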
Long planned. Leads to some sanity.
There still are some rather gross things. Especially g_groups is ugly,
and a hack that can hopefully be removed. (There is a plan for it, but
whether it's implemented depends on how much energy is left.)
It was used to determine whether the VO supports VOCTRL_SET_PANSCAN.
With all those changes to property semantics this became unnecessary,
and its only use was dropped at some point.
Instead of requiring each VO or AO to manually add members to MPOpts and
the global option table, make it possible to register them automatically
via vo_driver/ao_driver.global_opts members. This avoids modifying
options.c/options.h every time, including having to duplicate the exact
ifdeffery used to enable a driver.
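In a driver definition this looks roughly as follows (sketch; assumes a
foo_conf sub-option group along the lines shown earlier):

```c
const struct ao_driver audio_out_foo = {
    .name = "foo",
    /* ... */
    .global_opts = &foo_conf,   /* picked up by m_config automatically,
                                   and only if this driver is built */
};
```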
vo_opengl sub-options were always rather annoying to handle. It seems
better to make them global options instead. This is simpler and easier
to use. The only disadvantage we are aware of is that it's not clear
that many/all of these new global options work with vo_opengl only.
--vo=opengl-hq is also deprecated.
There is extensive compatibility with the old behavior. One exception is
that --vo-defaults will not apply to opengl-hq (though with opengl it
still works). vo-cmdline is also dysfunctional and will be removed in a
following commit.
These changes also affect opengl-cb.
The update mechanism is still rather inefficient: it requires syncing
with the VO after each option change, rather than batching updates.
There's also no granularity (video.c just updates "everything", and if
auto-ICC profiles are enabled, vo_opengl.c will fetch them on each
update).
Most of the manpage changes were done by Niklas Haas <git@haasn.xyz>.
Reduce accesses to the renderer opts in vo_opengl.c, and instead add
accessors for them to video.c.
I suppose gamma and maybe icc-auto could be moved to vo_opengl.c
options. Also, the output colorspace could probably be adjusted to what
is really used, not just the options (although it's possible that this
commit changes this, due to video.c mutating its own copy of the options
according to actual renderer capabilities).
But don't deal with this now.
Until now, this has been either handled over vo.event_fd (which should
go away), or by putting event handling on a separate thread. The
backends which do the latter do it for a reason and won't need this, but
X11 and Wayland will, in order to get rid of event_fd.
This has two reasons:
1. I tend to add new fields to this metadata, and every time I've done
so I've consistently forgotten to update all of the dozens of places in
which this colorimetry metadata might end up getting used. While most
usages don't really care about most of the metadata, sometimes the
intent was simply to “copy” the colorimetry metadata from one struct to
another. With this being inside a substruct, those lines of code can now
simply read a.color = b.color without having to care about added or
removed fields.
2. It makes the type definitions nicer for upcoming refactors.
In going through all of the usages, I also expanded a few where I felt
that omitting the “young” fields was a bug.
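Schematically (enum and field names along these lines; illustrative,
not the literal headers):

```c
struct mp_colorspace {
    enum mp_csp space;           /* YUV matrix, e.g. BT.601 vs. BT.709 */
    enum mp_csp_levels levels;   /* limited vs. full range */
    enum mp_csp_prim primaries;
    enum mp_csp_trc gamma;
};

struct mp_image_params {
    /* ... */
    struct mp_colorspace color;  /* all colorimetry lives here now */
    /* ... */
};

/* copying all colorimetry at once, robust against future fields:
 *     a.color = b.color;
 */
```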
This is plumbed through a new VOCTRL, VOCTRL_PERFORMANCE_DATA, and
exposed as properties render-time-last, render-time-avg etc.
All of these numbers are in microseconds, which gives a good precision
range when just outputting them via show-text. (Lua scripts can
obviously still do their own formatting etc.)
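The payload has roughly this shape (sketch; the exact struct and field
names may differ):

```c
#include <stdint.h>

struct voctrl_performance_entry {
    uint64_t last, avg, peak;   /* all in microseconds */
};

struct voctrl_performance_data {
    struct voctrl_performance_entry upload, render, present;
};

/* filled in by the VO in response to VOCTRL_PERFORMANCE_DATA */
```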
Signed-off-by: wm4 <wm4@nowhere>
Instead of implicitly resetting the options to defaults and then
applying the options, they're always applied on top of the current
options (in the same way that adding new options to the CLI command
line does).
This does not apply to vo_opengl_cb, because that has an even worse mess
which I refuse to deal with.
Originally, video.c did not access any CMS things (other than lut3d
being set on it), but this has changed. In practice, almost all accesses
to it have moved to video.c. vo_opengl only created it, and set the auto
icc profile path.
Complete the move.
Some things wrt. option handling are a bit fishy. (But when is this not
the case.)
icc-profile-auto was not tested, but the distributed human CI will take
care of it.
Remove the opengl-hq option default that caused it not to autoselect
ANGLE (unlike --vo=opengl). For details, see commit d5df90a2.
Back then the intention was to use ANGLE by default, since it integrates
much nicer with the Windows compositor (instead of native OpenGL, which
tends to cause crazy glitches). On the other hand, many opengl-hq
capabilities are not available with older ANGLE builds, so it didn't
make any sense to autoselect ANGLE for it.
With the GL_EXT_texture_norm16 extension recently added to ANGLE, it has
essentially reached feature parity with desktop GL for the subset we are
using. (Even the integer texture hack for high bit depth input could be
dropped now.)
It (probably) still does not support nnedi3, due to the weird way the NN
coefficients are imported. Also, it uses half-floats instead of 16 bit
fixed-point textures for technical reasons, which implies about 5 bits
of precision loss. If anyone actually manages to distinguish the two
dithering texture formats in a double-blind test, I will fix it.
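(For reference, the arithmetic behind that estimate: a half-float
stores 10 mantissa bits plus an implicit leading bit, so it resolves
about 11 bits near 1.0, versus the full 16 bits of a fixed-point
texture: 16 - 11 = 5 bits lost in the worst case.)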
The main change is with video/hwdec.h. mp_hwdec_info is made opaque (and
renamed to mp_hwdec_devices). Its accessors are mainly thread-safe (or
documented where not), which makes the whole thing saner and cleaner. In
particular, thread-safety rules become less subtle and more obvious.
The new internal API makes it easier to support multiple OpenGL interop
backends. (Although this is not done yet, and it's not clear whether it
ever will.)
This also removes all the API-specific fields from mp_hwdec_ctx and
replaces them with a "ctx" field. For d3d in particular, we drop the
mp_d3d_ctx struct completely, and pass the interfaces directly.
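Schematically (illustrative, not the literal header):

```c
struct mp_hwdec_ctx {
    enum hwdec_type type;
    void *ctx;   /* API-specific handle, e.g. a VADisplay, or the
                    IDirect3DDevice9* that mp_d3d_ctx used to wrap */
};
```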
Remove the emulation checks from vaapi.c and vdpau.c; they are
pointless, and the checks that matter are done on the VO layer.
The d3d hardware decoders might slightly change behavior: dxva2-copy
will not use the VO device anymore if the VO supports proper interop.
This pretty much assumes that in such cases the VO will not use any
form of exclusive mode, which makes using the VO device in copy mode
unnecessary.
This is a big refactor. Some things may be untested and could be broken.
This commit refactors the 3DLUT loading mechanism to build the 3DLUT
against the original source characteristics of the file. This allows us,
among other things, to use a real BT.1886 profile for the source. This
also allows us to actually use perceptual mappings. Finally, this
reduces errors on standard gamut displays (where the previous 3DLUT
target of BT.2020 was unreasonably wide).
This also improves the overall accuracy of the 3DLUT due to eliminating
rounding errors where possible, and allows for more accurate use of
LUT-based ICC profiles.
The current code is somewhat more ugly than necessary, because the idea
was to implement this commit in a working state first, and then maybe
refactor the profile loading mechanism in a later commit.
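To illustrate the source-profile idea with lcms2 (a hedged sketch: a
pure power 2.4 curve stands in for BT.1886, whose exact curve depends
on the display's black level; the actual mpv code differs):

```c
#include <lcms2.h>

static cmsHPROFILE make_bt709_source_profile(cmsContext ctx)
{
    cmsCIExyY white = {0.3127, 0.3290, 1.0};        /* D65 */
    cmsCIExyYTRIPLE prim = {                        /* BT.709 primaries */
        .Red   = {0.640, 0.330, 1.0},
        .Green = {0.300, 0.600, 1.0},
        .Blue  = {0.150, 0.060, 1.0},
    };
    cmsToneCurve *curve = cmsBuildGamma(ctx, 2.4);  /* BT.1886 stand-in */
    cmsToneCurve *curves[3] = {curve, curve, curve};
    cmsHPROFILE prof = cmsCreateRGBProfileTHR(ctx, &white, &prim, curves);
    cmsFreeToneCurve(curve);
    return prof;
}
```

The 3DLUT is then built by transforming samples from this profile to
the display profile with the configured rendering intent (e.g.
INTENT_PERCEPTUAL).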
Fixes #2815.
Do this to make the license situation less confusing.
This change should be of no consequence, since LGPL is compatible with
GPL anyway, and making it LGPL-only does not restrict the use with GPL
code.
Additionally, the wording implies that this is allowed, and that we can
just remove the GPL part.
Now common.c only contains the code for the function loader, while
context.c contains the backend loader/dispatcher.
Not calling it "backend.c", because the central struct is called
MPGLContext.
Store the determined framebuffer depth in struct GL instead of
MPGLContext. This means gl_video_set_output_depth() can be removed, and
also justifies adding new fields describing framebuffer/backend
properties to struct GL instead of having to add more functions just to
shovel the information around.
Keep in mind that mpgl_load_functions() will wipe struct GL, so the
new fields must be set after calling it.
WGL_NV_DX_interop is widely supported by Nvidia and AMD drivers. It
allows a texture to be shared between Direct3D and WGL, so that
rendering can be done with WGL and presentation can be done with
Direct3D. This should allow us to work around some persistent WGL
issues, such as dropped frames with some driver/OS combos, drivers that
buffer frames to increase performance at the cost of latency, and the
inability to disable exclusive fullscreen mode when using WGL to render
to a fullscreen window.
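The extension's core calls look like this (sketch; error handling
omitted, the wglDX* function pointers assumed to be loaded via
wglGetProcAddress):

```c
/* at init: tie the GL context to the D3D9Ex device, and share the
 * swapchain backbuffer as a GL renderbuffer */
HANDLE dev = wglDXOpenDeviceNV(d3d9ex_device);
GLuint rb;
glGenRenderbuffers(1, &rb);
HANDLE obj = wglDXRegisterObjectNV(dev, backbuffer_surface, rb,
                                   GL_RENDERBUFFER,
                                   WGL_ACCESS_WRITE_DISCARD_NV);

/* per frame: */
wglDXLockObjectsNV(dev, 1, &obj);    /* GL owns the buffer while locked */
/* ... attach rb to an FBO and render with GL ... */
wglDXUnlockObjectsNV(dev, 1, &obj);  /* hand ownership back to D3D */
/* ... present through Direct3D, e.g. IDirect3DDevice9Ex::PresentEx ... */
```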
The addition of a DX_interop backend might also enable some cool
Direct3D-specific enhancements in the future, such as using the
GetPresentStatistics API to get accurate frame presentation timestamps.
Note that due to a driver bug, this backend is currently broken on
Intel. It will appear to work as long as the window is not resized too
often, but after a few changes of size it will be unable to share the
newly created renderbuffer with GL. See:
https://software.intel.com/en-us/forums/graphics-driver-bug-reporting/topic/562051
Running mpv with default config will now pick up ANGLE by default. Since
some think ANGLE is still not good enough for hq features, extend the
"es" option to reject GLES backends, and add to to the opengl-hq preset.
One consequence is that mpv will by default use libswscale to convert
10 bit video to 8 bit, before it reaches the VO.
Simplifies some auto detection matters.
I _still_ don't want to remove the lazy loading mechanism, because it's
still slightly useful for filters using the hwdec APIs. My main
motivation for not always preloading them is actually that libva prints
random useless crap to the terminal with no way to prevent this.
Enable it by default, but not unconditionally. Add an "auto" mode, which
disables DwmFlush if the compositor is (probably) inactive. Let's see how
this goes.
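The check behind the "auto" mode is a standard dwmapi call; a sketch:

```c
#include <windows.h>
#include <dwmapi.h>

/* "auto": only sync to DWM when composition is actually active */
static void maybe_dwmflush(void)
{
    BOOL composited = FALSE;
    if (SUCCEEDED(DwmIsCompositionEnabled(&composited)) && composited)
        DwmFlush();   /* blocks until the compositor's next present */
}
```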
Since I accidentally enabled DwmFlush always by default (more or less)
in a previous commit touching this code, this is probably mostly just
cargo-culting, and it's uncertain whether it does anything.
Note that I still got bad vsync behavior when fullscreening mpv, and
making another window visible on the same screen. This happens even if
forcing DWM.
Yet another relatively useless option that tries to make OpenGL's sync
behavior somewhat sane. The results are not too encouraging. With a
value of 1, vsync jitter is gone on nVidia, but there are frame drops
(less than with glfinish). With 2, I get the usual vsync jitter _and_
frame drops.
There's still some hope that it might prevent too deep queuing with some
GPUs, I guess.
The timeout for the wait call is 1 second. The value is pretty
arbitrary; it should just not be too high to freeze the process (if
the GPU is un-nice), and not too low to trigger the timeout in normal
cases, even if the GPU load is very high. So I guess 1 second is ok
as a timeout.
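Schematically, the fence-based queue limit looks like this (sketch;
assumes GL 3.2+ / ARB_sync, with the option value as the queue depth):

```c
#include <string.h>

#define SWAPCHAIN_DEPTH 2            /* e.g. the option value */

static GLsync fences[SWAPCHAIN_DEPTH];
static int num_fences;

static void after_queuing_frame(void)
{
    /* fence marks the point where this frame's commands end */
    fences[num_fences++] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

    if (num_fences == SWAPCHAIN_DEPTH) {
        /* block (up to 1 second) until the oldest frame completes */
        glClientWaitSync(fences[0], GL_SYNC_FLUSH_COMMANDS_BIT,
                         1000000000ULL /* ns */);
        glDeleteSync(fences[0]);
        num_fences--;
        memmove(fences, fences + 1, num_fences * sizeof(fences[0]));
    }
}
```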
The idea to use fences this way to control the queue depth was stolen
from RetroArch:
df01279cf3/gfx/drivers/gl.c (L1856)
This parameter has been unused for years (the last flag was removed in
commit d658b115). Get rid of it.
This affects the general VO API, as well as the vo_opengl backend API,
so it touches a lot of files.
The VOFLAGs are still used to control OpenGL context creation, so move
them to the OpenGL backend code.
This might fix some problems when framestepping with interpolation
enabled. The problem here is that we want to show the non-interpolated
frame while paused. Framestepping is like unpausing the video for a
frame, and then pausing again. This draws an interpolated frame, and
redrawing on pausing is supposed to take care of this.
This possibly didn't always work, because vo->want_redraw is not checked
by the vo_control() code path. So wake up the VO thread (which takes
care of servicing redraw requests, kind of) explicitly.
The correct solution is getting rid of the public-writable want_redraw
field and replacing it with a new vo_request_redraw() function, but this
can come later.
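Such a function could look like this (purely hypothetical; neither the
function nor this exact locking scheme exists at this point):

```c
/* Hypothetical: set want_redraw under the VO lock and wake the VO
 * thread, so callers never touch the field directly. */
void vo_request_redraw(struct vo *vo)
{
    pthread_mutex_lock(&vo->in->lock);
    vo->in->want_redraw = true;
    pthread_mutex_unlock(&vo->in->lock);
    vo_wakeup(vo);
}
```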