This adds basic support for ICC profiles. Per-monitor profiles are
supported. WCS profiles are not supported, but there is an API for
converting WCS profiles to ICC, so they might be supported in future.
I'm just not sure if anyone actually uses them.
Reloading the ICC profile when it's changed in the control panel is also
not supported. This might be possible by using the WCS APIs and watching
the registry for changes, but there is no official API for it, and as
far as I can tell, no other Windows programs can do it.
The OSD takes up an entire fullscreen dispmanx layer. Although the GPU
should be able to handle it (possibly even without any disadvantages),
it'll still be useful for debugging performance issues.
Running mpv with default config will now pick up ANGLE by default. Since
some think ANGLE is still not good enough for hq features, extend the
"es" option to reject GLES backends, and add to to the opengl-hq preset.
One consequence is that mpv will by default use libswscale to convert
10 bit video to 8 bit, before it reaches the VO.
I decided that I actually can't stand how vo_opengl unnecessarily puts
the video through 3 shader stages (instead of 1). Thus, what was meant
to be a fallback for weak OpenGL implementations, the dumb-mode, now
becomes default if the user settings allow it.
The code required to check for the settings isn't so wild, so I guess
it's manageable. I still hope that one day, our rendering logic can
generate ideal shader stages for this case too.
Note that in theory, dumb-mode could be reenabled at runtime due to a
color management 3D LUT being set, so a separate dumb_mode field is
required. The dumb-mode option can't just be overwritten.
vo_opengl_cb is a special case, because we somehow have to render video
asynchronously, all while "trusting" the API user to do it correctly.
This didn't quite work, and a while ago a compromise using a timeout to
prevent theoretically possible deadlocks was added.
Make it even more synchronous. Basically, go all the way, and
synchronize rendering between VO and user renderer thread to the
full extent possible.
This means the silly frame queue is dropped, and we event attempt to
synchronize the GL SwapBuffer call (via mpv_opengl_cb_report_flip()).
The changes introduced with commit dc33eb56 are effectively dropped. I
don't even remember if they mattered.
In the future, we might make all VOs fetch asynchronously from a frame
queue, which would mostly remove the differences between vo_opengl and
vo_opengl_cb, but this will take a while (if it will even be done).
Notes:
- Unfortunately the only way to talk to EGL from within DRM I could find
involves linking with GBM (generic buffer management for Mesa.)
Because of this, I'm pretty sure it won't work with proprietary NVidia
drivers, but then again, last time I checked NVidia didn't offer
proper screen resolution for VT.
- VT switching doesn't seem to work at all. It's worth mentioning that
using vo_drm before introduction of VT switcher had an anomaly where
user could switch to another VT and input text to it, while video
played on top of that VT. However, that isn't the case with drm_egl:
I can't switch to other VT during playback like this. This makes me
think that it's either a limitation coming from my firmware or from
EGL/KMS itself rather than a bug with my code. Nonetheless, I still
left (untestable) VT switching code in place, in case it's useful to
someone else.
- The mode_id, connector_id and device_path should be configurable for
power users and people who wish to watch videos on nonprimary screen.
Unfortunately I didn't see anything that would allow OpenGL backends
to register their own set of options. At the same time, adding them to
global namespace is pointless.
- A few dozens of lines could be shared with vo_drm (setting up VT
switching, most of code behind page flipping). I don't have any strong
opinion on this.
- Sometimes I get minor visual glitches. I'm not sure if there's a race
condition of some sort, unitialized variable (doubtful), or if it's
buggy driver. (I'm using integrated Intel HD Graphics 4400 with Mesa)
- .config and .control are very minimal.
Signed-off-by: wm4 <wm4@nowhere>
Implement NNEDI3, a neural network based deinterlacer.
The shader is reimplemented in GLSL and supports both 8x4 and 8x6
sampling window now. This allows the shader to be licensed
under LGPL2.1 so that it can be used in mpv.
The current implementation supports uploading the NN weights (up to
51kb with placebo setting) in two different way, via uniform buffer
object or hard coding into shader source. UBO requires OpenGL 3.1,
which only guarantee 16kb per block. But I find that 64kb seems to be
a default setting for recent card/driver (which nnedi3 is targeting),
so I think we're fine here (with default nnedi3 setting the size of
weights is 9kb). Hard-coding into shader requires OpenGL 3.3, for the
"intBitsToFloat()" built-in function. This is necessary to precisely
represent these weights in GLSL. I tried several human readable
floating point number format (with really high precision as for
single precision float), but for some reason they are not working
nicely, bad pixels (with NaN value) could be produced with some
weights set.
We could also add support to upload these weights with texture, just
for compatibility reason (etc. upscaling a still image with a low end
graphics card). But as I tested, it's rather slow even with 1D
texture (we probably had to use 2D texture due to dimension size
limitation). Since there is always better choice to do NNEDI3
upscaling for still image (vapoursynth plugin), it's not implemented
in this commit. If this turns out to be a popular demand from the
user, it should be easy to add it later.
For those who wants to optimize the performance a bit further, the
bottleneck seems to be:
1. overhead to upload and access these weights, (in particular,
the shader code will be regenerated for each frame, it's on CPU
though).
2. "dot()" performance in the main loop.
3. "exp()" performance in the main loop, there are various fast
implementation with some bit tricks (probably with the help of the
intBitsToFloat function).
The code is tested with nvidia card and driver (355.11), on Linux.
Closes#2230
Add the Super-xBR filter for image doubling, and the prescaling framework
to support it.
The shader code was ported from MPDN extensions project, with
modification to process luma only.
This commit is largely inspired by code from #2266, with
`gl_transform_trans()` authored by @haasn taken directly.
Enable it by default, but not unconditionally. Add an "auto" mode, which
disable DwmFlush if the compositor is (probably) inactive. Let's see how
this goes.
Since I accidentally enabled DwmFlush always by default (more or less)
in a previous commit touching this code, this is probably mostly just
cargo-culting, and it's uncertain whether it does anything.
Note that I still got bad vsync behavior when fullscreening mpv, and
making another window visible on the same screen. This happens even if
forcing DWM.
Yet another relatively useless option that tries to make OpenGL's sync
behavior somewhat sane. The results are not too encouraging. With a
value of 1, vsync jitter is gone on nVidia, but there are frame drops
(less than with glfinish). With 2, I get the usual vsync jitter _and_
frame drops.
There's still some hope that it might prevent too deep queuing with some
GPUs, I guess.
The timeout for the wait call is 1 second. The value is pretty
arbitrary; it should just not be too high to freeze the process (if
the GPU is un-nice), and not too low to trigger the timeout in normal
cases, even if the GPU load is very high. So I guess 1 second is ok
as a timeout.
The idea to use fences this way to control the queue depth was stolen
from RetroArch:
df01279cf3/gfx/drivers/gl.c (L1856)
It's great that the new algorithm supports multiple placebo iterations
and all, but it's really not necessary and hurts performance in the
general case for the sake of the 0.1% that actually pause the screen
and look for minute differences.
Signed-off-by: wm4 <wm4@nowhere>
This reverts commit d11184a256.
Unfortunately, there was a lot of unexpected resistance.
Do note that this is still extremely slow, crappy, etc.
Note that vo_x11.c was further edited. Compared to the removed vo_x11.c,
an additional ~200 lines of code was removed in order to simplify it. I
tried to strip it down as much as possible. In particular, support for
odd non-32 bit formats (24, 16, 15, 8 bit) is dropped.
Closes#2300.
This turns the old scalers (inherited from MPlayer) into a pre-
processing step (after color conversion and before scaling). The code
for the "sharpen5" scaler is reused for this.
The main reason MPlayer implemented this as scalers was perhaps because
FBOs were too expensive, and making it a scaler allowed to implement
this in 1 pass. But unsharp masking is not really a scaler, and I would
guess the result is more like combining bilinear scaling and unsharp
masking.
The removal of source-shader is a side effect, since this effectively
replaces it - and the video-reading code has been significantly
restructured to make more sense and be more readable.
This means users no longer have to constantly download and maintain a
separate deband.glsl installation alongside mpv, which was the only real
use case for source-shader that we found either way.
The single path optimization, rendering the video in one shader pass and
without FBO indirections, was removed soem commits ago. It didn't have a
place in this code, and caused considerable complexity and maintenance
issues.
On the other hand, it still has some worth, such as for use with
extremely crappy hardware (GLES only or OpenGL 2.1 without FBO
extension). Ideally, these use cases would be handled by a separate VO
(say, vo_gles). While cleaner, this would still cause code duplication
and other complexity.
The third option is making the single-pass optimization a completely
separate code path, with most vo_opengl features disabled. While this
does duplicate some functionality (such as "unpacking" the video data
from textures), it's also relatively unintrusive, and the high quality
code path doesn't need to take it into account at all. On another
positive node, this "dumb-mode" could be forced in other cases where
OpenGL 2.1 is not enough, and where we don't want to care about versions
this old.
This change makes vo_opengl slightly less compatible (ancient devices
without FBOs will no longer work) and decreases performance in the
simplest case (vo=opengl), in exchange for significantly reducing code
complexity and making everything easier to reason about.
Can significantly help with very large video resolutions on nvidia
drivers. It doesn't seem to have negative effects on Intel drivers
either. (Although it could have on Intel drivers for older hardware.)
For now, this is only for --vo=opengl-hq. Maybe --vo=opengl should use
it too, but it's still meant to be the crappy, fail-safe default.
This significantly reduces the amount of noticeable flashing when using
tscale kernels with negative lobes, by cutting them off completely.
I'm not sure if this has any negative effects. It needs a bit of
subjective testing over a period of time, so I just made it an option.
Fixes#2155.
This should make interpolation work much better in general, although
there still might be some side effects for unusual framerates (eg. 35 Hz
or 48 Hz). Most of the common framerates are tested and working fine.
(24 Hz, 30 Hz, 60 Hz)
The new code doesn't have support for oversample yet, so it's been
removed (and will most likely be reimplemented in a cleaner way if
there's enough demand). I would recommend using something like robidoux
or mitchell instead of oversample, though - they're much
smoother for the common cases.
They're completely orthogonal concepts, merged in the past due to
convenience and ease of implementing it in the old #ifdef hell renderer.
Especially after the CMS stuff was generalized by 634b4a, this was a
trivial change to implement and also means color management will be much
higher quality when enabled with vo=opengl (which had quantization
issues in the past due to the 8 bit FBO format and upscaling), since it
can be done in a single pass now.
Instead of having separate backends, make use of GLES a flag. This
reduces the number of backends and the resulting annoyances.
Also, nobody cares about using GLES, so there's no backward
compatibility either.
(I have no idea why there are different modes.)
Instead of risking to drop frames too early, give it some margin. Since
there are situations this could deadlock, wait with a timeout. This can
happen if e.g. the API user is refusing to render anything, or if
uninitialization is happening.
Reduces (but likely does not remove) the danger of rounding intermediate
values down to 8 bit. This is important for cscale, or any other
processing that might store raw YUV values in framebuffers.
Fixes#1918.
This now stores caches for multiple ICC profiles, potentially all the
user has ever used. The big use case for this is for users with multiple
monitors. The old logic would mandate recomputing the LUT and discarding
the cache whenever dragging mpv from one screen to another.
This also avoids having to save and check the ICC profile itself, since
the file name already uniquely determines it.
This could help in cases where the DWM (Windows desktop compositor) adds another
layer of bufferring and therefore the SwapBuffers timing could get messed up.
Signed-off-by: wm4 <wm4@nowhere>