Commit Graph

34 Commits

Author SHA1 Message Date
wm4 1c81311673 vo_opengl: osd: refactor and simplify
Reduce this to 1 draw call per OSD pass. This removes the need for some
annoying special handling regarding 3D video support (we supported
duplicating the OSD/subtitles for side-by-side 3D output etc.).

Remove the unneeded texture sampler uniform thing.
2017-07-22 21:46:59 +02:00
Niklas Haas b93bcce5df
vo_opengl: coalesce intra-plane PBOs
Instead of allocating three PBOs and cycling through them, we allocate
one PBO that's three times as large, and cycle through the subregion
offsets.

This results in arguably simpler code and faster initialization
performance. Especially for 4K textures, initializing PBOs can take
quite some time (e.g. 180ms -> 110ms). For 1080p, it's more like 66ms ->
52ms for me.

The alignment to 4096 is completely unnecessary by spec, but we do it
anyway just for peace of mind.
2017-07-15 22:11:48 +02:00
Niklas Haas dd78cc6fe7 vo_opengl: refactor vo performance subsystem
This replaces `vo-performance` by `vo-passes`, bringing with it a number
of changes and improvements:

1. mpv users can now introspect the vo_opengl passes, which is something
   that has been requested multiple times.

2. performance data is now measured per-pass, which helps both
   development and debugging.

3. since adding more passes is cheap, we can now report information for
   more passes (e.g. the blit pass, and the osd pass). Note: we also
   switch to nanosecond scale, to be able to measure these passes
   better.

4. `--user-shaders` authors can now describe their own passes, helping
   users both identify which user shaders are active at any given time
   as well as helping shader authors identify performance issues.

5. the timing data per pass is now exported as a full list of samples,
   so projects like Argon-/mpv-stats can immediately read out all of the
   samples and render a graph without having to manually poll this
   option constantly.

Due to gl_timer's design being complicated (directly reading performance
data would block, so we delay the actual read-back until the next _start
command), it's vital not to conflate different passes that might be
doing different things from one frame to another. To accomplish this,
the actual timers are stored as part of the gl_shader_cache's sc_entry,
which makes them unique for that exact shader.

Starting and stopping the time measurement is easy to unify with the
gl_sc architecture, because the existing API already relies on a
"generate, render, reset" flow, so we can just put timer_start and
timer_stop in sc_generate and sc_reset, respectively.

The ugliest thing about this code is that due to the need to keep pass
information relatively stable in between frames, we need to distinguish
between "new" and "redrawn" frames, which bloats the code somewhat and
also feels hacky and vo_opengl-specific. (But then again, this entire
thing is vo_opengl-specific)
2017-07-01 00:58:27 +02:00
wm4 759ac6cc93 vo_opengl: add option for caching shaders on disk
Mostly because of ANGLE (sadly).

The implementation became unpleasantly big, but at least it's relatively
self-contained.

I'm not sure to what degree shaders from different drivers are
compatible as in whether a driver would randomly misbehave if it's fed
a binary created by another driver. The useless binayFormat parameter
won't help it, as they can probably easily clash. As usual, OpenGL is
pretty shit here.
2017-04-08 16:43:56 +02:00
wm4 71caa0b79b vo_opengl: remove two unused symbols 2017-04-08 16:43:56 +02:00
wm4 8fb9cc2534 vo_opengl: read framebuffer depth from actual FBO used for rendering
In some cases, such as when using the libmpv opengl-cb API, or with
certain vo_opengl backends, the main framebuffer is never accessed.
Instead, rendering is done to a FBO that acts as back buffer. This meant
an incorrect/broken bit depth could be used for dithering.

Change it to read the framebuffer depth lazily on the first render call.

Also move the main FBO field out of the GL struct to MPGLContext,
because the renderer's init function does not need to access it anymore.
2017-03-20 13:31:28 +01:00
wm4 03fe50651b vo_opengl: move some init_gl code to utility functions 2017-03-20 13:20:35 +01:00
wm4 09238a9bb5 vo_opengl: don't rely on viewport to contain window dimensions
Apparently we don't always set the viewport to window dimensions
anymore, e.g. if nothing is actually rendered. This means the viewport
can contain old values.

The window screenshot code uses the viewport values to guess the default
framebuffer dimensions. With --force-window --idle --no-osc (which draws
nothing and issues a glClear() command only), taking a screenshot would
yield an image with the wrong size and possibly garbage in it. Fix this
by explicitly passing the currently known window dimensions. Abusing the
values stored in the viewport was questionable anyway.
2016-12-02 15:26:45 +01:00
wm4 88a07c5f53 vo_opengl: dynamically manage texture units
A minor cleanup that makes the code simpler, and guarantees that we
cleanup the GL state properly at any point.

We do this by reusing the uniform caching, and assigning each sampler
uniform its own texture unit by incrementing a counter. This has various
subtle consequences for the GL driver, which hopefully don't matter. For
example, it will bind fewer textures at a time, but also rebind them
more often.

For some reason we keep TEXUNIT_VIDEO_NUM, because it limits the number
of hook passes that can be bound at the same time.

OSD rendering is an exception: we do many passes with the same shader,
and rebinding the texture each pass. For now, this is handled in an
unclean way, and we make the shader cache reserve texture unit 0 for the
OSD texture. At a later point, we should allocate that one dynamically
too, and just pass the texture unit to the OSD rendering code. Right now
I feel like vo_rpi.c (may it rot in hell) is in the way.
2016-09-14 20:46:45 +02:00
wm4 e24ba8fa7f vo_opengl: require explicit reset on shader cache after rendering
The caller now has to call gl_sc_reset(), and _after_ rendering. This
way we can unset OpenGL state that was setup for rendering. This affects
the shader program, for example. The next commit uses this to
automatically manage texture units via the shader cache.

vo_rpi.c changes untested.
2016-09-14 20:24:06 +02:00
Niklas Haas 8f1a889f75 vo_opengl: make the number of PBOs tunable
Also set the number of PBOs from 2 to 3, which should be better for
pipelining. This makes it easier to add more in the future.
2016-09-14 14:07:21 +02:00
wm4 b57debe5b3 vo_opengl: use ringbuffer of PBOs
This is how PBOs are normally supposed to be used.

Unfortunately I can't see an any absolute improvement on nVidia binary
drivers and playing 4K material. Compared to the "old" PBO path with 1
buffer, the measured GL time decreases significantly, though.
2016-07-03 16:34:32 +02:00
wm4 823c353faa vo_opengl: move PBO upload handling to shared code
This introduces a gl_pbo_upload_tex() function, which works almost like
our gl_upload_tex() glTexSubImage2D() wrapper, except it takes a struct
which caches the PBO handles. It also takes the full texture size (to
make allocating an ideal buffer size easier), and a parameter to disable
PBOs (so that the caller doesn't have to duplicate the gl_upload_tex()
call if PBOs are disabled or unavailable).

This also removes warnings and fallbacks on PBO failure. We just
silently try using PBOs on every frame, and if that fails at some point,
revert to normal texture uploads. Probably doesn't matter.
2016-07-03 16:34:32 +02:00
Bin Jin 3df95ee57a vo_opengl: remove uniform buffer object routines 2016-06-18 19:16:31 +02:00
Niklas Haas 8ceb935bd8 vo_opengl: add time queries
To avoid blocking the CPU, we use 8 time objects and rotate through
them, only blocking until the last possible moment (before we need
access to them on the next iteration through the ring buffer). I tested
it out on my machine and 4 query objects were enough to guarantee
block-free querying, but the extra margin shouldn't hurt.

Frame render times are just output at the end of each frame, via MP_DBG.
This might be improved in the future. (In particular, I want to expose
these numbers as properties so that users get some more visible feedback
about render times)

Currently, we measure pass_render_frame and pass_draw_to_screen
separately because the former might be called multiple times due to
interpolation. Doing it this way gives more faithful numbers. Same goes
for frame upload times.
2016-06-07 12:16:15 +02:00
wm4 c4707cdee6 vo_opengl: fix other minor namespace issues
See previous commit.
2016-05-23 21:27:18 +02:00
wm4 e76aa7e8db vo_opengl: rename glUploadTex, drop unused parameter
Rename it to get out of OpenGL's namespace. The gl_ prefix is used by
other mpv functions, but no OpenGL ones.

The "slice" parameter was never actually used, and all callers passed 0
for it.
2016-05-23 21:27:18 +02:00
wm4 66079048ea vo_opengl: unify PBO and normal OSD texture upload path
The main change is actually that e first copy to a "staging" memory
frame, and then upload this at once. The old non-PBO code called
glTexsubImage2D for each OSD sub-bitmap.

The new non-PBO code path is a bit faster now if there are many small
sub-bitmaps (on Linux/nVidia). It's also a bit simpler, so this is a
win.

(Although I don't particularly appreciate the mixed normal/PBO texture
code.)
2016-05-23 21:27:18 +02:00
wm4 cc72a4e8c3 vo_opengl: support framebuffer invalidation
Not sure how much can be gained with this, as we can't use it properly
yet. For now, this is used only before rendering, which probably does
overwhelmingly nothing.

In the future, this should be used after temporary passes, which could
possibly reduce memory usage and even memory bandwidth usage, depending
on the drivers.
2016-05-23 21:27:18 +02:00
Niklas Haas dfc7b59909 vo_opengl: make the screen blue on shader errors
This helps visually signify that somthing went wrong, and prevents
confusing shader compilation errors with other types of bugs.
2016-05-15 20:42:02 +02:00
Niklas Haas 7c3d78fd82 vo_opengl: support external user hooks
This allows users to add their own near-arbitrary hooks to the vo_opengl
processing pipeline, greatly enhancing the flexibility of user shaders.
This enables, among other things, user shaders such as CrossBilateral,
SuperRes, LumaSharpen and many more.

To make parsing the user shaders easier, shaders are now loaded as
bstrs, and the hooks are set up during video reconfig instead of on
every single frame.
2016-05-15 20:42:02 +02:00
Niklas Haas 070edd7300 vo_opengl: add hooks and rework pass_read_video
The hook mechanism allows arbitrary processing stages to get dispatched
whenever certain named textures have been "finalized" by the code.

This is mostly meant to serve as a change that opens up the internal
processing in pass_read_video to user scripts, but as a side benefit all
of the code dealing with offsets and plane alignment and other such
confusing things has been rewritten.

This hook mechanism is powerful enough to cover the needs of both
debanding and prescaling (and more), so as a result they can be removed
from pass_read_video entirely and implemented through hooks.

Some avenues for optimization:

- The prescale hook is currently somewhat distributed code-wise. It might be
  cleaner to split it into superxbr and NNEDI3 hooks which can each be
  self-contained.

- It might be possible to move a large part of the hook code out to an
  external file (including the hook definitions for debanding and
  prescaling), which would be very much desired.

- Currently, some stages (chroma merging, integer conversion) will
  *always* run even if unnecessary. I'm planning another series of
  refactors (deferred img_tex) to allow dropping unnecessary shader
  stages like these, but that's probably some ways away. In the meantime
  it would be doable to re-add some of the logic to skip these stages if
  we know we don't need them.

- More hook locations could be added (?)
2016-05-15 20:42:02 +02:00
wm4 84ccebd9b9 vo_opengl: reorganize texture format handling
This merges all knowledge about texture format into a central table.

Most of the work done here is actually identifying which formats exactly
are supported by OpenGL(ES) under which circumstances, and keeping this
information in the format table in a somewhat declarative way. (Although
only to the extend needed by mpv.) In particular, ES and float formats
are a horrible mess.

Again this is a big refactor that might cause regression on "obscure"
configurations.
2016-05-12 21:22:28 +02:00
wm4 e5b5cc2a2f vo_opengl: fix row-major vs. column-major confusion
gl_transform_vec() assumed column-major, while everything else seemed to
assumed row-major memory organization for gl_transform.m. Also,
gl_transform_trans() seems to contain additional confusion.

This didn't matter until now, as everything has been orthogonal, this
the swapped matrix entries were always 0.
2016-03-28 16:16:09 +02:00
Niklas Haas 93546f0c2f vo_opengl: refactor pass_read_video and texture binding
This is a pretty major rewrite of the internal texture binding
mechanic, which makes it more flexible.

In general, the difference between the old and current approaches is
that now, all texture description is held in a struct img_tex and only
explicitly bound with pass_bind. (Once bound, a texture unit is assumed
to be set in stone and no longer tied to the img_tex)

This approach makes the code inside pass_read_video significantly more
flexible and cuts down on the number of weird special cases and
spaghetti logic.

It also has some improvements, e.g. cutting down greatly on the number
of unnecessary conversion passes inside pass_read_video (which was
previously mostly done to cope with the fact that the alternative would
have resulted in a combinatorial explosion of code complexity).

Some other notable changes (and potential improvements):

- texture expansion is now *always* handled in pass_read_video, and the
  colormatrix never does this anymore. (Which means the code could
  probably be removed from the colormatrix generation logic, modulo some
  other VOs)

- struct fbo_tex now stores both its "physical" and "logical"
  (configured) size, which cuts down on the amount of width/height
  baggage on some function calls

- vo_opengl can now technically support textures with different bit
  depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries
  inside img_format.c doesn't export this (nor does ffmpeg support it,
  really) so the status quo of using the same tex_mul for all planes is
  kept.

- dumb_mode is now only needed because of the indirect_fbo being in the
  main rendering pipeline. If we reintroduce p->use_indirect and thread
  a transform through the entire program this could be skipped where
  unnecessary, allowing for the removal of dumb_mode. But I'm not sure
  how to do this in a clean way. (Which is part of why it got introduced
  to begin with)

- It would be trivial to resurrect source-shader now (it would just be
  one extra 'if' inside pass_read_video).
2016-03-05 13:08:38 +01:00
wm4 7b6e3772ab vo_opengl: support 10 bit support with ANGLE
GLES does not support high bit depth fixed point textures for unknown
reasons, so direct 10 bit input is not possible. But we can still use
integer textures, which are supported by GLES 3.0. These store integer
data just like the standard fixed point textures, except they are not
normalized on sampling. They also don't support bilinear filtering, and
require a special sampler ("usampler2D").

While these texture formats enable us to shuffle the data to the GPU,
they're rather impractical with the requirements mentioned above and our
current architecture. One problem is that most code assumes it can
always use bilinear scaling (even if bilinear is never used when using
appropriate scale/cscale options). Another is that we don't have any
concept of running a function on a texture in an uniform way.

So for now, run a simple conversion step through a FBO. The FBO will use
the rgba16f format normally, which gives enough bits for 10 bit, and
will at least gracefully degrade with higher depth input.

This is bound to be much slower than a more "direct" method, but at
least it works and is simple to implement.

The odd change of function call order in init_video() is to properly
disable "dumb mode" (no FBO use) if these texture formats are in use.
2016-01-26 21:35:23 +01:00
wm4 e4ec0f42e4 Change GPL/LGPL dual-licensed files to LGPL
Do this to make the license situation less confusing.

This change should be of no consequence, since LGPL is compatible with
GPL anyway, and making it LGPL-only does not restrict the use with GPL
code.

Additionally, the wording implies that this is allowed, and that we can
just remove the GPL part.
2016-01-19 18:36:34 +01:00
wm4 eeb5f98758 vo_opengl: handle GL_ARB_uniform_buffer_object with low GLSL versions
Why is this stupid crap being so much a pain for no reason.
2015-11-09 16:24:01 +01:00
Bin Jin 27dc834f37 vo_opengl: implement NNEDI3 prescaler
Implement NNEDI3, a neural network based deinterlacer.

The shader is reimplemented in GLSL and supports both 8x4 and 8x6
sampling window now. This allows the shader to be licensed
under LGPL2.1 so that it can be used in mpv.

The current implementation supports uploading the NN weights (up to
51kb with placebo setting) in two different way, via uniform buffer
object or hard coding into shader source. UBO requires OpenGL 3.1,
which only guarantee 16kb per block. But I find that 64kb seems to be
a default setting for recent card/driver (which nnedi3 is targeting),
so I think we're fine here (with default nnedi3 setting the size of
weights is 9kb). Hard-coding into shader requires OpenGL 3.3, for the
"intBitsToFloat()" built-in function. This is necessary to precisely
represent these weights in GLSL. I tried several human readable
floating point number format (with really high precision as for
single precision float), but for some reason they are not working
nicely, bad pixels (with NaN value) could be produced with some
weights set.

We could also add support to upload these weights with texture, just
for compatibility reason (etc. upscaling a still image with a low end
graphics card). But as I tested, it's rather slow even with 1D
texture (we probably had to use 2D texture due to dimension size
limitation). Since there is always better choice to do NNEDI3
upscaling for still image (vapoursynth plugin), it's not implemented
in this commit. If this turns out to be a popular demand from the
user, it should be easy to add it later.

For those who wants to optimize the performance a bit further, the
bottleneck seems to be:
1. overhead to upload and access these weights, (in particular,
   the shader code will be regenerated for each frame, it's on CPU
   though).
2. "dot()" performance in the main loop.
3. "exp()" performance in the main loop, there are various fast
   implementation with some bit tricks (probably with the help of the
   intBitsToFloat function).

The code is tested with nvidia card and driver (355.11), on Linux.

Closes #2230
2015-11-05 17:38:20 +01:00
Bin Jin 4c43c30421 vo_opengl: add Super-xBR filter for upscaling
Add the Super-xBR filter for image doubling, and the prescaling framework
to support it.

The shader code was ported from MPDN extensions project, with
modification to process luma only.

This commit is largely inspired by code from #2266, with
`gl_transform_trans()` authored by @haasn taken directly.
2015-11-05 17:38:20 +01:00
wm4 17cd6798a6 vo_opengl: move shader file caching to video.c
It's just about loading and cachign small files, not does not
necessarily have anything to do with shaders. Move it to video.c where
it's used.
2015-09-23 22:13:03 +02:00
wm4 e2139488ff vo_opengl: move sampler type mapping to a function 2015-09-10 20:52:50 +02:00
Niklas Haas 97363e176d vo_opengl: implement debanding (and remove source-shader)
The removal of source-shader is a side effect, since this effectively
replaces it - and the video-reading code has been significantly
restructured to make more sense and be more readable.

This means users no longer have to constantly download and maintain a
separate deband.glsl installation alongside mpv, which was the only real
use case for source-shader that we found either way.
2015-09-09 19:19:23 +02:00
Niklas Haas 44eda2177d vo_opengl: remove gl_ prefixes from files in video/out/opengl
This is a bit redundant with the name of the directory itself, and not
in line with existing naming conventions.
2015-09-09 18:09:31 +02:00