1
0
mirror of https://github.com/mpv-player/mpv synced 2025-03-11 08:37:59 +00:00
Commit Graph

315 Commits

Author SHA1 Message Date
wm4
61c8a147b5 vo_opengl: call ra_free() in the correct context
This also fixes a double free in vo_opengl_cb.c.
2017-08-07 19:57:15 +02:00
wm4
47ea771b7a vo_opengl: further GL API use separation
Move multiple GL-specific things from the renderer to other places like
vo_opengl.c, vo_opengl_cb.c, and ra_gl.c.

The vp_w/vp_h parameters to gl_video_resize() make no sense anymore, and
are implicitly part of struct fbodst.

Checking the main framebuffer depth is moved to vo_opengl.c. For
vo_opengl_cb.c it always assumes 8. The API user now has to override
this manually. The previous heuristic didn't make much sense anyway.

The only remaining dependency on GL is the hwdec stuff, which is harder
to change.
2017-08-07 19:17:28 +02:00
wm4
346ac1e09f vo_opengl: simplify mirroring and fix it if glBlitFramebuffer is used
The vp_w/vp_h variables and parameters were not really used anymore
(they were redundant with ra_tex w/h) - but vp_h was still used to
identify whether rendering should be done mirrored.

Simplify this by adding a fbodst struct (some bad naming), which
contains the render target texture, and some parameters how it should be
rendered to (for now only flipping). It would not be appropriate to make
this a member of ra_tex, so it's a separate struct.

Introduces a weird regression for the first frame rendered after
interpolation is toggled at runtime, but seems to work otherwise. This
is possibly due to the change that blit() now mirrors, instead of just
copying. (This is also why ra_fns.blit is changed.)

Fixes #4719.
2017-08-07 16:44:15 +02:00
wm4
41ee66d566 vo_opengl: drop pointless fbotex_init() function 2017-08-07 14:34:18 +02:00
Niklas Haas
9581fbe569 vo_opengl: generalize ra_buf to support other buffer objects
This allows us to integrate PBOs and SSBOs into the same abstraction,
with the potential to easily add UBOs if the need arises.
2017-08-07 12:46:30 +02:00
Niklas Haas
494aa0f651
vo_opengl: only mark frames as fresh if they contain a new image
When using dumb mode, we can actually redraw a frame without uploading
it. Marking this as fresh as well results in unpredictable pass
behavior, which is confusing and makes debugging harder. So mark it as a
redraw instead, in that case.
2017-08-06 02:51:11 +02:00
Niklas Haas
e5748e891f vo_opengl: measure pass_draw_osd as a whole
In the past, this always measured the per-shader execution times of the
individual OSD parts, which was thrown off because the shader was reused
anyway. (And apparently recording the OSD shader execution times was
removed completely, probably because of them being so unrealiably
anyway)

Since ra_timer no longer has the restriction of not allowing timers to
run concurrently, we can just wrap the entire OSD block inside a single
osd_timer now, and record that. (Technically, this can still be off when
using --blend-subtitles=video/yes and showing a full-screen OSD at the
same time. Maybe this can be done better?)
2017-08-06 00:10:20 +02:00
Niklas Haas
f2298f394e vo_opengl: move timers to struct ra
In order to prevent code duplication and keep the ra abstraction as
small as possible, `ra` only implements the actual timer queries,
it does not do pooling/averaging of the results. This is instead moved
to a ra-neutral struct timer_pool in utils.c.
2017-08-06 00:10:20 +02:00
wm4
dddda6e4a5 vo_opengl: move GL state resetting to vo_opengl_cb
This code is pretty much for the sake of vo_opengl_cb API users. It
resets certain state that either the user or our code doesn't reset
correctly. This is somewhat outdated. With GL implicit state being
so awfully large, it seems more reasonable require that any code
restores the default state when returning to the caller. Some
exceptions are defined in opengl_cb.h.
2017-08-05 16:27:09 +02:00
wm4
333cae74ef vo_opengl: move shader handling to ra
Now all GL-specifics of shader compilation are abstracted through ra.
Of course we still have everything hardcoded to GLSL - that isn't going
to change.

Some things will probably change later - in particular, the way we pass
uniforms and textures to the shader. Currently, there is a confusing
mismatch between "primitive" uniforms like floats, and others like
textures.

Also, SSBOs are not abstracted yet.
2017-08-05 16:27:09 +02:00
wm4
f72a33d2cb vo_opengl: organize ra PBO flag slightly differently
Instead of having a mutable ra_tex field (and the only one), move the
flag to struct ra, since we have only 2 tex_upload user calls anyway,
and both want the same PBO behavior. (At first I considered making it
a RA_TEX_UPLOAD_ flag, but why bother. PBOs are a terribly GL-specific
thing, so we can't expect a reasonable abstraction of it anyway.)
2017-08-05 13:48:46 +02:00
wm4
dd096863fa vo_opengl: make OSD code use ra for textures
This requires a silly extension to ra_fns.tex_upload: since the OSD
texture can be much larger than the actual OSD image data to upload, a
mechanism for uploading only to a small part of the texture is needed.
Otherwise, we'd have to realloc/copy the data, just to pad it, and then
pay for uploading the padding too.

The RA_TEX_UPLOAD_DISCARD flag is not interpreted by GL (not sure how
you'd tell GL about this), but it clarifies the API and might be
helpful if we support other backend APIs in the future.
2017-08-05 13:44:30 +02:00
wm4
fa4a1c4675 vo_opengl: always use GL_TRIANGLES for all primitives
Will make the ra layer _slightly_ simpler.
2017-08-05 13:09:05 +02:00
wm4
0206efa94a vo_opengl: pass ra objects during rendering instead of GL objects
Another "small" step towards removing GL dependencies from the renderer.
This commit generally passes ra_tex objects instead of GL FBO integer
IDs to various rendering functions. video.c still manually binds the
FBOs when calling shaders.

This also happens to fix a memory leak with output_fbo.
2017-08-05 13:09:05 +02:00
wm4
a796745fd2 vo_opengl: make fbotex helper use ra
Further work removing GL dependencies from the actual video renderer,
and moving them into ra backends.

Use of glInvalidateFramebuffer() falls away. I'd like to keep this, but
it's better to readd it once shader runs are in ra.
2017-08-05 13:09:05 +02:00
Niklas Haas
fee6b287a5 vo_opengl: support embedded ICC profiles
This currently only works when using lcms-based color management
(--icc-profile-*).

In principle, we could also support using lcms even when the user has
not specified an ICC profile, by generating the profile against a fixed
reference (--target-prim/--target-trc) instead. I still might do that
some day, simply because 3dlut provides a higher quality conversion than
our simple gamut mapping does for stuff like BT.2020, and also because
it's now needed to enable embedded ICC profiles. But that would be a
separate change, so preserve the status quo for now.

(Besides, my opinion is still that you should be using an ICC profile if
you care about colors being accurate _at all_)
2017-08-03 21:48:25 +02:00
Niklas Haas
5e89aed934 vo_opengl: don't precompute texcoord in global scope
Breaks on mesa for whatever reason... even though it doesn't generate a
GLSL shader compiler error

Shouldn't make a performance difference for us because we cache `pos`
anyway, and most compute shaders will probably cache all of their
samples to shmem. Might have to re-visit this when we have an actual use
case for repeated sampling inside CS though. (RAVU + anti-ringing is a
possible candidate for that)
2017-08-03 18:50:07 +02:00
Niklas Haas
83f3910398
vo_opengl: make compute shaders more flexible
This allows users to do their own custom sample writing, mainly meant to
address use cases such as RAVU. Also clean up the compute shader code a
bit.
2017-08-03 18:27:36 +02:00
wm4
ffe0526064 vo_opengl: simplify/fix user shader textures
This broke float textures, which were actually used by some shaders.
There were probably some other bugs as well.

Lots of code can be avoided by using ra_tex_params directly, so do that.

The main change is that COMPONENT/FORMAT are replaced by a single FORMAT
directive, which takes different parameters now. Due to the mess with
16/32 bit float textures, and because we want to support other APIs than
just GL in the future, it's not really clear how this should be handled,
and the nice component/type separation makes things actually harder. So
just jump the gun and use the ra_format.name names, which were
originally meant mostly for debugging. (This is probably something that
will be regretted later.)

Still only superficially tested, but seems to work.

Fixes #4708.
2017-08-03 16:19:49 +02:00
Niklas Haas
5e1e7d32e8
vo_opengl: generalize HDR tone mapping to gamut mapping
Since this code was already written for HDR, and is now per-channel
(because it works better for HDR as well), we can actually reuse this to
get very high quality gamut mapping without clipping. The only required
change is to move the tone mapping from before the gamut map to after
the gamut map. Additonally, we need to also account for changes in the
signal range as a result of applying the CMS when we compute ref_peak,
which is fortunately pretty easy because we only need to consider the
case of primaries mapping to themselves.

Since `HDR` no longer really makes sense as a label, rename it to
`--tone-mapping` in general. Also fits better with
`--tone-mapping-desat` etc.

Arguably we could also rename `--hdr-compute-peak`, but that option is
basically only useful for HDR content anyway because we don't need
information about the signal range for gamut mapping.

This (finally!) gives us reasonably high quality gamut mapping even in
the absence of an ICC profile / 3DLUT.
2017-08-03 12:46:57 +02:00
wm4
53188a14bf vo_opengl: manage user shader textures with ra
Drops some features I guess, no idea if those were needed. Untested due
to lack of test cases.
2017-07-30 11:38:52 +02:00
wm4
5429dbf2a2 vo_opengl: fix dither texture filter
Should be GL_NEAREST, not GL_LINEAR.
2017-07-30 09:43:41 +02:00
wm4
ab1ffa1382 vo_opengl: manage ICC LUT texture via ra
Also move the capability check to gl_video_get_lut3d(), because it
seems more convenient (ra won't have a _CAP_EXT16).
2017-07-29 21:23:31 +02:00
wm4
37b7b32d61 vo_opengl: manage scaler LUT textures via ra
Also fix the RA_CAP_ bitmask nonsense.
2017-07-29 20:15:59 +02:00
wm4
8494fdadae vo_opengl: manage dither texture via ra
Also add some more helpers.

Fix the broken math.h include statement.

utils.c uses ra_gl.h internals, which it shouldn't, and which will be
removed again as soon as this code gets converted to ra fully.
2017-07-29 20:14:48 +02:00
wm4
0f9fcf0ed4 vo_opengl: do not use GL format conversion on texture upload
The dither texture data is created as a float array, but uploaded to a
texture with GL_R16 as internal format. We relied on GL to do the
conversion from float to uint16_t. Not all GL variants even support
this: GLES does not provide this conversion (one of the reasons why this
code has a float16 code path). Also, ra is not going to do this. So just
convert on the fly.

Still keep the float16 texture format fallback, because not all GLES
implementations provide GL_R16.

There is some possibility that we'll need to provide some kind of upload
conversion anyway for float->float16. We still rely on GL doing this
implicitly, and all GL variants support it, but with RA there might be
the need for explicit conversion. Even then, it might be best to reduce
the number of conversion cases. I'll worry about this later.
2017-07-29 20:12:43 +02:00
wm4
6fcc09ff3d vo_opengl: use ra_* for format negotiation too
Format handling via ra_* was added earlier, but the format negotiation
part was forgotten.

Actually move some aspects of it to ra_get_imgfmt_desc(). Also make sure
the unorm and float formats selected by the common format lookup
functions are linear filterable. (For OpenGL, this is implicitly
guaranteed, so it wasn't done before.) Whether these assumptions should
be checked/enforced in the ra code at all is a bit fuzzy, but with ra
being helper code only for the actual video renderer, it's probably
justified.
2017-07-29 20:11:51 +02:00
Niklas Haas
345bb193fe
vo_opengl: support loading custom user textures
Parsing the texture data as raw strings makes the textures the most
portable and self-contained. In order to facilitate different types of
shaders, the parse_user_shader interaction has been changed to instead
have it loop through blocks and call the passed functions for each valid
block parsed. This is more modular and also cleaner, with better code
separation.

Closes #4586.
2017-07-27 23:51:05 +02:00
Niklas Haas
f1af6e53f0 vo_opengl: slightly refactor user_shaders code
- Each struct tex_hook now stores multiple hooks, this allows us to
  avoid the awkward way of the current code has to add the same pass
  multiple times.

- As a consequence, SHADER_MAX_HOOKS was split up into SHADER_MAX_PASSES
  (number of tex_hooks) and SHADER_MAX_HOOKS (number of hooked textures
  per tex_hook), and both numbers decreased correspondingly.

- Instead of having a weird free() callback, we can just leverage
  talloc's recursive free behavior. The only user is the user shaders code
  anyway.
2017-07-27 23:45:17 +02:00
Niklas Haas
e1cc43182c
vo_opengl: fix mpgl_caps bit check
As pointed out by @bjin, this would match if _any_ of the reqs are set.
Need to test for explicit equality.
2017-07-27 00:38:54 +02:00
wm4
81851febc4 vo_opengl: start work on rendering API abstraction
This starts work on moving OpenGL-specific code out of the general
renderer code, so that we can support other other GPU APIs. This is in
a very early stage and it's only a proof of concept. It's unknown
whether this will succeed or result in other backends.

For now, the GL rendering API ("ra") and its only provider (ra_gl) does
texture creation/upload/destruction only. And it's used for the main
video texture only. All other code is still hardcoded to GL.

There is some duplication with ra_format and gl_format handling. In the
end, only the ra variants will be needed (plus the gl_format table of
course). For now, this is simpler, because for some reason lots of hwdec
code still requires the GL variants, and would have to be updated to
use the ra ones.

Currently, the video.c code accesses private ra_gl fields. In the end,
it should not do that of course, and it would not include ra_gl.h.

Probably adds bugs, but you can keep them.
2017-07-26 11:31:43 +02:00
Niklas Haas
5904eddb38
vo_opengl: describe the texture uploading mode
Be a bit more transparent here, which is especially helpful when people
are sending me screenshots of stats pages.
2017-07-26 02:42:23 +02:00
Niklas Haas
b31020b193
vo_opengl: check against shmem limits
The radius check was not strict enough, especially not for all
platforms. To fix this, actually check the hardware capabilities instead
of relying on a hard-coded maximum radius.
2017-07-26 01:54:33 +02:00
Niklas Haas
49a648447f vo_opengl: cosmetic change 2017-07-25 20:14:03 +02:00
Niklas Haas
62de84cbe3
vo_opengl: kill off FBOTEX_COMPUTE again
The textures not having an FBO actually caused regressions when trying
to render the subtitles on top of this texture (--blend-subtitles),
which still relied on an FBO.

So just kill off the logic entirely. Why worry about a single FBO wasted
when we're allocating like 10 anyway.

Fixes #4657.
2017-07-25 06:32:29 +02:00
Niklas Haas
d099e037ef
vo_opengl: fix incoherent SSBO usage
According to the OpenGL spec, atomic access to SSBO variables is *not*
guaranteed to be coherent, even when reusing the same SSBO attached to
the same shader across different frames. So we actually need a
glMemoryBarrier here, at least in theory.
2017-07-25 06:11:57 +02:00
Niklas Haas
cd226bdfd8 vo_opengl: fix incoherent texture usage
This bug slipped past my attention because nvidia ignores memory
barriers, but this is not necessarily always the case. Since
image_load_store is incoherent (specifically, writing to images from
compute shaders is incoherent) we need to insert a memory barrier to
make it coherent again. Since we only care about texture fetches, that's
the only barrier we need.
2017-07-25 05:22:29 +02:00
Niklas Haas
241d5ebc46
vo_opengl: adjust the rules for linearization
Two changes, compounded into one since they affect the same logic:

1. Never use linearization for HDR downscaling
2. Always use linearization for interpolation

Instead of fixing p->use_linear at the beginning of pass_render_frame,
we flip it on "dynamically" as needed. I plan on killing this
p->use_linear frame (along with other per-pass metadata) and moving them
into their own struct for tracking the "current" state of the video, but
that's a separate/upcoming refactor.

As a small bonus, reduce some code duplication in the interpolation
logic.

Fixes #4631
2017-07-24 23:26:15 +02:00
Bin Jin
13ef6bcf6f vo_opengl: enable compute shader for mesa
Mesa 17.1 supports compute shader but not full specs of OpenGL 4.3.
Change the code to detect OpenGL extension "GL_ARB_compute_shader"
rather than OpenGL version 4.3.

HDR peak detection requires SSBO, and polar scaler requires 2D array
extension. Add these extensions as requirement as well.
2017-07-25 04:07:26 +08:00
Niklas Haas
0c84ee01d5
vo_opengl: support user compute shaders
These are identical to regular fragment shader hooks, but with extra
metadata indicating the preferred block size.
2017-07-24 17:19:34 +02:00
Niklas Haas
f338ec4591 vo_opengl: implement compute shader based EWA kernel
This performs almost 50% faster on my machine (!!), from 4650μs down to
about 3176μs for ewa_lanczossharp.

It's possible we could use a similar approach to speed up the separable
scalers, although with vastly simpler code. For separable scalers we'd
also have the additional huge benefit of only needing padding in one
direction, so we could potentially use a big 256x1 kernel or something
to essentially compute an entire row at once.
2017-07-24 17:19:31 +02:00
Niklas Haas
b196cadf9f vo_opengl: support HDR peak detection
This is done via compute shaders. As a consequence, the tone mapping
algorithms had to be rewritten to compute their known constants in GLSL
(ahead of time), instead of doing it once. Didn't affect performance.

Using shmem/SSBO atomics in this way is extremely fast on nvidia, but it
might be slow on other platforms. Needs testing.

Unfortunately, setting up the SSBO still requires OpenGL calls, which
means I can't have it in video_shaders.c, where it belongs. But I'll
defer worrying about that until the backend refactor, since then I'll be
breaking up the video/video_shaders structure anyway.
2017-07-24 17:19:31 +02:00
Niklas Haas
aad6ba018a vo_opengl: support compute shaders
These can either be invoked as dispatch_compute to do a single
computation, or finish_pass_fbo (after setting compute_size_minimum) to
render to a new texture using a compute shader. To make this stuff all
work transparently, we try really, really hard to make compute shaders
as identical to fragment shaders as possible in their behavior.
2017-07-24 17:19:31 +02:00
Niklas Haas
eb54d2ad4d vo_opengl: cut down on FBOTEX_FUZZY abuse
Don't use FBOTEX_FUZZY where the FBO is sized according to
p->texture_w/h, since this changes infrequently (and when it does, we
need to reset everything anyway). No real reason to make this change
other than that it possibly prevents nasty surprises in the future, so I
feel more comfortable about it.
2017-07-24 16:41:38 +02:00
wm4
24dc91907a common, vo_opengl: add/use helper for formatted strings on the stack
Seems like I really like this C99 idiom. No reason not to generalize it
do snprintf(). Introduce mp_tprintf(), which basically this idiom to
snprintf(). This macro looks like it returns a string that was allocated
with alloca() on the caller site, except it's portable C99/C11. (And
unlike alloca(), the result is valid only within block scope.)

Use it in 2 places in the vo_opengl code. But it has the potential to
make a whole bunch of weird looking code look slightly nicer.
2017-07-24 08:12:42 +02:00
wm4
64d56114ed vo_opengl: add direct rendering support
Can be enabled via --vd-lavc-dr=yes. See manpage additions for what it
does.

This reminds of the MPlayer -dr flag, but the implementation is
completely different. It's the same basic concept: letting the decoder
render into a GPU buffer to avoid a copy. Unlike MPlayer, this doesn't
try to go through filters (libavfilter doesn't support this anyway).
Unless a filter can work in-place, DR will be silently disabled. MPlayer
had very complex semantics about buffer types and management (which
apparently nobody ever understood) and weird restrictions that mostly
limited it to mpeg2 style codecs. The mpv code does not do any of this,
and just lets the decoder allocate an arbitrary number of untyped
images. (No MPlayer code was used.)

Parts of the code based on work by atomnuker (starting point for the
generic code) and haasn (some GL definitions, some basic PBO code, and
correct fencing).
2017-07-24 04:32:55 +02:00
wm4
87e9cd13aa vo_opengl: add printf format checking to pass_describe() 2017-07-22 22:08:23 +02:00
wm4
d3dfffdf02 vo_opengl: osd: use new VAO mechanism
In addition to using the new VAO mechanism introduced in the previous
commit, this tries to keep the OSD code self-contained. This doesn't
work all too well (because of the pass and CMS stuff), but it's still
better than before.
2017-07-22 21:55:38 +02:00
wm4
a24df94fed vo_opengl: add mechanism to create/cache VAO on the fly
This removes VAO handling from video.c. Instead the shader cache will
create the VAO as needed. The consequence is that this creates a VAO
per shader, which might be a bit wasteful, but doesn't matter anyway.
2017-07-22 21:51:32 +02:00
wm4
1c81311673 vo_opengl: osd: refactor and simplify
Reduce this to 1 draw call per OSD pass. This removes the need for some
annoying special handling regarding 3D video support (we supported
duplicating the OSD/subtitles for side-by-side 3D output etc.).

Remove the unneeded texture sampler uniform thing.
2017-07-22 21:46:59 +02:00