RepoMirrors/mpv

mirror of https://github.com/mpv-player/mpv synced 2025-01-10 00:49:32 +00:00

Author	SHA1	Message	Date
Niklas Haas	b93bcce5df	vo_opengl: coalesce intra-plane PBOs Instead of allocating three PBOs and cycling through them, we allocate one PBO that's three times as large, and cycle through the subregion offsets. This results in arguably simpler code and faster initialization performance. Especially for 4K textures, initializing PBOs can take quite some time (e.g. 180ms -> 110ms). For 1080p, it's more like 66ms -> 52ms for me. The alignment to 4096 is completely unnecessary by spec, but we do it anyway just for peace of mind.	2017-07-15 22:11:48 +02:00
Niklas Haas	dd78cc6fe7	vo_opengl: refactor vo performance subsystem This replaces `vo-performance` by `vo-passes`, bringing with it a number of changes and improvements: 1. mpv users can now introspect the vo_opengl passes, which is something that has been requested multiple times. 2. performance data is now measured per-pass, which helps both development and debugging. 3. since adding more passes is cheap, we can now report information for more passes (e.g. the blit pass, and the osd pass). Note: we also switch to nanosecond scale, to be able to measure these passes better. 4. `--user-shaders` authors can now describe their own passes, helping users both identify which user shaders are active at any given time as well as helping shader authors identify performance issues. 5. the timing data per pass is now exported as a full list of samples, so projects like Argon-/mpv-stats can immediately read out all of the samples and render a graph without having to manually poll this option constantly. Due to gl_timer's design being complicated (directly reading performance data would block, so we delay the actual read-back until the next _start command), it's vital not to conflate different passes that might be doing different things from one frame to another. To accomplish this, the actual timers are stored as part of the gl_shader_cache's sc_entry, which makes them unique for that exact shader. Starting and stopping the time measurement is easy to unify with the gl_sc architecture, because the existing API already relies on a "generate, render, reset" flow, so we can just put timer_start and timer_stop in sc_generate and sc_reset, respectively. The ugliest thing about this code is that due to the need to keep pass information relatively stable in between frames, we need to distinguish between "new" and "redrawn" frames, which bloats the code somewhat and also feels hacky and vo_opengl-specific. (But then again, this entire thing is vo_opengl-specific)	2017-07-01 00:58:27 +02:00
wm4	759ac6cc93	vo_opengl: add option for caching shaders on disk Mostly because of ANGLE (sadly). The implementation became unpleasantly big, but at least it's relatively self-contained. I'm not sure to what degree shaders from different drivers are compatible as in whether a driver would randomly misbehave if it's fed a binary created by another driver. The useless binayFormat parameter won't help it, as they can probably easily clash. As usual, OpenGL is pretty shit here.	2017-04-08 16:43:56 +02:00
wm4	71caa0b79b	vo_opengl: remove two unused symbols	2017-04-08 16:43:56 +02:00
wm4	1c0bd59bc2	vo_opengl: use 16 bit textures with angle Regression due to `03fe506`. It accidentally changed the default value if glGetTexLevelParameteriv() is not available, which is the case with ANGLE.	2017-04-03 18:12:42 +02:00
wm4	8fb9cc2534	vo_opengl: read framebuffer depth from actual FBO used for rendering In some cases, such as when using the libmpv opengl-cb API, or with certain vo_opengl backends, the main framebuffer is never accessed. Instead, rendering is done to a FBO that acts as back buffer. This meant an incorrect/broken bit depth could be used for dithering. Change it to read the framebuffer depth lazily on the first render call. Also move the main FBO field out of the GL struct to MPGLContext, because the renderer's init function does not need to access it anymore.	2017-03-20 13:31:28 +01:00
wm4	03fe50651b	vo_opengl: move some init_gl code to utility functions	2017-03-20 13:20:35 +01:00
wm4	03933f3564	vo_opengl: fix some undefined behavior The gl_timer_last_us() function could access samples[-1]. Fix by coercing to unsigned, so the % will put it into index [0,max). The real value returned in this corner case doesn't mean too much, I guess.	2017-03-18 20:22:50 +01:00
wm4	79272e1469	Fix two typos They're unrelated. Sue me.	2017-02-20 08:47:17 +01:00
wm4	09238a9bb5	vo_opengl: don't rely on viewport to contain window dimensions Apparently we don't always set the viewport to window dimensions anymore, e.g. if nothing is actually rendered. This means the viewport can contain old values. The window screenshot code uses the viewport values to guess the default framebuffer dimensions. With --force-window --idle --no-osc (which draws nothing and issues a glClear() command only), taking a screenshot would yield an image with the wrong size and possibly garbage in it. Fix this by explicitly passing the currently known window dimensions. Abusing the values stored in the viewport was questionable anyway.	2016-12-02 15:26:45 +01:00
wm4	30d147687f	vo_opengl: fix OSD with icc-profile after previous commit This happened to break because the texture unit wasn't reset to 0, which some code expects. The OSD code in particular set the OSD texture on the wrong texture unit, with the result that OSD/OSC was not visible.	2016-09-14 22:49:08 +02:00
wm4	88a07c5f53	vo_opengl: dynamically manage texture units A minor cleanup that makes the code simpler, and guarantees that we cleanup the GL state properly at any point. We do this by reusing the uniform caching, and assigning each sampler uniform its own texture unit by incrementing a counter. This has various subtle consequences for the GL driver, which hopefully don't matter. For example, it will bind fewer textures at a time, but also rebind them more often. For some reason we keep TEXUNIT_VIDEO_NUM, because it limits the number of hook passes that can be bound at the same time. OSD rendering is an exception: we do many passes with the same shader, and rebinding the texture each pass. For now, this is handled in an unclean way, and we make the shader cache reserve texture unit 0 for the OSD texture. At a later point, we should allocate that one dynamically too, and just pass the texture unit to the OSD rendering code. Right now I feel like vo_rpi.c (may it rot in hell) is in the way.	2016-09-14 20:46:45 +02:00
wm4	e24ba8fa7f	vo_opengl: require explicit reset on shader cache after rendering The caller now has to call gl_sc_reset(), and _after_ rendering. This way we can unset OpenGL state that was setup for rendering. This affects the shader program, for example. The next commit uses this to automatically manage texture units via the shader cache. vo_rpi.c changes untested.	2016-09-14 20:24:06 +02:00
Niklas Haas	8f1a889f75	vo_opengl: make the number of PBOs tunable Also set the number of PBOs from 2 to 3, which should be better for pipelining. This makes it easier to add more in the future.	2016-09-14 14:07:21 +02:00
wm4	e8cdc22245	vo_opengl: better behavior in GL error corner cases If the shader fails to compile, and assertion could trigger in gl_sc_gen_shader_and_reset() due to the code trying to recreate the shader every time, and re-appending the uniforms every time. Just reset the uniform array to fix this. Some disturbed GL drivers might not return anything for glGetShaderiv() if the GL state got "lost", so initialize variables just for additional robustness.	2016-09-12 20:05:43 +02:00
wm4	b57debe5b3	vo_opengl: use ringbuffer of PBOs This is how PBOs are normally supposed to be used. Unfortunately I can't see an any absolute improvement on nVidia binary drivers and playing 4K material. Compared to the "old" PBO path with 1 buffer, the measured GL time decreases significantly, though.	2016-07-03 16:34:32 +02:00
wm4	823c353faa	vo_opengl: move PBO upload handling to shared code This introduces a gl_pbo_upload_tex() function, which works almost like our gl_upload_tex() glTexSubImage2D() wrapper, except it takes a struct which caches the PBO handles. It also takes the full texture size (to make allocating an ideal buffer size easier), and a parameter to disable PBOs (so that the caller doesn't have to duplicate the gl_upload_tex() call if PBOs are disabled or unavailable). This also removes warnings and fallbacks on PBO failure. We just silently try using PBOs on every frame, and if that fails at some point, revert to normal texture uploads. Probably doesn't matter.	2016-07-03 16:34:32 +02:00
quilloss	24478a8a72	vo_opengl utils: use gl->main_fb when reading window content The main framebuffer is not the default framebuffer for the dxinterop backend. Bind the main framebuffer and use the appropriate attachment when reading the window content. Fix #3284	2016-06-26 13:17:39 +02:00
Bin Jin	3df95ee57a	vo_opengl: remove uniform buffer object routines	2016-06-18 19:16:31 +02:00
wm4	a15181e5df	vo_opengl: do not leak previous FBO when reallocating it WTF of the day.	2016-06-08 17:49:23 +02:00
Niklas Haas	8ceb935bd8	vo_opengl: add time queries To avoid blocking the CPU, we use 8 time objects and rotate through them, only blocking until the last possible moment (before we need access to them on the next iteration through the ring buffer). I tested it out on my machine and 4 query objects were enough to guarantee block-free querying, but the extra margin shouldn't hurt. Frame render times are just output at the end of each frame, via MP_DBG. This might be improved in the future. (In particular, I want to expose these numbers as properties so that users get some more visible feedback about render times) Currently, we measure pass_render_frame and pass_draw_to_screen separately because the former might be called multiple times due to interpolation. Doing it this way gives more faithful numbers. Same goes for frame upload times.	2016-06-07 12:16:15 +02:00
James Ross-Gowan	84fba1df21	vo_opengl: enable color management on GLES This requires the GL_EXT_texture_norm16 extension and works in ANGLE. A default precision had to be set for sampler3Ds, otherwise the shaders would fail to compile.	2016-05-27 23:02:26 +10:00
wm4	c4707cdee6	vo_opengl: fix other minor namespace issues See previous commit.	2016-05-23 21:27:18 +02:00
wm4	e76aa7e8db	vo_opengl: rename glUploadTex, drop unused parameter Rename it to get out of OpenGL's namespace. The gl_ prefix is used by other mpv functions, but no OpenGL ones. The "slice" parameter was never actually used, and all callers passed 0 for it.	2016-05-23 21:27:18 +02:00
wm4	66079048ea	vo_opengl: unify PBO and normal OSD texture upload path The main change is actually that e first copy to a "staging" memory frame, and then upload this at once. The old non-PBO code called glTexsubImage2D for each OSD sub-bitmap. The new non-PBO code path is a bit faster now if there are many small sub-bitmaps (on Linux/nVidia). It's also a bit simpler, so this is a win. (Although I don't particularly appreciate the mixed normal/PBO texture code.)	2016-05-23 21:27:18 +02:00
wm4	cc72a4e8c3	vo_opengl: support framebuffer invalidation Not sure how much can be gained with this, as we can't use it properly yet. For now, this is used only before rendering, which probably does overwhelmingly nothing. In the future, this should be used after temporary passes, which could possibly reduce memory usage and even memory bandwidth usage, depending on the drivers.	2016-05-23 21:27:18 +02:00
wm4	b0d3c2ede7	vo_opengl: make gl_sc_enable_extension() permanent/idempotent No reason not to, and makes the following commit slightly simpler. In fact, this makes the shaders more correct too. Normally, "#extension" must come before any normal shader text, including the "precision" directive. Not sure why this worked before. (Probably didn't.)	2016-05-19 12:01:59 +02:00
wm4	a2d58d9986	vo_opengl: move UT_buffer to switch handling No reason to make it a special case.	2016-05-17 10:48:05 +02:00
wm4	f00040b9fc	vo_opengl: make number of cached shaders/uniform dynamic Use dynamic memory allocation, as the static allocation is starting to get annoying. Currently, SC_MAX_ENTRIES is essentially still a static upper limit on the number of shaders. But in future we could try a more clever cache replacement strategy, which does not keep stale entries forever if the maximum happens not to be reached.	2016-05-17 10:45:01 +02:00
wm4	09763bef83	vo_opengl: move cached uniforms to a separate struct	2016-05-17 10:33:45 +02:00
Niklas Haas	8262a4dd52	vo_opengl: increase shader limits The new uniforms introduced by `362015c` have exceeded the uniform limit when using high-radius tscale. In addition, the SC limit of 32 entries might be pushing it with user shaders. Just make these value a bigger to delay the onset of this same failure mode. Maybe in the future it should be reworked to grow dynamically? Either way, we can always predict a static upper bound on the number of uniforms and shader cache entries, it's just that we forgot to do so. Fixes #3151	2016-05-17 10:09:01 +02:00
Niklas Haas	dfc7b59909	vo_opengl: make the screen blue on shader errors This helps visually signify that somthing went wrong, and prevents confusing shader compilation errors with other types of bugs.	2016-05-15 20:42:02 +02:00
Niklas Haas	7c3d78fd82	vo_opengl: support external user hooks This allows users to add their own near-arbitrary hooks to the vo_opengl processing pipeline, greatly enhancing the flexibility of user shaders. This enables, among other things, user shaders such as CrossBilateral, SuperRes, LumaSharpen and many more. To make parsing the user shaders easier, shaders are now loaded as bstrs, and the hooks are set up during video reconfig instead of on every single frame.	2016-05-15 20:42:02 +02:00
wm4	449b948ee8	vo_opengl: remove some pointless compatibility Remove non-texture_rg compatibility from LUT sampling. OpenGL without texture_rg support will always trigger dumb-mode, and dumb-mode does not use LUTs. It used not to, and that was when this made sense.	2016-05-14 12:02:02 +02:00
wm4	84ccebd9b9	vo_opengl: reorganize texture format handling This merges all knowledge about texture format into a central table. Most of the work done here is actually identifying which formats exactly are supported by OpenGL(ES) under which circumstances, and keeping this information in the format table in a somewhat declarative way. (Although only to the extend needed by mpv.) In particular, ES and float formats are a horrible mess. Again this is a big refactor that might cause regression on "obscure" configurations.	2016-05-12 21:22:28 +02:00
wm4	d4712af5af	vo_opengl: angle: dump translated shaders Helpful for debugging and such.	2016-05-12 11:17:49 +02:00
wm4	a3d416c3d3	vo_opengl: d3d11egl: native NV12 sampling support This uses EGL_ANGLE_stream_producer_d3d_texture_nv12 and related extensions to map the D3D textures coming from the hardware decoder directly in GL. In theory this would be trivial to achieve, but unfortunately ANGLE does not have a mechanism to "import" D3D textures as GL textures. Instead, an awkward mechanism via EGL_KHR_stream was implemented, which involves at least 5 extensions and a lot of glue code. (Even worse than VAAPI EGL interop, and very far from the simplicity you get on OSX.) The ANGLE mechanism so far supports only the NV12 texture format, which means 10 bit won't work. It also does not work in ES3 mode yet. For these reasons, the "old" ID3D11VideoProcessor code is kept and used as a fallback.	2016-05-10 21:06:34 +02:00
wm4	bd98d9e232	vo_opengl: slightly compress gl_set_debug_logger() No functional changes.	2016-03-28 18:07:41 +02:00
wm4	b8b2a465d1	vo_opengl: reduce temporary variables in gl_transform_trans() Using a single gl_transform variable instead of many float ones makes it easier to see what it's doing. No functional change.	2016-03-28 18:07:18 +02:00
wm4	e5b5cc2a2f	vo_opengl: fix row-major vs. column-major confusion gl_transform_vec() assumed column-major, while everything else seemed to assumed row-major memory organization for gl_transform.m. Also, gl_transform_trans() seems to contain additional confusion. This didn't matter until now, as everything has been orthogonal, this the swapped matrix entries were always 0.	2016-03-28 16:16:09 +02:00
wm4	a76f3e8e46	vo_opengl: minor coding style adjustment	2016-03-24 21:23:22 +01:00
wm4	da015d9d00	vo_opengl: utils: some more minor shader string building optimization Instead of reallocating almost all of the shader string several times per pass, build it into a fixed buffer that will be reallocated as needed. While this still uses a linear search and full comparison of the shader text, this will compare the shader's string length first before doing a full comparison as a nice side effect. (That's also why the fragment shader is compared first - it's more likely to be different for different cache entries than the vertex shader stub.)	2016-03-24 21:22:10 +01:00
wm4	30e94fa711	vo_opengl: utils: slightly optimize shader string building Use bstr as appending buffer, which should avoid frequent reallocation and copying. The previous commit should help with this a little.	2016-03-23 22:03:53 +01:00
wm4	0e1e4005fb	vo_opengl: use the same type for cached and current uniform values Slightly improvement over the previous commit.	2016-03-10 22:42:20 +01:00
igv	5199c2ee3a	vo_opengl: cache the values of the uniform variables	2016-03-10 22:40:14 +01:00
igv	95ca308d44	vo_opengl: cache the locations of the uniform variables	2016-03-09 22:57:05 +01:00
Niklas Haas	93546f0c2f	vo_opengl: refactor pass_read_video and texture binding This is a pretty major rewrite of the internal texture binding mechanic, which makes it more flexible. In general, the difference between the old and current approaches is that now, all texture description is held in a struct img_tex and only explicitly bound with pass_bind. (Once bound, a texture unit is assumed to be set in stone and no longer tied to the img_tex) This approach makes the code inside pass_read_video significantly more flexible and cuts down on the number of weird special cases and spaghetti logic. It also has some improvements, e.g. cutting down greatly on the number of unnecessary conversion passes inside pass_read_video (which was previously mostly done to cope with the fact that the alternative would have resulted in a combinatorial explosion of code complexity). Some other notable changes (and potential improvements): - texture expansion is now always handled in pass_read_video, and the colormatrix never does this anymore. (Which means the code could probably be removed from the colormatrix generation logic, modulo some other VOs) - struct fbo_tex now stores both its "physical" and "logical" (configured) size, which cuts down on the amount of width/height baggage on some function calls - vo_opengl can now technically support textures with different bit depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries inside img_format.c doesn't export this (nor does ffmpeg support it, really) so the status quo of using the same tex_mul for all planes is kept. - dumb_mode is now only needed because of the indirect_fbo being in the main rendering pipeline. If we reintroduce p->use_indirect and thread a transform through the entire program this could be skipped where unnecessary, allowing for the removal of dumb_mode. But I'm not sure how to do this in a clean way. (Which is part of why it got introduced to begin with) - It would be trivial to resurrect source-shader now (it would just be one extra 'if' inside pass_read_video).	2016-03-05 13:08:38 +01:00
Niklas Haas	2f562825e0	vo_opengl: declare vec4 color inside fragment shader stub Why was this done so stupidly, with so many complicated special cases, before? Declare it once so the shader bits don't have to figure out where and when to do so themselves.	2016-02-23 20:58:15 +01:00
wm4	f8bb24184b	vo_opengl: add precision qualifier to usampler2D on ANGLE GLES requires this. Some more common sampler types have default precisions, but not usampler2D. Newer ANGLE builds verify this more strictly than older builds, so this wasn't caught before. Fixes #2761.	2016-01-27 21:07:57 +01:00
wm4	7b6e3772ab	vo_opengl: support 10 bit support with ANGLE GLES does not support high bit depth fixed point textures for unknown reasons, so direct 10 bit input is not possible. But we can still use integer textures, which are supported by GLES 3.0. These store integer data just like the standard fixed point textures, except they are not normalized on sampling. They also don't support bilinear filtering, and require a special sampler ("usampler2D"). While these texture formats enable us to shuffle the data to the GPU, they're rather impractical with the requirements mentioned above and our current architecture. One problem is that most code assumes it can always use bilinear scaling (even if bilinear is never used when using appropriate scale/cscale options). Another is that we don't have any concept of running a function on a texture in an uniform way. So for now, run a simple conversion step through a FBO. The FBO will use the rgba16f format normally, which gives enough bits for 10 bit, and will at least gracefully degrade with higher depth input. This is bound to be much slower than a more "direct" method, but at least it works and is simple to implement. The odd change of function call order in init_video() is to properly disable "dumb mode" (no FBO use) if these texture formats are in use.	2016-01-26 21:35:23 +01:00

1 2

61 Commits