RepoMirrors/mpv - mpv

Commit Graph

Author	SHA1	Message	Date
Niklas Haas	a6aab5dfd6	vo_gpu: vulkan: refine queue family selection algorithm This gets confused by e.g. SPARSE_BIT on the TRANSFER_BIT, leading to situations where "more specialized" is ambiguous and the logic breaks down. So to fix it, only compare the subset we care about.	2017-12-25 00:47:53 +01:00
Niklas Haas	2d1769a534	vo_gpu: vulkan: prefer vkCmdCopyImage over vkCmdBlitImage blit() implies scaling, copy() is the equivalent command to use when the formats are compatible (same pixel size) and the rects have the same dimensions.	2017-12-25 00:47:53 +01:00
Niklas Haas	a42b8b1142	vo_gpu: attempt re-using the FBO format for p->output_tex This allows RAs with support for non-opaque FBO formats to use a more appropriate FBO format for the output tex, possibly enabling a more efficient blit operation. This requires distinguishing between real formats (which can be used to create textures) and fake formats (e.g. ra_gl's FBO hack).	2017-12-25 00:47:53 +01:00
Niklas Haas	80540be211	vo_gpu: vulkan: properly depend on the swapchain acquire semaphore This is now associated with the ra_tex directly and used in the correct way, rather than hackily done from submit_frame.	2017-12-25 00:47:53 +01:00
Niklas Haas	b138bdc01c	vo_gpu: vulkan: use correct access flag for present This needs VK_ACCESS_MEMORY_READ_BIT (spec)	2017-12-25 00:47:53 +01:00
Niklas Haas	8b0a111c59	vo_gpu: vulkan: make the swapchain more robust Now handles both VK_ERROR_OUT_OF_DATE_KHR and VK_SUBOPTIMAL_KHR for both vkAcquireNextImageKHR and vkQueuePresentKHR in the correct way.	2017-12-25 00:47:53 +01:00
Niklas Haas	dcda8bd36a	vo_gpu: aggressively prefer async compute On AMD devices, we only get one graphics pipe but several compute pipes which can (in theory) run independently. As such, we should prefer compute shaders over fragment shaders in scenarios where we expect them to be better for parallelism. This is amusingly trivial to do, and actually improves performance even in a single-queue scenario.	2017-12-25 00:47:53 +01:00
Niklas Haas	bded247fb5	vo_gpu: vulkan: support split command pools Instead of using a single primary queue, we generate multiple vk_cmdpools and pick the right one dynamically based on the intent. This has a number of immediate benefits: 1. We can use async texture uploads 2. We can use the DMA engine for buffer updates 3. We can benefit from async compute on AMD GPUs Unfortunately, the major downside is that due to the lack of QF ownership tracking, we need to use CONCURRENT sharing for all resources (buffers and images!). In theory, we could try figuring out a way to get rid of the concurrent sharing for buffers (which is only needed for compute shader UBOs), but even so, the concurrent sharing mode doesn't really seem to have a significant impact over here (nvidia). It's possible that other platforms may disagree. Our deadlock-avoidance strategy is stupidly simple: Just flush the command every time we need to switch queues, and make sure all submission and callbacks happen in FIFO order. This required lifting the cmds_pending and cmds_queued out from vk_cmdpool to mpvk_ctx, and some functions died/got moved as a result, but that's a relatively minor change. On my hardware this is a fairly significant performance boost, mainly due to async transfers. (Nvidia doesn't expose separate compute queues anyway). On AMD, this should be a performance boost as well due to async compute.	2017-12-25 00:47:53 +01:00
Niklas Haas	6186cc79e6	vo_gpu: allow invalidating FBO in renderpass_run This is especially interesting for vulkan since it allows completely skipping the layout transition as part of the renderpass. Unfortunately, that also means it needs to be put into renderpass_params, as opposed to renderpass_run_params (unlike #4777). Closes #4777.	2017-12-25 00:47:53 +01:00
Niklas Haas	fb1c7bde42	vo_gpu: vulkan: properly track image dependencies This uses the new vk_signal mechanism to order all access to textures. This has several advantageS: 1. It allows real synchronization of image access across multiple frames when using multiple queues for parallelism. 2. It allows using events instead of pipeline barriers, which is a finer-grained synchronization primitive that allows for more efficient layout transitions over longer durations. This commit also restructures some of the implicit transition code for renderpasses to be more flexible and correct. (Note: this technically drops the ability to transition the image out of undefined layout when not blending, but that was a bug anyway and needs to be done properly) vo_gpu: vulkan: remove no-longer-true optimization The change to the output_tex format makes this no longer true, and it actually seems to hurt performance now as well. So just don't do it anymore. I also realized it hurts performance when drawing an OSD, so it's probably not a good idea anyway.	2017-12-25 00:47:53 +01:00
Niklas Haas	f2f91cf570	vo_gpu: vulkan: add a vk_signal abstraction This combines VkSemaphores and VkEvents into a common umbrella abstraction which can resolve to either. We aggressively try to prefer VkEvents over VkSemaphores whenever the conditions are met (1. we can unsignal the semaphore, i.e. it comes from the same frame; and 2. it comes from the same queue).	2017-12-25 00:47:53 +01:00
Niklas Haas	5feaaba0fd	vo_gpu: vulkan: refactor command submission Instead of being submitted immediately, commands are appended into an internal submission queue, and the actual submission is done once per frame (at the same time as queue cycling). Again, the benefits are not immediately obvious because nothing benefits from this yet, but it will make more sense for an upcoming vk_signal mechanism. This also cleans up the way the ra_vk submission interacts with the synchronization/callbacks from the ra_vk_ctx. Although currently, the way the dependency is signalled is a bit hacky: normally it would be associated with the ra_tex itself and waited on in the appropriate stage implicitly. But that code is just temporary, so I'm keeping it in there for a better commit order.	2017-12-25 00:47:53 +01:00
Niklas Haas	885497a445	vo_gpu: vulkan: reorganize vk_cmd slightly Instead of associating a single VkSemaphore with every command buffer and allowing the user to ad-hoc wait on it during submission, make the raw semaphores-to-signal array work like the raw semaphores-to-wait-on array. Doesn't really provide a clear benefit yet, but it's required for upcoming modifications.	2017-12-25 00:47:53 +01:00
Niklas Haas	4e34615872	vo_gpu: vulkan: refactor vk_cmdpool 1. No more static arrays (deps / callbacks / queues / cmds) 2. Allows safely recording multiple commands at the same time 3. Uses resources optimally by never over-allocating commands	2017-12-25 00:47:53 +01:00
James Ross-Gowan	9b2dae79b1	vo_gpu: d3d11: add RA caps for ra_d3d11 ra_d3d11 uses the SPIR-V compiler to translate GLSL to SPIR-V, which is then translated to HLSL. This means it always exposes the same GLSL version that the SPIR-V compiler supports (4.50 for shaderc/glslang.) Despite claiming to support GLSL 4.50, some features that are tied to the GLSL version in OpenGL are not supported by ra_d3d11 when targeting legacy Direct3D feature levels. This includes two features that mpv relies on: - Reading from gl_FragCoord in the fragment shader (requires FL 10_0) - textureGather from any texture component (requires FL 11_0) These features have been exposed as new RA caps.	2017-11-07 20:27:13 +11:00
James Ross-Gowan	8020a62953	vo_gpu: export the GLSL format qualifier for ra_format Backported from @haasn's change to libplacebo, except in the current RA, there's nothing to indicate an ra_format can be bound as a storage image, so there's no way to force all of these formats to have a glsl_format. Instead, the layout qualifier will be removed if glsl_format is NULL. This is needed for the upcoming ra_d3d11 backend. In Direct3D 11, while loading float values from unorm images often works as expected, it's technically undefined behaviour, and in Windows 10, it will cause the debug layer to spam the log with error messages. Also, apparently in GLSL, the format name must match the image's format exactly (but in Direct3D, it just has to have the same component type.)	2017-11-07 20:27:13 +11:00
James Ross-Gowan	41dff03f8d	vo_gpu: add namespace query mechanism Backported from @haasn's change to libplacebo. More flexible than the previous "shared \|\| non-shared" distinction. The extra flexibility is needed for Direct3D 11, but it also doesn't hurt code-wise.	2017-11-07 20:27:13 +11:00
wm4	7cfae5adce	vo_gpu: semi-fix --gpu-context/--gpu-api options and help output This was confusing at best. Change it to output the actual choices. (Seems like in the end it's always me who has to clean up other people's bullshit.) Context names were not unique - but they should be, so fix it. The whole point of the original --opengl-backend option was to side-step the tricky auto-detection, so you know exactly what you get. The goal of this commit is to make --gpu-context work the same way. Fix the non-unique names by appending "vk" to the names. Keep in mind that this was not suitable for slecting the "UI" backend anyway, since "x11" would force GLX, whereas people on not-NVIDIA actually want "x11egl". Users trying to use --gpu-context=x11 to force the X11 backend would always end up with GLX, which would at least break VAAPI hardware decoding for them. Basically the idea that this option could select the "UI" type is completely broken - it selects an implementation, which implies a UI. Selecting the UI type This would require a separate mechanism. (Although in theory this separate mechanism could be part of the --gpu-context option - in any case, someone would have to implement it.) To achieve help output that can actually be understood, just duplicate the code. Most of that code is duplicated anyway, and trying to share just the list code with the result of making the output unreadable doesn't make too much sense. If we wanted to save code/effort, we could just remove the help output altogether. --gpu-api has non-unique entries, and it would be nice to group them (e.g. list all OpenGL capable contexts with "opengl"), but C makes this simple idea too much of a pain, so don't do it. Also remove a stray tab from the android entry on the manpage.	2017-10-16 10:57:51 +02:00
Rostislav Pehlivanov	4c7c8daf9c	wayland_common: implement output tracking, cleanups and bugfixes This commit: - Implements output tracking (e.g. monitor plug/unplug) - Creates the surface during registry (no other dependencies) - Queues the callback immediately after surface creation - Cleaner and better event handling (functions return directly) - Better reconfigure handling (resizes reduced to 1 during init) - Don't unnecessarily resize (if dimensions match) Apart from that fixes 2 potential memory leaks (mime type and window title), 2 string ownership issues (output name and make need to be dup'd), fixes some style issues (switches were indented) and finally adds messages when disabling/enabling idle inhibition. The callback setter function was removed in preparation for the commit which will use the frame event cb because it was unnecessary.	2017-10-09 02:23:04 +01:00
Rostislav Pehlivanov	68f9ee7e0b	wayland_common: rewrite from scratch The wayland code was written more than 4 years ago when wayland wasn't even at version 1.0. This commit rewrites everything in a more modern way, switches to using the new xdg v6 shell interface which solves a lot of bugs and makes mpv tiling-friedly, adds support for drag and drop, adds support for touchscreens, adds support for KDE's server decorations protocol, and finally adds support for the new idle-inhibitor protocol. It does not yet use the frame callback as a main rendering loop driver, this will happen with a later commit.	2017-10-03 19:36:02 +01:00
Niklas Haas	f6fd2a05c4	vo_gpu: vulkan: reword comment This is fixed upstream (and we now know it's a driver bug) so reword the comment.	2017-09-29 00:48:39 +02:00
James Ross-Gowan	a65abe2447	vo_gpu: vulkan: add support for Windows	2017-09-28 10:02:22 +10:00
Niklas Haas	868bf4da7d	vo_gpu: vulkan: indent queue family enumeration Consistency	2017-09-27 00:46:20 +02:00
Niklas Haas	5b6b77b8dc	vo_gpu: vulkan: normalize use of Flags and FlagBits FlagBits is just the name of the enum. The actual data type representing a combination of these flags follows the Flags convention. (The relevant difference is that the latter is defined to be uint32_t instead of left implicit) For consistency, use Flags everywhere instead of randomly switching between Flags and FlagBits. Also fix a wrong type name on `stageFlags`, pointed out by @atomnuker	2017-09-27 00:25:18 +02:00
Niklas Haas	0ba6c7d73f	vo_gpu: vulkan: optimize redundant pipeline barriers Using renderpass layout transitions is more optimal and doesn't require a redundant pipeline barrier. Since our render passes are static and don't change throughout the lifetime of a ra_renderpass, we unfortunately don't have much flexibility here - so just hard-code SHADER_READ_ONLY_OPTIMAL as the output format as this will be the most common case. We also can't short-circuit the transition when we need to preserve the framebuffer contents, since that depends on the current layout; so we still use an explicit tex_barrier in this case. (Most optimal for this scenario would be an input attachment anyway)	2017-09-26 23:50:01 +02:00
Niklas Haas	ca85a153b4	vo_gpu: vulkan: add support for push constants Can in theory avoid updating the uniform buffer every frame	2017-09-26 17:25:35 +02:00
Rostislav Pehlivanov	ed345ffc2f	vo_gpu: vulkan: add support for wayland	2017-09-26 17:25:35 +02:00
Niklas Haas	258487370f	vo_gpu: vulkan: generalize SPIR-V compiler In addition to the built-in nvidia compiler, we now also support a backend based on libshaderc. shaderc is sort of like glslang except it has a C API and is available as a dynamic library. The generated SPIR-V is now cached alongside the VkPipeline in the cached_program. We use a special cache header to ensure validity of this cache before passing it blindly to the vulkan implementation, since passing invalid SPIR-V can cause all sorts of nasty things. It's also designed to self-invalidate if the compiler gets better, by offering a catch-all `int compiler_version` that implementations can use as a cache invalidation marker.	2017-09-26 17:25:35 +02:00
Niklas Haas	91f23c7067	vo_gpu: vulkan: initial implementation This time based on ra/vo_gpu. 2017 is the year of the vulkan desktop! Current problems / limitations / improvement opportunities: 1. The swapchain/flipping code violates the vulkan spec, by assuming that the presentation queue will be bounded (in cases where rendering is significantly faster than vsync). But apparently, there's simply no better way to do this right now, to the point where even the stupid cube.c examples from LunarG etc. do it wrong. (cf. https://github.com/KhronosGroup/Vulkan-Docs/issues/370) 2. The memory allocator could be improved. (This is a universal constant) 3. Could explore using push descriptors instead of descriptor sets, especially since we expect to switch descriptors semi-often for some passes (like interpolation). Probably won't make a difference, but the synchronization overhead might be a factor. Who knows. 4. Parallelism across frames / async transfer is not well-defined, we either need to use a better semaphore / command buffer strategy or a resource pooling layer to safely handle cross-frame parallelism. (That said, I gave resource pooling a try and was not happy with the result at all - so I'm still exploring the semaphore strategy) 5. We aggressively use pipeline barriers where events would offer a much more fine-grained synchronization mechanism. As a result of this, we might be suffering from GPU bubbles due to too-short dependencies on objects. (That said, I'm also exploring the use of semaphores as a an ordering tactic which would allow cross-frame time slicing in theory) Some minor changes to the vo_gpu and infrastructure, but nothing consequential. NOTE: For safety, all use of asynchronous commands / multiple command pools is currently disabled completely. There are some left-over relics of this in the code (e.g. the distinction between dev_poll and pool_poll), but that is kept in place mostly because this will be re-extended in the future (vulkan rev 2). The queue count is also currently capped to 1, because of the lack of cross-frame semaphores means we need the implicit synchronization from the same-queue semantics to guarantee a correct result.	2017-09-26 17:25:35 +02:00

29 Commits