RepoMirrors/mpv - mpv

Commit Graph

Author	SHA1	Message	Date
Niklas Haas	96cdf5315e	vo_gpu: vulkan: print libplacebo API ver This normally gets printed by libplacebo itself when initializing the context, but due to the way our code is structured (for convenience) we don't have the log hook enabled by the time this function call is relevant. So instead just print it manually as an easier work-around than restructuring the code.	2020-07-16 09:41:09 +02:00
Niklas Haas	7006d6752d	vo_gpu: vulkan: use libplacebo instead This commit rips out the entire mpv vulkan implementation in favor of exposing lightweight wrappers on top of libplacebo instead, which provides much of the same except in a more up-to-date and polished form. This (finally) unifies the code base between mpv and libplacebo, which is something I've been hoping to do for a long time. Note: The ra_pl wrappers are abstract enough from the actual libplacebo device type that we can in theory re-use them for other devices like d3d11 or even opengl in the future, so I moved them to a separate directory for the time being. However, the rest of the code is still vulkan-specific, so I've kept the "vulkan" naming and file paths, rather than introducing a new `--gpu-api` type. (Which would have ended up with significantly more code duplication.) Plus, the code and functionality is similar enough that for most users this should just be a straight-up drop-in replacement. Note: This commit excludes some changes; specifically, the updates to context_win and hwdec_cuda are deferred to separate commits for authorship reasons.	2019-04-21 23:55:22 +03:00
Niklas Haas	34df6bd82f	vo_gpu: vulkan: only rotate the queues on swap Makes performance slightly better when using multiple queues by avoiding unnecessary semaphores due to bad queue selection. Also remove an aeons-old workaround for an nvidia bug that only ever existed in the earliest beta vulkan drivers anyway.	2018-11-19 00:22:41 +02:00
Philip Langdale	c8a065df12	vo_gpu: vulkan: Always use KHR suffix types and defines I was inconsistent about this originally, as the functionality was moved into the core spec in 1.1 and so both suffixed and unsuffixed versions of everything exist and can be mixed together. There's no reason to fail to build with 1.0.39+ so I'm fixing the names.	2018-11-03 23:53:08 +02:00
Philip Langdale	621389134a	vo_gpu: vulkan: Add a function to get the device UUID We need this to do device matching for the cuda interop.	2018-10-22 21:35:48 +02:00
Philip Langdale	93f800a00f	vo_gpu: vulkan: Add support for exporting buffer memory The CUDA/Vulkan interop works on the basis of memory being exported from Vulkan and then imported by CUDA. To enable this, we add a way to declare a buffer as being intended for export, and then add a function to do the export. For now, we support the fd and Handle based exports on Linux and Windows respectively. There are others, which we can support when a need arises. Also note that this is just for exporting buffers, rather than textures (VkImages). Image import on the CUDA side is supposed to work, but it is currently buggy and waiting for a new driver release. Finally, at least with my nvidia hardware and drivers, everything seems to work even if we don't initialise the buffer with the right exportability options. Nevertheless I'm enforcing it so that we're following the spec.	2018-10-22 21:35:48 +02:00
Niklas Haas	5997248505	vo_gpu: vulkan: correctly enable textureGatherOffset This also requires a vulkan feature / SPIR-V capability to function	2018-02-05 02:49:03 -08:00
Niklas Haas	f92e45bb8c	vo_gpu: vulkan: try enabling required features Instead of enabling every feature under the sun, make an effort to just whitelist the ones we actually might use. Turns out the extended storage format support is needed for some of the storage formats we use, in particular rgba16.	2018-02-05 02:49:03 -08:00
Niklas Haas	d588bdaaf7	vo_gpu: vulkan: fix segfault due to index mismatch The queue family index and the queue info index are not necessarily the same, so we're forced to do a check based on the queue family index itself. Fixes #5049	2017-12-25 00:47:53 +01:00
Niklas Haas	286d421666	vo_gpu: vulkan: allow disabling async tf/comp Async compute in particular seems to cause problems on some drivers, and even when supported the benefits are not that massive from the tests I have seen, so it's probably safe to keep off by default. Async transfer on the other hand seems to work better and offers a more substantial improvement, so it's kept on.	2017-12-25 00:47:53 +01:00
Niklas Haas	a6aab5dfd6	vo_gpu: vulkan: refine queue family selection algorithm This gets confused by e.g. SPARSE_BIT on the TRANSFER_BIT, leading to situations where "more specialized" is ambiguous and the logic breaks down. So to fix it, only compare the subset we care about.	2017-12-25 00:47:53 +01:00
Niklas Haas	bded247fb5	vo_gpu: vulkan: support split command pools Instead of using a single primary queue, we generate multiple vk_cmdpools and pick the right one dynamically based on the intent. This has a number of immediate benefits: 1. We can use async texture uploads 2. We can use the DMA engine for buffer updates 3. We can benefit from async compute on AMD GPUs Unfortunately, the major downside is that due to the lack of QF ownership tracking, we need to use CONCURRENT sharing for all resources (buffers and images!). In theory, we could try figuring out a way to get rid of the concurrent sharing for buffers (which is only needed for compute shader UBOs), but even so, the concurrent sharing mode doesn't really seem to have a significant impact over here (nvidia). It's possible that other platforms may disagree. Our deadlock-avoidance strategy is stupidly simple: Just flush the command every time we need to switch queues, and make sure all submission and callbacks happen in FIFO order. This required lifting the cmds_pending and cmds_queued out from vk_cmdpool to mpvk_ctx, and some functions died/got moved as a result, but that's a relatively minor change. On my hardware this is a fairly significant performance boost, mainly due to async transfers. (Nvidia doesn't expose separate compute queues anyway). On AMD, this should be a performance boost as well due to async compute.	2017-12-25 00:47:53 +01:00
Niklas Haas	f2f91cf570	vo_gpu: vulkan: add a vk_signal abstraction This combines VkSemaphores and VkEvents into a common umbrella abstraction which can resolve to either. We aggressively try to prefer VkEvents over VkSemaphores whenever the conditions are met (1. we can unsignal the semaphore, i.e. it comes from the same frame; and 2. it comes from the same queue).	2017-12-25 00:47:53 +01:00
Niklas Haas	5feaaba0fd	vo_gpu: vulkan: refactor command submission Instead of being submitted immediately, commands are appended into an internal submission queue, and the actual submission is done once per frame (at the same time as queue cycling). Again, the benefits are not immediately obvious because nothing benefits from this yet, but it will make more sense for an upcoming vk_signal mechanism. This also cleans up the way the ra_vk submission interacts with the synchronization/callbacks from the ra_vk_ctx. Although currently, the way the dependency is signalled is a bit hacky: normally it would be associated with the ra_tex itself and waited on in the appropriate stage implicitly. But that code is just temporary, so I'm keeping it in there for a better commit order.	2017-12-25 00:47:53 +01:00
Niklas Haas	885497a445	vo_gpu: vulkan: reorganize vk_cmd slightly Instead of associating a single VkSemaphore with every command buffer and allowing the user to ad-hoc wait on it during submission, make the raw semaphores-to-signal array work like the raw semaphores-to-wait-on array. Doesn't really provide a clear benefit yet, but it's required for upcoming modifications.	2017-12-25 00:47:53 +01:00
Niklas Haas	4e34615872	vo_gpu: vulkan: refactor vk_cmdpool 1. No more static arrays (deps / callbacks / queues / cmds) 2. Allows safely recording multiple commands at the same time 3. Uses resources optimally by never over-allocating commands	2017-12-25 00:47:53 +01:00
Niklas Haas	868bf4da7d	vo_gpu: vulkan: indent queue family enumeration Consistency	2017-09-27 00:46:20 +02:00
Niklas Haas	5b6b77b8dc	vo_gpu: vulkan: normalize use of Flags and FlagBits FlagBits is just the name of the enum. The actual data type representing a combination of these flags follows the Flags convention. (The relevant difference is that the latter is defined to be uint32_t instead of left implicit) For consistency, use Flags everywhere instead of randomly switching between Flags and FlagBits. Also fix a wrong type name on `stageFlags`, pointed out by @atomnuker	2017-09-27 00:25:18 +02:00
Rostislav Pehlivanov	ed345ffc2f	vo_gpu: vulkan: add support for wayland	2017-09-26 17:25:35 +02:00
Niklas Haas	258487370f	vo_gpu: vulkan: generalize SPIR-V compiler In addition to the built-in nvidia compiler, we now also support a backend based on libshaderc. shaderc is sort of like glslang except it has a C API and is available as a dynamic library. The generated SPIR-V is now cached alongside the VkPipeline in the cached_program. We use a special cache header to ensure validity of this cache before passing it blindly to the vulkan implementation, since passing invalid SPIR-V can cause all sorts of nasty things. It's also designed to self-invalidate if the compiler gets better, by offering a catch-all `int compiler_version` that implementations can use as a cache invalidation marker.	2017-09-26 17:25:35 +02:00
Niklas Haas	91f23c7067	vo_gpu: vulkan: initial implementation This time based on ra/vo_gpu. 2017 is the year of the vulkan desktop! Current problems / limitations / improvement opportunities: 1. The swapchain/flipping code violates the vulkan spec, by assuming that the presentation queue will be bounded (in cases where rendering is significantly faster than vsync). But apparently, there's simply no better way to do this right now, to the point where even the stupid cube.c examples from LunarG etc. do it wrong. (cf. https://github.com/KhronosGroup/Vulkan-Docs/issues/370) 2. The memory allocator could be improved. (This is a universal constant) 3. Could explore using push descriptors instead of descriptor sets, especially since we expect to switch descriptors semi-often for some passes (like interpolation). Probably won't make a difference, but the synchronization overhead might be a factor. Who knows. 4. Parallelism across frames / async transfer is not well-defined, we either need to use a better semaphore / command buffer strategy or a resource pooling layer to safely handle cross-frame parallelism. (That said, I gave resource pooling a try and was not happy with the result at all - so I'm still exploring the semaphore strategy) 5. We aggressively use pipeline barriers where events would offer a much more fine-grained synchronization mechanism. As a result of this, we might be suffering from GPU bubbles due to too-short dependencies on objects. (That said, I'm also exploring the use of semaphores as an ordering tactic which would allow cross-frame time slicing in theory) Some minor changes to the vo_gpu and infrastructure, but nothing consequential. NOTE: For safety, all use of asynchronous commands / multiple command pools is currently disabled completely. There are some left-over relics of this in the code (e.g. the distinction between dev_poll and pool_poll), but that is kept in place mostly because this will be re-extended in the future (vulkan rev 2). The queue count is also currently capped to 1, because the lack of cross-frame semaphores means we need the implicit synchronization from the same-queue semantics to guarantee a correct result.	2017-09-26 17:25:35 +02:00
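
The queue family selection fix in a6aab5dfd6 ("only compare the subset we care about") can be illustrated with a minimal sketch. This is not the code from that commit: the `QF_*` constants are local stand-ins whose values mirror the Vulkan `VK_QUEUE_*_BIT` flags, and `pick_queue_family` and `popcount32` are hypothetical names. The sketch assumes "more specialized" means "fewer capability bits set", and shows how masking out unrelated bits such as the sparse binding bit removes the ambiguity the commit describes.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the Vulkan queue capability bits (assumption: values
 * mirror VK_QUEUE_GRAPHICS_BIT, VK_QUEUE_COMPUTE_BIT, VK_QUEUE_TRANSFER_BIT
 * and VK_QUEUE_SPARSE_BINDING_BIT). */
enum {
    QF_GRAPHICS = 0x1,
    QF_COMPUTE  = 0x2,
    QF_TRANSFER = 0x4,
    QF_SPARSE   = 0x8,
};

/* Only these bits take part in the "how specialized is it" comparison. */
#define QF_RELEVANT (QF_GRAPHICS | QF_COMPUTE | QF_TRANSFER)

/* Count the set bits in v. */
static int popcount32(uint32_t v)
{
    int n = 0;
    while (v) {
        n += (int)(v & 1u);
        v >>= 1;
    }
    return n;
}

/* Hypothetical helper: return the index of the most specialized family that
 * supports every bit in `wanted`, or -1 if none does. Ties keep the first
 * match. */
int pick_queue_family(const uint32_t *families, size_t count, uint32_t wanted)
{
    int best = -1;
    int best_bits = 0;

    wanted &= QF_RELEVANT;
    for (size_t i = 0; i < count; i++) {
        /* Drop bits such as QF_SPARSE before comparing. */
        uint32_t caps = families[i] & QF_RELEVANT;
        if ((caps & wanted) != wanted)
            continue;
        int bits = popcount32(caps);
        if (best < 0 || bits < best_bits) {
            best = (int)i;
            best_bits = bits;
        }
    }
    return best;
}
```

With families `{graphics|compute|transfer|sparse, compute|transfer, transfer|sparse}`, a transfer request selects the third family once the sparse bit is masked out; without the mask, the second and third families would both count two bits and the choice would depend on enumeration order.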

21 Commits
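
Commit 258487370f describes a cache header that is checked before cached SPIR-V is handed to the Vulkan implementation, including an `int compiler_version` field used as an invalidation marker. The following is a rough sketch of that idea only. The header layout, the `CACHE_MAGIC` value, and the `cache_is_valid` function are assumptions made for illustration and do not reproduce mpv's actual cache format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-byte magic identifying the cache format. */
#define CACHE_MAGIC "SKETCH01"

/* Hypothetical header layout; only compiler_version is named in the commit. */
struct cache_header {
    char     magic[8];          /* format identifier */
    int32_t  compiler_version;  /* bumped when the compiler output changes */
    uint64_t payload_len;       /* number of bytes following the header */
};

/* Return true only if the blob starts with a well-formed header that matches
 * the currently running compiler version and the blob's actual size. */
bool cache_is_valid(const void *blob, size_t size, int32_t current_version)
{
    struct cache_header hdr;

    if (size < sizeof(hdr))
        return false;
    /* Copy out the header to avoid unaligned access into the blob. */
    memcpy(&hdr, blob, sizeof(hdr));

    if (memcmp(hdr.magic, CACHE_MAGIC, sizeof(hdr.magic)) != 0)
        return false;
    if (hdr.compiler_version != current_version)
        return false;   /* compiler changed: treat the cache as stale */
    if (hdr.payload_len != size - sizeof(hdr))
        return false;   /* truncated or padded blob */
    return true;
}
```

A caller would discard the blob and recompile whenever the check fails, which is what lets a newer compiler version invalidate old entries without any explicit cache clearing.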