mpv/video/out/vulkan/ra_vk.h

#pragma once

#include "video/out/gpu/ra.h"

#include "common.h"
#include "utils.h"

struct ra *ra_create_vk(struct mpvk_ctx *vk, struct mp_log *log);

// Access to the VkDevice is needed for swapchain creation
VkDevice ra_vk_get_dev(struct ra *ra);

// Allocates a ra_tex that wraps a swapchain image. The contents of the image
// will be invalidated, and access to it will only be internally synchronized.
// So the calling could should not do anything else with the VkImage.
struct ra_tex *ra_vk_wrap_swapchain_img(struct ra *ra, VkImage vkimg,
                                        VkSwapchainCreateInfoKHR info);

// Associates an external semaphore (dependency) with a ra_tex, such that this
// ra_tex will not be used by the ra_vk until the external semaphore fires.
void ra_tex_vk_external_dep(struct ra *ra, struct ra_tex *tex, VkSemaphore dep);

// This function finalizes rendering, transitions `tex` (which must be a
// wrapped swapchain image) into a format suitable for presentation, and returns
// the resulting command buffer (or NULL on error). The caller may add their
// own semaphores to this command buffer, and must submit it afterwards.
struct vk_cmd *ra_vk_submit(struct ra *ra, struct ra_tex *tex);

// May be called on a struct ra of any type. Returns NULL if the ra is not
// a vulkan ra.
struct mpvk_ctx *ra_vk_get(struct ra *ra);
vo_gpu: vulkan: initial implementation This time based on ra/vo_gpu. 2017 is the year of the vulkan desktop! Current problems / limitations / improvement opportunities: 1. The swapchain/flipping code violates the vulkan spec, by assuming that the presentation queue will be bounded (in cases where rendering is significantly faster than vsync). But apparently, there's simply no better way to do this right now, to the point where even the stupid cube.c examples from LunarG etc. do it wrong. (cf. https://github.com/KhronosGroup/Vulkan-Docs/issues/370) 2. The memory allocator could be improved. (This is a universal constant) 3. Could explore using push descriptors instead of descriptor sets, especially since we expect to switch descriptors semi-often for some passes (like interpolation). Probably won't make a difference, but the synchronization overhead might be a factor. Who knows. 4. Parallelism across frames / async transfer is not well-defined, we either need to use a better semaphore / command buffer strategy or a resource pooling layer to safely handle cross-frame parallelism. (That said, I gave resource pooling a try and was not happy with the result at all - so I'm still exploring the semaphore strategy) 5. We aggressively use pipeline barriers where events would offer a much more fine-grained synchronization mechanism. As a result of this, we might be suffering from GPU bubbles due to too-short dependencies on objects. (That said, I'm also exploring the use of semaphores as a an ordering tactic which would allow cross-frame time slicing in theory) Some minor changes to the vo_gpu and infrastructure, but nothing consequential. NOTE: For safety, all use of asynchronous commands / multiple command pools is currently disabled completely. There are some left-over relics of this in the code (e.g. the distinction between dev_poll and pool_poll), but that is kept in place mostly because this will be re-extended in the future (vulkan rev 2). The queue count is also currently capped to 1, because of the lack of cross-frame semaphores means we need the implicit synchronization from the same-queue semantics to guarantee a correct result. 2016-09-14 18:54:18 +00:00			`#pragma once`

			`#include "video/out/gpu/ra.h"`

			`#include "common.h"`
			`#include "utils.h"`

			`struct ra ra_create_vk(struct mpvk_ctx vk, struct mp_log *log);`

			`// Access to the VkDevice is needed for swapchain creation`
			`VkDevice ra_vk_get_dev(struct ra *ra);`

			`// Allocates a ra_tex that wraps a swapchain image. The contents of the image`
			`// will be invalidated, and access to it will only be internally synchronized.`
			`// So the calling could should not do anything else with the VkImage.`
			`struct ra_tex ra_vk_wrap_swapchain_img(struct ra ra, VkImage vkimg,`
			`VkSwapchainCreateInfoKHR info);`

vo_gpu: vulkan: properly depend on the swapchain acquire semaphore This is now associated with the ra_tex directly and used in the correct way, rather than hackily done from submit_frame. 2017-09-29 12:15:13 +00:00			`// Associates an external semaphore (dependency) with a ra_tex, such that this`
			`// ra_tex will not be used by the ra_vk until the external semaphore fires.`
			`void ra_tex_vk_external_dep(struct ra ra, struct ra_tex tex, VkSemaphore dep);`

vo_gpu: vulkan: refactor command submission Instead of being submitted immediately, commands are appended into an internal submission queue, and the actual submission is done once per frame (at the same time as queue cycling). Again, the benefits are not immediately obvious because nothing benefits from this yet, but it will make more sense for an upcoming vk_signal mechanism. This also cleans up the way the ra_vk submission interacts with the synchronization/callbacks from the ra_vk_ctx. Although currently, the way the dependency is signalled is a bit hacky: normally it would be associated with the ra_tex itself and waited on in the appropriate stage implicitly. But that code is just temporary, so I'm keeping it in there for a better commit order. 2017-09-28 21:06:56 +00:00			// This function finalizes rendering, transitions `tex` (which must be a
			`// wrapped swapchain image) into a format suitable for presentation, and returns`
			`// the resulting command buffer (or NULL on error). The caller may add their`
			`// own semaphores to this command buffer, and must submit it afterwards.`
			`struct vk_cmd ra_vk_submit(struct ra ra, struct ra_tex *tex);`
vo_gpu: vulkan: initial implementation This time based on ra/vo_gpu. 2017 is the year of the vulkan desktop! Current problems / limitations / improvement opportunities: 1. The swapchain/flipping code violates the vulkan spec, by assuming that the presentation queue will be bounded (in cases where rendering is significantly faster than vsync). But apparently, there's simply no better way to do this right now, to the point where even the stupid cube.c examples from LunarG etc. do it wrong. (cf. https://github.com/KhronosGroup/Vulkan-Docs/issues/370) 2. The memory allocator could be improved. (This is a universal constant) 3. Could explore using push descriptors instead of descriptor sets, especially since we expect to switch descriptors semi-often for some passes (like interpolation). Probably won't make a difference, but the synchronization overhead might be a factor. Who knows. 4. Parallelism across frames / async transfer is not well-defined, we either need to use a better semaphore / command buffer strategy or a resource pooling layer to safely handle cross-frame parallelism. (That said, I gave resource pooling a try and was not happy with the result at all - so I'm still exploring the semaphore strategy) 5. We aggressively use pipeline barriers where events would offer a much more fine-grained synchronization mechanism. As a result of this, we might be suffering from GPU bubbles due to too-short dependencies on objects. (That said, I'm also exploring the use of semaphores as a an ordering tactic which would allow cross-frame time slicing in theory) Some minor changes to the vo_gpu and infrastructure, but nothing consequential. NOTE: For safety, all use of asynchronous commands / multiple command pools is currently disabled completely. There are some left-over relics of this in the code (e.g. the distinction between dev_poll and pool_poll), but that is kept in place mostly because this will be re-extended in the future (vulkan rev 2). The queue count is also currently capped to 1, because of the lack of cross-frame semaphores means we need the implicit synchronization from the same-queue semantics to guarantee a correct result. 2016-09-14 18:54:18 +00:00
			`// May be called on a struct ra of any type. Returns NULL if the ra is not`
			`// a vulkan ra.`
			`struct mpvk_ctx ra_vk_get(struct ra ra);`