mpv/video/out/vulkan/context.h

#pragma once

#include "video/out/gpu/context.h"
#include "common.h"

struct ra_vk_ctx_params {
    // See ra_swapchain_fns.get_vsync.
    void (*get_vsync)(struct ra_ctx *ctx, struct vo_vsync_info *info);

    // In case something special needs to be done on the buffer swap.
    void (*swap_buffers)(struct ra_ctx *ctx);
};

// Helpers for ra_ctx based on ra_vk. These initialize ctx->ra and ctx->swchain.
void ra_vk_ctx_uninit(struct ra_ctx *ctx);
bool ra_vk_ctx_init(struct ra_ctx *ctx, struct mpvk_ctx *vk,
                    struct ra_vk_ctx_params params,
                    VkPresentModeKHR preferred_mode);

// Handles a resize request, and updates ctx->vo->dwidth/dheight
bool ra_vk_ctx_resize(struct ra_ctx *ctx, int width, int height);

// May be called on a ra_ctx of any type.
struct mpvk_ctx *ra_vk_ctx_get(struct ra_ctx *ctx);
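
The params struct above is a small table of backend hooks. The following is a minimal sketch of that pattern only, not mpv's implementation: `demo_ctx`, `demo_vsync_info`, `demo_ctx_params` and `demo_present` are hypothetical stand-ins, and treating a NULL hook as "not provided" is an assumption of the sketch rather than something the header states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real struct ra_ctx and struct vo_vsync_info
 * are defined elsewhere in mpv and are not reproduced here. */
struct demo_ctx {
    int swaps;                  /* counts swap_buffers hook invocations */
};

struct demo_vsync_info {
    long long display_time;     /* made-up field for illustration */
};

/* Same shape as ra_vk_ctx_params: two hooks supplied by a backend. */
struct demo_ctx_params {
    void (*get_vsync)(struct demo_ctx *ctx, struct demo_vsync_info *info);
    void (*swap_buffers)(struct demo_ctx *ctx);
};

/* Example hook implementations a windowing backend might provide. */
static void demo_swap_buffers(struct demo_ctx *ctx)
{
    ctx->swaps++;
}

static void demo_get_vsync(struct demo_ctx *ctx, struct demo_vsync_info *info)
{
    info->display_time = 1000LL * ctx->swaps;
}

/* Generic code: call each hook only if the backend supplied it.
 * Returns true if vsync information was filled in. */
static bool demo_present(struct demo_ctx *ctx,
                         const struct demo_ctx_params *params,
                         struct demo_vsync_info *info)
{
    assert(ctx && params && info);
    if (params->swap_buffers)
        params->swap_buffers(ctx);
    if (!params->get_vsync)
        return false;           /* no timing data available */
    params->get_vsync(ctx, info);
    return true;
}
```

A backend would fill a `demo_ctx_params` with its own functions and pass it by value at init time, much as `ra_vk_ctx_init` takes `struct ra_vk_ctx_params params`.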

vo_gpu: vulkan: initial implementation

This time based on ra/vo_gpu. 2017 is the year of the vulkan desktop!

Current problems / limitations / improvement opportunities:

1. The swapchain/flipping code violates the vulkan spec, by assuming that the presentation queue will be bounded (in cases where rendering is significantly faster than vsync). But apparently, there's simply no better way to do this right now, to the point where even the stupid cube.c examples from LunarG etc. do it wrong. (cf. https://github.com/KhronosGroup/Vulkan-Docs/issues/370)

2. The memory allocator could be improved. (This is a universal constant)

3. Could explore using push descriptors instead of descriptor sets, especially since we expect to switch descriptors semi-often for some passes (like interpolation). Probably won't make a difference, but the synchronization overhead might be a factor. Who knows.

4. Parallelism across frames / async transfer is not well-defined; we either need to use a better semaphore / command buffer strategy or a resource pooling layer to safely handle cross-frame parallelism. (That said, I gave resource pooling a try and was not happy with the result at all - so I'm still exploring the semaphore strategy)

5. We aggressively use pipeline barriers where events would offer a much more fine-grained synchronization mechanism. As a result of this, we might be suffering from GPU bubbles due to too-short dependencies on objects. (That said, I'm also exploring the use of semaphores as an ordering tactic which would allow cross-frame time slicing in theory)

Some minor changes to the vo_gpu and infrastructure, but nothing consequential.

NOTE: For safety, all use of asynchronous commands / multiple command pools is currently disabled completely. There are some left-over relics of this in the code (e.g. the distinction between dev_poll and pool_poll), but that is kept in place mostly because this will be re-extended in the future (vulkan rev 2).

The queue count is also currently capped to 1, because the lack of cross-frame semaphores means we need the implicit synchronization from the same-queue semantics to guarantee a correct result.

2016-09-14 18:54:18 +00:00

wayland: use callback flag + poll for buffer swap

The old way of using wayland in mpv relied on an external renderloop for semi-accurate timings. This had multiple issues though. Display sync would break whenever the window was hidden (since the frame callback stopped being executed) which was really annoying. Also the entire external renderloop logic was kind of fragile and didn't play well with mpv's internal structure (i.e. using presentation time in that old paradigm breaks stats.lua).

Basically the problem is that swap buffers blocks on wayland, which is crap whenever you hide the mpv window since it locks up the entire player. So you have to make swap buffers not block, but this has a different problem. Timings will be terrible if you use the unblocked swap buffers call.

Based on some discussion in #wayland, the trick here is relatively simple and works well enough for our purposes. Instead we basically build a way to block with a timeout in the wayland buffer swap functions. A bool is set in the frame callback function that indicates whether or not mpv is waiting for a frame to be displayed. In the actual buffer swap function, we enter into a while loop waiting for this flag to be set. At the same time, the wl_display is polled to block the thread and wake up if it receives any events from the compositor. This loop only breaks if enough time has passed or if the frame callback bool is received.

In the near future, it is better to set whether or not a frame has been displayed in the presentation feedback. However as a first pass, doing it in the frame callback is more than good enough.

The "downside" is that we render frames that aren't actually shown on screen when the player is hidden (it seems like wayland people don't like that). But who cares. Accurate timings are way more important. It's probably not too hard to add that behavior back in the player though.

2019-10-07 20:58:36 +00:00
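
The "block with a timeout" loop described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: it uses a plain file descriptor and POSIX `poll()` in place of the wl_display, and `demo_swap_state`, `demo_dispatch` and `demo_wait_frame` are hypothetical names. Reading one byte stands in for dispatching a compositor event that fires the frame callback.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical state: fd stands in for the display connection, and
 * frame_done for the bool toggled by the frame callback. */
struct demo_swap_state {
    int fd;
    bool frame_done;
};

static long long demo_now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Stand-in for event dispatch: one readable byte counts as the callback. */
static void demo_dispatch(struct demo_swap_state *s)
{
    char byte;
    if (read(s->fd, &byte, 1) == 1)
        s->frame_done = true;
}

/* Block until the flag is set or timeout_ms has elapsed.
 * Returns true if the flag was seen, false on timeout or error. */
static bool demo_wait_frame(struct demo_swap_state *s, int timeout_ms)
{
    assert(s && timeout_ms >= 0);
    long long deadline = demo_now_ms() + timeout_ms;
    while (!s->frame_done) {
        long long remaining = deadline - demo_now_ms();
        if (remaining <= 0)
            break;                      /* enough time has passed */
        struct pollfd pfd = { .fd = s->fd, .events = POLLIN };
        int r = poll(&pfd, 1, (int)remaining);
        if (r > 0) {
            if (pfd.revents & POLLIN)
                demo_dispatch(s);       /* may set frame_done */
            else
                break;                  /* error or hangup on the fd */
        }
    }
    bool done = s->frame_done;
    s->frame_done = false;              /* re-arm for the next swap */
    return done;
}
```

Because the wait is bounded, a hidden window that never delivers the callback only delays the caller by the timeout instead of blocking it indefinitely.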

wayland: add presentation time

Use ust/msc/refresh values from wayland's presentation time in mpv's ra_swapchain_fns.get_vsync for the wayland contexts.

2019-10-10 19:14:40 +00:00

vo_gpu: vulkan: use libplacebo instead

This commit rips out the entire mpv vulkan implementation in favor of exposing lightweight wrappers on top of libplacebo instead, which provides much of the same except in a more up-to-date and polished form.

This (finally) unifies the code base between mpv and libplacebo, which is something I've been hoping to do for a long time.

Note: The ra_pl wrappers are abstract enough from the actual libplacebo device type that we can in theory re-use them for other devices like d3d11 or even opengl in the future, so I moved them to a separate directory for the time being. However, the rest of the code is still vulkan-specific, so I've kept the "vulkan" naming and file paths, rather than introducing a new `--gpu-api` type. (Which would have ended up with significantly more code duplication)

Plus, the code and functionality is similar enough that for most users this should just be a straight-up drop-in replacement.

Note: This commit excludes some changes; specifically, the updates to context_win and hwdec_cuda are deferred to separate commits for authorship reasons.

2018-11-10 11:53:33 +00:00

vo_gpu: vulkan: generalize SPIR-V compiler

In addition to the built-in nvidia compiler, we now also support a backend based on libshaderc. shaderc is sort of like glslang except it has a C API and is available as a dynamic library.

The generated SPIR-V is now cached alongside the VkPipeline in the cached_program. We use a special cache header to ensure validity of this cache before passing it blindly to the vulkan implementation, since passing invalid SPIR-V can cause all sorts of nasty things. It's also designed to self-invalidate if the compiler gets better, by offering a catch-all `int compiler_version` that implementations can use as a cache invalidation marker.

2017-09-13 01:09:48 +00:00
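
The idea of a cache header that is validated before the payload is trusted can be sketched as follows. The layout is entirely hypothetical: `demo_cache_header`, `DEMO_CACHE_MAGIC` and `demo_cache_valid` are made up for illustration and do not reflect the actual header format used by the commit. Only the `compiler_version` field name comes from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_CACHE_MAGIC "DEMO_SPV"     /* hypothetical 8-byte tag */

/* Hypothetical header placed in front of the cached SPIR-V bytes. */
struct demo_cache_header {
    char magic[8];
    int compiler_version;   /* bumped by a backend to invalidate old entries */
    size_t spirv_len;       /* payload size in bytes after the header */
};

/* Returns true only if the blob looks like a cache entry written by the
 * same compiler version and its declared payload fits inside the blob. */
static bool demo_cache_valid(const void *blob, size_t blob_len,
                             int current_version)
{
    struct demo_cache_header hdr;
    assert(sizeof(hdr.magic) == sizeof(DEMO_CACHE_MAGIC) - 1);
    if (!blob || blob_len < sizeof(hdr))
        return false;
    memcpy(&hdr, blob, sizeof(hdr));    /* copy out to avoid unaligned reads */
    if (memcmp(hdr.magic, DEMO_CACHE_MAGIC, sizeof(hdr.magic)) != 0)
        return false;
    if (hdr.compiler_version != current_version)
        return false;                   /* compiler changed: treat as stale */
    if (hdr.spirv_len > blob_len - sizeof(hdr))
        return false;                   /* truncated or corrupt entry */
    return hdr.spirv_len % 4 == 0;      /* SPIR-V is a stream of 32-bit words */
}
```

A caller would recompile the shader whenever the check fails, so a stale or corrupt entry costs one compile rather than handing bad SPIR-V to the driver.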