vo_gpu: vulkan: initial implementation
This time based on ra/vo_gpu. 2017 is the year of the vulkan desktop!
Current problems / limitations / improvement opportunities:
1. The swapchain/flipping code violates the vulkan spec by assuming
that the presentation queue will be bounded (in cases where rendering
is significantly faster than vsync). But apparently there's simply
no better way to do this right now, to the point where even the
stupid cube.c examples from LunarG etc. do it wrong.
(cf. https://github.com/KhronosGroup/Vulkan-Docs/issues/370; a hedged
sketch of the usual frames-in-flight workaround follows this message)
2. The memory allocator could be improved. (This is a universal
constant.)
3. Could explore using push descriptors instead of descriptor sets,
especially since we expect to switch descriptors semi-often for some
passes (like interpolation). Probably won't make a difference, but
the synchronization overhead might be a factor. Who knows.
4. Parallelism across frames / async transfer is not well-defined; we
either need a better semaphore / command buffer strategy or a
resource pooling layer to safely handle cross-frame parallelism.
(That said, I gave resource pooling a try and was not happy with the
result at all, so I'm still exploring the semaphore strategy.)
5. We aggressively use pipeline barriers where events would offer a much
more fine-grained synchronization mechanism. As a result, we might be
suffering from GPU bubbles due to too-short dependencies on objects.
(That said, I'm also exploring the use of semaphores as an ordering
tactic, which would in theory allow cross-frame time slicing.)
Some minor changes to vo_gpu and the infrastructure, but nothing
consequential.
NOTE: For safety, all use of asynchronous commands / multiple command
pools is currently disabled completely. There are some left-over relics
of this in the code (e.g. the distinction between dev_poll and
pool_poll), but they are kept in place mostly because this will be
re-extended in the future (vulkan rev 2).
The queue count is also currently capped to 1, because the lack of
cross-frame semaphores means we need the implicit synchronization from
same-queue semantics to guarantee a correct result.
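To make point 1 concrete: the usual workaround is to explicitly bound
the number of frames in flight with per-frame fences. The following is a
minimal, hedged sketch of that pattern, not mpv's actual code;
FRAMES_IN_FLIGHT, frame_fence and submit_frame are hypothetical names,
and the swapchain acquire/present semaphores are omitted for brevity.

#include <vulkan/vulkan.h>

#define FRAMES_IN_FLIGHT 2 // hypothetical cap on frames queued ahead of the GPU

// One fence per frame slot. The fences must be created with
// VK_FENCE_CREATE_SIGNALED_BIT so the first wait on each slot returns
// immediately.
static VkFence frame_fence[FRAMES_IN_FLIGHT];
static int frame_idx;

// Wait until this slot's previous submission has retired before reusing
// it, so at most FRAMES_IN_FLIGHT submissions are ever outstanding.
static VkResult submit_frame(VkDevice dev, VkQueue queue, VkCommandBuffer cmd)
{
    VkFence fence = frame_fence[frame_idx];
    vkWaitForFences(dev, 1, &fence, VK_TRUE, UINT64_MAX);
    vkResetFences(dev, 1, &fence);

    VkSubmitInfo info = {
        .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
        .commandBufferCount = 1,
        .pCommandBuffers = &cmd,
    };
    VkResult res = vkQueueSubmit(queue, 1, &info, fence);
    frame_idx = (frame_idx + 1) % FRAMES_IN_FLIGHT;
    return res;
}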
#pragma once
#include "video/out/gpu/context.h"
#include "common.h"
struct ra_vk_ctx_params {
    // See ra_swapchain_fns.get_vsync.
    void (*get_vsync)(struct ra_ctx *ctx, struct vo_vsync_info *info);

    // For special contexts (i.e. wayland) that want to check visibility
    // before drawing a frame.
    bool (*check_visible)(struct ra_ctx *ctx);
wayland: only render if we have frame callback
Back in the olden days, mpv's wayland backend was driven by the frame
callback. This had several issues and was removed in favor of the
current approach, which allowed some advanced features (like
display-resample and presentation time) to actually work properly.
However, as a consequence, mpv always rendered, even if the surface was
hidden. Wayland people consider this "wasteful" (and well, they aren't
wrong). This commit aims to avoid wasteful rendering by doing some
additional checks in the swapchain. There are three main parts to this.
1. Wayland EGL now uses an external swapchain (like the drm context).
Before we start a new frame, we check whether we are waiting on a
callback from the compositor. If there is no wait, then we go ahead and
proceed to render the frame, swap buffers, and then initiate
vo_wayland_wait_frame to poll (with a timeout) for the next potential
callback. If we are still waiting on a callback from the compositor when
starting a new frame, then we simply skip rendering it entirely until
the surface comes back into view.
2. Wayland on vulkan has essentially the same approach, although the
details are a little different. The ra_vk_ctx does not have support for
an external swapchain, and although such a mechanism could theoretically
be added, it doesn't make much sense with libplacebo. Instead,
start_frame was added as a param and used to check for the callback (a
sketch of this gate follows this message).
3. For wlshm, it's simply a matter of adding frame callback to it,
leveraging vo_wayland_wait_frame, and using the frame callback value to
decide whether or not to draw the image.
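The gate described in parts 1 and 2 amounts to something like the
following hedged sketch. Only start_frame and vo_wayland_wait_frame come
from the text above; struct priv, frame_pending and wl_start_frame are
illustrative stand-ins, not mpv's actual wayland code.

#include "video/out/gpu/context.h"

struct priv {
    bool frame_pending; // set when a frame callback is armed,
                        // cleared when the compositor fires it
};

// Hypothetical start_frame hook: refuse to render while the compositor
// still owes us a frame callback (i.e. the surface is hidden or throttled).
static bool wl_start_frame(struct ra_ctx *ctx)
{
    struct priv *p = ctx->priv;
    return !p->frame_pending;
}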
    // In case something special needs to be done on the buffer swap.
    void (*swap_buffers)(struct ra_ctx *ctx);

    // See ra_swapchain_fns.color_depth.
    int (*color_depth)(struct ra_ctx *ctx);
};
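As a usage illustration for the struct above, a windowing backend would
fill in these hooks before calling ra_vk_ctx_init (declared further
down). This is a hedged sketch; the wl_-prefixed names are hypothetical
stand-ins, not mpv's actual wayland backend.

static void wl_swap_buffers(struct ra_ctx *ctx)
{
    // e.g. commit the surface and arm the next frame callback
}

static bool wl_check_visible(struct ra_ctx *ctx)
{
    // e.g. return false while a compositor frame callback is outstanding
    return true;
}

static bool wl_vk_init(struct ra_ctx *ctx, struct mpvk_ctx *vk)
{
    struct ra_vk_ctx_params params = {
        .check_visible = wl_check_visible,
        .swap_buffers = wl_swap_buffers,
        // .get_vsync and .color_depth look optional; left NULL here
    };
    return ra_vk_ctx_init(ctx, vk, params, VK_PRESENT_MODE_FIFO_KHR);
}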
// Helpers for ra_ctx based on ra_vk. These initialize ctx->ra and ctx->swapchain.
void ra_vk_ctx_uninit(struct ra_ctx *ctx);
bool ra_vk_ctx_init(struct ra_ctx *ctx, struct mpvk_ctx *vk,
                    struct ra_vk_ctx_params params,
                    VkPresentModeKHR preferred_mode);
// Helper for initializing mpvk_ctx->vulkan
pl_vulkan mppl_create_vulkan(struct vulkan_opts *opts,
                             pl_vk_inst vkinst,
                             pl_log pllog,
                             VkSurfaceKHR surface);
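A hypothetical call site for the helper above; the vkinst, pllog,
surface and vulkan member names are assumptions inferred from the
parameter list, not confirmed by this header.

static bool init_device(struct mpvk_ctx *vk, struct vulkan_opts *opts)
{
    // Assumed (hypothetical) members: vk->vkinst, vk->pllog, vk->surface
    vk->vulkan = mppl_create_vulkan(opts, vk->vkinst, vk->pllog, vk->surface);
    return vk->vulkan != NULL; // NULL means device creation failed
}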
// Handles a resize request, and updates ctx->vo->dwidth/dheight
bool ra_vk_ctx_resize(struct ra_ctx *ctx, int width, int height);
// May be called on a ra_ctx of any type.
struct mpvk_ctx *ra_vk_ctx_get(struct ra_ctx *ctx);
// Get the user requested Vulkan device name.
char *ra_vk_ctx_get_device_name(struct ra_ctx *ctx);