vo_gpu: vulkan: initial implementation
This time based on ra/vo_gpu. 2017 is the year of the vulkan desktop!
Current problems / limitations / improvement opportunities:
1. The swapchain/flipping code violates the vulkan spec, by assuming
that the presentation queue will be bounded (in cases where rendering
is significantly faster than vsync). But apparently, there's simply
no better way to do this right now, to the point where even the
stupid cube.c examples from LunarG etc. do it wrong.
(cf. https://github.com/KhronosGroup/Vulkan-Docs/issues/370)
2. The memory allocator could be improved. (This is a universal
constant)
3. Could explore using push descriptors instead of descriptor sets,
especially since we expect to switch descriptors semi-often for some
passes (like interpolation). Probably won't make a difference, but
the synchronization overhead might be a factor. Who knows.
4. Parallelism across frames / async transfer is not well-defined, we
either need to use a better semaphore / command buffer strategy or a
resource pooling layer to safely handle cross-frame parallelism.
(That said, I gave resource pooling a try and was not happy with the
result at all - so I'm still exploring the semaphore strategy)
5. We aggressively use pipeline barriers where events would offer a much
more fine-grained synchronization mechanism. As a result of this, we
might be suffering from GPU bubbles due to too-short dependencies on
objects. (That said, I'm also exploring the use of semaphores as a an
ordering tactic which would allow cross-frame time slicing in theory)
Some minor changes to the vo_gpu and infrastructure, but nothing
consequential.
NOTE: For safety, all use of asynchronous commands / multiple command
pools is currently disabled completely. There are some left-over relics
of this in the code (e.g. the distinction between dev_poll and
pool_poll), but that is kept in place mostly because this will be
re-extended in the future (vulkan rev 2).
The queue count is also currently capped to 1, because of the lack of
cross-frame semaphores means we need the implicit synchronization from
the same-queue semantics to guarantee a correct result.
2016-09-14 18:54:18 +00:00
|
|
|
/*
|
|
|
|
* This file is part of mpv.
|
|
|
|
*
|
|
|
|
* mpv is free software; you can redistribute it and/or
|
|
|
|
* modify it under the terms of the GNU Lesser General Public
|
|
|
|
* License as published by the Free Software Foundation; either
|
|
|
|
* version 2.1 of the License, or (at your option) any later version.
|
|
|
|
*
|
|
|
|
* mpv is distributed in the hope that it will be useful,
|
|
|
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
|
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
|
|
* GNU Lesser General Public License for more details.
|
|
|
|
*
|
|
|
|
* You should have received a copy of the GNU Lesser General Public
|
|
|
|
* License along with mpv. If not, see <http://www.gnu.org/licenses/>.
|
|
|
|
*/
|
|
|
|
|
|
|
|
#include "video/out/gpu/context.h"
|
x11: support xorg present extension
This builds off of present_sync which was introduced in a previous
commit to support xorg's present extension in all of the X11 backends
(sans vdpau) in mpv. It turns out there is an Xpresent library that
integrates the xorg present extention with Xlib (which barely anyone
seems to use), so this can be added without too much trouble. The
workflow is to first setup the event by telling Xorg we would like to
receive PresentCompleteNotify (there are others in the extension but
this is the only one we really care about). After that, just call
XPresentNotifyMSC after every buffer swap with a target_msc of 0. Xorg
then returns the last presentation through its usual event loop and we
go ahead and use that information to update mpv's values for vsync
timing purposes. One theoretical weakness of this approach is that the
present event is put on the same queue as the rest of the XEvents. It
would be nicer for it be placed somewhere else so we could just wait
on that queue without having to deal with other possible events in
there. In theory, xcb could do that with special events, but it doesn't
really matter in practice.
Unsurprisingly, this doesn't work on NVIDIA. Well NVIDIA does actually
receive presentation events, but for whatever the calculations used make
timings worse which defeats the purpose. This works perfectly fine on
Mesa however. Utilizing the previous commit that detects Xrandr
providers, we can enable this mechanism for users that have both Mesa
and not NVIDIA (to avoid messing up anyone that has a switchable
graphics system or such). Patches welcome if anyone figures out how to
fix this on NVIDIA.
Unlike the EGL/GLX sync extensions, the present extension works with any
graphics API (good for vulkan since its timing extension has been in
development hell). NVIDIA also happens to have zero support for the
EGL/GLX sync extensions, so we can just remove it with no loss. Only
Xorg ever used it and other backends already have their own present
methods. vo_vdpau VO is a special case that has its own fancying timing
code in its flip_page. This presumably works well, and I have no way of
testing it so just leave it as it is.
2022-06-10 16:49:38 +00:00
|
|
|
#include "video/out/present_sync.h"
|
vo_gpu: vulkan: initial implementation
This time based on ra/vo_gpu. 2017 is the year of the vulkan desktop!
Current problems / limitations / improvement opportunities:
1. The swapchain/flipping code violates the vulkan spec, by assuming
that the presentation queue will be bounded (in cases where rendering
is significantly faster than vsync). But apparently, there's simply
no better way to do this right now, to the point where even the
stupid cube.c examples from LunarG etc. do it wrong.
(cf. https://github.com/KhronosGroup/Vulkan-Docs/issues/370)
2. The memory allocator could be improved. (This is a universal
constant)
3. Could explore using push descriptors instead of descriptor sets,
especially since we expect to switch descriptors semi-often for some
passes (like interpolation). Probably won't make a difference, but
the synchronization overhead might be a factor. Who knows.
4. Parallelism across frames / async transfer is not well-defined, we
either need to use a better semaphore / command buffer strategy or a
resource pooling layer to safely handle cross-frame parallelism.
(That said, I gave resource pooling a try and was not happy with the
result at all - so I'm still exploring the semaphore strategy)
5. We aggressively use pipeline barriers where events would offer a much
more fine-grained synchronization mechanism. As a result of this, we
might be suffering from GPU bubbles due to too-short dependencies on
objects. (That said, I'm also exploring the use of semaphores as a an
ordering tactic which would allow cross-frame time slicing in theory)
Some minor changes to the vo_gpu and infrastructure, but nothing
consequential.
NOTE: For safety, all use of asynchronous commands / multiple command
pools is currently disabled completely. There are some left-over relics
of this in the code (e.g. the distinction between dev_poll and
pool_poll), but that is kept in place mostly because this will be
re-extended in the future (vulkan rev 2).
The queue count is also currently capped to 1, because of the lack of
cross-frame semaphores means we need the implicit synchronization from
the same-queue semantics to guarantee a correct result.
2016-09-14 18:54:18 +00:00
|
|
|
#include "video/out/x11_common.h"
|
|
|
|
|
|
|
|
#include "common.h"
|
|
|
|
#include "context.h"
|
|
|
|
#include "utils.h"
|
|
|
|
|
|
|
|
struct priv {
|
|
|
|
struct mpvk_ctx vk;
|
|
|
|
};
|
|
|
|
|
x11: avoid wasteful rendering when possible
Because wayland is a special snowflake, mpv wound up incorporating a lot
of logic into its render loop where visibilty checks are performed
before rendering anything (in the name of efficiency of course). Only
wayland actually uses this, but there's no reason why other backends
(x11 in this commit) can't be smarter. It's far easier on xorg since we
can just query _NET_WM_STATE_HIDDEN directly and not have to do silly
callback dances.
The function, vo_x11_check_net_wm_state_change, already tracks net wm
changes, including _NET_WM_STATE_HIDDEN. There is an already existing
window_hidden variable but that is actually just for checking if the
window was mapped and has nothing to do with this particular atom. mpv
also currently assumes that a _NET_WM_STATE_HIDDEN is exactly the same
as being minimized but according to the spec, that's not neccesarily
true (in practice, it's likely that these are the same though). Anyways,
just keep track of this state in a new variable (hidden) and use that
for determing if mpv should render or not.
There is one catch though: this cannot work if a display sync mode is
used. This is why the previous commit is needed. The display sync modes
in mpv require a blocking vsync implementation since its render loop is
directly driven by vsync. In xorg, if nothing is actually rendered, then
there's nothing for eglSwapBuffers (or FIFO for vulkan) to block on so
it returns immediately. This, of course, results in completely broken
video. We just need to check to make sure that we aren't in a display
sync mode before trying to be smart about rendering. Display sync is
power inefficient anyways, so no one is really being hurt here. As an
aside, this happens to work in wayland because there's basically a
custom (and ugly) vsync blocking function + timeout but that's off
topic.
2022-04-07 03:36:30 +00:00
|
|
|
static bool xlib_check_visible(struct ra_ctx *ctx)
|
|
|
|
{
|
|
|
|
return vo_x11_check_visible(ctx->vo);
|
|
|
|
}
|
|
|
|
|
x11: support xorg present extension
This builds off of present_sync which was introduced in a previous
commit to support xorg's present extension in all of the X11 backends
(sans vdpau) in mpv. It turns out there is an Xpresent library that
integrates the xorg present extention with Xlib (which barely anyone
seems to use), so this can be added without too much trouble. The
workflow is to first setup the event by telling Xorg we would like to
receive PresentCompleteNotify (there are others in the extension but
this is the only one we really care about). After that, just call
XPresentNotifyMSC after every buffer swap with a target_msc of 0. Xorg
then returns the last presentation through its usual event loop and we
go ahead and use that information to update mpv's values for vsync
timing purposes. One theoretical weakness of this approach is that the
present event is put on the same queue as the rest of the XEvents. It
would be nicer for it be placed somewhere else so we could just wait
on that queue without having to deal with other possible events in
there. In theory, xcb could do that with special events, but it doesn't
really matter in practice.
Unsurprisingly, this doesn't work on NVIDIA. Well NVIDIA does actually
receive presentation events, but for whatever the calculations used make
timings worse which defeats the purpose. This works perfectly fine on
Mesa however. Utilizing the previous commit that detects Xrandr
providers, we can enable this mechanism for users that have both Mesa
and not NVIDIA (to avoid messing up anyone that has a switchable
graphics system or such). Patches welcome if anyone figures out how to
fix this on NVIDIA.
Unlike the EGL/GLX sync extensions, the present extension works with any
graphics API (good for vulkan since its timing extension has been in
development hell). NVIDIA also happens to have zero support for the
EGL/GLX sync extensions, so we can just remove it with no loss. Only
Xorg ever used it and other backends already have their own present
methods. vo_vdpau VO is a special case that has its own fancying timing
code in its flip_page. This presumably works well, and I have no way of
testing it so just leave it as it is.
2022-06-10 16:49:38 +00:00
|
|
|
static void xlib_vk_swap_buffers(struct ra_ctx *ctx)
|
|
|
|
{
|
x11: remove PresentNotifyMSC from egl/glx/vulkan to fix xpresent timing
PresentNotifyMSC turns out to be not only redundant, but also harmful with
mesa-backed egl/glx/vulkan VOs because for all of them, mesa uses
PresentPixmap behind the scenes when DRI3 is available, which already
spawns a PresentCompleteNotify event when the buffer swap actually
finishes. This is important because without using the timing information
from these PresentCompleteKindPixmap events, there's no way for mpv to know
exactly when a frame becomes visible on the display.
By using PresentNotifyMSC in conjunction with DRI3-enabled mesa, two
problems are created:
1. mpv assumes that a vblank won't elapse (i.e., it assumes the current MSC
won't change) between the time when mesa enqueues the buffer swap and
the time when mpv calls PresentNotifyMSC to ask xorg for a notification
at the next MSC, relative to the current MSC at the time that xorg reads
it for the PresentNotifyMSC call. This means that mpv could get a
notification one or more vblanks later than it expects, since the
intention here is for mpv to get a notification at the MSC that the
buffer swap completes.
2. mpv assumes that a buffer swap always takes one vblank to complete,
which isn't always true. A buffer swap (i.e., a page flip) could take
longer than that depending on hardware conditions (if the GPU is running
slowly or needs to exit a low-power state), scheduling delays (under
heavy system or GPU load), or unfortunate timing (if the raster scan
line happens to be at one of the last few rows of pixels and a vblank
elapses just before the buffer swap is enqueued).
This causes mpv to have a faulty assumption of when frames become visible.
Since mpv already receives the PresentCompleteNotify events generated by
mesa's buffer swaps under the hood, the PresentNotifyMSC usage is unneeded
and just throws a wrench in mpv's vsync timing when xpresent is enabled.
Simply removing the PresentNotifyMSC usage from the egl, glx, and vulkan
VOs fixes the xpresent vsync timing.
2023-01-25 02:43:05 +00:00
|
|
|
if (ctx->vo->x11->use_present)
|
2022-06-22 04:13:44 +00:00
|
|
|
present_sync_swap(ctx->vo->x11->present);
|
x11: support xorg present extension
This builds off of present_sync which was introduced in a previous
commit to support xorg's present extension in all of the X11 backends
(sans vdpau) in mpv. It turns out there is an Xpresent library that
integrates the xorg present extention with Xlib (which barely anyone
seems to use), so this can be added without too much trouble. The
workflow is to first setup the event by telling Xorg we would like to
receive PresentCompleteNotify (there are others in the extension but
this is the only one we really care about). After that, just call
XPresentNotifyMSC after every buffer swap with a target_msc of 0. Xorg
then returns the last presentation through its usual event loop and we
go ahead and use that information to update mpv's values for vsync
timing purposes. One theoretical weakness of this approach is that the
present event is put on the same queue as the rest of the XEvents. It
would be nicer for it be placed somewhere else so we could just wait
on that queue without having to deal with other possible events in
there. In theory, xcb could do that with special events, but it doesn't
really matter in practice.
Unsurprisingly, this doesn't work on NVIDIA. Well NVIDIA does actually
receive presentation events, but for whatever the calculations used make
timings worse which defeats the purpose. This works perfectly fine on
Mesa however. Utilizing the previous commit that detects Xrandr
providers, we can enable this mechanism for users that have both Mesa
and not NVIDIA (to avoid messing up anyone that has a switchable
graphics system or such). Patches welcome if anyone figures out how to
fix this on NVIDIA.
Unlike the EGL/GLX sync extensions, the present extension works with any
graphics API (good for vulkan since its timing extension has been in
development hell). NVIDIA also happens to have zero support for the
EGL/GLX sync extensions, so we can just remove it with no loss. Only
Xorg ever used it and other backends already have their own present
methods. vo_vdpau VO is a special case that has its own fancying timing
code in its flip_page. This presumably works well, and I have no way of
testing it so just leave it as it is.
2022-06-10 16:49:38 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static void xlib_vk_get_vsync(struct ra_ctx *ctx, struct vo_vsync_info *info)
|
|
|
|
{
|
|
|
|
struct vo_x11_state *x11 = ctx->vo->x11;
|
2022-06-22 04:13:44 +00:00
|
|
|
if (ctx->vo->x11->use_present)
|
|
|
|
present_sync_get_info(x11->present, info);
|
x11: support xorg present extension
This builds off of present_sync which was introduced in a previous
commit to support xorg's present extension in all of the X11 backends
(sans vdpau) in mpv. It turns out there is an Xpresent library that
integrates the xorg present extention with Xlib (which barely anyone
seems to use), so this can be added without too much trouble. The
workflow is to first setup the event by telling Xorg we would like to
receive PresentCompleteNotify (there are others in the extension but
this is the only one we really care about). After that, just call
XPresentNotifyMSC after every buffer swap with a target_msc of 0. Xorg
then returns the last presentation through its usual event loop and we
go ahead and use that information to update mpv's values for vsync
timing purposes. One theoretical weakness of this approach is that the
present event is put on the same queue as the rest of the XEvents. It
would be nicer for it be placed somewhere else so we could just wait
on that queue without having to deal with other possible events in
there. In theory, xcb could do that with special events, but it doesn't
really matter in practice.
Unsurprisingly, this doesn't work on NVIDIA. Well NVIDIA does actually
receive presentation events, but for whatever the calculations used make
timings worse which defeats the purpose. This works perfectly fine on
Mesa however. Utilizing the previous commit that detects Xrandr
providers, we can enable this mechanism for users that have both Mesa
and not NVIDIA (to avoid messing up anyone that has a switchable
graphics system or such). Patches welcome if anyone figures out how to
fix this on NVIDIA.
Unlike the EGL/GLX sync extensions, the present extension works with any
graphics API (good for vulkan since its timing extension has been in
development hell). NVIDIA also happens to have zero support for the
EGL/GLX sync extensions, so we can just remove it with no loss. Only
Xorg ever used it and other backends already have their own present
methods. vo_vdpau VO is a special case that has its own fancying timing
code in its flip_page. This presumably works well, and I have no way of
testing it so just leave it as it is.
2022-06-10 16:49:38 +00:00
|
|
|
}
|
|
|
|
|
vo_gpu: vulkan: initial implementation
This time based on ra/vo_gpu. 2017 is the year of the vulkan desktop!
Current problems / limitations / improvement opportunities:
1. The swapchain/flipping code violates the vulkan spec, by assuming
that the presentation queue will be bounded (in cases where rendering
is significantly faster than vsync). But apparently, there's simply
no better way to do this right now, to the point where even the
stupid cube.c examples from LunarG etc. do it wrong.
(cf. https://github.com/KhronosGroup/Vulkan-Docs/issues/370)
2. The memory allocator could be improved. (This is a universal
constant)
3. Could explore using push descriptors instead of descriptor sets,
especially since we expect to switch descriptors semi-often for some
passes (like interpolation). Probably won't make a difference, but
the synchronization overhead might be a factor. Who knows.
4. Parallelism across frames / async transfer is not well-defined, we
either need to use a better semaphore / command buffer strategy or a
resource pooling layer to safely handle cross-frame parallelism.
(That said, I gave resource pooling a try and was not happy with the
result at all - so I'm still exploring the semaphore strategy)
5. We aggressively use pipeline barriers where events would offer a much
more fine-grained synchronization mechanism. As a result of this, we
might be suffering from GPU bubbles due to too-short dependencies on
objects. (That said, I'm also exploring the use of semaphores as a an
ordering tactic which would allow cross-frame time slicing in theory)
Some minor changes to the vo_gpu and infrastructure, but nothing
consequential.
NOTE: For safety, all use of asynchronous commands / multiple command
pools is currently disabled completely. There are some left-over relics
of this in the code (e.g. the distinction between dev_poll and
pool_poll), but that is kept in place mostly because this will be
re-extended in the future (vulkan rev 2).
The queue count is also currently capped to 1, because of the lack of
cross-frame semaphores means we need the implicit synchronization from
the same-queue semantics to guarantee a correct result.
2016-09-14 18:54:18 +00:00
|
|
|
static void xlib_uninit(struct ra_ctx *ctx)
|
|
|
|
{
|
|
|
|
struct priv *p = ctx->priv;
|
|
|
|
|
|
|
|
ra_vk_ctx_uninit(ctx);
|
|
|
|
mpvk_uninit(&p->vk);
|
|
|
|
vo_x11_uninit(ctx->vo);
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool xlib_init(struct ra_ctx *ctx)
|
|
|
|
{
|
|
|
|
struct priv *p = ctx->priv = talloc_zero(ctx, struct priv);
|
|
|
|
struct mpvk_ctx *vk = &p->vk;
|
|
|
|
int msgl = ctx->opts.probing ? MSGL_V : MSGL_ERR;
|
|
|
|
|
2018-11-10 11:53:33 +00:00
|
|
|
if (!mpvk_init(vk, ctx, VK_KHR_XLIB_SURFACE_EXTENSION_NAME))
|
vo_gpu: vulkan: initial implementation
This time based on ra/vo_gpu. 2017 is the year of the vulkan desktop!
Current problems / limitations / improvement opportunities:
1. The swapchain/flipping code violates the vulkan spec, by assuming
that the presentation queue will be bounded (in cases where rendering
is significantly faster than vsync). But apparently, there's simply
no better way to do this right now, to the point where even the
stupid cube.c examples from LunarG etc. do it wrong.
(cf. https://github.com/KhronosGroup/Vulkan-Docs/issues/370)
2. The memory allocator could be improved. (This is a universal
constant)
3. Could explore using push descriptors instead of descriptor sets,
especially since we expect to switch descriptors semi-often for some
passes (like interpolation). Probably won't make a difference, but
the synchronization overhead might be a factor. Who knows.
4. Parallelism across frames / async transfer is not well-defined, we
either need to use a better semaphore / command buffer strategy or a
resource pooling layer to safely handle cross-frame parallelism.
(That said, I gave resource pooling a try and was not happy with the
result at all - so I'm still exploring the semaphore strategy)
5. We aggressively use pipeline barriers where events would offer a much
more fine-grained synchronization mechanism. As a result of this, we
might be suffering from GPU bubbles due to too-short dependencies on
objects. (That said, I'm also exploring the use of semaphores as a an
ordering tactic which would allow cross-frame time slicing in theory)
Some minor changes to the vo_gpu and infrastructure, but nothing
consequential.
NOTE: For safety, all use of asynchronous commands / multiple command
pools is currently disabled completely. There are some left-over relics
of this in the code (e.g. the distinction between dev_poll and
pool_poll), but that is kept in place mostly because this will be
re-extended in the future (vulkan rev 2).
The queue count is also currently capped to 1, because of the lack of
cross-frame semaphores means we need the implicit synchronization from
the same-queue semantics to guarantee a correct result.
2016-09-14 18:54:18 +00:00
|
|
|
goto error;
|
|
|
|
|
2017-09-16 02:46:38 +00:00
|
|
|
if (!vo_x11_init(ctx->vo))
|
vo_gpu: vulkan: initial implementation
This time based on ra/vo_gpu. 2017 is the year of the vulkan desktop!
Current problems / limitations / improvement opportunities:
1. The swapchain/flipping code violates the vulkan spec, by assuming
that the presentation queue will be bounded (in cases where rendering
is significantly faster than vsync). But apparently, there's simply
no better way to do this right now, to the point where even the
stupid cube.c examples from LunarG etc. do it wrong.
(cf. https://github.com/KhronosGroup/Vulkan-Docs/issues/370)
2. The memory allocator could be improved. (This is a universal
constant)
3. Could explore using push descriptors instead of descriptor sets,
especially since we expect to switch descriptors semi-often for some
passes (like interpolation). Probably won't make a difference, but
the synchronization overhead might be a factor. Who knows.
4. Parallelism across frames / async transfer is not well-defined, we
either need to use a better semaphore / command buffer strategy or a
resource pooling layer to safely handle cross-frame parallelism.
(That said, I gave resource pooling a try and was not happy with the
result at all - so I'm still exploring the semaphore strategy)
5. We aggressively use pipeline barriers where events would offer a much
more fine-grained synchronization mechanism. As a result of this, we
might be suffering from GPU bubbles due to too-short dependencies on
objects. (That said, I'm also exploring the use of semaphores as a an
ordering tactic which would allow cross-frame time slicing in theory)
Some minor changes to the vo_gpu and infrastructure, but nothing
consequential.
NOTE: For safety, all use of asynchronous commands / multiple command
pools is currently disabled completely. There are some left-over relics
of this in the code (e.g. the distinction between dev_poll and
pool_poll), but that is kept in place mostly because this will be
re-extended in the future (vulkan rev 2).
The queue count is also currently capped to 1, because of the lack of
cross-frame semaphores means we need the implicit synchronization from
the same-queue semantics to guarantee a correct result.
2016-09-14 18:54:18 +00:00
|
|
|
goto error;
|
|
|
|
|
2017-09-16 02:46:38 +00:00
|
|
|
if (!vo_x11_create_vo_window(ctx->vo, NULL, "mpvk"))
|
vo_gpu: vulkan: initial implementation
This time based on ra/vo_gpu. 2017 is the year of the vulkan desktop!
Current problems / limitations / improvement opportunities:
1. The swapchain/flipping code violates the vulkan spec, by assuming
that the presentation queue will be bounded (in cases where rendering
is significantly faster than vsync). But apparently, there's simply
no better way to do this right now, to the point where even the
stupid cube.c examples from LunarG etc. do it wrong.
(cf. https://github.com/KhronosGroup/Vulkan-Docs/issues/370)
2. The memory allocator could be improved. (This is a universal
constant)
3. Could explore using push descriptors instead of descriptor sets,
especially since we expect to switch descriptors semi-often for some
passes (like interpolation). Probably won't make a difference, but
the synchronization overhead might be a factor. Who knows.
4. Parallelism across frames / async transfer is not well-defined, we
either need to use a better semaphore / command buffer strategy or a
resource pooling layer to safely handle cross-frame parallelism.
(That said, I gave resource pooling a try and was not happy with the
result at all - so I'm still exploring the semaphore strategy)
5. We aggressively use pipeline barriers where events would offer a much
more fine-grained synchronization mechanism. As a result of this, we
might be suffering from GPU bubbles due to too-short dependencies on
objects. (That said, I'm also exploring the use of semaphores as a an
ordering tactic which would allow cross-frame time slicing in theory)
Some minor changes to the vo_gpu and infrastructure, but nothing
consequential.
NOTE: For safety, all use of asynchronous commands / multiple command
pools is currently disabled completely. There are some left-over relics
of this in the code (e.g. the distinction between dev_poll and
pool_poll), but that is kept in place mostly because this will be
re-extended in the future (vulkan rev 2).
The queue count is also currently capped to 1, because of the lack of
cross-frame semaphores means we need the implicit synchronization from
the same-queue semantics to guarantee a correct result.
2016-09-14 18:54:18 +00:00
|
|
|
goto error;
|
|
|
|
|
|
|
|
VkXlibSurfaceCreateInfoKHR xinfo = {
|
|
|
|
.sType = VK_STRUCTURE_TYPE_XLIB_SURFACE_CREATE_INFO_KHR,
|
|
|
|
.dpy = ctx->vo->x11->display,
|
|
|
|
.window = ctx->vo->x11->window,
|
|
|
|
};
|
|
|
|
|
x11: avoid wasteful rendering when possible
Because wayland is a special snowflake, mpv wound up incorporating a lot
of logic into its render loop where visibilty checks are performed
before rendering anything (in the name of efficiency of course). Only
wayland actually uses this, but there's no reason why other backends
(x11 in this commit) can't be smarter. It's far easier on xorg since we
can just query _NET_WM_STATE_HIDDEN directly and not have to do silly
callback dances.
The function, vo_x11_check_net_wm_state_change, already tracks net wm
changes, including _NET_WM_STATE_HIDDEN. There is an already existing
window_hidden variable but that is actually just for checking if the
window was mapped and has nothing to do with this particular atom. mpv
also currently assumes that a _NET_WM_STATE_HIDDEN is exactly the same
as being minimized but according to the spec, that's not neccesarily
true (in practice, it's likely that these are the same though). Anyways,
just keep track of this state in a new variable (hidden) and use that
for determing if mpv should render or not.
There is one catch though: this cannot work if a display sync mode is
used. This is why the previous commit is needed. The display sync modes
in mpv require a blocking vsync implementation since its render loop is
directly driven by vsync. In xorg, if nothing is actually rendered, then
there's nothing for eglSwapBuffers (or FIFO for vulkan) to block on so
it returns immediately. This, of course, results in completely broken
video. We just need to check to make sure that we aren't in a display
sync mode before trying to be smart about rendering. Display sync is
power inefficient anyways, so no one is really being hurt here. As an
aside, this happens to work in wayland because there's basically a
custom (and ugly) vsync blocking function + timeout but that's off
topic.
2022-04-07 03:36:30 +00:00
|
|
|
struct ra_vk_ctx_params params = {
|
|
|
|
.check_visible = xlib_check_visible,
|
x11: support xorg present extension
This builds off of present_sync which was introduced in a previous
commit to support xorg's present extension in all of the X11 backends
(sans vdpau) in mpv. It turns out there is an Xpresent library that
integrates the xorg present extention with Xlib (which barely anyone
seems to use), so this can be added without too much trouble. The
workflow is to first setup the event by telling Xorg we would like to
receive PresentCompleteNotify (there are others in the extension but
this is the only one we really care about). After that, just call
XPresentNotifyMSC after every buffer swap with a target_msc of 0. Xorg
then returns the last presentation through its usual event loop and we
go ahead and use that information to update mpv's values for vsync
timing purposes. One theoretical weakness of this approach is that the
present event is put on the same queue as the rest of the XEvents. It
would be nicer for it be placed somewhere else so we could just wait
on that queue without having to deal with other possible events in
there. In theory, xcb could do that with special events, but it doesn't
really matter in practice.
Unsurprisingly, this doesn't work on NVIDIA. Well NVIDIA does actually
receive presentation events, but for whatever the calculations used make
timings worse which defeats the purpose. This works perfectly fine on
Mesa however. Utilizing the previous commit that detects Xrandr
providers, we can enable this mechanism for users that have both Mesa
and not NVIDIA (to avoid messing up anyone that has a switchable
graphics system or such). Patches welcome if anyone figures out how to
fix this on NVIDIA.
Unlike the EGL/GLX sync extensions, the present extension works with any
graphics API (good for vulkan since its timing extension has been in
development hell). NVIDIA also happens to have zero support for the
EGL/GLX sync extensions, so we can just remove it with no loss. Only
Xorg ever used it and other backends already have their own present
methods. vo_vdpau VO is a special case that has its own fancying timing
code in its flip_page. This presumably works well, and I have no way of
testing it so just leave it as it is.
2022-06-10 16:49:38 +00:00
|
|
|
.swap_buffers = xlib_vk_swap_buffers,
|
|
|
|
.get_vsync = xlib_vk_get_vsync,
|
x11: avoid wasteful rendering when possible
Because wayland is a special snowflake, mpv wound up incorporating a lot
of logic into its render loop where visibilty checks are performed
before rendering anything (in the name of efficiency of course). Only
wayland actually uses this, but there's no reason why other backends
(x11 in this commit) can't be smarter. It's far easier on xorg since we
can just query _NET_WM_STATE_HIDDEN directly and not have to do silly
callback dances.
The function, vo_x11_check_net_wm_state_change, already tracks net wm
changes, including _NET_WM_STATE_HIDDEN. There is an already existing
window_hidden variable but that is actually just for checking if the
window was mapped and has nothing to do with this particular atom. mpv
also currently assumes that a _NET_WM_STATE_HIDDEN is exactly the same
as being minimized but according to the spec, that's not neccesarily
true (in practice, it's likely that these are the same though). Anyways,
just keep track of this state in a new variable (hidden) and use that
for determing if mpv should render or not.
There is one catch though: this cannot work if a display sync mode is
used. This is why the previous commit is needed. The display sync modes
in mpv require a blocking vsync implementation since its render loop is
directly driven by vsync. In xorg, if nothing is actually rendered, then
there's nothing for eglSwapBuffers (or FIFO for vulkan) to block on so
it returns immediately. This, of course, results in completely broken
video. We just need to check to make sure that we aren't in a display
sync mode before trying to be smart about rendering. Display sync is
power inefficient anyways, so no one is really being hurt here. As an
aside, this happens to work in wayland because there's basically a
custom (and ugly) vsync blocking function + timeout but that's off
topic.
2022-04-07 03:36:30 +00:00
|
|
|
};
|
2019-10-07 20:58:36 +00:00
|
|
|
|
2018-11-10 11:53:33 +00:00
|
|
|
VkInstance inst = vk->vkinst->instance;
|
|
|
|
VkResult res = vkCreateXlibSurfaceKHR(inst, &xinfo, NULL, &vk->surface);
|
vo_gpu: vulkan: initial implementation
This time based on ra/vo_gpu. 2017 is the year of the vulkan desktop!
Current problems / limitations / improvement opportunities:
1. The swapchain/flipping code violates the vulkan spec, by assuming
that the presentation queue will be bounded (in cases where rendering
is significantly faster than vsync). But apparently, there's simply
no better way to do this right now, to the point where even the
stupid cube.c examples from LunarG etc. do it wrong.
(cf. https://github.com/KhronosGroup/Vulkan-Docs/issues/370)
2. The memory allocator could be improved. (This is a universal
constant)
3. Could explore using push descriptors instead of descriptor sets,
especially since we expect to switch descriptors semi-often for some
passes (like interpolation). Probably won't make a difference, but
the synchronization overhead might be a factor. Who knows.
4. Parallelism across frames / async transfer is not well-defined, we
either need to use a better semaphore / command buffer strategy or a
resource pooling layer to safely handle cross-frame parallelism.
(That said, I gave resource pooling a try and was not happy with the
result at all - so I'm still exploring the semaphore strategy)
5. We aggressively use pipeline barriers where events would offer a much
more fine-grained synchronization mechanism. As a result of this, we
might be suffering from GPU bubbles due to too-short dependencies on
objects. (That said, I'm also exploring the use of semaphores as a an
ordering tactic which would allow cross-frame time slicing in theory)
Some minor changes to the vo_gpu and infrastructure, but nothing
consequential.
NOTE: For safety, all use of asynchronous commands / multiple command
pools is currently disabled completely. There are some left-over relics
of this in the code (e.g. the distinction between dev_poll and
pool_poll), but that is kept in place mostly because this will be
re-extended in the future (vulkan rev 2).
The queue count is also currently capped to 1, because of the lack of
cross-frame semaphores means we need the implicit synchronization from
the same-queue semantics to guarantee a correct result.
2016-09-14 18:54:18 +00:00
|
|
|
if (res != VK_SUCCESS) {
|
2018-11-10 11:53:33 +00:00
|
|
|
MP_MSG(ctx, msgl, "Failed creating Xlib surface\n");
|
vo_gpu: vulkan: initial implementation
This time based on ra/vo_gpu. 2017 is the year of the vulkan desktop!
Current problems / limitations / improvement opportunities:
1. The swapchain/flipping code violates the vulkan spec, by assuming
that the presentation queue will be bounded (in cases where rendering
is significantly faster than vsync). But apparently, there's simply
no better way to do this right now, to the point where even the
stupid cube.c examples from LunarG etc. do it wrong.
(cf. https://github.com/KhronosGroup/Vulkan-Docs/issues/370)
2. The memory allocator could be improved. (This is a universal
constant)
3. Could explore using push descriptors instead of descriptor sets,
especially since we expect to switch descriptors semi-often for some
passes (like interpolation). Probably won't make a difference, but
the synchronization overhead might be a factor. Who knows.
4. Parallelism across frames / async transfer is not well-defined, we
either need to use a better semaphore / command buffer strategy or a
resource pooling layer to safely handle cross-frame parallelism.
(That said, I gave resource pooling a try and was not happy with the
result at all - so I'm still exploring the semaphore strategy)
5. We aggressively use pipeline barriers where events would offer a much
more fine-grained synchronization mechanism. As a result of this, we
might be suffering from GPU bubbles due to too-short dependencies on
objects. (That said, I'm also exploring the use of semaphores as a an
ordering tactic which would allow cross-frame time slicing in theory)
Some minor changes to the vo_gpu and infrastructure, but nothing
consequential.
NOTE: For safety, all use of asynchronous commands / multiple command
pools is currently disabled completely. There are some left-over relics
of this in the code (e.g. the distinction between dev_poll and
pool_poll), but that is kept in place mostly because this will be
re-extended in the future (vulkan rev 2).
The queue count is also currently capped to 1, because of the lack of
cross-frame semaphores means we need the implicit synchronization from
the same-queue semantics to guarantee a correct result.
2016-09-14 18:54:18 +00:00
|
|
|
goto error;
|
|
|
|
}
|
|
|
|
|
2019-10-07 20:58:36 +00:00
|
|
|
if (!ra_vk_ctx_init(ctx, vk, params, VK_PRESENT_MODE_FIFO_KHR))
|
vo_gpu: vulkan: initial implementation
This time based on ra/vo_gpu. 2017 is the year of the vulkan desktop!
Current problems / limitations / improvement opportunities:
1. The swapchain/flipping code violates the vulkan spec, by assuming
that the presentation queue will be bounded (in cases where rendering
is significantly faster than vsync). But apparently, there's simply
no better way to do this right now, to the point where even the
stupid cube.c examples from LunarG etc. do it wrong.
(cf. https://github.com/KhronosGroup/Vulkan-Docs/issues/370)
2. The memory allocator could be improved. (This is a universal
constant)
3. Could explore using push descriptors instead of descriptor sets,
especially since we expect to switch descriptors semi-often for some
passes (like interpolation). Probably won't make a difference, but
the synchronization overhead might be a factor. Who knows.
4. Parallelism across frames / async transfer is not well-defined, we
either need to use a better semaphore / command buffer strategy or a
resource pooling layer to safely handle cross-frame parallelism.
(That said, I gave resource pooling a try and was not happy with the
result at all - so I'm still exploring the semaphore strategy)
5. We aggressively use pipeline barriers where events would offer a much
more fine-grained synchronization mechanism. As a result of this, we
might be suffering from GPU bubbles due to too-short dependencies on
objects. (That said, I'm also exploring the use of semaphores as a an
ordering tactic which would allow cross-frame time slicing in theory)
Some minor changes to the vo_gpu and infrastructure, but nothing
consequential.
NOTE: For safety, all use of asynchronous commands / multiple command
pools is currently disabled completely. There are some left-over relics
of this in the code (e.g. the distinction between dev_poll and
pool_poll), but that is kept in place mostly because this will be
re-extended in the future (vulkan rev 2).
The queue count is also currently capped to 1, because of the lack of
cross-frame semaphores means we need the implicit synchronization from
the same-queue semantics to guarantee a correct result.
2016-09-14 18:54:18 +00:00
|
|
|
goto error;
|
|
|
|
|
2018-12-30 22:59:48 +00:00
|
|
|
ra_add_native_resource(ctx->ra, "x11", ctx->vo->x11->display);
|
|
|
|
|
vo_gpu: vulkan: initial implementation
This time based on ra/vo_gpu. 2017 is the year of the vulkan desktop!
Current problems / limitations / improvement opportunities:
1. The swapchain/flipping code violates the vulkan spec, by assuming
that the presentation queue will be bounded (in cases where rendering
is significantly faster than vsync). But apparently, there's simply
no better way to do this right now, to the point where even the
stupid cube.c examples from LunarG etc. do it wrong.
(cf. https://github.com/KhronosGroup/Vulkan-Docs/issues/370)
2. The memory allocator could be improved. (This is a universal
constant)
3. Could explore using push descriptors instead of descriptor sets,
especially since we expect to switch descriptors semi-often for some
passes (like interpolation). Probably won't make a difference, but
the synchronization overhead might be a factor. Who knows.
4. Parallelism across frames / async transfer is not well-defined, we
either need to use a better semaphore / command buffer strategy or a
resource pooling layer to safely handle cross-frame parallelism.
(That said, I gave resource pooling a try and was not happy with the
result at all - so I'm still exploring the semaphore strategy)
5. We aggressively use pipeline barriers where events would offer a much
more fine-grained synchronization mechanism. As a result of this, we
might be suffering from GPU bubbles due to too-short dependencies on
objects. (That said, I'm also exploring the use of semaphores as a an
ordering tactic which would allow cross-frame time slicing in theory)
Some minor changes to the vo_gpu and infrastructure, but nothing
consequential.
NOTE: For safety, all use of asynchronous commands / multiple command
pools is currently disabled completely. There are some left-over relics
of this in the code (e.g. the distinction between dev_poll and
pool_poll), but that is kept in place mostly because this will be
re-extended in the future (vulkan rev 2).
The queue count is also currently capped to 1, because of the lack of
cross-frame semaphores means we need the implicit synchronization from
the same-queue semantics to guarantee a correct result.
2016-09-14 18:54:18 +00:00
|
|
|
return true;
|
|
|
|
|
|
|
|
error:
|
|
|
|
xlib_uninit(ctx);
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool resize(struct ra_ctx *ctx)
|
|
|
|
{
|
2018-11-10 11:53:33 +00:00
|
|
|
return ra_vk_ctx_resize(ctx, ctx->vo->dwidth, ctx->vo->dheight);
|
vo_gpu: vulkan: initial implementation
This time based on ra/vo_gpu. 2017 is the year of the vulkan desktop!
Current problems / limitations / improvement opportunities:
1. The swapchain/flipping code violates the vulkan spec, by assuming
that the presentation queue will be bounded (in cases where rendering
is significantly faster than vsync). But apparently, there's simply
no better way to do this right now, to the point where even the
stupid cube.c examples from LunarG etc. do it wrong.
(cf. https://github.com/KhronosGroup/Vulkan-Docs/issues/370)
2. The memory allocator could be improved. (This is a universal
constant)
3. Could explore using push descriptors instead of descriptor sets,
especially since we expect to switch descriptors semi-often for some
passes (like interpolation). Probably won't make a difference, but
the synchronization overhead might be a factor. Who knows.
4. Parallelism across frames / async transfer is not well-defined, we
either need to use a better semaphore / command buffer strategy or a
resource pooling layer to safely handle cross-frame parallelism.
(That said, I gave resource pooling a try and was not happy with the
result at all - so I'm still exploring the semaphore strategy)
5. We aggressively use pipeline barriers where events would offer a much
more fine-grained synchronization mechanism. As a result of this, we
might be suffering from GPU bubbles due to too-short dependencies on
objects. (That said, I'm also exploring the use of semaphores as a an
ordering tactic which would allow cross-frame time slicing in theory)
Some minor changes to the vo_gpu and infrastructure, but nothing
consequential.
NOTE: For safety, all use of asynchronous commands / multiple command
pools is currently disabled completely. There are some left-over relics
of this in the code (e.g. the distinction between dev_poll and
pool_poll), but that is kept in place mostly because this will be
re-extended in the future (vulkan rev 2).
The queue count is also currently capped to 1, because of the lack of
cross-frame semaphores means we need the implicit synchronization from
the same-queue semantics to guarantee a correct result.
2016-09-14 18:54:18 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static bool xlib_reconfig(struct ra_ctx *ctx)
|
|
|
|
{
|
|
|
|
vo_x11_config_vo_window(ctx->vo);
|
|
|
|
return resize(ctx);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int xlib_control(struct ra_ctx *ctx, int *events, int request, void *arg)
|
|
|
|
{
|
|
|
|
int ret = vo_x11_control(ctx->vo, events, request, arg);
|
|
|
|
if (*events & VO_EVENT_RESIZE) {
|
|
|
|
if (!resize(ctx))
|
|
|
|
return VO_ERROR;
|
|
|
|
}
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void xlib_wakeup(struct ra_ctx *ctx)
|
|
|
|
{
|
|
|
|
vo_x11_wakeup(ctx->vo);
|
|
|
|
}
|
|
|
|
|
2023-09-18 02:34:32 +00:00
|
|
|
static void xlib_wait_events(struct ra_ctx *ctx, int64_t until_time_ns)
|
vo_gpu: vulkan: initial implementation
This time based on ra/vo_gpu. 2017 is the year of the vulkan desktop!
Current problems / limitations / improvement opportunities:
1. The swapchain/flipping code violates the vulkan spec, by assuming
that the presentation queue will be bounded (in cases where rendering
is significantly faster than vsync). But apparently, there's simply
no better way to do this right now, to the point where even the
stupid cube.c examples from LunarG etc. do it wrong.
(cf. https://github.com/KhronosGroup/Vulkan-Docs/issues/370)
2. The memory allocator could be improved. (This is a universal
constant)
3. Could explore using push descriptors instead of descriptor sets,
especially since we expect to switch descriptors semi-often for some
passes (like interpolation). Probably won't make a difference, but
the synchronization overhead might be a factor. Who knows.
4. Parallelism across frames / async transfer is not well-defined, we
either need to use a better semaphore / command buffer strategy or a
resource pooling layer to safely handle cross-frame parallelism.
(That said, I gave resource pooling a try and was not happy with the
result at all - so I'm still exploring the semaphore strategy)
5. We aggressively use pipeline barriers where events would offer a much
more fine-grained synchronization mechanism. As a result of this, we
might be suffering from GPU bubbles due to too-short dependencies on
objects. (That said, I'm also exploring the use of semaphores as a an
ordering tactic which would allow cross-frame time slicing in theory)
Some minor changes to the vo_gpu and infrastructure, but nothing
consequential.
NOTE: For safety, all use of asynchronous commands / multiple command
pools is currently disabled completely. There are some left-over relics
of this in the code (e.g. the distinction between dev_poll and
pool_poll), but that is kept in place mostly because this will be
re-extended in the future (vulkan rev 2).
The queue count is also currently capped to 1, because of the lack of
cross-frame semaphores means we need the implicit synchronization from
the same-queue semantics to guarantee a correct result.
2016-09-14 18:54:18 +00:00
|
|
|
{
|
2023-09-18 02:34:32 +00:00
|
|
|
vo_x11_wait_events(ctx->vo, until_time_ns);
|
vo_gpu: vulkan: initial implementation
This time based on ra/vo_gpu. 2017 is the year of the vulkan desktop!
Current problems / limitations / improvement opportunities:
1. The swapchain/flipping code violates the vulkan spec, by assuming
that the presentation queue will be bounded (in cases where rendering
is significantly faster than vsync). But apparently, there's simply
no better way to do this right now, to the point where even the
stupid cube.c examples from LunarG etc. do it wrong.
(cf. https://github.com/KhronosGroup/Vulkan-Docs/issues/370)
2. The memory allocator could be improved. (This is a universal
constant)
3. Could explore using push descriptors instead of descriptor sets,
especially since we expect to switch descriptors semi-often for some
passes (like interpolation). Probably won't make a difference, but
the synchronization overhead might be a factor. Who knows.
4. Parallelism across frames / async transfer is not well-defined, we
either need to use a better semaphore / command buffer strategy or a
resource pooling layer to safely handle cross-frame parallelism.
(That said, I gave resource pooling a try and was not happy with the
result at all - so I'm still exploring the semaphore strategy)
5. We aggressively use pipeline barriers where events would offer a much
more fine-grained synchronization mechanism. As a result of this, we
might be suffering from GPU bubbles due to too-short dependencies on
objects. (That said, I'm also exploring the use of semaphores as a an
ordering tactic which would allow cross-frame time slicing in theory)
Some minor changes to the vo_gpu and infrastructure, but nothing
consequential.
NOTE: For safety, all use of asynchronous commands / multiple command
pools is currently disabled completely. There are some left-over relics
of this in the code (e.g. the distinction between dev_poll and
pool_poll), but that is kept in place mostly because this will be
re-extended in the future (vulkan rev 2).
The queue count is also currently capped to 1, because of the lack of
cross-frame semaphores means we need the implicit synchronization from
the same-queue semantics to guarantee a correct result.
2016-09-14 18:54:18 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
const struct ra_ctx_fns ra_ctx_vulkan_xlib = {
|
|
|
|
.type = "vulkan",
|
vo_gpu: semi-fix --gpu-context/--gpu-api options and help output
This was confusing at best. Change it to output the actual choices.
(Seems like in the end it's always me who has to clean up other people's
bullshit.)
Context names were not unique - but they should be, so fix it. The whole
point of the original --opengl-backend option was to side-step the
tricky auto-detection, so you know exactly what you get. The goal of
this commit is to make --gpu-context work the same way. Fix the
non-unique names by appending "vk" to the names.
Keep in mind that this was not suitable for slecting the "UI" backend
anyway, since "x11" would force GLX, whereas people on not-NVIDIA
actually want "x11egl". Users trying to use --gpu-context=x11 to force
the X11 backend would always end up with GLX, which would at least break
VAAPI hardware decoding for them. Basically the idea that this option
could select the "UI" type is completely broken - it selects an
implementation, which implies a UI. Selecting the UI type This would
require a separate mechanism. (Although in theory this separate
mechanism could be part of the --gpu-context option - in any case,
someone would have to implement it.)
To achieve help output that can actually be understood, just duplicate
the code. Most of that code is duplicated anyway, and trying to share
just the list code with the result of making the output unreadable
doesn't make too much sense. If we wanted to save code/effort, we could
just remove the help output altogether.
--gpu-api has non-unique entries, and it would be nice to group them
(e.g. list all OpenGL capable contexts with "opengl"), but C makes this
simple idea too much of a pain, so don't do it.
Also remove a stray tab from the android entry on the manpage.
2017-10-16 08:53:33 +00:00
|
|
|
.name = "x11vk",
|
2024-04-16 03:18:44 +00:00
|
|
|
.description = "X11/Vulkan",
|
vo_gpu: vulkan: initial implementation
This time based on ra/vo_gpu. 2017 is the year of the vulkan desktop!
Current problems / limitations / improvement opportunities:
1. The swapchain/flipping code violates the vulkan spec, by assuming
that the presentation queue will be bounded (in cases where rendering
is significantly faster than vsync). But apparently, there's simply
no better way to do this right now, to the point where even the
stupid cube.c examples from LunarG etc. do it wrong.
(cf. https://github.com/KhronosGroup/Vulkan-Docs/issues/370)
2. The memory allocator could be improved. (This is a universal
constant)
3. Could explore using push descriptors instead of descriptor sets,
especially since we expect to switch descriptors semi-often for some
passes (like interpolation). Probably won't make a difference, but
the synchronization overhead might be a factor. Who knows.
4. Parallelism across frames / async transfer is not well-defined, we
either need to use a better semaphore / command buffer strategy or a
resource pooling layer to safely handle cross-frame parallelism.
(That said, I gave resource pooling a try and was not happy with the
result at all - so I'm still exploring the semaphore strategy)
5. We aggressively use pipeline barriers where events would offer a much
more fine-grained synchronization mechanism. As a result of this, we
might be suffering from GPU bubbles due to too-short dependencies on
objects. (That said, I'm also exploring the use of semaphores as a an
ordering tactic which would allow cross-frame time slicing in theory)
Some minor changes to the vo_gpu and infrastructure, but nothing
consequential.
NOTE: For safety, all use of asynchronous commands / multiple command
pools is currently disabled completely. There are some left-over relics
of this in the code (e.g. the distinction between dev_poll and
pool_poll), but that is kept in place mostly because this will be
re-extended in the future (vulkan rev 2).
The queue count is also currently capped to 1, because of the lack of
cross-frame semaphores means we need the implicit synchronization from
the same-queue semantics to guarantee a correct result.
2016-09-14 18:54:18 +00:00
|
|
|
.reconfig = xlib_reconfig,
|
|
|
|
.control = xlib_control,
|
|
|
|
.wakeup = xlib_wakeup,
|
|
|
|
.wait_events = xlib_wait_events,
|
|
|
|
.init = xlib_init,
|
|
|
|
.uninit = xlib_uninit,
|
|
|
|
};
|