// mpv/video/out/vulkan/context_xlib.c

/*
* This file is part of mpv.
*
* mpv is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License as published by the Free Software Foundation; either
* version 2.1 of the License, or (at your option) any later version.
*
* mpv is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public
* License along with mpv. If not, see <http://www.gnu.org/licenses/>.
*/
#include "video/out/gpu/context.h"
#include "video/out/present_sync.h"
#include "video/out/x11_common.h"
#include "common.h"
#include "context.h"
#include "utils.h"

struct priv {
    struct mpvk_ctx vk;
};
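
// Ask the X11 code whether the window is currently visible, so rendering
// can be skipped while it is hidden.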
static bool xlib_check_visible(struct ra_ctx *ctx)
{
    return vo_x11_check_visible(ctx->vo);
}
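
// Notify the present_sync machinery after each buffer swap so it can track
// presentation timing.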
static void xlib_vk_swap_buffers(struct ra_ctx *ctx)
{
    if (ctx->vo->x11->use_present)
        present_sync_swap(ctx->vo->x11->present);
}
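
// Report the last-presentation timing gathered via the Present extension,
// when it is in use.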
static void xlib_vk_get_vsync(struct ra_ctx *ctx, struct vo_vsync_info *info)
{
    struct vo_x11_state *x11 = ctx->vo->x11;
    if (ctx->vo->x11->use_present)
        present_sync_get_info(x11->present, info);
}
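
// Tear down the swapchain context and Vulkan state, then close the X11 window.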
static void xlib_uninit(struct ra_ctx *ctx)
{
    struct priv *p = ctx->priv;
    ra_vk_ctx_uninit(ctx);
    mpvk_uninit(&p->vk);
    vo_x11_uninit(ctx->vo);
}
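
// Create the Vulkan instance with Xlib surface support, set up the X11
// window, create a surface for it, and initialize the shared swapchain code.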
static bool xlib_init(struct ra_ctx *ctx)
{
    struct priv *p = ctx->priv = talloc_zero(ctx, struct priv);
    struct mpvk_ctx *vk = &p->vk;
    int msgl = ctx->opts.probing ? MSGL_V : MSGL_ERR;

    if (!mpvk_init(vk, ctx, VK_KHR_XLIB_SURFACE_EXTENSION_NAME))
        goto error;

    if (!vo_x11_init(ctx->vo))
        goto error;

    if (!vo_x11_create_vo_window(ctx->vo, NULL, "mpvk"))
        goto error;

    VkXlibSurfaceCreateInfoKHR xinfo = {
        .sType = VK_STRUCTURE_TYPE_XLIB_SURFACE_CREATE_INFO_KHR,
        .dpy = ctx->vo->x11->display,
        .window = ctx->vo->x11->window,
    };
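
    // Callbacks handed to the shared Vulkan context code: visibility check,
    // post-swap notification, and vsync timing.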
    struct ra_vk_ctx_params params = {
        .check_visible = xlib_check_visible,
        .swap_buffers = xlib_vk_swap_buffers,
        .get_vsync = xlib_vk_get_vsync,
    };
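
    // Create the VkSurfaceKHR from the raw Xlib display and window handles.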
    VkInstance inst = vk->vkinst->instance;
    VkResult res = vkCreateXlibSurfaceKHR(inst, &xinfo, NULL, &vk->surface);
    if (res != VK_SUCCESS) {
        MP_MSG(ctx, msgl, "Failed creating Xlib surface\n");
        goto error;
    }
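
    // FIFO is the blocking, vsync-driven present mode; mpv's display-sync
    // modes rely on the swap actually blocking.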
    if (!ra_vk_ctx_init(ctx, vk, params, VK_PRESENT_MODE_FIFO_KHR))
        goto error;

    ra_add_native_resource(ctx->ra, "x11", ctx->vo->x11->display);

    return true;

error:
    xlib_uninit(ctx);
    return false;
}

static bool resize(struct ra_ctx *ctx)
{
    return ra_vk_ctx_resize(ctx, ctx->vo->dwidth, ctx->vo->dheight);
}

static bool xlib_reconfig(struct ra_ctx *ctx)
{
    vo_x11_config_vo_window(ctx->vo);
    return resize(ctx);
}

static int xlib_control(struct ra_ctx *ctx, int *events, int request, void *arg)
{
    int ret = vo_x11_control(ctx->vo, events, request, arg);
    if (*events & VO_EVENT_RESIZE) {
        if (!resize(ctx))
            return VO_ERROR;
    }
    return ret;
}

static void xlib_wakeup(struct ra_ctx *ctx)
{
    vo_x11_wakeup(ctx->vo);
}

static void xlib_wait_events(struct ra_ctx *ctx, int64_t until_time_ns)
{
    vo_x11_wait_events(ctx->vo, until_time_ns);
}
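
// The Vulkan-on-Xlib context, selectable with --gpu-context=x11vk.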
const struct ra_ctx_fns ra_ctx_vulkan_xlib = {
    .type = "vulkan",
    .name = "x11vk",
    .reconfig = xlib_reconfig,
    .control = xlib_control,
    .wakeup = xlib_wakeup,
    .wait_events = xlib_wait_events,
    .init = xlib_init,
    .uninit = xlib_uninit,
};