Commit Graph

58 Commits

Author SHA1 Message Date
dudemanguy ea4685b233 wayland: use callback flag + poll for buffer swap
The old way of using wayland in mpv relied on an external renderloop for
semi-accurate timings. This had multiple issues though. Display sync
would break whenever the window was hidden (since the frame callback
stopped being executed) which was really annoying. Also the entire
external renderloop logic was kind of fragile and didn't play well with
mpv's internal structure (i.e. using presentation time in that old
paradigm breaks stats.lua).

Basically the problem is that swap buffers blocks on wayland which is
crap whenever you hide the mpv window since it looks up the entire
player. So you have to make swap buffers not block, but this has a
different problem. Timings will be terrible if you use the unblocked
swap buffers call.

Based on some discussion in #wayland, the trick here is relatively
simple and works well enough for our purposes. Instead we basically
build a way to block with a timeout in the wayland buffer swap
functions.

A bool is set in the frame callback function that indicates whether or
not mpv is waiting for a frame to be displayed. In the actual buffer
swap function, we enter into a while loop waiting for this flag to be
set. At the same time, the wl_display is polled to block the thread and
wakeup if it receives any events from the compositor. This loop only
breaks if enough time has passed or if the frame callback bool is
received.

In the near future, it is better to set whether or not frame a frame has
been displayed in the presentation feedback. However as a first pass,
doing it in the frame callback is more than good enough.

The "downside" is that we render frames that aren't actually shown on
screen when the player is hidden (it seems like wayland people don't
like that). But who cares. Accurate timings are way more important. It's
probably not too hard to add that behavior back in the player though.
2019-10-10 17:41:19 +00:00
Anton Kindestam 6290420380 vo: make swapchain-depth option generic for all VOs
In preparation for making vo_drm able to use swapchain-depth
2019-09-28 14:10:01 +03:00
sfan5 e350ceef4c vo_gpu: vulkan: add Android context 2019-09-27 00:05:06 +03:00
Philip Langdale b539eb222b vo/gpu: vulkan: Pass the device name option through to libplacebo
We collect a 'vulkan-device' option today but then don't actually
pass it on, so it's useless. Once that's fixed, it can be used
to select a specific vulkan device by name.

Tested with the new nvidia offload feature to select between the
nvidia and intel GPUs.
2019-08-24 18:38:27 +02:00
Philip Langdale b70ed35ba4 vo_gpu: hwdec_vaapi: Add Vulkan interop
This change introduces a vulkan interop path for the vaapi hwdec.
The basic principles are mostly the same as for EGL, with the
exported dma_buf being imported by Vukan. The biggest difference
is that we cannot reuse the texture as we do with OpenGL - there's
no way to rebind a VkImage to a different piece of memory, as far
as I can see. So, a new texture is created on each map call.

I did not bother implementing a code path for the old libva API as
I think it's safe to assume any system with a working vulkan driver
will have access to a newer libva.

Note that we are using separate layers for the vaapi surface, just
as is done for EGL. This is because libplacebo doesn't support
multiplane images.

This change does not include format negotiation because no driver
implements the vk_ext_image_drm_format_modifier extension that
would be required to do that. In practice, the two formats we care
about (nv12, p010) work correctly, so we are not blocked. A separate
change had to be made in libplacebo to filter out non-fatal validation
errors related to surface sizes due to the lack of format negotiation.
2019-07-08 01:57:02 +02:00
Philip Langdale b74b39dfb5 vo_gpu: vulkan: Add back context_win for libplacebo
Feature parity with the original ra_vk obviously requires win32 support,
so let's put it back in.
2019-04-21 23:55:22 +03:00
Niklas Haas 7006d6752d vo_gpu: vulkan: use libplacebo instead
This commit rips out the entire mpv vulkan implementation in favor of
exposing lightweight wrappers on top of libplacebo instead, which
provides much of the same except in a more up-to-date and polished form.

This (finally) unifies the code base between mpv and libplacebo, which
is something I've been hoping to do for a long time.

Note: The ra_pl wrappers are abstract enough from the actual libplacebo
device type that we can in theory re-use them for other devices like
d3d11 or even opengl in the future, so I moved them to a separate
directory for the time being. However, the rest of the code is still
vulkan-specific, so I've kept the "vulkan" naming and file paths, rather
than introducing a new `--gpu-api` type. (Which would have been ended up
with significantly more code duplicaiton)

Plus, the code and functionality is similar enough that for most users
this should just be a straight-up drop-in replacement.

Note: This commit excludes some changes; specifically, the updates to
context_win and hwdec_cuda are deferred to separate commits for
authorship reasons.
2019-04-21 23:55:22 +03:00
Niklas Haas f0b6860d62 vo_gpu: index desc namespaces by ra
No reason to require them be constant. This allows them to depend on
runtime characteristics of the `ra`.
2019-04-21 23:55:22 +03:00
Niklas Haas 5bcac8580d spirv: remove --spirv-compiler=nvidia
This option has been deprecated upstream for a long time, probably
doesn't even work anymore, and won't work moving forwards as we replace
the vulkan code by libplacebo wrappers.

I haven't removed the option completely yet since in theory we could
still add support for e.g. a native glslang wrapper in the future. But
most likely the future of this code is deletion.

As an aside, fix an issue where the man page didn't mention d3d11.
2018-12-01 15:50:23 +02:00
Niklas Haas b8bb5329a5 vulkan: slightly improve vsync jitter measurements
By design, some vulkan implementations block until vsync during
vkAcquireNextImageKHR. Since mpv only considers the time that
`swap_buffers` spent blocking as constituting part of the vsync, we can
help it out a bit by pre-emptively calling this function here in order
to improve the accuracy of vsync jitter measurements on vulkan.

(If it fails, we just ignore the error and have the user call it a
second time later - maybe it will work then)

On my system this drops vsync-jitter from ~0.030 to ~0.007, an accuracy
of +/- 100μs. (Which *might* have something to do with the fact that
this is the polling interval for command polling)
2018-11-19 00:23:15 +02:00
Niklas Haas 34df6bd82f vo_gpu: vulkan: only rotate the queues on swap
Makes performance slightly better when using multiple queues by avoiding
unnecessary semaphores due to bad queue selection.

Also remove an aeons-old workaround for an nvidia bug that only ever
existed in the earliest beta vulkan drivers anyway.
2018-11-19 00:22:41 +02:00
Philip Langdale c8a065df12 vo_gpu: vulkan: Always use KHR suffix types and defines
I was inconsistent about this originally, as the functionality was
moved into the core spec in 1.1 and so both suffixed and unsuffixed
versions of everything exist and can be mixed together.

There's no reason to fail to build with 1.0.39+ so I'm fixing the
names.
2018-11-03 23:53:08 +02:00
Philip Langdale 621389134a vo_gpu: vulkan: Add a function to get the device UUID
We need this to do device matching for the cuda interop.
2018-10-22 21:35:48 +02:00
Philip Langdale a782197d21 vo_gpu: vulkan: Add arbitrary user data for an ra_vk_buf
This is arguably a little contrived, but in the case of CUDA interop,
we have to track additional state on the cuda side for each exported
buffer. If we want to be able to manage buffers with an ra_buf_pool,
we need some way to keep that CUDA state associated with each created
buffer. The easiest way to do that is to attach it directly to the
buffers.
2018-10-22 21:35:48 +02:00
Philip Langdale 93f800a00f vo_gpu: vulkan: Add support for exporting buffer memory
The CUDA/Vulkan interop works on the basis of memory being exported
from Vulkan and then imported by CUDA. To enable this, we add a way
to declare a buffer as being intended for export, and then add a
function to do the export.

For now, we support the fd and Handle based exports on Linux and
Windows respectively. There are others, which we can support when
a need arises.

Also note that this is just for exporting buffers, rather than
textures (VkImages). Image import on the CUDA side is supposed to
work, but it is currently buggy and waiting for a new driver release.

Finally, at least with my nvidia hardware and drivers, everything
seems to work even if we don't initialise the buffer with the right
exportability options. Nevertheless I'm enforcing it so that we're
following the spec.
2018-10-22 21:35:48 +02:00
Niklas Haas facc63b862 vo_gpu: vulkan: suppress bogus error message on --vulkan-device
Since the code just broke out of the loop on a match rather than jumping
straight to the end of the function body, it ended up hitting the code
path for when the end of the list was reached.
2018-10-21 23:33:36 +02:00
Niklas Haas 0f3e25cb0a vo_gpu: vulkan: fix the buffer size on partial upload
This was pased on the texture height, which was a mistake. In some cases
it could exceed the actual size of the buffer, leading to a vulkan API
error. This didn't seem to cause any problems in practice, since a
too-large synchronization is just bad for performance and shouldn't do
any harm internally, but either way, it was still undefined behavior to
submit a barrier outside of the buffer size.

Fix the calculation, thus fixing this issue.
2018-10-19 22:58:16 +02:00
wm4 f17246fec1 vo_gpu: remove old window screenshot glue code and GL implementation
There is now a better way. Reading the font framebuffer was always a
hack. The new code via VOCTRL_SCREENSHOT renders it into a FBO, which
does not come with the disadvantages of reading the front buffer (like
not being supported by GLES, possibly black regions due to overlapping
windows on some systems).

For now keep VOCTRL_SCREENSHOT_WIN on the VO level, because there are
still some lesser VOs and backends that use it.
2018-02-13 17:45:29 -08:00
Niklas Haas 0870859e3d vo_gpu: add RA_CAP for gl_NumWorkGroups
SPIRV-Cross doesn't support this for the time being. It's possible this
could go away again at a later date.
2018-02-05 23:11:18 -08:00
Niklas Haas 5997248505 vo_gpu: vulkan: correctly enable textureGatherOffset
This also requires a vulkan feature / SPIR-V capability to function
2018-02-05 02:49:03 -08:00
Niklas Haas f151ac57cb vo_gpu: vulkan: don't issue queries for unused timers
The vulkan validation layers warn you if you try requesting a query
result from a timer that hasn't even been started yet, so we have to do
some extra bit of work to keep track of which indices we've seen so far,
and avoid the queries on them.
2018-02-05 02:49:03 -08:00
Niklas Haas f92e45bb8c vo_gpu: vulkan: try enabling required features
Instead of enabling every feature under the sun, make an effort to just
whitelist the ones we actually might use. Turns out the extended storage
format support is needed for some of the storage formats we use, in
particular rgba16.
2018-02-05 02:49:03 -08:00
Niklas Haas 92778873ad vo_gpu: vulkan: add missing buffer barrier fields
These were accidentally omitted.
2018-02-05 02:49:03 -08:00
Niklas Haas d588bdaaf7 vo_gpu: vulkan: fix segfault due to index mismatch
The queue family index and the queue info index are not necessarily the
same, so we're forced to do a check based on the queue family index
itself.

Fixes #5049
2017-12-25 00:47:53 +01:00
Niklas Haas 12c6700a3c vo_gpu: vulkan: fix some image barrier oddities
A vulkan validation layer update pointed out that this was wrong; we
still need to use the access type corresponding to the stage mask, even
if it means our code won't be able to skip the pipeline barrier (which
would be wrong anyway).

In additiona to this, we're also not allowed to specify any source
access mask when transitioning from top_of_pipe, which doesn't make any
sense anyway.
2017-12-25 00:47:53 +01:00
Niklas Haas 97b1482d53 vo_gpu: vulkan: fix sharing mode on malloc'd buffers
Might explain some of the issues in multi-queue scenarios?
2017-12-25 00:47:53 +01:00
Niklas Haas 5abe0bd593 vo_gpu: vulkan: fix dummyPass creation
This violates vulkan spec
2017-12-25 00:47:53 +01:00
Niklas Haas 58e201e6bd vo_gpu: vulkan: fix the rgb565a1 names -> rgb5a1
This is 5 bits per channel, not 565
2017-12-25 00:47:53 +01:00
Niklas Haas 286d421666 vo_gpu: vulkan: allow disabling async tf/comp
Async compute in particular seems to cause problems on some drivers, and
even when supprted the benefits are not that massive from the tests I
have seen, so it's probably safe to keep off by default.

Async transfer on the other hand seems to work better and offers a more
substantial improvement, so it's kept on.
2017-12-25 00:47:53 +01:00
Niklas Haas a6aab5dfd6 vo_gpu: vulkan: refine queue family selection algorithm
This gets confused by e.g. SPARSE_BIT on the TRANSFER_BIT, leading to
situations where "more specialized" is ambiguous and the logic breaks
down. So to fix it, only compare the subset we care about.
2017-12-25 00:47:53 +01:00
Niklas Haas 2d1769a534 vo_gpu: vulkan: prefer vkCmdCopyImage over vkCmdBlitImage
blit() implies scaling, copy() is the equivalent command to use when the
formats are compatible (same pixel size) and the rects have the same
dimensions.
2017-12-25 00:47:53 +01:00
Niklas Haas a42b8b1142 vo_gpu: attempt re-using the FBO format for p->output_tex
This allows RAs with support for non-opaque FBO formats to use a more
appropriate FBO format for the output tex, possibly enabling a more
efficient blit operation.

This requires distinguishing between real formats (which can be used to
create textures) and fake formats (e.g. ra_gl's FBO hack).
2017-12-25 00:47:53 +01:00
Niklas Haas 80540be211 vo_gpu: vulkan: properly depend on the swapchain acquire semaphore
This is now associated with the ra_tex directly and used in the correct
way, rather than hackily done from submit_frame.
2017-12-25 00:47:53 +01:00
Niklas Haas b138bdc01c vo_gpu: vulkan: use correct access flag for present
This needs VK_ACCESS_MEMORY_READ_BIT (spec)
2017-12-25 00:47:53 +01:00
Niklas Haas 8b0a111c59 vo_gpu: vulkan: make the swapchain more robust
Now handles both VK_ERROR_OUT_OF_DATE_KHR and VK_SUBOPTIMAL_KHR for both
vkAcquireNextImageKHR and vkQueuePresentKHR in the correct way.
2017-12-25 00:47:53 +01:00
Niklas Haas dcda8bd36a vo_gpu: aggressively prefer async compute
On AMD devices, we only get one graphics pipe but several compute pipes
which can (in theory) run independently. As such, we should prefer
compute shaders over fragment shaders in scenarios where we expect them
to be better for parallelism.

This is amusingly trivial to do, and actually improves performance even
in a single-queue scenario.
2017-12-25 00:47:53 +01:00
Niklas Haas bded247fb5 vo_gpu: vulkan: support split command pools
Instead of using a single primary queue, we generate multiple
vk_cmdpools and pick the right one dynamically based on the intent.
This has a number of immediate benefits:

1. We can use async texture uploads
2. We can use the DMA engine for buffer updates
3. We can benefit from async compute on AMD GPUs

Unfortunately, the major downside is that due to the lack of QF
ownership tracking, we need to use CONCURRENT sharing for all resources
(buffers *and* images!). In theory, we could try figuring out a way to
get rid of the concurrent sharing for buffers (which is only needed for
compute shader UBOs), but even so, the concurrent sharing mode doesn't
really seem to have a significant impact over here (nvidia). It's
possible that other platforms may disagree.

Our deadlock-avoidance strategy is stupidly simple: Just flush the
command every time we need to switch queues, and make sure all
submission and callbacks happen in FIFO order. This required lifting the
cmds_pending and cmds_queued out from vk_cmdpool to mpvk_ctx, and some
functions died/got moved as a result, but that's a relatively minor
change.

On my hardware this is a fairly significant performance boost, mainly
due to async transfers. (Nvidia doesn't expose separate compute queues
anyway). On AMD, this should be a performance boost as well due to async
compute.
2017-12-25 00:47:53 +01:00
Niklas Haas 6186cc79e6 vo_gpu: allow invalidating FBO in renderpass_run
This is especially interesting for vulkan since it allows completely
skipping the layout transition as part of the renderpass. Unfortunately,
that also means it needs to be put into renderpass_params, as opposed to
renderpass_run_params (unlike #4777).

Closes #4777.
2017-12-25 00:47:53 +01:00
Niklas Haas fb1c7bde42 vo_gpu: vulkan: properly track image dependencies
This uses the new vk_signal mechanism to order all access to textures.
This has several advantageS:

1. It allows real synchronization of image access across multiple frames
   when using multiple queues for parallelism.

2. It allows using events instead of pipeline barriers, which is a
   finer-grained synchronization primitive that allows for more
   efficient layout transitions over longer durations.

This commit also restructures some of the implicit transition code for
renderpasses to be more flexible and correct. (Note: this technically
drops the ability to transition the image out of undefined layout when
not blending, but that was a bug anyway and needs to be done properly)

vo_gpu: vulkan: remove no-longer-true optimization

The change to the output_tex format makes this no longer true, and it
actually seems to hurt performance now as well. So just don't do it
anymore. I also realized it hurts performance when drawing an OSD, so
it's probably not a good idea anyway.
2017-12-25 00:47:53 +01:00
Niklas Haas f2f91cf570 vo_gpu: vulkan: add a vk_signal abstraction
This combines VkSemaphores and VkEvents into a common umbrella
abstraction which can resolve to either.

We aggressively try to prefer VkEvents over VkSemaphores whenever the
conditions are met (1. we can unsignal the semaphore, i.e. it comes from
the same frame; and 2. it comes from the same queue).
2017-12-25 00:47:53 +01:00
Niklas Haas 5feaaba0fd vo_gpu: vulkan: refactor command submission
Instead of being submitted immediately, commands are appended into an
internal submission queue, and the actual submission is done once per
frame (at the same time as queue cycling). Again, the benefits are not
immediately obvious because nothing benefits from this yet, but it will
make more sense for an upcoming vk_signal mechanism.

This also cleans up the way the ra_vk submission interacts with the
synchronization/callbacks from the ra_vk_ctx. Although currently, the
way the dependency is signalled is a bit hacky: normally it would be
associated with the ra_tex itself and waited on in the appropriate stage
implicitly. But that code is just temporary, so I'm keeping it in there
for a better commit order.
2017-12-25 00:47:53 +01:00
Niklas Haas 885497a445 vo_gpu: vulkan: reorganize vk_cmd slightly
Instead of associating a single VkSemaphore with every command buffer
and allowing the user to ad-hoc wait on it during submission, make the
raw semaphores-to-signal array work like the raw semaphores-to-wait-on
array. Doesn't really provide a clear benefit yet, but it's required for
upcoming modifications.
2017-12-25 00:47:53 +01:00
Niklas Haas 4e34615872 vo_gpu: vulkan: refactor vk_cmdpool
1. No more static arrays (deps / callbacks / queues / cmds)
2. Allows safely recording multiple commands at the same time
3. Uses resources optimally by never over-allocating commands
2017-12-25 00:47:53 +01:00
James Ross-Gowan 9b2dae79b1 vo_gpu: d3d11: add RA caps for ra_d3d11
ra_d3d11 uses the SPIR-V compiler to translate GLSL to SPIR-V, which is
then translated to HLSL. This means it always exposes the same GLSL
version that the SPIR-V compiler supports (4.50 for shaderc/glslang.)

Despite claiming to support GLSL 4.50, some features that are tied to
the GLSL version in OpenGL are not supported by ra_d3d11 when targeting
legacy Direct3D feature levels.

This includes two features that mpv relies on:
- Reading from gl_FragCoord in the fragment shader (requires FL 10_0)
- textureGather from any texture component (requires FL 11_0)

These features have been exposed as new RA caps.
2017-11-07 20:27:13 +11:00
James Ross-Gowan 8020a62953 vo_gpu: export the GLSL format qualifier for ra_format
Backported from @haasn's change to libplacebo, except in the current RA,
there's nothing to indicate an ra_format can be bound as a storage
image, so there's no way to force all of these formats to have a
glsl_format. Instead, the layout qualifier will be removed if
glsl_format is NULL.

This is needed for the upcoming ra_d3d11 backend. In Direct3D 11, while
loading float values from unorm images often works as expected, it's
technically undefined behaviour, and in Windows 10, it will cause the
debug layer to spam the log with error messages. Also, apparently in
GLSL, the format name must match the image's format exactly (but in
Direct3D, it just has to have the same component type.)
2017-11-07 20:27:13 +11:00
James Ross-Gowan 41dff03f8d vo_gpu: add namespace query mechanism
Backported from @haasn's change to libplacebo. More flexible than the
previous "shared || non-shared" distinction. The extra flexibility is
needed for Direct3D 11, but it also doesn't hurt code-wise.
2017-11-07 20:27:13 +11:00
wm4 7cfae5adce vo_gpu: semi-fix --gpu-context/--gpu-api options and help output
This was confusing at best. Change it to output the actual choices.
(Seems like in the end it's always me who has to clean up other people's
bullshit.)

Context names were not unique - but they should be, so fix it. The whole
point of the original --opengl-backend option was to side-step the
tricky auto-detection, so you know exactly what you get. The goal of
this commit is to make --gpu-context work the same way. Fix the
non-unique names by appending "vk" to the names.

Keep in mind that this was not suitable for slecting the "UI" backend
anyway, since "x11" would force GLX, whereas people on not-NVIDIA
actually want "x11egl". Users trying to use --gpu-context=x11 to force
the X11 backend would always end up with GLX, which would at least break
VAAPI hardware decoding for them. Basically the idea that this option
could select the "UI" type is completely broken - it selects an
implementation, which implies a UI. Selecting the UI type This would
require a separate mechanism. (Although in theory this separate
mechanism could be part of the --gpu-context option - in any case,
someone would have to implement it.)

To achieve help output that can actually be understood, just duplicate
the code. Most of that code is duplicated anyway, and trying to share
just the list code with the result of making the output unreadable
doesn't make too much sense. If we wanted to save code/effort, we could
just remove the help output altogether.

--gpu-api has non-unique entries, and it would be nice to group them
(e.g. list all OpenGL capable contexts with "opengl"), but C makes this
simple idea too much of a pain, so don't do it.

Also remove a stray tab from the android entry on the manpage.
2017-10-16 10:57:51 +02:00
Rostislav Pehlivanov 4c7c8daf9c wayland_common: implement output tracking, cleanups and bugfixes
This commit:
    - Implements output tracking (e.g. monitor plug/unplug)
    - Creates the surface during registry (no other dependencies)
    - Queues the callback immediately after surface creation
    - Cleaner and better event handling (functions return directly)
    - Better reconfigure handling (resizes reduced to 1 during init)
    - Don't unnecessarily resize  (if dimensions match)

Apart from that fixes 2 potential memory leaks (mime type and window
title), 2 string ownership issues (output name and make need to be
dup'd), fixes some style issues (switches were indented) and finally
adds messages when disabling/enabling idle inhibition.

The callback setter function was removed in preparation for the commit
which will use the frame event cb because it was unnecessary.
2017-10-09 02:23:04 +01:00
Rostislav Pehlivanov 68f9ee7e0b wayland_common: rewrite from scratch
The wayland code was written more than 4 years ago when wayland wasn't
even at version 1.0. This commit rewrites everything in a more modern way,
switches to using the new xdg v6 shell interface which solves a lot of bugs
and makes mpv tiling-friedly, adds support for drag and drop, adds support
for touchscreens, adds support for KDE's server decorations protocol,
and finally adds support for the new idle-inhibitor protocol.

It does not yet use the frame callback as a main rendering loop driver,
this will happen with a later commit.
2017-10-03 19:36:02 +01:00
Niklas Haas f6fd2a05c4 vo_gpu: vulkan: reword comment
This is fixed upstream (and we now know it's a driver bug) so reword the
comment.
2017-09-29 00:48:39 +02:00