The multi-device part is mostly theoretical, but it turns out that
Plasma does send multiple tranches for the same device, so we have to
take that into account. And that means taking multiple devices into
account as well. In theory, a compositor should give us tranches for
any device that will work, so as long as we find a match it should be
OK. This is still technically incomplete, since mpv's core wouldn't
react to a device being hotplugged.
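Roughly, the matching amounts to something like this (a minimal sketch;
the names are hypothetical, not the actual mpv code):

    #include <stdbool.h>
    #include <sys/types.h>

    /* For each feedback tranche, check whether its target device matches
     * any device we know how to render on; the first match is enough. */
    static bool tranche_usable(const dev_t *known_devices, int num_devices,
                               dev_t tranche_device)
    {
        for (int i = 0; i < num_devices; i++) {
            if (known_devices[i] == tranche_device)
                return true;
        }
        return false;
    }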
This is probably a fringe case, but I encountered it on my machine. The
hwupload filter happily does conversions entirely on the GPU side when
possible. Makes sense. However, it turns out that there is a difference
between e.g. yuv420p -> vaapi[bgr0] and vaapi[yuv420p] -> vaapi[bgr0].
The latter worked without any issues on my hardware, but the former
fails. To remedy this, we need to also do real hardware uploads during
probing and track which formats actually work (i.e. can be exported
with vaExportSurfaceHandle). This commit only implements this in vaapi,
since it is the only known place where this edge case is relevant, but
other hardware decoding drivers could easily add support later if needed.
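The probing then looks roughly like this (a sketch only; it assumes an
AVHWFramesContext already exists for the candidate format, and fd
cleanup and most error handling are omitted):

    #include <stdbool.h>
    #include <stdint.h>
    #include <libavutil/frame.h>
    #include <libavutil/hwcontext.h>
    #include <va/va.h>
    #include <va/va_drmcommon.h>

    /* Do a real upload of a dummy sw frame, then check that the resulting
     * surface can actually be exported as a DRM PRIME descriptor. */
    static bool probe_upload(VADisplay display, AVBufferRef *hw_frames_ctx,
                             const AVFrame *sw_frame)
    {
        AVFrame *hw_frame = av_frame_alloc();
        bool ok = false;
        if (!hw_frame)
            return false;
        if (av_hwframe_get_buffer(hw_frames_ctx, hw_frame, 0) >= 0 &&
            av_hwframe_transfer_data(hw_frame, sw_frame, 0) >= 0)
        {
            VASurfaceID surface = (uintptr_t)hw_frame->data[3];
            VADRMPRIMESurfaceDescriptor desc;
            ok = vaExportSurfaceHandle(display, surface,
                        VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME_2,
                        VA_EXPORT_SURFACE_READ_ONLY, &desc) == VA_STATUS_SUCCESS;
            /* (a real implementation must close the fds in desc) */
        }
        av_frame_free(&hw_frame);
        return ok;
    }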
The linux-dmabuf protocol gives us the main device being used. Using
that along with some drm code, we can query which drm formats the GPU
supports. This is pretty crude, as we just check against the first
primary plane we find and hope for the best. Also, the protocol can
technically update the formats at any time (e.g. if the device changes).
We do update what gets stored in wl->gpu_formats, but the next commit
won't make any attempt to address this hypothetical scenario; it's
complicated, and this is better than nothing. In practice, this will
only work against one constant device during VO runtime. Additionally,
we can add this check to ra_compatible_format to filter out additional
unsupported formats.
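The query itself boils down to something like this (simplified sketch;
the real code also checks the plane's "type" property to make sure it
is actually a primary plane):

    #include <stdint.h>
    #include <stdio.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    static void dump_plane_formats(int drm_fd)
    {
        drmSetClientCap(drm_fd, DRM_CLIENT_CAP_UNIVERSAL_PLANES, 1);
        drmModePlaneRes *res = drmModeGetPlaneResources(drm_fd);
        if (!res)
            return;
        for (uint32_t i = 0; i < res->count_planes; i++) {
            drmModePlane *plane = drmModeGetPlane(drm_fd, res->planes[i]);
            if (!plane)
                continue;
            /* take the first plane we find and treat its format list as
             * "what the GPU supports" */
            for (uint32_t j = 0; j < plane->count_formats; j++)
                printf("drm format: %.4s\n", /* fourcc, little-endian */
                       (const char *)&plane->formats[j]);
            drmModeFreePlane(plane);
            break;
        }
        drmModeFreePlaneResources(res);
    }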
As the first step towards handling scenarios where there are multiple
hwdecs for a given image format but backed by different AVHWDeviceTypes,
let us annotate the hwdecs with their corresponding device types.
From this, we can also see that all the existing hwdecs which match the
same image format also match the same device type.
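Conceptually, the annotation is just one extra field on the hwdec
description (struct and field names here are illustrative, not mpv's
actual declarations):

    #include <libavutil/hwcontext.h>

    struct hwdec_desc {
        const char *name;                /* e.g. "vaapi" */
        int imgfmt;                      /* image format this hwdec handles */
        enum AVHWDeviceType device_type; /* e.g. AV_HWDEVICE_TYPE_VAAPI */
    };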
`cuInit` wakes up the nvidia dGPU on nvidia laptops. This is bad news,
because the wake-up process is blocking and takes a few seconds. It also
needlessly increases power consumption.
Sometimes, a VO loads several hwdecs (as `dmabuf_wayland` does). When
`cuda` is loaded, it calls `cuInit` before running all the interop inits.
However, the first checks in the interops do not require CUDA
initialization, so we only need to call `cuInit` after those checks.
This commit splits the interop `init` function into `check` and `init`.
`check` can be called without initializing the CUDA backend, so `cuInit`
is only called *after* the first interop check.
With these changes, there is no CUDA initialization if no OpenGL/Vulkan
backend is available. This prevents `dmabuf_wayland` and other VOs which
automatically load cuda from waking up the nvidia dGPU unnecessarily,
making them start faster and decreasing power consumption on laptops.
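In sketch form (names are illustrative, and mpv's real error handling is
omitted):

    #include <stdbool.h>
    #include <cuda.h>

    struct interop {
        bool (*check)(void *ctx); /* cheap, must not touch CUDA */
        int (*init)(void *ctx);   /* may use CUDA */
    };

    static int load_interops(const struct interop *const *interops, void *ctx)
    {
        bool cuda_initialized = false;
        for (int i = 0; interops[i]; i++) {
            /* e.g. "is an OpenGL/Vulkan context even available?" */
            if (!interops[i]->check(ctx))
                continue;
            if (!cuda_initialized) {
                if (cuInit(0) != CUDA_SUCCESS) /* wakes the dGPU: do it lazily */
                    return -1;
                cuda_initialized = true;
            }
            if (interops[i]->init(ctx) == 0)
                return 0;
        }
        return -1; /* no usable backend: cuInit was never called */
    }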
Fixes: https://github.com/mpv-player/mpv/issues/13668
As it turns out, OES_EGL_image is only defined for OpenGL ES.
OpenGL drivers implement this extension anyway, because it used to be
the only way of importing EGLImages into GL.
An equivalent extension for OpenGL was defined with EXT_EGL_image_storage.
The only difference is the interaction with immutability, which requires
textures to be recreated for every import, since they can be bound only
once.
Note: this commit can in theory cause certain systems to lose vaapi /
drmprime support. Since EXT_EGL_image_storage is 7 years old, this
hopefully doesn't happen. If it does, the init checks can be relaxed to
still permit OES_EGL_image.
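The two import paths differ roughly like this (minimal sketch;
USE_DESKTOP_GL is a made-up macro standing in for the real runtime
extension check, and the extension functions are assumed to have been
loaded via eglGetProcAddress):

    #include <EGL/egl.h>
    #include <EGL/eglext.h>
    #include <GLES2/gl2.h>
    #include <GLES2/gl2ext.h>

    static GLuint import_egl_image(EGLImageKHR image)
    {
        GLuint tex;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
    #ifdef USE_DESKTOP_GL
        /* EXT_EGL_image_storage: the texture becomes immutable, so this
         * must be a freshly created texture every time */
        glEGLImageTargetTexStorageEXT(GL_TEXTURE_2D, (GLeglImageOES)image, NULL);
    #else
        /* OES_EGL_image (GLES): mutable binding, can be redone */
        glEGLImageTargetTexture2DOES(GL_TEXTURE_2D, (GLeglImageOES)image);
    #endif
        return tex;
    }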
As a result of the work I did to explicitly check for formats
supported by the VO in the f_autoconvert logic, I introduced a
regression in the handling of the rpi4_8 and rpi4_10 formats. These
require special handling because they only exist in the rpi forks and
not in upstream ffmpeg, which means they don't have stable pix fmt
values that we can use directly.
Previously, we simply didn't declare them as supported, which was OK,
as nothing was really enforcing the list of supported formats, but that
has changed.
As we still can't simply use the pix fmts, I had to do a slightly
ridiculous dance to look them up by name and, if they exist, register
them as supported.
Good times.
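For the record, the dance looks something like this (sketch;
mark_supported is a hypothetical stand-in for the real registration):

    #include <libavutil/pixdesc.h>

    static void register_rpi_formats(void)
    {
        static const char *const names[] = {"rpi4_8", "rpi4_10"};
        for (int i = 0; i < 2; i++) {
            /* look the format up by name, since its numeric value is
             * only stable within the rpi ffmpeg fork */
            enum AVPixelFormat fmt = av_get_pix_fmt(names[i]);
            if (fmt != AV_PIX_FMT_NONE)
                mark_supported(fmt); /* hypothetical helper */
        }
    }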
ffmpeg is again setting the frame dimensions to the coded size, so we
need to reintroduce our work-around to get the logical frame dimensions
from the frames_ctx rather than the frame itself.
ffmpeg commit:
* 9ee4f47c94
This reverts commit c40bd88872.
Make it impossible to build mpv without the latest libplacebo.
This will allow for less code duplication between mpv and libplacebo,
and in the future it will also let us delete legacy ifdefs and track
libplacebo development more closely.
NV16 is the half-subsampled (4:2:2) variant of the NV12 format. Decoders
which support the H.264 High 4:2:2 profile provide frames in NV16 format
to preserve the richer chroma sampling. Similar profiles are also
available in HEVC and other popular codecs. This commit allows NV16
frames to be displayed over drmprime layers.
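The mapping involved is essentially one more table entry; in sketch form
(the function and its placement are illustrative, not the actual mpv
code):

    #include <stdint.h>
    #include <drm_fourcc.h>
    #include <libavutil/pixfmt.h>

    /* NV16 is NV12 with chroma subsampled only horizontally (4:2:2),
     * so it maps to its own DRM fourcc. */
    static uint32_t av_to_drm_format(enum AVPixelFormat fmt)
    {
        switch (fmt) {
        case AV_PIX_FMT_NV12: return DRM_FORMAT_NV12;
        case AV_PIX_FMT_NV16: return DRM_FORMAT_NV16;
        default:              return 0;
        }
    }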
Signed-off-by: hbiyik <boogiepop@gmx.com>
The last piece in the puzzle for doing hardware conversions
automatically is ensuring we only consider valid target formats for the
conversion. Although it is unintuitive, some vaapi drivers can expose a
different set of formats for uploads vs for conversions, and that is
the case on the Intel hardware I have here.
Before this change, we would use the upload target list, and our
selection algorithm would pick a format that doesn't work for
conversions, causing everything to fail. Whoops.
Successfully obtaining the conversion target format list is a bit of a
convoluted process, with only parts of it encapsulated by ffmpeg.
Specifically, ffmpeg understands the concept of hardware configurations
that can affect the constraints of a device, but does not define what
configurations are - that is left up to the specific hwdevice type.
In the case of vaapi, we need to create a config for the video
processing endpoint, and use that when querying for constraints.
I decided to encapsulate creation of the config as part of the hwdec
init process, so that the constraint query can be done in the
hwtransfer code in an opaque way. I don't know if any other hardware
will need this capability, but if so, we'll be able to account for it.
Then, when we look at probing, instead of checking for what formats
are supported for transfers, we use the results of the constraint query
with the conversion config. And as that config doesn't depend on the
source format, we only need to do it once.
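Concretely, the query looks something like this (sketch; it assumes
device_ref is the AVBufferRef for the vaapi device and config_id came
from vaCreateConfig() with VAEntrypointVideoProc):

    #include <libavutil/hwcontext.h>
    #include <libavutil/hwcontext_vaapi.h>
    #include <libavutil/mem.h>

    static void query_conversion_targets(AVBufferRef *device_ref,
                                         VAConfigID config_id)
    {
        AVVAAPIHWConfig *hwconfig = av_hwdevice_hwconfig_alloc(device_ref);
        if (!hwconfig)
            return;
        hwconfig->config_id = config_id; /* video processing endpoint */
        AVHWFramesConstraints *constraints =
            av_hwdevice_get_hwframe_constraints(device_ref, hwconfig);
        if (constraints) {
            /* constraints->valid_sw_formats is the conversion target
             * list, terminated by AV_PIX_FMT_NONE; it does not depend on
             * the source format, so one query suffices. */
            av_hwframe_constraints_free(&constraints);
        }
        av_free(hwconfig);
    }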
Historically, we have not attempted to support hw->hw format conversion
in the autoconvert logic. If a user needed to do these kinds of
conversions, they had to insert the hw format's conversion filter
manually (e.g. scale_vaapi).
This was usually fine, because the general rule is that any format
supported by the hardware can be used as well as any other; i.e. you
would only need to do a conversion if you have a specific goal in mind.
However, we now have two situations where we can find ourselves with a
hardware format being produced by a decoder that cannot be accepted by
a VO via hwdec-interop:
* dmabuf-wayland can only accept formats that the Wayland compositor
accepts. In the case of GNOME, it can only accept a handful of RGB
formats.
* When decoding via VAAPI on Intel hardware, some of the more unusual
video encodings (4:2:2, 10bit 4:4:4) lead to packed frame formats
which gpu-next cannot handle, causing rendering to fail.
In both these cases (at least when using VAAPI with dmabuf-wayland), if
we could detect the failure case and insert a `scale_vaapi` filter, we
could get successful output automatically. For `hwdec=drm`, there is
currently no conversion filter, so dmabuf-wayland is still out of luck
there.
The basic approach to implementing this is to detect the case where we
are evaluating a hardware format where the VO can accept the hardware
format itself, but may not accept the underlying sw format. In the
current code, we bypass autoconvert as soon as we see the hardware
format is compatible.
My first observation was that we actually have logic in autoconvert to
detect when the input sw format is not in the list of allowed sw
formats passed into the autoconverter. Unfortunately, we never populate
this list, and the way you would expect to do that would be for
vo-query-format to return sw format information, which it does not. We
could define an
extended vo-query-format-2, but we'd still need to implement the
probing logic to fill it in.
On the other hand, we already have the probing logic in the hwupload
filter - and most recently I used that logic to implement conversion
on upload. So if we could leverage that, we could both detect when
hw->hw conversion is required, and pick the best target format.
This exercise is then primarily one of detecting when we are in this
case and letting that code run in a useful way. The hwupload filter is
a bit awkward to work with today, and so I refactored a bunch of the
setup code to actually make it more encapsulated. Now, instead of the
caller instantiating it and then triggering the probe, we probe on
creation and instantiate the correct underlying filter (hwupload vs
conversion) automatically.
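The detection itself is conceptually simple; in pure pseudocode form
(every name here is hypothetical, the real logic lives in the
autoconvert/hwupload filters):

    /* Only bypass conversion when both the hw wrapper format and the
     * underlying sw format are acceptable to the VO. */
    if (vo_accepts_imgfmt(vo, frame->imgfmt)) {
        if (vo_accepts_sw_format(vo, frame->hw_subfmt))
            return frame;                         /* pass through untouched */
        return hw_convert(frame, best_target_format(vo)); /* e.g. scale_vaapi */
    }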
This is not technically necessary, because we never touch the fd again
after passing it to CUDA, but leaving it around could lead to future
code accidentally using it.
All hwdecs should respect the probing flag and demote their logging to
verbose level, so that initialisation failures during probing do not
spam the user. I forgot to do this for the Vulkan hwdec.
The libplacebo sync abstraction is deprecated, and we should be using
the more explicit Vulkan semaphore helpers instead. This means that
more of the bookkeeping moves to our side, but it's not too bad.
There are two main things going on here:
1. After a lot of back and forth, I decided to write the new code with
timeline semaphores to streamline things, and that also means all
the variables are separate - which makes the #ifdefs easier to read.
Which is important because:
2. While pl_sync owned the exported fd/handle, pl_vulkan_sem does not,
so we are responsible for managing them. That means reversing the
previous logic - we can now pass an original fd to CUDA and never
think about it again, while we have to clean up a Win32 handle
because CUDA will not take ownership of it.
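On the CUDA side, the ownership difference looks like this (sketch;
error handling omitted):

    #include <cuda.h>

    static CUexternalSemaphore import_sem(int fd, void *win32_handle)
    {
        CUDA_EXTERNAL_SEMAPHORE_HANDLE_DESC desc = {0};
        CUexternalSemaphore sem = NULL;
    #ifdef _WIN32
        desc.type = CU_EXTERNAL_SEMAPHORE_HANDLE_TYPE_TIMELINE_SEMAPHORE_WIN32;
        desc.handle.win32.handle = win32_handle;
        cuImportExternalSemaphore(&sem, &desc);
        /* CUDA only adds a reference: we still own win32_handle and must
         * CloseHandle() it ourselves. */
    #else
        desc.type = CU_EXTERNAL_SEMAPHORE_HANDLE_TYPE_TIMELINE_SEMAPHORE_FD;
        desc.handle.fd = fd;
        cuImportExternalSemaphore(&sem, &desc);
        /* CUDA takes ownership of the fd on success: pass it and forget it. */
    #endif
        return sem;
    }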
ffmpeg was previously allocating images for frames at the coded size,
rather than the presentation one (1088 vs 1080 in the most common
example). Using the coded size when wrapping images for libplacebo
resulted in incorrect scaling from 1088 -> 1080, but even using the
presentation size wasn't perfect, as discussed in the original
commit.
However, ffmpeg has now been updated to use the presentation size for
the frame images, after discussions that concluded this must be done
because there is no way for a frame consumer to fix the dimensions
without copying the frame.
With that ffmpeg change, we can just use the normal layout information
like all the other hwdecs.
Vulkan Video Decoding has finally become a reality, as it's now
showing up in shipping drivers, and the ffmpeg support has been
merged.
With that in mind, this change introduces HW interop support for
ffmpeg Vulkan frames. The implementation is functionally complete - it
can display frames produced by hardware decoding, and it can work with
ffmpeg vulkan filters. There are still various caveats due to gaps and
bugs in drivers, so YMMV, as always.
Primary testing has been done on Intel, AMD, and nvidia hardware on
Linux with basic Windows testing on nvidia.
Notable caveats:
* Due to driver bugs, video decoding on nvidia does not work right now,
unless you use the Vulkan Beta driver. It can be worked around, but
requires ffmpeg changes that are not considered acceptable to merge.
* Even if those work-arounds are applied, Vulkan filters will not work
on video that was decoded by Vulkan, due to additional bugs in the
nvidia drivers. The filters do work correctly on content decoded some
other way and then uploaded to Vulkan (e.g. decode with nvdec, upload
with --vf=format=vulkan).
* Vulkan filters can only be used with drivers that support
VK_EXT_descriptor_buffer which doesn't include Intel ANV as yet.
There is an MR outstanding for this.
* When dealing with 1080p content, there may be some visual distortion
in the bottom lines of frames due to chroma scaling incorporating the
extra hidden lines at the bottom of the frame (1080p content is
actually stored as 1088 lines), depending on the hardware/driver
combination and the scaling algorithm. This cannot be easily
addressed as the mechanical fix for it violates the Vulkan spec, and
probably requires a spec change to resolve properly.
All of these caveats will be fixed in either drivers or ffmpeg, and so
will not require mpv changes (unless something unexpected happens).
If you want to run on nvidia with the non-beta drivers, you can use this
ffmpeg tree with the work-around patches:
* https://github.com/philipl/FFmpeg/tree/vulkan-nvidia-workarounds
We will need the full ra_ctx to be able to look up all the state
required to initialise an ffmpeg vulkan hwcontext, so let's pass the
ra_ctx instead of just the ra.
HEVC hardware decoding with drm wasn't working on the RPi 4. mpv would
report that the image format (rpi4_8 for 8-bit and rpi4_10 for 10-bit)
wasn't supported. The change to hwdec_drmprime.c identifies these two
formats as NV12, because they function exactly the same way. The change
to dmabuf_interop_gl.c adds support for P030, which rpi4_10 uses. These
changes were tested on a Pi 4 with this fork of ffmpeg:
https://github.com/jc-kynesim/rpi-ffmpeg.
Signed-off-by: EmperorPenguin18 <60635017+EmperorPenguin18@users.noreply.github.com>