As we are less and less interested in vpdpau, with nvdec and vaapi
being better choices in general on nvidia and AMD respectively, we
might consider removing direct_mode, where we bypass the vdpau
mixer and work directly with yuv textures. Normally, working with
yuv textures would be great, but vdpau built in an assumption that
all frames are delivered as separate fields, causing us to have
to re-interleave them.
nvidia then introduces a new OpenGL extension that can return the
yuv frames as frames, but we can't just unconditionally switch to
that as we'd want to keep supporting older hardware where the drivers
are no longer getting new features. The end result is that we
wouldn't be able to get rid of the old code paths.
Removing direct_mode means we always use the mixer, and work with
rgba frame textures. There are some theoretical limitations to
this, but in practice they probably don't matter much - unsupported
colourspaces don't matter because without 10bit decoding support,
we can't use them anyway, and apparently we're not doing separate
chroma scaling these days, so scaling the rbga doesn't really lose
anything (and the vdpau hq scaling option remains available).
The use of glXGetCurrentDisplay() restricted this to the GLX backend.
But actually it works under EGL as well. Removing the GLX-specific call
and using the general mpv-internal method to get the X "Display" makes
it work in mpv.
I didn't know this. Nvidia didn't list this as extension in the EGL
context when I still used their GPUs.
Note that this might in theory break use of vdpau in some libmpv clients
using the render API. But only if MPV_RENDER_PARAM_X11_DISPLAY is not
used, and they relied on mpv using glXGetCurrentDisplay(). EGL does not
provide such an API, and hwdec_vaapi.c also uses what hwdec_vdpau.c uses
now. Considering that vaapi is preferable these days, it's not bad at
all if these clients get "broken". They can be easily fixed by passing
the display to mpv correctly.
New releases of VDPAU support decoding 4:4:4 content, and that comes
back as NV24 when using 'direct mode' in OpenGL Interop. That means we
need to be a little bit smarter about how we set up the OpenGL
textures.
The testing_only field is not referenced anymore with vaglx removed and
the previous commit dropping all uses.
The ra_hwdec_driver.api field became unused with the previous commit,
but all hwdec interop drivers still initialized it.
Since this touches highly OS-specific code, build regressions are
possible (plus the previous commit might break hw decoding at runtime).
At least hwdec_cuda.c still used the .api field, other than initializing
it.
This is done in several steps:
1. refactor MPGLContext -> struct ra_ctx
2. move GL-specific stuff in vo_opengl into opengl/context.c
3. generalize context creation to support other APIs, and add --gpu-api
4. rename all of the --opengl- options that are no longer opengl-specific
5. move all of the stuff from opengl/* that isn't GL-specific into gpu/
(note: opengl/gl_utils.h became opengl/utils.h)
6. rename vo_opengl to vo_gpu
7. to handle window screenshots, the short-term approach was to just add
it to ra_swchain_fns. Long term (and for vulkan) this has to be moved to
ra itself (and vo_gpu altered to compensate), but this was a stop-gap
measure to prevent this commit from getting too big
8. move ra->fns->flush to ra_gl_ctx instead
9. some other minor changes that I've probably already forgotten
Note: This is one half of a major refactor, the other half of which is
provided by rossy's following commit. This commit enables support for
all linux platforms, while his version enables support for all non-linux
platforms.
Note 2: vo_opengl_cb.c also re-uses ra_gl_ctx so it benefits from the
--opengl- options like --opengl-early-flush, --opengl-finish etc. Should
be a strict superset of the old functionality.
Disclaimer: Since I have no way of compiling mpv on all platforms, some
of these ports were done blindly. Specifically, the blind ports included
context_mali_fbdev.c and context_rpi.c. Since they're both based on
egl_helpers, the port should have gone smoothly without any major
changes required. But if somebody complains about a compile error on
those platforms (assuming anybody actually uses them), you know where to
complain.
In commit c6fafbffac we accidentally set the logical texture size to
the cropped video size, which is not correct. This caused rendering
artifacts in some cases.
Use the video surfaces size instead. Since the current mp_image_params
contains the cropped size only, wrapper texture creation has to be moved
to the _map function. Move the same code for the mixer case (strictly
speaking this is not needed, but seems more symmetric).
(Also there is no need to clear gl_textures on uninit - leftover from
the old hwdec mapper API. So we just drop that part.)
Fixes#4760.
This does two separate rather intrusive things:
1. Make the hwdec context (which does initialization, provides the
device to the decoder, and other basic state) and frame mapping
(getting textures from a mp_image) separate. This is more
flexible, and you could map multiple images at once. It will
help removing some hwdec special-casing from video.c.
2. Switch all hwdec API use to ra. Of course all code is still
GL specific, but in theory it would be possible to support other
backends. The most important change is that the hwdec interop
returns ra objects, instead of anything GL specific. This removes
the last dependency on GL-specific header files from video.c.
I'm mixing these separate changes because both requires essentially
rewriting all the glue code, so better do them at once. For the same
reason, this change isn't done incrementally.
hwdec_ios.m is untested, since I can't test it. Apart from superficial
mistakes, this also requires dealing with Apple's texture format
fuckups: they force you to use GL_LUMINANCE[_ALPHA] instead of GL_RED
and GL_RG. We also need to report the correct format via ra_tex to
the renderer, which is done by find_la_variant(). It's unknown whether
this works correctly.
hwdec_rpi.c as well as vo_rpi.c are still broken. (I need to pull my
RPI out of a dusty pile of devices and cables, so, later.)
Actually GL-specific parts go into gl_utils.c/h, the shader cache
(gl_sc*) into shader_cache.c/h.
No semantic changes of any kind, except that the VAO helper is made
public again as part of gl_utils.c (all while the goal for gl_utils.c
itself is to be included by GL-specific code).
Use the libavutil vdpau frame allocation code instead of our own "old"
code. This also uses its code for copying a video surface to normal
memory (used by vdpau-copy).
Since vdpau doesn't really have an internal pixel format, 4:2:0 can be
accessed as both nv12 and yuv420p - and libavutil prefers to report
yuv420p. The OpenGL interop has to be adjusted accordingly.
Preemption is a potential problem, but it doesn't break it more than it
already is.
This requires a bug fix to FFmpeg's vdpau code, or vdpau-copy (as well
as taking screenshots) will fail. Libav has fixed this bug ages ago.
Until now, we've always converted vdpau video surfaces to RGB, and then
mapped the resulting RGB texture. Change this so that the surface is
mapped as NV12 plane textures.
The reason this wasn't done until now is because vdpau surfaces are
mapped in an "interlaced" way as separate fields, even for progressive
video. This requires messy reinterleraving. It turns out that even
though it's an extra processing step, the result can be faster than
going through the video mixer for RGB conversion.
Other than some potential speed-gain, doing this has multiple other
advantages. We can apply our own color conversion, which is important in
more complex cases. We can correctly apply debanding and potentially
other processing that requires chroma-specific or in-YUV handling.
If deinterlacing is enabled, this switches back to the old RGB
conversion method. Until we have at least a primitive deinterlacer in
vo_opengl, this will stay this way. The d3d11 and vaapi code paths are
similar. (Of course these don't require any crazy field reinterleaving.)
Rename gl_hwdec_driver.map_image to map_frame, and let it fill out a
struct gl_hwdec_frame describing the exact texture layout. This gives
more flexibility to what the hwdec interop can export. In particular, it
can export strange component orders/permutations and textures with
padded size. (The latter originating from cropped video.)
The way gl_hwdec_frame works is in the spirit of the rest of the
vo_opengl video processing code, which tends to put as much information
in immediate state (as part of the dataflow), instead of declaring it
globally. To some degree this duplicates the texplane and img_tex
structs, but until we somehow unify those, it's better to give the hwdec
state its own struct. The fact that changing the hwdec struct would
require changes and testing on at least 4 platform/GPU combinations
makes duplicating it almost a requirement to avoid pain later.
Make gl_hwdec_driver.reinit set the new image format and remove the
gl_hwdec.converted_imgfmt field.
Likewise, gl_hwdec.gl_texture_target is replaced with
gl_hwdec_plane.gl_target.
Split out a init_image_desc function from init_format. The latter is not
called in the hwdec case at all anymore. Setting up most of struct
texplane is also completely separate in the hwdec and normal cases.
video.c does not check whether the hwdec "mapped" image format is
supported. This should not really happen anyway, and if it does, the
hwdec interop backend must fail at creation time, so this is not an
issue.
The main change is with video/hwdec.h. mp_hwdec_info is made opaque (and
renamed to mp_hwdec_devices). Its accessors are mainly thread-safe (or
documented where not), which makes the whole thing saner and cleaner. In
particular, thread-safety rules become less subtle and more obvious.
The new internal API makes it easier to support multiple OpenGL interop
backends. (Although this is not done yet, and it's not clear whether it
ever will.)
This also removes all the API-specific fields from mp_hwdec_ctx and
replaces them with a "ctx" field. For d3d in particular, we drop the
mp_d3d_ctx struct completely, and pass the interfaces directly.
Remove the emulation checks from vaapi.c and vdpau.c; they are
pointless, and the checks that matter are done on the VO layer.
The d3d hardware decoders might slightly change behavior: dxva2-copy
will not use the VO device anymore if the VO supports proper interop.
This pretty much assumes that any in such cases the VO will not use any
form of exclusive mode, which makes using the VO device in copy mode
unnecessary.
This is a big refactor. Some things may be untested and could be broken.
The vdpau_mixer could fail to be recreated properly if preemption
occured at some point before playback initialization (like when using
--hwdec-preload and the opengl-cb API).
Normally, the vdpau_mixer was supposed to be marked invalid when the
components using it detect a preemption, e.g. in hwdec_vdpau.c. This one
didn't mark the vdpau_mixer as invalid if preemption was detected in
reinit(), only in map_image().
It's cleaner to detect preemption directly in the vdpau_mixer, which
ensures it's always recreated correctly.
Since there can be multiple backends for a single API (vaapi can use GLX
or EGL), not logging the exact backend name is annoying. So add it. At
the same time, there is no need to duplicate the name as used by the
--hwdec options, so replace it with using the numeric hwdec API ID.
Surfaces used by hardware decoding formats can be mapped exactly like a
specific software pixel format, e.g. RGBA or NV12. p->image_params is
supposed to be set to this format, but it wasn't.
(How did this ever work?)
Also, setting params->imgfmt in the hwdec interop drivers is pointless
and redundant. (Change them to asserts, because why not.)