This is somewhat imperfect, because detection of hw decoding APIs is
mostly done on demand, and often avoided if not necessary. (For example,
we know very well that there are no hw decoders for certain codecs.)
This also requires every hwdec backend to identify itself (see hwdec.h
changes).
Instead of converting the hw surface to an image in the VO, provide a
generic way to convet hw surfaces, and use this in the screenshot code.
It's all relatively straightforward, except vdpau is being terrible. It
needs a huge chunk of new code, because copying back is not simple.
Before this commit, each hw backend had their own specific struct types
for context, and some, like VDA, had none at all. Add a context struct
(mp_hwdec_ctx) that provides a somewhat generic way to pass the hwdec
context around. Some things get slightly better, some slightly more
verbose.
mp_hwdec_info is still around; it's still needed, but is reduced to its
role of handling delayed loading of the hwdec backend.
Same as with the previous commits.
In theory, vdpau/x11 GL interop doesn't assume GLX. It could use EGL as
well. But since it's always GLX in practice, so we're fine with this.
Remove the gl_hwdec.mpgl field - it's unused now.
It seems the vdpau API does not support these.
Do a semi-expensive emulation of it. On the other hand, it's not like
this is a commonly-used feature. (It might be better to make --vf=flip
always copy instead of flipping it via pointer tricks - but everything
allows flipped images, and even decoders or libavfilter filters could
output them.)
Until now, failure to allocate image data resulted in a crash (i.e.
abort() was called). This was intentional, because it's pretty silly to
degrade playback, and in almost all situations, the OOM will probably
kill you anyway. (And then there's the standard Linux overcommit
behavior, which also will kill you at some point.)
But I changed my opinion, so here we go. This change does not affect
_all_ memory allocations, just image data. Now in most failure cases,
the output will just be skipped. For video filters, this coincidentally
means that failure is treated as EOF (because the playback core assumes
EOF if nothing comes out of the video filter chain). In other
situations, output might be in some way degraded, like skipping frames,
not scaling OSD, and such.
Functions whose return values changed semantics:
mp_image_alloc
mp_image_new_copy
mp_image_new_ref
mp_image_make_writeable
mp_image_setrefp
mp_image_to_av_frame_and_unref
mp_image_from_av_frame
mp_image_new_external_ref
mp_image_new_custom_ref
mp_image_pool_make_writeable
mp_image_pool_get
mp_image_pool_new_copy
mp_vdpau_mixed_frame_create
vf_alloc_out_image
vf_make_out_image_writeable
glGetWindowScreenshot
mpv supports two hardware decoding APIs on Linux: vdpau and vaapi. Each
of these has emulation wrappers. The wrappers are usually slower and
have fewer features than their native opposites. In particular the libva
vdpau driver is practically unmaintained.
Check the vendor string and print a warning if emulation is detected.
Checking vendor strings is a very stupid thing to do, but I find the
thought of people using an emulated API for no reason worse.
Also, make --hwdec=auto never use an API that is detected as emulated.
This doesn't work quite right yet, because once one API is loaded,
vo_opengl doesn't unload it, so no hardware decoding will be used if the
first probed API (usually vdpau) is rejected. But good enough.
Integrate it with the existing surface allocator in vdpau.c. The changes
are a bit violent, because the vdpau API is so non-orthogonal: compared
to video surfaces, output surfaces use a different ID type, different
format types, and different API functions.
Also, introduce IMGFMT_VDPAU_OUTPUT for VdpOutputSurfaces wrapped in
mp_image, rather than hacking it. This is a bit cleaner.
Preparation so that various things related to video can run in different
threads. One part to this is making the video surface pool safe.
Another issue is the preemption mechanism, which continues to give us
endless pain. In theory, it's probably impossible to handle preemption
100% correctly and race-condition free, unless _every_ API user in the
same process uses a central, shared mutex to protect every vdpau API
call. Otherwise, it could happen that one thread recovering from
preemption allocates a vdpau object, and then another thread (which
hasn't recovered yet) happens to free the object for some reason. This
is because objects are referenced by integer IDs, and vdpau will reuse
IDs invalidated by preemption after preemption.
Since this is unreasonable, we're as lazy as possible when it comes to
handling preemption. We don't do any locking around the mp_vdpau_ctx
fields that are normally immutable, and only can change when recovering
from preemption. In practice, this will work, because it doesn't matter
whether not-yet-recovered components use the old or new vdpau function
pointers or device ID. Code calls mp_vdpau_handle_preemption() anyway to
check for the preemption event and possibly to recover, and that
function acquires the lock protecting the preemption state.
Another possible source of potential grandiose fuckup is the fact that
the vdpau library is in fact only a tiny wrapper, and the real driver
lives in a shared object dlopen()ed by the wrapper. The wrapper also
calls dlclose() on the loaded shared object in some situations. One
possible danger is that failing to recreate a vdpau device could trigger
a dlclose() call, and that glibc might unload it. Currently, glibc
implements full unloading of shared objects on the last dlclose() call,
and if that happens, calls to function pointers pointing into the shared
object would obviously crash. Fortunately, it seems the existing vdpau
wrapper won't trigger this case and never unloads the driver once it's
successfully loaded.
To make it short, vdpau preemption opens up endless depths of WTFs.
Another issue is that any participating thread might do the preemption
recovery (whichever comes first). This is easier to implement. The
implication is that we need threadsafe xlib. We just hope and pray that
this will actually work. This also means that once vdpau code is
actually involved in a multithreaded scenario, we have to add
XInitThreads() to the X11 code.
Use the newly provided mp_vdpau_handle_preemption() function, instead of
accessing mp_vdpau_ctx fields directly. Will probably make multithreaded
access to the vdpau context easier.
Mostly unrelated to the actual changes, I've noticed that using hw
decoding with vo_opengl sometimes leads to segfaults inside of nvidia's
libGL when doing the following:
1. use hw decoding + vo_opengl
2. switch to console (will preempt on nvidia systems)
3. switch back to X (mpv will recover, switches to sw decoding)
4. enable hw decoding again
5. exit mpv
Then it segfaults when mpv finally calls exit(). I'll just blame nvidia,
although it seems likely that something in the gl_hwdec_vdpau.c
preemption handling triggers corner cases in nvidia's code.
This was broken for some time, and it didn't recover correctly.
Redo decoder display preemption. Instead of trying to reinitialize the
hw decoder, simply fallback to software decoding. I consider display
preemption a bug in the vdpau API, so being able to _somehow_ recover
playback is good enough.
The approach taking here will probably also make it easier to handle
multithreading.
This was a minor code duplication between vf_vdpaupp.c and vo_vdpau.c.
(In theory, we could always require using vf_vdpaupp with vo_vdpau, but
I think it's better if vo_vdpau can work standalone.)
They were used by ancient libavcodec versions. This also removes the
need to distinguish vdpau image formats at all (since there is only
one), and some code can be simplified.
Move the decoder parts from vo_vdpau.c to a new file vdpau_old.c. This
file is named so because because it's written against the "old"
libavcodec vdpau pseudo-decoder (e.g. "h264_vdpau").
Add support for the "new" libavcodec vdpau support. This was recently
added and replaces the "old" vdpau parts. (In fact, Libav is about to
deprecate and remove the "old" API without deprecation grace period,
so we have to support it now. Moreover, there will probably be no Libav
release which supports both, so the transition is even less smooth than
we could hope, and we have to support both the old and new API.)
Whether the old or new API is used is checked by a configure test: if
the new API is found, it is used, otherwise the old API is assumed.
Some details might be handled differently. Especially display preemption
is a bit problematic with the "new" libavcodec vdpau support: it wants
to keep a pointer to a specific vdpau API function (which can be driver
specific, because preemption might switch drivers). Also, surface IDs
are now directly stored in AVFrames (and mp_images), so they can't be
forced to VDP_INVALID_HANDLE on preemption. (This changes even with
older libavcodec versions, because mp_image always uses the newer
representation to make vo_vdpau.c simpler.)
Decoder initialization in the new code tries to deal with codec
profiles, while the old code always uses the highest profile per codec.
Surface allocation changes. Since the decoder won't call config() in
vo_vdpau.c on video size change anymore, we allow allocating surfaces
of arbitrary size instead of locking it to what the VO was configured.
The non-hwdec code also has slightly different allocation behavior now.
Enabling the old vdpau special decoders via e.g. --vd=lavc:h264_vdpau
doesn't work anymore (a warning suggesting the --hwdec option is
printed instead).